php - Extracting keyword pairs for SEO using PHP

Question

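I'm currently working on some new ideas for long-tail SEO. I have a site where people can create their own blogs, which already brings in fairly good long-tail traffic. I already display the article title inside the article's title tag.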

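However, the title often doesn't match the keywords in the content very well, and I'm interested in adding to the title some keywords that PHP has actually determined to be the best ones.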

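I tried using a script I made that counts the most common words on the page. This works fine, but the problem is that it comes up with pretty useless words.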

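It seems to me that what would be useful is a PHP script that extracts frequently occurring pairs of words (or sets of 3 words) and puts them into an array ordered by how often they occur.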

My question: how can I parse the text in a more dynamic way to look for recurring pairs or triplets of words? How would I go about this?

function extractCommonWords($string, $keywords){
  $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');

  $string = preg_replace('/\s\s+/', ' ', $string); // collapse runs of whitespace into a single space
  $string = trim($string); // trim the string
  $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
  $string = strtolower($string); // make it lowercase

  preg_match_all('/\b.*?\b/i', $string, $matchWords);
  $matchWords = $matchWords[0];

  foreach ( $matchWords as $key=>$item ) {
      if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
          unset($matchWords[$key]);
      }
  }   
  $wordCountArr = array();
  if ( is_array($matchWords) ) {
      foreach ( $matchWords as $key => $val ) {
          $val = strtolower($val);
          if ( isset($wordCountArr[$val]) ) {
              $wordCountArr[$val]++;
          } else {
              $wordCountArr[$val] = 1;
          }
      }
  }
  arsort($wordCountArr);
  $wordCountArr = array_slice($wordCountArr, 0, $keywords);
  return $wordCountArr;
}

score 2 · Accepted Answer

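For the sake of including some code, here's another primitive adaptation that returns multi-word keywords of a given length and number of occurrences. Rather than stripping all common words, it only filters out those at the start and end of a keyword. It will still return some nonsense, but that is really unavoidable.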

function getLongTailKeywords($str, $len = 3, $min = 2){
  $keywords = array();
  $common = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
  // Strip tags, lowercase, and drop everything except letters, digits, whitespace and dashes
  $str = preg_replace('/[^a-z0-9\s-]+/', '', strtolower(strip_tags($str)));
  // Split on whitespace, treating a free-standing dash as a separator
  $str = preg_split('/\s+-\s+|\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
  // Count phrases of $len words first, then shorter ones, down to single words
  while(0<$len--) for($i=0;$i<count($str)-$len;$i++){
    $word = array_slice($str, $i, $len+1);
    // Skip phrases that start or end with a common word
    if(in_array($word[0], $common)||in_array(end($word), $common)) continue;
    $word = implode(' ', $word);
    if(!isset($keywords[$len][$word])) $keywords[$len][$word] = 0;
    $keywords[$len][$word]++;
  }
  $return = array();
  foreach($keywords as &$keyword){
    // Keep only phrases that occur more than $min times
    $keyword = array_filter($keyword, function($v) use($min){ return $v>$min; });
    arsort($keyword);
    $return = array_merge($return, $keyword);
  }
  return $return;
}

* The code was run on a random BBC News article.
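The sliding-window counting that the function above relies on can be looked at in isolation. The following is only a minimal sketch: `countNgrams` is a hypothetical helper name, not part of the original answer, and the sample text is made up for illustration. It does no stop-word filtering at all, so it only shows how pairs and triplets are collected and ranked.

```php
<?php
// Hypothetical helper (not from the original post): counts every run of
// $n consecutive words and returns phrase => count, most frequent first.
function countNgrams(array $words, int $n): array {
    $counts = array();
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }
    arsort($counts); // highest count first, keys preserved
    return $counts;
}

// Made-up sample text, for illustration only.
$text  = 'long tail seo works because long tail seo targets long tail queries';
$words = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

$pairs   = countNgrams($words, 2);
$triples = countNgrams($words, 3);

// Show the three most frequent pairs and triplets.
print_r(array_slice($pairs, 0, 3, true));
print_r(array_slice($triples, 0, 3, true));
```

On this sample, "long tail" is counted three times and "long tail seo" twice; every other phrase appears less often.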

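The problem with ignoring common words, grammar and punctuation is that they still carry meaning within a sentence. If you remove them, you are at best changing the meaning and at worst producing unintelligible phrases. Even the idea of extracting "keywords" is itself flawed, because words can have different meanings, and when you remove them from a sentence you take them out of context.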
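To make that concrete, here is a small sketch comparing the two approaches on a made-up phrase. The stop-word list is an abbreviated subset of the one in the code above, and `keepPhrase` is a hypothetical helper name introduced only for this illustration.

```php
<?php
// Abbreviated stop-word list, a subset of the list used in the post.
$common = array('a', 'an', 'and', 'of', 'the', 'to', 'is', 'in');

// Made-up sample phrase, for illustration only.
$words = explode(' ', 'the state of the art in search');

// Approach 1: strip every common word before building phrases.
$stripped = implode(' ', array_diff($words, $common));

// Approach 2: leave the phrase intact and reject it only when its
// first or last word is a common word (hypothetical helper).
function keepPhrase(array $phrase, array $common): bool {
    return !in_array($phrase[0], $common) && !in_array(end($phrase), $common);
}

$candidate = array_slice($words, 1, 4); // "state of the art"
$rejected  = array_slice($words, 0, 4); // "the state of the"

echo $stripped, "\n";
echo implode(' ', $candidate), ' => ', var_export(keepPhrase($candidate, $common), true), "\n";
echo implode(' ', $rejected), ' => ', var_export(keepPhrase($rejected, $common), true), "\n";
```

Stripping everything turns the sentence into "state art search", which no longer reads as a phrase, while the boundary check keeps "state of the art" whole and discards "the state of the".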

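This isn't my field, but complex research has been done on natural language and there is no simple solution. The general theory, though, goes like this: a computer cannot decipher the meaning of a single text; it has to rely on cross-referencing a corpus of semantically tagged related material (which is a huge overhead).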
