
I want to reclassify the articles in my database. I explode the text of each article and check whether any word in it matches one of the tags in the category table; if so, I update the article with that category. My code is below. I want to limit each category to a maximum of 5 articles, but the limit on the update does not work. Thanks.

<?php
header('Content-type:text/html; charset=utf-8');
$db = mysql_connect("localhost","root","root") or die("can not connect Mysql Server");
mysql_select_db("12",$db);
$result = mysql_query("SELECT title,content,id,cat,date FROM articles Order By date DESC"); //get all the articles
$count = 0;
$ids = array();
$categories = array('1','2','3','4','5','6','7','8','9','10');//category numbers, for 1 = art, 2 = travel... these are stored in another refrenced DB table
$curCategory = array_shift($categories);
echo $curCategory;
while ($row = mysql_fetch_array($result))
{
$tt = $row['title'].'&nbsp;'.$row['content'];
$tt = preg_replace('/[^a-zA-Z0-9 ]/','',$tt);
$words = preg_split("/\s+/",$tt);   
$uniqueWords = array_keys(array_flip($words)); // broken article sentence into words
$parts = '';
foreach($uniqueWords as $word){     
$parts[] = " tag1 = '$word' OR tag2 = '$word' OR tag3 = '$word' OR tag4 = '$word' OR tag5 = '$word' ";   
} 
$where = implode(" OR ", $parts);
mysql_select_db("12",$db);
mysql_query("SET NAMES utf8");
    $query1 = mysql_query("SELECT count(*) as count FROM tag1 WHERE ($where) AND category ='count($categories)' ");  //put the break words into reference table match out the category number
    $count = 0;
    while ($row = mysql_fetch_array($query1)) {
        $count = $row['count'];
    } 
    if($count) {
        $ids[] = $row['id'];
        $count++;
        if($count == 5) {
             mysql_query("UPDATE articles SET cat = '$curCategory' WHERE id in ('".implode("', '", $ids)."')"); //update every category max articles 
            if(!$curCategory = array_shift($categories)) {
                break;
            }
            $count = 0;
            $ids = array();
        }
    }
}
?>

reference table

category | tag1    | tag2   | tag3       | tag4    |  tag5  
1        | paint   | picture| sculpture  | photo   |  bronze   
2        | tourism | travel | tour       | journey |  trip
3        | style   | vogue  | fashion    | mode    |  Popular
... // 10 categories, category 1 = art , category 2 = travel ...
4 Answers


Very strange code. But look at $ids[] = $row['id']; - your inner SQL query has no id column, so there are no ids in its result. The problem is probably that you use $row in both the outer and the inner loop.

Also, do you realize that an article with 100 unique words (not many, right?) produces a SQL query with 500 OR conditions? :)

And why are mysql_select_db and mysql_query("SET NAMES utf8"); inside the loop? They only need to run once, before it.
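One way to shrink that 500-condition query is to test each tag column against a single IN list. The following is only a sketch: buildTagWhere is a hypothetical helper, and the column names tag1 to tag5 are taken from the reference table in the question.

```php
<?php
// Hypothetical helper: builds one IN (...) test per tag column instead of
// five equality tests per word.
function buildTagWhere(array $words, array $columns = array('tag1', 'tag2', 'tag3', 'tag4', 'tag5'))
{
    // Keep only non-empty alphanumeric words. This mirrors the clean-up done
    // by preg_replace in the question and keeps quotes out of the SQL string.
    $quoted = array();
    foreach (array_unique($words) as $word) {
        if (preg_match('/^[a-zA-Z0-9]+$/', (string) $word)) {
            $quoted[] = "'" . $word . "'";
        }
    }
    if (!$quoted) {
        return '0'; // no usable words: a condition that never matches
    }

    $list = implode(', ', $quoted);
    $parts = array();
    foreach ($columns as $column) {
        $parts[] = "$column IN ($list)";
    }
    return implode(' OR ', $parts);
}

echo buildTagWhere(array('paint', 'trip', 'paint'), array('tag1', 'tag2')), "\n";
// prints: tag1 IN ('paint', 'trip') OR tag2 IN ('paint', 'trip')
```

With 100 unique words this yields five IN tests rather than 500 OR conditions, although the query text still grows with the number of words.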

Answered 2011-04-06T11:50:53.013

Let's analyse this code:

// This query returns one row with a single column `count`. You're comparing
// column `category` to the literal string `count($categories)`; since
// `$categories` is an array, PHP interpolates it as `count(Array)`.
$query1 = mysql_query("SELECT count(*) as count FROM tag1 WHERE ($where) AND category ='count($categories)' ");
$count = 0;
// warning: this overwrites the $row variable of the outer loop
while ($row = mysql_fetch_array($query1)) {
    // an if($row=...) is better since you have only one row anyway
    // Contents of $row = array( 'count' => NUMBER );
    // You're overwriting $count with the number of matching rows
    $count = $row['count'];
}
// unless the query failed or no rows matched, the next condition is true
if($count) {
    // $row['id'] does not exist here: $row was overwritten by the inner loop
    // and is false once that loop has finished
    $ids[] = $row['id'];
    // The next lines do not limit the number of updates; the update only runs
    // if $count == 4 before the increment, where $count is the number of
    // rows matched by the query above
    $count++;
    if($count == 5) {
         mysql_query("UPDATE articles SET cat = '$curCategory' WHERE id in ('".implode("', '", $ids)."')");
        // move on to the next category; quit if there are none left
        if(!$curCategory = array_shift($categories)) {
            break;
        }
        // otherwise, reset for the next category
        $count = 0;
        $ids = array();
    }
}

You should definitely go through your code and check that you understand every line. I'm sure that overwriting $row is not intended, and the query in $query1 is not correct either. When naming your variables, make them more descriptive: use $catCount_row instead of $row, for example. Note that you're overwriting $count on every iteration; perhaps you want to keep the per-category counter separate from the query result.

If the query does not return a count of exactly 4, no update is performed.
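To make the intended limit concrete, here is a minimal sketch of the capping logic on plain arrays, with no database involved. The function name assignCategories and both input shapes (article id => words, category => tags) are assumptions made for illustration, not something taken from the question.

```php
<?php
// Assigns each article to the first category that shares a tag with it,
// skipping categories that already hold $maxPerCategory articles.
// $articles: article id => list of words; $categoryTags: category => list of tags.
function assignCategories(array $articles, array $categoryTags, $maxPerCategory = 5)
{
    $assigned = array();                                    // article id => category
    $perCategory = array_fill_keys(array_keys($categoryTags), 0);

    foreach ($articles as $articleId => $words) {
        $words = array_map('strtolower', $words);
        foreach ($categoryTags as $category => $tags) {
            if ($perCategory[$category] >= $maxPerCategory) {
                continue;                                   // this category is full
            }
            if (array_intersect($words, array_map('strtolower', $tags))) {
                $assigned[$articleId] = $category;
                $perCategory[$category]++;                  // separate counter per category
                break;                                      // one category per article
            }
        }
    }
    return $assigned;
}

$categoryTags = array(
    1 => array('paint', 'picture', 'sculpture', 'photo', 'bronze'),
    2 => array('tourism', 'travel', 'tour', 'journey', 'trip'),
);
$articles = array(
    10 => array('a', 'bronze', 'statue'),
    11 => array('my', 'trip', 'photo'),
    12 => array('weekend', 'journey'),
);
// With a limit of 1, article 11 falls through to category 2 and article 12 stays unassigned.
print_r(assignCategories($articles, $categoryTags, 1));
```

The returned map could then be grouped by category and written back with one UPDATE per category. The key difference from the original loop is that the counter belongs to the category and is never overwritten by a query result.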

Answered 2011-04-06T12:10:10.240

Firstly, I think your description of what you're trying to accomplish is hard for most SO users to follow, so to get a complete and relevant answer you need to rewrite your question with more detail and structure.

Your code at the moment is extremely messy, and there are several incorrect ways of trying to accomplish certain tasks.

There are several issues that struck me and I will list them here:

  • Do you even check the manual to see if you're using the functions correctly?
  • You select the same database twice (mysql_select_db("12",$db));
  • You're creating a static array of categories and then removing the first element... why?
  • You're using array_keys(array_flip($words)); instead of array_unique
  • You're not using the count variable correctly: you're just overwriting it when I believe you want to increment it
    • You can use mysql_result($query, 0, 'count') to read the count.
  • Where does the id come from ($ids[] = $row['id'];)? The inner query does not select it.
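As a small illustration of the array_unique point, the two idioms below give the same list of words. The sample words are made up for the sketch.

```php
<?php
$words = array('trip', 'photo', 'trip', 'paint', 'photo');

// The question's idiom: flip values into keys (duplicates collapse),
// then read the keys back.
$viaFlip = array_keys(array_flip($words));

// The built-in: keeps the first occurrence of each value and preserves
// its original key, so array_values() is used to reindex from 0.
$viaUnique = array_values(array_unique($words));

var_dump($viaFlip === $viaUnique); // bool(true)
```

Both work for plain words; array_unique simply states the intent more directly.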

And to be perfectly honest, the rest of the code is too messed up for even me to understand. To me it looks like you have copied bits of code from the web and crossed your fingers.

Also, I'm pretty surprised that you have 50 points to offer as a bounty.

Answered 2011-04-06T12:22:03.603

Scary.

The code is messy, as other answers explain well, but your choice of structure for the tag data is also going to cause you trouble.

Rather than 5 columns for 5 tags, create a separate tag table and link it to your articles:

article | tag       |
  1     | paint     |
  1     | picture   |
  1     | sculpture |
  1     | photo     |
  1     | bronze    |
  2     | tourism   |
  2     | travel    |
  2     | tour      |

Then when you tag, you don't need to worry about whether the tag is tag1 or tag2, or whether tag3 is NULL, or whether you're changing your mind and want 6 tags after all. The structure will work for anywhere between 0 and any number of tags, by making your "parts" bit of the query something like:

$parts = " tag in ('"
        .implode("', '", $uniqueWords)
        ."')";
// e.g. if $uniqueWords = ['one','two','three'], $parts = " tag in ('one', 'two', 'three')"

That implode probably needs tweaking to get all the quotes and brackets in the right places.

None of that answers your actual problem though. It's not clear to me if you're trying to find the first 5 keywords for your article, or any 5 tags, or the best 5 tags. I would suggest something like this.

Explode your article and when looking for unique words, count word occurrences, excluding common English words like "the". Then sort the unique words in occurrence order, most repeated words first. You have a list of the main words in your article, take the first five, they are the tags. Insert into table.
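The word-counting approach just described might look like the sketch below. topWords is a hypothetical function name, and the stop-word list is a tiny illustrative sample rather than a complete one.

```php
<?php
// Returns the $limit most frequent words of $text, ignoring stop words.
function topWords($text, $limit = 5, array $stopWords = array('the', 'a', 'an', 'and', 'of', 'to', 'in', 'is'))
{
    // Lower-case the text and turn everything except letters, digits and
    // spaces into spaces, then split on whitespace.
    $clean = preg_replace('/[^a-z0-9 ]/', ' ', strtolower($text));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    // Drop common words, count the rest, and sort by count, highest first.
    $words = array_diff($words, $stopWords);
    $counts = array_count_values($words);
    arsort($counts);

    return array_slice(array_keys($counts), 0, $limit);
}

print_r(topWords('The trip: a trip to the bronze trip, bronze photo', 2));
// trip occurs 3 times and bronze twice, so the result is array('trip', 'bronze')
```

Words with equal counts may come back in either order, depending on the PHP version, so ties would need an explicit rule if the order matters.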

Alternatively, here is a solution that may sound messy but may be more efficient in the end. Write a database procedure to carry out this process entirely in MySQL. You need two tables:

tagstable - 1 column "tag" is the PK
| paint     |
| picture   |
| sculpture |
| photo     |
| bronze    |

articlewordstable - 1 column "word" is the PK - empty
| -   |
| -   |

Insert the tokenised words into articlewordstable. Then query that table joining with tagstable:

SELECT word FROM articlewordstable
INNER JOIN tagstable
ON tag = word;

You'll get a list of words that are also tags. You can limit this to 5 results, or you could do:

SELECT word, count(word) occurrences FROM articlewordstable
INNER JOIN tagstable
ON tag = word
GROUP BY word
ORDER BY occurrences DESC;

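This gives you the most-used words that also appear in your tags list (for the counts to be meaningful, articlewordstable has to allow duplicate words, so "word" cannot be its primary key in this variant). That, too, can be limited to 5; then use the result as you see fit.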

Hope this helps!

Answered 2011-04-12T20:09:55.447