php - How to import specific data from a text file into MySQL?

Question

I downloaded a file from DBpedia whose contents look like this:

<http://dbpedia.org/resource/Selective_Draft_Law_Cases> <http://dbpedia.org/ontology/wikiPageExternalLink>        <http://supreme.justia.com/cases/federal/us/245/366/> .
<http://dbpedia.org/resource/List_of_songs_recorded_by_Shakira> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://www.shakira.com/> .
<http://dbpedia.org/resource/Bucharest_Symphony_Orchestra>   <http://dbpedia.org/ontology/wikiPageExternalLink> <http://www.symphorchestra.ro/> .
<http://dbpedia.org/resource/Bucharest_Symphony_Orchestra> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://symphorchestra.ro> .
<http://dbpedia.org/resource/Bucharest_Symphony_Orchestra> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://www.youtube.com/symphorchestra> .

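I need to extract the title from the first part of each line (i.e. Selective_Draft_Law_Cases in the first line, List_of_songs_etc. in the second, and so on) and save it in a MySQL table together with the URL that is the third element of the same line, i.e. http://supreme.justia.com/cases/federal/us/245/366/ for the first line, http://www.shakira.com/ for the second, and so on.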

I also need to skip the first line of the file, which contains different, unrelated information.

What is the fastest way to do this in PHP?

Note: the file is quite large (over 1 GB, more than 6 million lines).

Thanks in advance!

score 1 · Accepted Answer

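You should use a regular expression with PHP's preg_match function. If the file is very large (which seems to be your case), you may want to use fopen + fgets + fclose so that you never load the whole file into memory and can process it line by line instead.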
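A minimal sketch of that approach, assuming every data line has the `<subject> <predicate> <object> .` shape shown in the question and that the title is the last path segment of the subject URI. The function names `parseLine()` and `readTitleUrlPairs()` are hypothetical, not part of the original answer; the regex is one possible pattern, not the only one.

```php
<?php
// Sketch: preg_match to pull out the title and URL, fopen/fgets to stream.
// Assumes lines look like: <subject-URI> <predicate-URI> <object-URI> .
// parseLine() and readTitleUrlPairs() are hypothetical helper names.

/**
 * Returns [title, url] for one line, or null if the line does not match
 * the expected three-URI pattern (e.g. the header line).
 */
function parseLine(string $line): ?array
{
    // Group 1: last path segment of the first URI (the title).
    // Group 2: the whole third URI (the external link).
    $pattern = '~^<[^>]*/([^/>]+)>\s+<[^>]*>\s+<([^>]*)>\s*\.\s*$~';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

/**
 * Reads the file one line at a time, so memory use does not grow with
 * file size, skips the first line, and yields [title, url] pairs.
 */
function readTitleUrlPairs(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        fgets($handle); // skip the first, unrelated line
        while (($line = fgets($handle)) !== false) {
            $pair = parseLine($line);
            if ($pair !== null) {
                yield $pair;
            }
        }
    } finally {
        fclose($handle); // runs even if the consumer stops early
    }
}
```

Typical use would be `foreach (readTitleUrlPairs('myFile.txt') as [$title, $url]) { ... }`, inserting each pair into MySQL inside the loop. Lines that do not match the pattern are silently skipped rather than aborting the import.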

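You could try benchmarking file_get_contents for reading the file, but since it loads the entire file into memory at once, it does not seem likely to be the faster approach in your case.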

score 1 · Accepted Answer

I'm sure this can be optimized, but it's a start. Try:

function insertFileToDb(PDO $pdo){
    $myFile = "myFile.txt"; //your txt file containing the data
    $handle = fopen($myFile, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $myFile");
    }

    //Read first line, but do nothing with it
    fgets($handle);

    //Prepare the INSERT once and reuse it for every line
    //(`table`, `title` and `url` are placeholder names)
    $stmt = $pdo->prepare("INSERT INTO `table` (`title`, `url`) VALUES (?, ?)");

    //now read the rest of the file line by line until fgets() returns false
    while (($data = fgets($handle)) !== false) {
       //remove < and > characters
       $brackets = array("<", ">");
       $data = str_replace($brackets, "", $data);

       //collapse runs of whitespace to a single space and trim the line ends
       $data = trim(preg_replace('!\s+!', ' ', $data));

       /*
        * Explode on ' ' spaces: the 1st URL is $dataArr[0]
        * and the 2nd URL is $dataArr[2]
        */
       $dataArr = explode(" ", $data);
       if (count($dataArr) < 3) {
           continue; //skip blank or malformed lines
       }

       //Get last part of uri from 1st element in array
       $title = getLastPartOfUrl($dataArr[0]);

       //Insert $title and $dataArr[2], which is the url
       $stmt->execute(array($title, $dataArr[2]));
    }
    fclose($handle);
}

function getLastPartOfUrl($url){
   $keys = parse_url($url); // parse the url
   $path = explode("/", $keys['path']); // splitting the path
   $last = end($path); // get the value of the last element
   return $last;
}
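Since the answer says the code can be optimized: one common optimization for millions of rows, offered here as an assumption rather than part of the original answer, is to send many rows per INSERT instead of one statement per line, which cuts client/server round trips. The table and column names (`links`, `title`, `url`), the function names and the default batch size of 1000 are all hypothetical; the batch size is a tuning knob and must stay small enough for MySQL's placeholder and packet limits.

```php
<?php
// Sketch: multi-row INSERTs with PDO positional placeholders.
// `links`, `title`, `url`, buildBatchInsert() and insertInBatches()
// are hypothetical names chosen for illustration.

/**
 * Builds one INSERT covering all $rows (each row is [title, url]).
 * Returns [sql, flat parameter list]. $rows must not be empty.
 */
function buildBatchInsert(array $rows): array
{
    $placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
    $sql = "INSERT INTO `links` (`title`, `url`) VALUES $placeholders";
    $params = [];
    foreach ($rows as [$title, $url]) {
        $params[] = $title;
        $params[] = $url;
    }
    return [$sql, $params];
}

/**
 * Consumes any iterable of [title, url] pairs (e.g. a generator that
 * reads the file line by line) and inserts them $batchSize rows at a time.
 */
function insertInBatches(PDO $pdo, iterable $pairs, int $batchSize = 1000): void
{
    $fullStmt = null; // full-size batches share one prepared statement
    $batch = [];
    foreach ($pairs as $pair) {
        $batch[] = $pair;
        if (count($batch) === $batchSize) {
            [$sql, $params] = buildBatchInsert($batch);
            $fullStmt = $fullStmt ?? $pdo->prepare($sql);
            $fullStmt->execute($params);
            $batch = [];
        }
    }
    if ($batch !== []) {
        // Leftover rows need a statement with fewer placeholders.
        [$sql, $params] = buildBatchInsert($batch);
        $pdo->prepare($sql)->execute($params);
    }
}
```

Under MySQL's default autocommit, each multi-row INSERT is committed as one statement, so a failure partway through leaves the earlier batches in the table.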
