php - Parsing a BibTeX export with php preg_match_all

Question

I have a text file containing a BibTeX export. The file has many entries that follow the pattern below.

@article{ls_leimeister,
  added-at = {2013-01-18T11:14:11.000+0100},
  author = {Wegener, R. and Leimeister, J. M.},
  biburl = {http://www.bibsonomy.org/bibtex/27bb26b4b4858439f81aa0ec777944ac5/ls_leimeister},
  journal = {International Journal of Technology Enhanced Learning (to appear)},
  keywords = {Challenges Communities: Factors Learning Success VirtualCommunity and itegpub pub_jml pub_rwe},
  note = {JML_390},
  title = {Virtual Learning Communities: Success Factors and Challenges},
  year = 2013
}

I want to use PHP and was thinking of preg_match_all.

The following didn't get me anywhere:

preg_match_all('/@^.*}$/', file_get_contents($file_path),$results);

I wanted to start simple, but that didn't really work. I'm fairly new to regular expressions in PHP.

The ideal final output would be:

Array
    (
        [0] => Array
            (
                ['type'] => article
                ['unique_name'] => ls_leimeister
                ['added-at'] => 2013-01-18T11:14:11.000+0100
                ['author'] => Wegener, R. and Leimeister, J. M.
                ['biburl'] => http://www.bibsonomy.org/bibtex/27bb26b4b4858439f81aa0ec777944ac5/ls_leimeister
                ['journal'] => International Journal of Technology Enhanced Learning (to appear)
                ['keywords'] => Challenges Communities: Factors Learning Success VirtualCommunity and itegpub pub_jml pub_rwe
                ['note'] => JML_390
                ['title'] => Virtual Learning Communities: Success Factors and Challenges
                ['year'] => 2013
            )

        [1] => Array
            (
                [...] => …
            )

    )

score 2 · Accepted Answer

Try this. Here I have only extracted type and unique_name; by looking at how it is done, you can extract all the other fields the same way.

$str = '@article{ls_leimeister,
  added-at = {2013-01-18T11:14:11.000+0100},
  author = {Wegener, R. and Leimeister, J. M.},
  biburl = {http://www.bibsonomy.org/bibtex/27bb26b4b4858439f81aa0ec777944ac5/ls_leimeister},
  journal = {International Journal of Technology Enhanced Learning (to appear)},
  keywords = {Challenges Communities: Factors Learning Success VirtualCommunity and itegpub pub_jml pub_rwe},
  note = {JML_390},
  title = {Virtual Learning Communities: Success Factors and Challenges},
  year = 2013
}';

preg_match_all('/@(?P<type>\w+){(?P<unique_name>\w+),(.*)/',$str,$matches);

echo $matches['type'][0];
echo "<br>";
echo $matches['unique_name'][0];
echo "<br>";

echo "<pre>";
print_r($matches);

The format of the output array is slightly different from yours, but you can convert it to your format.
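As a rough illustration of how the remaining fields could be pulled out and reshaped into the array the question asks for, here is a minimal two-pass sketch. It is not part of the answer above: the function name parseBibtexEntries and both patterns are hypothetical, and it assumes that each entry closes with a "}" at the start of a line, that every field sits on a single line, and that values are either wrapped in braces or bare tokens such as 2013.

```php
<?php
// Hypothetical helper (not from the answer): parse BibTeX entries in two passes.
function parseBibtexEntries(string $text): array
{
    $entries = [];
    // Pass 1: one match per entry. The body runs lazily up to a "}" at the start of a line (assumption).
    $entryPattern = '/@(?P<type>\w+)\{(?P<unique_name>[^,\s]+),(?P<body>.*?)^\}/ms';
    preg_match_all($entryPattern, $text, $matches, PREG_SET_ORDER);

    // Pass 2: "key = {value}" or "key = bareword", one field per line (assumption).
    $fieldPattern = '/^\s*(?P<key>[\w-]+)\s*=\s*(?:\{(?P<braced>.*)\}|(?P<bare>[^\s,{}]+))\s*,?\s*$/m';

    foreach ($matches as $m) {
        $entry = ['type' => $m['type'], 'unique_name' => $m['unique_name']];
        preg_match_all($fieldPattern, $m['body'], $fields, PREG_SET_ORDER);
        foreach ($fields as $f) {
            // With PREG_SET_ORDER a trailing unmatched group is absent, hence the ?? fallback.
            $entry[$f['key']] = ($f['braced'] !== '') ? $f['braced'] : ($f['bare'] ?? '');
        }
        $entries[] = $entry;
    }
    return $entries;
}

// Usage with a shortened version of the entry from the question.
$str = '@article{ls_leimeister,
  added-at = {2013-01-18T11:14:11.000+0100},
  author = {Wegener, R. and Leimeister, J. M.},
  note = {JML_390},
  year = 2013
}';
print_r(parseBibtexEntries($str));
```

Values that span several lines or contain nested braces are outside what this sketch handles; a dedicated BibTeX parser would be the safer choice for those.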

score 0 · Accepted Answer

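Pattern /^@([^{]+)\{([^,]+),\s*$|^\s*([^\r\n@=]+) = \{(.*?)}/ms: (Demo)

The key class is written [^\r\n@=] rather than [^\R@=] because \R is not a line-break escape inside a character class: older PCRE versions read it as a literal "R", and PCRE2 (PHP 7.3 and later) rejects it when compiling the pattern.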

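This pattern has two alternatives; each contains two capture groups.

type and unique_name are captured and stored in elements [1] and [2].
All other key-value pairs are stored in elements [3] and [4].

Storing the two kinds of line in separate elements allows reliable processing when building the desired output array structure.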
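To make the separation concrete, the following small sketch shows what preg_match_all returns with PREG_SET_ORDER for one header line and one key-value line. The two-line sample string is made up for illustration, and the pattern uses [^\r\n@=] for the key class. The point to notice is that trailing groups that did not take part in a match are left out of that match's array, which is what makes an isset() check on element [3] a usable test for "this is a header line".

```php
<?php
$pattern = '/^@([^{]+)\{([^,]+),\s*$|^\s*([^\r\n@=]+) = \{(.*?)}/ms';

// Made-up minimal input: one header line and one key-value line.
$sample = "@BOOK{ko,\n   title = {Some title},\n}";

preg_match_all($pattern, $sample, $out, PREG_SET_ORDER);

// $out[0] is the header match: elements [1] and [2] are filled, [3] and [4] are absent.
// $out[1] is the field match: [1] and [2] are empty strings, [3] and [4] are filled.
var_export($out);
```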

Input:

$bibtex='@BOOK{ko,
   title = {Wissenschaftlich schreiben leicht gemacht},
   publisher = {Haupt},
   year = {2011},
   author = {Kornmeier, M.},
   number = {3154},
   series = {UTB},
   address = {Bern},
   edition = {4},
   subtitle = {für Bachelor, Master und Dissertation}
}

@BOOK{nial,
   title = {Wissenschaftliche Arbeiten schreiben mit Word 2010},
   publisher = {Addison Wesley},
   year = {2011},
   author = {Nicol, N. and Albrecht, R.},
   address = {München},
   edition = {7}
}

@ARTICLE{shome,
   author = {Scholz, S. and Menzl, S.},
   title = {Alle Wege führen nach Rom},
   journal = {Medizin Produkte Journal},
   year = {2011},
   volume = {18},
   pages = {243-254},
   subtitle = {ein Vergleich der regulatorischen Anforderungen und Medizinprodukte
   in Europa und den USA},
   issue = {4}
}

@INBOOK{shu,
   author = {Schulz, C.},
   title = {Corporate Finance für den Mittelstand},
   booktitle = {Praxishandbuch Firmenkundengeschäft},
   year = {2010},
   editor = {Hilse, J. and Netzel, W and Simmert, D.B.},
   booksubtitle = {Geschäftsfelder Risikomanagement Marketing},
   publisher = {Gabler},
   pages = {97-107},
   location = {Wiesbaden}
}';

Method: (Demo)

// [^\r\n@=] is used instead of [^\R@=]: \R is not a line-break escape inside a character class
// (older PCRE reads it as a literal "R"; PCRE2, used by PHP 7.3+, rejects it).
$pattern = '/^@([^{]+)\{([^,]+),\s*$|^\s*([^\r\n@=]+) = \{(.*?)}/ms';
$result = [];
$temp = null;
if (preg_match_all($pattern, $bibtex, $out, PREG_SET_ORDER)) {
    foreach ($out as $line) {
        if (!isset($line[3])) {  // header line: groups 3 and 4 are absent, so a new entry starts
            if ($temp !== null) {
                $result[] = $temp;  // move the finished entry to permanent storage
            }
            $temp = ['type' => $line[1], 'unique_name' => $line[2]];  // start a fresh entry
        } else {
            $temp[$line[3]] = $line[4];  // key-value line: add the field to the current entry
        }
    }
    $result[] = $temp;  // store the final entry
}
var_export($result);

Output:

array (
  0 => 
  array (
    'type' => 'BOOK',
    'unique_name' => 'ko',
    'title' => 'Wissenschaftlich schreiben leicht gemacht',
    'publisher' => 'Haupt',
    'year' => '2011',
    'author' => 'Kornmeier, M.',
    'number' => '3154',
    'series' => 'UTB',
    'address' => 'Bern',
    'edition' => '4',
    'subtitle' => 'für Bachelor, Master und Dissertation',
  ),
  1 => 
  array (
    'type' => 'BOOK',
    'unique_name' => 'nial',
    'title' => 'Wissenschaftliche Arbeiten schreiben mit Word 2010',
    'publisher' => 'Addison Wesley',
    'year' => '2011',
    'author' => 'Nicol, N. and Albrecht, R.',
    'address' => 'München',
    'edition' => '7',
  ),
  2 => 
  array (
    'type' => 'ARTICLE',
    'unique_name' => 'shome',
    'author' => 'Scholz, S. and Menzl, S.',
    'title' => 'Alle Wege führen nach Rom',
    'journal' => 'Medizin Produkte Journal',
    'year' => '2011',
    'volume' => '18',
    'pages' => '243-254',
    'subtitle' => 'ein Vergleich der regulatorischen Anforderungen und Medizinprodukte
   in Europa und den USA',
    'issue' => '4',
  ),
  3 => 
  array (
    'type' => 'INBOOK',
    'unique_name' => 'shu',
    'author' => 'Schulz, C.',
    'title' => 'Corporate Finance für den Mittelstand',
    'booktitle' => 'Praxishandbuch Firmenkundengeschäft',
    'year' => '2010',
    'editor' => 'Hilse, J. and Netzel, W and Simmert, D.B.',
    'booksubtitle' => 'Geschäftsfelder Risikomanagement Marketing',
    'publisher' => 'Gabler',
    'pages' => '97-107',
    'location' => 'Wiesbaden',
  ),
)

This is the site from which I took the new sample input string.
