php - Parsing RIPE allocations with RegEx

Question

I'm working on a project that needs to parse the RIPE allocation list (ftp://ftp.ripe.net/ripe/stats/membership/alloclist.txt) into a database.

I need to capture every member, so I need a pattern that matches everything between `xx.` and `\n\nxx.`, where x is any lowercase letter.

Here are a few lines of the file:

ad.andorpac
    Servei de Telecomunicacions d'Andorra 

    19960627    194.158.64.0/19 ALLOCATED PA
    20050104    85.94.160.0/19  ALLOCATED PA
    20061117    91.187.64.0/19  ALLOCATED PA
    20091124    109.111.96.0/19 ALLOCATED PA
    20120925    185.4.52.0/22   ALLOCATED PA
    20110722    2a02:8060::/31

ae.absl-me
    Asia Broadcast Satellite Ltd (Middle East) 

    20110321    31.47.88.0/21   ALLOCATED PA

ae.adac
    Abu Dhabi Airports Company PJSC 

    20120402    37.218.224.0/21 ALLOCATED PA

I'm not a RegEx expert. Can anyone provide these expressions?

score 1 · Accepted Answer

You don't need a very elaborate regex to parse this. You can just split the text on the separator and then parse each entry:

// $string contains the text
$entries = explode("\n\n", $string);
// Blocks alternate: a header (handle + name), then that member's allocation lines
for ($i = 0; $i + 1 < count($entries); $i += 2) {
    parse_header($entries[$i]);
    parse_entry($entries[$i + 1]);
}
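The loop above relies on header blocks and allocation blocks strictly alternating. Below is a self-contained sketch of just that splitting step, assuming records are separated by exactly one blank line; `split_blocks()` is a hypothetical helper name, the sample is cut down from the question, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: split the allocation list into [header, body] pairs.
// Assumes each header block is followed by exactly one block of allocation lines.
function split_blocks(string $text): array
{
    // Normalise Windows line endings in case the file was saved with CRLF
    $text = str_replace("\r\n", "\n", $text);
    // Drop blocks that are empty or whitespace-only (e.g. from a trailing blank line)
    $blocks = array_values(array_filter(explode("\n\n", $text), fn($b) => trim($b) !== ''));

    $pairs = [];
    // Step by two and stop before an incomplete trailing pair
    for ($i = 0; $i + 1 < count($blocks); $i += 2) {
        $pairs[] = [$blocks[$i], $blocks[$i + 1]];
    }
    return $pairs;
}

$sample = "ad.andorpac\n    Servei de Telecomunicacions d'Andorra \n\n"
        . "    19960627    194.158.64.0/19 ALLOCATED PA\n"
        . "    20110722    2a02:8060::/31\n\n"
        . "ae.adac\n    Abu Dhabi Airports Company PJSC \n\n"
        . "    20120402    37.218.224.0/21 ALLOCATED PA\n";

foreach (split_blocks($sample) as [$header, $body]) {
    echo strtok($header, "\n"), "\n"; // first line of the header: the member handle
}
```

Guarding with `$i + 1 < count(...)` avoids reading past the end of the array when the file ends with an unpaired block.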

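The "header" looks like "ad.andorpac\nServei de Telecomunicacions(...)", so it should be easy to parse. To parse the entries, you can split them on "\n" and then run a regex over each line to split the fields on whitespace: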

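function parse_entry($entries){
    $strings = explode("\n", $entries);
    foreach ($strings as $s) {
        // IPv6 lines such as "20110722    2a02:8060::/31" have no status words
        // and contain letters and colons, so this pattern does not match them
        if (preg_match("/(?P<number>\d+)\s+(?P<addr>[\d\.\/]+)\s+(?P<str1>\w+)\s+(?P<str2>\w+)/",
                $s, $result)) {
            // You can then access the results by group name
            echo $result["addr"], "\n"; // for the 20120925 line this prints "185.4.52.0/22"
        }
    }
}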

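This isn't a complete working answer, but it should solve most of your problem. You may also need the trim() function to remove whitespace at the start/end of the matched strings.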
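The loop calls `parse_header()` without defining it. A minimal hypothetical implementation, assuming the first header line is the member handle and the second is the organisation name, using trim() as suggested above:

```php
<?php
// Hypothetical parse_header(): not defined in the answer, sketched here.
// Assumes line 1 is the handle (e.g. "ad.andorpac") and line 2 the organisation name.
function parse_header(string $header): array
{
    $lines = explode("\n", $header);
    return [
        'handle' => trim($lines[0]),
        // The name line is indented and may carry trailing spaces
        'name'   => isset($lines[1]) ? trim($lines[1]) : '',
    ];
}

print_r(parse_header("ad.andorpac\n    Servei de Telecomunicacions d'Andorra "));
```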

Edit

A quick explanation of the regex:

(?P<number>\d+) => matches one or more digits and stores them under the "number" key
\s+ => matches one or more whitespace characters (not captured)
(?P<addr>[\d\.\/]+) => matches the network address (one or more digits, dots or slashes)
\s+ => same
(?P<str1>\w+) => matches the first word ("ALLOCATED")
\s+ => same
(?P<str2>\w+) => matches the second word ("PA")
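As explained above, the pattern only covers IPv4 lines followed by two status words, so the IPv6 line in the question ("20110722    2a02:8060::/31") is skipped. Below is a hypothetical variant, not the answer's pattern: it anchors to the start of the line, lets `addr` contain hex digits and colons, and makes the two words optional. `ALLOC_RE` and `parse_allocation_line()` are illustrative names.

```php
<?php
// Hypothetical variant of the answer's regex:
// - anchored to the line start (after optional indentation)
// - addr also accepts hex digits and colons, so IPv6 prefixes match
// - the two status words are optional, since the IPv6 line has none
const ALLOC_RE = '/^\s*(?P<number>\d+)\s+(?P<addr>[0-9A-Fa-f.:\/]+)(?:\s+(?P<str1>\w+)\s+(?P<str2>\w+))?\s*$/';

function parse_allocation_line(string $line): ?array
{
    if (!preg_match(ALLOC_RE, $line, $m)) {
        return null; // not an allocation line (e.g. a header line)
    }
    return [
        'number' => $m['number'],
        'addr'   => $m['addr'],
        // Trailing groups that did not participate are absent from $m
        'str1'   => $m['str1'] ?? '',
        'str2'   => $m['str2'] ?? '',
    ];
}

var_dump(parse_allocation_line("    20120925    185.4.52.0/22   ALLOCATED PA"));
var_dump(parse_allocation_line("    20110722    2a02:8060::/31"));
```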

score 1 · Accepted Answer

Something like:

 preg_match_all("/[a-z]{2}\..*?\n\n[a-z]{2}\./s", $text, $matches);

should work.

This assumes your description of the problem is accurate! If it isn't, you can expect this regex to prove very brittle.

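(That is: no unwanted instances of \n\n[a-z]{2}\. occur, you want the first 3 characters of the next entry included in each match, and the two newlines will always be present between your records and will be adjacent, with no other characters, whitespace included, between them.)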
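To make those caveats concrete, here is the first pattern run over the three sample records from the question. Because each match consumes the next record's "xx.", the following search starts mid-record, and the last record has no "\n\nxx." after it, so only one match comes back:

```php
<?php
// The first pattern from this answer, applied to the question's three sample records
$text = "ad.andorpac\n    Servei de Telecomunicacions d'Andorra \n\n"
      . "    20110722    2a02:8060::/31\n\n"
      . "ae.absl-me\n    Asia Broadcast Satellite Ltd (Middle East) \n\n"
      . "    20110321    31.47.88.0/21   ALLOCATED PA\n\n"
      . "ae.adac\n    Abu Dhabi Airports Company PJSC \n\n"
      . "    20120402    37.218.224.0/21 ALLOCATED PA\n";

$count = preg_match_all("/[a-z]{2}\..*?\n\n[a-z]{2}\./s", $text, $matches);
echo $count, "\n"; // 1: the single match is record 1 plus the "ae." of record 2
```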

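OK, I'm guessing you didn't actually want the trailing "xx." as part of the match, so an alternative is to use a lookahead, which checks for the next "xx." without consuming it (so the next match can start at that record), for example: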

     preg_match_all("/[a-z]{2}\..*?\n\n(?=[a-z]{2}\.)/s", $text, $matches);
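The lookahead version still needs a "\n\nxx." after each record, so the final record in the file is not matched. A hypothetical variant, not from the answer, also accepts the end of input (`\z`) as a terminator, and uses the `m` flag so `^` restricts record starts to the beginning of a line. `split_records()` is an illustrative name; files with CRLF line endings would need normalising to "\n" first.

```php
<?php
// Hypothetical variant of the lookahead pattern: a record ends either before
// "\n\nxx." or at the very end of the input, so the last record is kept too.
function split_records(string $text): array
{
    preg_match_all('/^[a-z]{2}\..*?(?=\n\n[a-z]{2}\.|\z)/ms', $text, $matches);
    // The last record may carry a trailing newline; strip trailing whitespace
    return array_map('rtrim', $matches[0]);
}

$text = "ad.andorpac\n    Servei de Telecomunicacions d'Andorra \n\n"
      . "    20110722    2a02:8060::/31\n\n"
      . "ae.absl-me\n    Asia Broadcast Satellite Ltd (Middle East) \n\n"
      . "    20110321    31.47.88.0/21   ALLOCATED PA\n\n"
      . "ae.adac\n    Abu Dhabi Airports Company PJSC \n\n"
      . "    20120402    37.218.224.0/21 ALLOCATED PA\n";

foreach (split_records($text) as $record) {
    echo strtok($record, "\n"), "\n"; // the handle on the record's first line
}
```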
