I'm trying to parse the ESPN bottom line feed with a RegEx, but I'm coming up a bit short. It doesn't pick up all of the NFL games, only about half of them.

Here is the data that ESPN returns.

 &nfl_s_delay=120&nfl_s_stamp=0909064837&nfl_s_left1=^Dallas 24   NY Giants 17 (FINAL)&nfl_s_right1_count=0&nfl_s_url1=http://sports.espn.go.com/nfl/boxscore?gameId=320905019&nfl_s_left2=Indianapolis at Chicago (1:00 PM ET)&nfl_s_right2_count=0&nfl_s_url2=http://sports.espn.go.com/nfl/preview?gameId=320909003&nfl_s_left3=Philadelphia at Cleveland (1:00 PM ET)&nfl_s_right3_count=0&nfl_s_url3=http://sports.espn.go.com/nfl/preview?gameId=320909005&nfl_s_left4=St. Louis at Detroit (1:00 PM ET)&nfl_s_right4_count=0&nfl_s_url4=http://sports.espn.go.com/nfl/preview?gameId=320909008&nfl_s_left5=New England at Tennessee (1:00 PM ET)&nfl_s_right5_count=0&nfl_s_url5=http://sports.espn.go.com/nfl/preview?gameId=320909010&nfl_s_left6=Atlanta at Kansas City (1:00 PM ET)&nfl_s_right6_count=0&nfl_s_url6=http://sports.espn.go.com/nfl/preview?gameId=320909012&nfl_s_left7=Jacksonville at Minnesota (1:00 PM ET)&nfl_s_right7_count=0&nfl_s_url7=http://sports.espn.go.com/nfl/preview?gameId=320909016&nfl_s_left8=Washington at New Orleans (1:00 PM ET)&nfl_s_right8_count=0&nfl_s_url8=http://sports.espn.go.com/nfl/preview?gameId=320909018&nfl_s_left9=Buffalo at NY Jets (1:00 PM ET)&nfl_s_right9_count=0&nfl_s_url9=http://sports.espn.go.com/nfl/preview?gameId=320909020&nfl_s_left10=Miami at Houston (1:00 PM ET)&nfl_s_right10_count=0&nfl_s_url10=http://sports.espn.go.com/nfl/preview?gameId=320909034&nfl_s_left11=San Francisco at Green Bay (4:25 PM ET)&nfl_s_right11_count=0&nfl_s_url11=http://sports.espn.go.com/nfl/preview?gameId=320909009&nfl_s_left12=Seattle at Arizona (4:25 PM ET)&nfl_s_right12_count=0&nfl_s_url12=http://sports.espn.go.com/nfl/preview?gameId=320909022&nfl_s_left13=Carolina at Tampa Bay (4:25 PM ET)&nfl_s_right13_count=0&nfl_s_url13=http://sports.espn.go.com/nfl/preview?gameId=320909027&nfl_s_left14=Pittsburgh at Denver (8:20 PM ET)&nfl_s_right14_count=0&nfl_s_url14=http://sports.espn.go.com/nfl/preview?gameId=320909007&nfl_s_left15=Cincinnati at Baltimore (7:00 PM ET)&nfl_s_right15_count=0&nfl_s_url15=http://sports.espn.go.com/nfl/preview?gameId=320910033&nfl_s_left16=San Diego at Oakland (10:15 PM ET)&nfl_s_right16_count=0&nfl_s_url16=http://sports.espn.go.com/nfl/preview?gameId=320910013&nfl_s_count=16&nfl_s_loaded=true

I'm using two RegEx patterns, one to capture games that are finished or in progress:

preg_match_all('/nfl_s_left\d{1,2}=\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+(?P<time>\(.*?\))/', $content, $matches_in_progress, PREG_SET_ORDER);

And one to capture games that haven't started yet:

preg_match_all('/nhl_s_left\d=\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(at)+\s+\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s(?P<time>\(.*?\))/', $content, $matches_upcoming, PREG_SET_ORDER);

Only 8 of the 16 games are matched, and I don't know why. The games that do match this week are Dallas at NY Giants, Philadelphia at Cleveland, New England at Tennessee, Atlanta at Kansas City, Jacksonville at Minnesota, Washington at New Orleans, and Buffalo at NY Jets.

I would really appreciate any help getting my RegEx to match all 16 games, both this week and in future weeks.

Edit: I updated the string above to match what the regex actually sees; I forgot that I remove the %20 before applying the regex.

2 Answers

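Your regex is quite complex. Since the original string is encoded as URL parameters, you can use parse_str to turn it into a key => value array, then use a regex to find the entries you want.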

$string = "&nfl_s_delay=120&nfl_s_stamp=0909054449&nfl_s_left1=^Dallas%2024%20%20%20NY%20Giants%2017%20(FINAL)&nfl_s_right1_count=0&nfl_s_url1=http://sports.espn.go.com/nfl/boxscore?gameId=320905019&nfl_s_left2=Indianapolis%20at%20Chicago%20(1:00%20PM%20ET)&nfl_s_right2_count=0&nfl_s_url2=http://sports.espn.go.com/nfl/preview?gameId=320909003&nfl_s_left3=Philadelphia%20at%20Cleveland%20(1:00%20PM%20ET)&nfl_s_right3_count=0&nfl_s_url3=http://sports.espn.go.com/nfl/preview?gameId=320909005&nfl_s_left4=St.%20Louis%20at%20Detroit%20(1:00%20PM%20ET)&nfl_s_right4_count=0&nfl_s_url4=http://sports.espn.go.com/nfl/preview?gameId=320909008&nfl_s_left5=New%20England%20at%20Tennessee%20(1:00%20PM%20ET)&nfl_s_right5_count=0&nfl_s_url5=http://sports.espn.go.com/nfl/preview?gameId=320909010&nfl_s_left6=Atlanta%20at%20Kansas%20City%20(1:00%20PM%20ET)&nfl_s_right6_count=0&nfl_s_url6=http://sports.espn.go.com/nfl/preview?gameId=320909012&nfl_s_left7=Jacksonville%20at%20Minnesota%20(1:00%20PM%20ET)&nfl_s_right7_count=0&nfl_s_url7=http://sports.espn.go.com/nfl/preview?gameId=320909016&nfl_s_left8=Washington%20at%20New%20Orleans%20(1:00%20PM%20ET)&nfl_s_right8_count=0&nfl_s_url8=http://sports.espn.go.com/nfl/preview?gameId=320909018&nfl_s_left9=Buffalo%20at%20NY%20Jets%20(1:00%20PM%20ET)&nfl_s_right9_count=0&nfl_s_url9=http://sports.espn.go.com/nfl/preview?gameId=320909020&nfl_s_left10=Miami%20at%20Houston%20(1:00%20PM%20ET)&nfl_s_right10_count=0&nfl_s_url10=http://sports.espn.go.com/nfl/preview?gameId=320909034&nfl_s_left11=San%20Francisco%20at%20Green%20Bay%20(4:25%20PM%20ET)&nfl_s_right11_count=0&nfl_s_url11=http://sports.espn.go.com/nfl/preview?gameId=320909009&nfl_s_left12=Seattle%20at%20Arizona%20(4:25%20PM%20ET)&nfl_s_right12_count=0&nfl_s_url12=http://sports.espn.go.com/nfl/preview?gameId=320909022&nfl_s_left13=Carolina%20at%20Tampa%20Bay%20(4:25%20PM%20ET)&nfl_s_right13_count=0&nfl_s_url13=http://sports.espn.go.com/nfl/preview?gameId=320909027&nfl_s_left14=Pittsburgh%20at%20Denver%20(8:20%20PM%20ET)&nfl_s_right14_count=0&nfl_s_url14=http://sports.espn.go.com/nfl/preview?gameId=320909007&nfl_s_left15=Cincinnati%20at%20Baltimore%20(7:00%20PM%20ET)&nfl_s_right15_count=0&nfl_s_url15=http://sports.espn.go.com/nfl/preview?gameId=320910033&nfl_s_left16=San%20Diego%20at%20Oakland%20(10:15%20PM%20ET)&nfl_s_right16_count=0&nfl_s_url16=http://sports.espn.go.com/nfl/preview?gameId=320910013&nfl_s_count=16&nfl_s_loaded=true";
parse_str($string, $array);
var_dump($array);
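
To make the second step concrete, here is a minimal sketch of picking the game entries out of the parsed array. The shortened $sample string and the $params and $games names are assumptions made for illustration; the real feed would be passed in the same way.

```php
<?php
// Shortened sample in the same format as the feed (an assumption for illustration).
$sample = "&nfl_s_delay=120"
    . "&nfl_s_left1=^Dallas%2024%20%20%20NY%20Giants%2017%20(FINAL)"
    . "&nfl_s_url1=http://sports.espn.go.com/nfl/boxscore?gameId=320905019"
    . "&nfl_s_left4=St.%20Louis%20at%20Detroit%20(1:00%20PM%20ET)"
    . "&nfl_s_left10=Miami%20at%20Houston%20(1:00%20PM%20ET)"
    . "&nfl_s_count=16";

// parse_str() splits on "&" and URL-decodes each value, so %20 becomes a space.
parse_str($sample, $params);

$games = [];
foreach ($params as $key => $value) {
    // Keep only the nfl_s_leftN entries; N may have one or more digits.
    if (preg_match('/^nfl_s_left(\d+)$/', $key, $m)) {
        // Index by game number and strip a leading "^" marker if present.
        $games[(int) $m[1]] = ltrim($value, '^');
    }
}

print_r($games);
```

Because the keys are already separated from the values, the pattern that matches the key stays short, and each value can then be parsed with a much simpler regex than one that has to scan the whole feed.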
answered 2012-09-09T13:12:25.130

Found the error. The second regex was missing \d{1,2}, so it could not match games 10-16. It was also missing +(?:\.*), which caused St. Louis to be skipped.
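
Both problems can be reproduced on single entries. The sketch below is only an illustration: $old is the second pattern from the question with the prefix written as nfl_s_left (assumed here to be the intended prefix), and the three entries are copied from the feed.

```php
<?php
// Second pattern from the question, with the prefix assumed to be nfl_s_left.
$old = '/nfl_s_left\d=\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(at)+\s+\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s(?P<time>\(.*?\))/';

$entries = [
    'nfl_s_left3=Philadelphia at Cleveland (1:00 PM ET)', // one-digit index, plain names
    'nfl_s_left4=St. Louis at Detroit (1:00 PM ET)',      // "." in the away team
    'nfl_s_left10=Miami at Houston (1:00 PM ET)',         // two-digit index
];

$results = [];
foreach ($entries as $entry) {
    // preg_match() returns 1 when the pattern matches somewhere in the entry.
    $results[$entry] = preg_match($old, $entry) === 1;
    echo $entry, ' => ', $results[$entry] ? 'match' : 'no match', PHP_EOL;
}
```

Only the first entry matches: the period in "St." is not in the [a-zA-Z] class, and a single \d cannot be followed directly by "=" when the index has two digits.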

My new working regex is:

/nfl_s_left\d{1,2}=\^?(?P<awayteam>[a-zA-Z]+(?:\.*)+(?:\s+[a-zA-Z]+)*)\s+(at)+\s+\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s(?P<time>\(.*?\))/
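
As a quick check, the sketch below applies this pattern to a few entries. One thing to watch for in future weeks: only the awayteam group accepts a period, so a game with St. Louis as the home team would still be skipped. $both is a hypothetical variant, not something verified against the live feed, that allows an optional period after each word in both team names. The "Seattle at St. Louis" entry is made up to exercise that case.

```php
<?php
// Pattern from the answer above.
$new = '/nfl_s_left\d{1,2}=\^?(?P<awayteam>[a-zA-Z]+(?:\.*)+(?:\s+[a-zA-Z]+)*)\s+(at)+\s+\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s(?P<time>\(.*?\))/';

// Hypothetical variant: an optional "." after each word, in both team names.
$team = '[a-zA-Z]+\.?(?:\s+[a-zA-Z]+\.?)*';
$both = '/nfl_s_left\d{1,2}=\^?(?P<awayteam>' . $team . ')\s+at\s+\^?(?P<hometeam>' . $team . ')\s+(?P<time>\(.*?\))/';

$content = 'nfl_s_left4=St. Louis at Detroit (1:00 PM ET)'
    . '&nfl_s_left10=Miami at Houston (1:00 PM ET)'
    . '&nfl_s_left11=Seattle at St. Louis (4:25 PM ET)'; // made-up entry

preg_match_all($new, $content, $with_new, PREG_SET_ORDER);
preg_match_all($both, $content, $with_both, PREG_SET_ORDER);

// With PREG_SET_ORDER each match is one array holding the named groups.
foreach ($with_both as $game) {
    echo $game['awayteam'], ' at ', $game['hometeam'], ' ', $game['time'], PHP_EOL;
}
echo count($with_new), ' of 3 matched by the answer pattern', PHP_EOL;
```

Building the team sub-pattern once in $team keeps the away and home groups in sync, so a character added for one side is not forgotten on the other.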
answered 2012-09-09T14:59:28.010