python - Extracting strings from a variable in Python

Question

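After some processing of a parsed URL, I have something like the text below stored in a variable, produced by ''.join(soup.findAll(text=True)). Based on the argument given, I have to get the school, the score, and the team it is playing. For example, test.py "norfolk st." should give a result like 'Norfolk St. 0-38 Rutgers'. I tried several functions such as re.search() and string.find(), as well as parsing character by character, but none of them gave the expected result. I need help.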

Norfolk St. 


0 - 38




    Rutgers 
    Final


     South Florida 


    6 - 21


     Michigan St. 
    Final


     Chowan 


    7 - 47


     Charlotte 
    Final


     SE Louisiana 


    17 - 38


     (24) TCU 
    Final


     W. Kentucky 


    20 - 52


     Tennessee 
    Final


     S. Carolina St. 


    13 - 52


     (4) Clemson 
    Final


     Middle Tenn. St. 


    20 - 40


     North Carolina 
    Final


     Central Conn. St. 


    44 - 51


     Lehigh 
    Final OT


     Army 


    14 - 40


     Ball St. 
    Final

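The problem is that I have to get the football box score from this URL: http://sports.yahoo.com/college-football/scoreboard/?conf=all. Whatever school name the user supplies as a command-line argument, the script has to go to this URL, check whether a hyperlink exists for that school name, follow it, and fetch the box score, which looks like this: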

      1   2   3   4   Total
FAU   3   3   0   7   13
ECU   7   14  10  0   31

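If the game is still in progress, the script should sleep for a specified number of seconds after retrieving the score and then retrieve the latest score. I'm not sure which way to go! I need help, as I'm new to Python.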

score 0 · Accepted Answer

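Is the structure [team 1] [score] [team 2] [notes about the game]? For example, in your original example, [team 1] = 'Norfolk St', [score] = '0-38', [team 2] = 'Rutgers', [notes about the game] = 'Final'?

And: is your goal to enter a team on the command line and retrieve a record of all the games they played in (including the teams and the score)?

Assuming both of those things, I would first try tokenizing by line: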

lines = your_string.split('\n')
# Strip whitespace and drop blank lines so that each game is exactly 4 entries
clean_lines = [l.strip() for l in lines if l.strip()]

Then I would build a list of the actual games:

In [8]: games = [clean_lines[i:i+4] for i in xrange(0, len(clean_lines), 4)]

In [9]: games
Out[9]: 
[['Norfolk St.', '0 - 38', 'Rutgers', 'Final'],
 ['South Florida', '6 - 21', 'Michigan St.', 'Final'],
 ['Chowan', '7 - 47', 'Charlotte', 'Final'],
 ['SE Louisiana', '17 - 38', '(24) TCU', 'Final'],
 ['W. Kentucky', '20 - 52', 'Tennessee', 'Final'],
 ['S. Carolina St.', '13 - 52', '(4) Clemson', 'Final'],
 ['Middle Tenn. St.', '20 - 40', 'North Carolina', 'Final'],
 ['Central Conn. St.', '44 - 51', 'Lehigh', 'Final OT'],
 ['Army', '14 - 40', 'Ball St.', 'Final']]

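If you want to find all the games a given team played in, just iterate over the games list and check whether the team's string appears at index 0 or 2. If you're going to do many lookups, though, it's better to build a dictionary where the keys are team names and the values are the indexes of the games they played in.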
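Here is a minimal sketch of both lookups, assuming `games` has the four-item shape shown above. The helper names `find_games` and `build_index` are made up for illustration, and the case-insensitive matching is an assumption about how the team name will be typed on the command line.

```python
from collections import defaultdict

# Sample data in the same shape as the `games` list above.
games = [
    ['Norfolk St.', '0 - 38', 'Rutgers', 'Final'],
    ['South Florida', '6 - 21', 'Michigan St.', 'Final'],
    ['Army', '14 - 40', 'Ball St.', 'Final'],
]

def find_games(games, team):
    """Linear scan: return every game where `team` is at index 0 or 2."""
    wanted = team.strip().lower()
    return [g for g in games if wanted in (g[0].lower(), g[2].lower())]

def build_index(games):
    """Map each lowercased team name to the indexes of its games."""
    index = defaultdict(list)
    for i, game in enumerate(games):
        index[game[0].lower()].append(i)
        index[game[2].lower()].append(i)
    return index

# One-off lookup versus repeated lookups through the index.
norfolk_games = find_games(games, 'norfolk st.')
index = build_index(games)
rutgers_games = [games[i] for i in index['rutgers']]
```

Note that a ranked team is stored with its ranking, for example '(24) TCU', so an exact match on 'tcu' would miss it unless the ranking prefix is stripped first.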

Hope that helps!

score 0 · Accepted Answer

I wouldn't bother with regular expressions. Judging by the text, the string, minus whitespace, roughly follows and repeats this format:

thing 1
score
thing 2
"final"

As a result, I would just clean up the string, iterate over it, and return each group of 4 as part of a dictionary.

For example:

def chunk(iterable, n):
    '''chunk([1, 2, 3, 4, 5, 6], 2) -> [[1, 2], [3, 4], [5, 6]]'''
    return [iterable[i:i+n] for i in range(0, len(iterable), n)]

def get_scores(raw):
    clean = [line.strip() for line in raw.split('\n') if line.strip() != '']
    return {thing1: (thing1, score, thing2) for (thing1, score, thing2, _) in chunk(clean, 4)}

Then, you can do this:

>>> raw = ''.join(soup.findAll(text=True))
>>> scores = get_scores(raw)
>>> print scores['Norfolk St.']
('Norfolk St.', '0 - 38', 'Rutgers')

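If you want the lookup to be case-insensitive, you can do the following (the keys are stored in lowercase, so lowercase the name you look up as well, e.g. scores['norfolk st.']):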

def get_scores(raw):
    clean = [line.strip().lower() for line in raw.split('\n') if line.strip() != '']
    return {thing1: (thing1, score, thing2) for (thing1, score, thing2, _) in chunk(clean, 4)}

If you want to be able to look up either "Norfolk St." or "Rutgers" and get the same result, you can do this:

def get_scores(raw):
    clean = [line.strip().lower() for line in raw.split('\n') if line.strip() != '']
    output = {}
    for (thing1, score, thing2, _) in chunk(clean, 4):
        data = (thing1, score, thing2)
        output[thing1] = data
        output[thing2] = data
    return output
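To tie this back to the `test.py "norfolk st."` call from the question, here is a rough sketch of a command-line wrapper. It is only an illustration under some assumptions: `lookup` and `SAMPLE` are hypothetical names, the raw text is hard-coded instead of scraped, and, unlike the version above, only the dictionary keys are lowercased so that the printed result keeps the original capitalization.

```python
import sys

def chunk(items, n):
    """Split a list into consecutive groups of n items."""
    return [items[i:i + n] for i in range(0, len(items), n)]

def get_scores(raw):
    """Map each lowercased team name to (team1, score, team2, status)."""
    clean = [line.strip() for line in raw.split('\n') if line.strip()]
    output = {}
    for group in chunk(clean, 4):
        if len(group) < 4:
            continue  # skip an incomplete trailing group
        team1, score, team2, status = group
        data = (team1, score, team2, status)
        output[team1.lower()] = data
        output[team2.lower()] = data
    return output

def lookup(raw, team):
    """Return a one-line summary for `team`, or None if it is not listed."""
    data = get_scores(raw).get(team.strip().lower())
    if data is None:
        return None
    team1, score, team2, status = data
    return '%s %s %s (%s)' % (team1, score, team2, status)

# Stand-in for ''.join(soup.findAll(text=True)); hard-coded for the sketch.
SAMPLE = """
Norfolk St.

0 - 38

    Rutgers
    Final
"""

if __name__ == '__main__' and len(sys.argv) > 1:
    result = lookup(SAMPLE, sys.argv[1])
    print(result if result is not None else 'No game found')
```

Run as `python test.py "norfolk st."`, this should print the Norfolk St. game; replacing `SAMPLE` with the scraped text is left to the surrounding script.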
