I have a file containing the following text:

1. Beatles - Revolver (1966)
2. Nirvana - Nevermind (1991)
3. Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
4. U2 - The Joshua Tree (1987)
5. Beatles - The Beatles (The White Album) (1968)
6. Beatles - Abbey Road (1969)
7. Guns N' Roses - Appetite For Destruction (1987)
8. Radiohead - Ok Computer (1997)
9. Led Zeppelin - Led Zeppelin 4 (1971)
10. U2 - Achtung Baby (1991)
11. Pink Floyd - Dark Side Of The Moon (1973)
12. Michael Jackson -Thriller (1982)
13. Rolling Stones - Exile On Main Street (1972)
14. Clash - London Calling (1979)
15. U2 - All That You Can't Leave Behind (2000)
16. Weezer - Pinkerton (1996)
17. Radiohead - The Bends (1995)
18. Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
19. Pearl Jam - Ten (1991)
20. Beach Boys - Pet Sounds (1966)
21. Weezer - Weezer (1994)
22. Nirvana - In Utero (1993)
23. Beatles - Rubber Soul (1965)
24. Eminem -The Eminem Show (2002)
25. R.E.M. - Automatic For The People (1992)
26. Radiohead - Kid A (2000)
27. Tool - Aenima (1996)
28. Smashing Pumpkins - Siamese Dream (1993)
29. Madonna - Ray Of Light (1998)
30. Rolling Stones - Sticky Fingers (1971)
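...and so on, up to line 99.

I need to store this information in a dictionary whose keys are the band names and whose values are lists of all of that band's top albums. Each entry in a list is a tuple of two fields: the album name and the release year. I also have to strip the punctuation and the parentheses. Can anyone help?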

3 Answers

Try this for starters. It isn't perfect; you will need to take it from here and adapt it to your needs.

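import re

# "musiclist" is a placeholder file name; use the path to your own file
with open("musiclist") as f:
    songs = f.readlines()

my_dict = {}
for record in songs:
    # Split on the first '-' only; skip lines that have no separator
    parts = record.split('-', 1)
    if len(parts) != 2:
        continue
    year = re.findall(r'\(([0-9]{4})\)', record)
    band = re.findall(r'[0-9]+\. (.*)', parts[0].strip())
    song = re.findall(r'(.*) \(', parts[1].strip())

    if song and band and year:
        # findall returns lists; take the first match so the key is hashable
        key = band[0]
        if key in my_dict:  # already present, append
            my_dict[key].append((song[0], year[0]))
        else:  # create new entry
            my_dict[key] = [(song[0], year[0])]

print(my_dict)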
answered 2013-04-27T03:15:25.630

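What I would do is read each line from the file, treat it as a string, split it at the ".", and then use the first part as the key and the second part as the value. For example: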

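albumDict = {}
# Replace "/path/to/file" with the actual path to your file
with open("/path/to/file", "r") as file:
    for line in file:
        # Split on the first "." only, so names like R.E.M. stay intact
        splitLine = line.split(".", 1)
        albumDict[splitLine[0]] = splitLine[1].strip()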

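Edit: Note that this does not check for duplicate entries and should not be used as-is in a professional setting. If you want other people to be able to use it, add a check to make sure the key does not already exist.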
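
As a rough sketch of such a check, keyed by band name as the question asks rather than by rank, you could append to an existing list instead of overwriting it. The helper name `add_album` and the three sample entries are made up for illustration and are not part of the code above:

```python
def add_album(albums, band, album, year):
    """Append (album, year) under band, creating the list on first use."""
    if band in albums:
        # Key already exists: extend the list instead of overwriting it
        albums[band].append((album, year))
    else:
        # First album seen for this band: start a new list
        albums[band] = [(album, year)]
    return albums


album_dict = {}
add_album(album_dict, "Beatles", "Revolver", "1966")
add_album(album_dict, "Nirvana", "Nevermind", "1991")
add_album(album_dict, "Beatles", "Abbey Road", "1969")
print(album_dict["Beatles"])
```

The second Beatles entry ends up in the same list as the first one, so nothing is lost when a band appears more than once.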

answered 2013-04-27T03:17:43.560

Here is a solution that might work better for you:

import re
from collections import defaultdict

# Missing bands start out with an empty list, so no key check is needed
band_dict = defaultdict(list)
# Rank, band, "-" (the space after it is optional, as in "Eminem -The ..."), album, (year)
pattern = re.compile(r"\d+\. (?P<band>.+?) -\s?(?P<album>.+?) \((?P<year>\d+)\)")
with open("musiclist") as f:
    for line in f:
        match = pattern.match(line)
        if match:
            groupdict = match.groupdict()
            band_dict[groupdict['band']].append((groupdict['album'], groupdict['year']))
        else:
            print("Error, no match for line %s" % line)

# Print each band followed by its albums, one per indented line
for band in band_dict:
    print(band)
    for album, year in band_dict[band]:
        print("\t%s: %s" % (album, year))

Running it with the data you provided saved as musiclist gives:

Pink Floyd
    Dark Side Of The Moon: 1973
Beatles
    Revolver: 1966
    Sgt Pepper's Lonely Hearts Club Band: 1967
    The Beatles (The White Album): 1968
    Abbey Road: 1969
    Rubber Soul: 1965
Clash
    London Calling: 1979
Rolling Stones
    Exile On Main Street: 1972
    Sticky Fingers: 1971
Led Zeppelin
    Led Zeppelin 4: 1971
R.E.M.
    Automatic For The People: 1992
Guns N' Roses
    Appetite For Destruction: 1987
U2
    The Joshua Tree: 1987
    Achtung Baby: 1991
    All That You Can't Leave Behind: 2000
Nirvana
    Nevermind: 1991
    In Utero: 1993
Pearl Jam
    Ten: 1991
Tool
    Aenima: 1996
Beach Boys
    Pet Sounds: 1966
Madonna
    Ray Of Light: 1998
Radiohead
    Ok Computer: 1997
    The Bends: 1995
    Kid A: 2000
Eminem
    The Eminem Show: 2002
Weezer
    Pinkerton: 1996
    Weezer: 1994
Smashing Pumpkins
    Mellon Collie And The Infinite Sadness: 1995
    Siamese Dream: 1993
Michael Jackson
    Thriller: 1982
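
The output above still contains punctuation, such as the apostrophes and the parentheses in "The Beatles (The White Album)", which the question asked to remove. Here is a minimal sketch of one way to strip it, assuming "punctuation" means the ASCII characters in `string.punctuation`. The helper `strip_punctuation` is hypothetical and not part of the code above:

```python
import string


def strip_punctuation(text):
    """Remove every ASCII punctuation character, including parentheses."""
    return "".join(ch for ch in text if ch not in string.punctuation)


cleaned = strip_punctuation("The Beatles (The White Album)")
print(cleaned)  # The Beatles The White Album
```

You could apply it to the band and album strings before appending them to the dictionary. Note that it also turns "R.E.M." into "REM", which may or may not be what you want.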
answered 2013-04-27T04:30:18.873