python - Python, sorted(): sorting football statistics by the third column

Question

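I have football statistics in a file. Two or more spaces separate the player's name and each statistic. I'm trying to get the yardage leaders, so I need to sort by the 4th column, i.e. index 3.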

Here is my code:

import re, sys
try:
   file = open("TTL.txt", "r")
except IOError:
   print "Could Not Open TTL"
   sys.exit()
lines = file.readlines()
for line in lines:
   line = re.split("\s\s+", line)
def key_fct(lines):
   return (float(lines[3]))
srtlines = sorted(lines, key = key_fct, reverse = True)
for line in srtlines:
   print line
file.close()

Sample input:

abel161 8 77 443.0 5 0 11.7 147.2
Abyss ll 38 145 1158.0 11 6 12.8 55.9
AFFISHAUL 34 33 366.0 2 4 17.8 22.7
Assassin NinjaX 25 35 184.0 0 7 10.3 15.1
aubby57 23 165 839.0 11 0 10.5 75.3
B1U3 S4V10R 26 116 380.0 4 6 6.0 29.2
Bigkle 24 47 149.0 2 4 6.7 32.8
BLKSUP3RSA1YAN 5 52 65.0 3 1 9.9 22.7
Booksack 33 85 477.0 5 5 11.0 29.2
Brandon6154xx 23 106 809.0 8 0 17.6 97.0
budweizerbeast 35 472 1640.0 27 9 6.8 94.5
BulkKiller1 31 455 3012.0 40 5 12.6 182.6
Carnage311 30 369 2349.0 25 6 12.8 158.3
cinemagiic 32 12 -8.0 0 2 -1.3 -0.6
Cmfc bumble bee 20 41 253.0 1 0 12.3 28.9
CMFCplaya 19 78 366.0 4 4 9.5 48.9

I get the following error:

$./sort.py 
Traceback (most recent call last):
  File "./sort.py", line 39, in <module>
    srtlines = sorted(lines, key = key_fct, reverse=True)
  File "./sort.py", line 37, in key_fct
    return (float(lines[3]))  
ValueError: invalid literal for float(): l

My file isn't a list of lists, but even if I split each line and try to sort by index 3, I still get the fourth character of the name on each line.

score 1 · Accepted Answer

Your problem is here:

for line in lines:
   line = re.split("\s\s+", line)

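You can't modify the values of a list like this. You are only assigning a new value to the variable line, which is then replaced the next time the loop runs, so the loop has no effect on lines.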

Instead, use a list comprehension to construct a new list:

lines = [re.split("\s\s+", line) for line in lines]
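To see the difference, here is a minimal sketch. It uses Python 3 syntax and two made-up rows with two-space separators standing in for the file contents; the names rows_loop and rows_split are only for illustration.

```python
import re

# Two sample rows with fields separated by two spaces (an assumption
# about the real file layout, based on the question).
lines = ["abel161  8  77  443.0", "Abyss ll  38  145  1158.0"]

# Rebinding the loop variable leaves the list untouched.
rows_loop = list(lines)
for line in rows_loop:
    line = re.split(r"\s\s+", line)

# A list comprehension builds a new list of split rows.
rows_split = [re.split(r"\s\s+", line) for line in lines]

print(rows_loop[0])   # still a plain string
print(rows_split[1])  # a list of fields; the name keeps its single space
```

After the loop, rows_loop still holds strings, while rows_split holds one list of fields per row.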

score 0 · Accepted Answer

The problem you're running into is in the following code:

for line in lines:
    line = re.split("\s\s+", line)

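This does not assign the split line back into the list as you might expect. In fact, the list isn't changed at all. My suggestion is to create a new list like this: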

splitLines = []
for line in lines:
    splitLines.append(re.split("\s\s+", line))

Or use indices instead:

for i in range(len(lines)):
    lines[i] = re.split("\s\s+", lines[i])

Hope this helps!

score 0 · Accepted Answer

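The problem with your code is that you are iterating over lines, but assigning a new value to line inside the loop does not modify the contents of the original list. It is still a list of strings, so inside key_fct the expression lines[3] returns the fourth character (index 3) of each line rather than the fourth field you expected.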

lines = file.readlines()
for line in lines:
   line = re.split("\s\s+", line)  #This thing won't affect original list
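This also accounts for the "l" in the traceback. A small sketch, in Python 3 syntax, using the first sample row and assuming two-space separators:

```python
line = "abel161  8  77  443.0"

# Indexing a string returns a single character, not a field.
char = line[3]

try:
    float(char)
    error = None
except ValueError as exc:
    # float() cannot parse a letter, which is the error in the question.
    error = str(exc)

print(repr(char), error)
```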

You can do:

lines = file.readlines()
for i,line in enumerate(lines):
   lines[i] = re.split("\s\s+", line)

Or better:

import re
with open('abc') as f:
    lines = [re.split("\s\s+", line) for line in f]
    lines.sort(key = lambda x: float(x[3]), reverse = True)    
    print lines

score 0 · Accepted Answer

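In addition to the problems others have mentioned, you have a basic parsing problem: consider those rogues Cmfc bumble bee and Assassin NinjaX. If we split their data lines on whitespace, we end up with too many fields, because the names contain spaces. As a result, element [3] won't have a consistent meaning from one data record to the next.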

Here is another way to handle the problem:

import sys

# Read the data, naively splitting on whitespace.
with open(sys.argv[1]) as fh:
    football_data = [line.split() for line in fh]

# Reorganize the data.
for i, fd in enumerate(football_data):
    # stats: the last 7 elements.
    # name:  anything to the left of the stats.
    stats = [float(n) for n in fd[-7:]]
    name  = ' '.join(fd[0:-7])
    football_data[i] = [name] + stats

# Sort as needed.
football_data.sort(key = lambda fd: fd[3], reverse = True)
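As a quick check of this approach, here is a self-contained sketch in Python 3 syntax. It inlines a few rows from the question instead of reading sys.argv[1]; the helper parse_row and its n_stats parameter are hypothetical names, and it assumes every row ends in exactly 7 numeric columns.

```python
# A few sample rows from the question, inlined so the sketch runs on its own.
sample = """\
abel161 8 77 443.0 5 0 11.7 147.2
Abyss ll 38 145 1158.0 11 6 12.8 55.9
Assassin NinjaX 25 35 184.0 0 7 10.3 15.1
BulkKiller1 31 455 3012.0 40 5 12.6 182.6
Cmfc bumble bee 20 41 253.0 1 0 12.3 28.9
"""

def parse_row(line, n_stats=7):
    """Split a row into [name, stat1, ..., statN]; n_stats is assumed to be 7."""
    fields = line.split()
    name = " ".join(fields[:-n_stats])
    stats = [float(n) for n in fields[-n_stats:]]
    return [name] + stats

rows = [parse_row(line) for line in sample.splitlines() if line.strip()]

# Once the name is a single element, index 3 is the same column in every row.
rows.sort(key=lambda row: row[3], reverse=True)

leaders = [(row[0], row[3]) for row in rows]
for name, value in leaders:
    print(name, value)
```

Counting the stats from the right means names with one or several spaces all end up in a single element, so the sort key stays consistent.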
