I am using regular expressions to clean up a list of names so that they are formatted normally. Say the list is:
000000AAAAAARob Alsod ## Notice multiple 0's and A's?
AAAPerson Person ## Here, too
Jeff the awesome Guy ## Four words...
Jenna DEeath ## A name like this can exist.
GEOFFERY EVERDEEN ## All caps
shy guy ## All lowercase
Theone Normalperson ## Example name. This one is fine.
Guywith Whitespace ## Trailing or leading whitespace is a nono.
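So, as you can see, people don't format their names properly, and I need a program that strips out everything unwanted. This includes:
Numbers at the start of a name.
Any capitals that are not followed by a lowercase letter, e.g. AAAAAAAJosh
Anything that is all uppercase.
Anything that doesn't start with a capital, e.g. josh
Trailing and leading whitespace.
I think that is everything I need to filter out. The final result should look like this: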
Rob Alsod ## No more 0's and A's.
Person Person ## No more leading A's (or other letters).
Jeff Guy ## No lowercase words in his name.
Jenna DEeath ## HASN'T removed the D in the middle.
## Name removed as it was all uppercase.
## Name removed as it was all lowercase.
Theone Normalperson ## Nothing changed.
Guywith Whitespace ## Removed whitespace.
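For reference, the rules listed above can be written down as one small function. This is only a sketch of the intended behaviour, not the code in question: the name `clean_name` is hypothetical, and it assumes that extra capitals are stripped only at the very start of the name (so `DEeath` survives) and that "all uppercase" applies to the name as a whole.

```python
import re

def clean_name(raw):
    """Hypothetical helper: apply the cleaning rules to one name."""
    # Rule: no leading or trailing whitespace.
    name = raw.strip()
    # Rules: drop leading digits, then any run of capitals that comes
    # before the first capital that is followed by a lowercase letter.
    name = re.sub(r'^[0-9]*[A-Z]*(?=[A-Z][a-z])', '', name)
    # Rule: a name that is entirely uppercase is removed.
    if name.isupper():
        return ''
    # Rule: drop every word that does not start with a capital.
    words = [w for w in name.split() if w[0].isupper()]
    return ' '.join(words)

for sample in ['000000AAAAAARob Alsod', 'Jeff the awesome Guy', 'shy guy']:
    print(repr(clean_name(sample)))
```

With these assumptions the three sample calls print 'Rob Alsod', 'Jeff Guy' and '' respectively.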
EDIT: Sorry about that. Here is my current code:
# Enter your code for "Name Cleaning" here.
import re
namenum = []
num = 0
for sen in open('file.txt'):
    namenum += [sen.split(',')]
    namenum[num][0] = re.sub(r'\s[a-z]+', '', namenum[num][0])
    namenum[num][0] = re.sub(r'^([0-9]*)', '', namenum[num][0])
    namenum[num][0] = re.sub(r'^[A-Z]*?\s[A-Z]*?$', '', namenum[num][0])
    namenum[num][0] = re.sub(r'[^a-zA-Z ][A-Z]*(?=[A-Z])', '', namenum[num][0])
    namenum[num][0] = re.sub(r'\b[a-z]+\b', '', namenum[num][0])
    namenum[num][0] = re.sub(r'^\s*', '', namenum[num][0])
    namenum[num][0] = re.sub(r'\s*$', '', namenum[num][0])
    if namenum[num][0] == '':
        namenum[num][0] = 'Invalid Name'
    num += 1
for i in range(len(namenum)):
    namenum[i][1] = int(namenum[i][1].strip())
namenum = sorted(namenum, key=lambda item: (-item[1], item[0]))
for i in range(0, len(namenum)):
    print(namenum[i][0]+','+str(namenum[i][1]))
It does half the job, but for some reason it misses some things.
Here is the output:
AAAAAARob Alsod
AAAPerson Person
Guywith Whitespace
Invalid Name
Invalid Name
Jeff Guy
Jenna DEeath
Theone Normalperson
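The first output line can be reproduced in isolation. The sketch below applies just the digit pattern and the capitals pattern from the code above to a single hard-coded name (the variable names are only for illustration). It suggests that once the digits are gone, the capitals pattern has nothing to anchor on, because it starts with `[^a-zA-Z ]`, which needs one character that is neither a letter nor a space.

```python
import re

name = '000000AAAAAARob Alsod'

# Same order as in the code above: digits first, then capitals.
no_digits = re.sub(r'^([0-9]*)', '', name)
after_caps = re.sub(r'[^a-zA-Z ][A-Z]*(?=[A-Z])', '', no_digits)
print(repr(no_digits))   # 'AAAAAARob Alsod'
print(repr(after_caps))  # 'AAAAAARob Alsod' (no non-letter left to match)

# Capitals pattern applied to the untouched name instead:
# the last digit gives it a starting point, so the A's are removed.
caps_first = re.sub(r'[^a-zA-Z ][A-Z]*(?=[A-Z])', '', name)
print(repr(caps_first))  # '00000Rob Alsod'
```

Swapping the order would not help a name such as AAAPerson Person, which has no digit at all in front of the capitals.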
I also tried entering a name like harry hamilton, and it gave back harry, which should have been removed.