python - How to split a string into a quoted sentence and numbers using Python

Question

Hi everyone, I'm new to Python and hoping to get some help!

I have multiple strings like this:

21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38

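I'm trying to figure out how to split each line into a quoted group of words (i.e. "Mckenzie Meadows Golf Course") and the unquoted doubles.

Then I will rearrange the string into this format: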

"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38

To rearrange it I would use

for row in data:
    outfile.write('{0} {1} {2} {3} {4}'.format(row[2], row[0], row[1], row[3], row[4]))
    outfile.write('\n')

But I'm just not sure how to pull out the single quoted sentence. Thanks for your help!

score 2 · Accepted Answer

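You can try this:

s = "21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38"
sList = s.split(' ')
words = []
nums = []
for l in sList:
    if l.isalpha():
        words.append(l)
    elif l.replace('.', '', 1).isdigit():
        # isdigit() alone is False for '21357.53' because of the decimal
        # point, so drop one '.' before testing
        nums.append(l)

wordString = "\"%s\"" % " ".join(words)
row = [wordString] + nums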

At this point, row contains the row you want.

score 2 · Accepted Answer

This is how I would do it:

import re

tgt = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'

# Runs of digits and dots are the numbers; runs of letters are the words.
nums = [m.group() for m in re.finditer(r'[\d.]+', tgt)]
words = [m.group() for m in re.finditer(r'[a-zA-Z]+', tgt)]
print('"{}" {}'.format(' '.join(words), ' '.join(nums)))

Prints:

"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38

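Alternatively, you can find the numbers by testing what Python considers a float:

nums = []
words = []
for e in tgt.split():
    try:
        nums.append(float(e))   # a token that parses as a float is a number
    except ValueError:
        words.append(e)         # anything else is part of the name

print(words, nums)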
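
The loop above stores float values, so a token such as 84898.10 would print back as 84898.1. If the numbers need to keep their original text, one option is to use float() only as a test and keep the tokens as strings. This is a minimal sketch of that idea; the helper names is_float and rearrange are made up for illustration and are not part of the answer.

```python
def is_float(token):
    """Return True if Python can parse the token as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def rearrange(line):
    """Quote the words and move them to the front; keep numbers as written."""
    tokens = line.split()
    nums = [t for t in tokens if is_float(t)]
    words = [t for t in tokens if not is_float(t)]
    return '"{}" {}'.format(' '.join(words), ' '.join(nums))


tgt = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
print(rearrange(tgt))
```

One caveat with this test: float() also accepts strings such as 'nan' and 'inf', so a name containing one of those words would be sorted into the numbers.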

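Finally, if you have a fixed format of 4 floats and one string (float, float, string, float, float), you can do the following:

li = tgt.split()
nums = ' '.join(li[0:2] + li[-2:])   # first two and last two tokens
words = ' '.join(li[2:-2])           # everything in between is the name
print(words, nums)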
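
To connect the fixed-format slicing with the write loop from the question, here is a rough sketch that wraps it in a function and writes one rearranged line per input line. The name rearrange_fixed is hypothetical, data holds only the sample line from the question, and io.StringIO stands in for the real outfile so the snippet runs on its own.

```python
import io


def rearrange_fixed(line):
    """Assumes the layout float float name... float float."""
    li = line.split()
    if len(li) < 5:
        raise ValueError('expected at least 5 tokens: {!r}'.format(line))
    name = ' '.join(li[2:-2])       # everything between the number pairs
    nums = li[0:2] + li[-2:]        # first two and last two tokens
    return '"{}" {}'.format(name, ' '.join(nums))


data = ['21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38']

outfile = io.StringIO()             # stand-in for a real open(..., 'w') file
for line in data:
    outfile.write(rearrange_fixed(line) + '\n')

print(outfile.getvalue(), end='')
```

Because it never inspects the tokens themselves, this version only works when every line really has two numbers on each side of the name.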

score 1 · Accepted Answer

Code using a regular expression:

import re

s = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
# Raw string so that \s reaches the regex engine unchanged.
row = re.search(r'([0-9.]+)\s([0-9.]+)\s([\w ]+)\s([0-9.]+)\s([0-9.]+)', s)
if row:
    print('"{0}" {1} {2} {3} {4}'.format(row.group(3), row.group(1), row.group(2), row.group(4), row.group(5)))

This prints (with the double quotes):

"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38

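The [\w ]+ group in the pattern above only matches letters, digits, underscores and spaces, so a name containing an apostrophe or a hyphen would not match. As a possible variation, assuming every line starts and ends with two numbers, the pattern can be anchored at both ends and the name matched lazily with .+? instead. The names LINE_RE and rearrange_regex are illustrative only.

```python
import re

# Hypothetical pattern: two numbers, any text for the name, two numbers.
LINE_RE = re.compile(
    r'^\s*([0-9.]+)\s+([0-9.]+)\s+(.+?)\s+([0-9.]+)\s+([0-9.]+)\s*$'
)


def rearrange_regex(line):
    """Return '"name" n1 n2 n3 n4', or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    n1, n2, name, n3, n4 = m.groups()
    return '"{}" {} {} {} {}'.format(name, n1, n2, n3, n4)


s = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
print(rearrange_regex(s))
```

Returning None for lines that do not fit lets the caller decide whether to skip them or report them.
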
score 0 · Accepted Answer

Using str methods:

>>> s = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
>>> temp = s.split()
>>> temp
['21357.53', '84898.10', 'Mckenzie', 'Meadows', 'Golf', 'Course', '80912.48', '84102.38']
>>> row = [temp[0], temp[1], '"'+' '.join(temp[2:-2])+'"', temp[-2], temp[-1]]
>>> row
['21357.53', '84898.10', '"Mckenzie Meadows Golf Course"', '80912.48', '84102.38']
>>> print('{0} {1} {2} {3} {4}'.format(row[2], row[0], row[1], row[3], row[4]))
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38

score 0 · Accepted Answer

Using str methods, filter, and lambda:

>>> words = "21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38".split()
>>> print('"%s" %s' % (" ".join(filter(lambda x: x.isalpha(), words)), " ".join(filter(lambda x: not x.isalpha(), words))))
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38

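More strictly, without assuming that every non-alphabetic word is a float (using reduce, and requiring every dot-separated part to be digits):

>>> from functools import reduce
>>> words = "21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38".split()
>>> print('"%s" %s' % (" ".join(filter(lambda x: x.isalpha(), words)), " ".join(filter(lambda x: reduce(lambda y, z: y and z.isdigit(), x.split('.'), True), words))))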
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
