python - Python: splitting a mixed string

Question

I am reading lines from a file that have the following form:

line = a   b  c  d,e,f    g   h  i,j,k,l   m   n

What I want are lines without the ","-separated elements, for example:

a   b  c  d    g   h  i   m   n 
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
.   .  .  .    .   .  .   .   .
.   .  .  .    .   .  .   .   .

First I would split line:

sline = line.split()

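Now I would iterate over sline and look for elements that can be split using "," as the separator. The problem is that I don't always know how many I have to get out of those elements. Any ideas?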

score 3 · Accepted Answer

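Your question is not very clear. If you want to strip everything after the comma in each field (as your text suggests), a fairly readable one-liner should do: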

cleaned_line = " ".join([field.split(",")[0] for field in line.split()])
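As a quick check of the one-liner above, here is a small sketch that runs it on the sample line from the question. Note that joining with a single space collapses the original spacing.

```python
line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"

# Keep only the part of each whitespace-separated field before the first comma.
cleaned_line = " ".join(field.split(",")[0] for field in line.split())
print(cleaned_line)  # a b c d g h i m n
```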

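If you want to expand a line containing comma-separated fields into multiple lines (as your example shows), you should use the itertools.product function: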

import itertools
line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
line_fields = [field.split(",") for field in line.split()]
for expanded_line_fields in itertools.product(*line_fields):
    print(" ".join(expanded_line_fields))  # parenthesized form also runs on Python 3

This is the output:

a b c d g h i m n
a b c d g h j m n
a b c d g h k m n
a b c d g h l m n
a b c e g h i m n
a b c e g h j m n
a b c e g h k m n
a b c e g h l m n
a b c f g h i m n
a b c f g h j m n
a b c f g h k m n
a b c f g h l m n
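Since the lines come from a file, it may be convenient to wrap the loop above in a generator. The following is only a sketch: the name expand_line and its sep parameter are made up for illustration. It also shows that the number of output lines does not need to be known in advance, because it is simply the product of the number of alternatives in each field.

```python
import itertools

def expand_line(line, sep=","):
    """Yield one space-joined line per combination of the sep-separated alternatives."""
    # Fields without the separator become one-element lists.
    fields = [field.split(sep) for field in line.split()]
    for combo in itertools.product(*fields):
        yield " ".join(combo)

lines = list(expand_line("a   b  c  d,e,f    g   h  i,j,k,l   m   n"))
print(len(lines))  # 12, i.e. 3 alternatives times 4 alternatives
print(lines[0])    # a b c d g h i m n
print(lines[-1])   # a b c f g h l m n
```

Calling expand_line on each line of an open file should then yield the expanded lines one at a time.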

If for some reason it is important to preserve the original spacing, you can replace line.split() with re.findall("([^ ]+| +)", line):

import re
import itertools
line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
line_fields = [field.split(",") for field in re.findall("([^ ]+| +)", line)]
for expanded_line_fields in itertools.product(*line_fields):
    print("".join(expanded_line_fields))  # parenthesized form also runs on Python 3

This is the output:

a   b  c  d    g   h  i   m   n
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
a   b  c  f    g   h  i   m   n
a   b  c  f    g   h  j   m   n
a   b  c  f    g   h  k   m   n
a   b  c  f    g   h  l   m   n
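To see why this variant keeps the spacing, it can help to look at the tokens the pattern produces. The sketch below uses a shorter, made-up input line. Runs of non-space characters alternate with runs of spaces, and because a run of spaces contains no comma, split(",") leaves it as a one-element list that itertools.product passes through unchanged. It assumes fields are separated by plain spaces only; a tab is not a space, so it would end up inside a field token.

```python
import re

line = "a  b,c   d"

# Each token is either a run of non-space characters or a run of spaces.
tokens = re.findall("([^ ]+| +)", line)
print(tokens)  # ['a', '  ', 'b,c', '   ', 'd']

# Only tokens containing a comma yield more than one alternative.
fields = [token.split(",") for token in tokens]
print(fields)  # [['a'], ['  '], ['b', 'c'], ['   '], ['d']]
```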

score 3 · Accepted Answer

Using regex, itertools.product and some string formatting:

This solution also preserves the original spacing.

>>> import re
>>> from itertools import product
>>> line = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w+)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
>>> for prod in product(*items):
...     print (strs.format(*prod))
...     
a   b  c  d    g   h  i   m   n
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
a   b  c  f    g   h  i   m   n
a   b  c  f    g   h  j   m   n
a   b  c  f    g   h  k   m   n
a   b  c  f    g   h  l   m   n

Another example:

>>> line = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n q,w,e,r  f o   o'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w+)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
>>> for prod in product(*items):
...     print (strs.format(*prod))
...     
a   b  c  d    g   h  i   m   n q  f o   o
a   b  c  d    g   h  i   m   n w  f o   o
a   b  c  d    g   h  i   m   n e  f o   o
a   b  c  d    g   h  i   m   n r  f o   o
a   b  c  d    g   h  j   m   n q  f o   o
a   b  c  d    g   h  j   m   n w  f o   o
a   b  c  d    g   h  j   m   n e  f o   o
a   b  c  d    g   h  j   m   n r  f o   o
a   b  c  d    g   h  k   m   n q  f o   o
a   b  c  d    g   h  k   m   n w  f o   o
a   b  c  d    g   h  k   m   n e  f o   o
a   b  c  d    g   h  k   m   n r  f o   o
a   b  c  d    g   h  l   m   n q  f o   o
a   b  c  d    g   h  l   m   n w  f o   o
a   b  c  d    g   h  l   m   n e  f o   o
a   b  c  d    g   h  l   m   n r  f o   o
a   b  c  e    g   h  i   m   n q  f o   o
a   b  c  e    g   h  i   m   n w  f o   o
a   b  c  e    g   h  i   m   n e  f o   o
a   b  c  e    g   h  i   m   n r  f o   o
a   b  c  e    g   h  j   m   n q  f o   o
a   b  c  e    g   h  j   m   n w  f o   o
a   b  c  e    g   h  j   m   n e  f o   o
a   b  c  e    g   h  j   m   n r  f o   o
a   b  c  e    g   h  k   m   n q  f o   o
a   b  c  e    g   h  k   m   n w  f o   o
a   b  c  e    g   h  k   m   n e  f o   o
a   b  c  e    g   h  k   m   n r  f o   o
a   b  c  e    g   h  l   m   n q  f o   o
a   b  c  e    g   h  l   m   n w  f o   o
a   b  c  e    g   h  l   m   n e  f o   o
a   b  c  e    g   h  l   m   n r  f o   o
a   b  c  f    g   h  i   m   n q  f o   o
a   b  c  f    g   h  i   m   n w  f o   o
a   b  c  f    g   h  i   m   n e  f o   o
a   b  c  f    g   h  i   m   n r  f o   o
a   b  c  f    g   h  j   m   n q  f o   o
a   b  c  f    g   h  j   m   n w  f o   o
a   b  c  f    g   h  j   m   n e  f o   o
a   b  c  f    g   h  j   m   n r  f o   o
a   b  c  f    g   h  k   m   n q  f o   o
a   b  c  f    g   h  k   m   n w  f o   o
a   b  c  f    g   h  k   m   n e  f o   o
a   b  c  f    g   h  k   m   n r  f o   o
a   b  c  f    g   h  l   m   n q  f o   o
a   b  c  f    g   h  l   m   n w  f o   o
a   b  c  f    g   h  l   m   n e  f o   o
a   b  c  f    g   h  l   m   n r  f o   o
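One caveat with the template approach: str.format treats any literal "{" or "}" in the line as part of a replacement field, so a line containing braces would raise an error or be misread. The sketch below escapes braces before inserting the placeholders. The function name expand_keep_spacing and the sample input are made up for illustration, and a non-capturing group is used so that re.findall returns the whole match directly.

```python
import re
from itertools import product

def expand_keep_spacing(line):
    """Expand comma-separated groups while keeping the original spacing."""
    pattern = r'(?:\w+,)+\w+'
    # With no capturing group, findall returns each full match as a string.
    items = [match.split(',') for match in re.findall(pattern, line)]
    # Double any literal braces so str.format only fills our own '{}' slots.
    escaped = line.replace('{', '{{').replace('}', '}}')
    template = re.sub(pattern, '{}', escaped)
    return [template.format(*combo) for combo in product(*items)]

out = expand_keep_spacing('a  b,c  {x}  d,e')
for expanded in out:
    print(expanded)
# a  b  {x}  d
# a  b  {x}  e
# a  c  {x}  d
# a  c  {x}  e
```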

score 1 · Accepted Answer

If I understood your example correctly, you need the following:

import itertools
sss = "a   b  c  d,e,f    g   h  i,j,k,l   m   n  d,e,f "
coma_separated = [i for i in sss.split() if ',' in i]
spited_coma_separated = [i.split(',') for i in coma_separated]
# A generator expression avoids building the full list of combinations in memory.
symbols = (i for i in itertools.product(*spited_coma_separated))
for s in symbols:
    st = sss
    for part, symb in zip(coma_separated, s):
        # Replace only the first remaining occurrence, so that a repeated
        # comma-separated group is substituted one position at a time.
        st = st.replace(part, symb, 1)
    print(st.split())  # parenthesized print also works on Python 3

score 1 · Accepted Answer

Most of the other answers produce only one line, rather than the multiple lines you seem to want.

There are several ways to achieve what you want.

A recursive solution seems the most intuitive to me:

def dothestuff(l):
    for n, i in enumerate(l):
        if ',' in i:
            # found a "," entry
            items = i.split(',')
            for j in items:
                for rest in dothestuff(l[n+1:]):
                    yield l[:n] + [j] + rest
            return
    yield l


line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
for i in dothestuff(line.split()):
    print(i)  # each i is a list of fields
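To follow how the recursion unwinds, a smaller input may help. The sketch below repeats the generator so that it runs on its own, and joins each yielded list back into a line. The input string is made up for illustration. The first field containing a comma is expanded, and for each of its alternatives the rest of the list is expanded recursively.

```python
def dothestuff(l):
    # Same generator as above, repeated so this snippet is self-contained.
    for n, i in enumerate(l):
        if ',' in i:
            for j in i.split(','):
                # Expand everything to the right of position n recursively.
                for rest in dothestuff(l[n+1:]):
                    yield l[:n] + [j] + rest
            return
    # No comma left in this (sub)list: yield it unchanged.
    yield l

results = [" ".join(fields) for fields in dothestuff("x a,b y c,d".split())]
for result in results:
    print(result)
# x a y c
# x a y d
# x b y c
# x b y d
```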

score 0 · Accepted Answer

i = 0
while i < len(line) - 1:
    if line[i] == ',':
        line = line[:i] + line[i + 2:]  # drop the comma and the character after it
    else:
        i += 1

score 0 · Accepted Answer

import itertools
line_data = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n'
comma_fields_indices = [i for i,val in enumerate(line_data.split()) if "," in val]
comma_fields = [i.split(",") for i in line_data.split() if "," in i]
all_comb = []
for val in itertools.product(*comma_fields):
    sline_data = line_data.split()
    for index,word in enumerate(val):
        sline_data[comma_fields_indices[index]] = word
    all_comb.append(" ".join(sline_data))
print(all_comb)  # parenthesized form also runs on Python 3
