3

I am reading lines of the following form from a file:

line = a   b  c  d,e,f    g   h  i,j,k,l   m   n

What I want is lines that no longer contain ","-separated elements, with one line per combination, for example:

a   b  c  d    g   h  i   m   n 
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
.   .  .  .    .   .  .   .   .
.   .  .  .    .   .  .   .   .

First I split the line:

sline = line.split()

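Next I would iterate over sline and look for elements that can be split using "," as the separator. The problem is that I do not always know in advance how many such elements a line will contain. Any ideas?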

4

6 Answers

3

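Your question is not very clear. If you want to drop everything after the first comma in each field (as your text implies), a fairly readable one-liner should do: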

cleaned_line = " ".join([field.split(",")[0] for field in line.split()])
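
As a quick check of the one-liner, here is a small sketch run against the sample line from the question. The function name first_alternatives is only an illustrative choice and does not come from the question.

```python
def first_alternatives(line):
    # Split on whitespace, then keep only the part before the first comma
    # in each field. (Hypothetical helper name, for illustration only.)
    return " ".join(field.split(",")[0] for field in line.split())

line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
print(first_alternatives(line))  # prints: a b c d g h i m n
```

Note that this collapses the original spacing to single spaces, because str.split() discards the whitespace between fields.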

If you want to expand a line containing comma-separated fields into multiple lines (as your example shows), use the itertools.product function:

import itertools
line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
line_fields = [field.split(",") for field in line.split()]
for expanded_line_fields in itertools.product(*line_fields):
    print(" ".join(expanded_line_fields))

This is the output:

a b c d g h i m n
a b c d g h j m n
a b c d g h k m n
a b c d g h l m n
a b c e g h i m n
a b c e g h j m n
a b c e g h k m n
a b c e g h l m n
a b c f g h i m n
a b c f g h j m n
a b c f g h k m n
a b c f g h l m n

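If for some reason preserving the original spacing is important, you can replace line.split() with re.findall("([^ ]+| +)", line), which returns the runs of spaces as tokens alongside the fields: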

import re
import itertools
line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
line_fields = [field.split(",") for field in re.findall("([^ ]+| +)", line)]
for expanded_line_fields in itertools.product(*line_fields):
    print("".join(expanded_line_fields))

This is the output:

a   b  c  d    g   h  i   m   n
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
a   b  c  f    g   h  i   m   n
a   b  c  f    g   h  j   m   n
a   b  c  f    g   h  k   m   n
a   b  c  f    g   h  l   m   n
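
Since the question mentions reading the lines from a file, the spacing-preserving variant could be wrapped in a generator and applied line by line. This is only a sketch: the name expand_line and the sample content are assumptions, and io.StringIO stands in for a real file object.

```python
import io
import itertools
import re

def expand_line(line):
    """Yield every expansion of one line, preserving the original spacing."""
    # Tokens alternate between runs of non-spaces and runs of spaces;
    # splitting a run of spaces on "," simply returns it unchanged.
    tokens = [tok.split(",") for tok in re.findall(r"[^ ]+| +", line)]
    for combo in itertools.product(*tokens):
        yield "".join(combo)

# io.StringIO stands in for a real file opened with open(...).
sample_file = io.StringIO("a  b,c  d\nx,y   z\n")
for raw in sample_file:
    # Strip the newline first so it does not become part of the last token.
    for expanded in expand_line(raw.rstrip("\n")):
        print(expanded)
```

Because both the file iteration and itertools.product are lazy, only one expanded line is held in memory at a time.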
answered 2013-06-25T08:42:16.857
3

Using regex, itertools.product and some string formatting:

This solution also preserves the original spacing.

>>> import re
>>> from itertools import product
>>> line = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w+)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
>>> for prod in product(*items):
...     print (strs.format(*prod))
...     
a   b  c  d    g   h  i   m   n
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
a   b  c  f    g   h  i   m   n
a   b  c  f    g   h  j   m   n
a   b  c  f    g   h  k   m   n
a   b  c  f    g   h  l   m   n

Another example:

>>> line = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n q,w,e,r  f o   o'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w+)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
>>> for prod in product(*items):
...     print (strs.format(*prod))
...
a   b  c  d    g   h  i   m   n q  f o   o
a   b  c  d    g   h  i   m   n w  f o   o
a   b  c  d    g   h  i   m   n e  f o   o
a   b  c  d    g   h  i   m   n r  f o   o
a   b  c  d    g   h  j   m   n q  f o   o
a   b  c  d    g   h  j   m   n w  f o   o
a   b  c  d    g   h  j   m   n e  f o   o
a   b  c  d    g   h  j   m   n r  f o   o
a   b  c  d    g   h  k   m   n q  f o   o
a   b  c  d    g   h  k   m   n w  f o   o
a   b  c  d    g   h  k   m   n e  f o   o
a   b  c  d    g   h  k   m   n r  f o   o
a   b  c  d    g   h  l   m   n q  f o   o
a   b  c  d    g   h  l   m   n w  f o   o
a   b  c  d    g   h  l   m   n e  f o   o
a   b  c  d    g   h  l   m   n r  f o   o
a   b  c  e    g   h  i   m   n q  f o   o
a   b  c  e    g   h  i   m   n w  f o   o
a   b  c  e    g   h  i   m   n e  f o   o
a   b  c  e    g   h  i   m   n r  f o   o
a   b  c  e    g   h  j   m   n q  f o   o
a   b  c  e    g   h  j   m   n w  f o   o
a   b  c  e    g   h  j   m   n e  f o   o
a   b  c  e    g   h  j   m   n r  f o   o
a   b  c  e    g   h  k   m   n q  f o   o
a   b  c  e    g   h  k   m   n w  f o   o
a   b  c  e    g   h  k   m   n e  f o   o
a   b  c  e    g   h  k   m   n r  f o   o
a   b  c  e    g   h  l   m   n q  f o   o
a   b  c  e    g   h  l   m   n w  f o   o
a   b  c  e    g   h  l   m   n e  f o   o
a   b  c  e    g   h  l   m   n r  f o   o
a   b  c  f    g   h  i   m   n q  f o   o
a   b  c  f    g   h  i   m   n w  f o   o
a   b  c  f    g   h  i   m   n e  f o   o
a   b  c  f    g   h  i   m   n r  f o   o
a   b  c  f    g   h  j   m   n q  f o   o
a   b  c  f    g   h  j   m   n w  f o   o
a   b  c  f    g   h  j   m   n e  f o   o
a   b  c  f    g   h  j   m   n r  f o   o
a   b  c  f    g   h  k   m   n q  f o   o
a   b  c  f    g   h  k   m   n w  f o   o
a   b  c  f    g   h  k   m   n e  f o   o
a   b  c  f    g   h  k   m   n r  f o   o
a   b  c  f    g   h  l   m   n q  f o   o
a   b  c  f    g   h  l   m   n w  f o   o
a   b  c  f    g   h  l   m   n e  f o   o
a   b  c  f    g   h  l   m   n r  f o   o
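
One caveat with building a str.format template from the line: if the input ever contained literal "{" or "}" characters (an assumption, since the sample data shows none), format would misread them as placeholders. A minimal sketch that escapes them first, using the hypothetical function name expand_with_template:

```python
import re
from itertools import product

def expand_with_template(line):
    # A comma-separated group: one word followed by one or more ",word".
    pattern = re.compile(r'\w+(?:,\w+)+')
    items = [m.group(0).split(',') for m in pattern.finditer(line)]
    # Double any literal braces so str.format leaves them alone; \w never
    # matches a brace, so the groups found above are unaffected.
    escaped = line.replace('{', '{{').replace('}', '}}')
    template = pattern.sub('{}', escaped)
    return [template.format(*prod) for prod in product(*items)]

for expanded in expand_with_template('a  {x}  d,e   g'):
    print(expanded)
```

With no comma-separated groups at all, product() yields a single empty tuple, so the function returns the line unchanged as its only expansion.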
answered 2013-06-25T09:15:14.077
1

If I understood your example correctly, you need the following:

import itertools
sss = "a   b  c  d,e,f    g   h  i,j,k,l   m   n  d,e,f "
comma_separated = [i for i in sss.split() if ',' in i]
split_comma_separated = [i.split(',') for i in comma_separated]
# itertools.product is already lazy, so combinations are generated one at a time
for s in itertools.product(*split_comma_separated):
    st = sss
    for part, symb in zip(comma_separated, s):
        # Replace only the first remaining occurrence, so that a repeated
        # comma-separated group is substituted one occurrence at a time
        st = st.replace(part, symb, 1)
    print(st.split())  # parenthesized print for Python 3 compatibility
answered 2013-06-25T08:49:35.470
1

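Most of the other answers produce only a single line, rather than the multiple lines you seem to want.

There are several ways to achieve what you want.

A recursive solution seems the most intuitive to me: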

def dothestuff(l):
    for n, i in enumerate(l):
        if ',' in i:
            # found a "," entry
            items = i.split(',')
            for j in items:
                for rest in dothestuff(l[n+1:]):
                    yield l[:n] + [j] + rest
            return
    yield l


line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
for i in dothestuff(line.split()):
    print(i)
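
For comparison, here is a sketch checking that the recursive approach yields the same expansions, in the same order, as a version built on itertools.product. The names expand_recursive and expand_product are illustrative only; the first simply restates the generator above so that the block runs on its own.

```python
import itertools

def expand_recursive(fields):
    # Same idea as the recursive generator above: expand the first field
    # that contains a comma, then recurse on the fields after it.
    for n, field in enumerate(fields):
        if "," in field:
            for choice in field.split(","):
                for rest in expand_recursive(fields[n + 1:]):
                    yield fields[:n] + [choice] + rest
            return
    yield fields

def expand_product(fields):
    # Non-recursive equivalent built on itertools.product.
    for combo in itertools.product(*(f.split(",") for f in fields)):
        yield list(combo)

fields = "a   b  c  d,e,f    g   h  i,j,k,l   m   n".split()
assert list(expand_recursive(fields)) == list(expand_product(fields))
print(len(list(expand_recursive(fields))))  # 3 * 4 = 12 expansions
```

Both versions yield lists of fields, so join them with " ".join(...) if you need strings.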
answered 2013-06-25T08:56:45.997
0
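# Remove each comma together with the single character that follows it.
# Rebuilding the string avoids indexing past the end as the line shrinks.
while ',' in line:
    i = line.index(',')
    line = line[:i] + line[i + 2:]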
answered 2013-06-25T08:28:11.000
0
import itertools
line_data = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n'
comma_fields_indices = [i for i,val in enumerate(line_data.split()) if "," in val]
comma_fields = [i.split(",") for i in line_data.split() if "," in i]
all_comb = []
for val in itertools.product(*comma_fields):
    sline_data = line_data.split()
    for index,word in enumerate(val):
        sline_data[comma_fields_indices[index]] = word
    all_comb.append(" ".join(sline_data))
print(all_comb)
answered 2013-06-25T09:13:24.883