I have a list as output. I want to split it into separate lists whenever the next number is not equal to the previous value.
x = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100,2,3,3,4,4,5,5,8,8,9,20,21,21,22,23]
I want lists like this:
a = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100]
b = [2,3,3,4,4,5,5,8,8,9]
c = [20,21,21,22]
d = [23]
To answer your question:
"I have [...] a list. I want to split it into separate lists whenever the next number is not equal to the previous value."
Example:
import itertools
l = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
for x, v in itertools.groupby(l):
    # `v` is an iterator that yields all consecutive elements
    # that have the same value
    # `x` is that value
    print(list(v))
Output:
$ python test.py
[38]
[1200, 1200]
[306, 306]
[391, 391]
[82, 82]
[35, 35]
[902, 902]
[955, 955]
[13]
Which is apparently what you asked for?
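If the runs are needed as data rather than printed, the same call can feed a list comprehension. This is only a minimal sketch on a shortened input; the name `runs` is made up for illustration.

```python
import itertools

l = [38, 1200, 1200, 306, 306, 13]

# groupby yields one (key, group) pair per run of consecutive equal values.
# Each group is a lazy iterator, so it is turned into a list right away,
# before groupby advances to the next run.
runs = [list(group) for _, group in itertools.groupby(l)]
print(runs)  # [[38], [1200, 1200], [306, 306], [13]]
```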
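As for your pattern, here is a generator function that (at least) produces the output you expect for the given input:
import itertools

def split_sublists(input_list):
    sublist = []
    for val, l in itertools.groupby(input_list):
        l = list(l)
        if not sublist or len(l) == 2:
            # first run of a sublist, or a repeated pair: keep collecting
            sublist += l
        else:
            # any other run closes the current sublist
            sublist += l
            yield sublist
            sublist = []
    # emit whatever is left, but never an empty trailing list
    if sublist:
        yield sublist

input_list = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100,2,3,3,4,4,5,5,8,8,9,20,21,21,22,23]
for sublist in split_sublists(input_list):
    print(sublist)
Output: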
$ python test.py
[1, 4, 4, 5, 5, 8, 8, 10, 10, 25, 25, 70, 70, 90, 90, 100]
[2, 3, 3, 4, 4, 5, 5, 8, 8, 9]
[20, 21, 21, 22]
[23]
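A numpy version:
>>> import numpy as np
>>> inds = np.where(np.diff(x))[0]
>>> # inds[:-1] keeps the boolean mask and the indexed array the same length
>>> out = np.split(x,inds[:-1][np.diff(inds)==1][0::2]+2)
>>> for n in out:
...     print(n)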
[ 38 1200 1200 306 306 391 391 82 82 35 35 902 902 955 955
13]
[955 847 847 835 835 698 698 777 777 896 896 923 923 940 940 569 569 53
53 411]
[ 53 1009 1009 1884]
[1009 878]
[ 923 886 886 511 511 942 942 1067 1067 1888 1888 243 243 1556]
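Your new case works the same way:
>>> inds = np.where(np.diff(x))[0]
>>> # inds[:-1] keeps the boolean mask and the indexed array the same length
>>> out = np.split(x,inds[:-1][np.diff(inds)==1][0::2]+2)
>>> for n in out:
...     print(n)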
...
[ 1 4 4 5 5 8 8 10 10 25 25 70 70 90 90 100]
[2 3 3 4 4 5 5 8 8 9]
[20 21 21 22]
[23]
Starting with x as a list:
%timeit inds = np.where(np.diff(x))[0];out = np.split(x,inds[np.diff(inds)==1][0::2]+2)
10000 loops, best of 3: 169 µs per loop
If x is a numpy array:
%timeit inds = np.where(np.diff(arr_x))[0];out = np.split(arr_x,inds[np.diff(inds)==1][0::2]+2)
10000 loops, best of 3: 135 µs per loop
For larger inputs you can expect numpy to perform better than pure Python.
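Since the one-liner is dense, here is a sketch that spells out its intermediate steps on a shortened input. The names `adjacent`, `cuts` and `parts` are introduced only for illustration. It uses `inds[:-1]` so that the boolean mask and the indexed array have the same length, and it assumes, as in the examples above, that unpaired values always come in end/start couples.

```python
import numpy as np

x = np.array([1, 4, 4, 5, 5, 100, 2, 3, 3, 9, 20, 21, 21, 22, 23])

# Indices i where x[i] != x[i + 1], i.e. where the value changes.
inds = np.where(np.diff(x))[0]

# Two change points right next to each other mean that the value
# between them is not repeated.
adjacent = inds[:-1][np.diff(inds) == 1]

# Keep the first change point of each couple and shift by 2 to get
# the index at which the next sublist starts.
cuts = adjacent[0::2] + 2

parts = [p.tolist() for p in np.split(x, cuts)]
print(cuts.tolist())  # [6, 10, 14]
print(parts)
```

Here `parts` comes out as `[1, 4, 4, 5, 5, 100]`, `[2, 3, 3, 9]`, `[20, 21, 21, 22]` and `[23]`.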
Here is my ugly solution:
x = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13, 955, 847, 847, 835, 83, 5698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]

def weird_split(alist):
    sublist = []
    for i, n in enumerate(alist[:-1]):
        sublist.append(n)
        # make sure we only create a new list if the current one is not empty
        if len(sublist) > 1 and n != alist[i-1] and n != alist[i+1]:
            yield sublist
            sublist = []
    # always add the last element
    sublist.append(alist[-1])
    yield sublist

for sublist in weird_split(x):
    print(sublist)
And the output:
[38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
[955, 847, 847, 835]
[83, 5698]
[698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
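First, you haven't defined the behaviour for [1, 0, 0, 1, 0, 0, 1], so this splits it into [1, 0, 0, 1], [0, 0] and [1].
Second, there are a lot of special cases to handle correctly, so this is longer than you might expect. It would also be shorter if it put things straight into lists, but generators are a good thing, so I made sure not to do that.
To begin with, this uses the full iterator interface instead of the yield shortcut, because that makes it easier to share state between the outer and inner iterations without having to define a new subsection generator on every iteration. Nested defs using yield might manage this in less space, but in this case I think the verbosity is acceptable.
So, the setup:
class repeating_sections:
    def __init__(self, iterable):
        self.iter = iter(iterable)

        try:
            # the first element always opens the first section
            self._cache = next(self.iter)
            self.finished = False
        except StopIteration:
            self.finished = True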
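We need to define the sub-iterator, which yields until it finds a non-matching pair. The second item of that pair has already been consumed from the underlying iterator, and the next call to _subsection needs it, so it is stored in _cache.
    def _subsection(self):
        yield self._cache

        try:
            while True:
                item1 = next(self.iter)

                try:
                    item2 = next(self.iter)
                except StopIteration:
                    # odd element at the very end: emit it, then finish
                    yield item1
                    raise

                if item1 == item2:
                    yield item1
                    yield item2

                else:
                    # mismatched pair: item1 ends this section,
                    # item2 starts the next one
                    yield item1
                    self._cache = item2
                    return

        except StopIteration:
            self.finished = True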
__iter__ should return self, since the object is its own iterator:
    def __iter__(self):
        return self
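__next__ returns a subsection unless iteration has finished. Note that each subsection must be exhausted before the next one is requested if the behaviour is to be reliable.
    def __next__(self):
        if self.finished:
            raise StopIteration

        # the caller is responsible for exhausting this subsection
        subsection = self._subsection()
        return subsection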
Some tests:
for item in repeating_sections(x):
    print(list(item))
#>>> [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
#>>> [955, 847, 847, 835, 835, 698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
#>>> [53, 1009, 1009, 1884]
#>>> [1009, 878]
#>>> [923, 886, 886, 511, 511, 942, 942, 1067, 1067, 1888, 1888, 243, 243, 1556]
for item in repeating_sections([1, 0, 0, 1, 0, 0, 1]):
    print(list(item))
#>>> [1, 0, 0, 1]
#>>> [0, 0]
#>>> [1]
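For comparison, the remark that putting things straight into lists would be shorter can be sketched roughly as follows. `split_into_lists` is a hypothetical name, and this is only an approximation of the same pairing rule, assuming the whole input fits in memory.

```python
def split_into_lists(items):
    """Hypothetical list-based variant of the pairing rule."""
    items = list(items)
    if not items:
        return []
    sections = []
    current = [items[0]]          # the first element opens the first section
    i = 1
    while i < len(items):
        pair = items[i:i + 2]
        if len(pair) == 2 and pair[0] != pair[1]:
            # Mismatched pair: the first half closes this section,
            # the second half opens the next one.
            current.append(pair[0])
            sections.append(current)
            current = [pair[1]]
        else:
            # Matching pair, or a single trailing element.
            current.extend(pair)
        i += 2
    sections.append(current)
    return sections

print(split_into_lists([1, 0, 0, 1, 0, 0, 1]))  # [[1, 0, 0, 1], [0, 0], [1]]
```

Unlike the class, this reads the whole input up front, so it gives up the laziness that the iterator version was written to keep.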
Some timings to show that this is not completely pointless:
SETUP="
x = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13, 955, 847, 847, 835, 83, 5698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
x *= 5000

class repeating_sections:
    def __init__(self, iterable):
        self.iter = iter(iterable)

        try:
            self._cache = next(self.iter)
            self.finished = False
        except StopIteration:
            self.finished = True

    def _subsection(self):
        yield self._cache

        try:
            while True:
                item1 = next(self.iter)

                try:
                    item2 = next(self.iter)
                except StopIteration:
                    yield item1
                    raise

                if item1 == item2:
                    yield item1
                    yield item2

                else:
                    yield item1
                    self._cache = item2
                    return

        except StopIteration:
            self.finished = True

    def __iter__(self):
        return self

    def __next__(self):
        if self.finished:
            raise StopIteration

        subsection = self._subsection()
        return subsection

def weird_split(alist):
    sublist = []
    for i, n in enumerate(alist[:-1]):
        sublist.append(n)
        # make sure we only create a new list if the current one is not empty
        if len(sublist) > 1 and n != alist[i-1] and n != alist[i+1]:
            yield sublist
            sublist = []
    # always add the last element
    sublist.append(alist[-1])
    yield sublist
"
python -m timeit -s "$SETUP" "for section in repeating_sections(x):" " for item in section: pass"
python -m timeit -s "$SETUP" "for section in weird_split(x):" " for item in section: pass"
Results:
10 loops, best of 3: 150 msec per loop
10 loops, best of 3: 207 msec per loop
Not a big difference, but it is faster.
def group(l,skip=0):
    prevind = 0
    currind = skip+1
    for val in l[currind::2]:
        if val != l[currind-1]:
            if currind-prevind-1 > 1: yield l[prevind:currind-1]
            prevind = currind-1
        currind += 2
    if prevind != currind:
        yield l[prevind:currind]
For the list you defined, calling this with skip=1 returns:
[38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955]
[13, 955, 847, 847, 835, 835, 698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53]
[411, 53, 1009, 1009]
[1884, 1009]
[878, 923, 886, 886, 511, 511, 942, 942, 1067, 1067, 1888, 1888, 243, 243, 1556]
And for a simpler example list, [1,1,3,3,2,5]:
l2 = [1,1,3,3,2,5]
for g in group(l2):
    print(g)
[1, 1, 3, 3]
[2, 5]
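skip is an optional parameter of the function because in your example 38 is included in the first list even though it is not equal to 1200. If that was a mistake, just remove skip and initialise currind to 1.
Explanation:
Given a list [a,b,c,d,e,...], we want to compare elements two at a time, i.e. a == b, c == d, and so on. When a comparison does not return True, we capture all the preceding elements (excluding those already captured). To do this we keep track of where the last capture happened in prevind, which is initially 0 (no capture yet).
We then walk over the pairs by iterating over every second element of the list, starting at currind, which is 1 by default (when no elements are skipped). Each value val taken from l[currind::2] is compared with the value before it, l[currind-1]; currind is always the index of val.
If the values do not match, we need to perform a capture, but only if the slice is long enough: l[prevind:currind-1] has length currind-prevind-1, and the check currind-prevind-1 > 1 requires it to hold at least two elements. That slice runs from the index where the previous capture stopped (0 by default) up to, but not including, the first value of the mismatched pair. prevind is then set to currind-1, the index of the first element that has not been captured yet.
After each comparison currind is increased by 2 so that it points at the next val. Finally, whatever is left over after the loop is yielded as the last list.
So for [1,1,3,3,2,5]: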
val is 1, at index 1. comparing to value at 0 i.e 1
make currind the index of last element of the next pair
val is 3, at index 3. comparing to value at 2 i.e 3
make currind the index of last element of the next pair
val is 5, at index 5. comparing to value at 4 i.e 2
not equal so get slice between 0,4
[1, 1, 3, 3]
make currind the index of last element of the next pair #happens after the for loop
[2, 5]
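To make the role of skip more concrete, the sketch below lists the pairs that end up being compared. `compared_pairs` is a hypothetical helper written only for illustration; it is not part of group, it just mirrors the l[currind::2] and l[currind-1] indexing described above.

```python
def compared_pairs(l, skip=0):
    """Hypothetical helper: the (l[currind-1], val) pairs that get compared."""
    # val is taken from l[skip+1::2]; its partner is the element just before it.
    return [(l[i - 1], l[i]) for i in range(skip + 1, len(l), 2)]

# Default pairing starts with the first two elements.
pairs = compared_pairs([1, 1, 3, 3, 2, 5])
print(pairs)  # [(1, 1), (3, 3), (2, 5)]

# With skip=1 the leading 38 is left out of the pairing altogether.
pairs_skip = compared_pairs([38, 1200, 1200, 306, 306, 13], skip=1)
print(pairs_skip)  # [(1200, 1200), (306, 306)]
```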