7

I have some structures that I need to filter. Is there a nice way to do this in Python?

I have an ugly way of doing it, but I'd like to clean it up:

original_header = ['a','b','c']
original_rows = [[1,0,1], [0,0,0], [1,0,0]]

processed_header, processed_rows = some_cool_utility(original_header, original_rows)

assert_equals(['a', 'c'], processed_header)
assert_equals([[1,1], [0,0], [1,0]], processed_rows)
4

6 Answers

5
original_header = ['a','b','c']
original_rows = [[1,0,1], [0,0,0], [1,0,0]]

# transpose rows to get columns (list() so the result can be iterated more than once in Python 3)
columns = list(zip(*original_rows))

# build a list which is True if the column should be kept (is not a column of all zeros)
not_all_zero = [any(x) for x in columns]

# filter the lists based on columns
processed_header = [x for i, x in enumerate(original_header) if not_all_zero[i]]
processed_columns = [x for i, x in enumerate(columns) if not_all_zero[i]]

# transpose the remaining columns back into rows
processed_rows = list(zip(*processed_columns))

print(processed_header)  # ['a', 'c']
print(processed_rows)    # [(1, 1), (0, 0), (1, 0)]

Note that this returns a list of tuples rather than a list of lists. If you really need a list of lists, you can use processed_rows = [list(row) for row in processed_rows]
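
The steps above can also be wrapped into one function that returns lists of lists directly. This is only a minimal sketch: the name filter_zero_columns is made up for illustration, and it swaps the enumerate-based filter for itertools.compress, which selects items by a boolean mask.

```python
from itertools import compress

def filter_zero_columns(header, rows):
    """Drop every column whose values are all zero (hypothetical helper name)."""
    # Transpose rows into columns so each column can be tested as a unit.
    columns = list(zip(*rows))
    # True for each column that has at least one nonzero value.
    keep = [any(col) for col in columns]
    kept_header = list(compress(header, keep))
    # Apply the mask to each row directly, so no transpose back is needed.
    kept_rows = [list(compress(row, keep)) for row in rows]
    return kept_header, kept_rows

header, rows = filter_zero_columns(['a', 'b', 'c'], [[1, 0, 1], [0, 0, 0], [1, 0, 0]])
print(header)  # ['a', 'c']
print(rows)    # [[1, 1], [0, 0], [1, 0]]
```

Masking each row in place keeps the rows as lists, so no conversion from tuples is needed afterwards.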

answered 2012-08-17T12:33:15.333
5

Using NumPy:

import numpy as np

original_rows = np.asarray([[1,0,1], [0,0,0], [1,0,0]])
original_labels = np.asarray(["a", "b", "c"])

# Boolean mask: True for columns that contain at least one nonzero value.
nonzero_cols = np.any(original_rows!=0, axis=0)

# Get data only where column is not all zeros.
nonzero_data = original_rows[:, nonzero_cols]
nonzero_labels = original_labels[nonzero_cols]
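
If the result has to match the plain lists from the question, the arrays can be converted back with tolist(). A small sketch along those lines, reusing the question's data; the variable names data and keep are my own:

```python
import numpy as np

original_header = ['a', 'b', 'c']
original_rows = [[1, 0, 1], [0, 0, 0], [1, 0, 0]]

data = np.asarray(original_rows)
# Boolean mask: True for columns with at least one nonzero entry.
keep = np.any(data != 0, axis=0)

# tolist() converts NumPy arrays back into (nested) plain Python lists.
processed_header = np.asarray(original_header)[keep].tolist()
processed_rows = data[:, keep].tolist()

print(processed_header)  # ['a', 'c']
print(processed_rows)    # [[1, 1], [0, 0], [1, 0]]
```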
answered 2012-08-17T12:42:48.237
2

This should work:

>>> original_header = ['a','b','c']
>>> original_rows = [[1,0,1], [0,0,0], [1,0,0]]
>>> row_major = zip(*original_rows)
>>> filtered = [(h, col) 
...             for h, col 
...             in zip(original_header, row_major) 
...             if any(col)]
>>> header, rows = zip(*filtered)
>>> header
('a', 'c')
>>> rows
((1, 0, 1), (1, 0, 0))
>>> list(zip(*rows))
[(1, 1), (0, 0), (1, 0)]

Edit: fixed. The list comprehension that builds filtered introduces an extra transpose, which I hadn't really looked at carefully.

answered 2012-08-17T12:36:34.650
1

If you're not tied to the format of the data, storing it as a dictionary makes this simpler:

original_header = ['a','b','c']
original_rows = [[1,0,1], [0,0,0], [1,0,0]]

# Restructure data into easier-to-process dict
to_dict = dict(zip(original_header, zip(*original_rows)))
print(to_dict)  # {'a': (1, 0, 1), 'b': (0, 0, 0), 'c': (1, 0, 0)}

# Filter out keys with all-zero values
filtered_dict = {k: v for (k, v) in to_dict.items()
                 if not all(x == 0 for x in v)}

print(filtered_dict)  # Output: {'a': (1, 0, 1), 'c': (1, 0, 0)}
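
To get from the dictionary back to the header and rows the question asks for, something like the following should do. It is a sketch under two assumptions: dictionaries keep insertion order (guaranteed from Python 3.7), and header names are unique, since duplicates would overwrite each other as keys. The names by_column and filtered are my own.

```python
original_header = ['a', 'b', 'c']
original_rows = [[1, 0, 1], [0, 0, 0], [1, 0, 0]]

# Map each header name to its column (the rows, transposed).
by_column = dict(zip(original_header, zip(*original_rows)))

# Keep only the columns that contain at least one nonzero value.
filtered = {name: col for name, col in by_column.items() if any(col)}

# Keys come back in insertion order; transposing the values rebuilds the rows.
processed_header = list(filtered)
processed_rows = [list(row) for row in zip(*filtered.values())]

print(processed_header)  # ['a', 'c']
print(processed_rows)    # [[1, 1], [0, 0], [1, 0]]
```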
answered 2012-08-17T12:46:34.030
0

Just for the record:

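def some_cool_utility(header, rows):
    # Pair each header with its column (rows transposed) and keep columns with a nonzero value.
    data = [(name, col) for name, col in zip(header, zip(*rows)) if any(col)]
    head, columns = zip(*data)
    # Transpose the kept columns back into rows.
    return list(head), [list(row) for row in zip(*columns)]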
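
One edge case with this unpacking pattern: if nothing survives the filter, zip(*data) yields nothing and the two-name unpacking raises ValueError. A guarded sketch, assuming that an empty header and empty rows are the desired result in that case (the question does not say); filter_columns_safe is a made-up name:

```python
def filter_columns_safe(header, rows):
    """Like the utility above, but tolerates input where every column is all zeros."""
    # Pair each header name with its column and keep columns with a nonzero value.
    kept = [(name, col) for name, col in zip(header, zip(*rows)) if any(col)]
    if not kept:
        # Assumption: return an empty header and one empty list per input row.
        return [], [[] for _ in rows]
    names, columns = zip(*kept)
    # Transpose the kept columns back into rows.
    return list(names), [list(row) for row in zip(*columns)]

print(filter_columns_safe(['a', 'b'], [[0, 0], [0, 0]]))  # ([], [[], []])
```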
answered 2012-08-17T15:11:20.740
0

The following works:

all_rows = original_rows[:] #make a copy
all_rows.insert(0, original_header)

all_columns = list(zip(*all_rows)) #transpose
filtered_columns = [col for col in all_columns if any(col[1:])] #remove columns that only contain 0's
filtered_rows = [list(tp) for tp in zip(*filtered_columns)] #transpose back, convert each element to a list

processed_header = filtered_rows[0]
processed_rows = filtered_rows[1:]
answered 2012-08-17T12:46:04.110