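How about something like this:
Make the DataFrame (assuming import pandas and import numpy as np have already been run):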
In [82]: v = [
   ....:     (1, "000010101001010101011101010101110101", "aaa"),
   ....:     (0, "111101010100101010101110101010111010", "bb"),
   ....:     (0, "100010110100010101001010101011101010", "ccc"),
   ....:     (1, "000010101001010101011101010101110101", "ddd"),
   ....:     (1, "110100010101001010101011101010111101", "eeee"),
   ....: ]
In [83]: df = pandas.DataFrame(v)
We can use np.fromiter or np.array to get an ndarray:
In [84]: d = "000010101001010101011101010101110101"
In [85]: np.fromiter(d, int) # better: np.fromiter(d, int, count=len(d))
Out[85]:
array([0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1])
In [86]: np.array(list(d), int)
Out[86]:
array([0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1])
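As a standalone script, the conversion above might look like the sketch below. The helper name bits_to_array is made up for illustration, and it calls int(c) on each character explicitly so that it does not depend on NumPy casting one-character strings to integers on its own.

```python
import numpy as np


def bits_to_array(bits):
    """Convert a string of '0'/'1' characters into a 1-D integer ndarray.

    bits_to_array is a hypothetical helper name, not part of NumPy.
    """
    # int(c) converts each character explicitly; count lets NumPy
    # allocate the output once instead of growing it while iterating.
    return np.fromiter((int(c) for c in bits), dtype=int, count=len(bits))


d = "000010101001010101011101010101110101"
arr = bits_to_array(d)
print(arr.shape, arr.dtype)
```

Passing count is optional, but it spares fromiter from resizing its buffer, which is why the comments in the session suggest it.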
There may be a slick vectorized way to do this, but I'd just apply the obvious per-entry function to the values and get on with my day:
In [87]: df[1]
Out[87]:
0 000010101001010101011101010101110101
1 111101010100101010101110101010111010
2 100010110100010101001010101011101010
3 000010101001010101011101010101110101
4 110100010101001010101011101010111101
Name: 1
In [88]: df[1] = df[1].apply(lambda x: np.fromiter(x, int)) # better with count=len(x)
In [89]: df
Out[89]:
   0                                                 1     2
0  1  [0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 1 1 0 1   aaa
1  0  [1 1 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 1 1 0    bb
2  0  [1 0 0 0 1 0 1 1 0 1 0 0 0 1 0 1 0 1 0 0 1 0 1 0   ccc
3  1  [0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 1 1 0 1   ddd
4  1  [1 1 0 1 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 1  eeee
In [90]: df[1][0]
Out[90]:
array([0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1])
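For the vectorized route mentioned earlier, one possible sketch is to join all the strings, view the ASCII bytes as uint8, and subtract the code of "0". This is not what the session above does. It assumes every string has the same length and contains only the characters 0 and 1, and it produces a single 2-D matrix rather than one array per row. The names strings, width, flat and bits are illustrative.

```python
import numpy as np
import pandas as pd

v = [
    (1, "000010101001010101011101010101110101", "aaa"),
    (0, "111101010100101010101110101010111010", "bb"),
    (0, "100010110100010101001010101011101010", "ccc"),
    (1, "000010101001010101011101010101110101", "ddd"),
    (1, "110100010101001010101011101010111101", "eeee"),
]
df = pd.DataFrame(v)

strings = df[1].tolist()
width = len(strings[0])

# Assumption: all bit strings share one length, otherwise the reshape fails.
if any(len(s) != width for s in strings):
    raise ValueError("all bit strings must have the same length")

# View the concatenated ASCII bytes as uint8 (48 for '0', 49 for '1'),
# then shift them down so that they become 0 and 1.
flat = np.frombuffer("".join(strings).encode("ascii"), dtype=np.uint8)
bits = (flat - ord("0")).astype(int).reshape(len(strings), width)

print(bits.shape)
```

Each row of bits corresponds to one row of the DataFrame, so bits[0] holds the same values as df[1][0] in the session above.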