python - Python : big csv file import

Question

I'm trying, so far without success, to import a large CSV dataset with Python. The file contains stock quotes: one column per stock, each followed by another column holding that stock's dividends. I'm using the csv module, but I can't get a np.array whose columns are the stock quotes. Python builds the np.array row by row, and I would like it organized by column. How can I do this?

Thanks for your help!

score 2 · Accepted Answer

I would recommend the Pandas library. It also lets you read a large CSV file in smaller chunks. Here is an example from the documentation:

Data:

year indiv zit xit
0 1977 A 1.2 0.60
1 1977 B 1.5 0.50
2 1977 C 1.7 0.80
3 1978 A 0.2 0.06
4 1978 B 0.7 0.20
5 1978 C 0.8 0.30
6 1978 D 0.9 0.50

Specify a chunk size (you get back an iterable):

import pandas as pd

# chunksize=4 makes read_table return an iterator of DataFrames
reader = pd.read_table('tmp.sv', sep='|', chunksize=4)

for chunk in reader:
    print(chunk)

Output:

year indiv zit xit
0 1977 A 1.2 0.60
1 1977 B 1.5 0.50
2 1977 C 1.7 0.80
3 1978 A 0.2 0.06
year indiv zit xit
0 1978 B 0.7 0.2
1 1978 C 0.8 0.3
2 1978 D 0.9 0.5

Note: if you need to process your stock data further, Pandas is the best choice.
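
To connect the chunked reading back to the layout in the question, here is a minimal sketch. It assumes a file with one price column per stock, each followed by a dividend column; the file name, the column names and the "_div" suffix are made up for illustration and are not from the original data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file: one price column per stock, each followed
# by its dividend column (all names here are invented for illustration).
csv_text = (
    "date,AAA,AAA_div,BBB,BBB_div\n"
    "2012-01-02,10.0,0.0,20.0,0.0\n"
    "2012-01-03,10.5,0.1,19.5,0.0\n"
    "2012-01-04,10.2,0.0,19.8,0.2\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "quotes.csv")
    with open(path, "w", newline="") as f:
        f.write(csv_text)

    # Read two rows at a time, then stitch the chunks back together.
    chunks = pd.read_csv(path, index_col="date", chunksize=2)
    frame = pd.concat(list(chunks))

# Split prices from dividends using the assumed "_div" suffix.
div_cols = [c for c in frame.columns if c.endswith("_div")]
price_cols = [c for c in frame.columns if c not in div_cols]

prices = frame[price_cols].to_numpy()     # 2-D array, one column per stock
dividends = frame[div_cols].to_numpy()    # same layout for the dividends

aaa_prices = prices[:, 0]                 # quotes of the first stock, 1-D
```

For a file too large to hold in memory, each chunk could be processed inside the loop instead of being concatenated.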

score 0 · Accepted Answer

I wrote a small function that takes the path to a CSV file and returns a list of dicts in one go, which you can then easily loop over:

import csv

def read_csv_data(path):
    """
    Read the CSV file at the given path and return a list of dicts
    mapping column names to values.
    """
    with open(path, newline='') as f:
        data = csv.reader(f)
        # Read the column names from the first line of the file
        fields = next(data)
        data_lines = []
        for row in data:
            # Pair each column name with the value in this row
            items = dict(zip(fields, row))
            data_lines.append(items)
    return data_lines

Maybe this will help you.

Regards
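
A list of dicts is still organized row by row, so for the column layout the question asks about, the rows would need to be regrouped. The following is only a sketch of one way to do that: the sample rows and the stock names are hypothetical, and it assumes every value parses as a float.

```python
import numpy as np

# Hypothetical rows in the shape read_csv_data returns: one dict per
# CSV line, with every value still a string.
rows = [
    {"AAA": "10.0", "BBB": "20.0"},
    {"AAA": "10.5", "BBB": "19.5"},
    {"AAA": "10.2", "BBB": "19.8"},
]

# Regroup the row-oriented data into one float array per column name.
columns = {
    name: np.array([float(row[name]) for row in rows])
    for name in rows[0]
}
```

After this, columns["AAA"] holds all quotes for that stock as a single 1-D array.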

score 0 · Accepted Answer

What you are looking for are the ndarray.shape and ndarray.reshape features.

http://www.scipy.org/Tentative_NumPy_Tutorial

Otherwise, you can simply read the file the way you already do and then transpose the result:

x = x.transpose()

where x is an ndarray.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.transpose.html
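
As a rough illustration of the transpose approach with the csv module, here is a small sketch. The in-memory CSV text and the stock names are assumptions standing in for the real file. It also shows that reshape only regroups the elements in their existing order, so by itself it does not produce the per-stock layout.

```python
import csv
import io

import numpy as np

# Hypothetical data standing in for the real file.
csv_text = "AAA,BBB\n10.0,20.0\n10.5,19.5\n10.2,19.8\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)                          # column names
# Built row by row, as the csv module delivers it: shape (3, 2).
x = np.array([[float(v) for v in row] for row in reader])

by_stock = x.transpose()                       # shape (2, 3): one row per stock

# reshape keeps the element order and only regroups it, so the stocks
# end up mixed together within each row.
regrouped = x.reshape(2, 3)
```

Note that a single column is also available without transposing, as x[:, 0].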

All of these little things are usually covered in the documentation. I suggest reading through it carefully first.
