我有一个大的 .csv 文件。我只想选择包含时间/日期的列和我通过标题知道的其他 20 列。
作为测试,我尝试只取标题为“TIMESTAMP”的列,我知道这是 .csv 中的 4207823 行,它只包含日期和时间。下面的代码选择 TIMESTAMP 列,但也继续从其他列中获取值,如下所示:
import csv
import numpy as np
import pandas
low_memory=False
f = pandas.read_csv('C:\Users\mmso2\Google Drive\MABL Wind\_Semester 2 2016\Wind Farm Info\DataB\DataB - NaN2.csv', dtype = object)#convert file to variable so it can be edited
time = f[['TIMESTAMP']]
time = time[0:4207823]#test to see if this stops time taking other data
print time
输出
TIMESTAMP
0 2007-08-15 21:10:00
1 2007-08-15 21:20:00
2 2007-08-15 21:30:00
3 2007-08-15 21:40:00
4 2007-08-15 21:50:00
5 2007-08-15 22:00:00
6 2007-08-15 22:10:00
7 2007-08-15 22:20:00
8 2007-08-15 22:30:00
9 2007-08-15 22:40:00
10 2007-08-15 22:50:00
11 2007-08-15 23:00:00
12 2007-08-15 23:10:00
13 2007-08-15 23:20:00
14 2007-08-15 23:30:00
15 2007-08-15 23:40:00
16 2007-08-15 23:50:00
17 2007-08-16 00:00:00
18 2007-08-16 00:10:00
19 2007-08-16 00:20:00
20 2007-08-16 00:30:00
21 2007-08-16 00:40:00
22 2007-08-16 00:50:00
23 2007-08-16 01:00:00
24 2007-08-16 01:10:00
25 2007-08-16 01:20:00
26 2007-08-16 01:30:00
27 2007-08-16 01:40:00
28 2007-08-16 01:50:00
29 2007-08-16 02:00:00 #these are from the TIMESTAMP column
... ...
679302 221.484 #This is from another column
679303 NaN
679304 2015-09-23 06:40:00
679305 NaN
679306 NaN
679307 2015-09-23 06:50:00
679308 NaN
679309 NaN
679310 2015-09-23 07:00:00