I have a CSV file with nearly 10,000,000 rows, structured as follows:
date , code , ret
2001-01-01,000001,0.1
2001-01-01,000002,0.01
2001-01-02,000001,0.05
2001-01-02,000002,0.02
The "date" and "code" fields together serve only as a key. I want to subset the file quickly, like this:
subset(code='000001')
date , code , ret
2001-01-01,000001,0.1
2001-01-02,000001,0.05
or
subset(date='2001-01-01')
date , code , ret
2001-01-01,000001,0.1
2001-01-01,000002,0.01
How should I choose the right data structure to make this efficient?
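To make the desired behavior concrete, here is a minimal sketch of what I mean, written in Python only for illustration (I have not settled on a language). It assumes the whole file fits in memory and builds two dictionary indexes, one per key field. The names `build_indexes`, `subset`, and `SAMPLE` are hypothetical, and `code` is kept as a string so the leading zeros survive.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question; a real run would open the CSV file instead.
SAMPLE = """date,code,ret
2001-01-01,000001,0.1
2001-01-01,000002,0.01
2001-01-02,000001,0.05
2001-01-02,000002,0.02
"""


def build_indexes(lines):
    """Read the rows once and group them by date and by code."""
    by_date = defaultdict(list)
    by_code = defaultdict(list)
    # skipinitialspace tolerates a space after each comma.
    for row in csv.DictReader(lines, skipinitialspace=True):
        # Strip stray whitespace so "date " and "date" are the same field.
        clean = {k.strip(): v.strip() for k, v in row.items()}
        by_date[clean["date"]].append(clean)
        by_code[clean["code"]].append(clean)
    return by_date, by_code


def subset(indexes, date=None, code=None):
    """Return the rows matching the given date and/or code."""
    by_date, by_code = indexes
    if date is not None and code is not None:
        return [r for r in by_date.get(date, []) if r["code"] == code]
    if date is not None:
        return list(by_date.get(date, []))
    if code is not None:
        return list(by_code.get(code, []))
    raise ValueError("pass date, code, or both")


indexes = build_indexes(io.StringIO(SAMPLE))
for row in subset(indexes, code="000001"):
    print(row["date"], row["code"], row["ret"])
```

Each lookup is then a single dictionary access instead of a scan over all rows, at the cost of holding the data plus two indexes in memory. Whether that trade-off is acceptable at 10,000,000 rows is exactly what I am unsure about.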