python - Is there a tool that helps treat files as tables in a database?

Question

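I have csv files and would like to treat them as database tables. Of course, I could convert the files into tables. But it would be nice if it were possible to do this directly on the command line (used the way you would use grep, head, tail, sort and awk).

For example, I would like to select specific columns of a file (given by name), or select the rows where certain columns have certain values, or order by one of the columns.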

score 4 · Accepted Answer

Since you tagged this with python and ipython, I assume you want to see what doing this at the ipython prompt would look like. So, here is a simple CSV file, people.csv:

first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30

Now, here is an ipython session that uses it:

In [1]: import csv
In [2]: from operator import *
In [3]: with open('people.csv') as f: people = list(csv.DictReader(f))
In [4]: [p['age'] for p in sorted(people, key=itemgetter('first')) if p['last'] == 'Smith']
Out[4]: ['19', '20']

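Reading the CSV file into memory as a list of dictionaries takes one line.

Given that, you can run list comprehensions over it.

So p['age'] selects a column by name, sorted(people, key=itemgetter('first')) orders by another column, and if p['last'] == 'Smith' is the where clause.

The second one is a bit clunky, but we can fix that: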

In [5]: def orderby(table, column): return sorted(table, key=itemgetter(column))
In [6]: [p['age'] for p in orderby(people, 'first') if p['last'] == 'Smith']
Out[6]: ['19', '20']

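You can even build a group by clause with the help of itertools, although here you will definitely want to define helper functions for groupby and for the aggregates applied to the groups, and I think this is still probably pushing the limits a bit...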

In [7]: from itertools import *
In [8]: def ilen(iterable): return sum(1 for _ in iterable)
In [9]: def group(table, column): return groupby(table, itemgetter(column))
In [10]: [(k, ilen(g)) for k, g in group(people, 'last')]
Out[10]: [('Smith', 2), ('Jones', 1)]
In [11]: def glen(kg): return kg[0], sum(1 for _ in kg[1])
In [12]: [glen(g) for g in group(people, 'last')]
Out[12]: [('Smith', 2), ('Jones', 1)]
In [13]: def gsum(kg, column): return kg[0], sum(int(x[column]) for x in kg[1])
In [14]: [gsum(g, 'age') for g in group(people, 'last')]
Out[14]: [('Smith', 39), ('Jones', 30)]
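
One thing worth noting about the session above: itertools.groupby only groups consecutive rows with equal keys, so it works here because the two Smith rows happen to be adjacent in people.csv. Below is a minimal sketch of a safer variant, assuming it is acceptable to sort by the grouping column first. The name group_sorted and the reordered sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows in an order where equal last names are NOT adjacent.
people = [
    {'first': 'John', 'last': 'Smith', 'age': '20'},
    {'first': 'Frank', 'last': 'Jones', 'age': '30'},
    {'first': 'Jane', 'last': 'Smith', 'age': '19'},
]

def group_sorted(table, column):
    """Hypothetical helper: sort by column, then group, so each key appears once."""
    key = itemgetter(column)
    return groupby(sorted(table, key=key), key)

# Count the rows in each group.
counts = [(k, sum(1 for _ in g)) for k, g in group_sorted(people, 'last')]
print(counts)  # [('Jones', 1), ('Smith', 2)]
```

Without the sort, the same rows would produce three groups (Smith, Jones, Smith) instead of two.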

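However, there are a few things to keep in mind:

It has to read the whole thing into memory.
There are no "indexes". With a database, picking 20 Smiths out of 100,000 people takes only about log(100000)+20 steps; with a list, it takes 100000 steps.
You have to order the operations appropriately. When you want to sort, then filter rows, then filter columns (as in the example above), everything is easy; if you want a different order (especially if you want to sort or filter by a column you are not selecting), you may have to write more complicated comprehensions, whereas with a database that is no problem at all.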
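
The last two points can be softened a little in plain Python. The following is only a sketch under assumptions: select and build_index are hypothetical helper names, not part of any library. select always filters, then sorts, then picks columns, so the where and order by steps may use columns that are not selected. build_index is a rough stand-in for a database index, built with one pass over the list.

```python
from collections import defaultdict
from operator import itemgetter

# Same rows as people.csv, inlined so the sketch runs on its own.
people = [
    {'first': 'John', 'last': 'Smith', 'age': '20'},
    {'first': 'Jane', 'last': 'Smith', 'age': '19'},
    {'first': 'Frank', 'last': 'Jones', 'age': '30'},
]

def select(table, columns, where=None, order_by=None):
    """Hypothetical helper: filter rows, then sort, then pick columns."""
    rows = [row for row in table if where is None or where(row)]
    if order_by is not None:
        rows = sorted(rows, key=itemgetter(order_by))
    # Projection comes last, so where and order_by may use unselected columns.
    return [tuple(row[c] for c in columns) for row in rows]

def build_index(table, column):
    """Hypothetical helper: map each value of column to the rows that have it."""
    index = defaultdict(list)
    for row in table:
        index[row[column]].append(row)
    return index

# Select age, filtered on last and ordered by first (neither is selected).
ages = select(people, ['age'], where=lambda p: p['last'] == 'Smith', order_by='first')
print(ages)  # [('19',), ('20',)]

# Build the index once; later lookups avoid rescanning the whole list.
by_last = build_index(people, 'last')
smith_names = [p['first'] for p in by_last['Smith']]
print(smith_names)  # ['John', 'Jane']
```

Everything still lives in memory, so the first caveat applies unchanged.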

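Keep in mind that converting a CSV file into a sqlite table takes only about 5 lines of code. So I think you would be better off with a script that just runs your 5-line Python program and then drops you into the sqlite command line.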
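
For reference, here is roughly what such a conversion could look like with the standard csv and sqlite3 modules. This is a sketch, not a definitive implementation: the function name csv_to_sqlite is made up, the header row is assumed to contain simple column names, and the demo writes its own copy of people.csv to a temporary directory so that it runs on its own.

```python
import csv
import os
import sqlite3
import tempfile

def csv_to_sqlite(csv_path, conn, table):
    """Hypothetical helper: load a CSV file with a header row into a new table."""
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        # Columns are declared without types, so every value is stored as text.
        cols = ', '.join('"{}"'.format(name) for name in header)
        marks = ', '.join('?' for _ in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, marks), reader)
    conn.commit()

# Demo: recreate people.csv in a temporary directory.
csv_path = os.path.join(tempfile.mkdtemp(), 'people.csv')
with open(csv_path, 'w', newline='') as f:
    f.write('first,last,age\nJohn,Smith,20\nJane,Smith,19\nFrank,Jones,30\n')

conn = sqlite3.connect(':memory:')
csv_to_sqlite(csv_path, conn, 'people')
rows = conn.execute(
    "SELECT age FROM people WHERE last = 'Smith' ORDER BY first").fetchall()
print(rows)  # [('19',), ('20',)]
```

Passing a file name to sqlite3.connect instead of ':memory:' produces a database file that can then be opened with the sqlite3 command-line shell.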

score 3 · Accepted Answer

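Since you tagged this with 'python': Python's 'pandas' module provides a DataFrame object that offers the functionality you seem to want. Use pandas.read_csv() to read the CSV file. A quick introduction to DataFrames is available here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe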
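
As a rough illustration of what that could look like, here is a small sketch, assuming pandas is installed. The sample data is the three-row people.csv from the other answer, read from an in-memory string so the example runs on its own.

```python
import io

import pandas as pd

# Assumed sample data; with a real file, pass its path to read_csv instead.
csv_text = "first,last,age\nJohn,Smith,20\nJane,Smith,19\nFrank,Jones,30\n"
people = pd.read_csv(io.StringIO(csv_text))

# where last == 'Smith', order by first, select age
smiths = people[people['last'] == 'Smith'].sort_values('first')
ages = smiths['age'].tolist()
print(ages)  # [19, 20]

# group by last, sum(age)
age_sums = people.groupby('last')['age'].sum()
print(age_sums)
```

Unlike csv.DictReader, read_csv infers column types, so age comes back as integers rather than strings.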
