python - Using csvkit as a library

Question

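I am looking to use csvkit as a library, rather than from the command line, to convert a given Excel file to csv. I can't find any information on the library usage syntax. Can anyone shed light on how to use csvkit as a library for this purpose?

My test case is simple: take input.xlsx or input.xls, convert it, and save it as output.csv. Here is what I have tried so far, based on suggestions found elsewhere: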

import csvkit

with open('input.xlsx') as csvfile:
    reader = in2csv(csvfile)
    # below is just to test whether the file could be accessed
    for row in reader:
        print(row)

This gives:

Traceback (most recent call last):
  File "excelconvert.py", line 6, in <module>
    reader = in2csv(csvfile)
NameError: name 'in2csv' is not defined

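There is a similar question here, but the answers only seem to reference documentation that either isn't up or doesn't actually explain the library usage syntax; it just lists the classes. One answer suggests the syntax may be similar to the csv module, which is what I attempted above, but I'm getting nowhere.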

score 2 · Accepted Answer

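The documentation strongly suggests that this is a command-line tool and is not meant to be used from inside the Python interpreter. You can do the following to convert a file to csv from the command line (or you could pop it into a shell script):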

in2csv your_file.xlsx > your_new_file.csv

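If you want to read the file, just do this (it is similar to what you have, but you don't need any external modules, just built-in Python). Note that this reads the file's raw contents; it does not parse the spreadsheet into rows: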

with open('input.xlsx') as csvfile:
    reader = csvfile.readlines() # This was the only line of your code I changed
    # below is just to test whether the file could be accessed
    for row in reader:
        print(row)
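
For reading actual rows, one option is to convert first and then read the resulting CSV with the standard csv module, which the question already mentions. A minimal sketch, assuming output.csv has already been produced by in2csv; `read_csv_rows` is a hypothetical helper name, not part of csvkit:

```python
import csv


def read_csv_rows(path):
    """Return every row of a CSV file as a list of strings."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.reader(csvfile))


# Example (assumes output.csv exists and is UTF-8 encoded):
# for row in read_csv_rows("output.csv"):
#     print(row)
```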

Or you can call the command line using the os module:

import os

# Careful: this is a raw shell call. Use subprocess.Popen
# if you need to accept untrusted user input here.
os.popen("in2csv your_file.xlsx > your_new_file.csv").read()
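
Since the comment above points to subprocess, here is a minimal sketch of the same redirect without going through a shell. `run_to_file` is a hypothetical helper name, and the in2csv call in the trailing comment assumes the in2csv executable is installed and on PATH:

```python
import subprocess


def run_to_file(command, output_path):
    """Run a command given as a list of arguments and write its stdout to output_path."""
    with open(output_path, "wb") as out:
        # No shell is involved, so file names are passed as plain arguments.
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, stdout=out, check=True)


# Equivalent of `in2csv your_file.xlsx > your_new_file.csv`
# (assumes in2csv is installed and on PATH):
# run_to_file(["in2csv", "your_file.xlsx"], "your_new_file.csv")
```

Passing the command as a list keeps a file name containing spaces or shell metacharacters from being interpreted by a shell.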

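One of the snippets above is probably what you need, but if you are really looking for punishment, you can try using in2csv from inside the interpreter. Here is how you might go about it (I could find no support for this in the documentation; this is just me poking around in the interpreter):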

>>> from csvkit import in2csv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name in2csv
>>> import csvkit
>>> help(csvkit)
Help on package csvkit:

NAME
    csvkit

FILE
    c:\python27\lib\site-packages\csvkit\__init__.py

DESCRIPTION
    This module contains csvkit's superpowered alternative to the standard Python
    CSV reader and writer. It can be used as a drop-in replacement for the standard
    module.

    .. warn::

        Since version 1.0 csvkit relies on `agate <http://agate.rtfd.org>`_'s
    CSV reader and writer. This module is supported for legacy purposes only and you
    should migrate to using agate.

PACKAGE CONTENTS
    cleanup
    cli
    convert (package)
    exceptions
    grep
    utilities (package)

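So you can't import in2csv directly from csvkit (since it isn't listed under PACKAGE CONTENTS). However, if you do a little hunting, you'll find that you can import it from csvkit.utilities. But it only gets worse from here. If you do more "help hunting" like the above (i.e. calling help from the interpreter), you'll find that the class was designed to be used from the command line, so using it from inside the interpreter is a real pain. Here is an example of trying to use the defaults (which blows up):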

>>> from csvkit.utilities import in2csv
>>> i = in2csv.In2CSV()
>>> i.main()
usage:  [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
        [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H] [-v]
        [-l] [--zero] [-f FILETYPE] [-s SCHEMA] [-k KEY] [--sheet SHEET]
        [-y SNIFF_LIMIT] [--no-inference]
        [FILE]
: error: You must specify a format when providing data via STDIN (pipe).
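
The hunting through packages described above can also be done programmatically. A minimal sketch using the standard pkgutil module; `list_submodules` is a hypothetical helper, and the csvkit.utilities call is left commented out because it assumes csvkit is installed:

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the names of the modules and subpackages directly inside a package."""
    package = importlib.import_module(package_name)
    return sorted(name for _, name, _ in pkgutil.iter_modules(package.__path__))


# With csvkit installed, this should include in2csv, per the session above:
# print(list_submodules("csvkit.utilities"))

# A standard-library package, used here only as a stand-in:
print(list_submodules("json"))
```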

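Taking a look at the in2csv.py module, you have to monkey-patch args to get it to do what you want from inside the interpreter. Again, this was not designed to be used from inside the interpreter; it was designed to be called from the command line (args is defined when you call it from the command line). Something like this seemed to run, but I didn't test it thoroughly: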

>>> from csvkit.utilities import in2csv
>>> i = in2csv.In2CSV()
>>> from collections import namedtuple
>>> i.args = namedtuple("patched_args", "input_path filetype no_inference")
>>> i.args.input_path = "/path/to/your/file.xlsx"
>>> i.args.no_inference = True
>>> i.args.filetype = None
>>> i.main()
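
In the session above the namedtuple class serves only as an attribute holder (the attributes are set on the class itself, never on an instance). argparse.Namespace, the type argparse returns when it parses a command line, can hold the same three values more directly. A minimal sketch; `make_patched_args` is a hypothetical helper, and whether In2CSV needs further attributes on args is not verified here:

```python
import argparse


def make_patched_args(input_path):
    """Build an args object carrying the three attributes patched in the session above."""
    return argparse.Namespace(
        input_path=input_path,
        filetype=None,      # same value as in the session above
        no_inference=True,  # same value as in the session above
    )


patched = make_patched_args("/path/to/your/file.xlsx")
print(patched)

# Untested assumption: assign it in place of the namedtuple, then call main():
# i.args = patched
# i.main()
```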
