I want to copy a CSV file to a Postgres table. There are about 100 columns in this table, so I do not want to rewrite them if I don't have to.

I am using the \copy table from 'table.csv' delimiter ',' csv; command but without a table created I get ERROR: relation "table" does not exist. If I add a blank table I get no error, but nothing happens. I tried this command two or three times and there was no output or messages, but the table was not updated when I checked it through PGAdmin.

Is there a way to import a table with headers included like I am trying to do?

5 Answers

This works. The column names are in the first row of the file.

COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER
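A note on paths and permissions: COPY with a file name is executed by the server process, so the file must be readable by the server, and the target table must already exist, since HEADER only makes Postgres skip the first line rather than create columns. From an ordinary client the equivalent is psql's \copy meta-command, which reads the file on the client side; a minimal sketch, assuming the table wheat was already created with matching columns:

-- \copy is a psql meta-command: the file path is resolved on the client,
-- and HEADER skips the first line instead of loading it as data
\copy wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER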
answered 2013-07-16T01:46:21.977

Using the Python library pandas, you can easily generate column names and infer data types from a CSV file.

from sqlalchemy import create_engine
import pandas as pd

# pandas infers column names and dtypes from the CSV;
# to_sql then creates the table if it does not already exist
engine = create_engine('postgresql://user:pass@localhost/db_name')
df = pd.read_csv('/path/to/csv_file')
df.to_sql('pandas_db', engine)

The if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This also works for other input file types; see the documentation here and here.
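Because to_sql infers the column types, it can be worth checking what was actually created; a quick sketch from psql, using the table name from the example above:

-- inspect the created table's columns and inferred types, then sample rows
\d pandas_db
select * from pandas_db limit 5;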

answered 2015-04-30T00:45:31.150

An alternative from the terminal when you have no server permissions

The pg docs, under NOTES, say:

The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.

So, generally speaking, when using psql or any other client you will run into problems even on a local server... and if you are writing a COPY command for other users, e.g. in a GitHub README, your readers will run into the same problem...

The only way to use a relative path with the client's permissions is to use STDIN:

When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

As noted here:

psql -h remotehost -d remote_mydb -U myuser -c \
   "copy mytable (column1, column2) from STDIN with delimiter as ','" \
   < ./relative_path/file.csv
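The same trick works in the other direction with STDOUT, e.g. to dump a table into a local CSV; a minimal sketch reusing the host, database, and table names from the command above:

psql -h remotehost -d remote_mydb -U myuser -c \
   "copy mytable (column1, column2) to STDOUT with (format csv, header)" \
   > ./relative_path/file.csv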
answered 2017-01-04T13:22:32.760

I have been using this function for a while with no problems. You just need to provide the number of columns in the CSV file, and it will take the header names from the first row and create the table for you:

create or replace function data.load_csv_file
    (
        target_table  text, -- name of the table that will be created
        csv_file_path text,
        col_count     integer
    )

    returns void

as $$

declare
    iter      integer; -- dummy integer to iterate columns with
    col       text; -- to keep column names in each iteration
    col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    set schema 'data';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format ('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format ('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);

    iter := 1;
    col_first := (select col_1
                  from temp_table
                  limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format ('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format ('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row // using quote_ident or %I does not work here!?
    execute format ('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length (target_table) > 0 then
        execute format ('alter table temp_table rename to %I', target_table);
    end if;
end;

$$ language plpgsql;
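A sketch of how a call might look, assuming the function above is installed and the data schema exists; note that the COPY inside it runs server-side, so csv_file_path must be readable by the Postgres server process (the table name and path here are illustrative):

-- creates data.my_table, naming its columns after the CSV's first row
select data.load_csv_file('my_table', '/path/to/table.csv', 100);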
answered 2017-05-25T23:11:36.890

You can use d6tstack, which creates the table for you and is faster than pd.to_sql() because it uses native DB import commands. It supports Postgres as well as MySQL and MS SQL.

import d6tstack.utils
import pandas as pd

df = pd.read_csv('table.csv')
uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
d6tstack.utils.pd_to_psql(df, uri_psql, 'table')

It is also useful for importing multiple CSVs, handling data schema changes, and/or preprocessing with pandas (e.g. for dates) before writing to the db; see further down in the examples notebook.

import glob

# apply_fun: your own per-file preprocessing function, applied after each CSV is read
d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'),
    apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')
answered 2018-12-17T04:13:02.250