
Can anyone help me remove these double quotes at the beginning/end of each line?

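I have a large CSV (800k rows) and want to create INSERT statements to load the data into a SQL DB. I know the code is really ugly, but I have never used Python before... any help is much appreciated.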

#Script file to read from .csv containing raw location data (zip code database)
#SQL insert statements are written to another CSV
#Duplicate zip codes are removed


import csv


csvfile = open(r'c:\Canada\canada_zip.csv', 'rb')  # raw string keeps the backslashes literal
dialect = csv.Sniffer().sniff(csvfile.readline())
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
reader.next()

ofile  = open(r'c:\Canada\canada_inserts.csv', 'wb')  # raw string keeps the backslashes literal
writer = csv.writer(ofile, dialect)

#DROP / CREATE TABLE
createTableCmd = '''DROP TABLE PopulatedPlacesCanada       \n\
CREATE TABLE PopulatedPlacesCanada                         \n\
(                                                  \n\
ID INT primary key identity not null,      \n\
Zip VARCHAR(10),                           \n\
City nVARCHAR(100),                        \n\
County nvarchar(100),                      \n\
StateCode varchar(3),                      \n\
StateName nvarchar(100),                   \n\
Country nvarchar(30),                      \n\
Latitude float,                            \n\
Longitude float,                           \n\
PopulationCount int,                       \n\
Timezone int,                              \n\
Dst  bit                                   \n\
)'''
writer.writerow([createTableCmd])

table = 'PopulatedPlacesCanada'
db_fields = 'Zip, City, County, StateCode, StateName, Country, Latitude, Longitude,         PopulationCount, Timezone, Dst'
zip_codes = set()

count = 0

for row in reader:
  if row[0] not in zip_codes: #only add row if zip code is unique
    count = count + 1
    zipCode = row[0] #not every row in the csv is needed so handpick them using row[n]
    city = row[1].replace("\'", "").strip()
    county = ""
    state_abr = row[2]
    state = row[3].replace("\'", "").strip()
    country = 'Canada'
    lat = row[8]
    lon = row[9]
    pop = row[11]
    timezone = row[6]
    dst = row[7]
    if dst == 'Y':
      dst= '1'
    if dst == 'N':
      dst = '0'
    query = "INSERT INTO {0}({1}) VALUES ('{2}', '{3}', '{4}', '{5}', '{6}', '{7}', {8}, {9}, {10}, {11}, {12})".format(table, db_fields, zipCode, city, county, state_abr, state, country, lat, lon, pop, timezone, dst)
    writer.writerow([query])
    zip_codes.add(row[0])
    if count % 100 == 0:  #Go statement after every 100 inserts to make sql batch size manageable
      writer.writerow(['GO'])

2 Answers


You are not writing a CSV file, so don't use a csv writer for it: it probably adds extra escaping to your data. Instead, use

ofile = open( 'load.sql', 'w')
# Raw write, no newline added:
ofile.write(...)
# or, with newline at the end:
print >>ofile, "foobar."

It's the CSV writer that is adding the quotes to your line: most CSV dialects expect strings to be wrapped in quotes when they contain certain characters, such as , or ; or even spaces. However, as you are writing SQL and not CSV, you don't need or want this.
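
The effect is easy to reproduce in isolation. The following is a minimal sketch, written for Python 3 (the code in the question is Python 2) and using a made-up statement and in-memory buffers instead of real files:

```python
import csv
import io

# Hypothetical sample statement; like the real INSERTs it contains commas.
query = "INSERT INTO t(a, b) VALUES ('x', 'y')"

# csv.writer sees one field that contains the delimiter, so it quotes the field.
csv_buf = io.StringIO()
csv.writer(csv_buf).writerow([query])
csv_line = csv_buf.getvalue()

# A plain write emits the text unchanged.
raw_buf = io.StringIO()
raw_buf.write(query + "\n")
raw_line = raw_buf.getvalue()

print(repr(csv_line))
print(repr(raw_line))
```

The first printed line is wrapped in double quotes; the second is not.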

answered 2012-01-24T07:31:38.257

First, two pointers:
1) Use triple quotes for multi-line strings, not triple apostrophes.
2) There is no need to put "\n\" in a multi-line string.
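
To illustrate the second pointer, here is a small sketch with a shortened, made-up statement (not the full CREATE TABLE from the question). Both spellings should produce the same string, because the trailing backslash swallows the real line break that the explicit "\n" then puts back:

```python
# Triple-quoted string: line breaks in the source are already part of the value.
plain = """DROP TABLE T
CREATE TABLE T"""

# Same text with an explicit "\n" plus a trailing backslash (line continuation),
# in the style used in the question.
escaped = """DROP TABLE T\n\
CREATE TABLE T"""

print(plain == escaped)
```

As for the first pointer, `'''` and `"""` behave identically in Python; preferring `"""` is a style convention.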

To remove the quotes from a line, use Python's regular expression module rather than string replacement.

import re
quotes = re.compile('^["\']|["\']$')
city = quotes.sub( '', row[3] )   # sub() needs the replacement string first
state = quotes.sub( '', row[4] )

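Alternatively, you can use strip with the characters you want removed from both ends (it accepts a set of characters, though chaining one character per call, as here, also works):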

city = row[3].strip('"').strip("'")
state = row[4].strip('"').strip("'")

Finally, don't use the csv module for the output file, since it expects "context". Just open the file and write to it.

ofile = open( 'canada_inserts.sql','w' )
ofile.write( createTableCmd + '\n' )
for row in reader:
...
   ofile.write( query + '\n' )
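
Putting the pieces together, the loop might look roughly like the sketch below. It is written for Python 3, uses in-memory sample data, and writes only two columns; the helper name `write_inserts` and its `batch_size` parameter are made up for illustration and are not part of the original script:

```python
import csv
import io

def write_inserts(rows, out, table, batch_size=100):
    """Write one INSERT per unique first column and a GO after every batch_size rows."""
    seen = set()
    count = 0
    for row in rows:
        key = row[0]
        if key in seen:  # skip duplicate zip codes
            continue
        seen.add(key)
        count += 1
        city = row[1].strip('"\'')  # drop stray quotes at both ends
        out.write("INSERT INTO {0}(Zip, City) VALUES ('{1}', '{2}')\n".format(table, key, city))
        if count % batch_size == 0:
            out.write("GO\n")
    return count

# Hypothetical sample input instead of the real 800k-row file.
sample = io.StringIO("zip,city\nA1A,Toronto\nA1A,Toronto\nB2B,'Ottawa'\n")
reader = csv.reader(sample)
next(reader)  # skip the header row

out = io.StringIO()
n = write_inserts(reader, out, "PopulatedPlacesCanada", batch_size=2)
print(out.getvalue())
```

Reading still goes through `csv.reader`; only the output is written as plain text, so no quotes are added around the statements. Values containing apostrophes would still need escaping before being placed in SQL.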
answered 2012-01-23T19:03:21.510