I am using Scrapy and writing the data fetched from web pages to a CSV file.

My pipeline code:

def __init__(self):
    self.file_name = csv.writer(open('example.csv', 'wb'))
    self.file_name.writerow(['Title', 'Release Date','Director'])

def process_item(self, item, spider):
    self.file_name.writerow([item['Title'].encode('utf-8'),
                                item['Release Date'].encode('utf-8'),
                                item['Director'].encode('utf-8'),
                                ])
    return item 

The output in my CSV file looks like this:

Title,Release Date,Director
And Now For Something Completely Different,1971,Ian MacNaughton
Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,1979,Terry Jones
.....

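Is it possible to write Title and its values in one column, Release Date and its values in the next column, and Director and its values in the column after that (since CSV means comma-separated values), as in the format below?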

        Title,                                 Release Date,            Director
And Now For Something Completely Different,      1971,              Ian MacNaughton
Monty Python And The Holy Grail,                 1975,     Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                    1979,              Terry Jones

Any help would be appreciated. Thanks in advance.

2 Answers

TSV (tab-separated values) might give you what you want, but it usually looks ugly when the lines have very different lengths.
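
As a minimal sketch of the TSV idea (Python 3, standard library only; the `to_tsv` helper and the sample rows are illustrative and not part of the original pipeline), `csv.writer` accepts a `delimiter` argument, so switching to tabs is a one-argument change:

```python
import csv
import io

# Sample rows copied from the question; the first row is the header.
rows = [
    ["Title", "Release Date", "Director"],
    ["And Now For Something Completely Different", "1971", "Ian MacNaughton"],
    ["Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"],
    ["Monty Python's Life Of Brian", "1979", "Terry Jones"],
]


def to_tsv(rows):
    """Return the rows serialized as tab-separated text."""
    buffer = io.StringIO()
    # delimiter="\t" turns the CSV writer into a TSV writer.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(to_tsv(rows))
```

A viewer that expands tabs to fixed stops lines the columns up only when the values in a column have similar lengths, which is the ugliness mentioned above.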

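You could easily write some code to produce a table like that. The trick is that you need to have all the rows before you output anything, so that you can compute the width of each column.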
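
A sketch of that idea in Python 3, assuming every item is a mapping with the three keys from the question. `format_table` and `AlignedTextPipeline` are hypothetical names; `process_item` and `close_spider` are the hooks Scrapy calls on an item pipeline. Rows are buffered while the spider runs, and the widths are computed only once all rows are available:

```python
def format_table(rows, align=("l", "c", "c")):
    """Pad every cell so the columns line up; commas stay attached to the cell."""
    # Append the comma to every cell except the last one in each row.
    cells = [
        [value + "," if i < len(row) - 1 else value for i, value in enumerate(row)]
        for row in rows
    ]
    # Column widths can only be known once every row has been seen.
    widths = [max(len(cell) for cell in column) for column in zip(*cells)]
    pad = {"l": str.ljust, "c": str.center, "r": str.rjust}
    lines = []
    for row in cells:
        padded = [pad[a](cell, w) for cell, a, w in zip(row, align, widths)]
        lines.append(" ".join(padded).rstrip())
    return "\n".join(lines)


class AlignedTextPipeline:
    """Hypothetical pipeline: buffers items, writes the table when the spider closes."""

    fields = ["Title", "Release Date", "Director"]

    def __init__(self, path="example.txt"):
        self.path = path
        self.rows = [list(self.fields)]  # header row

    def process_item(self, item, spider):
        # Only collect here; nothing can be aligned until all rows are known.
        self.rows.append([str(item[name]) for name in self.fields])
        return item

    def close_spider(self, spider):
        with open(self.path, "w", encoding="utf-8") as handle:
            handle.write(format_table(self.rows) + "\n")
```

The trade-off is memory: every row is held until the crawl finishes, which is fine for a few thousand records but may not suit a very large crawl.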

You can find plenty of snippets for this on the internet; here is one I have used before.

answered 2012-05-31T07:50:19.030

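Update: refactored the code so that it:

  1. uses a generator function, as suggested by @madjar, and
  2. fits the code snippet provided by the OP more closely.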

Target output

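I am trying out an alternative using texttable. It produces output identical to that in the question. This output may be written to a csv file (the records will need massaging for the appropriate csv dialect; I could not find a way to keep using csv.writer and still get the padded spaces in each field).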

                  Title,                      Release Date,             Director            
And Now For Something Completely Different,       1971,              Ian MacNaughton        
Monty Python And The Holy Grail,                  1975,       Terry Gilliam and Terry Jones 
Monty Python's Life Of Brian,                     1979,                Terry Jones    
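
For comparison, here is a sketch of one possible workaround in Python 3: pad each field to its column width before handing it to `csv.writer`. The `write_padded_csv` helper is a hypothetical name. The padding ends up before the comma rather than after it, so this does not reproduce the layout above exactly:

```python
import csv
import io


def write_padded_csv(rows, stream):
    """Write rows with csv.writer after padding each field to its column width."""
    # All rows are needed up front to find the widest value per column.
    widths = [max(len(value) for value in column) for column in zip(*rows)]
    writer = csv.writer(stream, lineterminator="\n")
    for row in rows:
        writer.writerow([value.ljust(width) for value, width in zip(row, widths)])


rows = [
    ["Title", "Release Date", "Director"],
    ["And Now For Something Completely Different", "1971", "Ian MacNaughton"],
    ["Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"],
    ["Monty Python's Life Of Brian", "1979", "Terry Jones"],
]

buffer = io.StringIO()
write_padded_csv(rows, buffer)
print(buffer.getvalue())
```

Anything that reads such a file back would need to strip the trailing spaces from each field, since a CSV reader treats them as part of the value.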

Code

Here is a sketch of the code needed to produce the result above:

from texttable import Texttable

# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generates the data for use in the texttable function

def process_item(item):
    # This massages each record in preparation for writing to csv:
    # a trailing comma is appended to every field except the last.
    # No .encode('utf-8') here: on Python 3 it returns bytes, and
    # concatenating bytes with ',' raises a TypeError.
    item['Title'] = item['Title'] + ','
    item['Release Date'] = item['Release Date'] + ','
    item['Director'] = item['Director']
    return item

def initialise_dataset():
    data = [
        {'Title': 'Title',
         'Release Date': 'Release Date',
         'Director': 'Director'
         },  # first item holds the table header
        {'Title': 'And Now For Something Completely Different',
         'Release Date': '1971',
         'Director': 'Ian MacNaughton'
         },
        {'Title': 'Monty Python And The Holy Grail',
         'Release Date': '1975',
         'Director': 'Terry Gilliam and Terry Jones'
         },
        {'Title': "Monty Python's Life Of Brian",
         'Release Date': '1979',
         'Director': 'Terry Jones'
         }
    ]

    data = [process_item(item) for item in data]
    return data

def records(data):
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director'] ]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    table.add_rows( records(data) )

    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1] # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print(table)
answered 2012-05-31T12:16:08.043