list - CSV parsing program & how to flatten multiple lists into a single list

Question

I've been working on a small program that needs to do the following:

Take a CSV file, "domains_prices.csv", which has a column of domains followed by the price for each domain, e.g.:

http://www.example1.com,$20
http://www.example2.net,$30

etc.
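A minimal sketch of loading the first file into a dict, assuming each row is exactly `url,price` with no header row. It keys the dict on `urlparse(...).netloc` (Python 3's `urllib.parse`) instead of stripping `http://` by hand, so `https://` URLs or trailing slashes also reduce to the bare host. `load_prices` is a hypothetical helper name, and the sample data is inlined with `io.StringIO` purely for illustration:

```python
import csv
import io
from urllib.parse import urlparse


def load_prices(lines):
    """Map each domain's host name (netloc) to its price string."""
    prices = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        url, price = row[0].strip(), row[1].strip()
        prices[urlparse(url).netloc] = price
    return prices


sample = io.StringIO("http://www.example1.com,$20\nhttp://www.example2.net,$30\n")
print(load_prices(sample))
# {'www.example1.com': '$20', 'www.example2.net': '$30'}
```

With the real file this would be `with open("domains_prices.csv", newline="") as f: prices = load_prices(f)`.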

Then a second file, "orders_list.csv", which is just a single column of blog post URLs from the same domains listed in the first file, e.g.:

http://www.exmaple2.net/blog-post-1
http://www.example1.com/some-article
http://www.exmaple3.net/blog-post-feb-19

etc.
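Because `orders_list.csv` has only one column, every row that `csv.reader` yields is a one-element list. A sketch (hypothetical helper `load_orders`, sample data inlined for illustration) that takes `row[0]` from each non-empty row, so the result is one flat list of URL strings rather than a list of lists:

```python
import csv
import io


def load_orders(lines):
    """Return the first column of each non-empty CSV row as a flat list."""
    return [row[0].strip() for row in csv.reader(lines) if row]


sample = io.StringIO(
    "http://www.example2.net/blog-post-1\n"
    "http://www.example1.com/some-article\n"
)
print(load_orders(sample))
# ['http://www.example2.net/blog-post-1', 'http://www.example1.com/some-article']
```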

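I need to check each full URL in orders_list against the domains in the first file, look up the price for a blog post on that domain, and then write all the blog post URLs to a new file, each with its price, e.g.: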

http://www.example2.net/blog-post-1, $20
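The per-URL check comes down to extracting the host with `urlparse(url).netloc` and looking it up in the price dict. A sketch, assuming the dict is keyed by host name as in the code below; `price_line` is a hypothetical name, and returning `None` for a domain with no listed price is a design choice made here, not something the question specifies:

```python
from urllib.parse import urlparse


def price_line(url, prices):
    """Return 'url, price' for a listed domain, or None if it isn't listed."""
    price = prices.get(urlparse(url).netloc)
    if price is None:
        return None
    return "{}, {}".format(url, price)


prices = {"www.example1.com": "$20", "www.example2.net": "$30"}
print(price_line("http://www.example2.net/blog-post-1", prices))
# http://www.example2.net/blog-post-1, $30
print(price_line("http://www.example3.net/blog-post-feb-19", prices))
# None
```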

At the end of the output file there should also be a total.
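For the total, the prices first have to be turned from strings like `$20` into numbers. A sketch that uses `decimal.Decimal` to avoid float rounding and writes `url,price` rows followed by a total row with `csv.writer`; the `TOTAL` label, the `$` formatting, and the helper names `parse_price` and `write_report` are assumptions for illustration:

```python
import csv
import io
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as '$20' or ' $19.50' to a Decimal."""
    return Decimal(text.strip().lstrip("$").replace(",", ""))


def write_report(matches, out):
    """Write (url, price_string) pairs, then a final TOTAL row; return the total."""
    writer = csv.writer(out)
    total = Decimal("0")
    for url, price in matches:
        writer.writerow([url, price])
        total += parse_price(price)
    writer.writerow(["TOTAL", "${}".format(total)])
    return total


buf = io.StringIO()
write_report([("http://www.example2.net/blog-post-1", "$30"),
              ("http://www.example1.com/some-article", "$20")], buf)
print(buf.getvalue())
```

For a real file, open it with `open("output.csv", "w", newline="")` so `csv.writer` controls the line endings.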

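My plan was to build a dictionary for domains_prices with k, v as domain & price, then put all the URLs from orders_list into a list, and then compare the elements of that list against the prices in the dictionary.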

Here is my code. I'm stuck at the end: I have parsed_orders_list, but it seems to return each URL as a separate list, so I think I should put all of those URLs into one list?
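On the flattening question in the title: if you do end up with many small lists (for example one-element lists from `.split()`), `itertools.chain.from_iterable` flattens one level of nesting. That said, the simpler fix here is to append each `netloc` string to a single list in the first place, as the accepted answer does. The domain values below are illustrative only:

```python
from itertools import chain

nested = [["www.example2.net"], ["www.example1.com"], ["www.example3.net"]]

# Flatten one level of nesting into a single list
flat = list(chain.from_iterable(nested))
print(flat)
# ['www.example2.net', 'www.example1.com', 'www.example3.net']

# Equivalent nested list comprehension
flat2 = [item for sublist in nested for item in sublist]
```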

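The commented-out code at the very end is what I plan to do once I have the correct list of URLs, comparing them against the dict's k, v. I'm not sure whether that part is right either.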
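One thing to watch in the commented-out loop: it runs after `for i in orders_list` has finished, so `parsed_orders` holds only the *last* `netloc` string, and iterating over a string yields single characters, so `k in domains_prices` never matches. A small demonstration with made-up values of looping over a string versus a list of strings:

```python
domains_prices = {"www.example1.com": "$20", "www.example2.net": "$30"}

# A single string, as left behind by the loop: k is 'w', 'w', 'w', '.', ...
from_string = [k for k in "www.example1.com" if k in domains_prices]

# A list of host strings: each k is a whole domain name
from_list = [k for k in ["www.example2.net", "www.example1.com"]
             if k in domains_prices]

print(from_string)  # []
print(from_list)    # ['www.example2.net', 'www.example1.com']
```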

Note that this is also the first complete Python program I've written from scratch, so if it's horrible, that's why :)

import csv
from urlparse import urlparse

#get the csv file with all domains and prices in
reader = csv.reader(open("domains_prices.csv", 'r'))

#get all the completed blog post urls
reader2 = csv.reader(open('orders_list.csv', 'r'))

domains_prices = {}
orders_list = []

for row in reader2:
    #put the blog post urls into a list
    orders_list.append(','.join(row))


for domain, price in reader:
    #strip the domains
    domain = domain.replace('http://', '').replace('/','')

    #insert the domains and prices into the dictionary
    domains_prices[domain] = price


for i in orders_list:
    #iterate over the blog post urls orders_list and
    #then parse them with urlparse
    data = urlparse(i)

    #use netloc to get just the domain from each blog post url
    parsed_orders = data.netloc

    parsed_orders_list = parsed_orders.split()

    print parsed_orders_list

"""
for k in parsed_orders:
    if k in domains_prices:
        print k, domains_prices[k]
"""

score 0 · Accepted Answer

With help from others I've figured it out. I made the following changes to the "for i in orders_list" part:

parsed_orders = []

for i in orders_list:
    #iterate over the blog post urls orders_list and
    #then parse them with urlparse
    data = urlparse(i)

    #use netloc to get just the domain from each blog post url then put each netloc url into a list
    parsed_orders.append(data.netloc)


#print parsed_orders - to check that I'm getting a list of netloc urls back

#Iterate over the list of urls and dict of domains and prices to match them up
for k in parsed_orders:
    if k in domains_prices:
        print k, domains_prices[k]
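The code above is Python 2 (the `urlparse` module and the `print` statement). In Python 3 the same function lives in `urllib.parse` and `print` is a function. A sketch of the answer's matching step ported to Python 3, with small inline sample data standing in for the two CSV files. It also keeps the full URL next to its price, since the answer prints only the domain while the question asks for the full blog post URL in the output:

```python
from urllib.parse import urlparse

# Illustrative stand-ins for the data read from the two CSV files
domains_prices = {"www.example1.com": "$20", "www.example2.net": "$30"}
orders_list = [
    "http://www.example2.net/blog-post-1",
    "http://www.example1.com/some-article",
    "http://www.example3.net/blog-post-feb-19",
]

# Collect each URL's host in one flat list, as in the answer
parsed_orders = [urlparse(url).netloc for url in orders_list]

# Pair each full URL with its price; unlisted domains are skipped
matches = [(url, domains_prices[host])
           for url, host in zip(orders_list, parsed_orders)
           if host in domains_prices]

for url, price in matches:
    print(url, price)
```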
