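I have a list of URLs from which I want to download the article text for further analysis. I am new to Python. I have two problems: (1) I get a very strange TypeError; (2) the results are not written to the DataFrame. My code is below: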

smallURL= ['http://www.walesonline.co.uk/business/business-news/more-70-jobs-created-bio-12836127','http://economictimes.indiatimes.com/articleshow/61006825.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst','http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/','http://13wham.com/news/local/urmc-opens-newest-urgent-care-facility']

import pandas
import datetime


f = open('myfile', 'w')

#lista= ['http://www.walesonline.co.uk/business/business-news/more-70-jobs-created-bio-12836127','http://economictimes.indiatimes.com/articleshow/61006825.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst','http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/','http://13wham.com/news/local/urmc-opens-newest-urgent-care-facility']

df = pandas.DataFrame(columns=('d', 'datetime', 'title', 'text','keywords', 'url'))

from newspaper import Article 

for index in range(len(smallURL)):

#url = "https://www.bloomberg.com/news/articles/2017-11-10/microsoft-and-google-turn-to-ai-to-catch-amazon-in-the-cloud"
    article = Article(smallURL[index])
#1 . Download the article
    #try:
    article.download()
    #f.write('article.title+\n')
    #except:
    #pass
#2. Parse the article
    try:
        article.parse()
        f.write(article.title + '\n')
    except:
        pass
#Print article title
    #print(article.title)
    article.title
#3. Fetch Author Name(s)
    print(article.authors)
#4. Fetch Publication Date
    if article.publish_date is None:
        d = datetime.datetime.now().date()
    else:
        d = article.publish_date
#5. Print article text
    print(article.text)
#6. Natural Language Processing on Article to fetch Keywords
    #article.nlp()
    #Print Keywords
    print(article.keywords)
#7. Generate Summary of the article
    #print(article.url)
    print(article.url)
    df.loc[index]  = [d, datetime.datetime.now().date(), article.title, article.text,article.keywords,article.url]

My output includes:

[] http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/ Traceback (most recent call last):

File "", line 1, in runfile('C:/Users/theiman/Desktop/untitled7.py', wdir='C:/Users/theiman/Desktop')

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile execfile(filename, namespace)

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/theiman/Desktop/untitled7.py", line 57, in df.loc[index] = [d, datetime.datetime.now().date(), article.title, article.text, article.keywords,article.url]

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 179, in __setitem__ self._setitem_with_indexer(indexer, value)

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 425, in _setitem_with_indexer self.obj._data = self.obj.append(value)._data

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 4533, in append other = other._convert(datetime=True, timedelta=True)

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 3472, in _convert copy=copy)).__finalize__(self)

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 3227, in convert return self.apply('convert', **kwargs)

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 3091, in apply applied = getattr(b, f)(**kwargs)

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 1892, in convert values = fn(values.ravel(), **fn_kwargs)

File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 740, in soft_convert_objects values = lib.maybe_convert_objects(values, convert_datetime=datetime)

File "pandas/_libs/src\inference.pyx", line 1204, in pandas._libs.lib.maybe_convert_objects

TypeError: unhashable type: 'tzutc'

Any ideas about what is going wrong and how to fix it? Thanks!!
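
For anyone hitting the same error: the message suggests that, for at least one URL, `article.publish_date` is a timezone-aware datetime (its `tzinfo` is a `tzutc` object) and that pandas fails while converting the row assigned through `df.loc[index]`. That reading is an assumption based on the traceback, not something verified here. A minimal sketch of one possible workaround is to reduce every publish date to a plain `datetime.date` and to collect the rows in a list instead of growing the DataFrame inside the loop. The helper names `to_naive_date` and `make_row` are hypothetical and not part of newspaper or pandas.

```python
import datetime


def to_naive_date(publish_date, fallback=None):
    """Return a plain datetime.date for a value that may be None,
    a naive datetime, or a timezone-aware datetime."""
    if publish_date is None:
        # Same fallback as the original loop: use today's date.
        return fallback if fallback is not None else datetime.datetime.now().date()
    if isinstance(publish_date, datetime.datetime):
        # .date() drops the time and the tzinfo object (e.g. tzutc).
        return publish_date.date()
    # Already a plain date: pass it through unchanged.
    return publish_date


def make_row(publish_date, title, text, keywords, url):
    """Build one record with the same column names as the DataFrame above."""
    return {
        'd': to_naive_date(publish_date),
        'datetime': datetime.datetime.now().date(),
        'title': title,
        'text': text,
        'keywords': keywords,
        'url': url,
    }
```

Inside the loop, this would be used as `rows.append(make_row(article.publish_date, article.title, article.text, article.keywords, article.url))`, with `rows = []` defined before the loop and `df = pandas.DataFrame(rows)` called once after it. Whether this removes the error on the pandas version shown in the traceback has not been tested.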
