
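I have been following an online tutorial, but instead of the tutorial's data, which comes with headers, I want to use the code below.

The problem is that my table has no headers, so the first row is used as the header. How can I set defined headers of "Ride" and "Queue Time"?

Thanks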

import requests
import lxml.html as lh
import pandas as pd

url='http://www.ridetimes.co.uk/'

page = requests.get(url)

doc = lh.fromstring(page.content)

tr_elements = doc.xpath('//tr')

col = []
i = 0
# For each cell in the first row, store its text (used as the header) and an empty list
for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print('%d:"%s"' % (i, name))
    col.append((name, []))
    print(col)

3 Answers


Use pandas to read the table, then assign the column names:

import pandas as pd

url='http://www.ridetimes.co.uk/'
df = pd.read_html(url)[0]

df.columns = ['Ride', 'Queue Time']

Output:

print (df)
               Ride             Queue Time
0  Spinball Whizzer                 0 mins
1           Nemesis                 5 mins
2          Oblivion                 5 mins
3        Wicker Man                 5 mins
4        The Smiler                10 mins
5              Rita                20 mins
6          TH13TEEN                25 mins
7         Galactica  Currently Unavailable
8        Enterprise  Currently Unavailable
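
If the parser does take the first ride as the header, as described in the question, assigning df.columns on its own would drop that ride from the data. Below is a minimal sketch of one way to recover it. It uses a small hand-built frame as a stand-in for the scraped table (values borrowed from the output above), and the variable names are illustrative only.

```python
import pandas as pd

# Stand-in for a scraped table whose first row was mistaken for the header.
df = pd.DataFrame(
    [["Nemesis", "5 mins"], ["Oblivion", "5 mins"]],
    columns=["Spinball Whizzer", "0 mins"],
)

names = ["Ride", "Queue Time"]

# Turn the mistaken header back into a one-row frame with the proper names.
first_row = pd.DataFrame([df.columns.tolist()], columns=names)

# Rename the existing columns, then stack the recovered row on top.
df.columns = names
fixed = pd.concat([first_row, df], ignore_index=True)

print(fixed)
```

Whether this step is needed depends on how the table was parsed, so check the first row of the frame before applying it.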
answered 2019-06-06T12:57:09.127

How about trying this:

>>> pd.DataFrame(col,columns=["Ride","Queue Time"])
               Ride Queue Time
0  Spinball Whizzer         []
1            0 mins         []

If I am right, this should be the answer.
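
The output above pairs each first-row cell with an empty list, so rides and queue times do not end up in separate columns. Here is a minimal sketch of an alternative that stays with lxml: collect the cell text of every row, then pass the rows to pandas with explicit column names. The inline HTML is an assumed stand-in for the downloaded page, with two cells per row.

```python
import lxml.html as lh
import pandas as pd

# Inline stand-in for the downloaded page (assumption: two <td> cells per row).
html = """
<html><body><table>
  <tr><td>Spinball Whizzer</td><td>0 mins</td></tr>
  <tr><td>Nemesis</td><td>5 mins</td></tr>
</table></body></html>
"""

doc = lh.fromstring(html)

# One list of cell texts per <tr>, including the first row.
rows = [[cell.text_content().strip() for cell in tr] for tr in doc.xpath('//tr')]

# Name the columns explicitly so no data row is used as a header.
df = pd.DataFrame(rows, columns=["Ride", "Queue Time"])
print(df)
```

With the real page, doc would come from lh.fromstring(page.content) as in the question. The rest should carry over if every row has exactly two cells.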

answered 2019-06-06T12:36:23.097

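Consider using the same source the page itself uses to update its values, which returns JSON. A random number is appended to the URL to prevent cached results from being served. This returns all group types, not just Thrill: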

import requests
import random 
import pandas as pd

i = random.randint(1,1000000000000000000)
r = requests.get('http://ridetimes.co.uk/queue-times-new.php?r=' + str(i)).json() #to prevent cached results being served
df = pd.DataFrame([(item['ride'], item['time']) for item in r], columns = ['Ride', 'Queue Time'])
print(df)

If you only want the Thrill group, change that line to:

df = pd.DataFrame([(item['ride'], item['time']) for item in r if item['group'] == 'Thrill'], columns = ['Ride', 'Queue Time'])
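
The two variants can be folded into one helper with an optional group filter. The sketch below runs on made-up sample data instead of the live endpoint. The helper name rides_to_frame and the "Family" entry are hypothetical; only the keys 'ride', 'time' and 'group' and the 'Thrill' value come from the code above.

```python
import pandas as pd

def rides_to_frame(items, group=None):
    """Build a Ride / Queue Time frame, optionally keeping a single group."""
    rows = [
        (item["ride"], item["time"])
        for item in items
        if group is None or item["group"] == group
    ]
    return pd.DataFrame(rows, columns=["Ride", "Queue Time"])

# Made-up sample in the shape the code above expects from the JSON response.
sample = [
    {"ride": "Nemesis", "time": "5 mins", "group": "Thrill"},
    {"ride": "Hypothetical Carousel", "time": "0 mins", "group": "Family"},
]

print(rides_to_frame(sample))            # all groups
print(rides_to_frame(sample, "Thrill"))  # Thrill only
```

With the live data, the list r returned by .json() would take the place of sample.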
answered 2019-06-06T21:55:05.027