python - Python - Twisted: POST in a form

Question

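Hi everyone!

I'm still discovering Twisted, and I wrote this script to parse the contents of HTML tables into Excel. The script works well! My question is: how can I do the same thing with only one web page (http://bandscore.ielts.org/) but many POST requests, so that I can fetch all the results, parse them with BeautifulSoup, and then put them into Excel?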

Parsing the source and putting it into Excel is fine, but I don't know how to make the POST requests with Twisted.

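Here is the script I use to parse many different pages with Twisted (I would like to write the same kind of script, but sending many different POST payloads to the same page instead of fetching many pages):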

from twisted.web import client
from twisted.internet import reactor, defer
from bs4 import BeautifulSoup
import time
import xlwt

start = time.time()
wb = xlwt.Workbook(encoding='utf-8')
ws = wb.add_sheet("BULATS_IA_PARSED")
x = 0  # current spreadsheet row, updated by finish()
Countries_List = ['Afghanistan','Armenia','Brazil','Argentina','Armenia','Australia','Austria','Azerbaijan','Bahrain','Bangladesh','Belgium','Belize','Bolivia','Bosnia and Herzegovina','Brazil','Brunei Darussalam','Bulgaria','Cameroon','Canada','Central African Republic','Chile','China','Colombia','Costa Rica','Croatia','Cuba','Cyprus','Czech Republic','Denmark','Dominican Republic','Ecuador','Egypt','Eritrea','Estonia','Ethiopia','Faroe Islands','Fiji','Finland','France','French Polynesia','Georgia','Germany','Gibraltar','Greece','Grenada','Hong Kong','Hungary','Iceland','India','Indonesia','Iran','Iraq','Ireland','Israel','Italy','Jamaica','Japan','Jordan','Kazakhstan','Kenya','Kuwait','Latvia','Lebanon','Libya','Liechtenstein','Lithuania','Luxembourg','Macau','Macedonia','Malaysia','Maldives','Malta','Mexico','Monaco','Montenegro','Morocco','Mozambique','Myanmar (Burma)','Nepal','Netherlands','New Caledonia','New Zealand','Nigeria','Norway','Oman','Pakistan','Palestine','Papua New Guinea','Paraguay','Peru','Philippines','Poland','Portugal','Qatar','Romania','Russia','Saudi Arabia','Serbia','Singapore','Slovakia','Slovenia','South Africa','South Korea','Spain','Sri Lanka','Sweden','Switzerland','Syria','Taiwan','Thailand','Trinadad and Tobago','Tunisia','Turkey','Ukraine','United Arab Emirates','United Kingdom','United States','Uruguay','Uzbekistan','Venezuela','Vietnam']
urls = ["http://www.cambridgeesol.org/institutions/results.php?region=%s&type=&BULATS=on" % Countries for Countries in Countries_List]


def finish(results):
    global x
    for result in results:
        print('GOT PAGE %d bytes' % len(result))
        soup = BeautifulSoup(result)
        tableau = soup.findAll('table')
        try:
            # Rows are read from the fourth table on the page.
            rows = tableau[3].findAll('tr')
            print("Fetching")
            for tr in rows:
                cols = tr.findAll('td')
                y = 0
                x = x + 1
                for td in cols:
                    ws.write(x, y, td.text)
                    y = y + 1
        except IndexError:
            # Pages with fewer than four tables have no results.
            print("No IA for this country")

    # Every page has been processed, so let reactor.run() return.
    reactor.stop()

waiting = [client.getPage(url) for url in urls]
defer.gatherResults(waiting).addCallback(finish)

reactor.run()
wb.save("IALOL.xls")
print("Elapsed Time: %s" % (time.time() - start))

Thanks a lot for your help!

score 2 · Accepted Answer

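You have two options: keep using getPage and tell it to use POST instead of GET, or use Agent.

The API documentation for getPage directs you to the API documentation for HTTPClientFactory to discover the other supported options.

That latter API documentation explicitly covers method and implies (but doesn't do a good job of explaining) postdata. So, to make a POST using getPage: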

d = getPage(url, method='POST', postdata="hello, world, or whatever.")
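
When the target is an HTML form, the server usually expects a URL-encoded body and a matching Content-Type header rather than free text. Here is a minimal sketch of how the getPage keyword arguments might be built for that case. The helper name `build_form_post` is hypothetical, and the real field names have to be read from the form on the page.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_form_post(fields):
    """Return keyword arguments describing a form POST for getPage.

    `fields` is a dict or a list of (name, value) pairs. The names are
    whatever the target form defines; none are known from the question.
    """
    body = urlencode(fields).encode("ascii")
    return {
        "method": b"POST",
        "postdata": body,
        "headers": {b"Content-Type": b"application/x-www-form-urlencoded"},
    }
```

Each request could then be written as `getPage(url, **build_form_post(fields))`. A list of such calls, one per set of fields, can be handed to `defer.gatherResults` in the same way the script in the question gathers its GET requests.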

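There is a howto-style document for Agent (linked from the overall web howto documentation index). It gives examples of sending a request with a body (see the FileBodyProducer example).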
