python - Web scraping AliExpress order data

Question

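I am trying to scrape some data from the AliExpress website, but I don't know how to proceed. I started doing it manually, but I guess that would easily take me several hours :/ I basically want to extract the following datasets:

(i) Orders per country

For a given product, I want to get the destination countries of the last ~1000 orders into Excel. For example, this product: https://www.aliexpress.com/item/Bluedio-T4S-Active-Noise-Cancelling-Wireless-Bluetooth-Headphones-wireless-Headset-with-Mic/32821244791.html?spm=2114.search0103.3.1.3b0615cfrdkG5X&ws_ab_test=searchweb0_0,searchweb201602_1_10152_10151_10065_10344_10068_10342_10343_10340_10341_10084_10083_10618_10304_10307_10306_10302_5711211_10313_10059_10534_100031_10103_10627_10626_10624_10623_10622_5722411_10621_10620_5711311,searchweb201603_25,ppcSwitch_5&algo_expid=ce68d26f-337b-49ac-af00-48c5b4c4c5c4-0&algo_pvid=ce68d26f-337b-49ac-af00-48c5b4c4c5c4&transAbTest=ae803_3&priceBeautifyAB=0

Figure: Transaction history

Here, my goal is to get an Excel sheet with the columns: date (or some other unique identifier) - buyer country - number of pieces. So for the first buyer in the picture, this would be something like "10 Mar 2018 00:11" - "RU" - "1 piece". I then want about 100-120 of these pages in a CSV file (roughly 1000 customers in total).

Can anyone help me with how to code this in Python? Or any ideas about tools I could use?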

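(ii) Total orders per subcategory

For a given (sub)category, for example "Beauty & Health - Health Care" (https://www.aliexpress.com/category/200002496/health-care.html?spm=2114.search0103.3.19.696619daL05kcB&site=glo&g=y), I want to sum up all the orders across 100 pages of products. In the picture, the order counts are circled in yellow.

Figure: Products with order counts

So the output could simply be the total number of orders in this category. (This would be a sum over 100 pages, with 48 products per page.)

Is this possible in Python? I have some very basic experience with Python, but not enough to actually build something like this.

If anyone could help me get started, it would be much appreciated!

Thanks a lot in advance!

Bruce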

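Update: Thanks to Delirious Lettuce, I managed to do (i). For (ii) I have built the following code, which works fine for about 5 pages but after that starts omitting products or jumping around. Is this because of the code? Or could it be because they limit how much data can be pulled from the server?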

import bs4
import csv
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

filename = "Dresses.csv"
f = open(filename, "w")
headers = "product_ID, orders\n"
f.write(headers)

for p in range(1, 100):  # pages 1 through 99
    # had to split the link below because it did not fit on one line
    my_url = ('https://www.aliexpress.com/category/200003482/dresses/' + str(p)
              + '.html?site=glo&g=y&SortType=total_tranpro_desc&needQuery=n&tag=')

    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    containers = page_soup.findAll("div", {"class": "item"})

    # write one CSV row per product found on this page
    for container in containers:
        em_order = container.em
        order_num = em_order.text
        product_ID = container.input["value"]
        f.write(product_ID + "," + order_num + "\n")

f.close()
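
Since the goal in (ii) is a single total, the CSV written above still has to be summed. A minimal sketch of that step follows, under some assumptions: the orders column holds text with one integer in it, such as "Orders (25)" (the exact text is a guess, not something confirmed here), there are no thousands separators, and a product that shows up on more than one page should only be counted once. The name total_orders is hypothetical.

```python
import csv
import io
import re


def total_orders(csv_file):
    """Sum order counts per unique product ID from a 'product_ID, orders' CSV."""
    seen = set()
    total = 0
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        product_id, order_text = row[0].strip(), row[1]
        if product_id in seen:
            continue  # same product listed on more than one page
        seen.add(product_id)
        # Assumption: the first run of digits in the text is the order count.
        match = re.search(r'\d+', order_text)
        if match:
            total += int(match.group())
    return total


# Made-up rows standing in for Dresses.csv; product 111 appears twice.
sample = io.StringIO(
    'product_ID, orders\n'
    '111,Orders (25)\n'
    '222,Orders (7)\n'
    '111,Orders (25)\n'
)
total = total_orders(sample)
print(total)
```

With the real file, the same function could be called as total_orders(open('Dresses.csv', newline='')).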

score 0 · Accepted Answer

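This is a partial answer since I don't have time to look at part 2 right now, but here is what I wrote for part 1 using Python 3.6.4. I will try to update it with part 2 later as well.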

import csv

import requests


def _get_transactions(*, product_id, page_num):
    headers = {
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36'
    }
    params = {
        'productId': product_id,
        'type': 'default',
        'page': page_num
    }
    url = 'https://feedback.aliexpress.com/display/evaluationProductDetailAjaxService.htm'
    r = requests.get(url, headers=headers, params=params)
    r.raise_for_status()
    return r.json()


def get_product_transactions(*, product_id, transaction_pages=1):
    transactions = []
    for page_num in range(1, transaction_pages + 1):
        current_transactions = _get_transactions(
            product_id=product_id,
            page_num=page_num
        )
        transactions.extend(current_transactions['records'])
    return transactions


if __name__ == '__main__':
    product_id = '32821244791'
    transactions = get_product_transactions(
        product_id=product_id,
        transaction_pages=3
    )

    # newline='' stops the csv module from writing extra blank lines on Windows
    with open('{}_transactions.csv'.format(product_id), 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=('date', 'country', 'pieces'))
        writer.writeheader()
        for transaction in transactions:
            writer.writerow({
                'date': transaction['date'],
                'country': transaction['countryCode'],
                'pieces': transaction['quantity']
            })

Output file '32821244791_transactions.csv':

date,country,pieces
12 Mar 2018 14:42,hu,1
12 Mar 2018 14:16,be,1
12 Mar 2018 13:47,kr,1
12 Mar 2018 13:25,br,1
12 Mar 2018 13:13,ru,3
12 Mar 2018 12:41,fr,1
12 Mar 2018 11:42,es,1
12 Mar 2018 11:15,ru,1
12 Mar 2018 11:05,ru,1
12 Mar 2018 10:45,ro,1
12 Mar 2018 10:44,ru,1
12 Mar 2018 10:00,kz,1
12 Mar 2018 10:00,in,1
12 Mar 2018 09:51,fr,1
12 Mar 2018 09:39,nl,1
12 Mar 2018 09:26,fr,1
12 Mar 2018 09:24,ru,1
12 Mar 2018 09:19,cz,1
12 Mar 2018 09:00,ru,1
12 Mar 2018 08:46,ru,1
12 Mar 2018 08:33,no,1
12 Mar 2018 08:32,pl,1
12 Mar 2018 08:21,br,1
12 Mar 2018 08:20,ru,1
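
To get from this file to the per-country view asked for in (i), the rows can be tallied by country code. The following is only a rough sketch: pieces_per_country is a hypothetical helper, it assumes the column names shown above, and it upper-cases the codes so they match the "RU" style used in the question.

```python
import csv
import io
from collections import Counter


def pieces_per_country(csv_file):
    """Sum the 'pieces' column for each country code in a transactions CSV."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row['country'].upper()] += int(row['pieces'])
    return totals


# A few rows copied from the output above stand in for the real file.
sample = io.StringIO(
    'date,country,pieces\n'
    '12 Mar 2018 14:42,hu,1\n'
    '12 Mar 2018 13:13,ru,3\n'
    '12 Mar 2018 11:15,ru,1\n'
)
totals = pieces_per_country(sample)
for country, pieces in totals.most_common():
    print(country, pieces)
```

For the real data, pass open('32821244791_transactions.csv', newline='') instead of the sample.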
