I have a text file containing a list of URLs, like this:

https://www.ebay.com/itm/Egyptian-Comfort-1800-Count-4-Piece-Bed-Sheet-Set-Deep-Pocket-Bed-Sheets/142436469971?epid=1760442729&hash=item2129e00cd3%3Ag%3A7gIAAOSw3YBdRVJd&_trkparms= %2526rpp_cid%253D601435485fceeb223c6f4511&var=442541824291

When reading the text file, I only want to print epid=1760442729.

I tried:

result = []
with open('deals.txt', 'r') as f:
    for line in f:
        if line.startswith('?epid='):
            break
        result.append(line)
print(result[0].split('epid='))

But I don't get the expected result.

Any help or suggestions would be appreciated. Thanks in advance.

3 Answers

import re

s = """https://www.ebay.com/itm/Egyptian-Comfort-1800-Count-4-Piece-Bed-Sheet-Set-Deep-Pocket-Bed-Sheets/142436469971?epid=1760442729&hash=item2129e00cd3%3Ag%3A7gIAAOSw3YBdRVJd&_trkparms=%2526rpp_cid%253D601435485fceeb223c6f4511&var=442541824291
https://www.ebay.com/itm/Egyptian-Comfort-1800-Count-4-Piece-Bed-Sheet-Set-Deep-Pocket-Bed-Sheets/142436469971?epid=172442729&hash=item2129e00cd3%3Ag%3A7gIAAOSw3YBdRVJd&_trkparms=%2526rpp_cid%253D601435485fceeb223c6f4511&var=442541824291"""

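# Capture the digits that follow "epid=" in every URL of the string.
# Not requiring a trailing "&" also matches an epid that is the last parameter.
for i in re.findall(r'[?&]epid=(\d+)', s):
    print(f'epid = {i}')

Output:

epid = 1760442729
epid = 172442729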
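
When the URLs come from a file rather than one string, the same pattern can be applied line by line. This is a minimal sketch, not a definitive implementation: `extract_epids` and `EPID_RE` are hypothetical names, and the sample URLs are shortened stand-ins for the real ones. Any iterable of lines works, so an open file object can be passed in place of the list.

```python
import re

# Matches "epid=<digits>" when it follows "?" or "&" in the query string.
EPID_RE = re.compile(r'[?&]epid=(\d+)')

def extract_epids(lines):
    """Return the epid value of every line that contains one."""
    epids = []
    for line in lines:
        match = EPID_RE.search(line)
        if match:
            epids.append(match.group(1))
    return epids

# Shortened sample lines (assumed); the last one has no epid and is skipped.
sample = [
    "https://www.ebay.com/itm/142436469971?epid=1760442729&hash=item2129e00cd3\n",
    "https://www.ebay.com/itm/142436469971?var=442541824291&epid=172442729\n",
    "https://www.ebay.com/itm/142436469971\n",
]
print(extract_epids(sample))
```

With a file, this would be called as `extract_epids(f)` inside `with open('deals.txt') as f:`.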
answered 2021-02-11T16:33:41.163

If the URLs always have the same structure, I would do it with substrings:

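result = []
with open('deals.txt', 'r') as f:
    for line in f:
        # Slice from "epid=" up to (not including) "&hash=".
        a = line.find('epid=')
        b = line.find('&hash=')
        # find() returns -1 when the text is missing, so skip such lines.
        if a != -1 and b != -1:
            print(line[a:b])
            result.append(line[a:b])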
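
If `&hash=` is not guaranteed to follow `epid`, a variant that does not depend on the next parameter's name may be safer. This is only a sketch under that assumption; `epid_param` is a hypothetical helper name. Note that it looks for the first occurrence of the text `epid=` anywhere in the line, so it is still a substring approach, not real URL parsing.

```python
def epid_param(line):
    """Return 'epid=<value>' from a URL line, or None if there is no epid."""
    _, found, rest = line.strip().partition('epid=')
    if not found:
        return None
    # The value runs up to the next '&', or to the end of the line.
    value = rest.split('&', 1)[0]
    return 'epid=' + value

print(epid_param("https://www.ebay.com/itm/142436469971?epid=1760442729&hash=item2129e00cd3\n"))
```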
answered 2021-02-11T16:45:47.680

Use a library designed for parsing URLs.

Example:

from urllib.parse import urlparse, parse_qs

URL='https://www.ebay.com/itm/Egyptian-Comfort-1800-Count-4-Piece-Bed-Sheet-Set-Deep-Pocket-Bed-Sheets/142436469971?epid=1760442729&hash=item2129e00cd3%3Ag%3A7gIAAOSw3YBdRVJd&_trkparms=%2526rpp_cid%253D601435485fceeb223c6f4511&var=442541824291'

url_component = urlparse(URL)
query_component = parse_qs(url_component.query)
epid_data = query_component['epid'][0]
print(f'epid = {epid_data}')
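
To apply this to every line of the file from the question, the lookup can be wrapped in a small function. This is a rough sketch: `epid_from_url` is a hypothetical name, and the file name `deals.txt` is taken from the question. Using `.get()` instead of indexing avoids a `KeyError` for lines that have no `epid` parameter.

```python
from urllib.parse import urlparse, parse_qs

def epid_from_url(url):
    """Return the first epid value in the URL's query string, or None."""
    query = parse_qs(urlparse(url.strip()).query)
    values = query.get('epid')
    return values[0] if values else None

# Reading the file from the question would look like this:
# with open('deals.txt') as f:
#     for line in f:
#         print(f'epid = {epid_from_url(line)}')

url = ('https://www.ebay.com/itm/Egyptian-Comfort-1800-Count-4-Piece-Bed-Sheet-Set-'
       'Deep-Pocket-Bed-Sheets/142436469971?epid=1760442729'
       '&hash=item2129e00cd3%3Ag%3A7gIAAOSw3YBdRVJd&var=442541824291\n')
print(f'epid = {epid_from_url(url)}')
```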
answered 2021-02-11T17:22:23.280