
I have this string:

<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>

I need to get the 10 characters right after "/dp/" (B01J5FGW66).

How can I write a function that does this?

3 Answers


Use a regular expression:

import re
s = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
print(re.search(r"dp\/([A-Za-z0-9]{10})\/", s)[1])

Output: B01J5FGW66

Explanation:

Start with "dp/":

dp\/ 

Then a capture group, delimited by (), that matches exactly 10 (via {10}) lowercase letters (a-z), uppercase letters (A-Z), or digits (0-9):

([A-Za-z0-9]{10})

End with "/":

\/

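With re.search we can search your string s for this expression and read the first capture group from the result with [1]. (Escaping the slash as \/ is harmless in Python but not required; a plain / works too.)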
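
To see how each part of the pattern constrains the match, here is a small check. The sample strings and the helper name `first_group` are made up for illustration and are not from the question; the pattern is the same one, written without the optional backslashes.

```python
import re

# Same pattern as above, without the optional "\" before "/".
pattern = re.compile(r"dp/([A-Za-z0-9]{10})/")

# Hypothetical test strings, chosen to exercise each part of the pattern.
samples = [
    "/dp/B01J5FGW66/ref",   # exactly 10 alphanumerics between the slashes
    "/dp/B01J5/ref",        # too short: {10} requires exactly 10 characters
    "/dp/B01J5FGW66X/ref",  # 11 characters: no "/" right after the 10th
    "/dp/B01J5FGW6_/ref",   # "_" is not in [A-Za-z0-9]
]

def first_group(text):
    """Return capture group 1, or None when the pattern does not match."""
    m = pattern.search(text)
    return m[1] if m else None

for text in samples:
    print(text, "->", first_group(text))
```

Only the first sample should produce a match; the other three fail at the length check, the closing slash, and the character class respectively.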

Note that you may want some extra code to handle the case where no match is found:

m = re.search(r"dp\/([A-Za-z0-9]{10})\/", s)
if m is not None:
    print(m[1])
else:
    # re.search returns None when nothing matches
    print("No match")
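
Since the question asks for a function, the None check above could be wrapped up roughly like this. It is only a sketch: the names `DP_ID_PATTERN` and `extract_dp_id` are my own, and the sample link is a shortened version of the one in the question.

```python
import re
from typing import Optional

# Compiled once so repeated calls do not rebuild the pattern.
DP_ID_PATTERN = re.compile(r"dp/([A-Za-z0-9]{10})/")

def extract_dp_id(html_str: str) -> Optional[str]:
    """Return the 10-character code after "dp/", or None if there is no match."""
    m = DP_ID_PATTERN.search(html_str)
    return m.group(1) if m is not None else None

# Shortened version of the link from the question, for illustration only.
link = ('<a class="a-link-normal" href="https://www.amazon.it/'
        'Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/'
        'ref=gbph_img" id="dealImage"></a>')

print(extract_dp_id(link))                   # the code, if present
print(extract_dp_id("<a href='/nothing'>"))  # None when there is no match
```

Returning None instead of printing leaves it up to the caller to decide what a missing code means.
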
answered 2018-09-07T15:03:26.993

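I'm assuming you always just want whatever sits between the slashes right after dp (the next path segment), and that the length of 10 characters doesn't really matter. It's a bit hacky, but this works: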

>>> x = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
>>> splits = x.split("/")
>>> dp_index = splits.index('dp')
>>> result = splits[dp_index+1] # Get the next one over
>>> result
'B01J5FGW66'

Wrapped in a function, it could look like this:

def get_route_next_to_dp(html_str):
    splits = html_str.split("/")
    dp_index = splits.index('dp')
    result = splits[dp_index+1] # Get the next one over
    return result

Usage might look like this:

html_str = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
route_next_to_dp = get_route_next_to_dp(html_str)
print(route_next_to_dp)

Output:

B01J5FGW66

as expected.
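
One caveat with the split approach: list.index raises ValueError when the string has no 'dp' segment, and indexing one past it raises IndexError if 'dp' happens to be the last piece. A more defensive variant might look like the sketch below; the name `get_route_next_to_dp_safe`, the `default` parameter, and the example.com URLs are my own additions for illustration.

```python
def get_route_next_to_dp_safe(html_str, default=None):
    """Return the segment after 'dp', or `default` if there is none."""
    splits = html_str.split("/")
    try:
        dp_index = splits.index("dp")
    except ValueError:
        # No segment that is exactly 'dp'
        return default
    if dp_index + 1 >= len(splits):
        # 'dp' was the last segment, so nothing follows it
        return default
    return splits[dp_index + 1]

# Hypothetical URLs, for illustration only.
print(get_route_next_to_dp_safe("https://example.com/x/dp/B01J5FGW66/ref"))
print(get_route_next_to_dp_safe("https://example.com/no-segment"))
print(get_route_next_to_dp_safe("https://example.com/dp"))
```

Like the original, this does not check the length or the characters of the segment; it just returns whatever follows the first 'dp'.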

answered 2018-09-07T15:08:51.563

Try this: it uses a regular expression to capture the 10 characters that follow dp/, then checks whether a match was found.

import re
my_string='<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
m = re.search(r"dp\/([A-Za-z0-9]{10})\/", my_string)
if m is not None:  # re.search returns None when nothing matches, so test m itself
    print(m.group(1))
answered 2018-09-07T15:18:41.163