How can I use Python and BeautifulSoup (or lxml / XPath, or some other approach) to extract the font name "Open Sans" and the two links from (url)?

<style>
...

    @font-face {
        font-family: "Open Sans";
        src: url("/fonts/OpenSans-Regular-webfont.woff2") format("woff2"),
             url("/fonts/OpenSans-Regular-webfont.woff") format("woff");
    }

...
</style>

Thanks in advance for your help!

1 Answer

^\s*font-family:\s*"(.*)";$|^.*\surl\("(.*?)"\).*$
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^\s*font-family:\s*\"(.*)\";$|^.*\surl\(\"(.*?)\"\).*$"

test_str = ("<style>\n"
    "...\n\n"
    "    @font-face {\n"
    "        font-family: \"Open Sans\";\n"
    "        src: url(\"/fonts/OpenSans-Regular-webfont.woff2\") format(\"woff2\"),\n"
    "             url(\"/fonts/OpenSans-Regular-webfont.woff\") format(\"woff\");\n"
    "    }\n\n"
    "...\n"
    "</style>")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(match_num, match.start(), match.end(), match.group()))

    # Only one alternative of the pattern matches per line, so one of the
    # two groups is always None; skip it instead of printing "None".
    for group_num in range(1, len(match.groups()) + 1):
        if match.group(group_num) is None:
            continue
        print("Group {} found at {}-{}: {}".format(group_num, match.start(group_num), match.end(group_num), match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
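
If the input is a whole HTML page rather than just the CSS, one possible variation is to isolate the contents of the <style> elements first and then apply simpler patterns to each @font-face block. The sketch below is only an illustration under a few assumptions: it uses the standard library's html.parser instead of BeautifulSoup, the names StyleCollector and extract_font_faces are hypothetical, and the page is assumed to have been downloaded already into a string. It also assumes the @font-face rule contains no nested braces.

```python
import re
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Collects the raw text found inside <style> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_style = False
        self.css_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Keep text only while we are inside a <style> element.
        if self._in_style:
            self.css_chunks.append(data)


def extract_font_faces(html):
    """Return a list of (font_family, [urls]) tuples, one per @font-face block."""
    collector = StyleCollector()
    collector.feed(html)
    collector.close()
    css = "".join(collector.css_chunks)

    results = []
    # Assumption: no nested braces inside @font-face, so the first "}" ends the block.
    for block in re.findall(r"@font-face\s*\{(.*?)\}", css, re.DOTALL):
        family = re.search(r"font-family:\s*[\"']?([^\"';]+)[\"']?\s*;", block)
        urls = re.findall(r"url\(\s*[\"']?([^\"')]+)[\"']?\s*\)", block)
        results.append((family.group(1).strip() if family else None, urls))
    return results


sample_html = """<html><head><style>
    @font-face {
        font-family: "Open Sans";
        src: url("/fonts/OpenSans-Regular-webfont.woff2") format("woff2"),
             url("/fonts/OpenSans-Regular-webfont.woff") format("woff");
    }
</style></head><body></body></html>"""

for family, urls in extract_font_faces(sample_html):
    print(family, urls)
```

For the sample above, this yields the family name "Open Sans" together with the two font paths. The paths are relative, so they would still need to be joined with the page's base URL (for example with urllib.parse.urljoin) to get full links.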

Answered 2019-12-28T13:43:06.340