python - Beautiful Soup: right-to-left text

Question

Right now I'm getting all the paragraph tags on this page with Beautiful Soup 4:

<p class="MsoNormal" style="text-align: center"><b>
                            <span lang="EN-US" style="font-family: Arial; color: blue">
                            <font size="4">1 </font></span>
                            <span lang="AR-SA" dir="RTL" style="font-family: Arial; color: blue">
                            <font size="4">&#1600;</font></span><span lang="EN-US" style="font-family: Arial; color: blue"><font size="4"> 
                            с&#1199;р&#1241; фати&#1211;&#1241;</font></span></b></p>

I'm trying to get the contents of the 2 font tags, but the text comes out aligned to the right. I think this has something to do with dir="RTL",

but I want it left-to-right.

Accepted Answer

You can try the following:

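# soup is the BeautifulSoup object built from the page above
for elem in soup.find_all('font'):
    # strip() drops the whitespace around each tag's text
    print(elem.text.strip())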
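A stdlib-only sketch of what the loop above sees, assuming the question's snippet is available as a string. It swaps in Python 3's html.parser.HTMLParser for Beautiful Soup so it runs without bs4 installed; the class FontTextCollector is a hypothetical helper, not part of either library. ascii() is used so whitespace and non-ASCII code points show up as escapes.

```python
from html.parser import HTMLParser

# The question's snippet; character references like &#1600; are decoded by the parser
SNIPPET = """<p class="MsoNormal" style="text-align: center"><b>
                            <span lang="EN-US" style="font-family: Arial; color: blue">
                            <font size="4">1 </font></span>
                            <span lang="AR-SA" dir="RTL" style="font-family: Arial; color: blue">
                            <font size="4">&#1600;</font></span><span lang="EN-US" style="font-family: Arial; color: blue"><font size="4"> 
                            с&#1199;р&#1241; фати&#1211;&#1241;</font></span></b></p>"""


class FontTextCollector(HTMLParser):
    """Hypothetical helper: collects the raw text inside each <font> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.depth = 0      # how many <font> tags we are currently inside
        self.texts = []     # one string per <font> element, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'font':
            self.depth += 1
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'font' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Ignore the whitespace between <span> tags; keep only text inside <font>
        if self.depth:
            self.texts[-1] += data


parser = FontTextCollector()
parser.feed(SNIPPET)
parser.close()

for raw in parser.texts:
    print(ascii(raw), '->', ascii(raw.strip()))
```

The first tag's text ends in a space and the third starts with a space, a newline and the source indentation; strip() removes all of that, while the second tag's U+0640 is not whitespace and is left as-is.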

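The text looks right-aligned because the Unicode strings you get back contain several characters from the "Separator, Space" category [Zs] around the visible text, and strip() removes them. You can check for yourself: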

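import unicodedata

# Print the Unicode general category of each character in the last <font> tag's text
for c in elem.text:
    print(unicodedata.category(c), end=' ')
print()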
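A minimal sketch that narrows the check to just the characters strip() removes, using only the standard library. The helper names removed_by_strip and describe are hypothetical, and the raw string is a hand reconstruction of the third <font> tag's text (assuming 28 spaces of indentation), not output captured from Beautiful Soup. Besides the Zs spaces, the newline from the HTML source shows up as category Cc; str.strip() removes both.

```python
import unicodedata
from collections import Counter


def removed_by_strip(text):
    """Hypothetical helper: the characters str.strip() would remove from both ends."""
    lead = len(text) - len(text.lstrip())
    if lead == len(text):  # all-whitespace string: strip() removes everything
        return text
    trail = len(text) - len(text.rstrip())
    return text[:lead] + text[len(text) - trail:]


def describe(chars):
    """Hypothetical helper: pair each character with its category and name."""
    # Control characters such as '\n' have no Unicode name, hence the fallback
    return [(c, unicodedata.category(c), unicodedata.name(c, '<no name>')) for c in chars]


# Assumed raw text of the third <font> tag: a space, a newline,
# 28 spaces of indentation, then the Cyrillic words
raw = ' \n' + ' ' * 28 + '\u0441\u04af\u0440\u04d9 \u0444\u0430\u0442\u0438\u04bb\u04d9'

rows = describe(removed_by_strip(raw))

# Summarize: how many removed characters fall under each (category, name) pair
for (category, name), count in Counter((cat, nm) for _, cat, nm in rows).items():
    print(category, name, count)
```

Running removed_by_strip on the second tag's text ('\u0640', ARABIC TATWEEL, category Lm) returns an empty string, which confirms strip() only trims whitespace and never touches the RTL span's content.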
