This code runs on a large amount of data, but all the Arabic words come out written in reverse:

from bidi.algorithm import get_display
import os 
import matplotlib.pyplot as plt
from wordcloud import WordCloud
os.chdir("C:")
f = open('example.txt', 'r', encoding = 'utf-8')
data = arabic_reshaper.reshape(f.read())
WordCloud = WordCloud(font_path='arial',background_color='white', mode='RGB',width=2000,height=1000).generate(data)
plt.title("wordcloud")
plt.imshow(WordCloud)
plt.axis("off")
plt.show()

This is my data:

أحمد
خالد
سلمان
سليمان
عبدالله
عبدالرحمن
عبدالرحمن
خالد
صالح

And this is what I get:

[image: word cloud with the Arabic words reversed]

Can anyone help me fix this?

3 Answers

First, you need to import the arabic_reshaper package. Then run the reshaped text through the get_display function and pass the result to WordCloud, like this:

from bidi.algorithm import get_display
import os
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import arabic_reshaper # this was missing in your code

# os.chdir("C:")
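with open('example.txt', 'r', encoding='utf-8') as f:
    text = f.read()
data = arabic_reshaper.reshape(text)  # join letters into their contextual forms
data = get_display(data) # add this line: reorder the text for right-to-left display
# use a lowercase name so the WordCloud class is not overwritten
wordcloud = WordCloud(font_path='arial', background_color='white',
                  mode='RGB', width=2000, height=1000).generate(data)
plt.title("wordcloud")
plt.imshow(wordcloud)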
plt.axis("off")
plt.show()
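
For context on why the extra step matters: Arabic text is stored in logical order (first letter first), and as far as I can tell the renderer here draws characters left to right without applying the Unicode bidirectional algorithm, so the words come out reversed. The stdlib-only sketch below is just a way to check whether a string contains right-to-left characters at all, and so needs this treatment. The helper name `contains_rtl` is made up for illustration and is not part of any of the libraries above.

```python
import unicodedata

# Bidi classes that Unicode marks as strongly right-to-left:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text: str) -> bool:
    """Return True if any character in `text` is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


# Hypothetical usage: decide whether reshape + get_display is needed.
for sample in ("خالد", "wordcloud"):
    print(repr(sample), contains_rtl(sample))
```

If this returns True for your input, the text most likely needs both arabic_reshaper.reshape and get_display before it goes to WordCloud.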
Answered at 2018-11-27T04:45:15.090

Update

OK, I just created a small Arabic wrapper (ar_wordcloud) to do this. I hope it helps.

$ pip install ar_wordcloud

from ar_wordcloud import ArabicWordCloud
awc = ArabicWordCloud(background_color="white")

t = 'أهلاً وسهلا، اللغة العربية جميلة'
wc = awc.from_text(t)

Alternatively, here is another example without the wrapper:

from collections import Counter

from wordcloud import WordCloud          # pip install wordcloud
import matplotlib.pyplot as plt          
# -- Arabic text dependencies
from arabic_reshaper import reshape      # pip install arabic-reshaper
from bidi.algorithm import get_display   # pip install python-bidi

rtl = lambda w: get_display(reshape(f'{w}'))

COUNTS = Counter("السلام عليكم ورحمة الله و بركاته السلام كلمة جميلة".split())
counts = {rtl(k):v for k, v in COUNTS.most_common(10)}

font_file = './NotoNaskhArabic-Regular.ttf' # download from: https://www.google.com/get/noto
wordcloud = WordCloud(font_path=font_file).generate_from_frequencies(counts)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Result:

[image: resulting word cloud]

Also, there is a PR discussion here: https://github.com/amueller/word_cloud/pull/315
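
The frequency-based approach also fits the data in the question, which has one name per line. Here is a minimal sketch of the counting step, using only the standard library. The function `shaped_frequencies` and its `shape` parameter are hypothetical names for illustration. By default `shape` leaves each word unchanged, so the sketch runs without the Arabic dependencies.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def shaped_frequencies(
    lines: Iterable[str],
    shape: Callable[[str], str] = lambda w: w,
) -> Dict[str, int]:
    """Count non-empty lines, then apply `shape` to each distinct word."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return {shape(word): n for word, n in counts.items()}


# The names from the question, one per line in the original file.
names = ["أحمد", "خالد", "سلمان", "سليمان", "عبدالله",
         "عبدالرحمن", "عبدالرحمن", "خالد", "صالح"]
freqs = shaped_frequencies(names)
print(len(freqs))  # 7 distinct names
```

With arabic-reshaper and python-bidi installed, you would pass `shape=lambda w: get_display(reshape(w))` and hand the resulting dict to `WordCloud(font_path=font_file).generate_from_frequencies(...)`. Shaping each word separately avoids having the bidi step reorder a whole multi-word string.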

Answered at 2020-05-15T07:39:33.217

Here is a good example of how to generate an Arabic word cloud.

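import arabic_reshaper
from bidi.algorithm import get_display
from wordcloud import WordCloud

text = 'أهلاً وسهلا، اللغة العربية جميلة'  # sample text; replace with your own

reshaped_text = arabic_reshaper.reshape(text)  # join letters into contextual forms
bidi_text = get_display(reshaped_text)         # reorder for right-to-left display
# the font file must support Arabic and exist at this path
wordcloud = WordCloud(font_path='NotoNaskhArabic-Regular.ttf').generate(bidi_text)
wordcloud.to_file("wordcloud.png")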

Here is a link showing how to do this: Google Colab
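
If the output still looks wrong, it can help to check whether the reshaping step actually ran. The sketch below assumes that arabic_reshaper replaces base letters with code points from the Unicode Arabic Presentation Forms blocks. That is my understanding of how it works, not something I have verified against its documentation. `looks_reshaped` is a made-up helper name.

```python
# Unicode blocks holding positional (isolated/initial/medial/final) Arabic glyphs.
PRESENTATION_RANGES = (
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B (letters only, excludes U+FEFF)
)


def looks_reshaped(text: str) -> bool:
    """Return True if `text` contains any Arabic presentation-form character."""
    return any(
        low <= ord(ch) <= high
        for ch in text
        for low, high in PRESENTATION_RANGES
    )


print(looks_reshaped("\u0627"))  # base ALEF, as typed or read from a file
print(looks_reshaped("\uFE8D"))  # ALEF ISOLATED FORM, a presentation form
```

Under that assumption, text read straight from a file should give False, and the same text after reshape should give True.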

Answered at 2020-06-08T12:30:54.867