python - Creating tuples of labels and words for a conditional frequency distribution

Question

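I want to create a table that shows the frequency of certain words in 3 texts, with the texts as columns and the words as rows.

In the table I want to see how often each word occurs in each text.

These are my texts and words: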

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']

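To create the conditional frequency distribution, I want to build a list of tuples that looks like lot = [('text1', 'blood'), ('text1', 'young'), ... ('text2', 'blood'), ...]

I tried to create lot like this: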

lot = [(words, texte)
    for word in words
    for text in texts]

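Instead of lot = ('text1', 'blood') etc., I get the whole text in the list where 'text1' should be.

How can I create the list of tuples to pass to the conditional frequency distribution function?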

score 0 · Accepted Answer

I hope I understood your question correctly. I think you are putting the variables "words" and "texts" themselves into the tuples.

Try the following:

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
lot = [(word, text)
    for word in words
    for text in texts]

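Edit: because the change is so subtle, I should elaborate. In your original code the tuple contained "words" and "texts" themselves, i.e. you were using the whole list each time rather than the loop variables that hold each element of the list.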
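If the goal is a label string such as 'text1' in each tuple rather than the text itself, one option is to pair each text with a label. The following is a minimal sketch, not the asker's actual setup: the sample strings and the `labels` list are made up for illustration, and it assumes each text is a plain string that can be tokenized with `split()`. The pairs are in (condition, sample) order, which is what `nltk.ConditionalFreqDist` takes; swap the two elements if the words should be the conditions.

```python
# Hypothetical sample data standing in for text1, text2, text3.
text1 = "the young man felt fear in the night"
text2 = "blood and mercy and blood"
text3 = "the woman rode a horse with a happy heart"

labels = ['text1', 'text2', 'text3']   # assumed label names
texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man',
         'fear', 'night', 'happiness', 'heart', 'horse']

# One (label, word) pair per occurrence of a target word in a text.
lot = [(label, token)
       for label, text in zip(labels, texts)
       for token in text.split()
       if token in words]

print(lot[:3])
```

Each occurrence yields its own pair, so a word that appears twice in a text contributes two pairs and is counted twice.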

score 0 · Accepted Answer

I think this nested list comprehension may be what you are trying to do?

lot = [(word, 'text'+str(i))
    for i,text in enumerate(texts)
    for word in text.split()
    if word in words]

However, you may want to consider using a Counter instead:

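from collections import Counter

# Create the inner dict for every word up front to avoid a KeyError
counts = {word: {} for word in words}
for i, text in enumerate(texts):
    C = Counter(text.split())  # token counts for this text
    for word in words:
        # A Counter returns 0 for missing keys, so no membership test is needed
        counts[word]['text'+str(i)] = C[word]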
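To get from such counts to the table described in the question (words as rows, texts as columns), a small formatting helper is enough. This is a sketch under assumptions: the sample texts, the dictionary keys, and the `format_table` helper are hypothetical and only illustrate the layout.

```python
from collections import Counter

# Hypothetical sample data; replace with the real texts.
texts = {
    'text1': "the young man felt fear in the night",
    'text2': "blood and mercy and blood",
    'text3': "the woman rode a horse with a happy heart",
}
words = ['blood', 'young', 'mercy', 'woman', 'man',
         'fear', 'night', 'happiness', 'heart', 'horse']

# One Counter per text; words that do not occur count as 0.
per_text = {label: Counter(text.split()) for label, text in texts.items()}

# counts[word][label] is the frequency of word in that text.
counts = {word: {label: per_text[label][word] for label in texts}
          for word in words}

def format_table(counts, labels):
    """Return a plain-text table: one row per word, one column per text."""
    header = f"{'':<10}" + "".join(f"{label:>8}" for label in labels)
    rows = [f"{word:<10}" + "".join(f"{row[label]:>8}" for label in labels)
            for word, row in counts.items()]
    return "\n".join([header] + rows)

print(format_table(counts, list(texts)))
```

Keying the texts by label keeps the column names next to the data, so the row and column order follows the order of `words` and `texts`.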
