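I want to cross-check the names in two Word documents and then print the names they have in common, all in the same program. How do I do this? Should I use regular expressions or just the in operator?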
Once you have the text out of the Word documents, this is really easy:
document_1_text = 'This is document one'
document_2_text = 'This is document two'

# split() breaks each text into words on whitespace
document_1_words = document_1_text.split()
document_2_words = document_2_text.split()

# Words present in both documents
common = set(document_1_words).intersection(document_2_words)
# Words present in exactly one of the two documents
unique = set(document_1_words).symmetric_difference(document_2_words)
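Since the question asks about regular expressions: split() keeps punctuation attached, so "Bob." and "Bob" would not match. A minimal sketch, assuming a name is any run of word characters or apostrophes and that case should be ignored, tokenizes with re.findall before intersecting. The helper names words and common_words are illustrative only, and hyphenated names such as "Mary-Jane" would be split in two by this pattern.

```python
import re

# Assumption: a "word" is a run of letters, digits, underscores or apostrophes,
# so surrounding punctuation such as "." or "," is dropped
WORD_RE = re.compile(r"[\w']+")


def words(text):
    """Return the set of lowercased words found in text."""
    return {w.lower() for w in WORD_RE.findall(text)}


def common_words(text_1, text_2):
    """Return the words that appear in both texts, ignoring case."""
    return words(text_1) & words(text_2)


if __name__ == '__main__':
    doc1 = 'Alice met Bob. Carol stayed home.'
    doc2 = 'bob, Dave and Alice!'
    for name in sorted(common_words(doc1, doc2)):
        print(name)  # prints "alice", then "bob"
```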
If you're not sure how to get the text out of a Word document:
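import os
from win32com.client import Dispatch

def get_text_from_doc(filename):
    # Requires Windows, Microsoft Word and the pywin32 package
    word = Dispatch('Word.Application')
    word.Visible = False
    # Passing an absolute path is the safest choice when opening files through Word
    wdoc = word.Documents.Open(os.path.abspath(filename))
    try:
        return wdoc.Content.Text.strip()
    finally:
        wdoc.Close(False)  # close the document without saving changes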
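If Word and pywin32 are not available (for example on Linux or macOS), a .docx file can be read with the standard library alone, because it is a ZIP archive whose body text is stored in word/document.xml as w:t elements grouped into w:p paragraphs. A rough sketch under that assumption; get_text_from_docx is a hypothetical name, and the legacy binary .doc format is not covered. The third-party python-docx package is a more complete option.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside a .docx package
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def get_text_from_docx(path):
    """Return the body text of a .docx file, one line per paragraph.

    Sketch only: ignores headers, footers, footnotes, tabs and line breaks.
    path may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read('word/document.xml'))
    paragraphs = []
    for p in root.iter(W + 'p'):
        # A paragraph's text can be split across several <w:t> runs
        paragraphs.append(''.join(t.text or '' for t in p.iter(W + 't')))
    return '\n'.join(paragraphs)
```

Its result can be passed straight to split() or to a tokenizer, exactly like the text returned by get_text_from_doc.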
Answered 2012-08-27T04:38:38.340
str1 = "Hello world its a demo"
str2 = "Hello world"
str1_words = set(str1.split())
str2_words = set(str2.split())
common = str1_words & str2_words
Result:
common = {'Hello', 'world'}
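Sets have no defined iteration order, so printing common directly may list the names in any order. To print each common name on its own line in a stable order, one option is to sort first; this small sketch reuses the two strings from this answer.

```python
str1 = "Hello world its a demo"
str2 = "Hello world"

# & on two sets keeps only the elements found in both
common = set(str1.split()) & set(str2.split())

# sorted() returns a list, which gives a deterministic print order
for word in sorted(common):
    print(word)  # prints "Hello", then "world"
```

Passing key=str.lower to sorted() would order the names alphabetically regardless of capitalization.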
Answered 2019-08-13T11:21:48.583
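You need to store the words from one document, then go through the words of the second document and check whether each one appears in the first. So, if I had two strings instead of documents, I could do this: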
a = "Hello world this is a string"
b = "Hello world not like the one before"
Store the words of the first string in a dict, then print each word of the second string that is found in it:
d = {}
for word in a.split():
    d[word] = True
for word in b.split():
    # "word in d" avoids the KeyError that d[word] raises for unseen words
    if word in d:
        print(word)
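On the question's "just use in" option: a plain set does the same job as the dict above, and word in seen is a hash lookup with average O(1) cost. Note that in applied to the raw string tests substrings instead, so 'is' in 'this' is True; splitting into words first avoids that. The sketch below also reports each common word only once; common_in_order is a hypothetical helper name.

```python
def common_in_order(a, b):
    """Return the words of b that also occur in a, in b's order, without repeats."""
    seen = set(a.split())  # words of the first text
    reported = set()       # words already added to the result
    result = []
    for word in b.split():
        # Both checks are set lookups, not scans of a list or string
        if word in seen and word not in reported:
            reported.add(word)
            result.append(word)
    return result


a = "Hello world this is a string"
b = "Hello world not like the one before"
for word in common_in_order(a, b):
    print(word)  # prints "Hello", then "world"
```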
Answered 2012-08-13T17:40:02.177
str1 = "Hello world its a demo"
str2 = "Hello world"
# Compare every word of str1 with every word of str2
for word1 in str1.split():
    for word2 in str2.split():
        if word1 == word2:
            print(word1)
Answered 2017-07-11T10:23:29.380
I just came across this post and didn't see this approach, so I'd like to add that you can do it like this:
from collections import Counter

foo = "This is a string"
bar = "This string isn't like the one before"
# Adding two Counters sums the count of each word across both strings
baz = Counter(foo.split(" ")) + Counter(bar.split(" "))
# Printing baz lists the most common words first
baz is now a Counter (a dict subclass) that looks like this:
Counter({'This': 2, 'string': 2, 'is': 1, 'a': 1, "isn't": 1, 'like': 1, 'the': 1, 'one': 1, 'before': 1})
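Now you can see that the two strings have "This" and "string" in common: they are the words with a count of 2.
You can also call .lower() on both strings (foo and bar) before passing them to Counter(), so that words differing only in capitalization are counted together.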
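One caveat: adding the two Counters also gives a count of 2 to a word that appears twice in only one of the strings. A variant (a sketch, not the answer's original code) intersects the Counters with &, which keeps only the keys present in both, each with the smaller of its two counts, and lowercases the text first as suggested above.

```python
from collections import Counter

foo = "This is a string"
bar = "This string isn't like the one before"

# & keeps keys present in both Counters, each with the minimum of the two counts
common = Counter(foo.lower().split()) & Counter(bar.lower().split())
print(sorted(common))  # prints ['string', 'this']
```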
Answered 2021-04-19T09:49:45.940