当尝试在长文本输入中查找实体时,Google Cloud 的自然语言程序会将单词组合在一起,然后获取它们不正确的实体。这是我的程序:
def entity_recognizer(nouns):
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/superaitor/Downloads/link"
text = ""
for words in nouns:
text += words + " "
client = language.LanguageServiceClient()
if isinstance(text, six.binary_type):
text = text.decode('utf-8')
document = types.Document(
content=text.encode('utf-8'),
type=enums.Document.Type.PLAIN_TEXT)
encoding = enums.EncodingType.UTF32
if sys.maxunicode == 65535:
encoding = enums.EncodingType.UTF16
entity = client.analyze_entities(document, encoding).entities
entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')
for entity in entity:
#if entity_type[entity.type] is "PERSON":
print(entity_type[entity.type])
print(entity.name)
这里的名词是一个单词列表。然后我把它变成一个字符串(我尝试了多种方法,都给出了相同的结果),但是程序会输出如下输出:
PERSON
liberty secularism etching domain professor lecturer tutor royalty
government adviser commissioner
OTHER
business view society economy
OTHER
business
OTHER
verge industrialization market system custom shift rationality
OTHER
family kingdom life drunkenness college student appearance income family
brink poverty life writer variety attitude capitalism age process
production factory system
关于如何解决这个问题的任何意见?