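I have some sample code that uses Google's Natural Language API to analyze entities and their sentiment. For each record in my pandas DataFrame, I want to return a list of dictionaries in which each element is an entity. However, I'm running into problems getting it to work on production data. Here is the sample code: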

from google.cloud import language_v1 # version 2.0.0
import os 
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/json'
import pandas as pd 

# establish client connection
client = language_v1.LanguageServiceClient()

# helper function 
def custom_analyze_entity(text_content):
    global client
    #print("Accepted Input::" + text_content)
    document = language_v1.Document(content=text_content, type_=language_v1.Document.Type.PLAIN_TEXT, language = 'en')
    response = client.analyze_entity_sentiment(request = {'document': document})
    # a document can have many entities
    # create a list of dictionaries, every element in the list is a dictionary that represents an entity
    # the dictionary is nested
    l = []
    #print("Entity response:" + str(response.entities))
    for entity in response.entities:
        #print('=' * 20)
        temp_dict = {}
        temp_meta_dict = {}
        temp_mentions = {}
        temp_dict['name'] = entity.name
        temp_dict['type'] = language_v1.Entity.Type(entity.type_).name
        temp_dict['salience'] = str(entity.salience)
        sentiment = entity.sentiment
        temp_dict['sentiment_score'] = str(sentiment.score)
        temp_dict['sentiment_magnitude'] = str(sentiment.magnitude)
        for metadata_name, metadata_value in entity.metadata.items():
            temp_meta_dict['metadata_name'] = metadata_name
            temp_meta_dict['metadata_value'] = metadata_value
        temp_dict['metadata'] = temp_meta_dict
        for mention in entity.mentions:
            temp_mentions['mention_text'] = str(mention.text.content)
            temp_mentions['mention_type'] = str(language_v1.EntityMention.Type(mention.type_).name)
        temp_dict['mentions'] = temp_mentions
        #print(u"Appended Entity::: {}".format(temp_dict))
        l.append(temp_dict)
    return l

I've tested it on sample data and it works fine:

# works on sample data 
data= ['Grapes are good. Bananas are bad.', 'the weather is not good today', 'Michelangelo Caravaggio, Italian painter, is known for many arts','look i cannot articulate how i feel today but its amazing to be back on the field with runs under my belt.']
input_df = pd.DataFrame(data=data, columns = ['freeform_text'])

for i in range(len(input_df)):
    op = custom_analyze_entity(input_df.loc[i,'freeform_text'])
    input_df.loc[i, 'entity_object'] = op

But when I run the code below over the production data, it fails with a multi-index error. I can't reproduce the error with the sample pandas DataFrame.

for i in range(len(input_df)):
    op = custom_analyze_entity(input_df.loc[i,'freeform_text'])
    input_df.loc[i, 'entity_object'] = op
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/opt/conda/default/lib/python3.6/site-packages/pandas/core/indexing.py", line 670, in __setitem__
    iloc._setitem_with_indexer(indexer, value)
  File "/opt/conda/default/lib/python3.6/site-packages/pandas/core/indexing.py", line 1667, in _setitem_with_indexer
    "cannot set using a multi-index "
ValueError: cannot set using a multi-index selection indexer with a different length than the value

1 Answer

Try this:

input_df.loc[0, 'entity_object'] = ""
for i in range(len(input_df)):
    op = custom_analyze_entity(input_df.loc[i,'freeform_text'])
    input_df.loc[i, 'entity_object'] = op
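
The loop above assumes the row labels are exactly 0 to len(input_df) - 1. The question does not show the production index, so the following is only a sketch under the assumption that the production frame carries a MultiIndex (the names `multi_df`, `batch` and `part` are made up for illustration). In that case `.loc[i, column]` addresses every row under the first-level label `i`, which would explain why pandas complains about a value of a different length:

```python
import pandas as pd

# Hypothetical frame whose rows are labelled by a two-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [(0, "a"), (0, "b"), (1, "a")], names=["batch", "part"]
)
multi_df = pd.DataFrame({"freeform_text": ["first", "second", "third"]}, index=index)

# Label 0 matches both rows in batch 0, so this is a Series, not a single string.
selected = multi_df.loc[0, "freeform_text"]
print(type(selected).__name__, len(selected))

# Dropping the old index restores the 0..n-1 labels that the loop expects.
flat_df = multi_df.reset_index(drop=True)
print(flat_df.loc[0, "freeform_text"])
```

If this matches your data, calling `reset_index(drop=True)` on the production frame before the loop may be enough; checking `input_df.index` first would confirm or rule it out.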

Or, for your specific case, you don't need to use the loc function:

input_df["entity_object"] = ""
for i in range(len(input_df)):
    op = custom_analyze_entity(input_df.loc[i,'freeform_text'])
    input_df["entity_object"][i] = op
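
Chained assignment of the form `input_df["entity_object"][i] = op` can raise a SettingWithCopyWarning and is not guaranteed to write back to the frame on newer pandas versions. As an alternative, here is a minimal sketch that avoids index labels altogether by using `Series.apply`, followed by an explicit loop that writes one cell at a time with `.at`. The function `fake_analyze_entity` is a hypothetical stand-in for `custom_analyze_entity`, so the example runs without Google credentials; the column name `entity_loop` is also made up.

```python
import pandas as pd


def fake_analyze_entity(text_content):
    """Hypothetical stand-in for custom_analyze_entity: one dict per word, no API call."""
    return [{"name": word} for word in text_content.split()]


df = pd.DataFrame({"freeform_text": ["Grapes are good", "the weather"]})

# Series.apply calls the function once per row and keeps each returned list as one cell.
df["entity_object"] = df["freeform_text"].apply(fake_analyze_entity)
print(df["entity_object"].map(len).tolist())  # [3, 2]

# Explicit loop variant: pre-create an object-dtype column, then set single cells with .at.
# Iterating over df.index avoids assuming the labels are 0..n-1 (they must be unique).
df["entity_loop"] = pd.Series([None] * len(df), index=df.index, dtype=object)
for label in df.index:
    df.at[label, "entity_loop"] = fake_analyze_entity(df.at[label, "freeform_text"])
```

With the real helper, the first form would read `input_df["entity_object"] = input_df["freeform_text"].apply(custom_analyze_entity)`.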
Answered 2020-10-22T12:01:15.443