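I'm struggling to capture the results of an IBM Watson entity analysis in a dictionary. I want to extract the sentiment for each link through a function. I wrote a function that processes a single URL, but the dictionary I use to store the results only captures the result for the last URL. I'm new to Python and would appreciate any help.

Here is my entity analysis code: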

import json
import requests

# function to process a URL
# (URL and API_KEY are assumed to be defined elsewhere)
def processurl(url_to_analyze):
  # end point
  endpoint = f"{URL}/v1/analyze"

  # credentials
  username = "apikey"
  password = API_KEY

  # parameters
  parameters = {
      "version": "2020-08-01"
  }

  # headers
  headers = {
      "Content-Type":"application/json"
  }

  # watson options
  watson_options = {
      "url": url_to_analyze,
      "features": {
          "entities": {
              "sentiment": True,
              "emotion": True,
              "limit":10
          }
      }

  }

  # return
  response = requests.post(endpoint,
                           data=json.dumps(watson_options),
                           headers=headers,
                           params=parameters,
                           auth=(username,password)
                           )
  return response.json()

Here is the function I created to process the results returned above:

# create a function to extract the entities from the result data
def getentitylist(data,threshold):
  result = []
  for entity in data["entities"]:
    relevance = float(entity["relevance"])
    if relevance > threshold:
      result.append(entity["text"])
  return result

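After looping through the URLs, I can't seem to store the results in a dictionary so that I can pass them to my function to get the entity results: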

# method II: loop through news api urls and perform entity analysis and store it in a dictionary
entitydict = {}
for url in url_to_analyze:
  entitydict.update(processurl(url))

1 Answer

I can't see where you are calling getentitylist, but in your URL loop

entitydict = {}
for url in url_to_analyze:
  entitydict.update(processurl(url))

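update merges the given dictionary into entitydict key by key, i.e. it overwrites the value of any key that already exists in the dictionary. Your responses will look something like this: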

{
  "usage": {
    "text_units": 1,
    "text_characters": 2708,
    "features": 1
  },
  "retrieved_url": "http://www.cnn.com/",
  "language": "en",
  "entities": [
    {
      "type": "Company",
      "text": "CNN",
      "sentiment": {
        "score": 0.0,
        "label": "neutral"
      },
      "relevance": 0.784947,
      "disambiguation": {
        "subtype": [
          "Broadcast",
          "AwardWinner",
          "RadioNetwork",
          "TVNetwork"
        ],
        "name": "CNN",
        "dbpedia_resource": "http://dbpedia.org/resource/CNN"
      },
      "count": 9
    }
  ]
}

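The keys that get updated are the top-level ones, i.e. usage, retrieved_url, language, entities. Every response has these same keys, so entitydict will only contain the response for the last URL: the previous values for those keys are overwritten on each iteration.

What you should do instead is use the URL as the key for each response.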
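The overwriting is easy to reproduce without calling Watson at all. The sketch below uses two made-up dictionaries (the example.com URLs and entity names are placeholders, not real API output) that share the same top-level keys as a response:

```python
# Two made-up responses with the same top-level keys as a Watson reply.
first = {"retrieved_url": "http://example.com/a", "entities": [{"text": "A"}]}
second = {"retrieved_url": "http://example.com/b", "entities": [{"text": "B"}]}

merged = {}
for response in (first, second):
    # Both dictionaries use the same keys, so the earlier values are replaced.
    merged.update(response)

print(merged["retrieved_url"])  # http://example.com/b
print(len(merged))              # 2 (still only two keys, not two responses)
```

Only the data from the second dictionary survives, which matches what you are seeing with the last URL.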

entitydict = {}
for url in url_to_analyze:
  entitydict.update({url : processurl(url)})
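Once the responses are keyed by URL, each one can be passed to getentitylist separately. This is a minimal sketch: the entitydict contents here are a hypothetical stand-in for real responses, and the 0.5 threshold is an arbitrary value chosen for illustration.

```python
def getentitylist(data, threshold):
    # Same logic as the function in the question.
    result = []
    for entity in data["entities"]:
        if float(entity["relevance"]) > threshold:
            result.append(entity["text"])
    return result

# Hypothetical stand-in for the URL-keyed dictionary built by the loop above.
entitydict = {
    "http://example.com/a": {
        "entities": [
            {"text": "CNN", "relevance": 0.78},
            {"text": "Atlanta", "relevance": 0.21},
        ]
    },
    "http://example.com/b": {
        "entities": [{"text": "IBM", "relevance": 0.91}]
    },
}

# One entity list per URL; 0.5 is an assumed threshold.
entities_by_url = {
    url: getentitylist(data, 0.5) for url, data in entitydict.items()
}
print(entities_by_url)
```

Note that a failed request may return a response without an entities key, in which case getentitylist would raise a KeyError, so you may want to check for that before calling it.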
answered 2021-06-21T09:09:28.753