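I'd like to know whether there is any way to have the analyzeSyntax API endpoint return a response JSON that omits the sub-properties of the partOfSpeech dictionary when their value is *_UNKNOWN. Looking through the documentation for the partOfSpeech details, I couldn't find any request option to restrict this.
Is this something that can only be handled by cleaning up the data after the response arrives?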
The example query from the API documentation, in a file named request.json:
{
"encodingType": "UTF8",
"document": {
"type": "PLAIN_TEXT",
"content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. Sundar Pichai said in his keynote that users love their new Android phones."
}
}
Command executed:
curl "https://language.googleapis.com/v1/documents:analyzeSyntax?key=${API_KEY}" \
-s \
-X POST \
-H "Content-Type: application/json" \
--data-binary @request.json > response.json
Sample response:
{
"sentences": [
{
"text": {
"content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.",
"beginOffset": 0
}
},
{
"text": {
"content": "Sundar Pichai said in his keynote that users love their new Android phones.",
"beginOffset": 105
}
}
],
"tokens": [
{
"text": {
"content": "Google",
"beginOffset": 0
},
"partOfSpeech": {
"tag": "NOUN",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "SINGULAR",
"person": "PERSON_UNKNOWN",
"proper": "PROPER",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 7,
"label": "NSUBJ"
},
"lemma": "Google"
},
{
"text": {
"content": ",",
"beginOffset": 6
},
"partOfSpeech": {
"tag": "PUNCT",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "NUMBER_UNKNOWN",
"person": "PERSON_UNKNOWN",
"proper": "PROPER_UNKNOWN",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 0,
"label": "P"
},
"lemma": ","
},
...
...
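This response JSON is 819 lines long, of which 314 lines (nearly 40% of the response!) are partOfSpeech attributes with *_UNKNOWN values. They carry no information, yet they significantly increase the amount of data in the response.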
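The documentation doesn't seem to offer a parameter that would help with this. Am I missing something, or does this API simply not support a parameter for dropping these *_UNKNOWN keys? Can this only be managed after the response, through data cleanup?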
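For reference, the post-response cleanup I have in mind would look roughly like the sketch below. It is only an illustration in Python, not something taken from the API documentation: the helper name strip_unknown is hypothetical, and it assumes that every placeholder value in partOfSpeech ends in _UNKNOWN, as in the sample response above.

```python
import json


def strip_unknown(response):
    """Return a copy of an analyzeSyntax response in which partOfSpeech
    entries whose value ends in "_UNKNOWN" are dropped.

    strip_unknown is a hypothetical helper for illustration only;
    it is not part of the Natural Language API or its client libraries.
    """
    cleaned = dict(response)  # shallow copy; the input is left untouched
    cleaned["tokens"] = [
        {
            **token,
            # Keep only the attributes that carry real information.
            "partOfSpeech": {
                key: value
                for key, value in token.get("partOfSpeech", {}).items()
                if not str(value).endswith("_UNKNOWN")
            },
        }
        for token in response.get("tokens", [])
    ]
    return cleaned


# Two tokens abridged from the sample response above.
sample = {
    "tokens": [
        {
            "text": {"content": "Google", "beginOffset": 0},
            "partOfSpeech": {
                "tag": "NOUN",
                "aspect": "ASPECT_UNKNOWN",
                "case": "CASE_UNKNOWN",
                "number": "SINGULAR",
                "proper": "PROPER",
                "voice": "VOICE_UNKNOWN",
            },
            "lemma": "Google",
        },
        {
            "text": {"content": ",", "beginOffset": 6},
            "partOfSpeech": {
                "tag": "PUNCT",
                "number": "NUMBER_UNKNOWN",
                "proper": "PROPER_UNKNOWN",
            },
            "lemma": ",",
        },
    ]
}

cleaned = strip_unknown(sample)
print(json.dumps(cleaned["tokens"][0]["partOfSpeech"], indent=2))
```

In practice the sample dictionary would be replaced by the contents of response.json, loaded with json.load. This works, but it only shrinks the data after it has already been transferred, which is why I'm asking whether the API can omit these keys in the first place.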