python - New Google Natural Language API

Question

I've recently used the Language API to gather sentiment predictions for a work project. I had about 1,300 unlabeled documents and we used NLTK's tools initially, which was based on a dictionary of terms with polarity estimates of each word in the dictionary. I turned to the API, and after reviewing the predictions, the API produced much better results than NLTK.

I understand that the engineers probably won't want to release the details of the prediction engine, but I am curious how it works at a high level. If anybody could enlighten me or point me in the right direction, I'd appreciate it. For example, "it uses a Neural Network, trained on billions of observations," would be a reasonable answer.

Again, I'm using this for a work project and I'd like to be able to give a brief justification of why I switched from NLTK to the API (the improved results should speak for themselves, but I will definitely get "well, how does it work?").

score 3 · Accepted Answer

The language API is a pipeline of state-of-the-art machine-learned systems that are trained on a combination of public data (like the Penn Treebank) and proprietary data annotated by Google's linguists.

Performance improvements compared to something like NLTK come from a combination of more and better data for training, as well as cutting edge machine learning algorithms, including but not limited to neural networks.

Related links that discuss some of the algorithms:

https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html (Parsing algorithms)
https://www.wired.com/2016/11/googles-search-engine-can-now-answer-questions-human-help/ (Mentions the linguist team)
https://research.googleblog.com/2016/08/acl-2016-research-at-google.html (Recent publications from the NLP research team)

python - New Google Natural Language API

1 回答 1

Related

Reference