I am POS-tagging text in order to find the nouns and adjectives:
import nltk

text = u"""Developed at the Vaccine and Gene Therapy Institute at the Oregon Health and Science University (OHSU), the vaccine proved successful in about fifty percent of the subjects tested and could lead to a human vaccine preventing the onset of HIV/AIDS and even cure patients currently on anti-retroviral drugs."""
# Split the text into tokens, then assign a part-of-speech tag to each token
nltk.pos_tag(nltk.word_tokenize(text))
This produces:
[('Developed', 'NNP'), ('at', 'IN'), ('the', 'DT'), ('Vaccine', 'NNP'), ('and', 'CC'), ('Gene', 'NNP'), ('Therapy', 'NNP'), ('Institute', 'NNP'), ('at', 'IN'), ('the', 'DT'), ('Oregon', 'NNP'), ('Health', 'NNP'), ('and', 'CC'), ('Science', 'NNP'), ('University', 'NNP'), ('(', 'NNP'), ('OHSU', 'NNP'), (')', 'NNP'), (',', ','), ('the', 'DT'), ('vaccine', 'NN'), ('proved', 'VBD'), ('successful', 'JJ'), ('in', 'IN'), ('about', 'IN'), ('fifty', 'JJ'), ('percent', 'NN'), ('of', 'IN'), ('the', 'DT'), ('subjects', 'NNS'), ('tested', 'VBD'), ('and', 'CC'), ('could', 'MD'), ('lead', 'VB'), ('to', 'TO'), ('a', 'DT'), ('human', 'NN'), ('vaccine', 'NN'), ('preventing', 'VBG'), ('the', 'DT'), ('onset', 'NN'), ('of', 'IN'), ('HIV/AIDS', 'NNS'), ('and', 'CC'), ('even', 'RB'), ('cure', 'NN'), ('patients', 'NNS'), ('currently', 'RB'), ('on', 'IN'), ('anti-retroviral', 'JJ'), ('drugs', 'NNS'), ('.', '.')]
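To show why this matters for my use case, here is a minimal sketch of the filtering step applied to a slice of the output above. The helper name `select_by_tag` is hypothetical (it is not part of NLTK), and I am assuming that matching on the tag prefixes NN and JJ is a reasonable way to pick out nouns and adjectives:

```python
# Hypothetical helper, not an NLTK function: keep the tokens whose tag
# starts with one of the given prefixes (NN* = nouns, JJ* = adjectives).
def select_by_tag(tagged, prefixes=("NN", "JJ")):
    # str.startswith accepts a tuple of prefixes
    return [word for word, tag in tagged if tag.startswith(prefixes)]

# A slice of the tagger output shown above
sample = [('University', 'NNP'), ('(', 'NNP'), ('OHSU', 'NNP'), (')', 'NNP'),
          (',', ','), ('the', 'DT'), ('vaccine', 'NN'), ('proved', 'VBD'),
          ('successful', 'JJ')]

print(select_by_tag(sample))
# → ['University', '(', 'OHSU', ')', 'vaccine', 'successful']
```

Because the parentheses carry the NNP tag, they end up in the list of nouns.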
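Note that both parentheses around OHSU are tagged NNP. Is there a built-in way to have parentheses detected correctly when tagging a sentence?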