java - Matching byte spans from annotations to a text document, Python or Java

Question

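I am working with the MPQA opinion corpus, in which annotations and documents are stored in separate files. The annotation files contain character offsets (byte spans) into the document,
for example 850,861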

string  GATE_direct-subjective   
expression-intensity="medium"
attitude-link="a4"
nested-source="w, patient" 
intensity="medium" 
polarity="negative"

How can I match these byte spans to the text document? I would appreciate any ideas! I prefer Python, but a solution in Java is fine too.

score 0 · Accepted Answer

I'm not 100% sure I understand the question properly, but if you need a substring and you have character positions, the solution is simple: slice the string.

Python solution:

>>> sometext = "Grant D is a great guy."
>>> character_offset = [0, 7]  # start (inclusive), end (exclusive)
>>> subString = sometext[character_offset[0]:character_offset[1]]
>>> print(subString)
Grant D
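
The slice above indexes by character. The question mentions both character offsets and byte spans, and the two differ as soon as the document contains a multi-byte character. If the offsets turn out to be byte offsets, one option is to slice the raw bytes and decode only the result. The following is a minimal sketch under that assumption; the helper names `parse_span` and `extract_span`, the UTF-8 encoding, and the sample sentence are all hypothetical and not taken from the corpus.

```python
from typing import Tuple


def parse_span(field: str) -> Tuple[int, int]:
    """Turn a span field such as "850,861" into (start, end) integers."""
    start, end = field.split(",")
    return int(start), int(end)


def extract_span(doc_bytes: bytes, start: int, end: int,
                 encoding: str = "utf-8") -> str:
    """Return the text covered by the byte span [start, end).

    Assumption: offsets count bytes and the end offset is exclusive.
    errors="replace" keeps the call from raising if a span happens
    to cut through the middle of a multi-byte character.
    """
    return doc_bytes[start:end].decode(encoding, errors="replace")


# Hypothetical sample; a real document would be read with open(path, "rb").read()
doc_bytes = "The café was awful, he said.".encode("utf-8")

start, end = parse_span("14,19")
print(extract_span(doc_bytes, start, end))  # awful

# Slicing the decoded text with the same numbers is off by one here,
# because "é" occupies two bytes but only one character.
print(doc_bytes.decode("utf-8")[start:end])  # wful,
```

If the extracted text looks shifted by a few characters, that is usually a sign that the offsets were interpreted in the wrong unit or that the file was decoded with the wrong encoding.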
