我正在使用 google-cloud/language api 进行#annotate 调用,并从我从各种在线资源中获取的评论的 csv 中分析实体和情绪。
首先,我要分析的字符串包括commentId,所以我重新格式化:
youtubez22htrtb1ymtdlka404t1aokg2kirffb53u3pya0,i just bot a Nostromo... ( ._.)
youtubez22oet0bruejcdf0gacdp431wxg3vb2zxoiov1da,Good Job Baby! MSI Propeller Blade Technology!
youtubez22ri11akra4tfku3acdp432h1qyzap3yy4ziifc,"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"
youtubez23ttpsyolztc1ep004t1aokg5zuyqxfqykgyjqs,"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's. Nice Alienware thing logo thing, btw"
youtubez12zjp5rupbcttvmy220ghf4ctqnerqwa04,"You know, If you actually made this. People would actually buy it."
因此它不包含任何评论 ID:
I just bot a Nostromo... ( ._.)
Good Job Baby! MSI Propeller Blade Technology!\n"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"
"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's. Nice Alienware thing logo thing, btw"
"You know, If you actually made this. People would actually buy it."
在发送对谷歌云/语言的请求以#annotate 文本后。我收到一个响应,其中包括各种子字符串的情绪和幅度。每个字符串还被赋予一个beginOffset
值,该值与原始字符串(请求中的字符串)中的字符串索引相关。
{ content: 'i just bot a Nostromo... ( ._.)\nGood Job Baby!',
beginOffset: 0 }
{ content: 'MSI Propeller Blade Technology!\n"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"\n"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's.',
beginOffset: 50 }
{ content: 'Nice Alienware thing logo thing, btw"\n"You know, If you actually made this.',
beginOffset: 462 }
然后我的目标是在原始字符串中找到原始注释,这应该足够简单。像(originalString[beginOffset])
......
这个值不正确!
我假设它们不包含某些字符,但我尝试了多种正则表达式,但似乎没有什么能完美运行。有谁知道可能导致问题的原因???