regex - How to use RegexMatcher in SparkNLP

Asked 2020-03-19T21:21:04.180

Viewed 366 times

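Here is the situation. I want to run SparkNLP in JupyterLab with a Scala kernel, and I want to use the RegexMatcher annotator. My patterns are stored in a file named patterns.txt in an S3 bucket. I tried the implementation below: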

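import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.RegexMatcher
import com.johnsnowlabs.nlp.util.io.ExternalResource
import com.johnsnowlabs.nlp.util.io.ReadAs.LINE_BY_LINE
import org.apache.spark.ml.Pipeline

// Wrap the raw "text" column into Spark NLP documents
val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
// Apply every rule read line by line from the patterns file on S3
val regexmatcher = new RegexMatcher().
  setInputCols(Array("document")).
  setOutputCol("match").
  setStrategy("MATCH_ALL").
  setRules(ExternalResource("s3://bucket_name/patterns.txt", LINE_BY_LINE, Map("format" -> "text", "delimiter" -> " ")))
val pipeline_regex = new Pipeline().setStages(Array(document, regexmatcher))
// dev_data is the input DataFrame; it must contain a "text" column
val regex_match = pipeline_regex.fit(dev_data)
regex_match.transform(dev_data).select("match").show(false)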

However, it does not seem to work at all, and patterns.txt does not appear to be used. How can I fix this?
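
The contents of patterns.txt are not shown in the question, so the following is only a minimal sketch under stated assumptions. It assumes that each line of the rules file holds a regex and an identifier separated by the configured delimiter, and it illustrates how the choice of delimiter changes the way a line is split. The helper parseRule and the sample lines are hypothetical; they are not part of Spark NLP and are not a claim about how the library parses the file internally.

```scala
object RuleSplitDemo {
  // Hypothetical helper, NOT part of Spark NLP: splits one rules line into
  // (regex, identifier) at the first occurrence of the delimiter.
  // Returns None when the delimiter does not occur in the line.
  def parseRule(line: String, delimiter: String): Option[(String, String)] = {
    val idx = line.indexOf(delimiter)
    if (idx < 0) None
    else Some((line.take(idx).trim, line.drop(idx + delimiter.length).trim))
  }

  // Hypothetical contents of patterns.txt, one rule per line (assumption).
  val sampleLines: Seq[String] = Seq(
    """\d{4}-\d{2}-\d{2},date""",
    """[A-Z]\w+ Inc\.,company name"""
  )

  // The same lines split with two different delimiters.
  val withComma: Seq[Option[(String, String)]] = sampleLines.map(parseRule(_, ","))
  val withSpace: Seq[Option[(String, String)]] = sampleLines.map(parseRule(_, " "))

  def main(args: Array[String]): Unit = {
    println("delimiter ',':")
    withComma.foreach(r => println(s"  $r"))
    println("delimiter ' ':")
    withSpace.foreach(r => println(s"  $r"))
  }
}
```

In this sketch, a comma delimiter splits both sample lines into the intended regex and identifier. A single space finds nothing to split on in the first line and cuts the second line in the middle of its regex. Checking that the delimiter passed in the options map matches the one actually used in patterns.txt may therefore be a reasonable first step.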
