nlp - How do you ensure a viable endpoint for the stanza CoreNLPClient?

Question

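I want to use the stanza CoreNLPClient to extract noun phrases, similar to this approach.

However, I can't seem to find a good port on which to start the server. The default is 9000, but it is often occupied, as the error message shows:

PermanentlyFailedException: Error: unable to start the CoreNLP server on port 9000 (possibly something is already running there)

Edit: python.exe is the process using port 9000, which is why I can't simply kill it to make room for the CoreNLPClient.

Then, when I choose another port, such as 7999, 8000, or 8080, the server just keeps listening indefinitely. The subsequent lines of code never run, and the only output is the following: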

2021-07-19 12:05:55 INFO: Starting server with command: java -Xmx8G -cp C:\Users\timjo\stanza_corenlp* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 7998 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-2e15724b8064491b.props -preload -outputFormat serialized

I have the latest version of stanza installed and am running the following code in an .ipynb file in VS Code:

from stanza.server import CoreNLPClient

# sample sentence
sentence = "Albert Einstein was a German-born theoretical physicist."

# start the client as indicated in the docs
with CoreNLPClient(properties='corenlp_server-2e15724b8064491b.props', endpoint='https://localhost:7998', memory='8G', be_quiet=True) as client:
    matches = client.tregex(text=sentence, pattern='NP')

# extract the noun phrases and their character offsets;
# each entry of matches['sentences'] maps a match id to one match
noun_phrases = [[match['spanString'], match['characterOffsetBegin'], match['characterOffsetEnd']]
                for sent in matches['sentences']
                for match in sent.values()]

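Main question: how do I make sure the server starts on an open port and then shuts down afterwards? I'd like a semi-automatic way of finding open ports, or freeing occupied ones, for the client to run on.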

score 1 · Accepted Answer

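Generally speaking, choosing another number that isn't otherwise in use is sufficient. Maybe 9017? There are lots of numbers to choose from! But the more careful choice would be to create the CoreNLPClient in a while loop with a try/except and increment the port number until you find one that is open.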
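A minimal sketch of the "increment until a port is open" idea, using only the standard library. Instead of wrapping the client itself in try/except, it probes each port by trying to bind a socket to it. The helper names `port_is_free` and `find_free_port` are made up for this illustration and are not part of stanza.

```python
import socket
from contextlib import closing


def port_is_free(port, host="localhost"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True


def find_free_port(start=9000, max_tries=100, host="localhost"):
    """Return the first bindable port in [start, start + max_tries)."""
    # Valid TCP ports end at 65535, so never probe beyond that.
    for port in range(start, min(start + max_tries, 65536)):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port found starting at {start}")
```

The result could then be passed to the client as `endpoint=f"http://localhost:{find_free_port()}"`. The probe is not atomic: another process could still grab the port between the check and the server start, so keeping a try/except around the client as a fallback remains reasonable. As for shutting down, using `CoreNLPClient` in a `with` block, as in the question, stops the server when the block exits.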

score 0 · Accepted Answer

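After 2 hours of research, I now know the following:

Python is using port 9000, so that port cannot be chosen. Anecdotal evidence suggests this is related to running in a jupyter notebook rather than a "regular" python .py file.
Regarding the client never finishing when using other endpoints: I should simply have used 'http://localhost:port' instead of 'https://...'.

Hope this helps others who run into this problem. I guess this is my non-computer-science background seeping through.

(Edited to fix a typo)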
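As a small illustration of the second point, here is a sketch that rewrites an `https://` endpoint to plain `http://` before it is handed to the client. The helper name `to_http_endpoint` is hypothetical and not part of stanza. It assumes the server was launched locally without any SSL option, as in the command shown in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http_endpoint(endpoint):
    """Return the endpoint with its scheme forced to plain http."""
    parts = urlsplit(endpoint)
    if not parts.hostname:
        raise ValueError(f"endpoint has no host: {endpoint!r}")
    # Keep host, port and path; only the scheme is replaced.
    return urlunsplit(parts._replace(scheme="http"))


print(to_http_endpoint("https://localhost:7998"))
```

With the question's values this gives `http://localhost:7998`, which is the form the answer above reports as working.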
