Questions tagged [natural-language-processing]



130 questions

0 votes

1 answer

183 views

deep-learning - How do LSTM and GRU gates decide which words to keep in memory?

The update gate in a GRU decides which words to keep in the cell (to be clear, in what plays the role of the cell state). How does the update gate decide when to be close to 1 and when to be close to 0? Basically, how does it decide whether or not to keep a word? Thanks.
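To make the question concrete, here is a minimal sketch of how an update gate value is computed, in plain Python. The gate is not a hand-written rule: it is a sigmoid applied to a learned linear function of the current input and the previous hidden state, so training pushes the weights until the gate opens or closes in the situations that reduce the loss. The weights below (`W_z`, `U_z`, `b_z`) are made-up illustrative numbers, not learned ones, and the blending convention (z near 1 keeps the old state) is an assumption chosen to match the wording of the question; some papers and libraries use the opposite convention.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def update_gate(x, h_prev, W_z, U_z, b_z):
    """z = sigmoid(W_z x + U_z h_prev + b_z), one value per hidden unit."""
    z = []
    for i in range(len(b_z)):
        pre = b_z[i]
        pre += sum(W_z[i][j] * x[j] for j in range(len(x)))
        pre += sum(U_z[i][k] * h_prev[k] for k in range(len(h_prev)))
        z.append(sigmoid(pre))
    return z

def blend(z, h_prev, h_candidate):
    # Assumed convention: z near 1 keeps the old state, z near 0 overwrites it.
    return [z_i * h + (1.0 - z_i) * c for z_i, h, c in zip(z, h_prev, h_candidate)]

# Hypothetical weights for a single hidden unit and 2-dimensional word vectors.
W_z = [[4.0, -4.0]]
U_z = [[0.0]]
b_z = [0.0]
h_prev = [0.5]

z_word_a = update_gate([1.0, 0.0], h_prev, W_z, U_z, b_z)  # gate near 1
z_word_b = update_gate([0.0, 1.0], h_prev, W_z, U_z, b_z)  # gate near 0
print(z_word_a, z_word_b)
```

Note that the gate acts per hidden unit, not per word: each dimension of the hidden state gets its own value between 0 and 1, and "keeping a word" is shorthand for keeping the parts of the state that the word influenced.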

2018-10-02T12:30:56.587

0 votes

3 answers

652 views

python - Python tool for finding meaningful word pairs in a document

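I am writing a program that collects tweets from Twitter and evaluates the text to find trending topics. I plan to use NLTK to stem the terms and perform some other operations on the data.

What I need is a tool that can determine whether two adjacent words in a tweet should be treated as a single term. For example, if "fake news" is trending on Twitter, I don't want to treat those two words as separate terms. As another example, if everyone is tweeting about "computer science", it makes no sense to treat "computer" and "science" as two different terms, because together they refer to one topic. Is there a tool that can find such terms?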
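This task is usually called collocation detection. The sketch below shows one common scoring approach, pointwise mutual information (PMI) over adjacent word pairs, using only the standard library. The function name `score_bigrams`, the `min_count` threshold, and the sample tweets are illustrative assumptions; NLTK provides comparable functionality in its `nltk.collocations` module (`BigramCollocationFinder` together with `BigramAssocMeasures`).

```python
import math
from collections import Counter

def score_bigrams(tokenized_tweets, min_count=2):
    """Rank adjacent word pairs by PMI; higher means the pair co-occurs
    more often than its individual word frequencies would suggest."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in tokenized_tweets:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up sample data, already lowercased and tokenized.
tweets = [
    ["fake", "news", "spreads", "fast"],
    ["stop", "fake", "news", "now"],
    ["news", "about", "computer", "science"],
    ["computer", "science", "is", "fun"],
]
for pair, score in score_bigrams(tweets):
    print(pair, round(score, 2))
```

Pairs that pass the threshold can then be merged into a single token (for example `computer_science`) before counting trending terms. On real tweets a higher `min_count` and stop-word filtering would probably be needed.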

python nltk natural-language-processing

2018-10-09T21:27:08.080

0 votes

0 answers

82 views

python - Unsupervised machine translation from Facebook Research

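I have a question about this model from Facebook Research: https://github.com/facebookresearch/UnsupervisedMT

I want to modify the training process, but to do that I need to understand the code better. Specifically, I am looking at line 472 of the file UnsupervisedMT/NMT/src/trainer.py, inside def enc_dec_step.

I can't understand what exactly the decoder function is doing, or why it needs sent2 as an argument. I thought self.decoder() was a function that takes the encoded state of sentence 1 in language 1 and outputs a 2D tensor of activations over the whole vocabulary of language 2 (lang2_id), running once for each word of the sentence it outputs, so the output should be a tensor of size (length of the output sentence) x (number of words in the vocabulary). But I don't understand why it needs the paired sentence (sent2) to do that.

Anyway, this is just a guess. The point is that I want to see in detail what this function actually does, but I'm still not very good at coding, so I can't find where the function is defined.

As far as I understand, it may be initializing an instance of TransformerDecoder (depending on the settings), but even if that is the case, I don't know how to follow what actually happens, and it doesn't seem to make sense to me.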
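On finding where something is defined: Python's standard `inspect` module can report the source file and line of a function, a method, or the class of an object. The sketch below uses `json.JSONDecoder` purely as a stand-in object, and `where_defined` is a hypothetical helper name. Assuming `self.decoder` is a PyTorch `nn.Module` instance, calling it runs the `forward` method of its class, so `type(self.decoder).forward` would be the thing to look up.

```python
import inspect
import json

def where_defined(obj):
    """Return (source file, first line number) for a function, method,
    class, or the class of an arbitrary instance."""
    if inspect.isroutine(obj) or inspect.isclass(obj):
        target = obj
    else:
        target = type(obj)  # an instance: look up its class instead
    path = inspect.getsourcefile(target)
    _, first_line = inspect.getsourcelines(target)
    return path, first_line

decoder = json.JSONDecoder()          # stand-in for any object
print(type(decoder).__name__)         # which class is this really?
print(where_defined(decoder))         # where the class is defined
print(where_defined(decoder.decode))  # where one of its methods is defined
```

This only works for code written in Python; built-in functions implemented in C raise a `TypeError` here.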

Can anyone help?
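One common reason a decoder takes the target sentence during training is teacher forcing: at each position the decoder is fed the correct previous target word rather than its own prediction, which is why the target has to be passed in. Whether that is what line 472 of the repository does has not been verified here; the sketch below only illustrates the general technique with a toy scoring function. `toy_decoder_step`, `BOS`, and all numbers are hypothetical stand-ins, not the repository's API.

```python
import math

BOS = 0  # hypothetical start-of-sentence token id

def toy_decoder_step(encoded, prev_token, vocab_size):
    """Stand-in for one decoder step: one score per vocabulary word,
    depending on the encoder output and the previous target token."""
    base = sum(encoded)
    return [math.sin(base + prev_token * (v + 1)) for v in range(vocab_size)]

def decode_with_teacher_forcing(encoded, target_ids, vocab_size):
    # Training: inputs are the gold target shifted right, [BOS, y1, ..., y_{n-1}].
    # The result has shape (len(target_ids), vocab_size).
    inputs = [BOS] + target_ids[:-1]
    return [toy_decoder_step(encoded, tok, vocab_size) for tok in inputs]

def decode_greedy(encoded, max_len, vocab_size):
    # Inference: no target available, so feed back the model's own prediction.
    prev, output = BOS, []
    for _ in range(max_len):
        scores = toy_decoder_step(encoded, prev, vocab_size)
        prev = max(range(vocab_size), key=scores.__getitem__)
        output.append(prev)
    return output

scores = decode_with_teacher_forcing([0.1, 0.2], [3, 1, 4], vocab_size=5)
print(len(scores), len(scores[0]))
print(decode_greedy([0.1, 0.2], max_len=3, vocab_size=5))
```

Under this reading, the output size guessed in the question, (length of the output sentence) x (number of words in the vocabulary), is consistent with training: the target sentence supplies the inputs at each step, and the scores are compared against the same sentence to compute the loss.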

python machine-learning machine-translation natural-language-processing

2018-10-09T21:33:15.570

0 votes

1 answer

297 views