Questions tagged [word2vec]
python - 'file' object has no attribute 'rfind'
I am trying to save a word2vec model to a file.
I get the following error in genericpath.py
Where am I going wrong?
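A minimal sketch of the usual cause, assuming gensim's Word2Vec is being used: save() expects a filename string, and passing an open file object makes the os.path string helpers (which call rfind) fail inside genericpath.py. The corpus and file name below are purely illustrative.

    from gensim.models import Word2Vec

    # Toy corpus: a list of tokenized sentences.
    sentences = [["hello", "world"], ["word2vec", "example"]]
    model = Word2Vec(sentences, vector_size=50, min_count=1)

    # Pass a path string, not an open file object.
    model_path = "my_word2vec.model"   # hypothetical path
    model.save(model_path)

    # This pattern is what raises "'file' object has no attribute 'rfind'":
    # model.save(open(model_path, "wb"))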
nlp - word2vec lemmatization of corpus before training
Word2vec seems to be mostly trained on raw corpus data. However, lemmatization is a standard preprocessing for many semantic similarity tasks. I was wondering if anybody had experience in lemmatizing the corpus before training word2vec and if this is a useful preprocessing step to do.
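A small sketch of what that preprocessing step might look like, assuming NLTK's WordNetLemmatizer (with its wordnet data downloaded) and gensim; the corpus and parameters are illustrative only.

    from nltk.stem import WordNetLemmatizer
    from gensim.models import Word2Vec

    lemmatizer = WordNetLemmatizer()

    raw_sentences = [
        ["the", "cats", "were", "chasing", "mice"],
        ["a", "cat", "chased", "a", "mouse"],
    ]

    # Lemmatize every token (treated as a noun by default) so that surface
    # variants like "cats"/"cat" and "mice"/"mouse" map to a single vector.
    lemmatized = [[lemmatizer.lemmatize(tok) for tok in sent]
                  for sent in raw_sentences]

    model = Word2Vec(lemmatized, vector_size=50, min_count=1)
    print(model.wv.most_similar("cat", topn=3))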
python - Reading word vectors trained by Mikolov's word2vec in Python
I am trying to read a binary file here. The file contains word representations trained by Mikolov's word2vec program, in the following format:
The first 12 bytes contain the string: "3000000 300\n"
Subsequent bytes: "<1st variable word string>[space]<4*300 bytes to form 300 dimension float vector> [May be something there] <2nd word>....<3000000th word>[space]<4*300 bytes>"
Using this C code:
I can read each word into buff and the corresponding vector into M. But when I try the same strategy in Python with this test code:
It produces this result:
3000000 300
</s>
True
in True
for True
; False
which is obviously wrong, because the third word should be that. I can't figure out what I'm doing wrong here!
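For reference, here is a minimal sketch of reading that binary layout in Python, assuming numpy is available; the file name is the usual GoogleNews download and is only illustrative. The key detail is reading the word byte by byte up to the separating space before pulling the 4*300-byte float32 vector.

    import numpy as np

    def read_word2vec_bin(path, max_words=5):
        """Print the first few (word, vector) entries of a Mikolov-format .bin file."""
        with open(path, "rb") as f:
            vocab_size, dim = map(int, f.readline().split())  # header: b"3000000 300\n"
            vec_bytes = 4 * dim                                # float32 vector payload
            for _ in range(min(max_words, vocab_size)):
                chars = []
                while True:                                    # word runs up to a space
                    ch = f.read(1)
                    if ch == b" ":
                        break
                    if ch == b"":                              # unexpected end of file
                        return
                    if ch != b"\n":                            # tolerate stray newlines
                        chars.append(ch)
                word = b"".join(chars).decode("utf-8", errors="replace")
                vec = np.frombuffer(f.read(vec_bytes), dtype=np.float32)
                print(word, vec[:3])

    read_word2vec_bin("GoogleNews-vectors-negative300.bin")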
python - python word2vec not installing
I have been trying to install word2vec on my Windows 7 machine with my Python 2.7 interpreter: https://github.com/danielfrg/word2vec
I tried downloading the zip and running python setup.py install from the unzipped directory, and also running pip install. In both cases it returns the following error:
There seems to be a problem with the call to subprocess.call(), so after some Googling I managed to add shell=True to that line in word2vec's setup.py, which then throws this error:
Honestly, I'm not even sure where to go from here. I also tried installing make and pointing my path variable at the .exe files from that install. Any suggestions would be greatly appreciated, thanks.
Update:
Although the word2vec module won't run, a package called gensim seems to work fine, and it also has some other great NLP features: http://radimrehurek.com/gensim/
python - Keeping objects loaded so they can be used by another Python program
I am using word2vec to compute the similarity between two words, and for that I use the GoogleNews model. The model is huge, so it takes a long time to load.
I would like to load it once and keep it in a variable/object, so that whenever I run a Python program I can just call it.
How can this be done? Any ideas?
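A hedged sketch of one common workaround, assuming gensim: convert the GoogleNews file to gensim's native format once, then memory-map it on every later run, so reloads are fast and the mapped arrays can be shared between processes. File names are illustrative.

    from gensim.models import KeyedVectors

    # One-time conversion: the slow load of the original binary file, followed
    # by a save in gensim's native format.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)
    vectors.save("googlenews.kv")

    # In every later run: a memory-mapped load is far faster than re-parsing
    # the binary file, and the OS can share the mapped pages across processes.
    vectors = KeyedVectors.load("googlenews.kv", mmap="r")
    print(vectors.similarity("king", "queen"))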
python - MemoryError when reading a word2vec data file into Python
I am trying to use word2vec on Windows 7. I have 24 GB of RAM and an i7 processor, and I am using 64-bit Python. I am trying to follow Radim's tutorial, and I want to access the vectors in the 3-billion-word Google file provided on the original word2vec page. When I run the line:
I get the following error:
I don't know how to fix this, since the file is only 1.3 GB and I have plenty of free memory.
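A hedged sketch of one way to keep memory under control, assuming the file is being loaded with gensim as in Radim's tutorial: the limit argument loads only the most frequent entries (the full 3,000,000 x 300 float32 matrix alone needs roughly 3.4 GB once expanded in memory). The file name and limit value are illustrative.

    from gensim.models import KeyedVectors

    # Load only the 500,000 most frequent vectors; drop `limit` once the full
    # matrix fits comfortably in memory.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True, limit=500000)
    print(vectors["computer"][:5])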
nlp - Why is the similarity between two bags of words computed this way in gensim.word2vec?
Here is a piece of code I excerpted from gensim.word2vec. I understand that the similarity between two words can be computed with cosine distance, but what about two sets of words? The code seems to take the mean of the word vectors in each set and then compute the cosine distance between the two mean vectors. I know little about word2vec; is there some basis for such a procedure?
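A toy sketch of the procedure described above (average each bag of vectors, then take the cosine of the two averages), using made-up numpy vectors rather than a real model:

    import numpy as np

    def bag_similarity(vecs_a, vecs_b):
        """Cosine similarity between the mean vectors of two bags of word vectors."""
        mean_a = np.mean(vecs_a, axis=0)
        mean_b = np.mean(vecs_b, axis=0)
        return float(np.dot(mean_a, mean_b) /
                     (np.linalg.norm(mean_a) * np.linalg.norm(mean_b)))

    # Toy 3-dimensional "word vectors" standing in for real embeddings.
    bag1 = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])]
    bag2 = [np.array([0.8, 0.2, 0.0])]
    print(bag_similarity(bag1, bag2))

Averaging treats a bag of words as a single point in the embedding space, which is why a single cosine between the two averages can serve as a set-to-set similarity.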
machine-learning - Word2Vec: number of dimensions
I am using Word2Vec with a dataset of roughly 11,000,000 tokens and want to use it for word similarity (as part of synonym extraction for a downstream task), but I don't know how many dimensions I should use with Word2Vec. Does anyone have a good heuristic for the range of dimensions to consider, given the number of tokens/sentences?
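If it helps to see where that knob lives, here is a small sketch, assuming gensim, that trains a few models over a hypothetical range of sizes so they can be compared on a similarity or synonym-extraction evaluation; the corpus and values shown are placeholders.

    from gensim.models import Word2Vec

    # Placeholder corpus; in practice this would be the tokenized ~11M-token dataset.
    sentences = [["synonym", "extraction", "needs", "good", "vectors"],
                 ["word", "similarity", "is", "the", "downstream", "task"]]

    # Try a handful of sizes and keep the one that scores best on the evaluation.
    for dim in (100, 200, 300):
        model = Word2Vec(sentences, vector_size=dim, window=5, min_count=1, workers=4)
        model.save("w2v_{}.model".format(dim))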
text - How can vector representations of words (obtained from Word2Vec etc.) be used as features for a classifier?
I am familiar with using BOW features for text classification, where we first find the size of the corpus vocabulary, which becomes the length of our feature vector. For each sentence/document, and all of its constituent words, we then put a 0/1 depending on the absence/presence of that word in the sentence/document.
However, now that I am trying to use a vector representation of each word, is creating a global vocabulary still essential?
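A minimal sketch of one common approach, assuming gensim and scikit-learn: represent each document by the average of its word vectors, so no global 0/1 vocabulary vector is needed. The toy corpus, labels, and parameters are illustrative.

    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.linear_model import LogisticRegression

    # Toy documents and labels; a real task would use its own corpus or
    # pretrained vectors.
    docs = [["good", "movie"], ["bad", "movie"], ["great", "film"], ["awful", "film"]]
    labels = [1, 0, 1, 0]

    w2v = Word2Vec(docs, vector_size=50, min_count=1)

    def doc_vector(tokens, model):
        """Average the vectors of the tokens the model knows about."""
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

    X = np.vstack([doc_vector(d, w2v) for d in docs])
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(X))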
python - Load PreComputed Vectors Gensim
I am using the Gensim Python package to learn a neural language model, and I know that you can provide a training corpus to learn the model. However, there already exist many precomputed word vectors available in text format (e.g. http://www-nlp.stanford.edu/projects/glove/). Is there some way to initialize a Gensim Word2Vec model that just makes use of some precomputed vectors, rather than having to learn the vectors from scratch?
Thanks!
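A hedged sketch, assuming the precomputed vectors are only needed for lookups and similarity queries rather than further training: gensim can read them into a KeyedVectors object directly. File names are illustrative, and GloVe files lack the word2vec header line, so they need either the no_header option of recent gensim releases or a one-off glove2word2vec conversion.

    from gensim.models import KeyedVectors

    # Vectors already in word2vec text format (first line: "<vocab_size> <dim>").
    kv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)

    # GloVe text files have no header line; recent gensim versions read them with
    # no_header=True (older versions need gensim.scripts.glove2word2vec first).
    glove = KeyedVectors.load_word2vec_format("glove.6B.100d.txt",
                                              binary=False, no_header=True)
    print(glove.most_similar("king", topn=3))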