我运行 TWE 模型的源代码。我需要编译python的C扩展。我已经为 Python 2.7 和 Cython 安装了 Microsoft Visual C++ 编译器。
首先,我需要运行 TWE/train.py:
import gensim
sentence_word = gensim.models.word2vec.LineSentence("tmp/word.file")
print "Training the word vector..."
w = gensim.models.Word2Vec(sentence_word,size=400, workers=20)
sentence = gensim.models.word2vec.CombinedSentence("tmp/word.file","tmp/topic.file")
print "Training the topic vector..."
w.train_topic(topic_number, sentence)
print "Saving the topic vectors..."
w.save_topic("output/topic_vector.txt")
print "Saving the word vectors..."
w.save_wordvector("output/word_vector.txt")`
二、TWE/gensim/models/wor2vec.py:
try:
raise ImportError # ignore for now
from gensim_addons.models.word2vec_inner import train_sentence_sg,train_sentence_cbow, FAST_VERSION, train_sentence_topic
except ImportError:
try:
import pyximport
print 'import pyximport'
models_dir = os.path.dirname(__file__) or os.getcwd()
print 'models_dir'
pyximport.install(setup_args={"include_dirs": [models_dir, get_include()]})
print 'pyximport' # is the follow code's problem
from word2vec_inner import train_sentence_sg, train_sentence_cbow,
FAST_VERSION, train_sentence_topic
print 'from word2vec'
except:
FAST_VERSION = -1
def train_sentence_sg(model, sentence, alpha, work=None):
...
def train_sentence_cbow(model, sentence, alpha, work=None, neu1=None):
...
class Word2Vec(utils.SaveLoad):
...
def train(self, sentences, total_words=None, word_count=0, chunksize=100):
if FAST_VERSION < 0:
import warnings
warnings.warn("Cython compilation failed, training will be slow. Do you have Cython installed? `pip install cython`")
logger.info("training model with %i workers on %i vocabulary and %i features, "
"using 'skipgram'=%s 'hierarchical softmax'=%s 'subsample'=%s and 'negative sampling'=%s" %
(self.workers, len(self.vocab), self.layer1_size, self.sg, self.hs, self.sample, self.negative))
def worker_train():
...
if self.sg:
job_words = sum(train_sentence_topic(self, sentence, alpha, work) for sentence in job)
else:
ob_words = sum(train_sentence_cbow(self, sentence, alpha, work, neu1) for sentence in job)`
...
第三,我已经用 setup.py 编译了 TWE/gensim/models/word2vec_inner.pyx:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
import numpy
extensions = [
Extension("word2vec_inner", ["word2vec_inner.pyx"],
include_dirs=[numpy.get_include()])
]
setup(
name="word2vec_inner",
ext_modules=cythonize(extensions),
)
通过使用命令“python setup.py install”,我已经编译了 word2vec_inner.pyx。但出现以下错误:
E:\Python27\python.exe D:/pycharm/TWE/TWE1/train.py wordmap.txt model-final.tassign 100
import pyximport
models_dir
pyximport
word2vec_inner.c
e:\python27\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg : Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
C:\Users\hp\.pyxbld\temp.win32-2.7\Release\pyrex\gensim\models\word2vec_inner.c(15079) : warning C4244:
'initializing' : conversion from 'double' to 'float', possible loss of data
C:\Users\hp\.pyxbld\temp.win32-2.7\Release\pyrex\gensim\models\word2vec_inner.c(15085) : warning C4244 : 'initializing' : conversion from 'double' to 'float', possible loss of data
LINK : fatal error LNK1104: cannot open file 'C:\Users\hp\.pyxbld\lib.win32-2.7\gensim\models\word2vec_inner.pyd'
Training the word vector...
D:\pycharm\TWE\TWE1\gensim\models\word2vec.py:410: UserWarning: Cython compilation failed, training will be slow. Do you have Cython installed? `pip install cython`
warnings.warn("Cython compilation failed, training will be slow. Do you have Cython installed? `pip install cython`")
PROGRESS: at 100.00% words, alpha 0.02500, 2556 words/s
Training the topic vector...
D:\pycharm\TWE\TWE1\gensim\models\word2vec.py:882: UserWarning: Cython compilation failed, training will be slow. Do you have Cython installed? `pip install cython`
warnings.warn("Cython compilation failed, training will be slow. Do you have Cython installed? `pip install cython`")
Exception in thread Thread-23:
Traceback (most recent call last):
File "E:\Python27\lib\threading.py", line 801, in __bootstrap_inner
self.run()
File "E:\Python27\lib\threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "D:\pycharm\TWE\TWE1\gensim\models\word2vec.py", line 909, in worker_train
job_words = sum(train_sentence_topic(self, sentence, alpha, work) for sentence in job)
File "D:\pycharm\TWE\TWE1\gensim\models\word2vec.py", line 909, in <genexpr>
job_words = sum(train_sentence_topic(self, sentence, alpha, work) for sentence in job)
NameError: global name 'train_sentence_topic' is not defined
Saving the topic vectors...
Saving the word vectors...
Process finished with exit code 0
我检查了 .pyx 文件是否已正确编译并且还安装了 cython。总之,它无法从 gensim/models/word2vec_inner 或 gensim_addons/models/word2vec_inner 导入 train_sentence_sg、train_sentence_cbow、FAST_VERSION、train_sentence_topic。所以就出现了这些问题。但为什么?我已经在两个方向上正确编译了 .pyx 文件。任何人都可以帮助我吗?这个问题困扰了我好几天。请帮帮我,谢谢!