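In the paper on fastText for supervised classification, the authors specify various numbers of hidden units by changing some parameter (h is the one on pages 3 and 4; in Table 1 you see "It has 10 hidden units and we evaluate it with and without bigrams."). But after reading the documentation, there does not seem to be a "hidden units" parameter to change. Is there a way to specify the number of hidden units? Or is this the same as specifying the -dim option?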
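k is the number of classes. h, the number of hidden units, is the dimension of the text representation, which is what the -dim option sets.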
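From section 2.1 of https://arxiv.org/pdf/1607.01759v3.pdf:
More precisely, the computational complexity is O(kh), where k is the number of classes and h is the dimension of the text representation.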
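To make the roles of h and k concrete, here is a minimal pure-Python sketch of a fastText-style classifier forward pass, assuming the architecture described in the paper: the hidden layer is the average of the input word embeddings (size h, which corresponds to -dim), followed by a linear output layer with one row per class. The function name and the multiply-add counter are hypothetical illustrations, not fastText's API, and the sketch leaves out the softmax, n-gram hashing and training.

```python
import random


def fasttext_style_forward(token_ids, embeddings, output_weights):
    """Hypothetical forward pass of a fastText-style linear classifier.

    embeddings:     V rows of length h (h plays the role of -dim)
    output_weights: k rows of length h, one row per class
    Returns (scores, multiply_adds); multiply_adds counts only the work
    done by the output layer.
    """
    h = len(embeddings[0])
    # Hidden layer: average of the token embeddings -> vector of size h.
    hidden = [0.0] * h
    for t in token_ids:
        for j in range(h):
            hidden[j] += embeddings[t][j]
    hidden = [v / len(token_ids) for v in hidden]

    # Output layer: one length-h dot product per class -> k * h multiply-adds.
    scores = []
    multiply_adds = 0
    for row in output_weights:
        s = 0.0
        for j in range(h):
            s += row[j] * hidden[j]
            multiply_adds += 1
        scores.append(s)
    return scores, multiply_adds


if __name__ == "__main__":
    random.seed(0)
    vocab_size, h, k = 100, 10, 5  # h = 10 hidden units, as in Table 1 of the paper
    emb = [[random.uniform(-1, 1) for _ in range(h)] for _ in range(vocab_size)]
    out = [[random.uniform(-1, 1) for _ in range(h)] for _ in range(k)]
    scores, cost = fasttext_style_forward([3, 17, 42], emb, out)
    print(len(scores), cost)  # 5 50
```

The output-layer cost grows as k × h, which is the O(kh) term quoted above; raising -dim raises h.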
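A different k appears when predicting classes in text classification: there it is the number of labels to return. From the documentation:
The argument k is optional and equal to 1 by default. To obtain the k most likely labels for a piece of text, use: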
$ ./fasttext predict model.bin test.txt k
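As a rough illustration of what "the k most likely labels" means, the sketch below assumes we already have one raw score per label, converts the scores to probabilities with a softmax, and keeps the k best. The helper names and label strings are hypothetical; this is not fastText's implementation, which depends on the loss used for training.

```python
import heapq
import math


def softmax(scores):
    """Turn raw scores into probabilities (max-subtracted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_top_k(labels, scores, k=1):
    """Return the k most likely (label, probability) pairs, best first.

    k defaults to 1, mirroring the default described in the documentation.
    """
    probs = softmax(scores)
    return heapq.nlargest(k, zip(labels, probs), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical labels and scores for a single piece of text.
    labels = ["__label__baking", "__label__equipment", "__label__sauce"]
    scores = [2.0, 0.5, 1.0]
    for label, p in predict_top_k(labels, scores, k=2):
        print(label, round(p, 3))
```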
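When training the model, the number of classes is specified implicitly in the training data: supervised training uses the __label__* labels it contains.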
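Since the classes come from the labels in the training file, the number of classes k can be recovered by collecting the distinct label tokens. A minimal sketch, assuming the default __label__ prefix and whitespace-separated tokens; the function name and sample lines are made up for illustration.

```python
def count_classes(lines, prefix="__label__"):
    """Return the set of distinct labels found in fastText-style training lines.

    Assumes labels are whitespace-separated tokens starting with `prefix`.
    len(result) is the number of classes k.
    """
    labels = set()
    for line in lines:
        for token in line.split():
            if token.startswith(prefix):
                labels.add(token)
    return labels


if __name__ == "__main__":
    # Hypothetical sample lines in the same format as the tutorial data.
    sample = [
        "__label__baking __label__bread How do I get a crispier crust?",
        "__label__sauce Why does my hollandaise split?",
        "__label__baking Can I substitute butter for oil in muffins?",
    ]
    classes = count_classes(sample)
    print(len(classes), sorted(classes))
```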
From the example tutorial:
$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz && tar xvzf cooking.stackexchange.tar.gz
--2017-05-23 09:03:26-- https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz
Resolving s3-us-west-1.amazonaws.com... 54.231.236.45
Connecting to s3-us-west-1.amazonaws.com|54.231.236.45|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 457609 (447K) [application/x-gzip]
Saving to: ‘cooking.stackexchange.tar.gz.1’
cooking.stackexchange.tar.gz.1 100%[================================================================>] 446.88K 385KB/s in 1.2s
2017-05-23 09:03:28 (385 KB/s) - ‘cooking.stackexchange.tar.gz.1’ saved [457609/457609]
x cooking.stackexchange.id
x cooking.stackexchange.txt
x readme.txt
$ cat readme.txt
The data in this archive is derived from the user-contributed content on the
Cooking Stack Exchange website (https://cooking.stackexchange.com/), used under
CC-BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/).
The original data dump can be downloaded from:
https://archive.org/download/stackexchange/cooking.stackexchange.com.7z
and details about the dump obtained from:
https://archive.org/details/stackexchange
We distribute two files, under CC-BY-SA 3.0:
- cooking.stackexchange.txt, which contains all question titles and
their associated tags (one question per line, tags are prefixed by
the string "__label__") ;
- cooking.stackexchange.id, which contains the corresponding row IDs,
from the original data dump.
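The format described in the readme (one question per line, tags prefixed by "__label__") can be illustrated with a small parser that splits a line into its tags and its question title. This is a sketch, assuming whitespace-separated tokens and treating every prefixed token as a tag; the function name and the example line are hypothetical.

```python
def parse_line(line, prefix="__label__"):
    """Split one line of cooking.stackexchange.txt-style data into (tags, title).

    Every token starting with `prefix` is treated as a tag (prefix stripped);
    the remaining tokens are joined back into the question title.
    """
    tags, words = [], []
    for token in line.split():
        if token.startswith(prefix):
            tags.append(token[len(prefix):])
        else:
            words.append(token)
    return tags, " ".join(words)


if __name__ == "__main__":
    # Made-up example line; a question can carry several tags.
    tags, title = parse_line("__label__baking __label__bread How do I get a crispier crust?")
    print(tags, "|", title)
```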
Answered 2017-05-23T01:05:56.683