
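In the paper on fastText for supervised classification, the authors report various numbers of hidden units, set by changing some parameter (h is the one on pages 3 and 4; in Table 1 you will see "It has 10 hidden units and we evaluate it with and without bigrams."). After reading the documentation, though, there does not seem to be a "hidden units" parameter to change. Is there a way to specify the number of hidden units? Or is that the same as specifying the -dim option?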


1 Answer


k is the number of classes.

From Section 2.1 of https://arxiv.org/pdf/1607.01759v3.pdf:

More precisely, the computational complexity is O(kh), where k is the number of classes and h the dimension of the text representation.
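To make the O(kh) figure concrete, here is a minimal sketch of a plain linear output layer that scores k classes from an h-dimensional text representation. It is not fastText's actual implementation; the function name, the random weights, and the sizes k = 5 and h = 10 are assumptions chosen for illustration. It also assumes that h corresponds to the vector size set with -dim, which matches the paper's description of h as the dimension of the text representation but is not confirmed by the quoted passage.

```python
import random


def score_classes(hidden, weights):
    """Score every class with a dot product and count the multiply-adds.

    hidden  -- list of h floats (the text representation)
    weights -- k rows of h floats each (one row per class; hypothetical values)
    """
    scores = []
    multiply_adds = 0
    for row in weights:                 # one pass per class: k iterations
        total = 0.0
        for w, x in zip(row, hidden):   # one step per dimension: h iterations
            total += w * x
            multiply_adds += 1
        scores.append(total)
    return scores, multiply_adds


k, h = 5, 10  # assumed sizes: 5 classes, 10-dimensional representation
random.seed(0)
hidden = [random.random() for _ in range(h)]
weights = [[random.random() for _ in range(h)] for _ in range(k)]

scores, ops = score_classes(hidden, weights)
print(len(scores), ops)  # one score per class, k * h multiply-adds
```

Doubling either the number of classes or the representation size doubles the work, which is what the O(kh) bound expresses.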


When predicting classes in text classification, from the documentation:

The argument k is optional, and equal to 1 by default. In order to obtain the k most likely labels for a piece of text, use:

$ ./fasttext predict model.bin test.txt k


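When training the model, the number of classes is specified implicitly in the training data, through the __label__* labels used for supervised training.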
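As a rough illustration of how the number of classes follows from the training data, the sketch below collects the distinct tokens that start with "__label__". The function name and the three sample lines are made up for this example; they only imitate the one-question-per-line format described in the tutorial's readme.

```python
def collect_labels(lines, prefix="__label__"):
    """Return the set of distinct label tokens found in the given lines."""
    labels = set()
    for line in lines:
        for token in line.split():
            if token.startswith(prefix):
                labels.add(token)
    return labels


# Hypothetical sample lines in the "__label__<tag> ... title" format.
sample = [
    "__label__sauce __label__cheese How do I keep a cheese sauce smooth?",
    "__label__baking __label__bread Why did my loaf not rise?",
    "__label__baking __label__food-safety Is raw cookie dough safe to eat?",
]

labels = collect_labels(sample)
print(len(labels))  # number of distinct labels, i.e. the implied class count
```

To run it on a real file, pass an open file handle instead of the list, for example the cooking.stackexchange.txt file from the archive below.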

Example from the tutorial:

$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz && tar xvzf cooking.stackexchange.tar.gz
--2017-05-23 09:03:26--  https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz
Resolving s3-us-west-1.amazonaws.com... 54.231.236.45
Connecting to s3-us-west-1.amazonaws.com|54.231.236.45|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 457609 (447K) [application/x-gzip]
Saving to: ‘cooking.stackexchange.tar.gz.1’

cooking.stackexchange.tar.gz.1      100%[================================================================>] 446.88K   385KB/s    in 1.2s    

2017-05-23 09:03:28 (385 KB/s) - ‘cooking.stackexchange.tar.gz.1’ saved [457609/457609]

x cooking.stackexchange.id
x cooking.stackexchange.txt
x readme.txt


$ cat readme.txt 
The data in this archive is derived from the user-contributed content on the
Cooking Stack Exchange website (https://cooking.stackexchange.com/), used under
CC-BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/).

The original data dump can be downloaded from:
https://archive.org/download/stackexchange/cooking.stackexchange.com.7z
and details about the dump obtained from:
https://archive.org/details/stackexchange

We distribute two files, under CC-BY-SA 3.0:

 - cooking.stackexchange.txt, which contains all question titles and
   their associated tags (one question per line, tags are prefixed by
   the string "__label__") ;

 - cooking.stackexchange.id, which contains the corresponding row IDs,
   from the original data dump.
answered 2017-05-23T01:05:56.683