Questions tagged "unsupervised-learning"

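0 votes

2 answers

46 views

neural-network - How to identify points that move together in time-series data

I have a time series of points: I periodically fetch x and y coordinates from an API, and I want to work out which points are actually moving together, judging by their x and y coordinates. Can someone give me a starting point for this problem? Should I go with KMeans or with some supervised learning algorithm?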
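One possible starting point, sketched under assumptions the question does not state: every point is sampled at the same timestamps, and "moving together" means having nearly the same per-step displacement. The function names, the `max_gap` threshold and the toy tracks are all hypothetical. This is a simple pairwise grouping, not KMeans.

```python
from itertools import combinations
from math import hypot


def displacements(track):
    """Per-step (dx, dy) for one point's list of (x, y) samples."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def mean_step_gap(track_a, track_b):
    """Average distance between the displacement vectors of two points."""
    da, db = displacements(track_a), displacements(track_b)
    gaps = [hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(da, db)]
    return sum(gaps) / len(gaps)


def co_moving_groups(tracks, max_gap=0.5):
    """Group point ids whose motion differs by at most max_gap per step."""
    parent = {pid: pid for pid in tracks}

    def find(pid):
        # Union-find root lookup with path halving.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    for a, b in combinations(tracks, 2):
        if mean_step_gap(tracks[a], tracks[b]) <= max_gap:
            parent[find(a)] = find(b)

    groups = {}
    for pid in tracks:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data: "a" and "b" drift right together, "c" moves down.
tracks = {
    "a": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "b": [(5, 5), (6, 5), (7, 5), (8, 5)],
    "c": [(0, 9), (0, 8), (0, 7), (0, 6)],
}
groups = co_moving_groups(tracks)
```

Comparing displacements rather than raw positions lets two points that are far apart still count as moving together; whether that matches the intended meaning depends on the application.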

2015-04-08T16:42:15.623

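0 votes

2 answers

13797 views

nlp - What is distant supervision?

As I understand it, distant supervision is the process of specifying the concept that the individual words of a passage (usually a sentence) are trying to convey.

For example, a database maintains the structured relation concerns(NLP, this sentence).

Our distant supervision system would take the following sentence as input: "This is a sentence about NLP."

Based on this sentence it would recognize the entities NLP and this sentence, since, as a preprocessing step, the sentence would have been passed through a named-entity recognizer.

Since our database records that NLP and this sentence are related through the key concern(s), it would identify the input sentence as expressing the relation Concerns(NLP, this sentence).

My question is twofold:

1) What is the use of that? Is it that our system might later see a sentence "in the wild", such as That sentence is about OPP, realize that it has seen something similar before, and thereby infer a new relation such as concerns(OPP, that sentence), based only on the words / individual tokens?

2) Does it take the actual words of the sentence into account? For example, does it consider the verb "is" and the adverb "about", and realize (through WordNet or some other hyponymy system) that these are somehow similar to the higher-order concept "concerns"?

Does anyone have code for building a distant supervision system that I could look at, i.e. a system that cross-references a KB (such as Freebase) with a corpus (such as the NYTimes) and produces a distant supervision database? I think that would go a long way toward clarifying my understanding of distant supervision.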
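A minimal sketch of the cross-referencing step, not a real Freebase/NYTimes pipeline. The tiny `kb` and `corpus` below are toy stand-ins, plain substring matching replaces a named-entity recognizer, and `label_sentences` is a hypothetical helper name. It shows how a sentence mentioning both entities of a known pair gets labeled with that pair's relation, with the words between the entities kept as features for a later classifier.

```python
# Toy knowledge base: (entity1, entity2) -> relation. Stand-in for a real KB.
kb = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Larry Page"): "founded_by",
}

# Toy corpus. Stand-in for a large unlabeled text collection.
corpus = [
    "Barack Obama was born in Honolulu in 1961.",
    "Larry Page started Google with Sergey Brin.",
    "Honolulu is warm.",
]


def label_sentences(kb, corpus):
    """Emit a noisy training example for each sentence containing a KB pair."""
    examples = []
    for sentence in corpus:
        for (e1, e2), relation in kb.items():
            i, j = sentence.find(e1), sentence.find(e2)
            if i == -1 or j == -1:
                continue  # both entities must be mentioned
            # Words between the two mentions become features.
            if i < j:
                start, end = i + len(e1), j
            else:
                start, end = j + len(e2), i
            examples.append({
                "relation": relation,
                "entities": (e1, e2),
                "between": sentence[start:end].split(),
                "sentence": sentence,
            })
    return examples


examples = label_sentences(kb, corpus)
```

A classifier trained on features such as `between` is what would let a system generalize to entity pairs missing from the KB. That bears on both parts of the question: the labels come from the KB, while the features come from the sentence's words.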

nlp stanford-nlp supervised-learning unsupervised-learning

2015-04-11T08:29:40.360

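0 votes

1 answer

179 views

machine-learning - Unbiased prediction of cluster labels

I am interested in assessing how predictable the cluster labels found by unsupervised clustering are. Suppose I have a dataset of patients, and I use an unsupervised clustering technique to group them by their gene expression profiles. My method finds 4 clusters. Now I want to know whether this cluster membership can be predicted from the expression data. Using the cluster labels from unsupervised clustering on the full data as the output variable, I train a supervised classifier in a cross-validated fashion: I train the classifier on 80% of the data and evaluate accuracy on the remaining 20%.

Is this approach biased, because the output cluster labels were learned from the full data? If so, how can I do this in an unbiased way? If I do the clustering in a cross-validated fashion as well, I think I would have to manually associate the clusters across the different folds. Since I am particularly interested in how predictable one of the four clusters is versus the others, I would have to work out, through some manual analysis, which cluster in each fold corresponds to that cluster.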
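If the clustering is rerun inside each fold, the manual matching step could in principle be automated by aligning each fold's centroids to a reference set. The sketch below assumes a centroid-based clustering and uses made-up 2-D centroids; `match_clusters` is a hypothetical helper. It tries every label permutation, which is fine for 4 clusters (24 permutations) but not for large cluster counts.

```python
from itertools import permutations
from math import dist


def match_clusters(reference, fold):
    """Map each fold cluster index to the reference cluster index that
    minimizes the total centroid-to-centroid distance."""
    k = len(reference)
    best = min(
        permutations(range(k)),
        key=lambda perm: sum(dist(fold[i], reference[perm[i]]) for i in range(k)),
    )
    return dict(enumerate(best))


# Hypothetical centroids: same four clusters, labels shuffled in the fold.
reference = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.0)]
fold = [(5.1, 4.9), (0.2, -0.1), (4.8, 0.3), (0.1, 5.2)]
mapping = match_clusters(reference, fold)
```

This only addresses the label bookkeeping. Whether per-fold clusters are stable enough to be matched at all is a separate question that the matching itself cannot answer.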

machine-learning cluster-analysis prediction supervised-learning unsupervised-learning

2015-04-15T09:02:06.830

0 votes

1 answer

941 views

machine-learning - Calculating similarity between two profiles based on the number of common features

I am working on a clustering problem over social network profiles. Each profile document is represented by the number of times each term of interest occurs in the profile description. To cluster effectively, I am trying to find the right similarity measure (or distance function) between two profiles.

So let's say I have the following table of profiles

Now, calculating the Euclidean distance, I get

This is fine, but two questions come to mind.

Here we are disregarding the number of features the profiles have in common. For example, even though profile 1 and profile 3 are nearest, human intuition says that profile 1 and profile 2 at least have some value in all three interests (basketball, cricket and python), so these two profiles are likely to be more similar than profile 1 and profile 3, where one of them (profile 3) does not mention python at all. I also don't want to use just the count of shared features as the distance, which would surely yield wrong results.

My first question: is there any established way to accommodate this intuition?

My second question: some profile authors are more verbose than others. How do I adjust for this? A verbose author with 4 occurrences of python may be equivalent to a less verbose author with 2 occurrences.

I was not able to come up with a good title for the question, so sorry if it's confusing.
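One established measure that touches both concerns is cosine similarity, which compares the direction of the count vectors and ignores their length. The sketch below uses invented term counts, since the question's table did not survive, so the numbers are purely illustrative. It contrasts cosine similarity with Euclidean distance.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two sparse term-count dicts."""
    terms = a.keys() | b.keys()
    return sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in terms))


def cosine(a, b):
    """Cosine similarity between two sparse term-count dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical profiles: same interests at different verbosity,
# and one profile that never mentions python.
verbose = {"basketball": 8, "cricket": 4, "python": 8}
terse = {"basketball": 2, "cricket": 1, "python": 2}
no_python = {"basketball": 8, "cricket": 4}

closer_by_euclidean = min(
    ("terse", "no_python"),
    key=lambda name: euclidean(verbose, {"terse": terse, "no_python": no_python}[name]),
)
closer_by_cosine = max(
    ("terse", "no_python"),
    key=lambda name: cosine(verbose, {"terse": terse, "no_python": no_python}[name]),
)
```

With these made-up counts, Euclidean distance puts the verbose profile nearer to the one missing python, while cosine similarity treats the verbose and terse profiles as having the same direction. Other data could behave differently, so this is an illustration rather than a guarantee.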

machine-learning cluster-analysis similarity unsupervised-learning

2015-05-04T07:19:29.967

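0 votes

1 answer

1943 views

r - Interpreting self-organizing map visualization results

Using the R kohonen package, I obtained a "codes" plot that shows the codebook vectors.

[Figure: codes plot]

Shouldn't the codebook vectors of neighbouring nodes be similar? Why are the first two nodes on the left so different?

Is there a way to arrange it in a meaningful organization, as in the image below (from a linked source), where countries with high poverty are clustered at the bottom?

[Figure: world poverty map]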

r machine-learning cluster-analysis som unsupervised-learning

2015-05-21T08:37:47.480

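0 votes

1 answer

481 views

image-processing - Implementing convolutional sparse coding in a deep network framework

I want to implement a convolutional sparse coding procedure similar to the one described in this paper: http://cs.nyu.edu/~ylan/files/publi/koray-nips-10.pdf. I have tried different frameworks (caffe, eblearn, torch), but they all seem to lack tutorials or support for unsupervised feature learning procedures like this one. The authors say this particular paper was done with eblearn, but I could not find any unsupervised learning procedures there. Has anyone tried to implement these algorithms, and if so, which libraries or frameworks did you use? Thanks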

image-processing machine-learning feature-detection deep-learning unsupervised-learning

2015-06-12T22:41:47.567

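0 votes

1 answer

2477 views

machine-learning - Hidden Markov models: can accuracy decrease as the number of states increases?

I built several hidden Markov models with an increasing number of states using the Baum-Welch algorithm. I noticed that the validation score drops once the model has more than 8 states. So I am wondering whether the accuracy of a hidden Markov model can decrease as the number of states increases, due to some kind of overfitting?

Thanks in advance!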
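To make the overfitting concern concrete, here is a small sketch that counts the free parameters of a discrete-emission HMM and defines the standard BIC penalty. The question does not say what the observations are, so the 10-symbol alphabet is an assumption, and the helper names are hypothetical.

```python
from math import log


def hmm_free_parameters(n_states, n_symbols):
    """Free parameters of a discrete HMM (each probability row sums to 1)."""
    initial = n_states - 1                     # initial state distribution
    transitions = n_states * (n_states - 1)    # transition matrix rows
    emissions = n_states * (n_symbols - 1)     # emission matrix rows
    return initial + transitions + emissions


def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion; lower is better."""
    return n_params * log(n_observations) - 2.0 * log_likelihood


# Assumed alphabet of 10 symbols, purely for illustration.
param_counts = {n: hmm_free_parameters(n, n_symbols=10) for n in (2, 4, 8, 16)}
```

The transition term grows quadratically with the number of states, so with a fixed amount of training data, a falling validation score at larger state counts is at least consistent with overfitting.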

machine-learning hidden-markov-models markov unsupervised-learning markov-models

2015-07-07T12:37:48.297

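0 votes

2 answers

927 views

machine-learning - How to do hierarchical clustering with a large similarity matrix

I have about 50K data items with values that can range from 0 to 10. I want to apply HAC to cluster this data, but to apply HAC I need to prepare an N*N similarity matrix.

For N = 50K, this matrix is too large to hold in memory, even if I use short.

Is there a way to do HAC in batches, or any other method that would let me apply HAC to 50K data points? I plan to implement this in Java.

I am also worried about the total time it will take; any pointers on this would be very helpful.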
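A quick back-of-the-envelope sketch of the memory involved, written in Python for brevity even though the question targets Java. It assumes the matrix is symmetric, so only the upper triangle needs storing, and, as a further assumption, that each similarity fits in one byte. The helper names are hypothetical.

```python
def condensed_size(n):
    """Number of entries in the strict upper triangle of an n x n matrix."""
    return n * (n - 1) // 2


def condensed_index(i, j, n):
    """Position of pair (i, j), i != j, in a flat upper-triangle array."""
    if i == j:
        raise ValueError("diagonal entries are not stored")
    if i > j:
        i, j = j, i
    return i * n - i * (i + 1) // 2 + (j - i - 1)


n = 50_000
full_short_bytes = n * n * 2                 # full matrix, 2-byte entries
tri_short_bytes = condensed_size(n) * 2      # upper triangle, 2-byte entries
tri_byte_bytes = condensed_size(n)           # upper triangle, 1-byte entries
```

Under these assumptions the storage drops from about 5 GB to about 2.5 GB, or about 1.25 GB with one-byte entries. This does nothing for running time, which for standard HAC grows at least quadratically with N.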

machine-learning hierarchical-clustering unsupervised-learning

2015-07-27T14:28:01.263

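0 votes

2 answers

12627 views

machine-learning - Is a train/test split necessary or useful in unsupervised learning?

In supervised learning, I use the typical train/test split when fitting an algorithm, e.g. for regression or classification. My question about unsupervised learning is: is a train/test split necessary and useful there? If so, why?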

machine-learning unsupervised-learning

2015-07-28T10:14:16.660

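0 votes

1 answer

220 views

machine-learning - Markov chain: likelihood of a sample with "unseen" observations (probability 0)

I have a large Markov chain and a sample whose likelihood I want to compute. The problem is that some observations or transitions in the sample do not occur in the Markov chain, which makes the total likelihood 0 (or the log-likelihood negative infinity). It is not possible to use more data to build the Markov chain. I am wondering whether there is a way to still get a meaningful likelihood.

I have already tried filtering these "unknown" observations out of the sample and reporting them separately. But the problem is that I want to compare the likelihood of the sample with the likelihood of the same sample after a transformation. The transformed sample has a different number of "unknown" observations, so I don't think I can compare the two likelihoods, because they are computed from different numbers of observations.

Is there a way to still compute a meaningful, comparable likelihood? I was thinking of averaging the probabilities of the observations in the sample, but I could not find any information on whether that is correct.

Thanks in advance!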
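One common way to avoid zero probabilities is additive (Laplace) smoothing of the transition counts, combined with a per-transition average of the log-probabilities so that samples of different lengths can be compared. The sketch below assumes a first-order chain over a known, finite state set. The training sequence, the state set, `alpha` and the function names are all hypothetical.

```python
from collections import Counter
from math import log


def fit_counts(sequence):
    """Count observed first-order transitions (src, dst)."""
    return Counter(zip(sequence, sequence[1:]))


def avg_log_likelihood(sample, counts, states, alpha=1.0):
    """Mean log-probability per transition under additive smoothing."""
    row_totals = Counter()
    for (src, _dst), c in counts.items():
        row_totals[src] += c
    k = len(states)
    transitions = list(zip(sample, sample[1:]))
    total = 0.0
    for src, dst in transitions:
        # Unseen transitions get a small non-zero probability.
        p = (counts[(src, dst)] + alpha) / (row_totals[src] + alpha * k)
        total += log(p)
    return total / len(transitions)


# Hypothetical training data; state "D" never appears in it.
counts = fit_counts(list("ABABABAC"))
states = {"A", "B", "C", "D"}

seen = avg_log_likelihood(list("ABAB"), counts, states)
unseen = avg_log_likelihood(list("ADAB"), counts, states)
```

Here the sample containing the unseen state `D` still gets a finite score, lower than that of the sample made only of observed transitions. Whether a per-transition average is the right basis for comparing an original and a transformed sample depends on what the comparison is meant to show.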

machine-learning markov-chains markov unsupervised-learning markov-models

2015-08-14T10:34:05.927
