
1) My data are of biological origin, collected over a period of 120 s from a
 subject who receives, each time, one of three possible stimuli (response label 1
 to 3) in random order, one stimulus per second (trial). The sampling
 frequency is 256 Hz and there are 61 different sensors (input variables). So, my
 dataset has 120x256 rows and 62 columns (1 response label + 61 input
 variables).
2) My goal is to determine whether there is an underlying pattern for each stimulus.
 For that, I would like to use deep neural networks to test my
 hypothesis, but not in the conventional way (that is, not to predict the
 stimulus from a single observation/row).
3) My approach is to shuffle the whole dataset by row (to avoid any time
 bias), split it into training and validation sets (50/50), and then run the
 deep learning algorithm. The split does not separate trial events (120), so
 both the training and validation sets should contain data (rows) from the
 same trials (but never the same row). If there is a dominant pattern per
 stimulus, the validation confusion matrix error should be low. If there is a
 dominant pattern per trial, the validation confusion matrix error should be
 high. So, the validation confusion matrix error is my indicator of the
 presence of a hidden pattern per stimulus.
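
The split described in 3) could be sketched roughly as follows. This is only an illustration working on row indices, not the actual recordings; it assumes the rows are stored in time order, so that row i belongs to trial i // 256. The names `row_wise_split` and `trials_covered` are made up for this example.

```python
import random

N_TRIALS = 120
ROWS_PER_TRIAL = 256  # 1 s per trial at 256 Hz


def row_wise_split(n_rows, seed=0):
    """Shuffle row indices and split them 50/50 into training and validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    half = n_rows // 2
    return indices[:half], indices[half:]


def trials_covered(row_indices, rows_per_trial=ROWS_PER_TRIAL):
    """Return the set of trial numbers that contribute at least one row.

    Assumption: rows are in time order, so row i belongs to trial
    i // rows_per_trial.
    """
    return {i // rows_per_trial for i in row_indices}


train_rows, val_rows = row_wise_split(N_TRIALS * ROWS_PER_TRIAL)

# Trials that have rows in both the training and the validation set.
shared_trials = trials_covered(train_rows) & trials_covered(val_rows)

print(len(train_rows), len(val_rows), len(shared_trials))
```

No row lands in both sets, but with a row-wise shuffle every trial ends up contributing rows to both, which is the situation described above.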




1 Answer



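If a deep neural network, an SVM, or any other classifier classifies better than chance, it means that:

  1. There is information (a pattern) in the training set samples about each predicted class
  2. And that pattern is learnable by the classifier
  3. And that information is not specific to the training set (no overfitting)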

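So, if classification performance is above chance, the 3 conditions above are true. If it is not, one or more of the conditions may be false. The training variables might not contain any information that helps predict the classes. Or the predictor variables are well chosen, but the relationship between them and the classes is too complex for the classifier to learn. Or the classifier overfit, and performance on the CV set is at chance level or worse.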
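
One way to make "above chance" concrete is a label permutation test on the validation predictions. The sketch below is an illustration only, not something prescribed above: it compares the observed accuracy with the accuracies obtained after shuffling the true labels. The function names and the toy label lists are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical validation labels for 3 balanced stimulus classes.
labels = [1, 2, 3] * 30
perfect = list(labels)          # a classifier that found the pattern
constant = [1] * len(labels)    # a classifier that always answers class 1

p_perfect = permutation_p_value(labels, perfect)
p_constant = permutation_p_value(labels, constant)

print(accuracy(labels, perfect), p_perfect)
print(accuracy(labels, constant), p_constant)
```

A small p-value suggests the accuracy is unlikely to come from chance alone; a p-value near 1 means the classifier does no better than shuffled labels.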

Here is a paper (open access) that uses similar logic to argue that fMRI activity contains information about the images a person is viewing:


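Note: depending on the classifier used (especially with DNNs, less so with decision trees), this will only tell you whether a pattern exists; it will not tell you what that pattern is.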

Answered 2016-04-19T02:13:23.077