Questions tagged [imputation]

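0 votes

1 answer

1070 views

python - Does fancyimpute's SoftImpute require normalized data?

The page https://pypi.python.org/pypi/fancyimpute has a line

which suggests that I need to normalize the input data. However, I could not find any details online about what exactly this means. Do I have to normalize my data beforehand, and what exactly is expected?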
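One possible reading of "normalize" here is to center and scale each column using only its observed entries before matrix completion, and to undo the scaling afterwards. The following is a minimal NumPy sketch under that assumption. `standardize_observed` and `unstandardize` are hypothetical helper names, not part of fancyimpute, and the imputation call itself is left out.

```python
import numpy as np

def standardize_observed(X):
    """Center and scale each column using only its observed (non-NaN) entries."""
    X = np.asarray(X, dtype=float)
    means = np.nanmean(X, axis=0)
    stds = np.nanstd(X, axis=0)
    stds[stds == 0] = 1.0  # avoid division by zero for constant columns
    return (X - means) / stds, means, stds

def unstandardize(Z, means, stds):
    """Map standardized values back to the original scale."""
    return Z * stds + means

# Hypothetical data: NaN marks a missing entry.
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [np.nan, 400.0],
              [3.0, 600.0]])

Z, means, stds = standardize_observed(X)
# Z still has NaN where X was missing. The idea would be to pass Z to the
# imputer and then call unstandardize(...) on the completed matrix.
```

Whether SoftImpute actually needs this step depends on the library version and its documentation, so treat the sketch as an illustration of the preprocessing idea only.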

2017-02-08T14:31:12.997

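0 votes

0 answers

1164 views

r - Testing for MCAR with TestMCARNormality

I have a data frame (dfNA) in which each column contains a small amount of missing data. This data frame is a subset of a larger data frame (wideRawDF) whose missing values I want to impute.

To impute the data, I need to determine whether the data are missing MCAR/NMAR/MAR so that I can apply the right imputation method.

colsNA is a string of the columns that contain NA values; it is derived as follows:

To simplify things and to better understand why I get an error when using TestMCARNormality, I decided to pass only the columns with NA values, not the columns with complete values.

I subset wideRawDF as follows:

TestMCARNormality is a function that tests whether missing data are MCAR.

When I pass dfNA to this function, I get the following error:

I can't figure out what the error refers to, because my data frame does have missing values:

My data frame also contains numeric data:

I searched for the error and found the source code on this page, but I'm not a strong programmer and have trouble understanding it. Any help making sense of this would be greatly appreciated.

Below is the dput() output of the files I'm working with.

wideRawDF: the original DF, whose columns contain both missing and complete values

colsNA: a string of the columns that contain NA values

dfNA: the subset DF made up of the columns that contain NA values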

r missing-data imputation

2017-02-08T22:30:35.313

0 votes

3 answers

7279 views

r - 'R', 'mice', missing variable imputation - how to only do one column in sparse matrix

I have a matrix that is half-sparse. Half of all cells are blank (na) so when I try to run the 'mice' it tries to work on all of them. I'm only interested in a subset.

Question: In the following code, how do I make "mice" only operate on the first two columns? Is there a clean way to do this using row-lag or row-lead, so that the content of the previous row can help patch holes in the current row?

Places that I have looked for answers:

help document (link)
google of course...
https://stats.stackexchange.com/questions/99334/fast-missing-data-imputation-in-r-for-big-data-that-is-more-sophisticated-than-s

r imputation r-mice

2017-02-10T14:00:24.363

0 votes

2 answers

2064 views

r - How to fill missing values with median imputation across all columns, by customer ID, for panel data in R?
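The idea in the title (fill each missing value with the median of the same column for the same customer) can be sketched as follows. This is in Python with pandas rather than R, and the column names `customer_id`, `spend` and `visits` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: several rows per customer, NaN marks a missing value.
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2],
    "spend":  [10.0, np.nan, 30.0, 5.0, 7.0, np.nan],
    "visits": [1.0, 3.0, np.nan, np.nan, 4.0, 8.0],
})

value_cols = [c for c in df.columns if c != "customer_id"]

# Per-customer median of each column (NaN is skipped), broadcast back
# to the shape of the original rows.
medians = df.groupby("customer_id")[value_cols].transform("median")

filled = df.copy()
filled[value_cols] = df[value_cols].fillna(medians)
```

If a customer has no observed value at all in some column, the group median is itself NaN and the gap stays unfilled, so a fallback (for example the overall column median) would still be needed in that case.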

r panel median imputation

2017-02-15T19:28:14.337

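0 votes

3 answers

263 views

r - How to replace consecutive NAs with zeros given a maximum gap parameter (in R)

I want to replace all consecutive NA values in each row with zeros, but only if the number of consecutive NAs is less than the parameter maxgap.

This is very similar to the function zoo::na.locf

which gives

[1] NA 1 2 3 3 3 5 6 7 NA NA NA

Two things differ from my goal: I also want to replace the leading NA, and I want to replace the 2 consecutive NAs with 0 rather than with the last non-NA value.

I would like to get

0 1 2 3 0 0 5 6 7 NA NA NA

How can I do this in R? Can I use functions from the tidyverse?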
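The run-length logic behind this can be sketched independently of R. The version below is plain Python, not a tidyverse solution. The function name `fill_short_na_runs` and the input vector are hypothetical (the input is chosen to be consistent with the outputs quoted above), and treating runs of up to and including `maxgap` as fillable is an assumption; change `<=` to `<` for a strict bound.

```python
import math

def _is_na(x):
    """True for None or a float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def fill_short_na_runs(values, maxgap, fill=0):
    """Replace each run of consecutive missing values with `fill` when the
    run has at most `maxgap` elements; longer runs are left untouched."""
    out = list(values)
    i, n = 0, len(out)
    while i < n:
        if not _is_na(out[i]):
            i += 1
            continue
        j = i
        while j < n and _is_na(out[j]):  # find the end of this NA run
            j += 1
        if j - i <= maxgap:              # assumption: inclusive bound
            out[i:j] = [fill] * (j - i)
        i = j
    return out

nan = float("nan")
x = [nan, 1, 2, 3, nan, nan, 5, 6, 7, nan, nan, nan]
result = fill_short_na_runs(x, maxgap=2)
```

Leading runs are handled like any other run, so the single leading missing value is filled as well, while the trailing run of three stays missing.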

r na imputation tidyverse

2017-02-17T14:03:19.060

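0 votes

0 answers

405 views

imputation - Stata multiple imputation: mi impute chained

I have been trying to run multiple imputation in Stata on a large dataset (600k rows), but I keep getting errors that I cannot explain. I have also tried different approaches, but I always run into some problem. I hope you can help me; I am fairly new to multiple imputation. Sorry for the German variable names, but I don't think they will be a big problem.

Error:

Imputing m=1 through m=5: matsize too small. You have attempted to create a matrix with too many rows or columns or attempted to fit a model with too many variables. You need to increase matsize; it is currently 400. Use set matsize; see help matsize.

I increased matsize (450 and higher); still the same error.

If you are using factor variables and included an interaction that has lots of missing cells, either increase matsize or set emptycells drop to reduce the required matrix size; see help set emptycells.

I used set emptycells drop; still the same error.

If you are using factor variables, you might have accidentally treated a continuous variable as categorical, resulting in lots of categories. Use the c. operator on such variables. Error occurred during imputation of leistungsfähig Rehadauer1 sb_n bb_n sa_n Bewilligungsdiagnosegruppen1 Berufstellung Arbeitsunf Erwerbst Berufsgrkl famstand1 Rehaart ORT on m = 1

The thing is, Rehadauer1 is the only variable that is not categorical, so do I have to write c.Rehadauer1, or what does this mean?

Another approach, with mi and ice ():

The problem here is that, when used without nopp, it always throws the error "perfect prediction detected".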

imputation

2017-03-02T16:16:48.793

0 votes

1 answer

1949 views

python - How does knnimpute work?

From https://stackoverflow.com/a/35684975/4533188 I got that K-Nearest Neighbour Imputation works like this:

1. For the current observation get the distance to all the other observations.
2. For each missing value in the current observation, consider all those k nearest observations that have no missing value in the feature in question.
3. From those feature values of those observations: Calculate the mean (or some similar statistic) - this is the value which is used for the imputation.

The key step is 1: How do we calculate the distance if not all values are available? The post above points towards the Heterogeneous Euclidean-Overlap Metric. However I am interested in the implementation of knn-imputation of fancyimpute. I tracked it back to https://github.com/hammerlab/knnimpute, more specifically https://github.com/hammerlab/knnimpute/blob/master/knnimpute/few_observed_entries.py and I looked at the code. However I am not able to figure out how it works.

Can someone please explain to me how knnimpute works there? How does the distance calculation work here?
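To make the three steps concrete, here is a minimal NumPy sketch of k-nearest-neighbour imputation in which the distance between two rows is the mean squared difference over the features both rows have observed. That distance is one plausible choice and an assumption of this sketch. It is not taken from the knnimpute source and it is not the Heterogeneous Euclidean-Overlap Metric. The function name `knn_impute` is hypothetical.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs with the mean of the k nearest rows that observe the feature."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    n = X.shape[0]

    # Step 1: pairwise distances, using only features observed in both rows.
    D = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = observed[i] & observed[j]
            if shared.any():
                D[i, j] = np.mean((X[i, shared] - X[j, shared]) ** 2)

    filled = X.copy()
    for i, f in zip(*np.where(~observed)):
        # Step 2: candidate neighbours observe feature f and are comparable to row i.
        candidates = np.where(observed[:, f] & np.isfinite(D[i]))[0]
        if candidates.size == 0:
            continue  # nothing to borrow from; leave the entry missing
        nearest = candidates[np.argsort(D[i, candidates])[:k]]
        # Step 3: impute with the mean of the neighbours' values.
        filled[i, f] = X[nearest, f].mean()
    return filled

# Hypothetical data: row 0 is missing its third feature.
X = np.array([[1.0, 2.0, np.nan],
              [1.1, 2.1, 3.0],
              [0.9, 1.9, 5.0],
              [10.0, 10.0, 100.0]])
filled = knn_impute(X, k=2)
```

In this toy example rows 1 and 2 are far closer to row 0 than row 3 is, so the missing entry is filled with the mean of their third-feature values. Averaging over the shared features, rather than summing, keeps distances comparable between row pairs that share different numbers of observed features.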

python machine-learning knn imputation

2017-03-02T19:48:22.423

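0 votes

1 answer

139 views

missing-data - Entropy (uncertainty) of an imputed table with missing data?

Hi all, I have incomplete data (A.Original), where - represents a missing value. The imputed version of table A is shown in table B.Imputed. In table B, A(4,3) denotes the probability that A has a non-missing value.
My problem is measuring the entropy (uncertainty) of table B.
How can we measure the entropy (uncertainty) of table B? Is there any method for this?
Any suggestions and criticism are much appreciated :)
Thanks.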

missing-data entropy information-theory imputation

2017-03-02T21:14:52.970

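0 votes

1 answer

1489 views

r - Input data must have class mids

I'm working on a school project in which I need to impute missing data. After imputing with mice, I am trying to generate the completed datasets with the complete function.

When I run the calls one by one, everything works, but I want to use a for loop in case I want more than m = 5 imputations. Now, whenever I try to run the for loop, I always get the error

Error in complete (imputation[1]): Input data must have class "mids".

But when I check the class, it is mids. What is going wrong here?

Here is my code:

Can someone help me?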

r missing-data imputation r-mice

2017-03-08T11:11:11.977

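0 votes

0 answers

83 views

r - Missing values in prediction

I have a simple linear regression with missing values in the training set. I have found many good imputation methods for estimating a good model in this situation.

What I cannot find is how to use the model to predict a new point when that point may also have missing values. Is there a general approach?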
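One simple option, offered here only as an illustration and not as a general answer, is to reuse the training-time imputation at prediction time: keep the fill values learned on the training set and apply them to the new point before calling the model. The sketch below uses Python with NumPy rather than R, mean imputation, and made-up data. `impute` and `add_intercept` are hypothetical helper names.

```python
import numpy as np

# Hypothetical training data with missing predictor values.
X_train = np.array([[1.0, 2.0],
                    [2.0, np.nan],
                    [3.0, 6.0],
                    [np.nan, 8.0]])
y_train = np.array([5.0, 8.0, 15.0, 20.0])

# Fill values are learned on the training set only.
col_means = np.nanmean(X_train, axis=0)

def impute(X, fill_values):
    """Return a copy of X with each NaN replaced by its column's fill value."""
    X = np.array(X, dtype=float)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill_values[cols]
    return X

def add_intercept(X):
    """Prepend a column of ones for the intercept term."""
    return np.column_stack([np.ones(len(X)), X])

# Ordinary least squares on the imputed training data.
coef, *_ = np.linalg.lstsq(add_intercept(impute(X_train, col_means)),
                           y_train, rcond=None)

# A new point with a missing predictor goes through the same imputation.
x_new = np.array([[2.5, np.nan]])
x_new_filled = impute(x_new, col_means)
y_pred = add_intercept(x_new_filled) @ coef
```

The same pattern carries over to other imputation methods: whatever is fitted for imputation on the training set is stored and applied unchanged to new observations.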

r statistics prediction missing-data imputation

2017-03-10T14:04:13.220
