Questions tagged [iqr]
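scala - Scala anomaly detection on the SANSA stack
I am trying to run the anomaly detection algorithm available in the git repo.
The following code snippet raises an error.
The error below occurs when the program computes the IQR. I understand that the Set needs to be replaced with a Seq, but since I am new to Scala I don't know how to change the data type. The toSeq function has already been applied to all data points.
:154: error: type mismatch; found: Set[(String, String, Object)] required: Seq[(String, String, Object)] val test = clusterOfSubject.map(f => outDetection.iqr2(f, anomalyListLimit))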
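r - dplyr: combining and filtering within groups
I want to use filter on only the x1, x2 and x3 values, by group (id), but in my example I have not managed to combine it with across for my variables (x1, x2 and x3):
Any ideas, please? Thanks in advance!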
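pandas - Imputing missing values with the mean of the values inside the lower and upper fences
I want to impute the missing values of "Age" in my DataFrame. It is a float column.
To do so,
- I want to get the IQR and compute the lower and upper fences.
- Then I want to replace the missing values with the mean of the values that lie between the lower and upper fences of the dataset.
I am trying to do this in my code, but I can't get it to work.
I get a TypeError saying: Cannot perform 'ror_' with a dtyped [float64] array and scalar of type [bool]
Thank you in advance!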
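The two steps listed in the question can be sketched as follows. This is a minimal illustration, not the asker's code: the function name impute_with_fenced_mean, the factor k = 1.5 and the sample ages are assumptions. A 'ror_' or 'rand_' TypeError of this kind may stem from combining comparisons with | or & without parentheses, so the mask below is parenthesized explicitly.

```python
import numpy as np
import pandas as pd

def impute_with_fenced_mean(series, k=1.5):
    """Fill NaNs with the mean of the values inside the Tukey fences.

    `k` (fence multiplier) is an assumed default, not taken from the question.
    """
    q1 = series.quantile(0.25)          # quantile() skips NaN by default
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Parentheses are required: & binds more tightly than >= and <=.
    inside = (series >= lower) & (series <= upper)
    return series.fillna(series[inside].mean())

# Hypothetical sample data; only the column name "Age" comes from the question.
df = pd.DataFrame({"Age": [22.0, 25.0, np.nan, 30.0, 28.0, 95.0, np.nan]})
df["Age"] = impute_with_fenced_mean(df["Age"])
print(df)
```

Here the value 95.0 falls outside the fences, so it stays in the column but does not contribute to the mean used for imputation.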
python - ufunc 'multiply' did not contain a loop with signature matching types (dtype(' dtype('
Context
I'm trying to find outliers in all columns of a dataframe with python.
Steps:
- Created a function to find outliers via IQR
- Tested the function on one column.
- Implemented the function on all columns with a for loop.
My level
I'm completely new to Machine learning and data science. I only know python and pandas so I'm currently expanding my knowledge in machine learning. I don't know a lot of theory about which data types machine learning algorithms can handle and why missing values are a problem, etc.
Overview of the data
Code to find outliers in one column
I created a function to find the IQR and will return the indices and values of the outliers.
When I call the function:
output:
Question 1
Why is this giving me no outliers? Look below
Note: Looking at the stats, there's probably something seriously wrong with the data
Code to find outliers in all columns (for loop)
output:
Question 2
What does this error mean/ why am I getting this?
For Question 1, your code seems to work fine on my end, but of course I don't have your original data.
For Question 2, there are two problems. The first is that you are passing the column names to find_outliers_tukey instead of the columns themselves. Use iteritems (removed in pandas 2.0; use items there) to iterate over pairs of (column name, column Series):
The second problem, which you'll run into after solving the first, is that your location column is not a numeric column, so you won't be able to find outliers for it. Make sure to iterate only over the columns that you actually want to perform the calculation on.
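Both fixes can be combined in one loop, sketched below. The body and signature of find_outliers_tukey are assumptions (the asker's function is not shown here), as are the price and rooms columns. Only the function name and the location column come from the text above. The sketch uses items together with select_dtypes so that non-numeric columns are skipped.

```python
import numpy as np
import pandas as pd

def find_outliers_tukey(x):
    """Assumed implementation: return (indices, values) outside the 1.5*IQR fences."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    floor, ceiling = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    mask = (x < floor) | (x > ceiling)
    return list(x.index[mask]), list(x[mask])

# Hypothetical data: one string column and two numeric columns.
df = pd.DataFrame({
    "location": ["north", "south", "east", "west", "north"],
    "price": [10, 12, 11, 13, 500],
    "rooms": [2, 3, 2, 3, 2],
})

outliers = {}
# Iterate over (column name, column Series) pairs, numeric columns only.
for name, column in df.select_dtypes(include="number").items():
    outliers[name] = find_outliers_tukey(column)

print(outliers)
```

Note that np.percentile returns NaN when the column contains missing values, so columns with gaps would need dropna() first.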
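r - How to identify outliers with multiple groupings
I am trying to identify outliers in a dataset for a specific column, relabs, but I need them computed separately for the values 1 and 2 in the Control column where the conc column equals "NK", also grouped by Treatment.
Dataset with reprex (there should be 40 outliers, checked manually):
The dataset I should get:
I used the function identify_outliers by Treatment and conc and it works well, but I also need this function to compute the outliers separately for conc "NK" and Control "1" and "2" in my dataset, rather than combined, and then exclude all outliers with anti_join.
My current code, which does not compute the outliers for "NK" and "1" and "2" separately. identify_outliers is from rstatix.
I exclude the outliers with anti_join:
I have a lot of data, so I would rather not do this manually if I can figure out how to do it in R.
Is there a way to compute outliers grouped by Treatment and conc, and, if Treatment equals "NK" and Control is "1" and "2", to compute them separately?
Outliers computed manually in Excel: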
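excel - How to select a statistical range in Excel
I have an array with many numbers. These are daily readings over a 4-month period. Each row represents a different device. I need to compute a trend line and ignore outliers. I tried computing the IQR, then the range, and then filtering out the outliers that fall outside the range. The problem is that there are many repeated results, so q1 and q3 can easily be the same number. Is there a way to select the central 90% of the results in a range? Let's say: [1,1,1,1,1,1,1,1,2,3,4,5,7,8,9,9,9,9,9,9] selects [1,1,1,1,1,1,1,2,3,4,5,7,8,9,9,9,9,9]. To make life harder, I would also like to be able to select the dates on which the selected measurements were taken. The dates are in a separate row.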
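The selection described above can be done by rank rather than by value, which sidesteps the problem of q1 and q3 coinciding. The sketch below illustrates the idea in Python rather than Excel. The function name central_slice, the trim_percent parameter and the day labels are hypothetical, and trimming 5% of the readings from each end is an assumed reading of "central 90%".

```python
import numpy as np

def central_slice(values, labels, trim_percent=5):
    """Drop the lowest and highest `trim_percent` percent of readings by rank.

    Returns the kept values and their matching labels in the original order.
    """
    values = np.asarray(values)
    labels = np.asarray(labels)
    n = len(values)
    k = n * trim_percent // 100                 # readings to drop at each end
    order = np.argsort(values, kind="stable")   # positions sorted by value
    keep = np.sort(order[k:n - k])              # restore chronological order
    return values[keep], labels[keep]

readings = [1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 7, 8, 9, 9, 9, 9, 9, 9]
dates = [f"day{i}" for i in range(len(readings))]   # stand-in for the date row

kept_values, kept_dates = central_slice(readings, dates)
print(kept_values.tolist())
print(kept_dates.tolist())
```

Because the trimming works on positions, the same index selects the dates, so each remaining reading stays paired with its date.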
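python - Defining a function to remove outliers
I created a function to remove outlier data like this:
But when I check with a boxplot, the outliers are still not removed. What is wrong with the code?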
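The function itself is not shown, so the cause cannot be read off the text. For reference, here is a minimal sketch of an IQR filter. The function name remove_outliers_iqr, the column name value and the data are hypothetical. Two things are worth checking against it: boolean filtering returns a new DataFrame, so the result has to be assigned, and a boxplot of the filtered data recomputes the quartiles, so it can flag new points even when the filter worked.

```python
import pandas as pd

def remove_outliers_iqr(df, column, k=1.5):
    """Return a new DataFrame without rows outside the k*IQR fences of `column`."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    inside = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[inside]          # the input DataFrame is left unchanged

# Hypothetical data with one obvious outlier.
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 300]})
cleaned = remove_outliers_iqr(df, "value")   # assign the result; df still has 300
print(cleaned)
```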
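python - groupby operation in a pandas.DataFrame without outliers
For a pandas.Series, I know how to remove outliers. Like this:
I want to apply the same refinement to the different Series/columns of a DataFrame.
I would normally do something like
However, in this case the outliers are also included in the mean, and I want to exclude them from it.
Note that, because the data is random, the outliers are at different positions in each column. Therefore, outliers should be ignored only within that column/Series.
The result should be a DataFrame with 26 rows (one per letter in the index) and 3 columns, holding the mean without outliers.
I could loop over the columns of df and run the first code block. But is there a better way?
Suggestions are welcome. Any approach is accepted.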
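One way to avoid the explicit loop is to mask the outliers column by column and let the grouped mean skip the resulting NaN values. The sketch below assumes 1.5*IQR fences computed per column over the whole DataFrame. The question does not say how outliers are defined, and the function name, column names and data are hypothetical.

```python
import string

import numpy as np
import pandas as pd

def mean_without_outliers(df, k=1.5):
    """Group by the index and average each column, ignoring per-column outliers."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on the columns,
    # so every column is checked against its own fences.
    inside = (df >= q1 - k * iqr) & (df <= q3 + k * iqr)
    # where() turns outliers into NaN, which mean() skips by default.
    return df.where(inside).groupby(level=0).mean()

# Hypothetical data: 50 rows per letter, 3 random columns, one injected outlier.
rng = np.random.default_rng(0)
index = np.repeat(list(string.ascii_uppercase), 50)
df = pd.DataFrame(rng.normal(size=(len(index), 3)), index=index, columns=["a", "b", "c"])
df.iloc[0, 0] = 1000.0

result = mean_without_outliers(df)
print(result.shape)
```

If the fences should instead be computed within each letter group, the same masking step could be applied inside a groupby transform; that variant is not shown here.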
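python - Excluding zeros from a NumPy quantile calculation over array rows
I have a 2D array with zero values in every row.
Is there a way to compute the 0.75 quantile of each row while excluding the zero values from the calculation?
For example, in the second row, only the 6 non-zero values, [12,1,2,30,2,2], should be used in the calculation. I tried np.quantile(), but it includes all the zero values in the calculation. It also seems that NumPy's masked arrays (np.ma) have no version of quantile().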
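A possible workaround is to replace the zeros with NaN and use np.nanquantile, which ignores NaN entries. The sketch below is an illustration under assumptions: the function name and the array are hypothetical, and only the six non-zero values of the second row come from the question.

```python
import numpy as np

def row_quantile_ignoring_zeros(a, q):
    """Per-row quantile computed over the non-zero entries only."""
    a = np.asarray(a, dtype=float)          # float dtype is needed to hold NaN
    masked = np.where(a == 0, np.nan, a)    # zeros become NaN
    return np.nanquantile(masked, q, axis=1)

# Hypothetical array; the second row holds the non-zero values from the question.
arr = np.array([
    [0, 3, 0, 5, 7, 0, 0, 0],
    [12, 0, 1, 2, 30, 0, 2, 2],
])

result = row_quantile_ignoring_zeros(arr, 0.75)
print(result)
```

A row that consists only of zeros yields NaN (with a runtime warning), so such rows would need separate handling.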