0 votes

1 answer

99 views

r - Finding the percentage frequency of outcomes by group in R

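I have a very large data frame representing time-series data from an agent-based model, like this:

ABM model run data

Each row in the dataset represents one period of the model, which can run for any length of time and terminates in one of three outcomes: "unity", "stability", or "instability".

I am building a large plot that shows the time-series data faceted by dimension and connections, and I want to separate the runs by ending, so that all runs finishing with a particular ending get their own color in the plot. I want the thickness of each line to be the relative frequency with which each ending occurred in that batch.

To do this, I need to add another column, "count", to the data. It should count how many times a particular ending occurs within a batch of runs grouped by dimension and connections, and that number should then appear in every row featuring that ending.

So suppose runs 1 to 10 have dimension == 4 and connections == 2. Four of them ended in "stability", two in "instability", and two in "unity". I want the "count" column to be 4, 2, and 2, respectively, for every row in that batch with the corresponding ending.

This is a tough one. Thanks in advance!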
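Below is a minimal base R sketch of the "count" column. It uses a hypothetical toy data frame, and the column names `run`, `dimension`, `connections`, and `ending` are assumptions. Eight runs are used so that the 4/2/2 split adds up. Each run spans several rows, so the sketch counts distinct runs per (dimension, connections, ending) batch rather than rows. `ave()` then repeats that count on every row of the batch:

```r
# Hypothetical toy data: 8 runs with dimension == 4 and connections == 2,
# each run lasting 3 periods, so every run contributes 3 rows.
runs <- data.frame(
  run         = rep(1:8, each = 3),
  dimension   = 4,
  connections = 2,
  ending      = rep(c(rep("stability", 4), rep("instability", 2),
                      rep("unity", 2)), each = 3)
)

# Number of distinct runs per (dimension, connections, ending) batch,
# repeated on every row belonging to that batch.
runs$count <- ave(runs$run, runs$dimension, runs$connections, runs$ending,
                  FUN = function(r) length(unique(r)))

# Total distinct runs per (dimension, connections) batch, used to turn the
# count into a relative frequency of each ending.
n_runs <- ave(runs$run, runs$dimension, runs$connections,
              FUN = function(r) length(unique(r)))
runs$rel_freq <- runs$count / n_runs
```

In dplyr, `group_by(dimension, connections, ending) %>% mutate(count = n_distinct(run))` should give the same column. `rel_freq` could then be mapped to line size in the plot.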

r aggregate group-summaries

2015-05-05T21:39:43.017

0 votes

1 answer

188 views

r - Combining columns of a table based on age range

I have a table in R that looks like this (below is just a sample):

The rows are income levels, and the columns are age levels. I am essentially creating this table to see whether age is related to income via a Chi-squared test. The numbers in the table are numbers of occurrences, e.g., there are 2 people aged 17 in my dataset with an income of 10000.

Both age and income level are of type "num" in R, so they are continuous.

I essentially want to combine the age columns so that I get a table with everyone who has an income of 10k and is aged 15-25, 25-35, etc., so that I end up with far fewer columns.

Note also that colnames(tbl) is "15", "17", "18", not "Age"; I haven't defined an overarching name for my columns and rows.

I noticed that this answer does something similar, but I'm not sure how to apply it, given that I don't have a name for my columns (e.g., "mpg" in the linked case).

Any ideas?
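Below is a minimal base R sketch. It assumes the table is a numeric matrix (or 2-D table) whose column names are the ages as strings, as described above. The counts and band boundaries are made up for illustration. `cut()` maps each column name to an age band. `right = FALSE` makes the bands [15, 25) and [25, 35), so age 25 is not counted twice. `rowsum()` then adds up the columns that share a band:

```r
# Hypothetical counts: rows are income levels, columns are ages
# (column names are the ages as strings, as in the question).
tbl <- matrix(c(1, 2, 0, 3, 1,
                0, 1, 4, 2, 2),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("10000", "20000"),
                              c("15", "17", "18", "27", "31")))

# Map each column name to an age band: [15, 25) and [25, 35).
age_band <- cut(as.numeric(colnames(tbl)),
                breaks = c(15, 25, 35), right = FALSE,
                labels = c("15-25", "25-35"))

# rowsum() sums rows by group, so transpose, sum the age rows that share
# a band, and transpose back to income-by-band.
tbl_banded <- t(rowsum(t(tbl), group = age_band))
tbl_banded
```

Ages outside the breaks would become `NA`, so the breaks need to cover every age in the data. The banded table can then be passed to `chisq.test()`.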

r aggregate summarization cbind group-summaries

2015-05-28T10:38:42.210

0 votes

1 answer

5163 views

r - dplyr summarise with nested group_by

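I have a data frame like this:

I want to calculate the sum of the amounts for each day within a category. My attempts (see code) fall short.

Wrong output --> here the sum is calculated over each group

Wrong output --> here the sum is calculated per day, but without taking the category into account

So far I have not managed to combine the two statements. How can I nest two group_by statements to calculate the sum of the amounts for each day in each category?

Nesting the groups, like:

summarise(group_by(group_by(testData, Date), Category), sum(Amount), dates = toString(Date))

does not work as expected.

I have heard of summarise_each (from "dplyr - Summary of weighted data") but could not get it to work: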
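By default, `group_by()` replaces any existing grouping unless `add = TRUE` is given (`.add = TRUE` in newer dplyr). The nested call above therefore ends up grouped by `Category` only. The usual fix is to group by both variables in a single call, `group_by(Date, Category)`. Here is a base R sketch of the same per-day, per-category sum on hypothetical data that reuses the column names from the question:

```r
# Hypothetical data with the column names used in the question.
testData <- data.frame(
  Date     = as.Date(c("2015-07-01", "2015-07-01", "2015-07-01", "2015-07-02")),
  Category = c("A", "A", "B", "A"),
  Amount   = c(10, 5, 7, 3)
)

# Group by both variables at once instead of nesting the groupings.
res <- aggregate(Amount ~ Date + Category, data = testData, FUN = sum)
res
```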

r group-by nested dplyr group-summaries

2015-07-02T07:23:43.993

0 votes

1 answer

1487 views

r - R dplyr summarise based on a condition

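I have a dataset of items downloaded from a website, based on reports that we generate. The idea is to remove reports that are no longer needed, based on the number of downloads. The logic is basically:

- count all reports downloaded in the last year;
- check whether they fall outside two absolute deviations around the median for the current year;
- check whether each report was downloaded in the last 4 weeks and, if so, how many times.

I have the code below, which does not work, and I was wondering if anyone could help. It gives me the following error for the n_recent_downloads part:

Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables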
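Below is a simplified base R sketch of the described logic on a hypothetical download log. The names `downloads`, `report`, and `date`, and the reference date, are assumptions. It uses the raw median absolute deviation, `mad(..., constant = 1)`, because R's default scales the value by 1.4826. It also keeps every count as a plain numeric vector. The error above is typically raised when a summary function such as `sum()` or `max()` is applied to a whole data frame that contains a non-numeric column, so that is one place to look in the `n_recent_downloads` code:

```r
today <- as.Date("2015-08-20")  # assumed reference date

# Hypothetical download log: one row per download of a report.
downloads <- data.frame(
  report = c("r1", "r1", "r1", "r2", "r2", "r2", "r3", "r3", "r3", "r3", "r4"),
  date   = today - c(3, 40, 200, 10, 100, 300, 5, 20, 60, 90, 250),
  stringsAsFactors = FALSE
)

# Downloads per report over the last year.
last_year <- downloads[downloads$date > today - 365, ]
counts <- table(last_year$report)
reports <- names(counts)
summary_df <- data.frame(report = reports, n_year = as.vector(counts),
                         stringsAsFactors = FALSE)

# Flag reports more than 2 raw median absolute deviations from the median.
med <- median(summary_df$n_year)
dev <- mad(summary_df$n_year, constant = 1)
summary_df$outlier <- abs(summary_df$n_year - med) > 2 * dev

# Downloads in the last 4 weeks; the factor levels give 0 for reports
# that were not downloaded recently.
recent <- downloads$report[downloads$date > today - 28]
summary_df$n_recent_downloads <- as.vector(table(factor(recent, levels = reports)))
summary_df
```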

r dplyr group-summaries

2015-08-20T08:53:15.113

0 votes

1 answer

304 views

r - Summarize a data table by group based on the nrows of another column

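I know that the command below summarizes my table by summing the population by group and dividing it by the number of rows in each group.

However, what I want to do is divide the total population by the number of rows of another column in each group. Something like this:

The point here is that geoids id6 and id7 are subareas of ct E1010, so the populations of id6 and id7 should be equal shares of the population of E1010, the larger area they belong to.

Expected result

Using the reproducible example below, this is the result I want to get:

Reproducible example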

r data.table group-summaries

2015-10-27T10:17:20.883

0 votes

1 answer

748 views

reporting-services - SSRS Display list of distinct values and totals from another column for those values

This seems like something that should be very simple and easy, but I'm completely missing it. I've got a set of transactions I'm trying to report on. Each has a set of data that I'm displaying grouped by business date. Two parts of this are card type and amount.

What I want to do is, for each Business Date group, display a totals summary below the list of transactions that would look something like this:

[date]

Transaction 1 Visa ...... amount .....

Transaction 2 Debit ...... amount .....

Transaction 3 Visa ...... amount .....

Transaction 4 Debit ...... amount .....

Transaction 5 Debit ...... amount .....

Transaction 6 Discover ...... amount .....

Transaction 7 Gift Card ...... amount .....

Transaction 8 Visa ...... amount .....

Transaction 9 Discover ...... amount .....

Transaction 10 Visa ...... amount .....

Summary of totals for [date]

Visa - $xxxx.xx

Discover - $xxxx.xx

Debit - $xxxx.xx

...

Total - $xxxx.xx

Preferably, I'd get each value programmatically instead of manually setting up each cell for each card. I've seen some answers on the totals-expression side, but not on the distinct-values part, and those answers would have me set up each individual cell manually.

reporting-services ssrs-2008 subtotal group-summaries

2015-10-30T13:03:58.627

0 votes

0 answers

278 views

.net - Custom group summary (total value, item value)

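I have a gridview with a custom summary calculation that contains a total and a per-item value. I need to recalculate the values dynamically according to the groups you create, while keeping the total amount, which is already correct.

In the DevExpress Docs, I found: https://www.devexpress.com/Support/Center/Question/Details/Q273195

On SO, I found: How to add a weighted average summary to a DevExpress XtraGrid?

My approach: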

.net devexpress xtragrid group-summaries

2015-11-06T14:01:26.023

0 votes

1 answer

719 views

r - Relative frequencies with dplyr and dynamically created columns corresponding to each group

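I am following a very useful solution for creating summary columns for multiple categories. As discussed in the linked solution, I am using code that generates a percentage column for each subgroup.

Relevant example code from the linked solution:

The code generates the desired values:

Problem

I would like to modify this code to dynamically create columns corresponding to the unique categories available in the second category passed in the dplyr call. In the attached example, this would be gear. So, for the attached example, the resulting data frame would look like this:

Attempt

For a small number of categories, I assume I could summarize values conditionally, as discussed here, making dplyr evaluate the statement only for the specified condition, e.g. sumBfoo = sum(B[A == "foo"]). However, this approach is inefficient when dealing with many categories. A solution outside dplyr could be built with a loop over the unique values of the desired category, but I would like to achieve this within dplyr.

Sample table

Broadly speaking, I would like to create a table similar to the one below:

but I am only interested in the row proportions, without the counts, totals, and other extras.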
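Below is a base R sketch of the row-proportion table. It assumes the linked example groups the built-in `mtcars` data by `am` and `gear`; the first variable, `am`, is an assumption. `table()` creates one column per unique value of `gear` automatically, so new categories need no extra code. `prop.table(..., margin = 1)` then converts the counts into row proportions:

```r
# Cross-tabulate the two grouping variables from the built-in mtcars data;
# table() creates one column per unique value of gear.
counts <- table(am = mtcars$am, gear = mtcars$gear)

# margin = 1 divides each cell by its row total (row proportions).
row_props <- prop.table(counts, margin = 1)

# Convert to an ordinary data frame with one column per gear level.
wide <- as.data.frame.matrix(row_props)
wide
```

Inside a dplyr pipeline, the same wide layout is usually reached by computing the proportions in long form and then spreading `gear` into columns with tidyr's `spread()`.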

r dataframe dplyr summary group-summaries

2015-11-30T13:26:42.613

0 votes

1 answer

98 views

r - Efficient way of simultaneously deriving unique value counts and summary values for grouped data in dplyr

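I am interested in finding an efficient way to obtain a by-group summary table that would contain:

A count of the unique values in each group
A set of basic descriptive statistics for selected variables

For example, to generate the descriptive statistics, I use the following code:

This produces the desired output:

I would like to enrich the data with numbers reflecting the count of values in each group. The counts on their own can be obtained simply with:

This generates the desired data: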

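Problem

The problem arises when I want to apply both transformations at the same time.

Attempt 1

For example, the code:

produces:

without the previously generated descriptive statistics.

Attempt 2

The code:

fails, as expected:

Error: n does not take arguments

Attempt 3 (working)

The code:

provides the desired data:

I think this is an extremely inefficient way of generating this summary. Creating objects on the fly is inefficient, especially when working with large tables. I would like to obtain the same result more efficiently, without creating objects just so they can be merged. In particular, what I would like to do in dplyr corresponds to deriving additional summaries from an earlier version of the table, for example: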

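Group
Generate descriptive statistics
Go back to the data as it was after grouping
Produce some additional statistics and add them to the final data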
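On Attempt 2: dplyr's function for counting unique values inside `summarise()` is `n_distinct()`, while `n()` takes no arguments. Below is a base R sketch of the single-pass idea. It uses `mtcars` as a stand-in, and the grouping variable `cyl` and the summarized columns `gear` and `mpg` are assumptions. The data are split once, and every statistic for a group is computed in the same function, so nothing needs to be merged afterwards:

```r
# Split once by group and compute all statistics in one function per group,
# so no intermediate objects have to be merged afterwards.
by_group <- lapply(split(mtcars, mtcars$cyl), function(d) {
  data.frame(
    cyl      = d$cyl[1],
    n_rows   = nrow(d),
    n_gear   = length(unique(d$gear)),  # count of unique values in the group
    mpg_mean = mean(d$mpg),
    mpg_sd   = sd(d$mpg)
  )
})

# Stack the one-row summaries into a single table.
summary_tbl <- do.call(rbind, by_group)
summary_tbl
```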

r dataframe aggregate dplyr group-summaries

2015-12-07T12:46:56.770

0 votes

2 answers

5809 views

r - Parallel wilcox.test using group_by and summarise

There must be an R-ly way to call wilcox.test over multiple observations in parallel using group_by. I've spent a good deal of time reading up on this but still can't figure out a call to wilcox.test that does the job. Example data and code below, using magrittr pipes and summarize().

The buggy calls yield this error:

Thanks for your help; I hope it will be helpful to others with similar questions as well.
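Below is a base R sketch of running `wilcox.test()` once per group, on hypothetical data. The names `df`, `group`, `cond`, and `value` are assumptions, and the values contain no ties, so the exact test applies. `split()` produces one data frame per group, and `sapply()` collects each p-value:

```r
# Hypothetical data: two conditions measured within each of two groups
# (no tied values, so wilcox.test can use the exact distribution).
df <- data.frame(
  group = rep(c("g1", "g2"), each = 8),
  cond  = rep(rep(c("ctrl", "treat"), each = 4), 2),
  value = c(1.1, 2.3, 1.7, 0.9,  3.4, 4.1, 3.8, 2.9,
            5.2, 4.8, 6.1, 5.5,  5.0, 6.3, 4.6, 5.9)
)

# One Wilcoxon rank-sum test per group; returns a named vector of p-values.
p_values <- sapply(split(df, df$group), function(d)
  wilcox.test(value ~ cond, data = d)$p.value)
p_values
```

Within a dplyr pipeline, `group_by(group) %>% summarise(p = wilcox.test(value[cond == "ctrl"], value[cond == "treat"])$p.value)` should give the same numbers, since `summarise()` sees each group's columns as plain vectors.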

r dplyr magrittr group-summaries

2016-01-03T20:39:07.650

Questions tagged [group-summaries]
