0

I have a table in R that looks like (below is just a sample):

|       | 15 | 17 | 18 | 22 | 25 | 26 | 27 | 29 | 
|-------|----|----|----|----|----|----|----|----|
| 10000 | 1  | 2  | 1  | 2  | 4  | 3  | 5  | 2  |
| 20000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 30000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 40000 | 0  | 0  | 0  | 1  | 2  | 3  | 6  | 3  |
| 50000 | 0  | 0  | 0  | 0  | 0  | 0  | 1  | 1  |
| 60000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |

The rows are income levels, and the columns are age levels. I am essentially creating this table to see if age is related to income via a Chi-squared test. The numbers in the table are numbers of occurrences e.g. There are 2 people aged 17 in my dataset with income of 10000.

Both age and income level of type "num" in R so are continuous.

I want to essentially combine the columns for age so that I get a table with everyone who has income of 10k and is between age 15-25, age 25-35, etc. so I end up with much fewer columns.

Note also that colnames(tbl) = "15","17", "18", not "Age" - I haven't defined an overarching name for my columns and rows.

I note this answer does something similar but not sure how to apply it given I don't have a name for my columns e.g. "mpg" (in the case of the link).

Any ideas?

4

1 回答 1

1

在这里制作了我自己的矩阵,但也应该适用于 df。

mat <- matrix(sample(1:10,8500,replace = TRUE),ncol=85)
colnames(mat) <- 15:99
levs <- cut(as.numeric(colnames(mat)),seq(15,105,10),right = FALSE)
res <- sapply(as.character(unique(levs)),function(x)rowSums(mat[,levs==x]))

编辑:如果您想要与 mat 中相同的 colnames,但根据类别计数,另外执行:

res <- res[,levs] # expands the res df to one category count col pr. original col in mat.
colnames(res) <- colnames(mat) # renames cols to reflect input matrix mat.
于 2015-05-28T11:27:29.040 回答