df = data.frame(table(train$department , train$outcome)) 

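Here department and outcome are both factors, so it gives me a data frame that looks like the given image.

is_outcome is binary, and df looks like this.

It contains only 2 variables (fields), but I want the department column to be part of the data frame, i.e. a data frame with 3 variables.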

                         0    1 
Analytics             4840  512
Finance               2330  206 
HR                    2282  136 
Legal                  986   53 
Operations           10325 1023
Procurement           6450  688
R&D                    930   69
Sales & Marketing    15627 1213 
Technology            6370  768 

One way I learned is...

df = data.frame(table(train$department , train$is_outcome))
write.csv(df,"df.csv")
rm(df)
df = read.csv("df.csv")
colnames(df) = c("department", "outcome_0","outcome_1")

But I can't save a file every time in the program.

Is there a way to do this directly?


1 Answer


When you create a table from a matrix in R, for example turning a matrix trial into trial.table with as.table(), the resulting table looks exactly the same as the matrix when printed, but it really isn’t the same kind of object. The difference becomes clear when you transform these objects to a data frame. Take a look at the outcome of this code:

    > trial.df <- as.data.frame(trial)
    > str(trial.df)
    'data.frame': 2 obs. of 2 variables:
     $ sick   : num 34 11
     $ healthy: num 9 32

Here you get a data frame with two variables (sick and healthy), each with two observations. On the other hand, if you convert the table to a data frame, you get the following result:

    > trial.table.df <- as.data.frame(trial.table)
    > str(trial.table.df)
    'data.frame': 4 obs. of 3 variables:
     $ Var1: Factor w/ 2 levels "risk","no_risk": 1 2 1 2
     $ Var2: Factor w/ 2 levels "sick","healthy": 1 1 2 2
     $ Freq: num 34 11 9 32

The as.data.frame() function converts a table to a data frame in a format that you need for regression analysis on count data. If you need to summarize the counts first, you use table() to create the desired table.
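The objects trial and trial.table are not defined in this answer. A minimal sketch of how they might be built, assuming the values and names implied by the str() output above (the construction itself is an assumption, not something shown in the original):

```r
# Assumed reconstruction of the example objects; the numbers and
# dimension names are taken from the str() output in this answer
trial <- matrix(c(34, 11, 9, 32), ncol = 2)
colnames(trial) <- c("sick", "healthy")
rownames(trial) <- c("risk", "no_risk")

# Same numbers, but stored as a contingency table
trial.table <- as.table(trial)

# Matrix -> data frame: one variable per matrix column (wide layout)
trial.df <- as.data.frame(trial)

# Table -> data frame: one row per combination of levels (long layout)
trial.table.df <- as.data.frame(trial.table)
str(trial.table.df)
```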

Now you get a data frame with three variables. The first two — Var1 and Var2 — are factor variables for which the levels are the values of the rows and the columns of the table, respectively. The third variable — Freq — contains the frequencies for every combination of the levels in the first two variables.
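That long layout differs from what the question asks for, which is one row per department with one column per outcome. One possible way to get there without a CSV round trip is as.data.frame.matrix(), followed by moving the row names into a column. This is only a sketch: the train data below is a small made-up stand-in for the asker's data, and the column names follow the question.

```r
# Hypothetical stand-in for the asker's `train` data
train <- data.frame(
  department = factor(c("HR", "HR", "Legal", "Finance", "Finance", "Finance")),
  is_outcome = factor(c(0, 1, 0, 0, 0, 1))
)

tab <- table(train$department, train$is_outcome)

# as.data.frame.matrix() keeps the wide layout: one column per outcome level
wide <- as.data.frame.matrix(tab)

# Move the row names (departments) into a regular column
df <- cbind(department = rownames(wide), wide)
rownames(df) <- NULL
colnames(df) <- c("department", "outcome_0", "outcome_1")
print(df)
```

The result has three variables (department, outcome_0, outcome_1), and no file is written or read.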

In fact, you can also create tables in more than two dimensions by adding more variables as arguments to table(), or by transforming a multidimensional array to a table using as.table(). You can access the numbers the same way you do for multidimensional arrays, and the as.data.frame() function creates as many factor variables as there are dimensions.
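As a rough illustration of the multidimensional case, the sketch below builds a three-dimensional table from an array. The counts and the dimension names (exposure, status, site) are invented for the example.

```r
# Hypothetical 2 x 2 x 2 array of counts; all names and values are made up
counts <- array(
  1:8,
  dim = c(2, 2, 2),
  dimnames = list(
    exposure = c("risk", "no_risk"),
    status   = c("sick", "healthy"),
    site     = c("A", "B")
  )
)
counts.table <- as.table(counts)

# Index the table the same way as a multidimensional array
one.cell <- counts.table["risk", "healthy", "B"]

# One factor variable per dimension, plus Freq
counts.df <- as.data.frame(counts.table)
str(counts.df)
```

Because the dimnames are named here, the factor columns take those names instead of Var1, Var2 and Var3.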

Answered 2018-09-15T13:33:47.860