In a dataframe, I want to be able to separate columns with numeric types from columns with strings/characters.

Here is my data:

test=data.frame(col1=sample(1:20,10),col2=sample(31:50,10),
col3=sample(101:150,10),col4=sample(c('a','b','c'),10,replace=T))

Which looks like

   col1 col2 col3 col4
1     2   41  132    c
2    11   47  141    b
3    13   39  135    a
4    12   31  117    b
5    19   42  106    a
6     8   50  118    a
7    14   33  149    a
8     6   48  148    b
9    16   37  150    b
10    9   34  140    a

Now here is the strange thing: if I call typeof on a cell of the column containing characters, R says it is an integer

> typeof(test[1,4])
[1] "integer"

If I do something like this

> apply(test,2,typeof)
       col1        col2        col3        col4 
"character" "character" "character" "character" 

R says they are all characters. Also,

> lapply(test,typeof)
[1] "integer" "integer" "integer" "integer"

Again, what is going on and is there a good way to distinguish between columns with characters and columns with integers?

4 Answers


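apply works on arrays and matrices, not data frames.

To work on a data frame, it first converts it to a matrix.

Your data frame has a factor column, so the conversion turns everything into character, without bothering to tell you.

As you have seen, sapply is the way to go, and class is probably what you want to find out. There are also mode, typeof, and storage.mode, depending on what you want to know: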

> test$col5=letters[1:10]  # really character, not a factor
> test$col3=test$col3*pi # lets get some decimals in there


> sapply(test, mode)
       col1        col2        col3        col4        col5 
  "numeric"   "numeric"   "numeric"   "numeric" "character" 
> sapply(test, class)
       col1        col2        col3        col4        col5 
  "integer"   "integer"   "numeric"    "factor" "character" 
> sapply(test, typeof)
       col1        col2        col3        col4        col5 
  "integer"   "integer"    "double"   "integer" "character" 
> sapply(test, storage.mode)
       col1        col2        col3        col4        col5 
  "integer"   "integer"    "double"   "integer" "character" 
answered 2014-09-18T09:02:17.823

In data.frame(col4=sample(c('a','b','c'),10,replace=T)), col4 is a factor.

apply(test,2,typeof): if length(dim(test)) == 2L, apply first calls as.matrix(test) on the data frame.
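The coercion described above can be checked directly. This is a minimal sketch on a small made-up data frame (not the asker's random data); stringsAsFactors = TRUE is passed explicitly on the assumption that the code may run on R 4.0.0 or later, where data.frame no longer creates factors by default.

```r
# Hypothetical data mirroring the question's layout: one integer column
# and one factor column. stringsAsFactors = TRUE reproduces the pre-4.0.0
# default that the question relies on.
test <- data.frame(col1 = 1:3,
                   col4 = c("a", "b", "c"),
                   stringsAsFactors = TRUE)

# The conversion that apply performs on a data frame before looping.
m <- as.matrix(test)
matrix_type <- typeof(m)                 # one type for the whole matrix

# apply sees only the character matrix, so every column looks the same.
via_apply <- apply(test, 2, typeof)

# sapply walks the original columns, so the stored types survive.
via_sapply <- sapply(test, typeof)

print(matrix_type)
print(via_apply)
print(via_sapply)
```

Here the matrix is of type character, so apply reports character for both columns, while sapply reports integer for both (the factor is stored as integer codes).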

answered 2014-09-18T09:10:10.393

OK, I figured out the answer to my own question, sorry:

sapply(test,class)
answered 2014-09-18T09:00:06.577

col4 is a factor:

str(test)
#'data.frame':  10 obs. of  4 variables:
#$ col1: int  11 14 8 19 10 12 7 18 3 16
#$ col2: int  46 39 35 38 42 37 34 32 41 31
#$ col3: int  113 147 138 118 132 139 131 119 108 111
#$ col4: Factor w/ 3 levels "a","b","c": 1 3 2 3 2 3 3 3 1 3

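Internally, a factor is an integer vector (as reported by typeof) with a levels attribute and class factor. apply coerces a data.frame to a matrix. Since a matrix can hold only one data type, everything is coerced to character before typeof is applied.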
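To see why typeof reports integer for a factor, it may help to take one apart. The vector below is a made-up example, not the asker's col4.

```r
# A small hypothetical factor; levels are sorted alphabetically by default.
f <- factor(c("a", "c", "b", "c"))

storage <- typeof(f)          # how the data is stored
klass   <- class(f)           # what R treats it as
lvls    <- levels(f)          # the labels the codes point to
codes   <- as.integer(f)      # the underlying integer codes
labels  <- as.character(f)    # codes mapped back to their labels

print(storage)
print(klass)
print(lvls)
print(codes)
print(labels)
```

The codes index into the levels, which is why typeof sees an integer while class sees a factor.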

Use class to distinguish data types, and lapply (or sapply) to loop over the columns.
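For the original goal of separating numeric columns from the rest, one possible approach is to test each column with is.numeric instead of comparing class strings. This is a sketch on made-up data; the names is_num, numeric_cols, and other_cols are illustrative only. is.numeric returns FALSE for factors, so a factor lands with the character columns even though it is stored as integer.

```r
# Hypothetical data frame with integer, double, factor, and character columns.
test <- data.frame(col1 = 1:3,
                   col3 = c(1.5, 2.5, 3.5),
                   col4 = factor(c("a", "b", "a")),
                   col5 = c("x", "y", "z"),
                   stringsAsFactors = FALSE)

# One logical flag per column: TRUE for integer/double, FALSE otherwise.
is_num <- sapply(test, is.numeric)

# drop = FALSE keeps a data frame even if only one column is selected.
numeric_cols <- test[, is_num, drop = FALSE]
other_cols   <- test[, !is_num, drop = FALSE]

print(names(numeric_cols))
print(names(other_cols))
```

With this data, numeric_cols holds col1 and col3, and other_cols holds col4 and col5.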

answered 2014-09-18T09:01:59.700