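I have a data frame, data, with a column named "Project License" that represents a categorical variable and is therefore, in R terms, a factor. I am trying to create a new column in which the open source software licenses are grouped into broader categories according to my own classification. However, when I try to combine (merge) the levels of that factor, I end up with a column in which all values are either missing or unchanged, or I get an error message such as this one:
Error in factor(data[["Project License"]], levels = classification, labels = c("Highly Restrictive", : invalid 'labels'; length 4 should be 1 or 6
Here is the code I am using for this (extracted from a function):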
myLevels <- c('gpl', 'lgpl', 'bsd',
              'other', 'artistic', 'public')
myLabels <- c('GPL', 'LGPL', 'BSD',
              'Other', 'Artistic', 'Public')
licenses <- factor(data[["Project License"]],
                   levels = myLevels, labels = myLabels)
data[["Project License"]] <- licenses
classification <- c(highly = c('gpl'),
                    restrictive = c('lgpl', 'public'),
                    permissive = c('bsd', 'artistic'),
                    unknown = c('other'))
restrictiveness <-
  factor(data[["Project License"]],
         levels = classification,
         labels = c('Highly Restrictive', 'Restrictive',
                    'Permissive', 'Unknown'))
data[["License Restrictiveness"]] <- restrictiveness
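I have also tried a few other approaches (including the one described in section 8.2.5 of "The R Inferno"), but so far without success.
What am I doing wrong, and how can I fix it? Thanks!
Update (data):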
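For reference, here is a minimal sketch of what seems to be going on, along with one base-R idiom that may help. It assumes a small toy vector (license, hypothetical, with values taken from the data) in place of data[["Project License"]], and it works on the raw lowercase values. Two things stand out: c() flattens the nested vectors, so classification has 6 elements while only 4 labels are supplied, which matches "length 4 should be 1 or 6"; and once the column has been relabelled to 'GPL', 'LGPL', and so on, the lowercase values in classification no longer match any of its values. Assigning a named list to levels() is one way to merge levels, though not necessarily the only or the best one.

```r
# Toy stand-in for data[["Project License"]] (hypothetical, lowercase raw values)
license <- c("lgpl", "bsd", "gpl", "gpl", "public", "artistic", "other")

# c() flattens its arguments: this is a character vector of length 6,
# with names such as "restrictive1" and "restrictive2", not 4 groups
classification <- c(highly = c('gpl'),
                    restrictive = c('lgpl', 'public'),
                    permissive = c('bsd', 'artistic'),
                    unknown = c('other'))
print(length(classification))
print(names(classification))

# One possible approach: build the factor from the raw values, then assign
# a named list to levels(); each list name becomes a new (merged) level
restrictiveness <- factor(license)
levels(restrictiveness) <- list(
  "Highly Restrictive" = "gpl",
  "Restrictive"        = c("lgpl", "public"),
  "Permissive"         = c("bsd", "artistic"),
  "Unknown"            = "other"
)
print(table(restrictiveness))
```

With this form, any original level that is not mentioned in the list ends up as NA, so licenses outside the six listed here would still need to be handled separately.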
> head(data, n=20)
   Project ID Project License
1       45556            lgpl
2       41636             bsd
3       95627             gpl
4       66930             gpl
5       51103             gpl
6       65637             gpl
7       41834             gpl
8       70998             gpl
9       95064             gpl
10      48810            lgpl
11      95934             gpl
12      90909             gpl
13       6538         website
14      16439             gpl
15      41924             gpl
16      78987             gpl
17      58662            zlib
18       1904             bsd
19      93838          public
20      90047            lgpl
> str(data)
'data.frame': 45033 obs. of 2 variables:
$ Project ID : chr "45556" "41636" "95627" "66930" ...
$ Project License: chr "lgpl" "bsd" "gpl" "gpl" ...
- attr(*, "SQL")=Class 'base64' chr "ClNFTEVDVCBncm91cF9pZCwgbGljZW5zZQpGUk9NIHNmMDMxNC5ncm91cHMKV0hFUkUgZ3JvdXBfaWQgPCAxMDAwMDA="
- attr(*, "indicatorName")=Class 'base64' chr "cHJqTGljZW5zZQ=="
- attr(*, "resultNames")=Class 'base64' chr "UHJvamVjdCBJRCwgUHJvamVjdCBMaWNlbnNl"
Update 2 (data):
> unique(data[["Project License"]])
[1] "lgpl" "bsd" "gpl" "website" "zlib"
[6] "public" "other" "ibmcpl" "rpl" "mpl11"
[11] "mit" "afl" "python" "mpl" "apache"
[16] "osl" "w3c" "iosl" "artistic" "apsl"
[21] "ibm" "plan9" "php" "qpl" "psfl"
[26] "ncsa" "rscpl" "sunpublic" "zope" "eiffel"
[31] "nethack" "sissl" "none" "opengroup" "sleepycat"
[36] "nokia" "attribut" "xnet" "eiffel2" "wxwindows"
[41] "motosoto" "vovida" "jabber" "cvw" "historical"
[46] "nausite" "real"
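The output above lists 47 distinct license strings, while myLevels names only six. As a small sketch of what that implies: factor() turns any value not listed in levels into NA, which may explain part of the "all missing" result. The snippet below uses a toy vector (x, hypothetical) and, purely as an assumption about the intended behaviour, maps unlisted licenses to 'other' before building the factor.

```r
# Toy sample of raw license strings (hypothetical, taken from the values above)
x <- c("lgpl", "bsd", "gpl", "website", "zlib", "public")

myLevels <- c('gpl', 'lgpl', 'bsd',
              'other', 'artistic', 'public')

# Values that are not among the levels silently become NA
f <- factor(x, levels = myLevels)
print(is.na(f))

# Assumption: unlisted licenses should be counted as 'other'
x2 <- ifelse(x %in% myLevels, x, "other")
f2 <- factor(x2, levels = myLevels)
print(table(f2, useNA = "ifany"))
```

Whether 'other' is the right bucket for licenses such as "website" or "zlib" depends on the classification being used.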