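I'm new to R and am practicing with the Titanic dataset from Kaggle. I'm trying to split the last name, first name, salutation, and any extra information into separate columns so that I can try to classify each passenger by age as an adult or a child.
Here is some sample data from the training dataset: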
head(traindf,5)
# Source: local data frame [5 x 12]
#
# PassengerId Survived Pclass
# 1 1 0 3
# 2 2 1 1
# 3 3 1 3
# 4 4 1 1
# 5 5 0 3
# Variables not shown: Name (chr), Sex (fctr), Age (dbl), SibSp (int), Parch
# (int), Ticket (fctr), Fare (dbl), Cabin (fctr), Embarked (fctr)
Here is a sample that includes the names:
select(traindf,Survived,Pclass,Name,Sex)
# Source: local data frame [891 x 4]
#
# Survived Pclass Name Sex
# 1 0 3 Braund, Mr. Owen Harris male
# 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female
# 3 1 3 Heikkinen, Miss. Laina female
# 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female
# 5 0 3 Allen, Mr. William Henry male
# 6 0 3 Moran, Mr. James male
# 7 0 1 McCarthy, Mr. Timothy J male
# 8 0 3 Palsson, Master. Gosta Leonard male
# 9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female
# 10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female
I can separate the last name from the rest of the column with the following code:
require(tidyr) # for the separate() function
traindfnames <- traindf %>%
separate(Name, c("Lastname","Salutation"), sep = ",")
traindfnames
# Source: local data frame [891 x 13]
#
# PassengerId Survived Pclass Lastname
# 1 1 0 3 Braund
# 2 2 1 1 Cumings
# 3 3 1 3 Heikkinen
# 4 4 1 1 Futrelle
# 5 5 0 3 Allen
# 6 6 0 3 Moran
# 7 7 0 1 McCarthy
# 8 8 0 3 Palsson
# 9 9 1 3 Johnson
# 10 10 1 2 Nasser
# .. ... ... ... ...
# Variables not shown: Salutation (chr), Sex (fctr), Age (dbl), SibSp (int),
# Parch (int), Ticket (fctr), Fare (dbl), Cabin (fctr), Embarked (fctr)
However, when I try to add a field for the first name:
traindfnames <- traindf %>%
separate(Name, c("Lastname","Salutation","firstname"), sep =",,")
I get this error:
# Error: Values not split into 3 pieces at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 2
Am I using incorrect syntax, or is it not possible to split one column into 3 fields?
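
For reference, here is a minimal sketch of the result I am aiming for, built from a few of the names shown above. As far as I can tell, `sep = ",,"` looks for two commas in a row, which never occurs in `Name`, so nothing gets split. The sketch assumes that `sep` is treated as a regular expression and that `extra = "merge"` keeps everything after the second split in the last column. The data frame `sample_names` and the column name `Firstname` are made up for illustration.

```r
library(tidyr)  # provides separate()

# Hypothetical sample built from a few of the names shown above
sample_names <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Palsson, Master. Gosta Leonard"),
  stringsAsFactors = FALSE
)

# Split on ", " (after the last name) or ". " (after the salutation).
# extra = "merge" stops after three pieces, so any later match
# stays inside the Firstname column instead of causing an error.
split_names <- separate(
  sample_names, Name,
  into = c("Lastname", "Salutation", "Firstname"),
  sep = ", |\\. ",
  extra = "merge"
)
split_names
```

With this approach the maiden name in parentheses stays in `Firstname`, so it would still need a further step to move it into its own column.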