string - R：拆分数字字符串

Question

我正在尝试拆分 40 位的数字字符串（即拆分123456789123456789123456789为1 2 3 4等）

不幸的是strsplit，由于它需要字符而不起作用，并且使用转换字符串as.character不起作用，因为它很长，并且 R 会自动切断长数字的小数（最多为 22 位小数）。因此，我最终得到"1.2345e+35"一个字符串，而不是完整的数字。

是否有的数字变体strsplit或解决小数截止问题的方法？我似乎无法在 stackoverflow 上找到答案，但如果之前已经回答过，我深表歉意。提前致谢！

score 6 · Accepted Answer

如果 R 正在计算数字，我不知道解决方案。如果数字在数据文件中，我认为下面的代码可能有效。虽然，如果数字在数据文件中，可能会有更简单的解决方案。

a1 <- read.table("c:/users/Mark W Miller/simple R programs/long_number.txt", colClasses = 'character')

# a1 <- c('1234567891234567891234567891234567891234') ;

a1 <- as.character(a1) ;
a2 <- strsplit(a1, "") ;
a3 <- unlist(a2) ;
a4 <- as.vector(as.numeric(a3)) ;
a4
# [1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4

编辑

我意识到我可能不明白这个问题，我的回答可能很愚蠢。不过，如果您有一个包含非常长数字的完整数据集，您可以使用下面的代码将它们全部拆分。请注意，文件“three_long_numbers.txt”中没有引号，数据以数字开头：

a1 <- read.table("c:/users/Mark W Miller/simple R programs/three_long_numbers.txt", colClasses = 'character')
a1

#      V1                                        
# [1,] "1234567891234567891234567891234567891234"
# [2,] "1888678912345678912345678912345678912388"
# [3,] "1234999891234567891234567891234567891239"

# a1 <- matrix(c(
# "1234567891234567891234567891234567891234",
# "1888678912345678912345678912345678912388",
# "1234999891234567891234567891234567891239"), nrow=3, byrow=T)

a1 <- as.matrix(a1) ;
a2 <- strsplit(a1, "") ;
a3 <- unlist(a2) ;
a3 <- as.numeric(a3) ;
a4 <- matrix(a3, nrow=dim(a1)[1], byrow=T)
a4

score 3 · Accepted Answer

您可以简单地执行此操作以拆分为数字向量：

s <- "123456789123456789123456789"
as.numeric(strsplit(s,"")[[1]])

# [1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9

或者，如果您希望将它们拆分为字符向量：

strsplit(s,"")[[1]]

# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "1" "2" "3" "4" "5" "6" "7" "8" 
# "9" "1" "2" "3" "4" "5" "6"
# [25] "7" "8" "9"

score 1 · Accepted Answer

这是另一种似乎比我一年前的回答更直接的方法：

拆分单个向量：

a1 <- c('1234567891234567891234567891234567891234')
a2 <- read.fwf(textConnection(a1), widths=rep(1, nchar(a1)), colClasses = 'numeric', header=FALSE)
a2
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4

读取包含以下三个等长长数的文件：

# 1234567891234567891234567891234567891234
# 1888678912345678912345678912345678912388
# 1234999891234567891234567891234567891239

a1 <- read.table("c:/users/mmiller21/simple R programs/three_long_numbers.txt", colClasses = 'character', header = FALSE)
a2 <- read.fwf("c:/users/mmiller21/simple R programs/three_long_numbers.txt", widths=rep(1, max(nchar(a1$V1))), colClasses = 'numeric', header=FALSE)
a2

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4
2  1  8  8  8  6  7  8  9  1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   8   8
3  1  2  3  4  9  9  9  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   9

读取包含以下三个长度不等的长数的文件：

# 1234567891234567891234567891234567891234
# 188867891234567891234567891234567891238
# 12349998912345678912345678912345678912

a1 <- read.table("c:/users/mmiller21/simple R programs/three_long_numbersb.txt", colClasses = 'character', header = FALSE)
a2 <- read.fwf("c:/users/mmiller21/simple R programs/three_long_numbersb.txt", widths=rep(1, max(nchar(a1$V1))), colClasses = 'numeric', header=FALSE)
a2

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4
2  1  8  8  8  6  7  8  9  1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   8  NA
3  1  2  3  4  9  9  9  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2  NA  NA

这是在包含多列的数据文件中拆分一列长数字的代码。在此示例中，第 2 列中的每个数字具有相同的长度：

# -10 1234567891234567891234567891234567891234 -100
# -20 1888678912345678912345678912345678912388 -200
# -30 1234999891234567891234567891234567891239 -300

a1 <- read.table("c:/users/mark w miller/simple R programs/three_long_numbers_Oct25_2013.txt", colClasses = c('numeric', 'character', 'numeric'), header = FALSE)
a2 <- read.fwf(textConnection(a1$V2), widths=rep(1, nchar(a1$V2)[1]), colClasses = 'numeric', header=FALSE)
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4
2  1  8  8  8  6  7  8  9  1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   8   8
3  1  2  3  4  9  9  9  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   9

string - R：拆分数字字符串

3 回答 3

Related

Reference