0

我正在尝试将我的数据从 csv 文件中提取到 R 中。数据当前的格式如下:

,"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=...

,"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, ...

,"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=...

这些行中的每一行都是不同区域的数据。我想以这种格式拥有它:

Region [1]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]


Region [2]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]


Region [3]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]

我怎样才能做到这一点?

4

1 回答 1

0

也许有一个包可以解析这个。tidyverse但是,您可以使用该包进行一些数据转换。

您可以通过以下方式读取数据readLines()

dat <- readLines("test.txt")

在这种情况下,它看起来像这样:

dat <- c(",\"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=2}]\"", 
"", ",\"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, NDVI=3}]\"", 
"", ",\"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=4}]\""
)

然后,您可以使用 for 循环进行一些数据转换并将结果存储在列表中。

library(tidyverse)
dat <- c(",\"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=2}]\"", 
         "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, NDVI=3}]\"", 
         "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=4}]\""
)

l <- list()
counter <- 1
for (line in dat){
  if(nchar(line)>0){
    line <- as.data.frame(line) %>%
        # We need to remove some unwanted strings
      mutate(line = str_replace_all(line, 
                                    c("\\\""="", ",\\["= "", "\\]" = ""))) %>% 
        # The lines can be separated into rows where the string "}, {" occurs
               separate_rows(line, line, sep = "\\}, \\{") %>% 
        # again removing some unwanted strings
      mutate(line = str_replace_all(line, c("\\{"="", "\\}"=""))) %>% 
        # add a unique identifier for each observation
      mutate(observation = row_number()) %>% 
        # separete the rows where a "," occurs
      separate_rows(line, line, sep =",") %>% 
      separate(., line, into = c("category", "value"), sep = "=") %>% 
        # put it into the long format
      pivot_wider(names_from = category, values_from = value)
    l[[counter]] <- line
    counter <- counter+1
  }
}

l
#> [[1]]
#> # A tibble: 2 x 3
#>   observation time                ` NDVI`           
#>         <int> <chr>               <chr>             
#> 1           1 2014-01-01T00:00:00 0.3793765496776215
#> 2           2 2014-02-01T00:00:00 2                 
#> 
#> [[2]]
#> # A tibble: 2 x 3
#>   observation time                ` NDVI`           
#>         <int> <chr>               <chr>             
#> 1           1 2014-01-01T00:00:00 0.4071076986818826
#> 2           2 2014-02-01T00:00:00 3                 
#> 
#> [[3]]
#> # A tibble: 2 x 3
#>   observation time                ` NDVI`           
#>         <int> <chr>               <chr>             
#> 1           1 2014-01-01T00:00:00 0.3412131556625801
#> 2           2 2014-02-01T00:00:00 4

reprex 包(v0.3.0)于 2020-03-24 创建

于 2020-03-24T08:22:30.160 回答