r - Load a dataset into R with data() using a variable instead of the dataset name

Question

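I am trying to load a dataset into R using the data() function. It works fine when I use the dataset name, e.g. data(Titanic) or data("Titanic"). What doesn't work for me is loading a dataset through a variable instead of its name. For example: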

# This works fine:
> data(Titanic)

# This works fine as well:
> data("Titanic")

# This doesn't work:
> myvar <- Titanic
> data(myvar)
Warning message:
In data(myvar) : data set ‘myvar’ not found

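Why does R look for a dataset named "myvar" when it isn't quoted? And since this is the default behavior, is there a way to load a dataset stored in a variable?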
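A minimal sketch of why this happens. It mirrors, as far as I can tell, the way utils::data() reads its ... arguments: they are captured unevaluated with substitute() and converted to character, so a bare variable contributes its own name, never its value. The helper names_like_data() is hypothetical and exists only for illustration; the list argument of data(), by contrast, is evaluated normally.

```r
# Hypothetical helper, modeled on how data() turns its `...` arguments into
# data set names: substitute() captures the unevaluated expressions, so a
# bare symbol becomes its own name and is never evaluated.
names_like_data <- function(...) {
  as.character(substitute(list(...))[-1L])
}

myvar <- "Titanic"
names_like_data(myvar)      # "myvar": the symbol's name, not its value
names_like_data("Titanic")  # "Titanic"
```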

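For the record, what I'm trying to do is create a function that uses the "arules" package to mine association rules with Apriori, so I need to pass the dataset to that function as an argument.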

myfun <- function(mydataset) {
    data(mydataset)    # doesn't work (data set 'mydataset' not found)
    rules <- apriori(mydataset)
}

Edit - output of sessionInfo():

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] arules_1.0-14   Matrix_1.0-12   lattice_0.20-15 RPostgreSQL_0.4 DBI_0.2-7      

loaded via a namespace (and not attached):
[1] grid_3.0.0  tools_3.0.0

The actual error I get (for example, using a sample dataset "xyz"):

xyz <- data.frame(c(1,2,3))
data(list=xyz)
Warning messages:
1: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
3: In if (name %in% names(rds)) { :
  the condition has length > 1 and only the first element will be used
4: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
5: In if (name %in% names(rds)) { :
  the condition has length > 1 and only the first element will be used
6: In grep(name, files, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used

...

...

32: In data(list = xyz) :
  c("data set ‘1’ not found", "data set ‘2’ not found", "data set ‘3’ not found")

score 17 · Accepted Answer

Use the list argument. See ?data.

data(list=myvar)

You also need myvar to be a character string.

myvar <- "Titanic"

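Note that myvar <- Titanic only works (I think) because the Titanic dataset is lazy-loaded. Most datasets in packages are loaded this way, but for other kinds of datasets you still need the data command.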
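A short sketch of data(list = ...) with the name held in a character variable. It also passes data()'s envir argument so the object is loaded into a fresh environment rather than the global workspace; the environment e and the variable titanic are only for illustration.

```r
myvar <- "Titanic"               # the data set's name, as a character string

e <- new.env()                   # private environment to load into
data(list = myvar, package = "datasets", envir = e)

# data() creates a variable named after the data set inside e
titanic <- get(myvar, envir = e, inherits = FALSE)
class(titanic)                   # "table"
dim(titanic)                     # 4 2 2 2
```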

score 6 · Accepted Answer

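Use the variable as a character string. Otherwise you are working with the contents of Titanic rather than its name. You may also need get() to turn the character value into the object it names.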

myvar <- 'Titanic'

myfun <- function(mydataset) {
    data(list=mydataset)   
    str(get(mydataset))
}

myfun(myvar)

score 1 · Accepted Answer

If the package is already loaded, you can use the get() function to assign the dataset to a local variable:

data_object = get(myvar, asNamespace('<package_name>'))

Or simply:

data_object = get(myvar)
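For the function in the question, one option is to accept either the data object itself or its name as a string, and call get() only in the second case. This is a hedged sketch: as_dataset() is a hypothetical helper, and the commented-out apriori() call assumes the arules package is installed.

```r
# Hypothetical helper: return x unchanged if it is already data, or look it
# up by name if it is a single character string.
as_dataset <- function(x, envir = parent.frame()) {
  if (is.character(x) && length(x) == 1L) {
    get(x, envir = envir)        # searches envir, then its enclosures/search path
  } else {
    x
  }
}

myfun <- function(mydataset) {
  dataset <- as_dataset(mydataset, envir = parent.frame())
  # rules <- arules::apriori(dataset)   # assumes arules is installed
  dataset
}

myfun("Titanic")   # looked up by name
myfun(Titanic)     # passed directly, no data() call needed
```

The trade-off of this design is that a dataset which is itself a single string would be mistaken for a name; if that matters, use separate arguments for the object and the name.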

score -3 · Accepted Answer

I'm answering my own question, but I finally found a solution. Quoting the R help:

"Data sets are searched for in all currently loaded packages, then in the 'data' directory (if any) of the current working directory."

So simply write the dataset to a file and place it in a directory named "data" inside the working directory.

# data() reads .csv files using read.table(..., header = TRUE, sep = ";"), so separate the entries with ";" rather than ",".
# The data set is named after the file, so the file must be called mydataset.csv for data("mydataset") to find it.
> write.table(mydataset, file = "mydataset.csv", sep = ";", quote = TRUE, row.names = FALSE)  # quote = TRUE quotes all entries; row.names = FALSE prevents the extra column of row names that write.table() adds by default

# Now place the dataset into a "data" directory (either via R or via the operating system, it makes no difference):
> dir.create("data")  # create the directory
> file.rename(from = "mydataset.csv", to = "data/mydataset.csv")  # move the file

# Now we can finally load the dataset:
> data("mydataset")  # data(mydataset) works as well, because data() uses the name without evaluating it; quoting just makes that explicit
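A variant of the same approach that sidesteps data()'s CSV format rules: data() also loads .rda files created with save(), restoring the object under the name it was saved with. This sketch relies on the help text quoted above (data() searching the data subdirectory of the working directory); tempdir() is used only so the example does not touch your real working directory.

```r
old_wd <- setwd(tempdir())             # work in a scratch directory
dir.create("data", showWarnings = FALSE)

mydataset <- data.frame(a = 1:3, b = c("x", "y", "z"))
save(mydataset, file = file.path("data", "mydataset.rda"))

rm(mydataset)                          # remove it, then load it back
data("mydataset")                      # found in ./data/mydataset.rda
str(mydataset)

setwd(old_wd)                          # restore the original directory
```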

score -3 · Accepted Answer

assign_name <- read.csv(file.choose())

This line opens a file browser on your local machine; just select the dataset you want to load into the R environment.
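file.choose() needs someone to pick the file interactively; in a script you would usually pass the path directly. A small sketch of the same read.csv() call with an explicit path, where the temporary file written first stands in for your own CSV file.

```r
# Stand-in for your own file: write a small CSV to a temporary path
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)), path, row.names = FALSE)

# Same idea as read.csv(file.choose()), but with the path spelled out
mydata <- read.csv(path)
str(mydata)
```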
