r - Code to import data from a Stack Overflow query into R

Question

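When I try to answer R questions on Stack Overflow, most of my time goes into rebuilding the data given as an example (unless the question author was kind enough to provide it as R code).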

So my question is: if someone asks a question and gives their sample data frame like this:

a  b   c
1 11 foo
2 12 bar
3 13 baz
4 14 bar
5 15 foo

Do you have a trick or function to easily import it into an R session without typing out the whole data.frame() call?

Thanks in advance for any hints!

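PS: Sorry if the word "query" in the question title isn't great, but it seems you can't use the word "question" in the title of a Stack Overflow question :-)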

score 25 · Accepted Answer

Maybe textConnection() is what you want:

R> zz <- read.table(textConnection("a  b   c
1 11 foo
2 12 bar
3 13 baz
4 14 bar
5 15 foo"), header=TRUE)
R> zz
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo
R>

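It lets you treat the text as a "connection" to read from. You could also copy and paste via the clipboard, but clipboard access depends more on the operating system and is therefore less portable.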
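As a small illustration (a sketch; the names txt and con are arbitrary), textConnection() also accepts a character vector with one element per line, which is handy when the pasted lines are already stored in R. A text connection is already open when it is created, so read.table() leaves it open and it is tidier to close it yourself:

```r
# Sample data as a character vector, one element per pasted line
txt <- c("a  b   c",
         "1 11 foo",
         "2 12 bar",
         "3 13 baz",
         "4 14 bar",
         "5 15 foo")

con <- textConnection(txt)           # treat the character vector as a readable connection
zz <- read.table(con, header = TRUE) # header = TRUE: first line holds the column names
close(con)                           # read.table() does not close an already-open connection

str(zz)
```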

score 23 · Accepted Answer

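Recent versions of R now offer an option that needs fewer keystrokes than the textConnection route for feeding columnar data into read.table and friends. Faced with this: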

zz
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo

one can simply insert <- read.table(text=" after zz, delete the carriage return, then insert ", header=TRUE) after the last foo and hit [enter]:

zz<- read.table(text="  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo", header=TRUE)
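One detail worth knowing when pasting printed output this way: the header line has three fields while every data row has four, because R's printed row numbers come first. When the header has one fewer field than the data rows, read.table() uses the first column as row names, so the result still has only the columns a, b and c. A minimal check:

```r
zz <- read.table(text = "  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo", header = TRUE)

ncol(zz)      # 3: only a, b and c become columns
rownames(zz)  # the printed row numbers, now stored as row names
```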

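One can also use scan to efficiently enter long sequences of purely numeric or purely character vector entries. Faced with: 67 75 44 25 99 37 6 96 77 21 31 41 5 52 13 46 14 70 100 18, one can simply type zz <- scan() and hit [enter]. Then paste the selected numbers and hit [enter] again (perhaps a second time, to produce the double carriage return that ends input); the console should respond "Read 20 items".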

> zz <- scan()
1: 67  75  44  25  99  37   6  96  77  21  31  41   5  52  13  46  14  70 100  18
21: 
Read 20 items
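scan() also has a text argument, so the same vector can be created without the interactive prompt, for example inside a script. A sketch using the numbers from the example above:

```r
# Numbers copied from the question, passed to scan() as a single string
zz <- scan(text = "67 75 44 25 99 37 6 96 77 21 31 41 5 52 13 46 14 70 100 18")
length(zz)  # 20
```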

For a "character" task: after pasting into the console, editing out the extra line breaks and adding quotes, hit [enter]:

> countries <- scan(what="character")
1:     'republic of congo'
2:     'republic of the congo'
3:     'congo, republic of the'
4:     'congo, republic'
5: 'democratic republic of the congo'
6: 'congo, democratic republic of the'
7: 'dem rep of the congo'
8: 
Read 7 items
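If adding quotes by hand is tedious, one alternative (a sketch, not part of the original answer) is to read one item per line with sep = "\n", so embedded spaces and commas need no quoting. Here quote = "" turns off quote processing entirely and quiet = TRUE suppresses the "Read N items" message:

```r
txt <- "republic of congo
republic of the congo
congo, republic of the
congo, republic"

# One element per input line; spaces and commas stay inside each element
countries <- scan(text = txt, what = "", sep = "\n", quote = "", quiet = TRUE)
countries
```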

score 13 · Accepted Answer

You can also ask the questioner to use the dput function, which writes out any data structure in a form that can simply be copied and pasted into R, e.g.:

> zz
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo

> dput(zz)
structure(list(a = 1:5, b = 11:15, c = structure(c(3L, 1L, 2L, 
1L, 3L), .Label = c("bar", "baz", "foo"), class = "factor")), .Names = c("a", 
"b", "c"), class = "data.frame", row.names = c(NA, -5L))

> xx <- structure(list(a = 1:5, b = 11:15, c = structure(c(3L, 1L, 2L, 
+ 1L, 3L), .Label = c("bar", "baz", "foo"), class = "factor")), .Names = c("a", 
+ "b", "c"), class = "data.frame", row.names = c(NA, -5L))
> xx
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo
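The same round trip can be done without copying from the console: capture.output() collects the text dput() prints, and eval(parse(text = ...)) rebuilds the object from it. A minimal sketch (since R 4.0.0, data.frame() keeps c as character by default, so the dput() text will not contain the factor levels shown above):

```r
zz <- data.frame(a = 1:5, b = 11:15,
                 c = c("foo", "bar", "baz", "bar", "foo"))

# Capture the text that dput() would print, as a single string
code <- paste(capture.output(dput(zz)), collapse = "\n")
cat(code, "\n")

# Rebuild the object from that text, as a reader pasting it would
xx <- eval(parse(text = code))
all.equal(xx, zz)  # compare the rebuilt copy with the original
```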

score 4 · Accepted Answer

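I just wanted to add this because I use it a lot now and find it very useful. The package overflow (installation instructions below) has a function for reading copied data frames. Suppose I start from an SO post containing data like the following, but no dput output: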

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Now, if I copy that data directly and then run the following:

library(overflow)
soread()
# data.frame “mydf” created in your workspace
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

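I now have a data frame called mydf in my global environment containing the data I copied, so I don't have to wait for the OP to post a dput of their data frame. I can change the name of the data frame with the out argument, which (obviously) defaults to mydf. The package also has some other useful functions for dealing with SO posts (e.g. sopkgs(), which temporarily installs a package so you can help with a question about a package you have not installed before).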
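For readers who cannot install overflow, the core idea behind soread() (parse the pasted table and assign it under a chosen name, defaulting to mydf) can be approximated in base R. This is a hypothetical sketch, not the package's implementation: read_so() is an invented name, and it takes the text as an argument instead of reading the clipboard:

```r
# Hypothetical helper (not part of overflow): parse pasted table text and store it under `out`
read_so <- function(txt, out = "mydf", envir = globalenv()) {
  df <- read.table(text = txt, header = TRUE)
  assign(out, df, envir = envir)  # create (or overwrite) the named object
  message("data.frame \"", out, "\" created")
  invisible(df)
}

read_so("  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa")

mydf
```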

If you keep library(overflow) in your .Rprofile, importing data from an SO post with soread() becomes very quick.
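A possible .Rprofile fragment (a sketch; the interactive() and requireNamespace() guards are an assumption added here, so that scripts run with Rscript and machines without overflow installed are unaffected):

```r
# In ~/.Rprofile: attach overflow only in interactive sessions, and only if it is installed
if (interactive() && requireNamespace("overflow", quietly = TRUE)) {
  suppressPackageStartupMessages(library(overflow))
}
```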

overflow is available from GitHub and can be installed with:

library(devtools)
install_github("sebastian-c/overflow")  # current devtools/remotes form: "user/repo"
