r - correct parameters to download file using Amazon s3 API GET requests

Question

I would like to be able to download a .csv file from my Amazon S3 bucket using R.

I have started using the API that is documented here http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectGET.html

I am using the httr package to create the GET request; I just need to work out the correct parameters to download the relevant file.

I have set response-content-type to text/csv because I know it's a .csv file I hope to download, but the response I get is as follows:

Response [https://s3-zone.amazonaws.com/bucket.name/file.name.csv?response-content-type=text%2Fcsv]
  Status: 200
  Content-type: text/csv
Date and Time,Open,High,Low,Close,Volume
2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
2007/01/01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
2007/01/01 22:54:00,5675.00,5676.00,5674.00,5676.00,36
2007/01/01 22:55:00,5675.00,5676.00,5675.00,5676.00,18
2007/01/01 22:56:00,5676.00,5677.00,5674.00,5677.00,64
2007/01/01 22:57:00,5678.00,5678.00,5677.00,5677.00,45
2007/01/01 22:58:00,5679.00,5680.00,5678.00,5680.00,30
 .../01/01 22:59:00,5679.00,5679.00,5677.00,5678.00,19

No file is downloaded; the data seems to be in the response itself. I can extract the character string from the response, which represents the data, and I guess with some effort it could be converted into a data.frame as originally desired. But is there a better way of downloading the data: straight from the GET command, then using read.csv to read it? I think it is a parameter issue; I'm just not sure which parameters need to be set for the file to be downloaded.

If people suggest converting the string, this is the structure of the string I have. What commands would I need to convert it into a data.frame?

chr "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:52:00,5675."| __truncated__

Thanks

HLM

score 3 · Accepted Answer

The answer to your second question:

> chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
> read.csv(text=chr)
        Date.and.Time Open High  Low Close Volume
1 2007/01/01 22:51:00 5683 5683 5673  5673     64

If you want to speed up read.csv, try this:

chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
 read.csv(text=chr, colClasses=c("POSIXct", rep("numeric", 5) ) )
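
With colClasses="POSIXct", the timestamp format and time zone are left to the defaults of as.POSIXct. A minimal sketch that makes both explicit is below. The UTC time zone is an assumption for illustration only, since the question does not say which zone the timestamps are in, and `prices` is a hypothetical variable name.

```r
chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"

# Read the timestamp column as plain text; the other five columns are numeric.
prices <- read.csv(text = chr, colClasses = c("character", rep("numeric", 5)))

# Convert with an explicit format. tz = "UTC" is an assumption, not something
# stated in the question; substitute the zone your data actually uses.
prices$Date.and.Time <- as.POSIXct(prices$Date.and.Time,
                                   format = "%Y/%m/%d %H:%M:%S",
                                   tz = "UTC")

# Inspect the resulting column classes.
str(prices)
```

Reading the column as character first costs one extra step, but a mismatch between the format string and the data then shows up as NA values rather than as an error in the middle of read.csv.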

Assuming the URL is set up correctly (we have nothing to test against yet), I wonder whether you want to look at GET(...)$content

Perhaps:

infile <- read.csv(text=GET(...)$content, colClasses=c("POSIXct", rep("numeric", 5) ) )

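Edit:

That was incorrect, because the data arrives in "raw" format. It needs to be converted from raw before it can be read as text. I did a quick search of Nabble (it must be good for something, after all) to find a csv file residing on the web. This is what finally worked: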

read.csv(text=rawToChar( 
                 GET(
                  "http://nseindia.com/content/equities/scripvol/datafiles/16-11-2012-TO-16-11-2012ACCEQN.csv"
                   )[["content"]] ) )
  Symbol Series        Date Prev.Close Open.Price High.Price Low.Price Last.Price Close.Price
1    ACC     EQ 16-Nov-2012     1404.4    1410.95    1410.95   1369.45    1374.95      1378.1
  Average.Price Total.Traded.Quantity Turnover.in.Lacs Deliverable.Qty X..Dly.Qt.to.Traded.Qty
1       1393.62                132921          1852.41           56899                   42.81
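
The raw-to-text step can be tried without any network access. The sketch below uses charToRaw() on sample rows from the question to stand in for the raw vector a GET response would hold (that stand-in, and the names `body_raw` and `csv_path`, are assumptions for illustration). It shows two routes from the bytes to a data.frame: decoding in memory, and writing the bytes to a file first, which is closer to the "download, then read.csv" workflow the question asks for.

```r
# Stand-in for the raw vector held in the response's content; charToRaw()
# only simulates the HTTP body here, using rows from the question.
body_raw <- charToRaw(paste0(
  "Date and Time,Open,High,Low,Close,Volume\r\n",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17\r\n"
))

# Route 1: decode the bytes in memory and parse the resulting text.
from_text <- read.csv(text = rawToChar(body_raw))

# Route 2: write the bytes to a file unchanged, then read the file.
csv_path <- tempfile(fileext = ".csv")  # hypothetical destination path
writeBin(body_raw, csv_path)
from_file <- read.csv(csv_path)

# Both routes should yield the same data.frame.
all.equal(from_text, from_file)
```

httr also provides write_disk(), which can be passed to GET() to save the body straight to a path, and content(response, as = "text") to decode the body. Check that your installed version of httr includes them before relying on either.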

score 2 · Accepted Answer

Here's one way:

library(taRifx) # for stack.list
test <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )

    [,1]                  [,2]      [,3]      [,4]      [,5]      [,6]      
ret "Date and Time"       "Open"    "High"    "Low"     "Close"   "Volume\r"
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"    
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"

Now convert to a data.frame:

testdat <- stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )
rownames(testdat) <- seq(nrow(testdat)) # Because duplicate rownames aren't allowed in data.frames
colnames(testdat) <- testdat[1,]
testdat <- testdat[-1,]
as.data.frame(testdat)
        Date and Time    Open    High     Low   Close Volume\r
2 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00     64\r
3 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00     64\r
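
Note that the output above still carries a trailing "\r" in the last column, because the string is split on "\n" only. A base-R sketch that avoids this, and does not need taRifx, is below. It is a substitute for the stack() approach, not the same method, and the second data row is taken from the question's sample output.

```r
test <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17\r\n"

# Split into lines on CRLF (or a bare LF), so no "\r" is left on the last field.
lines <- strsplit(test, "\r?\n")[[1]]

# Split each line into its comma-separated fields.
fields <- strsplit(lines, ",", fixed = TRUE)

# Bind the data rows (everything after the header) into a character matrix.
mat <- do.call(rbind, fields[-1])

# Convert to a data.frame and take the column names from the header line.
testdat <- as.data.frame(mat, stringsAsFactors = FALSE)
names(testdat) <- fields[[1]]

# Every column is still character; convert the five numeric ones explicitly.
testdat[-1] <- lapply(testdat[-1], as.numeric)
str(testdat)
```

This assumes no field contains an embedded comma or quoted line break. For real CSV data, read.csv(text = ...) handles those cases and is usually the safer choice.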
