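I've run into a strange subsetting problem: I can subset on one column but not on another, even though both columns appear to have been parsed the same way by readHTMLTable.
Code to reproduce: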
require(XML)
theurl <- "http://en.wikipedia.org/wiki/List_of_stock_exchanges"
html <- htmlParse(theurl)
seData <- readHTMLTable(html)[[2]]
names(seData) = c("Rank","EX","Economy","HQ","MarketCap","TradeValue")
seData = transform(seData,MarketCap = as.numeric(gsub(",","",MarketCap)))
seData = transform(seData,TradeValue = as.numeric(gsub(",","",TradeValue)))
I want to subset the rows for the Indian stock exchanges, so I tried:
> subset(seData,seData$Economy == "India")
[1] Rank EX Economy HQ MarketCap TradeValue
<0 rows> (or 0-length row.names)
> subset(seData,seData$Economy == " India")
[1] Rank EX Economy HQ MarketCap TradeValue
<0 rows> (or 0-length row.names)
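No rows are returned, even though I have verified that two rows should satisfy the condition. Yet the same operation works without trouble on another column, "EX":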
> subset(seData,seData$EX == "JSE Limited")
Rank EX Economy HQ MarketCap TradeValue
17 17 JSE Limited SouthAfrica Johannesburg 903 287
I have run some other checks, and the two columns look exactly alike:
> sapply(seData,class)
Rank EX Economy HQ MarketCap TradeValue
"factor" "factor" "factor" "factor" "numeric" "numeric"
> levels(seData$Economy)
[1] " Australia" " Brazil" " Canada"
[4] " China" " Germany" " Hong Kong"
[7] " India" " Japan" " Russia"
...
> levels(seData$EX)
[1] "Australian Securities Exchange" "BME Spanish Exchanges"
[3] "BM&F Bovespa" "Bombay Stock Exchange"
[5] "Deutsche Börse" "Hong Kong Stock Exchange"
[7] "JSE Limited" "Korea Exchange"
...
What am I missing? What is wrong with the subset command I am using? :(
subset(seData,seData$Economy == " India")
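One further check that might narrow this down, sketched on made-up data rather than the real table: print the code point of the first character of each Economy level and compare it with 32, the code point of a plain space. The `economy` factor below is a hypothetical stand-in for seData$Economy, and the no-break space (U+00A0) in front of each name is purely an assumption; the output above does not show which character is actually there.

```r
# Hypothetical stand-in for seData$Economy. The "\u00a0" (no-break space)
# prefix is an assumption, not something confirmed by the real table.
economy <- factor(c("\u00a0India", "\u00a0Japan", "\u00a0India"))

# Code point of the first character of each level; a plain space would be 32.
first_cp <- sapply(levels(economy), function(x) utf8ToInt(x)[1])
print(first_cp)

# Comparing against a string that starts with a plain space matches nothing
# when the leading character is something else.
n_plain_space <- sum(economy == " India")

# Matching only on the visible text sidesteps the leading character.
n_visible_text <- sum(grepl("India", economy, fixed = TRUE))

print(c(plain_space = n_plain_space, visible_text = n_visible_text))
```

If the same check on levels(seData$Economy) gives something other than 32, that would explain why the comparison with " India" finds no rows while the comparison on EX, whose levels have no leading character, works.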