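Sometimes I get used to the design of a particular R package and want to search CRAN for all packages by that author (let's use Hadley Wickham as an example). How can I run such a search (I'd like to use R, but that doesn't have to be the way to search)?
4 Answers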
Not exactly by author but perhaps access by maintainer would also be useful?
http://cran.r-project.org/web/checks/check_summary_by_maintainer.html#summary_by_maintainer
EDIT by Tyler Rinker
DWin's suggestion can be brought to fruition with these lines of code:
search.lib <- function(term, column = 1) {
    require(XML)
    URL <- "http://cran.r-project.org/web/checks/check_summary_by_maintainer.html#summary_by_maintainer"
    # Read the first HTML table on the page into a data frame
    dat <- readHTMLTable(doc = URL, which = 1, header = TRUE, as.is = FALSE)
    names(dat) <- trimws(names(dat))
    # Blank maintainer cells are filled with the value from the row above
    dat$Maintainer[dat$Maintainer == ""] <- NA
    dat$Maintainer <- zoo::na.locf(dat$Maintainer)
    # Fuzzy-match the term against the column given by position or by name
    if (is.numeric(column)) {
        dat[agrep(term, dat[, column]), 1:3]
    } else {
        dat[agrep(term, dat[, agrep(column, colnames(dat))]), 1:3]
    }
}
search.lib("hadley")
search.lib("bolker")
search.lib("brewer", 2)
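The function above sets blank Maintainer cells to NA and then fills them with zoo::na.locf before searching. As a rough illustration of why that step matters, here is a minimal base-R sketch on made-up data. The helper fill_down and the toy table are hypothetical and only approximate what zoo::na.locf does:

```r
# Hypothetical helper (not part of the original answer): carry the last
# non-missing value forward, similar in spirit to zoo::na.locf().
fill_down <- function(x) {
  # Position of the most recent non-NA element at each index (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA  # leading NAs have nothing to carry forward
  x[idx]
}

# Toy data mimicking a table where the maintainer appears only once per group
# (names are invented for illustration)
toy <- data.frame(
  Maintainer = c("A. Author", NA, NA, "B. Builder", NA),
  Package    = c("pkgA1", "pkgA2", "pkgA3", "pkgB1", "pkgB2"),
  stringsAsFactors = FALSE
)
toy$Maintainer <- fill_down(toy$Maintainer)

# After filling down, a search on the maintainer finds every row of the group
toy[grep("Builder", toy$Maintainer), "Package"]
```

Without the fill-down step, only the first row of each maintainer's group would match the search term.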
Crantastic lets you search by author. You can do a lot more with crantastic, but the feature you are looking for is already available there.
## restrict to first 100 packages (by alphabetical order)
pkgs <- unname(available.packages()[, 1])[1:100]
## build the URL of each package's DESCRIPTION file on the configured mirror
desc_urls <- paste(options("repos")$repos, "/web/packages/", pkgs,
                   "/DESCRIPTION", sep = "")
## download and parse each DESCRIPTION file (one request per package)
desc <- lapply(desc_urls, function(x) read.dcf(url(x)))
authors <- sapply(desc, function(x) x[, "Author"])
Because I'm a narcissist (and Hadley Wickham has no packages in the first 100 [this was true in 2012, but can't possibly be true now, in 2018!]):
pkgs[grep("Bolker",authors)]
# [1] "ape"
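The main problem with this solution is that doing it for real (rather than for just the first 100 packages) means hitting CRAN more than 3000 times for package information ...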
Edit: a better solution, based on Jeroen Ooms's solution in the same place:
recent.packages.rds <- function() {
    mytemp <- tempfile()
    # a single download holds the metadata for every package on CRAN
    download.file(paste0(options("repos")$repos, "/web/packages/packages.rds"),
                  mytemp, mode = "wb")
    mydata <- as.data.frame(readRDS(mytemp), row.names = NA)
    mydata$Published <- as.Date(mydata[["Published"]])
    mydata
}
mydata <- recent.packages.rds()
unname(as.character(mydata$Package[grep("Wickham",mydata$Author)]))
# [1] "classifly" "clusterfly" "devtools" "evaluate" "fda"
# [6] "geozoo" "ggmap" "ggplot2" "helpr" "hints"
# [11] "HistData" "hof" "itertools" "lubridate" "meifly"
# [16] "memoise" "munsell" "mutatr" "normwhn.test" "plotrix"
# [21] "plumbr" "plyr" "productplots" "profr" "Rd2roxygen"
# [26] "reshape" "reshape2" "rggobi" "roxygen" "roxygen2"
# [31] "scales" "sinartra" "stringr" "testthat" "tourr"
# [36] "tourrGui"
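In recent versions of R, tools::CRAN_package_db() returns the same kind of CRAN-wide metadata table in one call, so the download step may not need to be written by hand. The sketch below is one possible way to wrap the author search. The helper name packages_by_author and the toy table are hypothetical, and it assumes the table has Package and Author columns:

```r
# Hypothetical helper: return package names whose Author field matches `pattern`.
# `db` is any data frame with Package and Author columns; by default it is
# fetched with tools::CRAN_package_db(), which needs network access.
packages_by_author <- function(pattern, db = tools::CRAN_package_db(),
                               ignore.case = TRUE) {
  hits <- grepl(pattern, db$Author, ignore.case = ignore.case)
  sort(unique(as.character(db$Package[hits])))
}

# Offline illustration with a made-up table
toy_db <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Author  = c("Jane Doe [aut, cre]", "John Roe, Jane Doe", "Someone Else"),
  stringsAsFactors = FALSE
)
packages_by_author("jane doe", db = toy_db)

# Against the live CRAN metadata (requires a network connection):
# packages_by_author("Wickham")
```

Because the default for db is only evaluated when no table is supplied, the helper can be tried offline on a small data frame before pointing it at CRAN.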
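Bolker's solution above is very fast and still works, but since 2018 there has been a package called pkgsearch that gives more complete output. Here is a demo, continuing the trend of shameless self-promotion: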
r$> pkgsearch::advanced_search(Author = "Waldir", size = 100)
- "advanced search" --------------------------------------------------------------------- 11 packages in 0.001 seconds -
# package version by @ title
1 100 matlab2r 1.0.0 Waldir Leoncio 1M Translation Layer from MATLAB to R
2 100 simExam 1.0.0 Waldir Leoncio 3y Generate Simulated Data for IRT-Enabled Exams
3 83 citation 0.6.2 Jan Philipp Dietrich 1M Software Citation Tools
4 83 LOGAN 1.0.0 Denise Reis Costa 3y Log File Analysis in International Large-Scale Assessments
5 82 TruncExpFam 1.0.0 Waldir Leoncio 7d Truncated Exponential Family
6 61 contingencytables 1.0.0 Waldir Leoncio 1M Statistical Analysis of Contingency Tables
7 60 DIscBIO 1.2.0 Waldir Leoncio 10M A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
8 51 BayesSUR 2.0.1 Zhi Zhao 3M Bayesian Seemingly Unrelated Regression
9 44 lsasim 2.1.2 Waldir Leoncio 4M Functions to Facilitate the Simulation of Large Scale Assessment Data
10 39 BayesMallows 1.1.0 Oystein Sorensen 3M Bayesian Preference Learning with the Mallows Rank Model
11 11 xaringan 0.22 Yihui Xie 8M Presentation Ninja
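Note that I had to increase size from its default of 10, otherwise I would not have gotten all the packages.
For comparison with the output of the answer above: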
r$> unname(as.character(mydata$Package[grep("Waldir",mydata$Author)]))
[1] "BayesMallows" "BayesSUR" "citation" "contingencytables" "DIscBIO" "LOGAN" "lsasim" "matlab2r" "simExam"
[10] "TruncExpFam" "xaringan"