18

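Sometimes I get used to the design of a particular R package and want to search CRAN for all packages by that author (let's use Hadley Wickham as an example). How can I perform such a search? (I'd prefer to use R, but that doesn't have to be the search method.)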

4 Answers

14

Not exactly by author but perhaps access by maintainer would also be useful?

http://cran.r-project.org/web/checks/check_summary_by_maintainer.html#summary_by_maintainer

EDIT by Tyler Rinker

DWin's suggestion can be brought to fruition with these lines of code:

search.lib <- function(term, column = 1) {
    library(XML)  # provides readHTMLTable(); fails loudly if XML is missing
    URL <- "http://cran.r-project.org/web/checks/check_summary_by_maintainer.html#summary_by_maintainer"
    dat <- readHTMLTable(doc = URL, which = 1, header = TRUE, as.is = FALSE)
    names(dat) <- trimws(names(dat))
    # Blank maintainer cells are filled with the last non-blank value above them
    dat$Maintainer[dat$Maintainer == ""] <- NA
    dat$Maintainer <- zoo::na.locf(dat$Maintainer)
    # Fuzzy-match the term against the chosen column (given by index or by name)
    if (is.numeric(column)) {
        dat[agrep(term, dat[, column]), 1:3]
    } else {
        dat[agrep(term, dat[, agrep(column, colnames(dat))]), 1:3]
    }
}

search.lib("hadley")
search.lib("bolker")
search.lib("brewer", 2)
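The lowercase search terms above work because agrep() does approximate matching: it is case-sensitive by default but tolerates a small edit distance, so a one-letter case difference still counts as a hit. A minimal sketch of that behaviour on a short character vector (the third name is made up for illustration, and the vector only stands in for the Maintainer column):

```r
# Stand-in for the Maintainer column; "Jane Doe" is a hypothetical entry
maintainers <- c("Hadley Wickham", "Ben Bolker", "Jane Doe")

# agrep(): case-sensitive, but allows a small edit distance by default,
# so "hadley" can match "Hadley" via a single substitution
fuzzy_hits <- agrep("hadley", maintainers, value = TRUE)

# grep(): exact pattern match, so the lowercase term finds nothing ...
exact_hits <- grep("hadley", maintainers, value = TRUE)

# ... unless case is explicitly ignored
ci_hits <- grep("hadley", maintainers, ignore.case = TRUE, value = TRUE)

print(fuzzy_hits)
print(exact_hits)
print(ci_hits)
```

The trade-off is that fuzzy matching can also return near-miss names, so grep() with ignore.case = TRUE may be the safer choice when an exact surname is known.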
answered 2012-04-10T03:05:28.443
14

Crantastic lets you search by author. You can do a lot more with crantastic, but the feature you are looking for is already available there.

answered 2012-04-10T01:22:36.050
11

Adapted from "available.packages by publication date":

## restrict to first 100 packages (by alphabetical order)
pkgs <- unname(available.packages()[, 1])[1:100]
desc_urls <- paste(options("repos")$repos,"/web/packages/", pkgs, 
    "/DESCRIPTION", sep = "")
desc <- lapply(desc_urls, function(x) read.dcf(url(x)))
authors <- sapply(desc, function(x) x[, "Author"])

Because I'm a narcissist (and Hadley Wickham has no packages among the first 100 [this was true in 2012, but can't possibly be true now, in 2018!]):

pkgs[grep("Bolker",authors)]
# [1] "ape"

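The main problem with this solution is that doing it for real (rather than for just the first 100 packages) means hitting CRAN 3000+ times for package information ...

Edit: a better solution, based on Jeroen Ooms' solution in the same place: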

recent.packages.rds <- function(){
    mytemp <- tempfile()
    # mode = "wb" keeps the binary .rds file intact on Windows
    download.file(paste0(options("repos")$repos,"/web/packages/packages.rds"),
                  mytemp, mode = "wb")
    mydata <- as.data.frame(readRDS(mytemp), row.names=NA)
    mydata$Published <- as.Date(mydata[["Published"]])
    mydata
}

mydata <- recent.packages.rds()
unname(as.character(mydata$Package[grep("Wickham",mydata$Author)]))
# [1] "classifly"    "clusterfly"   "devtools"     "evaluate"     "fda"         
# [6] "geozoo"       "ggmap"        "ggplot2"      "helpr"        "hints"       
# [11] "HistData"     "hof"          "itertools"    "lubridate"    "meifly"      
# [16] "memoise"      "munsell"      "mutatr"       "normwhn.test" "plotrix"     
# [21] "plumbr"       "plyr"         "productplots" "profr"        "Rd2roxygen"  
# [26] "reshape"      "reshape2"     "rggobi"       "roxygen"      "roxygen2"    
# [31] "scales"       "sinartra"     "stringr"      "testthat"     "tourr"       
# [36] "tourrGui"  
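
Note that matching on the Author field returns every package whose author list mentions the name, so co-authored and contributed packages appear next to the person's own. A minimal sketch of narrowing the match to the Maintainer field instead, using a small made-up data frame in place of the real `mydata` (the package names and people in `toy` are invented for illustration, and the presence of a Maintainer column in the downloaded data is an assumption based on the standard DESCRIPTION fields):

```r
# Toy stand-in for the data frame returned by recent.packages.rds();
# all package names and people here are hypothetical.
toy <- data.frame(
    Package    = c("pkgA", "pkgB", "pkgC"),
    Author     = c("Jane Doe", "John Roe, Jane Doe", "John Roe"),
    Maintainer = c("Jane Doe <jane@example.org>",
                   "John Roe <john@example.org>",
                   "John Roe <john@example.org>"),
    stringsAsFactors = FALSE
)

# Match anywhere in the Author field: includes co-authored packages
by_author <- toy$Package[grepl("Doe", toy$Author, fixed = TRUE)]

# Match only the Maintainer field: packages the person maintains
by_maintainer <- toy$Package[grepl("Doe", toy$Maintainer, fixed = TRUE)]

print(by_author)
print(by_maintainer)
```

With the real data, the same two lines would be applied to `mydata$Author` and `mydata$Maintainer` respectively.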
answered 2012-04-10T01:57:19.290
1

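Bolker's solution above is very fast and still works, but since 2018 there has been a package called pkgsearch that returns more complete information. Here is a demo, continuing the trend of shameless self-promotion: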

r$> pkgsearch::advanced_search(Author = "Waldir", size = 100)                                                                                                                               
- "advanced search" --------------------------------------------------------------------- 11 packages in 0.001 seconds -
  #     package           version by                     @ title                                                                          
  1 100 matlab2r          1.0.0   Waldir Leoncio        1M Translation Layer from MATLAB to R                                             
  2 100 simExam           1.0.0   Waldir Leoncio        3y Generate Simulated Data for IRT-Enabled Exams                                  
  3  83 citation          0.6.2   Jan Philipp Dietrich  1M Software Citation Tools                                                        
  4  83 LOGAN             1.0.0   Denise Reis Costa     3y Log File Analysis in International Large-Scale Assessments                     
  5  82 TruncExpFam       1.0.0   Waldir Leoncio        7d Truncated Exponential Family                                                   
  6  61 contingencytables 1.0.0   Waldir Leoncio        1M Statistical Analysis of Contingency Tables                                     
  7  60 DIscBIO           1.2.0   Waldir Leoncio       10M A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
  8  51 BayesSUR          2.0.1   Zhi Zhao              3M Bayesian Seemingly Unrelated Regression                                        
  9  44 lsasim            2.1.2   Waldir Leoncio        4M Functions to Facilitate the Simulation of Large Scale Assessment Data          
 10  39 BayesMallows      1.1.0   Oystein Sorensen      3M Bayesian Preference Learning with the Mallows Rank Model                       
 11  11 xaringan          0.22    Yihui Xie             8M Presentation Ninja   

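Note that I had to increase size from its default of 10; otherwise I would not have gotten all the packages.

For comparison, here is the output from the approach in the answer above: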

r$> unname(as.character(mydata$Package[grep("Waldir",mydata$Author)]))                        
 [1] "BayesMallows"      "BayesSUR"          "citation"          "contingencytables" "DIscBIO"           "LOGAN"             "lsasim"            "matlab2r"          "simExam"          
[10] "TruncExpFam"       "xaringan"
answered 2022-01-28T08:08:50.597