
I would like to read the following PDF into a tidy data frame in R: PDF Table. The table spans more than 70 pages.

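I'm comfortable reading tables where each cell holds a single line of text, but I'm not sure how to extend that to tables where the cells in a row span different numbers of lines.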

Any help would be greatly appreciated!




I suggest using tabulizer; it works well for extracting tables from PDF files. Here is code for the file you shared:

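library(tabulizer)
# extract_tables() returns a list of character matrices, one per detected table (typically one per page)
lst <- extract_tables(file = '8-31-2020 Paragraph IV Update_0.pdf')
# Format: use the first row of each matrix as its column names
renames <- function(x)
{
  colnames(x) <- x[1, ]
  x <- x[-1, , drop = FALSE]  # drop the header row; unlike 2:nrow(x), safe when x has one row
  return(as.data.frame(x, stringsAsFactors = FALSE))
}
# Apply to every page
lst21 <- lapply(lst, renames)
# Bind all pages into one data frame (rbind matches columns by name)
df <- do.call(rbind, lst21)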
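Because `rbind()` on data frames matches columns by name, `do.call(rbind, lst21)` stops with "names do not match previous names" if tabulizer reads the header of some page even slightly differently (for example, with line breaks in other places). A minimal sketch of one workaround, assuming every page has the same columns in the same order, is to bind by position and reuse the first page's names everywhere. `bind_pages` is a hypothetical helper, not a tabulizer function:

```r
# Hypothetical helper (not part of tabulizer): bind per-page data frames by
# column position, reusing the first page's header for every page.
# Assumption: every page has the same number of columns, in the same order.
bind_pages <- function(pages) {
  header <- names(pages[[1]])
  pages <- lapply(pages, function(p) {
    if (ncol(p) != length(header)) {
      stop("a page has ", ncol(p), " columns; expected ", length(header))
    }
    names(p) <- header  # overwrite header text that was extracted differently
    p
  })
  out <- do.call(rbind, pages)
  rownames(out) <- NULL  # renumber rows 1..n after binding
  out
}

# Usage with the list built by the code above:
# df <- bind_pages(lst21)
```

The column-count check is there so that a page that was split into a different number of columns fails loudly instead of being silently misaligned.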

Output (first few rows):

head(df)

                                       DRUG NAME   DOSAGE FORM              STRENGTH
1                               Abacavir Sulfate       Tablets                300 mg
2                                       Abacavir Oral Solution              20 mg/mL
3 Abacavir Sulfate, Dolutegravir\rand Lamivudine       Tablets  600 mg/50 mg/300\rmg
4               Abacavir Sulfate and\rLamivudine       Tablets         600 mg/300 mg
5   Abacavir Sulfate, Lamivudine\rand Zidovudine       Tablets 300 mg/150 mg/300\rmg
6                            Abiraterone Acetate       Tablets                125 mg
          RLD/NDA DATE OF\rSUBMISSION NUMBER OF\rANDAs\rSUBMITTED 180-DAY\rSTATUS
1   Ziagen\r20977           1/28/2009                           1        Eligible
2   Ziagen\r20978          12/27/2012                           1        Eligible
3 Triumeq\r205551           8/14/2017                           5                
4  Epzicom\r21652           9/27/2007                           1        Eligible
5 Trizivir\r21205           3/22/2011                           1        Eligible
6   Yonsa\r210308           7/23/2018                           1                
  180-DAY\rDECISION\rPOSTING\rDATE DATE OF\rFIRST\rAPPLICANT\rAPPROVAL
1                        2/11/2020                           6/18/2012
2                        2/11/2020                           9/26/2016
3                                                                     
4                        2/11/2020                           9/29/2016
5                        2/11/2020                           12/5/2013
6                                                                     
  DATE OF FIRST\rCOMMERCIAL\rMARKETING BY\rFTF EXPIRATION\rDATE OF LAST\rQUALIFYING\rPATENT
1                                    6/19/2012                                    5/14/2018
2                                    9/15/2017                                    5/14/2018
3                                                                                 12/8/2029
4                                    9/29/2016                                    5/14/2018
5                                   12/17/2013                                    5/14/2018
6                                                                                 3/17/2034
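The output shows how tabulizer handles the multi-line cells from the question: each cell is kept as one string, and the original line breaks appear as "\r" (e.g. "Abacavir Sulfate and\rLamivudine"). A minimal sketch for tidying this replaces every "\r" with a space in both headers and cells, then parses date columns. `clean_table` is a hypothetical helper, and treating every column whose name contains "DATE" as an m/d/Y date is an assumption based on the headers printed above:

```r
# Hypothetical helper: tidy the data frame produced by the tabulizer code.
clean_table <- function(df) {
  # Headers: "DATE OF\rSUBMISSION" -> "DATE OF SUBMISSION"
  names(df) <- trimws(gsub("\r", " ", names(df), fixed = TRUE))
  # Cells: same replacement, column by column (df[] keeps the data.frame shape)
  df[] <- lapply(df, function(col) {
    trimws(gsub("\r", " ", as.character(col), fixed = TRUE))
  })
  # Assumption: every column whose name contains "DATE" holds m/d/Y strings;
  # blank cells fail to parse and become NA
  date_cols <- grep("DATE", names(df), fixed = TRUE)
  df[date_cols] <- lapply(df[date_cols], as.Date, format = "%m/%d/%Y")
  df
}

# Usage:
# df_clean <- clean_table(df)
```

Other columns (for example "RLD/NDA", which combines a brand name and an NDA number) are left as plain text here.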
answered 2020-09-10T19:08:11.967