r - How to extract a specific part of the text from a text file in R?

Question

I have a lot of text files, each containing text like the example below.

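\\ Paper: hep-th/9201003

From: DIJKGRAAF%IASNS.BITNET@pucc.PRINCETON.EDU

Date: Thu, 2 Jan 92 14:06 EST (54kb)

Title: Intersection Theory, Integrable Hierarchies and Topological Field Theory

Authors: Robbert Dijkgraaf

Comments: 73 pages, most figures are not included. Lectures given at the Cargese Summer School on `New Symmetry Principles in Quantum Field Theory,' July 16-27, 1991.

\\ In these lecture notes we review the various relations between intersection theory on the moduli space of Riemann surfaces, integrable hierarchies of KdV type, matrix models, and topological quantum field theories. We explain in particular why matrix integrals of the type considered by Kontsevich naturally appear as tau-functions associated to minimal models. Our starting point is the extremely simple form of the string equation for the topological (p,1) models, where the so-called Baker-Akhiezer function is given by a (generalized) Airy function. \\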

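I have 10 folders, ranging from 1992 to 2003. Each folder contains thousands of files, and every file has the structure shown above. I want to extract the last part of each file and save it in a new file. This part is the abstract of the paper, and it is different in every file. I have written the following code for my problem, but it does not give the result I want.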

for(j in 1992:1992)
{
    dir.create(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\mydata\\",j, sep = ""))
    setwd(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\dataset\\",j, sep = ""))
    listoffile=list.files()
    for(i in 1:length(listoffile))
    {
        setwd(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\dataset\\",j, sep = ""))
        filetext=readLines(listoffile[i])
        newtext=unlist(strsplit(filetext,'\\\\'))[3]
        setwd(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\mydata\\",j, sep = ""))
        write.table(newtext,file = listoffile[i],sep = "")

    }
}

score 0 · Accepted Answer

If the pattern in your text is always an empty line followed by \\, you can extract the text like this (assuming your_text is a single string):

library(stringr)
# Eight backslashes in R source become the regex \\\\, i.e. two literal backslashes.
str_extract(string = your_text, pattern = "(?<=\n\\\\\\\\)(.*)(?=\\\\\\\\)")

This should solve the biggest problem you are struggling with.
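For readers who prefer not to depend on stringr, the same lookaround idea can be sketched in base R with regexpr(..., perl = TRUE) and regmatches(). The sample string and its paper ID are made up for illustration, and the pattern assumes the abstract is opened by a newline followed by two literal backslashes and closed by two more.

```r
# Hypothetical miniature record; "\\\\" in R source is two literal backslashes.
your_text <- "\\\\ Paper: hep-th/0000000\nTitle: Example title\n\\\\ Example abstract text. \\\\"

# Lookbehind: newline + two backslashes; lookahead: two backslashes.
pattern <- "(?<=\n\\\\\\\\)(.*)(?=\\\\\\\\)"

# regexpr() locates the first match; regmatches() pulls it out of the string.
m <- regexpr(pattern, your_text, perl = TRUE)
abstract <- trimws(regmatches(your_text, m))

print(abstract)
```

If nothing matches, regmatches() returns a zero-length character vector, so checking length(abstract) before writing a file is a reasonable safeguard.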

Addendum in response to the comments: to get one large string instead of a vector of strings, you can use the collapse argument of paste0():

filetext <- readLines("001.txt")
filetext <- paste0(filetext, collapse = " ")

After that, you can apply the general approach described at the beginning of the answer:

newtext <- str_extract(string = filetext, pattern = "(?<=\\s{2}\\\\\\\\)(.*)(?=\\\\\\\\)")
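The read, collapse, and extract steps can be wrapped into one function and tried on a throwaway file. This is only a sketch: extract_abstract is a hypothetical helper name, the demo file contents are invented, and it assumes a blank line precedes the abstract so that collapsing with " " leaves two whitespace characters before the opening backslashes.

```r
library(stringr)

# Hypothetical helper: returns the abstract of one file as a trimmed string.
extract_abstract <- function(path) {
  filetext <- readLines(path, warn = FALSE)
  # Join all lines into one string; a blank line becomes two adjacent spaces.
  filetext <- paste0(filetext, collapse = " ")
  # Two whitespace characters plus two literal backslashes open the abstract;
  # the last pair of backslashes closes it.
  newtext <- str_extract(filetext, "(?<=\\s{2}\\\\\\\\)(.*)(?=\\\\\\\\)")
  str_trim(newtext)
}

# Demonstration on a temporary file that mimics the layout in the question.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("\\\\ Paper: hep-th/0000000", "",
             "Title: Example title", "",
             "\\\\ Example abstract text. \\\\"), demo_file)

demo_abstract <- extract_abstract(demo_file)
print(demo_abstract)
```

str_extract() returns NA when the pattern is not found, which makes files with an unexpected layout easy to spot afterwards.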

score 0 · Accepted Answer

strsplit should help!

text <- "\\ Paper: hep-th/9201003 From: DIJKGRAAF%IASSNS.BITNET@pucc.PRINCETON.EDU Date: Thu, 2 Jan 92 14:06 EST (54kb) Title: Intersection Theory, Integrable Hierarchies and Topological Field Theory Authors: Robbert Dijkgraaf Comments: 73 pages, most figures are not included. Lectures given at the Cargese Summer School on `New Symmetry Principles in Quantum Field Theory,'
July 16-27, 1991. \\ In these lecture notes we review the various relations between intersection theory on the moduli space of Riemann surfaces, integrable hierarchies of KdV type, matrix models, and topological quantum field theories. We explain in particular why matrix integrals of the type considered by Kontsevich naturally appear as tau-functions associated to minimal models. Our starting point is the extremely simple form of the string equation for the topological (p,1) models, where the so-called Baker-Akhiezer function is given by a (generalized) Airy function. \\"


unlist(strsplit(text,'\\\\'))[3]

To generalize to any number of sections, take the last element:

tail(unlist(strsplit(text,'\\\\')), 1)
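Applied to the folder-per-year layout from the question, the strsplit idea might look like the sketch below. The function names, the demo directories under tempdir(), and the sample file are all hypothetical. It also makes two assumptions that differ slightly from the one-liner above: it splits on runs of backslashes ("\\\\+") and drops blank pieces, so a file that ends with two literal backslashes or trailing whitespace still yields the abstract.

```r
# Hypothetical helper: collapse a file into one string, split on runs of
# backslashes, and return the last non-blank piece (the abstract).
extract_last_section <- function(path) {
  text <- paste(readLines(path, warn = FALSE), collapse = " ")
  pieces <- trimws(unlist(strsplit(text, "\\\\+")))
  pieces <- pieces[nzchar(pieces)]
  tail(pieces, 1)
}

# Assumed layout: <in_root>/<year>/<file>, mirrored under <out_root>.
extract_all <- function(in_root, out_root, years) {
  for (year in as.character(years)) {
    in_dir <- file.path(in_root, year)
    out_dir <- file.path(out_root, year)
    dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
    for (f in list.files(in_dir)) {
      abstract <- extract_last_section(file.path(in_dir, f))
      writeLines(abstract, file.path(out_dir, f))
    }
  }
}

# Demonstration with one made-up file in a temporary directory.
in_root <- file.path(tempdir(), "dataset_demo")
out_root <- file.path(tempdir(), "mydata_demo")
dir.create(file.path(in_root, "1992"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("\\\\ Paper: hep-th/0000000", "",
             "Title: Example title", "",
             "\\\\ Example abstract text. \\\\"),
           file.path(in_root, "1992", "001.txt"))

extract_all(in_root, out_root, 1992)
result <- readLines(file.path(out_root, "1992", "001.txt"))
print(result)
```

Building paths with file.path() instead of calling setwd() inside the loop keeps the working directory unchanged, which avoids the repeated directory switching in the question's code.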
