
I have a file with 1 million lines which, once read with readLines, can be reduced to:

prob <- readLines("offendingFile.txt")
dput(prob)

c("000005928484|Name Nmee Leonel                        |YUMBO               |El Placer de El Cerrito   ALG 76248     |114|80041725|20140424|4132638|20140425|P|PED.ELE/100098-114       |Corregimiento de amaime", 
"", "          ||90300105       |V-1 MUIMERP NALBOC            |6.0000|30.820000|.0000|.00000000000000|6.0000|458114.67", 
"000005928484|Name Nmee Leonel                        |YUMBO               |El Placer de El Cerrito   ALG 76248     |114|80041725|20140424|4132638|20140425|P|PED.ELE/100098-114       |Corregimiento de amaime", 
"", "          ||90400105       |V-2 MUIMERP NALBOC            |3.0000|29.170000|.0000|.00000000000000|3.0000|169750.62", 
"000005928484|Name Nmee Leonel                        |YUMBO               |El Placer de El Cerrito   ALG 76248     |114|80041725|20140424|4132638|20140425|P|PED.ELE/100098-114       |Corregimiento de amaime", 
"", "          ||90700101       |V-OCIMONOCE LOREMIPSUM        |12.0000|5.980000|.0000|.00000000000000|12.0000|107118.18", 
"000815004980|Odrareg Oinotna Namzug S. En C.S.       |YUMBO               |Rozo (Palmira)            ALG 76520     |114|80041726|20140424|4132636|20140425|P|PED.ELE/100099-114       |Corregimiento de palmira"
)

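I want to remove the LFLF sequences and spaces that occur in the file (this would delete lines 2, 5 and 8 and append line 3 to line 1, line 6 to line 4, and line 9 to line 7, using the original line numbers). So I tried: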

prob2 <- gsub("\n {2,}", "", prob) #  didn't do anything
gsub("[\r\n] {2,}", "", prob)
gsub("\r?\n {2,}|\r {2,}", "", prob)

The last two lines were borrowed from this SO post.

How should I proceed?

Expected output:

dput(prob2)

c("000005928484|Name Nmee Leonel                        |YUMBO               |El Placer de El Cerrito   ALG 76248     |114|80041725|20140424|4132638|20140425|P|PED.ELE/100098-114       |Corregimiento de amaime        ||90300105       |V-1 MUIMERP NALBOC            |6.0000|30.820000|.0000|.00000000000000|6.0000|458114.67", 
"000005928484|Name Nmee Leonel                        |YUMBO               |El Placer de El Cerrito   ALG 76248     |114|80041725|20140424|4132638|20140425|P|PED.ELE/100098-114       |Corregimiento de amaime        ||90400105       |V-2 MUIMERP NALBOC            |3.0000|29.170000|.0000|.00000000000000|3.0000|169750.62", 
"000005928484|Name Nmee Leonel                        |YUMBO               |El Placer de El Cerrito   ALG 76248     |114|80041725|20140424|4132638|20140425|P|PED.ELE/100098-114       |Corregimiento de amaime        ||90700101       |V-OCIMONOCE LOREMIPSUM        |12.0000|5.980000|.0000|.00000000000000|12.0000|107118.18", 
"000815004980|Odrareg Oinotna Namzug S. En C.S.       |YUMBO               |Rozo (Palmira)            ALG 76520     |114|80041726|20140424|4132636|20140425|P|PED.ELE/100099-114       |Corregimiento de palmira"
)
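
A note on why the gsub attempts above have no effect, plus a rough sketch of one possible workaround. readLines splits the input on line terminators and returns each line without them, so no element of prob contains "\n" and none of the patterns can match; the LFLF sequence shows up as an empty string element instead. The helper below, join_continuations, is a hypothetical name and not something from the original attempts. It assumes every empty element sits directly between a record and its continuation line and that empty elements never occur back to back. Whether the leading spaces of the continuation line should be kept is left as an option.

```r
# Hypothetical helper: merge each continuation line into the record that
# precedes the empty element, then drop the empty and continuation elements.
join_continuations <- function(lines, strip_leading_spaces = FALSE) {
  blank <- which(lines == "")
  # Assumption: an empty element always has a record before it and a
  # continuation after it, so ignore empties at the very start or end.
  blank <- blank[blank > 1 & blank < length(lines)]
  if (length(blank) == 0) return(lines)

  cont <- lines[blank + 1]
  if (strip_leading_spaces) cont <- sub("^ +", "", cont)

  # Vectorised append, then remove the now redundant elements.
  lines[blank - 1] <- paste0(lines[blank - 1], cont)
  lines[-c(blank, blank + 1)]
}

# Small made-up input with the same shape as the data above
demo <- c("a|b", "", "   ||c", "d|e", "", "  ||f", "g")
joined <- join_continuations(demo)
joined_stripped <- join_continuations(demo, strip_leading_spaces = TRUE)
```

On the real data this would be called as prob2 <- join_continuations(prob). Because it works on index vectors rather than on one large collapsed string, it should stay reasonable for a file of about a million lines, though that has not been measured here.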