
For example, given "2,881,423", how do I remove the ","? I have millions of rows to process. Is a batch operation possible? I can use any tool on either PC or Mac.

"Position","Value",
"1","1",
"2","1",
"3","1",
"4","2",
"5","2",

...

"2,881,423","19",
"2,881,424","22",
"2,881,425","23",
"2,881,426","23",
"2,881,427","25",
"2,881,428","25",
"2,881,429","25",

...

The above are some snippets from the CSV.
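For reference, here is a small Python sketch of why the commas can't simply be stripped from every line (the sample string just holds two rows copied from the data above): a plain replace also deletes the field delimiters, whereas a CSV parser separates the fields first.

```python
import csv
import io

# Two rows copied from the sample data in the question.
sample = '"Position","Value",\n"2,881,423","19",\n'

# Naive approach: delete every comma on the line. This also removes the
# delimiters, so the two columns run together.
naive = sample.splitlines()[1].replace(",", "")
print(naive)  # "2881423""19"

# CSV-aware approach: let the csv module split the fields (and strip the
# quotes) first; each field can then be cleaned on its own.
rows = list(csv.reader(io.StringIO(sample)))
print(rows[1])  # ['2,881,423', '19', '']
```

The trailing comma on each line yields an empty third field.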


3 Answers


The following code does the job: it loops through all files in a folder that match a given mask:

Sub RemoveCommas()

Dim RegX_Comma As Object
Dim FileStream As Object
Dim FileContent As String
Dim SourceFolder As String
Dim FileName As String

' VBScript.RegExp has no lookbehind, so capture the digit before the
' comma and write it back with $1 in the replacement.
' Assumes quoted fields as in the sample: in an unquoted line like 12,34
' the delimiter itself sits between digits and would also be removed.
Set RegX_Comma = CreateObject("VBScript.RegExp")
RegX_Comma.Pattern = "(\d),(?=\d)" 'Comma between any two digits
RegX_Comma.IgnoreCase = True
RegX_Comma.Global = True 'Replace every match, not just the first

Set FileStream = CreateObject("ADODB.Stream")
SourceFolder = "D:\DOCUMENTS\" 'Must be specified with trailing "\"

FileName = Dir(SourceFolder & "*.txt") 'Specify ANY mask using wildcards, e.g. "*.csv"
Do While FileName <> ""

    FileStream.Open
    FileStream.Charset = "ASCII" 'Change encoding as required
    FileStream.LoadFromFile SourceFolder & FileName
    FileContent = RegX_Comma.Replace(FileStream.ReadText, "$1")
    FileStream.Position = 0
    FileStream.WriteText FileContent
    FileStream.SetEOS
    FileStream.SaveToFile SourceFolder & FileName, 2 'Will overwrite the existing file
    FileStream.Close

    FileName = Dir 'Next matching file
Loop

End Sub

Adapt the code as needed, following the inline comments.

Good luck! :)
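The same regex idea (delete a comma only when there is a digit on each side) can be sketched in Python, whose re module does support the fixed-width lookbehind (?<=\d) that VBScript.RegExp lacks. This is a minimal sketch, not a drop-in replacement; remove_digit_commas and clean_file are hypothetical names, and the utf-8 default encoding is an assumption.

```python
import re

# A comma with a digit on both sides. Python's re module supports the
# fixed-width lookbehind (?<=\d), which VBScript.RegExp does not.
DIGIT_COMMA = re.compile(r"(?<=\d),(?=\d)")


def remove_digit_commas(text):
    """Delete commas that sit between two digits; keep all other commas."""
    return DIGIT_COMMA.sub("", text)


def clean_file(src, dst, encoding="utf-8"):
    """Hypothetical helper: stream src to dst line by line, so a file with
    millions of rows is never held in memory all at once."""
    with open(src, encoding=encoding, newline="") as fin, \
            open(dst, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            fout.write(remove_digit_commas(line))


print(remove_digit_commas('"2,881,423","19",'))  # "2881423","19",
```

This relies on the fields being quoted, as in the sample; in an unquoted file such as 12,34 the delimiter itself sits between two digits and would be deleted as well.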

answered 2013-01-20T08:01:46.013

In Python:

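import csv
# Python 3: open CSV files in text mode with newline="" (the "rb"/"wb"
# modes only work on Python 2).
with open("myfile.csv", newline="") as infile, open("output.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        # The csv module has already split off the delimiters and stripped
        # the quotes, so every comma left inside a field can be removed.
        writer.writerow([item.replace(",", "") for item in row])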
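A variant of the loop above removes commas only in chosen columns (here the Position column), so a text column whose commas are real data would pass through untouched. This is a sketch under that assumption; clean_columns and its columns parameter are hypothetical, and the demo uses in-memory text instead of files.

```python
import csv
import io


def clean_columns(infile, outfile, columns):
    """Copy CSV rows from infile to outfile, removing commas only in the
    fields whose 0-based index is in columns; other fields pass through."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        writer.writerow([
            item.replace(",", "") if i in columns else item
            for i, item in enumerate(row)
        ])


# Demo on in-memory text; on Python 3, open real files with newline="".
src = io.StringIO('"Position","Value",\n"2,881,423","19",\n')
dst = io.StringIO()
clean_columns(src, dst, columns={0})
print(repr(dst.getvalue()))  # 'Position,Value,\r\n2881423,19,\r\n'
```

With the default minimal quoting, the writer drops quotes from fields that no longer need them and ends lines with \r\n.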
answered 2013-01-20T08:25:16.477

Since your goal is to work with the data in R, you can do the replacement after reading the data into R:

df <- read.csv("Path/To/File.csv", stringsAsFactors = FALSE)
df$varname <- as.numeric(gsub(",", "", df$varname))

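Here df is your data frame and varname is the name of the variable. This does not check whether a comma sits between two digits, so make sure you pass it only variables you expect to be numeric, not string columns where commas are genuinely part of the data.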
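Because gsub deletes every comma before as.numeric sees the value, a malformed entry such as "1,5" silently becomes 15. As a rough Python counterpart of the check the answer says is missing, the sketch below accepts only plain numbers or correctly grouped thousands and returns None otherwise, much as R yields NA. parse_number and the sample values are hypothetical.

```python
import re

# Either a comma-grouped number (2,881,423) or a plain one (19, 2881423),
# with an optional sign and decimal part.
NUMBER = re.compile(r"-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")


def parse_number(value):
    """Return value as a float, or None if it is not a well-formed number.

    Unlike stripping every comma, "1,5" is rejected rather than read as 15.
    """
    value = value.strip()
    if NUMBER.fullmatch(value) is None:
        return None
    return float(value.replace(",", ""))


column = ["2,881,423", "19", "1,5", "Smith, John"]
print([parse_number(v) for v in column])  # [2881423.0, 19.0, None, None]
```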

Here is a similar question asking how to handle this from within R:

How to read data when some numbers contain commas as thousand separator?

answered 2016-05-23T00:04:29.473