I have a huge CSV file: it's 4 GB, with 320 columns and an unknown number of rows.

Since it can't be opened in any program (except by using third-party tools to split the file into multiple pieces), I'm trying to find a way to extract the data I need. I only need about 10-15 columns from it.

I saw many solutions online (most in VBS), but I couldn't get any of them to work. I get errors, and I don't know VBS well enough to troubleshoot them.

Can anyone help, please?

Thank you.

PS: Here's one example of the VBS code I found and tried, without luck.

The original error was "800a01f4 variable is undefined". Advice online suggested removing Option Explicit; once I do that, the next error is "800a01fa class not defined".

In both cases, the line raising the error is "Set adoJetCommand = New ADODB.Command".

Option Explicit



Dim adoCSVConnection, adoCSVRecordSet, strPathToTextfile
Dim strCSVFile, adoJetConnection,adoJetCommand, strDBPath


Const adCmdText = &H0001

' Specify path to CSV file.
strPathToTextFile = "C:\Users\natalie.rynda\Documents\Temp\RemailMatch\"

' Specify CSV file name.
strCSVFile = "NPIOld.csv"

' Specify Access database file.
strDBPath = "C:\Users\natalie.rynda\Documents\Temp\RemailMatch\NPIs.mdb"

' Open connection to the CSV file.
Set adoCSVConnection = CreateObject("ADODB.Connection")
Set adoCSVRecordSet = CreateObject("ADODB.Recordset")

' Open CSV file with header line.
adoCSVConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=" & strPathtoTextFile & ";" & _
    "Extended Properties=""text;HDR=YES;FMT=Delimited"""

adoCSVRecordset.Open "SELECT * FROM " & strCSVFile, adoCSVConnection

' Open connection to MS Access database.
Set adoJetConnection = CreateObject("ADODB.Connection")
adoJetConnection.ConnectionString = "DRIVER=Microsoft Access Driver (*.mdb);" _
    & "FIL=MS Access;DriverId=25;DBQ=" & strDBPath & ";"
adoJetConnection.Open

' ADO command object to insert rows into Access database.
Set adoJetCommand = New ADODB.Command


Set adoJetCommand.ActiveConnection = adoJetConnection
adoJetCommand.CommandType = adCmdText

' Read the CSV file.
Do Until adoCSVRecordset.EOF
    ' Insert a row into the Access database.
    adoJetCommand.CommandText = "INSERT INTO NPIs " _
        & "(NPI, EntityTypeCode, ReplacementNPI, EIN, MAddress1, MAddress2, MCity, MState, MZIP, SAddress1, SAddress2, SCity, SState, SZIP, ProviderEnumerationDate, LastUpdateDate, NPIDeactivationReasonCode, NPIDeactivationDate, NPIReactivationDate) " _
        & "VALUES (" _
            & "'" & adoCSVRecordset.Fields("NPI").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Entity Type Code").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Replacement NPI").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Employer Identification Number (EIN)").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider First Line Business Mailing Address").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Second Line Business Mailing Address").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Business Mailing Address City Name").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Business Mailing Address State Name").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Business Mailing Address Postal Code").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider First Line Business Practice Location Address").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Second Line Business Practice Location Address").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Business Practice Location Address City Name").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Business Practice Location Address State Name").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Business Practice Location Address Postal Code").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Provider Enumeration Date").Value & "', " _
            & "'" & adoCSVRecordset.Fields("Last Update Date").Value & "', " _
            & "'" & adoCSVRecordset.Fields("NPI Deactivation Reason Code").Value & "', " _
            & "'" & adoCSVRecordset.Fields("NPI Deactivation Date").Value & "', " _
            & "'" & adoCSVRecordset.Fields("NPI Reactivation Date").Value & "')"
    adoJetCommand.Execute
    adoCSVRecordset.MoveNext
Loop



' Clean up.
adoCSVRecordset.Close
adoCSVConnection.Close
adoJetConnection.Close
2 Answers

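If your CSV file is simple, with no line breaks or commas in unexpected places, the standard *nix tool awk can be useful. It lets you easily extract the 15 columns you're looking for into a new CSV file. This blog post explains how to use it on CSV files.

Suppose you want to extract columns 1, 3, and 7 from file.csv; you can do that with the command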

awk -F, '{print $1","$3","$7;}' file.csv

Your Windows machine probably doesn't have awk installed. There are a few options:

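  • You can find it in MSYS, which basically gives you a Unix-like shell environment on Windows. To me, this seems like the easiest approach.

  • Another option appears to be Gawk for Windows, but I have no experience with it, so I can't vouch for it.

  • You could try to get the same result with Windows PowerShell, as described in this blog post, if you have it available. Again, I have no experience with it.

  • Last but not least, you could switch to Linux, for example in a virtual machine. awk is generally available in *nix environments.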

If you are parsing more awkward CSV files, have a look at "parsing csv files with gawk" for plenty of suggestions.
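For files where quoted fields contain commas or line breaks, a plain `-F,` split in awk will miscount columns. As an alternative (not part of the original answer), here is a minimal Python sketch that streams the file row by row with the standard csv module and keeps only the columns named in the header. The function name `extract_columns` and the sample data are made up for illustration; the header names are borrowed from the question.

```python
import csv
import io


def extract_columns(src, dst, wanted):
    """Copy only the columns named in `wanted` from src to dst.

    src and dst are open text file objects. Rows are processed one at a
    time, so memory use does not grow with file size.
    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader)
    # Look up each wanted column by header name; a missing name
    # raises ValueError instead of silently picking the wrong column.
    positions = [header.index(name) for name in wanted]
    writer.writerow(wanted)

    rows = 0
    for row in reader:
        # Pad short rows with empty strings rather than failing.
        writer.writerow([row[i] if i < len(row) else "" for i in positions])
        rows += 1
    return rows


if __name__ == "__main__":
    # Small in-memory sample standing in for the real file (made-up data).
    sample = io.StringIO(
        "NPI,Entity Type Code,Replacement NPI,Last Update Date\n"
        '111,1,,"07/08/2007"\n'
        '222,2,,"Suite 5, Main St"\n'
    )
    out = io.StringIO()
    count = extract_columns(sample, out, ["NPI", "Last Update Date"])
    print(count, "rows written")
    print(out.getvalue(), end="")
```

For a real file, the same function could be called with `open("NPIOld.csv", newline="")` as the source and an output file opened with `open(..., "w", newline="")`; passing `newline=""` is what the csv module documentation recommends so that embedded line breaks inside quoted fields are handled correctly.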

answered 2012-07-27T01:05:21.133

In the VBE editor, open the References dialog (Tools > References):


Then find the Microsoft ActiveX Data Objects Library in the list. I'm not sure which version is appropriate, but probably 6.


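Your code doesn't seem to know what ADODB.Command is, and this should fix that. All I know is that, once I set the reference, I was able to copy your code and step through it successfully. Hope this helps explain it.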

answered 2012-07-27T19:56:07.967