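I have a complete table of stock price data. Each row has a unique combination of ticker symbol and date. I load new data by fetching CSV files that contain the daily price data for each stock. I know the CSV files contain duplicates. I only want to add the data that is not already in my table. What is the fastest way to do this?

Should I try to add every row and catch each exception? Or should I check each row against my table by reading the table to see whether the row already exists? Or is there another option?

Additional information

This is what I had been doing. For each row in the CSV file, I read my table to see whether the row already exists.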

Dim strURL As String
    Dim strBuffer As String
    strURL = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    strBuffer = RequestWebData(strURL)
    Dim sReader As New StringReader(strBuffer)
    Dim List As New List(Of String)
    Do While sReader.Peek >= 0
        List.Add(sReader.ReadLine)
    Loop
    List.RemoveAt(0)
    Dim lines As String() = List.ToArray
    sReader.Close()
    For Each line In lines
        Dim checkDate = line.Split(",")(0).Trim()
        Dim dr As OleDbDataReader
        Dim cmd2 As New OleDb.OleDbCommand("SELECT * FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?", con)
        cmd2.Parameters.AddWithValue("?", tickerValue)
        cmd2.Parameters.AddWithValue("?", checkDate)
        dr = cmd2.ExecuteReader
        If Not dr.Read() Then
            Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
            cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
            cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = checkDate
            cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = line.Split(",")(1).Trim
            cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = line.Split(",")(2).Trim
            cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = line.Split(",")(3).Trim
            cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = line.Split(",")(4).Trim
            cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = line.Split(",")(5).Trim
            cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = line.Split(",")(6).Trim
            cmd3.ExecuteNonQuery()
        End If
        dr.Close()
    Next

This is what I have switched to, and it throws this exception: The changes you requested to the table were not successful because they would create duplicate values in the index, primary key, or relationship. Change the data in the field or fields that contain duplicate data, remove the index, or redefine the index to permit duplicate entries and try again. I could catch this exception each time and ignore it until I reach a new row.

Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Debug.WriteLine(strURL)
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            reader.ReadHeaderRecord()
            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

I just want to use the most efficient method.

Update

Based on the answer below, this is my code so far:

 Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            ' the CSV file has a header record, so we read that first
            reader.ReadHeaderRecord()

            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & "(Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & "FROM DUAL " & "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

It gives me this error: Data type mismatch in criteria expression.

1 Answer

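Many DBMSs support a (non-standard) clause of the INSERT command that ignores duplicates, for example:

MySQL: INSERT IGNORE ...

SQLite: INSERT OR IGNORE INTO ...

Both rely on a primary key or unique index covering the key columns. This is the fastest approach in non-batch mode, because you do not have to read the database before writing.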
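As a small illustration of the ignore-duplicates clause, here is a sketch using SQLite through Python's built-in sqlite3 module (not Access/OleDb, which is what the question uses). The table name `prices`, its columns, the ticker `ABC`, and the sample values are all made up for the example; the composite primary key on (ticker, date) is what lets the database detect duplicates.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE prices (
        ticker TEXT NOT NULL,
        date   TEXT NOT NULL,
        close  REAL,
        PRIMARY KEY (ticker, date)  -- uniqueness is enforced here
    )
    """
)

# Made-up sample rows; the third repeats the key of the first.
rows = [
    ("ABC", "2013-11-08", 10.0),
    ("ABC", "2013-11-11", 11.0),
    ("ABC", "2013-11-08", 10.0),
]

# INSERT OR IGNORE skips any row that would violate the primary key,
# so no prior SELECT and no exception handling is needed.
con.executemany("INSERT OR IGNORE INTO prices VALUES (?, ?, ?)", rows)
con.commit()

row_count = con.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(row_count)
```

The duplicate row is dropped by the database itself, which is why this variant avoids a round trip per row.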

You can do the same in standard SQL with:

INSERT INTO ... 
SELECT <your values> 
WHERE NOT EXISTS ( <query for your values by id> );

Or (when a FROM clause is explicitly required):

INSERT INTO ... 
SELECT <your values> 
FROM DUAL 
WHERE NOT EXISTS ( <query for your values by id> );
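The conditional insert can be tried out with the sketch below, which again uses SQLite via Python's sqlite3 as a stand-in database. SQLite accepts a SELECT without a FROM clause, so this corresponds to the first variant. The `prices` table and the helper `insert_if_missing` are hypothetical names introduced only for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table; this approach does not need a unique index.
con.execute("CREATE TABLE prices (ticker TEXT, date TEXT, close REAL)")

SQL = (
    "INSERT INTO prices (ticker, date, close) "
    "SELECT ?, ?, ? "
    "WHERE NOT EXISTS "
    "(SELECT 1 FROM prices WHERE ticker = ? AND date = ?)"
)


def insert_if_missing(ticker, date, close):
    """Insert the row unless (ticker, date) is already present.

    Returns the number of rows actually inserted (0 or 1).
    """
    cur = con.execute(SQL, (ticker, date, close, ticker, date))
    return cur.rowcount


first = insert_if_missing("ABC", "2013-11-08", 10.0)
second = insert_if_missing("ABC", "2013-11-08", 10.0)  # same key again
total = con.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
```

Unlike the ignore clause, this form does not depend on a constraint violation, so it also works on tables without a unique index, at the cost of a lookup inside every insert.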

Edit

MS Access has no built-in DUAL table (i.e., a table that always contains exactly one row), but Access requires a FROM clause. So you have to create your own DUAL table:

CREATE TABLE DUAL (DUMMY INTEGER);
INSERT INTO DUAL VALUES (1);

You only have to do this once. Then, in your code, you would insert like this:

INSERT INTO MyTable (A,B,C,D)
SELECT 123, 456, 'Hello', 'World'
FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = 123 AND B = 456);
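To see the one-row helper table at work, here is a sketch that runs these statements against SQLite through Python's sqlite3 module, used only as a convenient stand-in for Access. The column types of `MyTable` are assumptions, since the answer does not specify them.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One-time setup: the single-row helper table and the target table.
con.execute("CREATE TABLE DUAL (DUMMY INTEGER)")
con.execute("INSERT INTO DUAL VALUES (1)")
con.execute("CREATE TABLE MyTable (A INTEGER, B INTEGER, C TEXT, D TEXT)")

INSERT_SQL = """
    INSERT INTO MyTable (A, B, C, D)
    SELECT 123, 456, 'Hello', 'World'
    FROM DUAL
    WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = 123 AND B = 456)
"""

# Running the same statement three times inserts the row only once:
# DUAL yields exactly one candidate row, and NOT EXISTS filters it
# out as soon as the key is present.
for _ in range(3):
    con.execute(INSERT_SQL)

stored = con.execute("SELECT A, B, C, D FROM MyTable").fetchall()
```

Note that the trick depends on DUAL holding exactly one row; if it ever held more, the first run could insert more than one copy.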

So, for your example, use:

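' tblName is concatenated into the subquery as well; left inside the
' string literal it would refer to a table literally named "tblName".
Dim cmd3 As OleDbCommand = New OleDbCommand( _
    "INSERT INTO " & tblName & " " & _
    "(Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
    "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & _
    "FROM DUAL " & _
    "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ? AND ...)", con)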

(The WHERE clause depends on your key columns.)
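Applied to a whole CSV file, the per-row loop might look like the sketch below. It is written in Python against SQLite rather than VB.NET against Access, purely so that it is self-contained; the CSV text, the `prices` table, its reduced column set, and the ticker `ABC` are made-up stand-ins. The point it shows is that positional `?` placeholders are filled strictly in order, so the key values have to be supplied a second time for the NOT EXISTS subquery.

```python
import csv
import io
import sqlite3

# Made-up CSV text standing in for the downloaded file; the last data
# line repeats an earlier date.
CSV_TEXT = """Date,Open,Close
2013-11-08,10.0,10.5
2013-11-11,10.5,11.0
2013-11-08,10.0,10.5
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE DUAL (DUMMY INTEGER)")
con.execute("INSERT INTO DUAL VALUES (1)")
con.execute(
    "CREATE TABLE prices (ticker TEXT, date TEXT, open REAL, close REAL)"
)

SQL = (
    "INSERT INTO prices (ticker, date, open, close) "
    "SELECT ?, ?, ?, ? FROM DUAL "
    "WHERE NOT EXISTS "
    "(SELECT 1 FROM prices WHERE ticker = ? AND date = ?)"
)

ticker = "ABC"  # hypothetical ticker symbol
inserted = 0
for rec in csv.DictReader(io.StringIO(CSV_TEXT)):
    params = (
        ticker, rec["Date"], float(rec["Open"]), float(rec["Close"]),
        ticker, rec["Date"],  # key values again, for the subquery
    )
    inserted += con.execute(SQL, params).rowcount
con.commit()

dates = [r[0] for r in con.execute("SELECT date FROM prices ORDER BY date")]
```

Here the numeric fields are converted from text before binding; whether a similar conversion is needed with OleDb depends on the column types of the Access table, which the question does not show.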

answered 2013-11-11T17:07:56.277