
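I have an SSIS package that imports data from a .csv file. Every entry in this file is enclosed in double-quote (") text qualifiers, but double quotes also appear inside the entries. I have also set the comma (,) as the column delimiter. I can't share the original data I'm using, but here is an example of how my data comes through in the Flat File Source: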

"ID-1","A "B"", C, D, E","Today"
"ID-2","A, B, C, D, E,F","Yesterday"
"ID-3","A and nothing else","Today"

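As you can see, the second column can contain quotes (and commas), which breaks my SSIS import with an error pointing at this line. I'm not very familiar with regular expressions, but I've heard they might help in this case.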
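The rows above are not valid CSV in the RFC 4180 sense: inside a quoted field an embedded double quote has to be doubled, and the quote in front of B is not. A minimal sketch with Python's standard csv module (used here only as a stand-in for a strict CSV reader; SSIS itself is not involved) shows the effect: the first sample row splits into more than three columns, while the other two parse as expected.

```python
import csv
import io

# The three sample rows from the question, one per line.
SAMPLE = '''"ID-1","A "B"", C, D, E","Today"
"ID-2","A, B, C, D, E,F","Yesterday"
"ID-3","A and nothing else","Today"
'''


def parse_rows(text: str) -> list[list[str]]:
    """Parse text with the default csv dialect: comma delimiter, '"' qualifier."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        # A well-formed row would have exactly 3 columns.
        print(len(row), row)
```

Running it prints a column count of 3 for ID-2 and ID-3, but not for ID-1.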

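As I see it, I need to replace all double quotes (") with single quotes ('), except...

  • ...all quotes at the beginning of a line
  • ...all quotes at the end of a line
  • ...quotes that are part of ","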

Can any of you help me with this? That would be great!

Thanks in advance!


4 Answers


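To replace the double quotes with single quotes according to your specification, use this simple regular expression. It also allows whitespace at the beginning and/or end of a line.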

using System.Text.RegularExpressions;

string pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";
string resultString = Regex.Replace(subjectString, pattern, "'", RegexOptions.Multiline);

Here is an explanation of the pattern:

// (?<!^\s*|,)"(?!,"|\s*$)
// 
// Options: ^ and $ match at line breaks
// 
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!^\s*|,)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «^\s*»
//       Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
//       Match the character “,” literally «,»
// Match the character “"” literally «"»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!,"|\s*$)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «,"»
//       Match the characters “,"” literally «,"»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «\s*$»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//       Assert position at the end of a line (at the end of the string or before a line break character) «$»
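If you want to try the same rules in Python instead of C#, note that Python's re module only accepts fixed-width look-behind, so (?<!^\s*|,) raises re.error there. Below is a minimal sketch under that constraint: the line-start case is handled in code and the rest of the pattern is kept as is. The names INNER_QUOTE, single_quote_inner and single_quote_text are hypothetical.

```python
import re

# The .NET pattern minus its variable-width "^\s*" look-behind alternative:
# a quote is replaced unless it is preceded by "," or followed by ',"'
# or by optional whitespace and the end of the line.
INNER_QUOTE = re.compile(r'(?<!,)"(?!,"|\s*$)')


def single_quote_inner(line: str) -> str:
    """Replace qualifier-breaking double quotes on one line with single quotes."""
    body = line.lstrip()
    indent = line[: len(line) - len(body)]
    if body.startswith('"'):
        # Keep the first quote on the line (after optional whitespace),
        # which is what the "^\s*" look-behind alternative does in .NET.
        return indent + '"' + INNER_QUOTE.sub("'", body[1:])
    return indent + INNER_QUOTE.sub("'", body)


def single_quote_text(text: str) -> str:
    """Apply single_quote_inner line by line, like RegexOptions.Multiline."""
    return "\n".join(single_quote_inner(line) for line in text.split("\n"))
```

For the first sample row it returns "ID-1","A 'B'', C, D, E","Today"; the other two sample rows come back unchanged.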
answered 2012-09-07T15:56:42.220

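When loading a CSV that contains double quotes and commas, there is a limitation: extra double quotes are added and the data is also enclosed in double quotes, which you can see in the preview of the source file. So add a Derived Column transformation and give it the following expression: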

(REPLACE(REPLACE( RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2)) ,"","@"),"\"\"","\" "),"@",")

The part RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2) removes the double quotes that enclose the data.
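To make the string arithmetic of that part concrete, here is a minimal Python sketch that mirrors it. It assumes SSIS semantics of a 1-based SUBSTRING(string, start, length), RIGHT(string, n) returning the last n characters, and TRIM removing only leading and trailing spaces. It covers only this part, not the surrounding REPLACE calls; strip_enclosing_quotes is a hypothetical name.

```python
def strip_enclosing_quotes(col2: str) -> str:
    """Mirror RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2)."""
    trimmed = col2.strip(" ")   # TRIM(COL2): assumed to strip only spaces
    n = len(col2)               # LEN(COL2) is taken on the untrimmed value
    head = trimmed[: n - 1]     # SUBSTRING(..., 1, n - 1): drop the last character
    keep = n - 2
    # RIGHT(head, n - 2): keep the last n - 2 characters, i.e. drop the first one
    return head[-keep:] if keep > 0 else ""
```

For example, "A and nothing else" (with its enclosing quotes) becomes A and nothing else.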

Try this and let me know if it helps.

answered 2013-03-26T18:04:41.607

You can split the columns with this regex match pattern:

/(?:(?<=^")|(?<=",")).*?(?:(?="\s*$)|(?=","))/g

See this demo.
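The pattern can also be checked with a minimal Python sketch (Python's re is an assumption here; the answer shows a JavaScript-style literal). The /g flag maps to re.findall, and re.MULTILINE is added so that ^ and $ also match at every line break of a multi-line file. Both look-behind alternatives have a fixed width, which Python requires. FIELD and split_fields are hypothetical names.

```python
import re

# A field starts right after a line-initial quote or a "," separator and
# ends (lazily) right before a closing quote at end of line or the next ",".
FIELD = re.compile(r'(?:(?<=^")|(?<=",")).*?(?:(?="\s*$)|(?=","))', re.MULTILINE)


def split_fields(line: str) -> list[str]:
    """Return the column values of one quoted CSV line."""
    return FIELD.findall(line)
```

Because .*? is lazy, each match stops at the first "," separator or closing quote, so for the ID-1 row the middle field comes back as A "B"", C, D, E with its inner quotes and commas intact.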

answered 2012-09-07T14:52:03.513
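
Use " as the text qualifier on the CSV destination, and before inserting the values into the CSV destination, add a Derived Column expression: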

REPLACE(REPLACE([Column1],",",""),"\"","")

This will keep the " in your text fields.
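The effect of that expression on a single value can be sketched in Python, assuming SSIS REPLACE substitutes every occurrence, as str.replace does. strip_commas_and_quotes is a hypothetical name.

```python
def strip_commas_and_quotes(column1: str) -> str:
    # Python equivalent of: REPLACE(REPLACE([Column1],",",""),"\"","")
    without_commas = column1.replace(",", "")  # inner REPLACE: remove commas
    return without_commas.replace('"', "")     # outer REPLACE: remove double quotes
```

Applied to the second column of the ID-1 sample row, A "B"", C, D, E becomes A B C D E, so every comma and double quote inside the value is removed.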

answered 2019-02-21T15:49:21.350