2

我在http://gskinner.com/RegExr/找到了这个正则表达式模式

,(?=(?:[^"]*"[^"]*")*(?![^"]*"))

用于模式匹配 CSV 分隔值(更具体地说,分隔逗号,可以拆分),在该站点上与我的测试数据完美配合。您可以在测试时在链接的站点的底部面板中看到我认为的 JavaScript 实现。

但是,当我尝试在 C#/.net 中实现这一点时,匹配并不能正常工作。我的实现:

Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))", RegexOptions.ECMAScript);
//get data...
foreach (string match in r.Split(sr.ReadLine()))
{
    //lblDev.Text = lblDev.Text + match + "<br><br><br><p>column:</p><br>";
    dtF.Columns.Add(match);
}

//more of the same to get rows

在某些数据行上,结果与上面站点上生成的结果完全匹配,但在其他数据行上,前 6 行左右无法拆分或根本不存在于拆分数组中。

谁能告诉我为什么该模式的行为方式不同?

我的测试数据:

CategoryName,SubCategoryName,SupplierName,SupplierCode,ProductTitle,Product Company ,ProductCode,Product_Index,ProductDescription,Product BestSeller,ProductDimensions,ProductExpressDays,ProductBrandName,ProductAdditionalText ,ProductPrintArea,ProductPictureRef,ProductThumnailRef,ProductQuantityBreak1 (QB1),ProductQuantityBreak2 (QB2),ProductQuantityBreak3 (QB3),ProductQuantityBreak4 (QB4),ProductPlainPrice1,ProductPlainPrice2,ProductPlainPrice3,ProductPlainPrice4,ProductColourPrice1,ProductColourPrice2,ProductColourPrice3,ProductColourPrice4,ProductExtraColour1,ProductExtraColour2,ProductExtraColour3,ProductExtraColour4,SellingPrice1,SellingPrice2,SellingPrice3,SellingPrice4,ProductCarriageCost1,ProductCarriageCost2,ProductCarriageCost3,ProductCarriageCost4,BLACK,BLUE,WHITE,SILVER,GOLD,RED,YELLOW,GREEN,ProductOtherColors,ProductOrigination,ProductOrganizationCost,ProductCatalogEntry,ProductPageNumber,ProductPersonalisationType1 (PM1),ProductPrintPosition,ProductCartonQuantity,ProductCartonWeight,ProductPricingExpering,NewProduct,ProductSpecialOffer,ProductSpecialOfferEnd,ProductIsActive,ProductRepeatOrigination,ProductCartonDimession,ProductSpecialOffer1,ProductIsExpress,ProductIsEco,ProductIsBiodegradable,ProductIsRecycled,ProductIsSustainable,ProductIsNatural
Audio,Speakers and Headphones,The Prime Time Company,CM5064:In-ear headphones,Silly Buds,,10058,372,"Small, trendy ear buds with excellent sound quality and printing area actually on each ear- piece. Plastic storage box, with room for cables be wrapped around can also be printed.",FALSE,70 x 70 x 20mm,,,,10mm dia,10058.jpg,10058.jpg,100,250,500,1000,2.19,2.13,2.06,1.99,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,3.81,3.71,3.42,3.17,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,Earpiece,200,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
Audio,Speakers and Headphones,The Prime Time Company,CM5058:Headstart,Head Start,,10060,372,"Lightweight, slimline, foldable and patented headphones ideal for the gym or exercise. These
headphones uniquely hang from the ears giving security, comfort and an excellent sound quality. There is also a secret cable winding facility.",FALSE,130 x 85 x 45mm,,,,30mm dia,10060.jpg,10060.jpg,100,250,500,1000,5.6,5.43,5.26,5.09,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,9.47,8.96,8.24,7.97,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,print plate on ear (s),100,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
4

2 回答 2

3

为工作使用正确的工具。正则表达式不太适合解析可以包含无限数量的嵌套引号的 CSV。

改用这个:

一个快速的 CSV 阅读器

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

我们在生产代码中使用它。它工作得很好,让你体会到解析是多么复杂。有关复杂性的更多信息,请查看解决方案中包含的 800 多个单元测试。

于 2013-08-13T03:30:39.790 回答
0

Your C# regex works fine for me in LinqPad, but your data does include a newline within the last "row" of data. So you can't simply use sr.ReadLine() to read the data.

于 2013-08-13T04:11:59.370 回答