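I have a big problem: one of my validations runs into catastrophic backtracking (http://www.regular-expressions.info/catastrophic.html), and I am struggling to work out why. Maybe someone has an idea? Apart from that, the regex works for all my use cases.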

regex: "^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$"

Problematic input:

"Disposable
BHT,
Tocopheryl AcetateHydrating Shave Gel
Aqua,
Glycerin,
Palmitic Acid,
Triethanolamine,
Isopentane,
Glyceryl Oleate,
Stearic Acid,
Isobutane,
Sorbitol,
Parfum,
Hydroxyethylcellulose,
Myristic Acid,
PEG-90M,
Butyrospermum Parkii Butter Extract,
Lauric Acid,
PTFE,
PEG-23M,
Propylene Glycol,
Glyceryl Acrylate/Acrylic Acid Copolymer,
PVM/MA Copolymer,
Silica,
Methylparaben,
Propylparaben,
BHT,
Limonene,
Benzyl Salicylate,
Linalool,
CI 42053,
CI 42090
Series Thermal Face Scrub
PEG-4,
Magnesium Sulfate,
PEG/PPG-300/55 Copolymer,
Polyethylene,
Polypropylene,
Laureth-23,
Stearyl Alcohol,
Dioleoylethyl Hydroxyethylmonium Methosulfate,
Cetyl Alcohol,
Behentrimonium Chloride,
Distearyldimonium Chloride,
Hydroxypropylcellulose,
Parfum,
Methylparaben,
Propylparaben,
Niacinamide,
Alcohol Denat,
Hexylene Glycol,
Benzyl Salicylate,
AquaClassic Clean Shampoo
Aqua,
Sodium Lauryl Sulfate,
Sodium Laureth Sulfate,
Glycol Distearate,
Zinc Carbonate,
Sodium Chloride,
Sodium Xylenesulfonate,
Zinc Pyrithione,
Cocamidopropyl Betaine,
Dimethicone,
Sodium Benzoate,
Guar Hydroxypropyltrimonium Chloride,
Hydrochloric Acid,
Hexyl Cinnamal,
Linalool,
Butylphenyl Methylpropional,
Magnesium Carbonate Hydroxide,
Ammonium Laureth Sulfate,
Magnesium Nitrate,
Sodium Polynaphthalenesulfonate,
Methylchloroisothiazolinone,
Magnesium Chloride,
CI 42090,
Citric Acid,
Methylisothiazolinone,
Tetrasodium EDTA,
CI 17200,
DMDM Hydantoin    Perspirant Deodorant Spray Sport Protect 48H
Butane,
Isobutane,
Cyclopentasiloxane,
Aluminum Chlorohydrate,
Cyclodextrin,
Disteardimonium Hectorite,
Dimethicone,
Aqua,
Triethyl Citrate,
Alpha-Isomethyl Ionone,
Butylphenyl Methylpropional,
Citral,
Citronellol,
Coumarin,
Geraniol,
Limonene,
Linalool
Pillite Series Instant Hydration Moisturiser +SPF 15
Aqua,
Glycerin,
Ethylhexyl Salicylate,
Niacinamide,
Butyl Methoxydibenzoylmethane,
Dimethicone,
Polyethylene,
Octocrylene,
Isopropyl Palmitate,
Phenylbenzimidazole Sulfonic Acid,
Sorbitan Stearate,
Triethanolamine,
Cetyl Alcohol,
Sodium Acrylates Copolymer,
Aluminum Starch Octenylsuccinate,
Stearyl Alcohol,
Caprylic/Capric Triglyceride,
Panthenol,
Benzyl Alcohol,
Dimethiconol,
Fragrance,
Ethylparaben,
Cetearyl Glucoside,
Cetearyl Alcohol,
PEG 100 Stearate,
Propylparaben,
Disodium EDTA,
C12-13 Pareth-3,
Palmitic Acid,
Stearic Acid,
Benzyl Salicylate,
Laureth-7,
Linalool,
Butylphenyl Methylpropional,
Myristic Acid,
Coumarin,
Heptadecanoic Acid,
Benzyl Benzoate"

Thanks!

1 Answer

The problem is that you have a clause of the form

(something*)* 

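This works fine when the regex matches, but if one of your lines is malformed, things go catastrophically wrong. The cause is backtracking: the regex engine tries every possible combination before it gives up.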

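Take your longest line:

Gillette Series Instant Hydration Moisturiser +SPF 15

If this line does not match your regex, the regex engine needs 2,251,799,813,685,248 (2^51) attempts to realize that it does not match.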
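To get a feel for where numbers of this size come from, here is a rough model. It is not a trace of any real engine, and it is not specific to the regex in the question: it only counts the ways a nested quantifier such as `(a+)+` could carve a run of n characters into consecutive non-empty pieces. The function name `count_splits` is made up for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Count the ways to cut a run of n characters into non-empty pieces."""
    if n == 0:
        return 1  # nothing left to cut: exactly one (empty) way
    # The first piece takes k characters; the remainder is cut recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


# Simplified model only: each extra character doubles the number of splits.
for n in (1, 5, 10, 20, 52):
    print(n, count_splits(n))
```

In this model the count is 2^(n-1), so a run of 52 characters already gives 2^51 candidate splits, the same order of magnitude as the figure above.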

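The fix is on the page you linked to. Since you are looking for an alternating sequence of words and non-words, backtracking is of no use to you (a word cannot be split into a word/non-word/word sequence). You can prevent backtracking by using possessive quantifiers, i.e. once the regex has matched a word or a non-word, it never gives that match back.

Making a quantifier possessive just means adding a plus sign after it, so

(something*)* becomes (something*+)*+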
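As a sketch of what that rewrite might look like for the regex in the question, assuming Python 3.11 or later (the first release whose standard `re` module accepts possessive quantifiers). The names `ORIGINAL`, `POSSESSIVE`, `PATTERN` and `is_valid` are made up for illustration, and the groups are made non-capturing, which does not change what matches.

```python
import re

# The regex from the question, unchanged.
ORIGINAL = r"^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$"

# Same structure with a "+" appended to every quantifier.
POSSESSIVE = r"^(?:(?:^|[^A-Za-z]++)[A-Z][A-Za-z]*+)*+[^A-Za-z]*+$"

try:
    PATTERN = re.compile(POSSESSIVE)
except re.error:
    # Assumption: this interpreter predates Python 3.11 and rejects "*+",
    # so fall back to the original (backtracking) pattern.
    PATTERN = re.compile(ORIGINAL)


def is_valid(text: str) -> bool:
    """Return True if every run of letters starts with an uppercase letter."""
    return PATTERN.match(text) is not None


print(is_valid("Aqua,\nGlycerin,\nPEG-90M"))  # well-formed sample
print(is_valid("Aqua,\nglycerin"))            # lowercase word, rejected
```

Because a word is always followed by a non-letter or the end of the input, and a separator is always followed by a letter, the possessive version should accept and reject the same inputs as the original; it only stops the engine from retrying shorter matches that could never succeed.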

Answered 2012-10-19T10:59:09.353