Assuming there is only one valid quoted substring per line, this may be a good starting point:
<?php // test.php Rev:20120105_1800
// Return array of valid quoted substrings, one per line.
function getArrayOfOnePerLineValidQuotedSubstrings($text) {
    $re = '%
        # Match line w/1 valid "single" or "double" substring.
        ^                 # Anchor to start of line.
        [^\'"]*           # Everything up to first quote.
        (?|               # Branch reset group $1: Contents.
          "([^"]*)"       # Either $1.1 Double quoted,
        | \'([^\']*)\'    # or $1.2 Single quoted contents.
        )                 # End $1: branch reset group.
        [^\'"]*           # Everything after quoted sub-string.
        $                 # Anchor to end of line.
        %xm';
    if (preg_match_all($re, $text, $matches)) {
        return $matches[1];
    }
    return array();
}
// Fetch test data from file.
$data = file_get_contents('testdata.txt');
// Get array of valid quoted substrings, one per line.
$output = getArrayOfOnePerLineValidQuotedSubstrings($data);
// Display results.
$count = count($output);
printf("%d matches found.\n", $count);
for ($i = 0; $i < $count; ++$i) {
    printf(" match[%d] = {%s}\n", $i + 1, $output[$i]);
}
?>
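This regex matches each line that contains one valid quoted substring and skips lines that have an invalid quoted substring (e.g. --"__'-- has an unbalanced double-quoted substring) or no quoted substring at all. For matching lines, the contents of the valid quoted substring are returned in group $1. The function returns an array of the matched substrings.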
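To see which lines are kept without needing testdata.txt, here is a small sketch that runs the same pattern (written compactly, without the x flag) over a few made-up lines. The function name demoOnePerLine and the sample data are assumptions for illustration only.

```php
<?php
// Illustrative sketch: same pattern as above in compact form.
// demoOnePerLine is a hypothetical name, not part of the original script.
function demoOnePerLine($text) {
    $re = '%^[^\'"]*(?|"([^"]*)"|\'([^\']*)\')[^\'"]*$%m';
    return preg_match_all($re, $text, $matches) ? $matches[1] : array();
}

// Made-up sample data, one case per line.
$sample = implode("\n", array(
    'say "hello" now',     // one valid double-quoted substring
    "it is 'fine' here",   // one valid single-quoted substring
    'no quotes at all',    // contributes nothing: no quoted substring
    '--"__\'--',           // skipped: unbalanced quotes
));

$result = demoOnePerLine($sample);
print_r($result);          // $result is array('hello', 'fine')
```

Only the first two lines contribute a result. Note that the negated character classes also match newlines, so a single match may extend over an adjacent quote-free line; the captured contents are still the quoted substring only.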
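If your data has more than one quoted substring per line, or if the quoted substrings or the content between them may contain escaped quotes, a more complex solution can be crafted.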
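As a rough idea of what such a solution might look like, the sketch below scans the text for every quoted substring, treats a backslash as escaping the character that follows it (both inside and outside quotes), and does not let a quoted substring span lines. These rules, the function name getAllQuotedSubstrings, and the sample data are assumptions, not something the original data format is known to require. Unlike the function above, it does not reject lines with unbalanced quotes; it simply extracts whatever balanced pairs it finds.

```php
<?php
// Hypothetical sketch: return the contents of every quoted substring.
// Assumes backslash escapes; escapes are left as-is in the returned contents.
function getAllQuotedSubstrings($text) {
    $re = '%
        \\\\.                                # Escaped character outside quotes: consume, ignore.
      | (?|                                  # Branch reset group $1: Contents.
          "((?:[^"\\\\\r\n]|\\\\.)*)"        # Either double quoted,
        | \'((?:[^\'\\\\\r\n]|\\\\.)*)\'     # or single quoted contents.
        )                                    # End $1: branch reset group.
        %x';
    $out = array();
    if (preg_match_all($re, $text, $sets, PREG_SET_ORDER)) {
        foreach ($sets as $m) {
            // Keep quoted matches only; escape-pair matches start with a backslash.
            if ($m[0][0] !== '\\') {
                $out[] = $m[1];
            }
        }
    }
    return $out;
}

// Made-up sample data.
$sample = <<<'EOT'
a "one" b 'two' c
he said \"no\" then "say \"hi\"" end
empty "" here
EOT;

print_r(getAllQuotedSubstrings($sample));
```

For this sample the function returns four entries: one, two, say \"hi\" (with the backslashes kept), and an empty string for the empty pair of quotes. The escaped quotes around no are consumed by the first alternative, so they never open a quoted substring.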