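This is my best attempt (so far) at solving this problem. I'm new to regular expressions and this problem is a tough one, but I'll give it a try. RegEx clearly takes some time to master.

This seems to satisfy the delimiter/comma requirement. It looks redundant to me because of the repeated \s*. There is probably a better way.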

/\s*[,|\s*]\s*/
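One thing worth noting about the pattern above: inside a character class, `|` and `*` are literal characters rather than alternation and repetition, so `[,|\s*]` also matches a literal pipe or asterisk. The following is a minimal sketch, assuming the goal is "optional whitespace, a comma, optional whitespace, or just whitespace". The alternative pattern and the variable names are illustrative assumptions, not part of the original post.

```php
<?php
// The pattern from the post: the class [,|\s*] matches ',', '|', whitespace, or '*'.
$original = '/\s*[,|\s*]\s*/';
// Hypothetical alternative: a comma with optional surrounding whitespace, or a whitespace run.
$alternative = '/\s*,\s*|\s+/';

$input = 'one , two three,four';

// PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing separators.
$words = preg_split($alternative, $input, -1, PREG_SPLIT_NO_EMPTY);
print_r($words); // one, two, three, four

// The original pattern also splits on a literal '|', the alternative does not.
$pipeSplit = preg_split($original, 'a|b', -1, PREG_SPLIT_NO_EMPTY);
$pipeKept  = preg_split($alternative, 'a|b', -1, PREG_SPLIT_NO_EMPTY);
```

This only covers unquoted words; quoted phrases still need separate handling.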

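I found this on Stack Overflow and tried to take it apart and apply it to my problem (not easy). It seems to satisfy most of the "quoting" requirements, but I'm still working out how to handle the delimiters described in the requirements below.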

/"(?:\\\\.|[^\\\\"])*"|\S+/
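To see what this pattern does and where it falls short, here is a small sketch using preg_match_all(). The sample inputs and variable names are assumptions chosen for illustration. It suggests that quoted phrases are captured whole, while commas stay attached to unquoted words.

```php
<?php
// Single-quoted PHP string: each '\\\\' reaches the regex engine as '\\', i.e. one literal backslash.
$pattern = '/"(?:\\\\.|[^\\\\"])*"|\S+/';

// A quoted phrase is returned as one token, quotes included.
preg_match_all($pattern, 'say "hello world" now', $m);
$tokens = $m[0]; // 'say', '"hello world"', 'now'

// Commas are not treated as delimiters: \S+ keeps them attached to the word.
preg_match_all($pattern, 'a, b', $m2);
$withComma = $m2[0]; // 'a,', 'b'
```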

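The requirements I'm trying to meet:

  • The regex will be used with PHP's preg_match_all() (or a similar function) to break a string into an array of strings. The source language is PHP.
  • Words in the input string are separated by (0 or more whitespace characters)(an optional comma)(0 or more whitespace characters), or by (1 or more whitespace characters) alone.
  • The input string may also contain quoted substrings, each of which becomes a single element in the output array.
  • Quoted substrings must keep their double quotes when placed in the output array (because we must be able to identify them later as having been quoted in the input string).
  • Leading and trailing whitespace inside a quoted substring (that is, whitespace between the double-quote character and the text itself) must be removed when it is placed in the output array. Example: "<space>hello<space>world<space><tab>" becomes "hello<space>world"
  • Runs of whitespace inside a quoted phrase must be reduced to a single space when placed in its output array element. Example: "hello<space><tab><space><space>world" becomes "hello<space>world"
  • Quoted substrings that are zero-length or contain only whitespace are not placed in the output array (the output array must not contain any zero-length elements).
  • Every element of the output array must be trimmed (left and right) of whitespace.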
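The reason for keeping the double quotes is that a later step can tell quoted phrases from single words. A hedged sketch of such a check follows; the helper name is_quoted_term() is hypothetical and not part of the original post.

```php
<?php
// Hypothetical helper: true when a term kept its surrounding double quotes.
function is_quoted_term($term) {
    return strlen($term) >= 2
        && $term[0] === '"'
        && substr($term, -1) === '"';
}

$terms = array('one', '"two three"');
foreach ($terms as $term) {
    // Label each term as a phrase (quoted) or a word (unquoted).
    echo $term, ' => ', is_quoted_term($term) ? 'phrase' : 'word', "\n";
}
```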

This example demonstrates all of the requirements above:

Input string:

"" one two three four five six seven " " "

Returns this array (the double quotes shown below are actually present in the strings):

{one two three four five six seven"}

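Edit, September 13, 2013

I've been working hard on regular expressions for several days and have finally settled on this proposed solution. It may not be the best, but it's what I have for now.

I will use this regular expression with PHP's preg_match_all() function to split the search string into an array: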

/(?:"([^"]*)"|([^\s",]+))/

PHP's preg_match_all() requires the pattern to be enclosed in delimiters; the leading and trailing "/" serve that purpose.

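preg_match_all() returns the number of matches and fills in the array passed as its third argument, so we retrieve the result like this:

preg_match_all(REGEX, $SearchString, $x);
$Array = $x[0];

We must do this because the function fills in a nested array: element 0 holds the full matches of the regex. The other elements hold the values captured by the regex's capture groups, which we don't need.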
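To make the shape of that nested array concrete, here is a minimal sketch using the pattern above with the default flags. The sample input and the variable name $m are assumptions for illustration.

```php
<?php
$pattern = '/(?:"([^"]*)"|([^\s",]+))/';
preg_match_all($pattern, 'one "two three"', $m);

// $m[0]: full matches, quotes included.
// $m[1]: capture group 1 (text inside quotes), empty string when the match was unquoted.
// $m[2]: capture group 2 (unquoted word), empty string when the match was quoted.
print_r($m);
```

Only $m[0] keeps the double quotes, which is why it is the element used here.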

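Now I will iterate over the resulting array and process each element to meet the requirements (above), which is much easier than trying to meet all of them in one step with a single regular expression.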

1 Answer

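I finally developed a solution to this problem, which involves a few PHP statements that use regular expressions. The final function is below.

This function is part of a class, which is why it begins with "public".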

public function SearchString_ToArr($SearchString) {
    /*
    Purpose
        Used to parse the specified search string into an array of search terms.
        Search terms are delimited by <0 or more whitespace><optional comma><0 or more whitespace>
    Parameters
        SearchString (string) = The search string we're working with.
    Return (array)
        Returns an array using the following rules to parse the specified search string:
            - Each search term from the search string is converted to a single element in the returned array.
            - Search terms are delimited by whitespace and/or commas, or they may be double quoted.
            - Double-quoted search terms may contain multiple words.
        Unquoted Search Terms:
            - These are delimited by any number of whitespace characters or commas in the search string.
            - These have all leading and trailing whitespace trimmed.
        Quoted Search Terms:
            - These are surrounded by double-quotes in the search string.
            - These retain leading and trailing double-quotes in the returned array.
            - These have all leading and trailing whitespace trimmed.
            - These may contain whitespace.
            - These have all containing whitespace converted into a single space.
            - If these are zero-length or contain only whitespace, they are not included in the returned array.
        Example 1:
            SearchString =  ' "" one " two   three " four "five six" " " '
            Returns {"one", ""two three"", "four", ""five six""}
            Notes   The leading whitespace before the first "" is not returned.
                    The first quoted phrase ("") is empty so it is not returned.
                    The term "one" is returned with leading and trailing whitespace removed.
                    The phrase "two three" is returned with leading and trailing whitespace removed.
                    The phrase "two three" has containing whitespace converted to a single space.
                    The phrase "two three" has leading and trailing double-quotes retained.
                    ...
    Version History
        1.0 2013.09.18 Tested by Russ Tanner on PHP 5.3.10.
    */

    $r = array();
    $Matches = array();

    // Split the search string into an array based on whitespace, commas, and double-quoted phrases.
    preg_match_all('/(?:"([^"]*)"|([^\s",]+))/', $SearchString, $Matches);
    // At this point:
    //  1. all quoted strings have their own element and begin/end with the quote character.
    //  2. all non-quoted strings have their own element and contain no whitespace.
    //  3. empty quoted strings ("" or " ") are still present; they are removed below.

    // Normalize quoted elements...
    // Collapse every run of whitespace (including a lone tab or newline) to a single space.
    $r = preg_replace('/\s+/', ' ', $Matches[0]);
    // Remove all whitespace between the double-quotes and the string.
    $r = preg_replace('/^"\s+/', '"', $r);
    $r = preg_replace('/\s+"$/', '"', $r);
    // Drop quoted terms that are now empty (""), then reindex the array.
    $r = array_values(array_diff($r, array('""')));

    return $r;
}
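To check the approach against Example 1 without instantiating the class, here is a standalone sketch. The function name search_string_to_terms() is hypothetical, and the sketch makes two assumptions of its own: it collapses any whitespace run with /\s+/, and it removes quoted terms that normalize to "" with an explicit array_diff() step.

```php
<?php
// Hypothetical standalone variant of the method above (the name is illustrative).
function search_string_to_terms($searchString) {
    preg_match_all('/(?:"([^"]*)"|([^\s",]+))/', $searchString, $matches);
    // Collapse whitespace runs, then strip whitespace just inside the quotes.
    $terms = preg_replace('/\s+/', ' ', $matches[0]);
    $terms = preg_replace('/^"\s+/', '"', $terms);
    $terms = preg_replace('/\s+"$/', '"', $terms);
    // Assumption: quoted terms that end up as "" should be dropped, per the requirements.
    return array_values(array_diff($terms, array('""')));
}

$terms = search_string_to_terms(' "" one " two   three " four "five six" " " ');
print_r($terms); // one, "two three", four, "five six"
```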
answered 2013-09-22T10:14:32.507