ruby-on-rails - Regex to capture colon-separated key/value pairs with multi-line values

Question

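I'm currently working on a project in Ruby on Rails (in Eclipse), and my task is to split a block of data into its relevant parts using regular expressions.

I've decided to break up the data based on 3 parameters:

The line must start with a capital letter (RegEx equivalent - /^[A-Z]/)
It must end with a : (RegEx equivalent - /$":"/)

I would appreciate any help... The code I'm using in my controller is: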

@f = File.open("report.rtf")  
@fread = @f.read  
@chunk = @fread.split(/\n/)

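where @chunk is the array that will be created by the split, and @fread is the data being split (on newlines).

Any help would be appreciated, thank you very much!

I can't post the exact data, but it is basically along these lines (it is medical-related):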

Exam 1: CBW 8080

RESULT:

This report is dictated with specific measurement. Please see the original report.

COMPARISON: 1/30/2012, 3/8/12, 4/9/12

RECIST 1.1: BLAH BLAH BLAH

The ideal output would be an array that reads:

["Exam 1:", "CBW 8080", "RESULT", "This report is dictated with specific measurement. Please see the original report.", "COMPARISON:", "1/30/2012, 3/8/12, 4/9/12", "RECIST 1.1:", "BLAH BLAH BLAH"]

P.S. I'm just using \n as a placeholder until I get it working.

score 4 · Accepted Answer

Given the clarified question, here is a new solution.

Update

First, "slurp" the entire block of data (newlines and all) into a single string.

str = IO.read("report.rtf")

Then use this regular expression:

captures = str.scan(/(?<=^|[\r\n])([A-Z][^:]*):([^\r\n]*(?:[\r\n]+(?![A-Z].*:).*)*)/)

See a live example here: http://rubular.com/r/8w3X6WGq4l.

The answer, explained:
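As a rough sketch of how the captures might be turned into the flat array the question asks for: the sample text below is reconstructed from the question, and the strip/flat_map post-processing is an assumption about the desired shape, not part of the original answer. Note that scan returns each key without its trailing colon and each value with its surrounding whitespace.

```ruby
# Sample text reconstructed from the question; in practice str would
# come from IO.read("report.rtf").
str = <<~REPORT
  Exam 1: CBW 8080

  RESULT:

  This report is dictated with specific measurement. Please see the original report.

  COMPARISON: 1/30/2012, 3/8/12, 4/9/12

  RECIST 1.1: BLAH BLAH BLAH
REPORT

pattern = /(?<=^|[\r\n])([A-Z][^:]*):([^\r\n]*(?:[\r\n]+(?![A-Z].*:).*)*)/

# scan yields one [key, value] pair per match; values keep their leading
# space and any trailing newlines, so strip them.
pairs = str.scan(pattern).map { |key, value| [key, value.strip] }

# Flatten into a single array, re-appending the colon to each key
# (assumption: this is the layout the question wants).
flat = pairs.flat_map { |key, value| ["#{key}:", value] }

p flat
```

If a hash is more convenient than a flat array, pairs.to_h gives a key-to-value mapping instead.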

    (?<=                Lookbehind assertion.
        ^                   Start at the beginning of the string,
        |                   or,
        [\r\n]              a new line.
    )
    (                   Capture group 1, the "key".
        [A-Z][^:]*          Capital letter followed by as many non-colon
                            characters as possible.
    )
    :                   The colon character.

    (                   Capture group 2, the "value".
        [^\r\n]*            All remaining characters (i.e. non-newline characters)
                            on the same line belong to the "value," so take them all.

        (?:             Non-capture group.

            [\r\n]+         Having already taken everything up to a newline
                            character, take the newline character(s) now.

            (?!             Negative lookahead assertion.
                [A-Z].*:        If this next line starts with a capital letter,
                                followed by a string of anything then a colon,
                                then it is a new key/value pair, so we do not
                                want to match this case.
            )
            .*              Providing this isn't the case though, take the line!

        )*              And keep taking lines as long as we don't find a
                        key/value pair.
    )
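
The breakdown above can also be written as a runnable pattern using Ruby's extended mode (the /x flag), which ignores whitespace and allows # comments inside the regex. This is only a sketch: the constant name PAIR and the sample text are made up for illustration.

```ruby
# Same pattern as in the answer, spread out with /x so it can carry comments.
PAIR = /
  (?<=^|[\r\n])       # lookbehind: we are at the start of a line
  ([A-Z][^:]*)        # group 1, the key: capital letter, then non-colon characters
  :                   # the colon separator
  (                   # group 2, the value
    [^\r\n]*          #   the rest of the current line
    (?:
      [\r\n]+         #   newline characters
      (?![A-Z].*:)    #   stop if the next line looks like a new key-value pair
      .*              #   otherwise take the whole line
    )*
  )
/x

# Hypothetical input with a value that continues onto a second line.
text = "COMPARISON: 1/30/2012\ncontinued line\nRECIST 1.1: BLAH\n"

result = text.scan(PAIR).map { |key, value| [key, value.strip] }.to_h
p result
```

The second line is folded into the COMPARISON value because it does not start with a capital letter followed by a colon.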

score 1 · Accepted Answer

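I'm not entirely sure what you're looking for. If you want every occurrence of a capital letter followed by some text and a colon, then you can do the following: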

str.scan(/[A-Z].*?:/)
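
For illustration, here is what that call returns on a shortened sample modeled on the question's data (the sample string is an assumption). It picks out the keys only, so the values would still need to be collected separately.

```ruby
# Shortened sample modeled on the question's data.
str = "Exam 1: CBW 8080\nRESULT:\nSee the original report.\nCOMPARISON: 1/30/2012\n"

# Lazy match from a capital letter to the first colon on the same line.
keys = str.scan(/[A-Z].*?:/)
p keys
```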

score 0 · Accepted Answer

This should do it.

/^[A-Z].*:$/
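
A quick check on a shortened sample (an assumption, modeled on the question's data) suggests this pattern only matches lines that end with the colon, so keys that share a line with their value are skipped:

```ruby
# Shortened sample modeled on the question's data.
str = "Exam 1: CBW 8080\nRESULT:\nSee the original report.\nCOMPARISON: 1/30/2012\n"

# In Ruby, ^ and $ anchor at the start and end of each line.
headers = str.scan(/^[A-Z].*:$/)
p headers
```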

answered 2012-06-18T18:46:41.267

score 0 · Accepted Answer

The regex could be /(^[A-Z].*\:)/m. You can extract the matches by adding:

@chunk = @fread.scan(/(^[A-Z].*\:)/m)

This assumes @fread is a string. You can use http://rubular.com/ to test regular expressions in Ruby.
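
One thing worth checking before relying on this: in Ruby, the /m modifier makes . match newlines as well, so the greedy .* can run from the first key to the last colon in the whole string. The sketch below compares the pattern with and without /m on a shortened sample (the sample string and variable names are assumptions).

```ruby
# Shortened sample modeled on the question's data.
str = "Exam 1: CBW 8080\nRESULT:\nCOMPARISON: 1/30/2012\n"

# With /m, "." also matches newlines, so one match spans several lines.
with_m = str.scan(/(^[A-Z].*\:)/m)

# Without /m, each line is matched on its own.
without_m = str.scan(/(^[A-Z].*:)/)

p with_m
p without_m
```

Because the pattern contains a capture group, scan returns an array of one-element arrays in both cases.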

score 0 · Accepted Answer

Another solution:

input_str.split("\r\n").each do |s|
    # split each line into at most two parts at the first ": "
    var_name, var_value = s.split(": ", 2)
    # do whatever you like
end
