ruby - Ruby regex or parser

Question

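I have a string to parse that looks a bit like GitHub markdown, but I really don't want a full implementation. The string will be a mix of "code" blocks and "text" blocks. A code block is three backticks followed by an optional "language", then some code, then three more backticks. Non-code is pretty much everything else. I don't (but probably should) care whether the user can't enter three backticks in a "text" block. Here's an example...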

This is some text, followed by a code block
```ruby
def function
   "hello"
end
```
Some more text

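There can of course be more code and text blocks interspersed. I've tried writing a regular expression for this and it seems to work, but I can't get the groups (in parentheses) to give me all of the matches, and scan() loses the ordering. I've looked into a couple of Ruby parsers (treetop, parslet), but they look a bit big for what I want; if that's my best option, though, I'm willing to go that route.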

Thoughts?

A few people asked about the RE I was trying (many variations of the below)...

re = 
  /
    ```\s*\w+\s*          # 3 backticks followed by the language
      (?!```).*?          # The code everything that's not 3 backticks
    ```                   # 3 more backticks
    |                     # OR
    (?!```).*             # Some text that doesn't include 3 backticks
  /x                      # Ignore white space in RE

It seems that even in a simple case such as

md = /(a|b)*/.match("abaaabaa")

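I can't get all of the a's and b's. There is never, say, an md[3]. Hopefully that makes more sense; this is why I thought an RE wouldn't work in my case, but I wouldn't mind being proven wrong.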
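To make the behavior concrete, here is a minimal sketch of what the match above returns. A repeated capture group keeps only its last iteration, so there is exactly one group no matter how many times it repeats. `String#scan` with a group-free pattern, by contrast, returns every match in order. The variable name `all` is only for illustration.

```ruby
md = /(a|b)*/.match("abaaabaa")

md[0]    # => "abaaabaa" (the whole match)
md[1]    # => "a" (only the last iteration of the group)
md.size  # => 2, so md[2] and md[3] are nil

# scan walks the string and collects each match in order
all = "abaaabaa".scan(/a|b/)
# => ["a", "b", "a", "a", "a", "b", "a", "a"]
```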

score 1 · Accepted Answer

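From what I know of Markdown (GitHub and Stack Overflow flavors) and from your question (which isn't very precise about the rest of the text), I'll make some assumptions here.

1. Every code block starts with a single line containing only three backticks, an optional language name, and a newline.

2. Every code block ends with a single line containing only three backticks.

3. Code blocks are not empty.

If you can live with these assumptions, the following code should work (assuming the text is in the variable str):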

regex = %r{
  ^```[[:blank:]]*(?<lang>\w+)?[[:blank:]]*\n # matches start of codeblock, and captures optional :lang.
    (?<content>.+?) # matches codeblock content and captures in :content
  \n[[:blank:]]*```[[:blank:]]*\n # matches ending of codeblock.
}xm # free-space mode and . matches newline.
position = 0
matches = []
# resume each search where the previous match ended
while (match = regex.match(str, position))
  position = match.end(0)
  matches << [match[:lang], match[:content]]
end

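After this, matches contains an array of arrays, where each inner array represents one match: the first element is the (optional) language, which may be nil, and the second element is the content.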

If you have more assumptions about the text, I can alter the regex.
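If the text between the code blocks is needed too, in document order (which is what the question asks for), one possible extension is to record the gap before each match while walking the string. This is only a sketch under the same three assumptions. The method name `split_blocks`, the constant `CODE_BLOCK`, and the `[:text, ...]` / `[:code, ...]` tuple shapes are made up for illustration, and skipping whitespace-only gaps is an arbitrary choice.

```ruby
# Same pattern as in the answer above, stored in a constant (hypothetical name).
CODE_BLOCK = %r{
  ^```[[:blank:]]*(?<lang>\w+)?[[:blank:]]*\n  # opening fence, optional language
  (?<content>.+?)                              # non-empty block content
  \n[[:blank:]]*```[[:blank:]]*\n              # closing fence on its own line
}xm

# Returns text and code segments in the order they appear in str.
def split_blocks(str)
  segments = []
  position = 0
  while (match = CODE_BLOCK.match(str, position))
    # everything between the previous block and this one is plain text
    text = str[position...match.begin(0)]
    segments << [:text, text] unless text.strip.empty?
    segments << [:code, match[:lang], match[:content]]
    position = match.end(0)
  end
  # trailing text after the last code block
  rest = str[position..-1]
  segments << [:text, rest] unless rest.nil? || rest.strip.empty?
  segments
end

sample = "intro\n```ruby\nputs 1\n```\noutro\n"
split_blocks(sample)
# => [[:text, "intro\n"], [:code, "ruby", "puts 1"], [:text, "outro\n"]]
```

Because the segments are collected in a single pass, the ordering that scan() with groups loses is preserved here.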

Here is the test string I used:

str = %{
this is some random text.
```ruby
  def print
    puts "this is a code block with lang-argument"
  end
```

some other text follows here.
i want some ``` backticks here.

```
  def print
    puts "this is a code block without lang-argument"
  end
```
}
