ruby - Regex problem in Ruby

Question

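I'm having trouble using Ruby to pull data out of a text file. I've opened and read the file and replaced all newlines with '%' (since newlines seemed to cause problems), but when I call scan on the string, it doesn't parse it the way I want. I'm sure this regex is uglier than it needs to be, but here is what it's doing: http://rubular.com/r/JNgleGA5bd

The file contains a numbered list, and since the format is consistent, I'd like a regex that grabs each item in the list. In the snippet I included, it should grab everything before '2.(tab)If "Other" boat make'.

Here is a sample of the string: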

"1. What make is your boat?%% [- Select One -]%%Var. 1: Code = A2_asdfw, Name = A2_WhatMakeIsYourBoat%%Type = Category%%Template = Standard Category%%Cat. 1: Code = 339 , Name = NONE%%Cat. 2: Code = 3, Name = asdfg%%2. If "Other" boat make, please describe here:% _ __ _ __ _ ___ %% Var. 1: Code = A154_asdf, Name = A36_asdfg%%Type = Literal%%Template = Standard Literal%%Max Length = 20 chars%%"

Here is my regex:

([0-9]+\.\t[\/0-9a-zA-Z\s,"()'-]+[%\t?:].*?)[0-9]+\.\t[\/0-9a-zA-Z\s,"()'-]+[%\t?:]

score 2 · Accepted Answer

Assuming every entry starts with the pattern "digit-period-tab", you can use this regex:

[0-9][.]\t(?:(?![0-9][.]\t).)*

Working demo.
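
As a minimal sketch of how this pattern could be used with scan in Ruby: the sample text below is a shortened, hypothetical version of the string from the question, and it assumes a real tab character follows each item number. The variable names are illustrative only.

```ruby
# Hypothetical, shortened sample; "\t" is assumed to be a real tab after each number.
text = "1.\tWhat make is your boat?%%Type = Category%%2.\tIf \"Other\" boat make, please describe here:%%Type = Literal%%"

# The pattern from the answer: an item header, then every character
# up to (but not including) the next item header.
item_re = /[0-9][.]\t(?:(?![0-9][.]\t).)*/

# The pattern has no capturing groups, so scan returns an array of whole matches.
items = text.scan(item_re)
items.each { |item| puts item }
```

Because the group is non-capturing, each element of items is a complete list entry. If the original newlines were kept instead of being replaced with '%', adding the /m flag would let . match newlines as well.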

Here is some explanation:

[0-9]          # match a digit
[.]            # match a period - same as "\.", but more readable IMHO
\t             # match a tab
(?:            # open non-capturing group. this group will match/consume a single
               # character that is not the beginning of the next item
  (?!          # negative lookahead - this does not consume anything, but ensures
               # its contents canNOT be matched at the current position
    [0-9][.]\t # check that there is no new item starting
  )            # end of negative lookahead ... if we get here, the next character
               # still belongs to the current item; note that the engine's
               # "cursor" has not moved
  .            # consume an arbitrary character
)              # end of group
*              # repeat 0 or more times (as often as possible)
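
The annotated layout above can also be kept inside the Ruby source by using the /x (extended) flag, which ignores whitespace and # comments in the pattern. This is only a sketch; the variable names and the sample string are made up for illustration.

```ruby
# Same pattern as above, written in extended mode so the comments live in the regex.
verbose_re = /
  [0-9]              # a digit
  [.]                # a period
  \t                 # a tab
  (?:                # group matching one character of the current item
    (?![0-9][.]\t)   # fail if a new item header starts at this position
    .                # otherwise consume one character
  )*                 # repeat as often as possible
/x

sample = "1.\tfirst%%2.\tsecond%%" # hypothetical sample data
p sample.scan(verbose_re)
```

The extended form should match the same text as the compact one-liner; it is just easier to maintain.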

More information on lookarounds.

If your item numbers can go beyond 9 (i.e., have more than one digit), just add a + after both occurrences of [0-9].
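
For illustration, here is a sketch of the multi-digit variant with the + added in both places. The sample items are hypothetical.

```ruby
# Multi-digit variant: "+" after both [0-9] classes.
multi_re = /[0-9]+[.]\t(?:(?![0-9]+[.]\t).)*/

# Hypothetical sample with item numbers crossing from one digit to two.
list = "9.\tNinth item%%10.\tTenth item%%11.\tEleventh item%%"

entries = list.scan(multi_re)
entries.each { |entry| puts entry }
```

With the single-digit pattern, the "1" of "10." would be swallowed by the previous item and the next item would start at "0.", so the + matters as soon as the list has ten or more entries.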
