
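I'm having trouble extracting data from a text file with Ruby. I've opened and read the file and replaced all the newlines with '%' (since the newlines seemed to cause problems), but when I call scan on the string, it doesn't parse the way I want it to. I'm sure this regex is uglier than it needs to be, but here is what it's doing: http://rubular.com/r/JNgleGA5bd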

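The file contains a numbered list, and since the format is consistent, I want one regex that grabs each item in the list. In the snippet I've included, it should grab everything before '2.(tab)If “Other” boat manufacturer'.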

Here is a sample of the string:

"1. What make is your boat?%% [- Select One -]%%Var. 1: Code = A2_asdfw, Name = A2_WhatMakeIsYourBoat%%Type = Category%%Template = Standard Category%%Cat. 1: Code = 339 , Name = NONE%%Cat. 2: Code = 3, Name = asdfg%%2. If “Other” boat manufacturer, please describe here:% _ __ _ __ _ ___ %% Var. 1: Code = A154_asdf, Name = A36_asdfg%%Type = Literal%%Template = Standard Literal%%Max length = 20 characters%%"

Here is my regex:

([0-9]+\.\t[\/0-9a-zA-Z\s,"()'-]+[%\t?:].*?)[0-9]+\.\t[\/0-9a-zA-Z\s,"()'-]+[%\t?:]

1 Answer


Assuming every entry begins with the pattern "digit-period-tab", you can use this regex:

[0-9][.]\t(?:(?![0-9][.]\t).)*

Working demo.

Here is some explanation:

[0-9]          # match a digit
[.]            # match a period - same as "\.", but more readable IMHO
\t             # match a tab
(?:            # open non-capturing group. this group will match/consume a single
               # character that is not the beginning of the next item
  (?!          # negative lookahead - this does not consume anything, but ensures
               # its contents canNOT be matched at the current position
    [0-9][.]\t # check that there is no new item starting
  )            # end of negative lookahead ... if we get here, the next character
               # still belongs to the current item; note that the engine's
               # "cursor" has not moved
  .            # consume an arbitrary character
)              # end of group
*              # repeat 0 or more times (as often as possible)
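
As a rough illustration of the pattern in use, here is a minimal Ruby sketch with String#scan. The sample text is a shortened, hypothetical stand-in for the question's data, and the variable names are made up for this example. Because the pattern has no capturing groups, scan returns the whole match for each item.

```ruby
# Pattern from the answer: digit, period, tab, then every character that
# does not begin the next "digit-period-tab" item.
item_pattern = /[0-9][.]\t(?:(?![0-9][.]\t).)*/

# Hypothetical sample: "\t" is a real tab, "%" stands in for the newlines.
text = "1.\tWhat make is your boat?%% [- Select One -]%%Type = Category%%" \
       "2.\tIf other, please describe here:%%Type = Literal%%"

# With no capturing groups, scan returns an array of full matches.
items = text.scan(item_pattern)
items.each { |item| puts item }
```

Each element of items starts at an item number and runs up to, but not including, the next one.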

More information on lookarounds.

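If your item numbers can go beyond 9 (i.e., have more than one digit), just add a + after both occurrences of [0-9], giving [0-9]+[.]\t(?:(?![0-9]+[.]\t).)*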
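
A small Ruby sketch of the multi-digit variant, under a couple of assumptions: the input is hypothetical, and the /m flag is an addition that is not part of the answer. In Ruby, /m lets "." match newlines, so this sketch leaves the line breaks in place instead of replacing them with '%'.

```ruby
# Multi-digit variant: "+" after both [0-9], as the answer suggests.
# The /m flag is an addition for this sketch; in Ruby it lets "." match "\n".
multi_digit = /[0-9]+[.]\t(?:(?![0-9]+[.]\t).)*/m

# Hypothetical input with real newlines and an item number above 9.
raw = "9.\tNinth question\nType = Category\n10.\tTenth question\nType = Literal\n"

entries = raw.scan(multi_digit)
entries.each { |entry| p entry }

# For comparison, the single-digit pattern splits in the middle of "10":
# the "1" stays attached to item 9 and the next match begins at "0.<tab>".
single_digit = /[0-9][.]\t(?:(?![0-9][.]\t).)*/m
broken = raw.scan(single_digit)
```

The comparison at the end shows why the + is needed in both places: without it, a two-digit item number is cut in half.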

Answered 2013-04-14T15:15:23.557