ruby - What regular expression captures the total amount from a string?

Question

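I need to parse the total amount from different files. Each file has a different layout, so the line I need to parse differs as well.

What regular expression would capture the number that follows "Total" in a string?

It needs to be case-insensitive and should take the closest match after "Total". Anything can appear before or after the word "Total"; I need the first number that comes after it.

For example: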

from string "Service charges: 10 Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount: 100 Shipping: 10"
from string "Service charges: 10 Grand Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"

The output should be 100 in all of the cases above.

score 3 · Accepted Answer

If what you are really asking about is pattern matching across various strings, look at using scan and grabbing the numeric strings:

[
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s.scan(/\d+/)[1] }
=> ["100", "100", "100", "100"]

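This assumes you want the second number in each string.

If that order is going to change, which is unlikely because it looks like you are scanning invoices, then variations on the pattern and/or scan will work. The following switches approach and uses a standard regex search based on the position of "Total", some possible intervening text, then ":" and the total value: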

[
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1] }
=> ["100", "100", "100", "100"]

To get integer values, append to_i inside the map block:

[
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1].to_i }
=> [100, 100, 100, 100]

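For your sample strings, it is probably better to use a case-sensitive pattern to match "Total", unless you know you will encounter a lowercase "total". And if that is the case, you should show such an example.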
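If lowercase "total" does turn up, one possible variation is to add the i flag and guard against lines that contain no total at all. This is only a sketch: the helper name extract_total and the second and third sample lines are made up for illustration and are not from the question.

```ruby
# Hypothetical helper; the name is an assumption, not part of the answer above.
def extract_total(line)
  # /i matches "Total", "total", "TOTAL", ...; String#[] returns nil on no match.
  value = line[/total.*?: (\d+)/i, 1]
  # Without this guard, nil.to_i would silently turn "no match" into 0.
  value && value.to_i
end

lines = [
  "Service charges: 10 Total: 100 Shipping: 10",
  "service charges: 10 grand total: 100 shipping: 10", # made-up lowercase sample
  "Service charges: 10 Shipping: 10",                  # made-up sample with no total
]

totals = lines.map { |s| extract_total(s) }
p totals
# => [100, 100, nil]
```

Returning nil rather than 0 keeps a missing total distinguishable from a real total of zero.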

score 1 · Accepted Answer

I think you can do it like this:

/Total[^:]*:\s+([0-9]+)/i

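Explanation:

Total matches the word "Total".
[^:]* matches anything or nothing, up to the next colon ":".
:\s+ consumes the colon and the whitespace after it (you could use * instead of +).
([0-9]+) captures the digits into a group for later retrieval -> 100.

I'm not sure how case insensitivity is indicated in the environment you are using, but it can usually be done with a flag, as I did here with i.

Here is a fiddle as an example.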
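In Ruby, a trailing i on a regexp literal is the case-insensitive flag, so the pattern can be used as written. A minimal sketch against the four strings from the question (the constant name TOTAL_RE is an assumption chosen for this example):

```ruby
# The pattern from this answer as a Ruby regexp literal; the constant name is illustrative.
TOTAL_RE = /Total[^:]*:\s+([0-9]+)/i

samples = [
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
]

# String#[] with a regexp and a group index returns that capture group (or nil).
results = samples.map { |s| s[TOTAL_RE, 1] }
p results
# => ["100", "100", "100", "100"]
```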

score 0 · Accepted Answer

# assuming you have all your files ready in an array
a = ["Service charges: 10 Total: 100 Shipping: 10",  "Service charges: 10 Total Amount: 100 Shipping: 10", "Service charges: 10 Grand Total: 100 Shipping: 10", "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"]
# we find every total with the following regexp
a.map {|s| s[/total[^\d]*(?<total>\d+)/i, 'total']}
#=> ["100", "100", "100", "100"]

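The regexp is /total[^\d]*(?<total>\d+)/i. It looks for the word "total" and skips any following non-digit characters until it finds a number, which it returns in the named capture group. The i option makes it case-insensitive.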
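The same named group can also be read through a MatchData object, which makes the no-match case explicit. The sketch below assumes a hypothetical helper total_from and made-up input lines; note that this pattern does not require a colon, so it also accepts text such as "total due 250".

```ruby
# Same pattern as above; the constant and helper names are assumptions for illustration.
TOTAL_PATTERN = /total[^\d]*(?<total>\d+)/i

def total_from(line)
  m = TOTAL_PATTERN.match(line) # nil when the line has no "total" followed by digits
  m && m[:total].to_i           # read the named group and convert it to an Integer
end

p total_from("Service charges: 10 Grand TOTAL (Rs.): 100 Shipping: 10") # => 100
p total_from("total due 250")                                           # => 250
p total_from("Shipping: 10")                                            # => nil
```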
