2

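I have a string with the following value:

TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00

I need to split this string into key/value pairs: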

TOTAL DUE-STATEMENT $240.05
911 Fee $10.00
FRANCHISE TAX $.17
2VSALES TAX $.53
LOCAL-TAX $.23
SERVICE DISCOUNT -$50.00
PAYMENT - THANK YOU -$100.00
HBO+STARLET $100.00

My string will always be dynamic, and the descriptions are dynamic as well, except for 911 Fee. I wrote the following regular expression:

([911 a-zA-Z |911 a-zA-Z|a-zA-Z |a-zA-Z \\-? a-zA-Z|! ?|+? ]+)(-?\\$[0-9|,]*\\.[0-9][0-9])

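I get the correct key/value pairs, except when the description contains a mix of digits, letters, and special characters. My output is as follows: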

TOTAL DUE-STATEMENT $240.05
911 Fee $10.00
FRANCHISE TAX $.17
SALES TAX $.53   ** Which is wrong**(Expected is 2VSALES TAX as key)
LOCAL-TAX $.23
SERVICE DISCOUNT -$50.00
PAYMENT - THANK YOU-  $100.00 "-" is coming as key (Expected is PAYMENT - THANK YOU)
STARLET $100.00 **- Which is wrong** (Expected is HBO+STARLET)

Can someone tell me what to change in this regular expression?

4

6 Answers

2

Example: http://regexr.com?35dsq

Use this regular expression:

/([-]{0,1}\$\d*\.\d\d)/g

It matches an optional -, then a $ followed by any number of digits, then a . and 2 digits.

Then use this as your replacement:

 \1\n
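
In Java, the same replacement could be sketched with String.replaceAll. Note that Java's replacement syntax refers to the first group as $1 rather than \1. The class and method names below are only illustrative.

```java
class SplitByReplace {
    // Appends a newline after every dollar amount, so each pair ends up on its own line.
    static String onePairPerLine(String input) {
        // Backslashes are doubled for the Java string literal; "$1" is Java's form of \1.
        return input.replaceAll("(-?\\$\\d*\\.\\d\\d)", "$1\n");
    }

    public static void main(String[] args) {
        String input = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.17";
        System.out.print(onePairPerLine(input));
    }
}
```

The description and the amount still sit on the same line without a separator, so a second step is needed if they must be stored separately.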
answered 2013-07-01T17:54:53.323
1

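Description

This regex solution assumes that the money column sometimes has a - prefix, but always consists of a $ followed by zero or more digits, a dot, and exactly 2 digits. The remaining characters are part of the name.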

([^$]*?)(-?\$\d*\.\d{2})


For each match, capture group 1 holds the name and capture group 2 holds the dollar value.

Example:

Working example: http://www.rubular.com/r/9ODCQXyFoZ

Sample text

TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00

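Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
  public static void main(String[] args) {
    // The sample text from above; replace it with the real input.
    String sourcestring = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
    // Group 1: the name (everything up to the amount). Group 2: the dollar amount.
    // No flags are needed: the pattern has no letters, anchors, or unescaped dots.
    Pattern re = Pattern.compile("([^$]*?)(-?\\$\\d*\\.\\d{2})");
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()) {
      // Group 0 is the whole match, so iterate up to and including groupCount().
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}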

Capture groups

$matches Array:
(
    [0] => Array
        (
            [0] => TOTAL DUE-STATEMENT$240.05
            [1] => 911 Fee$10.00
            [2] => FRANCHISE TAX$.17
            [3] => 2VSALES TAX$.53
            [4] => LOCAL-TAX$.23
            [5] => SERVICE DISCOUNT-$50.00
            [6] => PAYMENT - THANK YOU-$100.00
            [7] => HBO+STARLET$100.00
        )

    [1] => Array
        (
            [0] => TOTAL DUE-STATEMENT
            [1] => 911 Fee
            [2] => FRANCHISE TAX
            [3] => 2VSALES TAX
            [4] => LOCAL-TAX
            [5] => SERVICE DISCOUNT
            [6] => PAYMENT - THANK YOU
            [7] => HBO+STARLET
        )

    [2] => Array
        (
            [0] => $240.05
            [1] => $10.00
            [2] => $.17
            [3] => $.53
            [4] => $.23
            [5] => -$50.00
            [6] => -$100.00
            [7] => $100.00
        )

)
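
If the pairs are needed as an actual map rather than printed output, the two capture groups can be collected directly. This is a minimal sketch, not part of the generated code above: the class name, the method name, and the choice of LinkedHashMap and BigDecimal are assumptions, and it presumes that descriptions are unique (a repeated description would overwrite the earlier entry).

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementParser {
    private static final Pattern PAIR = Pattern.compile("([^$]*?)(-?\\$\\d*\\.\\d{2})");

    // Returns description -> amount, in the order the pairs appear in the input.
    static Map<String, BigDecimal> parse(String input) {
        Map<String, BigDecimal> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Drop the "$" so that "-$50.00" becomes "-50.00" and "$.17" becomes ".17".
            String amount = m.group(2).replace("$", "");
            result.put(m.group(1).trim(), new BigDecimal(amount));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "FRANCHISE TAX$.172VSALES TAX$.53SERVICE DISCOUNT-$50.00";
        parse(input).forEach((name, amount) -> System.out.println(name + " = " + amount));
    }
}
```

A LinkedHashMap keeps the statement order, and BigDecimal avoids floating-point rounding on money values.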
answered 2013-07-02T04:50:33.737
0

Given that there are always two decimal places,

your regex can be simplified to

.+?[$]\d*[.]\d{2}

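You need to match the input against the regex above rather than split on it:

Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
    // Each call returns one description-plus-amount chunk, e.g. "911 Fee$10.00".
    String pair = m.group();
}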
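
A self-contained version of that loop might look like the following sketch. The class and method names are made up for illustration. Each match keeps the description and the amount together in one string, including any leading minus sign, so separating them would need capture groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPairs {
    // Returns each description-plus-amount chunk as one string.
    static List<String> pairs(String input) {
        String regex = ".+?[$]\\d*[.]\\d{2}";
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.17";
        for (String pair : pairs(input)) {
            System.out.println(pair);
        }
    }
}
```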
answered 2013-07-01T17:54:14.557
0

Since the format of your prices is known, search for that; everything in between is a description:

    String in = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
    Pattern price = Pattern.compile("-?\\$\\d*\\.\\d{2}");
    Matcher matcher = price.matcher(in);
    int offset = 0;
    while (matcher.find(offset)) {
        String description = in.substring(offset, matcher.start());
        String value = matcher.group();
        System.out.println(description + " " + value);
        offset = matcher.end();
    }
answered 2013-07-01T18:02:15.117
0
class Main {
    public static void main(String[] args) {
        String test = "TOTAL DUE-STATEMENT$240.05911 Fee$10.00FRANCHISE TAX$.172VSALES TAX$.53LOCAL-TAX$.23SERVICE DISCOUNT-$50.00PAYMENT - THANK YOU-$100.00HBO+STARLET$100.00";
        java.util.regex.Pattern p = java.util.regex.Pattern.compile("(?<KEY>.+?(?=-?\\$[\\d,]*\\.\\d{2}))(?<VAL>-?\\$[\\d,]*\\.\\d{2})");
        java.util.regex.Matcher m = p.matcher(test);
        while(m.find()) {
            System.out.println(m.group("KEY") + " : " + m.group("VAL"));
        }
    }
}

You just need a non-greedy match (.+?) for the KEY, followed by a lookahead for the VALUE, which always ends with a dot and 2 digits for the cents.
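
The [\d,]* part also accepts thousands separators, which the regex in the question allowed for as well. A small sketch with made-up input (the sample amounts, the class name, and the method name are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    private static final Pattern P = Pattern.compile(
        "(?<KEY>.+?(?=-?\\$[\\d,]*\\.\\d{2}))(?<VAL>-?\\$[\\d,]*\\.\\d{2})");

    // Returns one "KEY : VAL" string per pair found in the input.
    static List<String> split(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group("KEY") + " : " + m.group("VAL"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input with thousands separators.
        split("RENT$1,240.05LATE FEE-$1,000.00").forEach(System.out::println);
    }
}
```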

answered 2013-07-01T18:32:34.603
-1

This should do it:

^(.+) (-?\$\d*\.\d\d)$

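The second half of the regex matches the dollar amount, including the optional - sign. The first part captures everything else, apart from the separating space.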
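
Because of the ^ and $ anchors and the literal space, this pattern fits input that already has one pair per line with a space before the amount, like the expected output listed in the question, rather than the single run-together string. A sketch under that assumption, using Pattern.MULTILINE so that ^ and $ match on every line (the class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinePairs {
    // MULTILINE makes ^ and $ match at the start and end of every line.
    private static final Pattern LINE =
        Pattern.compile("^(.+) (-?\\$\\d*\\.\\d\\d)$", Pattern.MULTILINE);

    // Formats each line as "[description] [amount]".
    static String describe(String lines) {
        StringBuilder sb = new StringBuilder();
        Matcher m = LINE.matcher(lines);
        while (m.find()) {
            sb.append('[').append(m.group(1)).append("] [").append(m.group(2)).append("]\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lines = "TOTAL DUE-STATEMENT $240.05\nPAYMENT - THANK YOU -$100.00";
        System.out.print(describe(lines));
    }
}
```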

answered 2013-07-01T18:01:47.313