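OK, here's the situation. I have notes from an old SQL Server database stored in a text format. It keeps all the notes for a record in one big blob of data. I need to take that blob of text and parse it to create one row for each note entry, with separate columns for the timestamp, user, and note text. The only way I can think of to do this is to use a regular expression to locate the Unix timestamp of each note and parse on that. I know there are split functions for parsing on delimiters, but those remove the delimiter. I need to parse on \d{10} but also keep the 10-digit number. Here is some sample data.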

create table test_table
(
job_number number,
notes varchar2(4000)
)

insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')

I would like to see one record per note, looking like this.

JOB_NUMBER        DTTM    USER     NOTES_TEXT
----------    ----------  ----     ----------
12345         1234567890  USERNAME notes text notes text notes text notes text
12345         5468204562  USERNAME notes text notes text notes text notes text
12345         1025478510  USERNAME notes text notes text notes text notes text
12346         2345678901  USERNAME notes text notes text notes text notes text
12346         1523024512  USERNAME notes text notes text notes text notes text
12346         1578451236  USERNAME notes text notes text notes text notes text
12347         2345678902  USERNAME notes text notes text notes text notes text
12347         2365201214  USERNAME notes text notes text notes text notes text
12347         1202154215  USERNAME notes text notes text notes text notes text

Thanks for any help.

1 Answer

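Text::ParseWords can handle the quoted strings and split on the commas. You can use the flip-flop operator to skip ahead in the input: 1 .. /values/. This particular way of skipping may need to be adapted to your real input.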
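As a rough sketch of those two steps in isolation: the helper names `lines_after_values` and `parse_row` below are made up for illustration, and the in-memory file handle only stands in for whatever file you actually read. It assumes each data row sits on its own line in the form `(job, 'notes')`.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: return the lines that follow the first line matching /values/.
sub lines_after_values {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @kept;
    while (my $line = <$fh>) {
        # A constant operand of the flip-flop is compared against $. (the
        # current line number), so this is true from line 1 through the
        # line that matches /values/.
        next if 1 .. $line =~ /values/;
        push @kept, $line;
    }
    close $fh;
    return @kept;
}

# Hypothetical helper: split one "(job, 'notes')" line into its two fields.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\(|\)$//g;                     # drop the enclosing parentheses
    return quotewords('\s*,\s*', 0, $line);    # keep = 0 strips the quotes
}
```

Because the second argument to `quotewords` is 0, the quotes around the notes string are removed and a comma inside the quoted notes would not be treated as a field separator.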

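Then it is only a matter of parsing the string, which can be done by splitting with a lookahead assertion and then capturing the individual fields in each substring. The regex in the split: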

my @entries = split /(?<!^)(?=\d{10})/, $data;

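It has a negative lookbehind assertion to avoid matching at the beginning of the string (^), and a lookahead assertion that matches 10 digits. Because both assertions are zero-width, the split happens just before each number and the number itself is kept.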
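A minimal demonstration of that split on its own, using a made-up two-note string (the helper name `split_notes` and the sample users are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: split a notes blob just before every run of 10 digits.
sub split_notes {
    my ($blob) = @_;
    return split /(?<!^)(?=\d{10})/, $blob;
}

my @parts = split_notes('1234567890 alice first note 1234567891 bob second note');
print "[$_]\n" for @parts;
# prints:
# [1234567890 alice first note ]
# [1234567891 bob second note]
```

Note the trailing space left on the first piece, which you may want to trim when capturing the fields. Also be aware that any run of ten or more digits inside the note text itself would be treated as a boundary too, so this relies on the notes not containing such numbers.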

The DATA file handle is used here for demonstration; simply replace <DATA> with <> to read from a file name given as an argument.

use strict;
use warnings;
use Text::ParseWords;

my $format = "%-12s %-12s %-10s %s\n";              # format for printing
my @headers = qw(JOB_NUMBER DTTM USER NOTES_TEXT);  
printf $format, @headers;
printf $format, map "-" x length, @headers;         # print underline
while (<DATA>) {
    next if 1 .. /values/;                            # skip up to and including the "values" line
    next unless /\S/;                                 # ignore blank lines
    chomp;
    s/^\(|\)$//g;                                     # remove the enclosing parentheses
    my ($job, $data) = quotewords('\s*,\s*', 0, $_);  # split on commas, honoring quotes
    my @entries = split /(?<!^)(?=\d{10})/, $data;    # split before each 10-digit timestamp
    for my $entry (@entries) {                        # parse each entry
        # capture timestamp, user and note text; drop trailing whitespace
        my ($dttm, $user, $notes) = $entry =~ /^(\d+)\s+(\S+)\s+(.*?)\s*$/;
        printf $format, $job, $dttm, $user, $notes;   # print the note text, not the whole entry
    }
}

__DATA__
create table test_table
(
job_number number,
notes varchar2(4000)
)

insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')

Output:

JOB_NUMBER   DTTM         USER       NOTES_TEXT
----------   ----         ----       ----------
12345        1234567890   username   notes text notes text notes text notes text
12345        5468204562   username   notes text notes text notes text notes text
12345        1025478510   username   notes text notes text notes text notes text
12346        2345678901   username   notes text notes text notes text notes text
12346        1523024512   username   notes text notes text notes text notes text
12346        1578451236   username   notes text notes text notes text notes text
12347        2345678902   username   notes text notes text notes text notes text
12347        2365201214   username   notes text notes text notes text notes text
12347        1202154215   username   notes text notes text notes text notes text
answered 2013-02-14T18:03:58.063