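I'm working with an OFX (bank transactions) file. My bank doesn't use the <NAME> tag to specify the payee, but that information is a substring of the <MEMO> tag.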

So my file looks like this:

...ofx headers and other stuff
...line below is a transaction
<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
</STMTTRN>
...continues other transactions and end of file

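I want to match every <MEMO> tag, extract the payee name (Walmart 2th street in this example), and add it in a <NAME> tag on the line after the <MEMO>. My output would be: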

...ofx headers and other stuff
...line below is a transaction
<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
    <NAME>Walmart 2th street</NAME>
</STMTTRN>
...continues other transactions and end of file

Another tool such as awk could also be a solution.

2 Answers

With GNU sed:

sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' file

Output:

<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
    <NAME>Walmart 2th street</NAME>
</STMTTRN>

If you want to edit the file "in place", use sed's -i option.
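
Since the question mentions awk, here is a minimal awk sketch of the same idea, assuming the payee is everything between the last HH:MM timestamp and the closing </MEMO> tag. The file name sample.ofx and the variable names are hypothetical and used only for illustration; the NAME line reuses the indentation of the MEMO line.

```shell
# Sample transaction taken from the question; sample.ofx is a hypothetical file name.
cat > sample.ofx <<'EOF'
<STMTTRN>
    <TRNAMT>-4.00</TRNAMT>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
</STMTTRN>
EOF

# For each MEMO line that contains an HH:MM timestamp: print the line, then
# print a NAME line holding the text that follows the last timestamp.
result=$(awk '
/<MEMO>/ && match($0, /.* [0-9][0-9]:[0-9][0-9] /) {
    print
    indent = $0
    sub(/<MEMO>.*/, "", indent)        # keep only the leading whitespace
    payee = substr($0, RLENGTH + 1)    # text after the timestamp
    sub(/<\/MEMO>.*/, "", payee)       # drop the closing tag
    print indent "<NAME>" payee "</NAME>"
    next
}
{ print }                              # every other line passes through unchanged
' sample.ofx)

printf '%s\n' "$result"
```

MEMO lines without a timestamp are printed unchanged and get no NAME line, which matches the behavior of the sed command above.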

answered 2016-07-07T19:44:06.110

Complementing @Cyrus's answer to handle non-ASCII characters:

I gave up on keeping the non-ASCII characters (the accented letters are transliterated to plain ASCII), and now it works:

iconv -f "windows-1252" -t "UTF-8" file-ansi.ofx -o file-utf8.ofx
rm file-ansi.ofx
sed 'y/áÁàÀãÃâÂéÉêÊíÍóÓõÕôÔúÚüÜçÇ/aAaAaAaAeEeEiIoOoOoOuUuUcC/' -i file-utf8.ofx
sed -i -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n                  <NAME>\1<\/NAME>/' file-utf8.ofx 

My output:

<MEMO>Cartao de Credito - 09/06 18:37 Walmart 2th street</MEMO>
<NAME>Walmart 2th street</NAME>
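
A possible alternative, if you would rather leave the accented characters untouched: GNU sed in a UTF-8 locale does not let "." match bytes that are not valid UTF-8, which is probably why the original command skipped these lines. Running sed in the C locale treats every byte as one character. The sketch below assumes GNU sed and a windows-1252 input; the file names are hypothetical.

```shell
# Build a small windows-1252 sample: octal \343 is "ã" and \351 is "é" in that encoding.
# sample-ansi.ofx and sample-out.ofx are hypothetical file names.
printf '<STMTTRN>\n    <MEMO>Cart\343o de Cr\351dito - 09/06 18:37 Walmart 2th street</MEMO>\n</STMTTRN>\n' > sample-ansi.ofx

# In the C locale every byte counts as one character, so "." can also match
# bytes that are not valid UTF-8. -E is equivalent to -r in GNU sed.
LC_ALL=C sed -E 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' sample-ansi.ofx > sample-out.ofx

# Count the NAME lines that were added.
name_count=$(LC_ALL=C grep -c '<NAME>Walmart 2th street</NAME>' sample-out.ofx)
printf '%s\n' "$name_count"
```

The output file stays in the original encoding, so convert it with iconv afterwards only if the consuming application expects UTF-8.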
answered 2016-07-07T20:38:55.873