
I have a small Perl script that sends an HTTP POST request:

my $request =  $ua->post( $url, [ 'country' => 10, 'evalprice' => 0 ] );
my $response = $request->content;

Now I know that in the response there will be this part, which appears only once

:&nbsp;<b>9570&nbsp;USD

I want to extract only the number 9570 (or whatever it turns out to be), but I don't know how to search for

:&nbsp;<b>

and then just take the part after that and before

&nbsp;USD

I guess regular expressions will help, but I can't figure out how to use them here.

3 Answers

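You are on the right track with regular expressions. You only need a single expression, and since your string is simple, it doesn't even have to be a very complicated one.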

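# $content holds the response body (called $response in the question).
$content =~ m/:&nbsp;<b>([.\d]+)&nbsp;USD/;
my $price = $1;    # only meaningful if the match succeeded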

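m// is the match operator. Together with =~ it tells Perl to apply a regular expression to your variable $content. The parentheses ( ) form a capture group around the price, and whatever they match goes into $1. [.\d] is a character class: the dot is just a literal dot (your price might have cents), and \d stands for any digit (0-9). The + says there may be many of these characters, but at least one.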
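
As a minimal sketch of the explanation above, the match can be wrapped in a small helper that returns the captured price, or undef when the pattern is not found. The name extract_price and the sample string are made up for illustration; only the pattern itself comes from this answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): returns the price
# captured by the pattern, or undef if the pattern does not match.
sub extract_price {
    my ($content) = @_;

    # In list context a successful match returns the captured groups,
    # so $price receives the contents of ( ) without reading $1 directly.
    # A failed match returns an empty list, leaving $price undefined.
    my ($price) = $content =~ m/:&nbsp;<b>([.\d]+)&nbsp;USD/;
    return $price;
}

my $sample = 'Price:&nbsp;<b>9570&nbsp;USD</b>';    # made-up sample input
my $price  = extract_price($sample);
print defined $price ? "price = $price\n" : "no price found\n";
```

Checking for undef before using the result avoids silently picking up a stale $1 from an earlier, unrelated match.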

answered 2013-06-09T09:26:24.917

Use code like this (decoding the HTML entities is nice, but optional):

use HTML::Entities;

my $content = ":&nbsp;<b>9570&nbsp;USD";
my $decoded = decode_entities($content); # &nbsp; becomes U+00A0 (non-breaking space)
# \s may not match U+00A0 on a byte string, so allow it explicitly
my ($price) = ($decoded =~ /<b>(\d+)[\s\xA0]*USD/);
print "price = $price\n";

answered 2013-06-09T09:26:13.187

The safest way to parse HTML is with a proper CPAN module. But if the response is simple, a straightforward option might be this:

use strict;
use warnings;

my $str = ":&nbsp;<b>9570&nbsp;USD";

if( $str =~ m/:&nbsp;<b>(\d+)&nbsp;/ ) {
   print $1, "\n";
}

I used a regular expression; when a match is found, $1 holds the number.

answered 2013-06-09T09:27:27.897