regex - Finding a particular value in an HTTP response using Perl

Question

I have a small Perl script that sends an HTTP POST request:

my $request =  $ua->post( $url, [ 'country' => 10, 'evalprice' => 0 ] );
my $response = $request->content;

Now I know that in the response there will be this part, which appears only once

:&nbsp;<b>9570&nbsp;USD

I want to extract only the number 9570 (or whatever it turns out to be), but I don't know how to search for

:&nbsp;<b>

and then just take the part after that and before

&nbsp;USD

I guess regular expressions will help, but I can't figure out how to use them here.

score 3 · Accepted Answer

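You are on the right track with a regular expression. You only need one expression, and because your string is simple, it does not even need to be a very complicated one.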

# $content holds the response body ($response in the question)
$content =~ m/:&nbsp;<b>([.\d]+)&nbsp;USD/;
my $price = $1;

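m// is the match operator. Together with =~, it tells Perl to apply the regular expression to your variable $content. The pattern has one capture group, ( ), which contains the price; its contents go into $1. [.\d] is a character class. The dot is just a literal dot (your price might have cents), and \d means any digit (0-9). The + says there may be many of these characters, but at least one.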

Try it out at http://rubular.com
Read more about regular expressions in perlre and perlretut
If you want to do more with that website, have a look at WWW::Mechanize
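
To make the explanation above concrete, here is a minimal sketch that wraps the match in a small helper and checks whether it succeeded before using the result. The helper name extract_price and the sample string are made up for illustration; in the real script you would pass in the response body instead.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured price, or undef if the
# marker text is not present in the given string.
sub extract_price {
    my ($content) = @_;

    # In list context a successful match returns its capture groups,
    # and a failed match returns an empty list (so $price stays undef).
    my ($price) = $content =~ m/:&nbsp;<b>([.\d]+)&nbsp;USD/;
    return $price;
}

# Made-up sample standing in for the body returned by the POST request.
my $sample = 'Price:&nbsp;<b>9570.50&nbsp;USD</b>';

my $price = extract_price($sample);
print defined $price ? "price = $price\n" : "price not found\n";
```

Assigning the match in list context avoids reading a stale $1 left over from an earlier successful match when this one fails.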

score 1 · Accepted Answer

Use code like this (removing the HTML entities is nice, but optional):

use HTML::Entities;

my $content = ":&nbsp;<b>9570&nbsp;USD";
my $decoded = decode_entities($content); # &nbsp; becomes a non-breaking space (U+00A0)
my ($price) = ($decoded =~ /<b>(\d+)[\s\xA0]*USD/); # \xA0 listed explicitly; \s alone may not match it
print "price = $price\n";

score 1 · Accepted Answer

The safest way to parse HTML is with the help of a proper CPAN module. But a simple alternative (if the response is simple) could be this:

use strict;
use warnings;

my $str = ":&nbsp;<b>9570&nbsp;USD";

if( $str =~ m/:&nbsp;<b>(\d+)&nbsp;/ ) {
   print $1, "\n";
}

I used a regular expression; $1 holds the number when a match is found.
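
For the CPAN-module route mentioned at the start of this answer, the following is a rough sketch only. It assumes HTML::TreeBuilder is installed and that the price sits inside a <b> element; the surrounding HTML fragment is invented, since the real page layout is not shown in the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Invented fragment; the actual markup around the price is an assumption.
my $html = '<p>Price:&nbsp;<b>9570&nbsp;USD</b></p>';

my $tree = HTML::TreeBuilder->new_from_content($html);

my $price;
for my $bold ( $tree->look_down( _tag => 'b' ) ) {
    # as_text returns the element's text with entities decoded, so
    # &nbsp; arrives as the character U+00A0 rather than as "&nbsp;".
    if ( $bold->as_text =~ /^(\d[\d.]*)[\s\xA0]*USD/ ) {
        $price = $1;
        last;
    }
}
$tree->delete;    # release the parse tree

print defined $price ? "price = $price\n" : "price not found\n";
```

Compared with matching the raw response, this keeps working if attributes or whitespace inside the tags change, at the cost of an extra dependency.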
