我有一个包含以下网址的 html 页面:
<h3><a href="http://site.com/path/index.php" h="blablabla">
<h3><a href="https://www.site.org/index.php?option=com_content" h="vlavlavla">
我想提取:
site.com/path
www.site.org
之间<h3><a href="
& /index.php
。
我试过这段代码:
#!/usr/local/bin/perl
use strict;
use warnings;
open (MYFILE, 'MyFileName.txt');
while (<MYFILE>)
{
my $values1 = split('http://', $_); #VALUE WILL BE: www.site.org/path/index2.php
my @values2 = split('index.php', $values1); #VALUE WILL BE: www.site.org/path/ ?option=com_content
print $values2[0]; # here it must print www.site.org/path/ but it don't
print "\n";
}
close (MYFILE);
但这给出了一个输出:
2
1
2
2
1
1
它不解析 https 网站。希望你明白,问候。