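A question describing a similar problem:
Issues with LWP when using HTTP/1.1: bad chunk size, truncated responses.
I am using the Perl module WWW::Mechanize to scrape a website. As I understand it, WWW::Mechanize (through LWP::UserAgent) uses the Net::HTTP module to implement the HTTP protocol.
Here is the problem: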
use WWW::Mechanize;
my $url = 'https://somewebsite.com/a/b/c?skey=svalue';
my $browser = WWW::Mechanize->new();
# The request completes, but the response content comes back empty.
$browser->get($url);
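When I run the snippet above (assuming all imports are in place), I get empty response content, and the following error appears in the response headers of WWW::Mechanize's response object:
'x-died' = "Bad chunk-size in HTTP response: { at path/to/perl/vendor/lib/Net/HTTP/Methods.pm line 542."
Note the "{" in the exception message. I then tried debugging the Methods.pm module to see what was going on, and it looks like the exception is raised in the read_entity_body subroutine.
I also ran curl against the same URL and got the following response headers: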
< HTTP/1.1 200 OK
< Set-Cookie: JSESSIONID=C61B57BA5DD0A05912C98CE1CFBAD435; Path=/; HttpOnly
< X-Frame-Options: DENY
< Transfer-Encoding: chunked
< Strict-Transport-Security: max-age=31536000 ; includeSubDomains
< Server: Apache-Coyote/1.1
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< X-Content-Type-Options: nosniff
< Content-Disposition: attachment;filename=f.txt
< Pragma: no-cache
< Expires: 0
< X-XSS-Protection: 1; mode=block
< Date: Thu, 21 Sep 2017 18:31:27 GMT
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked
and the following content:
{
"total" : 1,
"page" : 1,
"records" : 1,
"rows" : [ {
"infoPostRptId" : 2,
"mngPplId" : 1,
"infoPostRptXsdId" : 1,
"rptFmtCode" : "XML",
"createUserId" : 5183202,
"updateUserId" : 1,
"statusId" : 309403,
"seqNbr" : 0,
"urlAnchor" : null,
} ],
"errors" : null
}
* Connection #0 to host xxxxxxx left intact
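If I am not mistaken, the content coming from the website is not actually chunk-encoded, even though the headers say the transfer encoding is chunked.
More about the Methods.pm module:
As I understand it, the read_entity_body subroutine tries to decode the chunks and combine them to form the response content.
I think the problem is that the response headers contain Transfer-Encoding: chunked, but the content is not actually chunk-encoded.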
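To make the hypothesis above concrete, here is a minimal sketch of chunked decoding in Perl. It is not the code from Net::HTTP::Methods: decode_chunked is a hypothetical helper written only for illustration, it works on a complete string rather than a socket, and it ignores trailers. It shows that a decoder handed a body that starts with "{" instead of a hex size line fails with a message of the same shape as the X-Died header.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not Net::HTTP's implementation.
# Decodes an HTTP/1.1 chunked body that is already fully held in a string.
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    my $pos  = 0;
    while (1) {
        # Each chunk starts with a size line: hex digits, optional extensions, CRLF.
        my $eol = index($raw, "\r\n", $pos);
        die "Truncated chunk header\n" if $eol < 0;
        my $line = substr($raw, $pos, $eol - $pos);
        $line =~ /^([0-9A-Fa-f]+)\s*(?:;.*)?$/
            or die "Bad chunk-size in HTTP response: $line\n";
        my $size = hex($1);
        $pos = $eol + 2;
        last if $size == 0;    # a zero-size chunk ends the body (trailers ignored)
        die "Truncated chunk data\n" if $pos + $size + 2 > length($raw);
        $body .= substr($raw, $pos, $size);
        substr($raw, $pos + $size, 2) eq "\r\n"
            or die "Missing CRLF after chunk data\n";
        $pos += $size + 2;
    }
    return $body;
}

# A correctly chunked body decodes back to the original JSON text.
my $json = qq({\n  "total" : 1\n});
my $wire = sprintf("%x\r\n%s\r\n0\r\n\r\n", length($json), $json);
print decode_chunked($wire) eq $json ? "chunked body decoded\n" : "mismatch\n";

# Feeding un-chunked JSON to the decoder fails on the first line, "{".
my $ok = eval { decode_chunked("{\r\n\"total\" : 1\r\n}\r\n"); 1 };
print "error: $@" unless $ok;
```

If the body really were sent without chunk framing, or if something had already removed the framing before the decoder ran, the first "size line" the decoder sees would be the opening brace of the JSON, which matches the "{" in the error.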
Any help is much appreciated. Thanks.
Edit 1:
Versions:
WWW::Mechanize: 1.83, LWP::UserAgent: 6.15, and Net::HTTP: 6.12
Edit 2:
Output of curl -s --raw -D - "https://....":
HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=A29B1E0F561F1E4FBAF12583C0C2DE08; Path=/; HttpOnly
X-Frame-Options: DENY
Transfer-Encoding: chunked
Strict-Transport-Security: max-age=31536000 ; includeSubDomains
Server: Apache-Coyote/1.1
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
X-Content-Type-Options: nosniff
Content-Disposition: attachment;filename=f.txt
Pragma: no-cache
Expires: 0
X-XSS-Protection: 1; mode=block
Date: Fri, 22 Sep 2017 02:36:51 GMT
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
45c
{
"total" : 1,
"page" : 1,
"records" : 1,
"rows" : [ {
"infoPostRptId" : 2,
"mngPplId" : 1,
"infoPostRptXsdId" : 1,
"rptFmtCode" : "XML",
"createUserId" : 5183202,
"updateUserId" : 1,
"statusId" : 309403,
"seqNbr" : 0,
"urlAnchor" : null,
} ],
"errors" : null
}
0
As with the JSON content shown earlier, I removed or changed some values just to anonymize the data.
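In the raw output above, the lines 45c and 0 before and after the JSON look like chunk-size lines, which would mean the body is chunk-framed on the wire. The small sketch below, using a hypothetical helper named chunk_size_from_line, checks whether a line is a valid hexadecimal chunk-size line and converts it to a byte count. Because the JSON was edited for anonymization, the count cannot be checked against the body shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte count encoded by one chunk-size line,
# or nothing (undef in scalar context) if the line is not a valid size line.
sub chunk_size_from_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # tolerate a trailing line ending
    return unless $line =~ /^([0-9A-Fa-f]+)(?:\s*;.*)?\z/;
    return hex($1);
}

# Lines taken from the raw curl output and from the error message.
for my $line ('45c', '0', '{') {
    my $size = chunk_size_from_line($line);
    printf "%-3s => %s\n", $line,
        defined $size ? "$size bytes" : 'not a chunk-size line';
}
```

Here 45c is hexadecimal for 1116 bytes, and a size of 0 marks the last chunk, while "{" is rejected, just as in the X-Died message.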
Edit 3: This is what I get when I run the following command:
perl -MLWP::UserAgent -e'print LWP::UserAgent->new->get($ARGV[0])->as_string' 'https://......'
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: close
Date: Fri, 22 Sep 2017 04:15:06 GMT
Pragma: no-cache
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Expires: 0
Client-Aborted: die
Client-Date: Fri, 22 Sep 2017 04:15:06 GMT
Client-Peer: 67.221.172.5:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
Client-SSL-Cert-Subject: /OU=Domain Control Validated/CN=*.trellisenergy.com
Client-SSL-Cipher: ECDHE-RSA-AES128-SHA256
Client-SSL-Socket-Class: IO::Socket::SSL
Client-Transfer-Encoding: chunked
Content-Disposition: attachment;filename=f.txt
Set-Cookie: JSESSIONID=5CAC35648DBBE25E3229DE9BF21C3794; Path=/; HttpOnly
Strict-Transport-Security: max-age=31536000 ; includeSubDomains
X-Content-Type-Options: nosniff
X-Died: Bad chunk-size in HTTP response: { at /usr/local/share/perl5/Net/HTTP/Methods.pm line 544.
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Edit 4: TCP dump:
I ran the following command in one terminal window:
perl -MLWP::UserAgent -e'print LWP::UserAgent->new->get($ARGV[0])->as_string' 'https://vgs.trellisenergy.com/ptms/public/infopost/getInfoPostRpts.do?tspId=1&proxyTspId=1&rptId=2&downloadInd=0&searchInd=0&showLatestInd=0&cycleId=10303&startDate=09/20/2017&endDate=09/20/2017&_search=false&nd=1505846852955&rows=10&page=1&sidx=&sord=asc&_=1505846826289'
and the following in another:
tcpdump -w tcpdump.pcap -A -s0 -e -n -vvv -i eth0 host vgs.trellisenergy.com
I pretty-printed the tcpdump capture using:
tcpick -C -yP -r tcpdump.pcap
TCP dump output:
Starting tcpick 0.2.1 at 2017-09-22 10:24 MDT
Timeout for connections is 600
tcpick: reading from tcpdump.pcap
1 SYN-SENT 10.1.1.10:24876 > 67.221.172.5:https
1 SYN-RECEIVED 10.1.1.10:24876 > 67.221.172.5:https
1 ESTABLISHED 10.1.1.10:24876 > 67.221.172.5:https
...........Y.8..*m.i.'ZZP*....1...d
.._.$.^....0.,.(.$...
.....k.j.9.8.....2...*.&.......=.5.../.+.'.#... .....g.@.3.2.....E.D.1.-.).%.......<./...A.........
..................._.........vgs.trellisenergy.com.........
. .....................................
.....0..1.0.......U....US1.0...U....Arizona1.0...U...............>.s].s.a^.
Scottsdale1.0...U.
..........0..0A1!0...U....Domain Control Validated1.0...U....*.trellisenergy.com0.."0 Secure Certificate Authority - G20..
h@s0.*$.H.4./..E8.m.V......'!..f...!tY'.(..`......... ...E.)Tz..z2.%..KEi....Dd.....s....JW_.Y ..8..6..Y ........i.r............"...a.
LI1V 6t....C.....20uB'..#:...n..(-...(..P..M..O...p.3L.].@A.........0...0...U.......0.0...U.%..0...+.........+.......0...U...........07..U...00.0,.*.(.&http://crl.godaddy.com/gdig2s1-337.crl0]..U. .V0T0H..`.H...m....0907..+........+http://certificates.godaddy.com/repository/0...g.....0v..+........j0h0$..+.....0...http://ocsp.godaddy.com/0@..+.....0..4http://certificates.godaddy.com/repository/gdig2.crt0...U.#..0...@..'..4.0.3..l...,..01..U...*0(..*.trellisenergy.com............z...;^..'.@.l..,Cj...N.LY.S.......~p...k.. ...Y..S}.\}o.......(.
.....H..SG.D.vy}...qM(.0LT.C.....R.......y... Y.....wz.s4..Q.t...u...].8.|..q..+.>5...?..`z.X2. .{.%..[ 7.. r...y.yjY..h]...0I.$..x,O....h......n.b.....c.<.....X.Gi.P.vTM.d.B.
.....0..1.0...a...U....US1.0...U....Arizona1.0...U...
Scottsdale1.0...U.
310503070000Z0..1.0110/...U....US1.0...U....Arizona1.0...U...rity - G20..
Scottsdale1.0...U.
..........0.., Inc.1-0+..U...$http://certs.godaddy.com/repository/1301..U...*Go Daddy Secure Certificate Authority - G20.."0
...........v...b.0d...l...b../.>e...b.<R...EKU.xkc.b...il.....L.E3......+..a.yW....?0<]G.....7.AQ..KT.(.....08...&.fGcm.q&G.8GS.F......E...q..o....0:yO_LG...[...`;..C...3N...'O.%........t.dW..DU.-*:>....2
..d..:P.J..y3.. .....9.i.lcR.w...t.....PT5KiN.;.I.....R..........0...0...U.......0....0...U...........0...U......@..'..4.0.3..l...,..0...U.#..0...:....g(.....An .....04..+........(0&0$..+.....0...http://ocsp.godaddy.com/05..U....0,0*.(.&.......`..r.s$..."....bXD...%......b.Q...Q*...s.v.6....,....*...Mu..?.A.#}[K...X.F..``..}PA......../..T.D..}.C.D..p
...3..-v6&.....a....o.F.(..&}
.....0..1.0.......U....US1.0...U....Arizona1.0...U...
Scottsdale1.0...U.
09GoDaddy.com, Inc.110/..U...(Go Daddy Root Certificate Authority - G20..
371231235959Z0..1.0 ..U....US1.0...U....Arizona1.0...U...
Scottsdale1.0...U.
..........0.., Inc.110/..U...(Go Daddy Root Certificate Authority - G20.."0
..f"..im6.......`.8......F.. C.;....I.'....N...p..2...>.N...O/Y0"...Vk......u.9Q{..5.tN......?........j..............;F|2
>.]|.|..+S..biQ%.a.D..,.C.#..:...)....]....0
............]y...Yg.a.~;.1u-. .Oe......../..Z..t.s.8B..{..u...........S.~.F.....+....'....Z.7....l....=.$Oy.5._.......-.......s@.r%......h..W...: ..D...7...2..8..d.,~........h..".8-z..T.i._3.z={
.8.. 'e...]p-..N.(F...6.....(....k.Q......8k...v...v...(...=!.:...;.L.....K./.....D....xH .Zi.<!.}i. t.c.!yWY..c.I......?.._.e......"...v.'8Qq.d].......O(8._M....%........]:LU....]l. .....
............iA...~....C5...k.43... .F6. .\!....X......bJ.e..@.....[.uO.&..-....7.O. .......g2..R.b....H7.........G.....%u1.....8$.u..O....za..T..........P...V2.;.......j.L.Px;..-....&.......H...yQ,n.s..<KFx#...2..K.G..n4OG{N.5.6../...
......
....PU.T....A.d...*.iw.. c.Wjm.V\. ..vP.Z%......v...k......l...b7.|.u..c.=:....$.3K..
........v.{u...`..+.qU. .'.t.g....V......1..P.g..aO....nY..C..F...4x.d...Y....|3..Pz;.K.~]...H..;...PIR..hRv...)].=?.:..[...h...A.. /4..d.......C`....]LZK.Y..q......Q.L.R..D&...l..t..I.j2....8...y.L..).y.n..).u|..'.....z ..,Yg..md."i.......M.74x...3..N.b.6..tm.).u...|-.xK.9R..M,......!....}..[=B.J...... ...~Gx.8p.5.UQ........sJ
...w..Xf.#^..,..G.w.f4.V..'..Bb_..*e.i......P1.
U6!.l..%...ts. u!c5.0>.!.2J.G)p.W.........dF*5.....5..M. .....G+.....I..vG&..>.}(....E. ...9...N.i..Jm&b...G...3Wo#k.........e:..p........:w....V.L'9.-..)......d.P_....#..iide@.2..E>.?|..:....B.,mr...N.JAS1]:...O.......i..c..T.pZZ)..E."\b.r2HA..r!....L........K....~1.....x!.Gp.K..G..D*s.u....WN.?..(+..rU..g?d.....eG.L.^...*..a...]/...N0.gX..;...T...%...;.P?.O4{.i.....%.T.|..
...U..Ug......d...a3:$...p...v..t."...
.......%..J`E....5....n..M....>...ge.r.,...s..,.. k..R.N._>3}...=.0...........T.d.. ...u 7?T...3b.?.lr...8o.Gk.}xkBY[...l..^.-.Wt}..G/..l.f..z..^F.A.G.i8l4.....#.a.....BS.c.Q7..=y...{ELUP.R..c.{...a9.u3..-@F.H..M..2.o.j@.pI..S....R ..vx.u.<-x..".T.d-...:...>......n..Z|..?Dz@N..?...#.../.....2.Z..y..Ej..........Q.....'8.....nC..7.....)e..7r..[..H...R.....h...x7G.+.......eBErwo.r....,..e*.8O..oQ. `O.@.J#...5).9.....!d.u....,...pV..oS...%.o..F..G.7....I...N...s .G..G@.".w6d......R..j
..........G.D..l....0..EH.Y..4.e.\#~s.i.-WKoyK...w.'.o.X-.,x.......4......T.*.>#..
..G(wP.V.i...F.U...t...-.\.!...Y4,...._............7..|<DM3.&u.%.0..G.......9....
.....Y......55ZW..X......Tz..D...r.6$..B...Wv..R..8.."../dL..-...i^o..>:..O...s.W.).i....gOH...@.....8k.......Q........#.....#.R..^.....f.......x^X....^S.R..u.7.._..T]A'/4>k\..Lg....H...J....o>.2 ......$.......PP..#..=.E..;2..>k...`...9..>*.....N...4........(...a....n....)w.I.@O+.(.cV..g.....%G..^.Z#.'EG...]..$_...!e...%.;VG.7.5.&...C........s4..1....t[
1 FIN-WAIT-1 10.1.1.10:24876 > 67.221.172.5:https
1 TIME-WAIT 10.1.1.10:24876 > 67.221.172.5:https
1 CLOSED 10.1.1.10:24876 > 67.221.172.5:https
tcpick: done reading from tcpdump.pcap
22 packets captured
1 tcp sessions detected