1 Answer
This is a bug in your code. `LWP::Simple::get` doesn't return the original bytes (in some encoding); it returns decoded text (i.e. Unicode characters). That makes sense: if it returned bytes, you wouldn't know how to decode them, because `get` doesn't tell you the encoding.
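So `get("http://localhost/wtf.txt")` returns a string containing the codepoint U+00F6. `print` then writes some bytes to STDOUT. What are those bytes? That depends on the encoding layer currently set on the filehandle. By default there is no encoding layer, and the result is a weird mix of Latin-1 and UTF-8: a string whose codepoints are all <= 255 is written as Latin-1 bytes, while a string containing any higher codepoint is written as UTF-8 along with a "Wide character" warning (it might even depend on the internal encoding of the string).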
If you want UTF-8 output, call `binmode STDOUT, ":encoding(UTF-8)";` first. That ensures all text written to STDOUT is encoded as UTF-8.
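As a minimal sketch of the difference the layer makes, the following writes the same string to two in-memory filehandles and hex-dumps what each one received. The in-memory handles are only stand-ins for STDOUT so that the bytes can be inspected, and all variable names are illustrative.

```perl
use strict;
use warnings;

# One character, U+00F6, followed by a newline.
my $text = "\x{F6}\n";

# No encoding layer: comparable to an untouched STDOUT.
my $plain_bytes = '';
open(my $plain_fh, '>', \$plain_bytes) or die "open failed: $!";
print {$plain_fh} $text;
close($plain_fh);

# Same write, but with the encoding layer pushed first.
my $utf8_bytes = '';
open(my $utf8_fh, '>', \$utf8_bytes) or die "open failed: $!";
binmode($utf8_fh, ':encoding(UTF-8)');
print {$utf8_fh} $text;
close($utf8_fh);

my $plain_hex = unpack('H*', $plain_bytes);
my $utf8_hex  = unpack('H*', $utf8_bytes);

print "no layer:         $plain_hex\n";   # f60a   (Latin-1 byte)
print ":encoding(UTF-8): $utf8_hex\n";    # c3b60a (UTF-8 bytes)
```

With no layer, U+00F6 goes out as the single Latin-1 byte 0xF6. With the layer, it goes out as the two-byte UTF-8 sequence 0xC3 0xB6.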
On the other hand, if you want to ignore encodings and just write the bytes that you received from the web server, then `LWP::Simple` is the wrong choice. Use `LWP::UserAgent` instead and call `$response->content`. (`LWP::Simple::get` uses `$response->decoded_content` internally.)
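To illustrate the two accessors without needing a running web server, here is a rough sketch that builds an `HTTP::Response` by hand. That object is only a stand-in: in real code it would come from `LWP::UserAgent->new->get($url)`. The body and the `Content-Type` header are made-up values that assume the server sends U+00F6 encoded as UTF-8.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built stand-in for the object LWP::UserAgent->new->get($url) returns.
# Assumed body: U+00F6 encoded as UTF-8, followed by a newline.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=UTF-8' ],
    "\xC3\xB6\n",
);

my $bytes = $response->content;          # raw bytes, exactly as received
my $chars = $response->decoded_content;  # text, decoded using the charset

my $bytes_len  = length $bytes;
my $chars_len  = length $chars;
my $first_char = sprintf 'U+%04X', ord $chars;

print "content:         $bytes_len bytes\n";
print "decoded_content: $chars_len characters, first is $first_char\n";
```

To pass the server's bytes through unchanged, print `$bytes` to a handle that has no encoding layer (for example after a plain `binmode STDOUT;`).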
The truncation in your second example is probably due to `pack`/`unpack`, which don't make sense on Unicode strings (they're meant for byte strings, i.e. strings in which all codepoints are <= 255).
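The second example is not shown here, so the following is only a sketch under the assumption that the goal was something like a hex dump. The idea is to encode the text to bytes explicitly before handing it to `unpack`, and to decode again after `pack`. Variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "\x{F6}";                  # character string: one codepoint, U+00F6

# Turn characters into bytes first, then unpack the bytes.
my $bytes = encode('UTF-8', $text);   # byte string "\xC3\xB6"
my $hex   = unpack('H*', $bytes);     # "c3b6"

# Reverse direction: pack yields bytes, which are then decoded to text.
my $round_trip = decode('UTF-8', pack('H*', $hex));

print "hex: $hex\n";
print "round trip ok\n" if $round_trip eq $text;
```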