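I'm having trouble parsing some HTTP headers in C++. Right now I want to find the carriage return/line feed pair that ends each HTTP header entry. I'm doing that with str.find():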

string hdr; //filled with the header data
int line_end_pos = hdr.find("\r\n"); //also tried "\\r\\n", same results

Even though I know the header contains carriage return/line feed pairs, find() still returns -1. What am I missing here?

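Edit:

The library I'm using provides several different functions for displaying the data. In string format, a sample of the header data looks like this: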

GET /p/libcrafter/ HTTP/1.1
Host: code.google.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en,en-us;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Cookie: PREF=ID=ad8fd3ab4b0bd3c9:U=e1bd88556eeb2dce:FF=0:TM=1382531357:LM=1382531841:S=Pbh-JiokGeVbsSh-; NID=67=olK2k5sUZ95mRApV77s7CfXscytJSfmVuyubiSCMotOdBBvijqrTwyyifLQZbZA_SCTVQXqTEoE6hqaqVJkRpqoY2RPDFBPghbe5czX6QxKw7lBdOaP6-IpzGXYMWl6Q; OGPC=4061029-5:; __utma=247248150.2068354019.1382532826.1382532826.1382532826.1; __utmb=247248150.10.10.1382532826; __utmc=247248150; __utmz=247248150.1382532826.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Connection: keep-alive
Cache-Control: max-age=0

In "Hex Dump" format it looks like this:

  47455420 2F702F6C 69626372 61667465  GET /p/libcrafte 00000000
  722F2048 5454502F 312E310D 0A486F73  r/ HTTP/1.1..Hos 00000010
  743A2063 6F64652E 676F6F67 6C652E63  t: code.google.c 00000020
  6F6D0D0A 55736572 2D416765 6E743A20  om..User-Agent:  00000030
  4D6F7A69 6C6C612F 352E3020 28583131  Mozilla/5.0 (X11 00000040
  3B205562 756E7475 3B204C69 6E757820  ; Ubuntu; Linux  00000050
  7838365F 36343B20 72763A32 342E3029  x86_64; rv:24.0) 00000060
  20476563 6B6F2F32 30313030 31303120   Gecko/20100101  00000070
  46697265 666F782F 32342E30 0D0A4163  Firefox/24.0..Ac 00000080
  63657074 3A207465 78742F68 746D6C2C  cept: text/html, 00000090
  6170706C 69636174 696F6E2F 7868746D  application/xhtm 000000A0
  6C2B786D 6C2C6170 706C6963 6174696F  l+xml,applicatio 000000B0
  6E2F786D 6C3B713D 302E392C 2A2F2A3B  n/xml;q=0.9,*/*; 000000C0
  713D302E 380D0A41 63636570 742D4C61  q=0.8..Accept-La 000000D0
  6E677561 67653A20 656E2C65 6E2D7573  nguage: en,en-us 000000E0
  3B713D30 2E350D0A 41636365 70742D45  ;q=0.5..Accept-E 000000F0
  6E636F64 696E673A 20677A69 702C2064  ncoding: gzip, d 00000100
  65666C61 74650D0A 444E543A 20310D0A  eflate..DNT: 1.. 00000110
  436F6F6B 69653A20 50524546 3D49443D  Cookie: PREF=ID= 00000120
  61643866 64336162 34623062 64336339  ad8fd3ab4b0bd3c9 00000130
  3A553D65 31626438 38353536 65656232  :U=e1bd88556eeb2 00000140
  6463653A 46463D30 3A544D3D 31333832  dce:FF=0:TM=1382 00000150
  35333133 35373A4C 4D3D3133 38323533  531357:LM=138253 00000160
  31383431 3A533D50 62682D4A 696F6B47  1841:S=Pbh-JiokG 00000170
  65566273 53682D3B 204E4944 3D36373D  eVbsSh-; NID=67= 00000180
  6F6C4B32 6B357355 5A39356D 52417056  olK2k5sUZ95mRApV 00000190
  37377337 43665873 6379744A 53666D56  77s7CfXscytJSfmV 000001A0
  75797562 6953434D 6F744F64 42427669  uyubiSCMotOdBBvi 000001B0
  6A717254 77797969 664C515A 625A415F  jqrTwyyifLQZbZA_ 000001C0
  53435456 51587154 456F4536 68716171  SCTVQXqTEoE6hqaq 000001D0
  564A6B52 70716F59 32525044 46425067  VJkRpqoY2RPDFBPg 000001E0
  68626535 637A5836 51784B77 376C4264  hbe5czX6QxKw7lBd 000001F0
  4F615036 2D49707A 4758594D 576C3651  OaP6-IpzGXYMWl6Q 00000200
  3B204F47 50433D34 30363130 32392D35  ; OGPC=4061029-5 00000210
  3A3B205F 5F75746D 613D3234 37323438  :; __utma=247248 00000220
  3135302E 32303638 33353430 31392E31  150.2068354019.1 00000230
  33383235 33323832 362E3133 38323533  382532826.138253 00000240
  32383236 2E313338 32353332 3832362E  2826.1382532826. 00000250
  313B205F 5F75746D 623D3234 37323438  1; __utmb=247248 00000260
  3135302E 31302E31 302E3133 38323533  150.10.10.138253 00000270
  32383236 3B205F5F 75746D63 3D323437  2826; __utmc=247 00000280
  32343831 35303B20 5F5F7574 6D7A3D32  248150; __utmz=2 00000290
  34373234 38313530 2E313338 32353332  47248150.1382532 000002A0
  3832362E 312E312E 75746D63 73723D28  826.1.1.utmcsr=( 000002B0
  64697265 6374297C 75746D63 636E3D28  direct)|utmccn=( 000002C0
  64697265 6374297C 75746D63 6D643D28  direct)|utmcmd=( 000002D0
  6E6F6E65 290D0A43 6F6E6E65 6374696F  none)..Connectio 000002E0
  6E3A206B 6565702D 616C6976 650D0A43  n: keep-alive..C 000002F0
  61636865 2D436F6E 74726F6C 3A206D61  ache-Control: ma 00000300
  782D6167 653D300D 0A0D0A             x-age=0....      00000310

And finally, as a "raw string" it looks like this:

\x47\x45\x54\x20\x2f\x70\x2f\x6c\x69\x62\x63\x72\x61\x66\x74\x65\x72\x2f\x20\x48
\x54\x54\x50\x2f\x31\x2e\x31\xd\xa\x48\x6f\x73\x74\x3a\x20\x63\x6f\x64\x65\x2e\x67
\x6f\x6f\x67\x6c\x65\x2e\x63\x6f\x6d\xd\xa\x55\x73\x65\x72\x2d\x41\x67\x65\x6e\x74
\x3a\x20\x4d\x6f\x7a\x69\x6c\x6c\x61\x2f\x35\x2e\x30\x20\x28\x58\x31\x31\x3b\x20\x55
\x62\x75\x6e\x74\x75\x3b\x20\x4c\x69\x6e\x75\x78\x20\x78\x38\x36\x5f\x36\x34\x3b\x20
\x72\x76\x3a\x32\x34\x2e\x30\x29\x20\x47\x65\x63\x6b\x6f\x2f\x32\x30\x31\x30\x30\x31
\x30\x31\x20\x46\x69\x72\x65\x66\x6f\x78\x2f\x32\x34\x2e\x30\xd\xa\x41\x63\x63\x65\x70
\x74\x3a\x20\x74\x65\x78\x74\x2f\x68\x74\x6d\x6c\x2c\x61\x70\x70\x6c\x69\x63\x61\x74
\x69\x6f\x6e\x2f\x78\x68\x74\x6d\x6c\x2b\x78\x6d\x6c\x2c\x61\x70\x70\x6c\x69\x63\x61
\x74\x69\x6f\x6e\x2f\x78\x6d\x6c\x3b\x71\x3d\x30\x2e\x39\x2c\x2a\x2f\x2a\x3b\x71\x3d
\x30\x2e\x38\xd\xa\x41\x63\x63\x65\x70\x74\x2d\x4c\x61\x6e\x67\x75\x61\x67\x65\x3a\x20
\x65\x6e\x2c\x65\x6e\x2d\x75\x73\x3b\x71\x3d\x30\x2e\x35\xd\xa\x41\x63\x63\x65\x70\x74
\x2d\x45\x6e\x63\x6f\x64\x69\x6e\x67\x3a\x20\x67\x7a\x69\x70\x2c\x20\x64\x65\x66\x6c\x61
\x74\x65\xd\xa\x44\x4e\x54\x3a\x20\x31\xd\xa\x43\x6f\x6f\x6b\x69\x65\x3a\x20\x50\x52
\x45\x46\x3d\x49\x44\x3d\x61\x64\x38\x66\x64\x33\x61\x62\x34\x62\x30\x62\x64\x33\x63
\x39\x3a\x55\x3d\x65\x31\x62\x64\x38\x38\x35\x35\x36\x65\x65\x62\x32\x64\x63\x65\x3a
\x46\x46\x3d\x30\x3a\x54\x4d\x3d\x31\x33\x38\x32\x35\x33\x31\x33\x35\x37\x3a\x4c\x4d
\x3d\x31\x33\x38\x32\x35\x33\x31\x38\x34\x31\x3a\x53\x3d\x50\x62\x68\x2d\x4a\x69\x6f
\x6b\x47\x65\x56\x62\x73\x53\x68\x2d\x3b\x20\x4e\x49\x44\x3d\x36\x37\x3d\x6f\x6c\x4b
\x32\x6b\x35\x73\x55\x5a\x39\x35\x6d\x52\x41\x70\x56\x37\x37\x73\x37\x43\x66\x58\x73
\x63\x79\x74\x4a\x53\x66\x6d\x56\x75\x79\x75\x62\x69\x53\x43\x4d\x6f\x74\x4f\x64\x42
\x42\x76\x69\x6a\x71\x72\x54\x77\x79\x79\x69\x66\x4c\x51\x5a\x62\x5a\x41\x5f\x53\x43
\x54\x56\x51\x58\x71\x54\x45\x6f\x45\x36\x68\x71\x61\x71\x56\x4a\x6b\x52\x70\x71\x6f
\x59\x32\x52\x50\x44\x46\x42\x50\x67\x68\x62\x65\x35\x63\x7a\x58\x36\x51\x78\x4b\x77
\x37\x6c\x42\x64\x4f\x61\x50\x36\x2d\x49\x70\x7a\x47\x58\x59\x4d\x57\x6c\x36\x51\x3b
\x20\x4f\x47\x50\x43\x3d\x34\x30\x36\x31\x30\x32\x39\x2d\x35\x3a\x3b\x20\x5f\x5f\x75
\x74\x6d\x61\x3d\x32\x34\x37\x32\x34\x38\x31\x35\x30\x2e\x32\x30\x36\x38\x33\x35\x34
\x30\x31\x39\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x33\x38\x32\x35\x33
\x32\x38\x32\x36\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x3b\x20\x5f\x5f
\x75\x74\x6d\x62\x3d\x32\x34\x37\x32\x34\x38\x31\x35\x30\x2e\x31\x30\x2e\x31\x30\x2e
\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x3b\x20\x5f\x5f\x75\x74\x6d\x63\x3d\x32\x34
\x37\x32\x34\x38\x31\x35\x30\x3b\x20\x5f\x5f\x75\x74\x6d\x7a\x3d\x32\x34\x37\x32\x34
\x38\x31\x35\x30\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x2e\x31\x2e\x75
\x74\x6d\x63\x73\x72\x3d\x28\x64\x69\x72\x65\x63\x74\x29\x7c\x75\x74\x6d\x63\x63\x6e
\x3d\x28\x64\x69\x72\x65\x63\x74\x29\x7c\x75\x74\x6d\x63\x6d\x64\x3d\x28\x6e\x6f\x6e
\x65\x29\xd\xa\x43\x6f\x6e\x6e\x65\x63\x74\x69\x6f\x6e\x3a\x20\x6b\x65\x65\x70\x2d\x61
\x6c\x69\x76\x65\xd\xa\x43\x61\x63\x68\x65\x2d\x43\x6f\x6e\x74\x72\x6f\x6c\x3a\x20\x6d
\x61\x78\x2d\x61\x67\x65\x3d\x30\xd\xa\xd\xa

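As you can see, when output in hex format the lines end with 0D and 0A, and when output in raw-string format they end with \xd and \xa. My question still stands, though: how do I find these line-ending characters when handling the data as a string (or can't I)?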

1 Answer

The output of the following program is 35:

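#include <iostream>
#include <string>
using namespace std;

int main()
{
    string hdr = "Date: Wed, 23 Oct 2013 02:20:30 GMT\r\nServer: Apache\r\n";
    // find() returns string::size_type; string::npos means "not found"
    string::size_type line_end_pos = hdr.find("\r\n");
    cout << line_end_pos;
}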

If we then modify the code so that it now reads:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    string hdr = "Date: Wed, 23 Oct 2013 02:20:30 GMT\r\nServer: Apache\r\n";

    string::size_type line_end_pos = hdr.find("\r\n");
    cout << line_end_pos;

    // text mode (no binary flag): line endings may be translated on output
    fstream output;
    output.open("test.txt", std::fstream::out);

    output << hdr;
    output.close();
}

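We get a file containing the contents of hdr. Viewing it in a hex editor shows that the output was translated on its way to disk. Between GMT and Server we would expect to see two bytes, 0x0D and 0x0A. Instead, test.txt actually contains three: 0x0D, 0x0D, 0x0A. That is because a text-mode stream on Windows writes every 0x0A as 0x0D, 0x0A. So while the input string is 53 bytes (characters) long, the file is 55 bytes long.

If we bitwise-OR the std::fstream::binary flag with std::fstream::out: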

output.open("test.txt", std::fstream::out | std::fstream::binary);

then the output is an identical copy of the string held in hdr: 53 bytes long, with a single 0x0D, 0x0A pair between lines.

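Edit: It's also worth pointing out that Unix and Windows-based systems have different line-ending conventions. I wrote this code under Windows.

As such, I suggest you save a copy of the header and examine it with a hex editor. Unless you do that or use a debugger, you won't be able to tell what the problem is. I generally find it safest to treat text input as binary input, since then no line-ending translation takes place.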

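Edit 2: Do you get a result of 27 when you run this? If so, I'm afraid I'm out of ideas for now. I'll give your problem more thought when I'm fresh in the morning.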

#include <iostream>
#include <string>

using namespace std;

int main()
{
    // first 48 bytes of the hex dump from the question
    char rawData[] =
    {
        0x47,0x45,0x54,0x20, 0x2F,0x70,0x2F,0x6C, 0x69,0x62,0x63,0x72, 0x61,0x66,0x74,0x65,
        0x72,0x2F,0x20,0x48, 0x54,0x54,0x50,0x2F, 0x31,0x2E,0x31,0x0D, 0x0A,0x48,0x6F,0x73,
        0x74,0x3A,0x20,0x63, 0x6F,0x64,0x65,0x2E, 0x67,0x6F,0x6F,0x67, 0x6C,0x65,0x2E,0x63
    };
    // rawData has no terminating '\0', so pass the length explicitly
    string hdr(rawData, sizeof rawData);
    string::size_type newLinePos = hdr.find("\r\n");
    cout << newLinePos;
}
answered 2013-10-23T06:31:09.567