c++ - How to read a specific string from a buffer

Question

I have a buffer

char buffer[size];

which I use to store the file contents of a stream (assume pStream here)

HRESULT hr = pStream->Read(buffer, size, &cbRead );

Now the buffer holds the entire content of the stream, and I know its size (assume size here). I also know that there are two strings

"<!doctortype html" and ".html>"

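They exist somewhere in the contents stored in this buffer (we don't know where), and I want to store only the part of the buffer from

"<!doctortype html" to the other string ".html>"

into another buffer, buffer2[SizeWeDontKnow].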

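How can I do that? (The content between these two positions is actually the content of an HTML file, and I want to store only the HTML content found in this buffer.) Any ideas on how to do this?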

score 1 · Accepted Answer

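You can use the strnstr function to find the right positions in your buffer (note that strnstr is a BSD extension; strstr is the standard equivalent for null-terminated strings). Once you have found the start and end tags, you can extract the text in between using strncpy, or use it in place if performance is an issue.
You can calculate the required size from the positions of the tags and the length of the first tag:
nLength = nPosEnd - nPosStart - nStartTagLength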
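
To make the length formula concrete, here is a minimal sketch of that approach. It swaps strnstr for std::search, which is standard C++ and does not need the buffer to be null-terminated. The helper name extract_between and its parameters are made up for illustration, so treat it as one possible way to do it rather than the definitive one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original answer): returns the bytes that
// lie between start_tag and end_tag inside buffer[0, size), or an empty
// string if either tag is missing.
std::string extract_between(const char* buffer, std::size_t size,
                            const char* start_tag, const char* end_tag)
{
    assert(buffer != nullptr && start_tag != nullptr && end_tag != nullptr);

    const char* buf_end = buffer + size;
    const std::size_t start_len = std::strlen(start_tag);
    const std::size_t end_len = std::strlen(end_tag);

    // Locate the start tag; std::search returns buf_end when nothing matches.
    const char* pos_start = std::search(buffer, buf_end,
                                        start_tag, start_tag + start_len);
    if (pos_start == buf_end) return std::string();

    // Look for the end tag only after the start tag.
    const char* text_begin = pos_start + start_len;
    const char* pos_end = std::search(text_begin, buf_end,
                                      end_tag, end_tag + end_len);
    if (pos_end == buf_end) return std::string();

    // nLength = nPosEnd - nPosStart - nStartTagLength
    const std::size_t length =
        static_cast<std::size_t>(pos_end - pos_start) - start_len;
    return std::string(text_begin, length);
}
```

With the variables from the question, a call could look like extract_between(buffer, cbRead, "<!doctortype html", ".html>"). Passing cbRead rather than size limits the search to the bytes that Read actually filled in.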

score 0 · Accepted Answer

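Look for an HTML parser for C/C++.

Another way is to take a char pointer to the start of the buffer and check each char in turn, to see whether it matches what you are looking for.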
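
A rough sketch of that manual scan might look like the following. The function name find_in_buffer is hypothetical, and it assumes the pattern is a null-terminated string while the buffer is described only by a pointer and a byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: walks buffer[0, size) one char at a time and returns a
// pointer to the first position where pattern matches, or nullptr if none.
const char* find_in_buffer(const char* buffer, std::size_t size,
                           const char* pattern)
{
    assert(buffer != nullptr && pattern != nullptr);

    const std::size_t pattern_len = std::strlen(pattern);
    if (pattern_len == 0 || pattern_len > size) return nullptr;

    // Stop early enough that a full pattern still fits in the buffer.
    const char* last = buffer + (size - pattern_len);
    for (const char* p = buffer; p <= last; ++p) {
        if (std::memcmp(p, pattern, pattern_len) == 0) return p;
    }
    return nullptr;
}
```

Calling it once for the start tag and once more, starting just past that match, for the end tag gives the two positions needed to copy out the text in between.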

score 0 · Accepted Answer

Are you limited to C, or can you use C++?

The C library reference has many useful functions for tokenizing strings and comparing matches (string.h):

http://www.cplusplus.com/reference/cstring/

With C++ I would do the following (using the buffer and size variables from your code):

    // copy char array to std::string
    std::string text(buffer, buffer + size);

    // define what we're looking for
    std::string begin_text("<!doctortype html");
    std::string end_text(".html>");

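    // find the start tag, then the end tag after it
    size_t begin_pos = text.find(begin_text);
    size_t end_pos = (begin_pos == std::string::npos)
        ? std::string::npos
        : text.find(end_text, begin_pos + begin_text.length());

    // create a substring from the positions (substr takes a start and a length)
    std::string extract;
    if (end_pos != std::string::npos) {
        begin_pos += begin_text.length();
        extract = text.substr(begin_pos, end_pos - begin_pos);
    }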

    // test that we got the extract
    std::cout << extract << std::endl;

If you need C-string compatibility, you can use:

const char* tmp = extract.c_str();

score 0 · Accepted Answer

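If this is the only operation your application performs on HTML code, you can use the solution I provide below (it can also be tested online). However, if you are going to do more complex parsing, I would suggest using an external library.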

#include <iostream>
#include <cstdio>
#include <cstring>

using namespace std;

int main()
{
    const char* beforePrefix = "asdfasdfasdfasdf";
    const char* prefix = "<!doctortype html";
    const char* suffix = ".html>";
    const char* postSuffix = "asdasdasd";

    const unsigned size = 1024;  // const, so the arrays below are not variable-length arrays
    char buf[size];
    snprintf(buf, sizeof(buf), "%s%sTHE STRING YOU WANT TO GET%s%s", beforePrefix, prefix, suffix, postSuffix);

    cout << "Before: " << buf << endl;

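    const char* firstOccurenceOfPrefixPtr = strstr(buf, prefix);
    // search for the suffix only after the prefix, so an earlier ".html>" is not matched
    const char* firstOccurenceOfSuffixPtr = firstOccurenceOfPrefixPtr
        ? strstr(firstOccurenceOfPrefixPtr + strlen(prefix), suffix)
        : nullptr;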

    if (firstOccurenceOfPrefixPtr && firstOccurenceOfSuffixPtr)
    {
        unsigned textLen = (unsigned)(firstOccurenceOfSuffixPtr - firstOccurenceOfPrefixPtr - strlen(prefix));
        char newBuf[size];
        strncpy(newBuf, firstOccurenceOfPrefixPtr + strlen(prefix), textLen);
        newBuf[textLen] = 0;

        cout << "After: " << newBuf << endl;
    }

    return 0;
}

Edit: I get it now :). You should then use strstr to find the first occurrence of the prefix. I edited the code above and updated the link.
