c++ - String search/indexing in a file using C++

Question

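I am using the following code to search a file and print the matching data together with its line number. But is this code fast enough for a file with several hundred thousand lines? My computer does freeze for a few seconds. I need to search for integer pairs and return the RHS value after the comma (some statistics), but with the code below I can only return the whole line.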

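Is it a good idea, speed-wise, to parse the returned line with a split function to get my RHS value,

or

to get the RHS value directly from the LHS argument? (Well, I haven't managed to do this.)

Can anyone help me implement either of these two approaches?

Here is my code: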

#include <string>
#include <iostream>
#include <fstream>

int main()
{
    std::ifstream file( "index_hyper.txt" ) ;
    std::string search_str = "401" ;
    std::string line ;
    int line_number = 0 ;
    while( std::getline( file, line ) )
    {
        ++line_number ;
        if( line.find(search_str) != std::string::npos )
            std::cout << "line " << line_number << ": " << line << '\n' ;
    }
}

Here is the content of my index_hyper.txt file:

score 1 · Accepted Answer

You can do the work of the code above with:

grep -n "^401," index_hyper.txt

If you want to output just the RHS, you can:

grep "^401," index_hyper.txt | sed "s/[^,]*,//"
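For comparison, the same prefix test can be done inside the question's C++ loop without splitting the line first. A minimal sketch, assuming the key itself contains no comma; rhs_if_prefix is a hypothetical name. Like sed "s/[^,]*,//", it keeps everything after the first comma:

```cpp
#include <optional>
#include <string>

// Hypothetical helper mirroring: grep "^KEY," | sed "s/[^,]*,//"
// If `line` begins with key followed by ',', return the text after that comma;
// otherwise return std::nullopt. Assumes `key` contains no comma.
std::optional<std::string> rhs_if_prefix(const std::string& line, const std::string& key)
{
    const std::string prefix = key + ',';
    // compare(0, n, prefix) == 0 means the first n characters equal prefix
    if (line.size() >= prefix.size() && line.compare(0, prefix.size(), prefix) == 0) {
        return line.substr(prefix.size());
    }
    return std::nullopt;
}
```

In the question's loop this would be used as: if (auto rhs = rhs_if_prefix(line, search_str)) std::cout << "line " << line_number << ": " << *rhs << '\n';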

If you are on a Windows platform without sed, grep, bash, etc., you can easily get these Unix tools by installing Cygwin.

score 0 · Accepted Answer

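As a general rule, don't start breaking the string into smaller pieces (substrings) until you need to. And start by specifying exactly what you want: you talk about RHS and LHS, and about "getting the RHS value based on the LHS argument". So: do you want an exact match on the first field, a substring match on the first field, or a substring match anywhere in the line?

In any case, once you have the line in the variable line, you can easily split it into two fields: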

std::string::const_iterator pivot = std::find( line.cbegin(), line.cend(), ',' );
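The same split can be packaged so that the no-comma case is handled explicitly. A minimal C++17 sketch; split_at_comma is a hypothetical name, and because it returns std::string_view objects pointing into the original line, that line must outlive the result:

```cpp
#include <algorithm>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical helper: splits "LHS,RHS" at the first comma.
// Returns std::nullopt if the line contains no comma.
// The returned views point into `line`; they do not own any characters.
std::optional<std::pair<std::string_view, std::string_view>>
split_at_comma(std::string_view line)
{
    const auto pivot = std::find(line.begin(), line.end(), ',');
    if (pivot == line.end()) {
        return std::nullopt;
    }
    const auto lhs_len = static_cast<std::string_view::size_type>(pivot - line.begin());
    // lhs is [begin, pivot), rhs is everything after the comma (may be empty)
    return std::make_pair(line.substr(0, lhs_len), line.substr(lhs_len + 1));
}
```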

What you do next depends on your criterion:

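// Needs <algorithm> (std::equal, std::search), <iterator> (std::next) and <iostream>.
// All three branches assume the line contains a comma: if pivot == line.cend(),
// std::next( pivot ) would step past the end (see the error-checking note below).
if ( static_cast<std::string::size_type>( pivot - line.cbegin() ) == search_str.size() &&
        std::equal( line.cbegin(), pivot, search_str.begin() ) ) {
    //  Exact match on first field...
    std::cout << std::string( std::next( pivot ), line.cend() ) << '\n';
}

if ( std::search( line.cbegin(), pivot, search_str.begin(), search_str.end() ) != pivot ) {
    //  Matches substring in first field...
    std::cout << std::string( std::next( pivot ), line.cend() ) << '\n';
}

if ( std::search( line.cbegin(), line.cend(), search_str.begin(), search_str.end() ) != line.cend() ) {
    //  Matches substring in complete line...
    //  (line.cend(), not line.end(): both constructor arguments must have the same iterator type)
    std::cout << std::string( std::next( pivot ), line.cend() ) << '\n';
}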
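To see how far apart these three criteria are, they can be wrapped in one function and tried on a few sample lines. A sketch only; the Criterion enum and the matches function are hypothetical names, and the sample lines below are made-up inputs, not the asker's data:

```cpp
#include <algorithm>
#include <string>

enum class Criterion { ExactFirstField, SubstringOfFirstField, SubstringOfLine };

// Hypothetical helper: applies one of the three criteria from the answer above.
bool matches(const std::string& line, const std::string& key, Criterion c)
{
    // pivot is the first comma, or line.cend() if there is none
    const auto pivot = std::find(line.cbegin(), line.cend(), ',');
    switch (c) {
    case Criterion::ExactFirstField:
        return static_cast<std::string::size_type>(pivot - line.cbegin()) == key.size()
            && std::equal(line.cbegin(), pivot, key.cbegin());
    case Criterion::SubstringOfFirstField:
        return std::search(line.cbegin(), pivot, key.cbegin(), key.cend()) != pivot;
    case Criterion::SubstringOfLine:
        return std::search(line.cbegin(), line.cend(), key.cbegin(), key.cend()) != line.cend();
    }
    return false;  // not reached for valid enum values
}
```

With the key "401": "401,12" satisfies all three criteria, "1401,5" fails the exact test but passes both substring tests, and "7,401" passes only the whole-line test.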

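Of course, you'll need some extra error checking. For example, what should you do if there is no comma in the line (i.e. pivot == line.end())? And what about extra whitespace in the line? (Your examples look like numbers. Should "401" match only "401", or also "+401"?)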
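One way to settle the whitespace and "+401" questions is to compare the first field as a number rather than as text. A minimal sketch, assuming the LHS is meant to be a base-10 integer; lhs_is is a hypothetical name. std::from_chars (C++17, header <charconv>) skips no whitespace and rejects a leading '+', so this version accepts " 401 " but not "+401"; strip a '+' explicitly if it should match:

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical helper: true if `field`, after trimming spaces and tabs,
// parses as the decimal integer `key` (so "401", " 401" and "401 " match,
// while "4010", "401x" and "+401" do not).
bool lhs_is(std::string_view field, long key)
{
    const auto first = field.find_first_not_of(" \t");
    if (first == std::string_view::npos) {
        return false;  // empty or all whitespace
    }
    const auto last = field.find_last_not_of(" \t");
    field = field.substr(first, last - first + 1);

    long value = 0;
    const char* begin = field.data();
    const char* end = field.data() + field.size();
    const auto [ptr, ec] = std::from_chars(begin, end, value);
    // require a successful parse that consumed the whole trimmed field
    return ec == std::errc() && ptr == end && value == key;
}
```

Note that a numeric comparison also treats "0401" as 401; whether that is desired is exactly the kind of decision the answer says to make up front.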

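Before going any further, you should specify very carefully what the code should do for every possible input. (Of course, for most possible inputs the answer will probably be: write an error message with the line number to std::cerr and continue. In that case, be sure to return EXIT_FAILURE.)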
