3

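Goal

My goal is to quickly create a file from a large binary string (a string that contains only 1s and 0s).

Straight to the point

I need a function that can achieve my goal. If I am not being clear enough, please read on.

Example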

Test.exe is running...
.
Inputted binary string:
        1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
        Done!
File(Test.txt) In Byte(s):
        0xFF, 0xAA
.
Test.exe executed successfully!

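Explanation

  • First, Test.exe asks the user to input a binary string.
  • Then, it converts the inputted binary string to bytes (shown above in hexadecimal).
  • Finally, it writes the converted values to a file called Test.txt.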
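The three steps above could be sketched roughly as follows. This is only a minimal illustration, not the asker's code: the helper names pack_bits and write_bits are hypothetical, and the sketch assumes the input length is a multiple of 8 and rejects anything else.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// most significant bit first. Throws if the input is malformed.
std::vector<unsigned char> pack_bits(const std::string& bits)
{
    if (bits.size() % 8 != 0)
        throw std::invalid_argument("length must be a multiple of 8");

    std::vector<unsigned char> bytes(bits.size() / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] != '0' && bits[i] != '1')
            throw std::invalid_argument("only '0' and '1' are allowed");
        if (bits[i] == '1')  // set bit 7 for the first character of a group
            bytes[i / 8] |= static_cast<unsigned char>(0x80u >> (i % 8));
    }
    return bytes;
}

// Hypothetical helper: converts the string, then writes all bytes in one call.
bool write_bits(const char* path, const std::string& bits)
{
    const std::vector<unsigned char> bytes = pack_bits(bits);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return out.good();
}
```

With the string from the example, pack_bits("1111111110101010") yields the two bytes 0xFF and 0xAA.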

What I have tried

As a failed attempt at achieving my goal, I created this simple (and possibly horrible) function (hey, at least I tried):

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr )
{
    std::ofstream OutputFile( Destination, std::ofstream::binary );

    for( ::UINT Index1 = 0, Dec = 0;
         // 8-Bit binary.
         Index1 != BinaryStr.length( )/8;

         // Get the next set of binary value.
         // Write the decimal value as unsigned char to file.
         // Reset decimal value to 0.
         ++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
    {
        // Convert the 8-bit binary to hexadecimal using the
        // positional notation method - this is how its done:
        // http://www.wikihow.com/Convert-from-Binary-to-Decimal
        for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
            if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
    }
    OutputFile.close( );
};

Example of use

#include "Global.h"

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr );

int main( void )
{
    std::string Bin = "";

    // Create a binary string that is a size of 9.53674 mb
    // Note: The creation of this string will take awhile.
    // However, I only start to calculate the speed of writing
    // and converting after it is done generating the string.
    // This string is just created for an example.
    std::cout << "Generating...\n";
    while( Bin.length( ) != 80000000 )
        Bin += "10101010";

    std::cout << "Writing...\n";
    BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );

    std::cout << "Done!\n";
#ifdef IS_DEBUGGING
    std::cout << "Paused...\n";
    ::getchar( );
#endif

    return( 0 );
};

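Problem

Again, that was my failed attempt at achieving my goal. The problem is the speed. It is too slow: it took more than 7 minutes. Is there any way to quickly create a file from a large binary string?

Thanks in advance,

Learner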

4

7 Answers

4

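I would suggest removing the substr call in the inner loop. You are allocating a new string and then destroying it for every character you process. Replace this code: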

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
        Dec += Inc;

with something like:

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr[Index1 * 8 + Index2 ] == '1' )
        Dec += Inc;
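For reference, here is a sketch of how the whole function might look with substr removed. It is an illustration rather than a drop-in replacement: it swaps the Windows typedefs (::UINT, ::BYTE) for standard types, and ByteAt is a hypothetical helper name that is not in the question.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: converts the 8-character group starting at `offset`
// to its byte value by indexing the string directly, so no temporary
// substring is allocated.
unsigned char ByteAt(const std::string& BinaryStr, std::size_t offset)
{
    unsigned int Dec = 0;
    for (std::size_t bit = 0; bit < 8; ++bit)
        Dec = (Dec << 1) | (BinaryStr[offset + bit] == '1' ? 1u : 0u);
    return static_cast<unsigned char>(Dec);
}

void BinaryStrToFile(const char* Destination, const std::string& BinaryStr)
{
    std::ofstream OutputFile(Destination, std::ofstream::binary);
    // Trailing characters that do not fill a whole byte are ignored,
    // as in the original loop.
    for (std::size_t Index1 = 0; Index1 != BinaryStr.length() / 8; ++Index1)
        OutputFile.put(static_cast<char>(ByteAt(BinaryStr, Index1 * 8)));
}
```

The inner loop now does eight character comparisons per output byte and no heap allocation.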
answered 2013-02-16T22:58:45.277
3

Most of your time is spent here:

   for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
        if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;

When I comment those lines out, the file is written in a matter of seconds. I think you need to fine-tune your conversion.

answered 2013-02-16T22:52:10.430
2

I think I would consider something like this as a starting point:

#include <bitset>
#include <fstream>
#include <algorithm>
#include <iterator>   // std::istream_iterator, std::ostream_iterator

int main() { 
    std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
    std::ofstream out("junk.bin", std::ios::binary | std::ios::out);

    std::transform(std::istream_iterator<std::bitset<8> >(in),
                   std::istream_iterator<std::bitset<8> >(),
                   std::ostream_iterator<unsigned char>(out),
                   [](std::bitset<8> const &b) { return b.to_ulong();});
    return 0;
}

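Doing a quick test, this processes an input file of 80 million bytes in about 6 seconds on my machine. Unless your files are much larger than what you mentioned in the question, I would guess this is adequate speed, and the simplicity is hard to beat.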
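If the binary string is already in memory rather than in a file, the same std::bitset idea can be applied to it directly through the bitset constructor that takes a string, a position and a length. The following is a minimal sketch under that assumption; BitsetPack is a hypothetical name, and any trailing characters that do not fill a whole byte are silently skipped.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper: converts a '0'/'1' string that is already in memory
// to packed bytes, 8 characters per byte. std::bitset's string constructor
// throws std::invalid_argument on any other character.
std::string BitsetPack(const std::string& bits)
{
    std::string bytes;
    bytes.reserve(bits.size() / 8);
    for (std::size_t pos = 0; pos + 8 <= bits.size(); pos += 8) {
        std::bitset<8> b(bits, pos, 8);   // parses bits[pos, pos + 8)
        bytes.push_back(static_cast<char>(b.to_ulong()));
    }
    return bytes;
}
```

The returned string can then be written with a single ofstream::write call on a stream opened in binary mode.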

answered 2013-02-17T00:00:31.640
1

So instead of converting back and forth between std::strings, why not use a bunch of machine-word-sized integers for fast access?

const size_t bufsz = 1000000;

uint32_t *buf = new uint32_t[bufsz];
memset(buf, 0xFA, sizeof(*buf) * bufsz);
std::ofstream ofile("foo.bin", std::ofstream::binary);

for (size_t i = 0; i < bufsz; i++) {
    // formatted hex text (needs <iomanip>); use this or the raw write below, not both:
    // ofile << std::hex << std::setw(8) << std::setfill('0') << buf[i];
    // raw binary data instead of formatted hex:
    ofile.write(reinterpret_cast<char *>(&buf[i]), sizeof(buf[i]));
}

delete[] buf;

For me, this runs in a fraction of a second.

answered 2013-02-16T22:46:15.563
1

Something not entirely unlike this should be significantly faster:

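#include <climits>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <string>

void
text_to_binary_file(const std::string& text, const char *fname)
{
    unsigned char wbuf[4096];  // 4k is a good size of "chunk to write to file"
    unsigned int i = 0, j = 0;
    std::filebuf fp;           // dropping down to filebufs may well be faster
                               // for this problem
    if (!fp.open(fname, std::ios::out|std::ios::trunc|std::ios::binary))
        std::abort();
    std::memset(wbuf, 0, sizeof wbuf);

    for (std::string::const_iterator p = text.begin(); p != text.end(); p++) {
        if (*p == '1')         // only a '1' character sets the bit
            wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
        j++;
        if (j == CHAR_BIT) {
            j = 0;
            i++;
        }
        if (i == sizeof wbuf) {
            if (fp.sputn(reinterpret_cast<const char *>(wbuf), sizeof wbuf)
                    != std::streamsize(sizeof wbuf))
                std::abort();
            std::memset(wbuf, 0, sizeof wbuf);
            i = 0;
        }
    }
    // write what is left, counting a trailing partial byte if there is one
    std::streamsize rest = i + (j != 0 ? 1 : 0);
    if (fp.sputn(reinterpret_cast<const char *>(wbuf), rest) != rest)
        std::abort();
    fp.close();
}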

Proper error handling is left as an exercise.

answered 2013-02-16T22:55:16.153
1

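Even though I am late, I would like to offer an example of handling such strings. Architecture-specific optimizations may use unaligned loads of the characters into multiple registers to "squeeze" out the bits in parallel. This untested example code does not check the characters, and it avoids alignment and endianness requirements. It assumes that the characters of the binary string represent consecutive octets (bytes) with the most significant bit first, not words, double words and so on, whose specific representation in memory (and in that string) would require special treatment for portability.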

//THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.
//needs <cstdint>, <fstream>, <string>, <vector>; Bin is the string from the question

//set up an ofstream with a 64KiB buffer; the buffer is installed before the
//file is opened because some implementations ignore pubsetbuf afterwards
std::vector<char> buffer(65536);
std::ofstream ofs;
ofs.rdbuf()->pubsetbuf(&buffer[0],buffer.size());
ofs.open("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);

//You may treat cases, where (Bin.length() % 8 != 0) as error;
//here a trailing incomplete octet is dropped
std::string::size_type bits = Bin.length() & (~std::string::size_type(0x7));
std::string::const_iterator cIt = Bin.begin();

uint8_t byte = 0;
for(std::string::size_type i = 0;i < bits;++i,++cIt)
{
    //shift the next bit in, most significant bit first
    byte = uint8_t((byte << 1) | (uint8_t(*cIt) - uint8_t('0')));
    if((i & 0x7) == 0x7) //bit 0 of the octet: write it and start the next one
    {
        ofs.put(char(byte));
        byte = 0;
    }
}

ofs.flush();
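The idea of squeezing out several bits in parallel, mentioned at the top of this answer, can be illustrated without any architecture-specific instructions. The sketch below is one possible way to do it, not something taken from the code above: Pack8 is a hypothetical helper, it assumes an ASCII-compatible character set (so '0' and '1' differ only in their lowest bit), and like the loop above it does not validate its input.

```cpp
#include <cstdint>

// Hypothetical helper: packs 8 characters ('0' or '1', not validated) into
// one byte, with the first character as the most significant bit.
std::uint8_t Pack8(const char* p)
{
    // Gather the 8 characters into one 64-bit word, p[0] in the lowest byte.
    // Written with shifts so the result does not depend on endianness; on a
    // little-endian target a compiler may turn this into a single load.
    std::uint64_t x = 0;
    for (int k = 0; k < 8; ++k)
        x |= static_cast<std::uint64_t>(static_cast<unsigned char>(p[k])) << (8 * k);

    x &= 0x0101010101010101ULL;   // '0' is 0x30, '1' is 0x31: keep bit 0 only
    // The multiplication moves the kept bit of byte k to bit 63 - k. No two
    // partial products land on the same bit, so the top byte of the product
    // is the packed octet.
    return static_cast<std::uint8_t>((x * 0x8040201008040201ULL) >> 56);
}
```

Calling Pack8 once per group of 8 characters replaces the eight shift-and-or steps of the bit-by-bit loop with one mask and one multiplication.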
answered 2013-02-16T23:49:35.953
-1

This will produce a 76.2 MB (80,000,000-byte) file of 1010101010101...

#include <stdio.h>
#include <iostream>
#include <fstream>

using namespace std;

int main( void )
{
    char Bin=0;
    ofstream myfile;
    myfile.open (".\\example.bin", ios::out | ios::app | ios::binary);
    int c=0;
    Bin = 0xAA;
    while( c!= 80000000 ){
        myfile.write(&Bin,1);
        c++;
    }
    myfile.close();
    cout << "Done!\n";
    return( 0 );
};

Here are the first bytes of the file

answered 2013-02-16T23:02:13.083