c++ - 如何在C中按顺序对带有数字和字母的文件名进行排序？

Question

我使用以下代码按字母顺序对文件进行排序，它对文件进行排序，如图所示：

for(int i = 0;i < maxcnt;i++) 
{
    for(int j = i+1;j < maxcnt;j++)
    {           
        if(strcmp(Array[i],Array[j]) > 0)
        {            
            strcpy(temp,Array[i]);      
            strcpy(Array[i],Array[j]);      
            strcpy(Array[j],temp);    
        }    
    } 
}

在此处输入图像描述

但我需要按照 Windows 资源管理器中看到的顺序对其进行排序

在此处输入图像描述

这种方式如何排序？请帮忙

score 7 · Accepted Answer

对于C答案，以下是strcasecmp(). 此函数递归处理包含交替数字和非数字子字符串的字符串。您可以将其用于qsort()：

int strcasecmp_withNumbers(const void *void_a, const void *void_b) {
   const char *a = void_a;
   const char *b = void_b;

   if (!a || !b) { // if one doesn't exist, other wins by default
      return a ? 1 : b ? -1 : 0;
   }
   if (isdigit(*a) && isdigit(*b)) { // if both start with numbers
      char *remainderA;
      char *remainderB;
      long valA = strtol(a, &remainderA, 10);
      long valB = strtol(b, &remainderB, 10);
      if (valA != valB)
         return valA - valB;
      // if you wish 7 == 007, comment out the next two lines
      else if (remainderB - b != remainderA - a) // equal with diff lengths
         return (remainderB - b) - (remainderA - a); // set 007 before 7
      else // if numerical parts equal, recurse
         return strcasecmp_withNumbers(remainderA, remainderB);
   }
   if (isdigit(*a) || isdigit(*b)) { // if just one is a number
      return isdigit(*a) ? -1 : 1; // numbers always come first
   }
   while (*a && *b) { // non-numeric characters
      if (isdigit(*a) || isdigit(*b))
         return strcasecmp_withNumbers(a, b); // recurse
      if (tolower(*a) != tolower(*b))
         return tolower(*a) - tolower(*b);
      a++;
      b++;
   }
   return *a ? 1 : *b ? -1 : 0;
}

笔记：

Windows 需要stricmp()而不是 Unix 等价物strcasecmp()。
如果数字真的很大，上面的代码（显然）会给出不正确的结果。
此处忽略前导零。在我的领域，这是一个特性，而不是错误：我们通常希望 UAL0123 与 UAL123 匹配。但这可能是也可能不是您需要的。
另请参阅对可能包含数字的字符串进行排序以及如何在 C++ 中实现自然排序算法？，虽然那里的答案，或者在他们的链接中，与上面的代码相比，肯定是冗长而杂乱无章的，大约至少是四倍。

score 6 · Accepted Answer

自然排序是您必须采取的方式。我有一个适用于我的场景的工作代码。您可能可以通过根据需要更改它来使用它：

    #ifndef JSW_NATURAL_COMPARE
    #define JSW_NATURAL_COMPARE
    #include <string>
    int natural_compare(const char *a, const char *b);
    int natural_compare(const std::string& a, const std::string& b);
    #endif
    #include <cctype>
    namespace {
      // Note: This is a convenience for the natural_compare 
      // function, it is *not* designed for general use
      class int_span {
        int _ws;
        int _zeros;
        const char *_value;
        const char *_end;
      public:
        int_span(const char *src)
        {
          const char *start = src;
          // Save and skip leading whitespace
          while (std::isspace(*(unsigned char*)src)) ++src;
          _ws = src - start;
          // Save and skip leading zeros
          start = src;
          while (*src == '0') ++src;
          _zeros = src - start;
          // Save the edges of the value
          _value = src;
          while (std::isdigit(*(unsigned char*)src)) ++src;
          _end = src;
        }
        bool is_int() const { return _value != _end; }
        const char *value() const { return _value; }
        int whitespace() const { return _ws; }
        int zeros() const { return _zeros; }
        int digits() const { return _end - _value; }
        int non_value() const { return whitespace() + zeros(); }
      };
      inline int safe_compare(int a, int b)
      {
        return a < b ? -1 : a > b;
      }
    }
    int natural_compare(const char *a, const char *b)
    {
      int cmp = 0;
      while (cmp == 0 && *a != '\0' && *b != '\0') {
        int_span lhs(a), rhs(b);
        if (lhs.is_int() && rhs.is_int()) {
          if (lhs.digits() != rhs.digits()) {
            // For differing widths (excluding leading characters),
            // the value with fewer digits takes priority
            cmp = safe_compare(lhs.digits(), rhs.digits());
          }
          else {
            int digits = lhs.digits();
            a = lhs.value();
            b = rhs.value();
            // For matching widths (excluding leading characters),
            // search from MSD to LSD for the larger value
            while (--digits >= 0 && cmp == 0)
              cmp = safe_compare(*a++, *b++);
          }
          if (cmp == 0) {
            // If the values are equal, we need a tie   
            // breaker using leading whitespace and zeros
            if (lhs.non_value() != rhs.non_value()) {
              // For differing widths of combined whitespace and 
              // leading zeros, the smaller width takes priority
              cmp = safe_compare(lhs.non_value(), rhs.non_value());
            }
            else {
              // For matching widths of combined whitespace 
              // and leading zeros, more whitespace takes priority
              cmp = safe_compare(rhs.whitespace(), lhs.whitespace());
            }
          }
        }
        else {
          // No special logic unless both spans are integers
          cmp = safe_compare(*a++, *b++);
        }
      }
      // All else being equal so far, the shorter string takes priority
      return cmp == 0 ? safe_compare(*a, *b) : cmp;
    }
    #include <string>
    int natural_compare(const std::string& a, const std::string& b)
    {
      return natural_compare(a.c_str(), b.c_str());
    }

score 5 · Accepted Answer

您要做的是执行“自然排序”。这是一篇关于它的博客文章，解释了我相信在 python 中的实现。这是一个完成它的 perl 模块。How to implement a natural sort algorithm in c++?似乎也有类似的问题？

score 5 · Accepted Answer

考虑到这有一个c++标签，您可以详细说明@Joseph Quinsey 的答案并创建一个natural_less要传递给标准库的函数。

using namespace std;

bool natural_less(const string& lhs, const string& rhs)
{
    return strcasecmp_withNumbers(lhs.c_str(), rhs.c_str()) < 0;
}

void example(vector<string>& data)
{
    std::sort(data.begin(), data.end(), natural_less);
}

我花时间写了一些工作代码作为练习 https://github.com/kennethlaskoski/natural_less

score 3 · Accepted Answer

修改这个答案：

bool compareNat(const std::string& a, const std::string& b){
    if (a.empty())
        return true;
    if (b.empty())
        return false;
    if (std::isdigit(a[0]) && !std::isdigit(b[0]))
        return true;
    if (!std::isdigit(a[0]) && std::isdigit(b[0]))
        return false;
    if (!std::isdigit(a[0]) && !std::isdigit(b[0]))
    {
        if (a[0] == b[0])
            return compareNat(a.substr(1), b.substr(1));
        return (toUpper(a) < toUpper(b));
        //toUpper() is a function to convert a std::string to uppercase.
    }

    // Both strings begin with digit --> parse both numbers
    std::istringstream issa(a);
    std::istringstream issb(b);
    int ia, ib;
    issa >> ia;
    issb >> ib;
    if (ia != ib)
        return ia < ib;

    // Numbers are the same --> remove numbers and recurse
    std::string anew, bnew;
    std::getline(issa, anew);
    std::getline(issb, bnew);
    return (compareNat(anew, bnew));
}

toUpper()功能：

std::string toUpper(std::string s){
    for(int i=0;i<(int)s.length();i++){s[i]=toupper(s[i]);}
    return s;
    }

用法：

#include <iostream> // std::cout
#include <string>
#include <algorithm> // std::sort, std::copy
#include <iterator> // std::ostream_iterator
#include <sstream> // std::istringstream
#include <vector>
#include <cctype> // std::isdigit

int main()
{
    std::vector<std::string> str;
    str.push_back("20.txt");
    str.push_back("10.txt");
    str.push_back("1.txt");
    str.push_back("z2.txt");
    str.push_back("z10.txt");
    str.push_back("z100.txt");
    str.push_back("1_t.txt");
    str.push_back("abc.txt");
    str.push_back("Abc.txt");
    str.push_back("bcd.txt");

    std::sort(str.begin(), str.end(), compareNat);
    std::copy(str.begin(), str.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}

score 0 · Accepted Answer

Your problem is that you have an interpretation behind parts of the file name.

In lexicographical order, Slide1 is before Slide10 which is before Slide5.

You expect Slide5 before Slide10 as you have an interpretation of the substrings 5 and 10 (as integers).

You will run into more problems, if you had the name of the month in the filename, and would expect them to be ordered by date (i.e. January comes before August). You will need to adjust your sorting to this interpretation (and the "natural" order will depend on your interpretation, there is no generic solution).

Another approach is to format the filenames in a way that your sorting and the lexicographical order agree. In your case, you would use leading zeroes and a fixed length for the number. So Slide1 becomes Slide01, and then you will see that sorting them lexicographically will yield the result you would like to have.

However, often you cannot influence the output of an application, and thus cannot enforce your format directly.

What I do in those cases: write a little script/function that renames the file to a proper format, and then use standard sorting algorithms to sort them. The advantage of this is that you do not need to adapt your sorting, and can use existing software for the sorting. On the downside, there are situations where this is not feasible (as filenames need to be fixed).

c++ - 如何在C中按顺序对带有数字和字母的文件名进行排序？

6 回答 6

Related

Reference