我想按字典顺序对大量整数(比如 1 百万元素)进行排序。
例子:
input [] = { 100, 21 , 22 , 99 , 1 , 927 }
sorted[] = { 1 , 100, 21 , 22 , 927, 99 }
我已经使用最简单的方法完成了它:
- 将所有数字转换为字符串(非常昂贵,因为它会占用大量内存)
- 使用
std:sort
withstrcmp
作为比较函数 - 将字符串转换回整数
还有比这更好的方法吗?
我想按字典顺序对大量整数(比如 1 百万元素)进行排序。
例子:
input [] = { 100, 21 , 22 , 99 , 1 , 927 }
sorted[] = { 1 , 100, 21 , 22 , 927, 99 }
我已经使用最简单的方法完成了它:
std:sort
withstrcmp
作为比较函数还有比这更好的方法吗?
std::sort()
与合适的比较功能一起使用。这减少了内存需求。
比较函数可以使用n % 10
,n / 10 % 10
等n / 100 % 10
来访问单个数字(对于正整数;负整数的工作方式略有不同)。
要提供任何自定义排序顺序,您可以为std::sort
. 在这种情况下,它会有点复杂,使用对数来检查以 10 为底的数字的各个数字。
这是一个例子——内联评论描述了正在发生的事情。
#include <iostream>
#include <algorithm>
#include <cmath>
#include <cassert>
int main() {
int input[] { 100, 21, 22, 99, 1, 927, -50, -24, -160 };
/**
* Sorts the array lexicographically.
*
* The trick is that we have to compare digits left-to-right
* (considering typical Latin decimal notation) and that each of
* two numbers to compare may have a different number of digits.
*
* This is very efficient in storage space, but inefficient in
* execution time; an approach that pre-visits each element and
* stores a translated representation will at least double your
* storage requirements (possibly a problem with large inputs)
* but require only a single translation of each element.
*/
std::sort(
std::begin(input),
std::end(input),
[](int lhs, int rhs) -> bool {
// Returns true if lhs < rhs
// Returns false otherwise
const auto BASE = 10;
const bool LHS_FIRST = true;
const bool RHS_FIRST = false;
const bool EQUAL = false;
// There's no point in doing anything at all
// if both inputs are the same; strict-weak
// ordering requires that we return `false`
// in this case.
if (lhs == rhs) {
return EQUAL;
}
// Compensate for sign
if (lhs < 0 && rhs < 0) {
// When both are negative, sign on its own yields
// no clear ordering between the two arguments.
//
// Remove the sign and continue as for positive
// numbers.
lhs *= -1;
rhs *= -1;
}
else if (lhs < 0) {
// When the LHS is negative but the RHS is not,
// consider the LHS "first" always as we wish to
// prioritise the leading '-'.
return LHS_FIRST;
}
else if (rhs < 0) {
// When the RHS is negative but the LHS is not,
// consider the RHS "first" always as we wish to
// prioritise the leading '-'.
return RHS_FIRST;
}
// Counting the number of digits in both the LHS and RHS
// arguments is *almost* trivial.
const auto lhs_digits = (
lhs == 0
? 1
: std::ceil(std::log(lhs+1)/std::log(BASE))
);
const auto rhs_digits = (
rhs == 0
? 1
: std::ceil(std::log(rhs+1)/std::log(BASE))
);
// Now we loop through the positions, left-to-right,
// calculating the digit at these positions for each
// input, and comparing them numerically. The
// lexicographic nature of the sorting comes from the
// fact that we are doing this per-digit comparison
// rather than considering the input value as a whole.
const auto max_pos = std::max(lhs_digits, rhs_digits);
for (auto pos = 0; pos < max_pos; pos++) {
if (lhs_digits - pos == 0) {
// Ran out of digits on the LHS;
// prioritise the shorter input
return LHS_FIRST;
}
else if (rhs_digits - pos == 0) {
// Ran out of digits on the RHS;
// prioritise the shorter input
return RHS_FIRST;
}
else {
const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;
if (lhs_x < rhs_x)
return LHS_FIRST;
else if (rhs_x < lhs_x)
return RHS_FIRST;
}
}
// If we reached the end and everything still
// matches up, then something probably went wrong
// as I'd have expected to catch this in the tests
// for equality.
assert("Unknown case encountered");
}
);
std::cout << '{';
for (auto x : input)
std::cout << x << ", ";
std::cout << '}';
// Output: {-160, -24, -50, 1, 100, 21, 22, 927, 99, }
}
有更快的方法来计算数字中的位数,但以上内容将帮助您入门。
这是一个比较解决方案的社区 wiki。我采用了 nim 的代码并使其易于扩展。随意添加您的解决方案和输出。
示例使用 g++4.9 @ 运行一台旧的慢速计算机(3 GB RAM,Core2Duo U9400)-O3 -march=native
:
元素数量:1e+03 整数类型的大小:4 参考解决方案:轨道中的轻盈竞赛 解决方案“dyp”: 持续时间:0 毫秒和 301 微秒 与参考解决方案的比较:完全匹配 解决方案“尼姆”: 持续时间:2 毫秒和 160 微秒 与参考解决方案的比较:完全匹配 解决方案“nyarlathotep”: 持续时间:8 毫秒和 126 微秒 与参考解决方案的比较:完全匹配 解决方案“不错”: 持续时间:1 毫秒和 102 微秒 与参考解决方案的比较:完全匹配 解决方案“Eric Postpischil”: 持续时间:2 毫秒和 550 微秒 与参考解决方案的比较:完全匹配 解决方案“轨道中的亮度竞赛”: 持续时间:17 毫秒和 469 微秒 与参考解决方案的比较:完全匹配 解决方案“pts”: 持续时间:1 毫秒和 92 微秒 与参考解决方案的比较:完全匹配 ==================================================== ======== 元素数量:1e+04 整数类型的大小:4 参考解决方案:轨道中的轻盈竞赛 解决方案“nyarlathotep”: 持续时间:109 毫秒和 712 微秒 与参考解决方案的比较:完全匹配 解决方案“轨道中的亮度竞赛”: 持续时间:272 毫秒和 819 微秒 与参考解决方案的比较:完全匹配 解决方案“dyp”: 持续时间:1 毫秒和 748 微秒 与参考解决方案的比较:完全匹配 解决方案“不错”: 持续时间:16 毫秒和 115 微秒 与参考解决方案的比较:完全匹配 解决方案“pts”: 持续时间:15 毫秒和 10 微秒 与参考解决方案的比较:完全匹配 解决方案“Eric Postpischil”: 持续时间:33 毫秒和 301 微秒 与参考解决方案的比较:完全匹配 解决方案“尼姆”: 持续时间:17 毫秒和 83 微秒 与参考解决方案的比较:完全匹配 ==================================================== ======== 元素数量:1e+05 整数类型的大小:4 参考解决方案:轨道中的轻盈竞赛 解决方案“尼姆”: 持续时间:217 毫秒和 4 微秒 与参考解决方案的比较:完全匹配 解决方案“pts”: 持续时间:199 毫秒和 505 微秒 与参考解决方案的比较:完全匹配 解决方案“dyp”: 持续时间:20 毫秒和 330 微秒 与参考解决方案的比较:完全匹配 解决方案“Eric Postpischil”: 持续时间:415 毫秒和 477 微秒 与参考解决方案的比较:完全匹配 解决方案“轨道中的亮度竞赛”: 持续时间:3955 毫秒和 58 微秒 与参考解决方案的比较:完全匹配 解决方案“不错”: 持续时间:215 毫秒和 259 微秒 与参考解决方案的比较:完全匹配 解决方案“nyarlathotep”: 持续时间:1341 毫秒和 46 微秒 与参考解决方案比较:发现不匹配 ==================================================== ======== 元素数量:1e+06 整数类型的大小:4 参考解决方案:轨道中的轻盈竞赛 解决方案“轨道中的亮度竞赛”: 持续时间:52861 毫秒和 314 微秒 与参考解决方案的比较:完全匹配 解决方案“Eric Postpischil”: 持续时间:4757 毫秒和 608 微秒 与参考解决方案的比较:完全匹配 解决方案“nyarlathotep”: 持续时间:15654 毫秒和 195 微秒 与参考解决方案比较:发现不匹配 解决方案“dyp”: 持续时间:233 毫秒和 779 微秒 与参考解决方案的比较:完全匹配 解决方案“pts”: 持续时间:2181 毫秒和 634 微秒 与参考解决方案的比较:完全匹配 解决方案“尼姆”: 持续时间:2539 毫秒和 9 微秒 与参考解决方案的比较:完全匹配 解决方案“不错”: 持续时间:2675 毫秒和 362 微秒 与参考解决方案的比较:完全匹配 ==================================================== ======== 元素数量:1e+07 整数类型的大小:4 参考解决方案:轨道中的轻盈竞赛 解决方案“不错”: 持续时间:33425 毫秒和 423 微秒 与参考解决方案的比较:完全匹配 解决方案“pts”: 持续时间:26000 毫秒和 398 微秒 与参考解决方案的比较:完全匹配 解决方案“Eric Postpischil”: 持续时间:56206 毫秒和 359 微秒 与参考解决方案的比较:完全匹配 解决方案“轨道中的亮度竞赛”: 持续时间:658540 毫秒和 342 微秒 与参考解决方案的比较:完全匹配 解决方案“nyarlathotep”: 持续时间:187064 毫秒和 518 微秒 与参考解决方案比较:发现不匹配 解决方案“尼姆”: 持续时间:30519 毫秒和 227 微秒 与参考解决方案的比较:完全匹配 解决方案“dyp”: 持续时间:2624 毫秒和 644 微秒 与参考解决方案的比较:完全匹配
算法必须是具有支持接口的函数调用运算符模板的结构:
template<class RaIt> operator()(RaIt begin, RaIt end);
输入数据的副本作为参数提供,算法预期提供相同范围内的结果(例如就地排序)。
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>
#include <cstring>
#include <climits>
#include <functional>
#include <cstdlib>
#include <iomanip>
using duration_t = decltype( std::chrono::high_resolution_clock::now()
- std::chrono::high_resolution_clock::now());
template<class T>
struct result_t
{
std::vector<T> numbers;
duration_t duration;
char const* name;
};
template<class RaIt, class F>
result_t<typename std::iterator_traits<RaIt>::value_type>
apply_algorithm(RaIt p_beg, RaIt p_end, F f, char const* name)
{
using value_type = typename std::iterator_traits<RaIt>::value_type;
std::vector<value_type> inplace(p_beg, p_end);
auto start = std::chrono::high_resolution_clock::now();
f(begin(inplace), end(inplace));
auto end = std::chrono::high_resolution_clock::now();
auto duration = end - start;
return {std::move(inplace), duration, name};
}
// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
int res = 0;
for(; p != 0; ++res)
{
p /= 10;
}
return res;
}
// non-optimized version
int my_pow10(unsigned exp)
{
int res = 1;
for(; exp != 0; --exp)
{
res *= 10;
}
return res;
}
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// paste algorithms here
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
int main(int argc, char** argv)
{
using integer_t = int;
constexpr integer_t dist_min = 0;
constexpr integer_t dist_max = std::numeric_limits<integer_t>::max()/10;
constexpr std::size_t default_number_of_elements = 1E6;
const std::size_t number_of_elements = argc>1 ? std::atoll(argv[1]) :
default_number_of_elements;
std::cout << "number of elements: ";
std::cout << std::scientific << std::setprecision(0);
std::cout << (double)number_of_elements << "\n";
std::cout << /*std::defaultfloat <<*/ std::setprecision(6);
std::cout.unsetf(std::ios_base::floatfield);
std::cout << "size of integer type: " << sizeof(integer_t) << "\n\n";
std::vector<integer_t> input;
{
input.reserve(number_of_elements);
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<> dist(dist_min, dist_max);
for(std::size_t i = 0; i < number_of_elements; ++i)
input.push_back( dist(gen) );
}
auto b = begin(input);
auto e = end(input);
using res_t = result_t<integer_t>;
std::vector< std::function<res_t()> > algorithms;
#define MAKE_BINDER(B, E, ALGO, NAME) \
std::bind( &apply_algorithm<decltype(B),decltype(ALGO)>, \
B,E,ALGO,NAME )
constexpr auto lightness_name = "Lightness Races in Orbit";
algorithms.push_back( MAKE_BINDER(b, e, lightness(), lightness_name) );
algorithms.push_back( MAKE_BINDER(b, e, dyp(), "dyp") );
algorithms.push_back( MAKE_BINDER(b, e, nim(), "Nim") );
algorithms.push_back( MAKE_BINDER(b, e, pts(), "pts") );
algorithms.push_back( MAKE_BINDER(b, e, epost(), "Eric Postpischil") );
algorithms.push_back( MAKE_BINDER(b, e, nyar(), "nyarlathotep") );
algorithms.push_back( MAKE_BINDER(b, e, notbad(), "notbad") );
{
std::srand( std::random_device()() );
std::random_shuffle(begin(algorithms), end(algorithms));
}
std::vector< result_t<integer_t> > res;
for(auto& algo : algorithms)
res.push_back( algo() );
auto reference_solution
= *std::find_if(begin(res), end(res),
[](result_t<integer_t> const& p)
{ return 0 == std::strcmp(lightness_name, p.name); });
std::cout << "reference solution: "<<reference_solution.name<<"\n\n";
for(auto const& e : res)
{
std::cout << "solution \""<<e.name<<"\":\n";
auto ms =
std::chrono::duration_cast<std::chrono::microseconds>(e.duration);
std::cout << "\tduration: "<<ms.count()/1000<<" ms and "
<<ms.count()%1000<<" microseconds\n";
std::cout << "\tcomparison to reference solution: ";
if(e.numbers.size() != reference_solution.numbers.size())
{
std::cout << "ouput count mismatch\n";
break;
}
auto mismatch = std::mismatch(begin(e.numbers), end(e.numbers),
begin(reference_solution.numbers)).first;
if(end(e.numbers) == mismatch)
{
std::cout << "exact match\n";
}else
{
std::cout << "mismatch found\n";
}
}
}
当前算法;注意我用全局函数替换了数字计数器和 pow-of-10,所以如果有人优化,我们都会受益。
struct lightness
{
template<class RaIt> void operator()(RaIt b, RaIt e)
{
using T = typename std::iterator_traits<RaIt>::value_type;
/**
* Sorts the array lexicographically.
*
* The trick is that we have to compare digits left-to-right
* (considering typical Latin decimal notation) and that each of
* two numbers to compare may have a different number of digits.
*
* This is very efficient in storage space, but inefficient in
* execution time; an approach that pre-visits each element and
* stores a translated representation will at least double your
* storage requirements (possibly a problem with large inputs)
* but require only a single translation of each element.
*/
std::sort(
b,
e,
[](T lhs, T rhs) -> bool {
// Returns true if lhs < rhs
// Returns false otherwise
const auto BASE = 10;
const bool LHS_FIRST = true;
const bool RHS_FIRST = false;
const bool EQUAL = false;
// There's no point in doing anything at all
// if both inputs are the same; strict-weak
// ordering requires that we return `false`
// in this case.
if (lhs == rhs) {
return EQUAL;
}
// Compensate for sign
if (lhs < 0 && rhs < 0) {
// When both are negative, sign on its own yields
// no clear ordering between the two arguments.
//
// Remove the sign and continue as for positive
// numbers.
lhs *= -1;
rhs *= -1;
}
else if (lhs < 0) {
// When the LHS is negative but the RHS is not,
// consider the LHS "first" always as we wish to
// prioritise the leading '-'.
return LHS_FIRST;
}
else if (rhs < 0) {
// When the RHS is negative but the LHS is not,
// consider the RHS "first" always as we wish to
// prioritise the leading '-'.
return RHS_FIRST;
}
// Counting the number of digits in both the LHS and RHS
// arguments is *almost* trivial.
const auto lhs_digits = (
lhs == 0
? 1
: std::ceil(std::log(lhs+1)/std::log(BASE))
);
const auto rhs_digits = (
rhs == 0
? 1
: std::ceil(std::log(rhs+1)/std::log(BASE))
);
// Now we loop through the positions, left-to-right,
// calculating the digit at these positions for each
// input, and comparing them numerically. The
// lexicographic nature of the sorting comes from the
// fact that we are doing this per-digit comparison
// rather than considering the input value as a whole.
const auto max_pos = std::max(lhs_digits, rhs_digits);
for (auto pos = 0; pos < max_pos; pos++) {
if (lhs_digits - pos == 0) {
// Ran out of digits on the LHS;
// prioritise the shorter input
return LHS_FIRST;
}
else if (rhs_digits - pos == 0) {
// Ran out of digits on the RHS;
// prioritise the shorter input
return RHS_FIRST;
}
else {
const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;
if (lhs_x < rhs_x)
return LHS_FIRST;
else if (rhs_x < lhs_x)
return RHS_FIRST;
}
}
// If we reached the end and everything still
// matches up, then something probably went wrong
// as I'd have expected to catch this in the tests
// for equality.
assert("Unknown case encountered");
// dyp: suppress warning and throw
throw "up";
}
);
}
};
namespace ndyp
{
// helper to provide integers with the same number of digits
template<class T, class U>
std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
{
auto const digits = count_digits(p);
// append zeros so that `l` has `maxDigits` digits
auto const l = static_cast<T>( p * my_pow10(maxDigits-digits) );
return {l, p};
}
template<class RaIt>
using pair_vec
= std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
typename std::iterator_traits<RaIt>::value_type>>;
template<class RaIt>
pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
{
if(p_beg == p_end) return pair_vec<RaIt>{};
auto max = *std::max_element(p_beg, p_end);
auto maxDigits = count_digits(max);
pair_vec<RaIt> result;
result.reserve( std::distance(p_beg, p_end) );
for(auto i = p_beg; i != p_end; ++i)
result.push_back( lexicographic_pair_helper(*i, maxDigits) );
using value_type = typename pair_vec<RaIt>::value_type;
std::sort(begin(result), end(result),
[](value_type const& l, value_type const& r)
{
if(l.first < r.first) return true;
if(l.first > r.first) return false;
return l.second < r.second; }
);
return result;
}
}
struct dyp
{
template<class RaIt> void operator()(RaIt b, RaIt e)
{
auto pairvec = ndyp::lexicographic_sort(b, e);
std::transform(begin(pairvec), end(pairvec), b,
[](typename decltype(pairvec)::value_type const& e) { return e.second; });
}
};
namespace nnim
{
bool comp(int l, int r)
{
int lv[10] = {}; // probably possible to get this from numeric_limits
int rv[10] = {};
int lc = 10; // ditto
int rc = 10;
while (l || r)
{
if (l)
{
auto t = l / 10;
lv[--lc] = l - (t * 10);
l = t;
}
if (r)
{
auto t = r / 10;
rv[--rc] = r - (t * 10);
r = t;
}
}
while (lc < 10 && rc < 10)
{
if (lv[lc] == rv[rc])
{
lc++;
rc++;
}
else
return lv[lc] < rv[rc];
}
return lc > rc;
}
}
struct nim
{
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, nnim::comp);
}
};
struct pts
{
template<class T> static bool lex_less(T a, T b) {
unsigned la = 1, lb = 1;
for (T t = a; t > 9; t /= 10) ++la;
for (T t = b; t > 9; t /= 10) ++lb;
const bool ll = la < lb;
while (la > lb) { b *= 10; ++lb; }
while (lb > la) { a *= 10; ++la; }
return a == b ? ll : a < b;
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, lex_less<typename std::iterator_traits<RaIt>::value_type>);
}
};
struct epost
{
static bool compare(int x, int y)
{
static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));
double lx = log10(x);
double ly = log10(y);
double fx = lx - floor(lx); // Get the mantissa of lx.
double fy = ly - floor(ly); // Get the mantissa of ly.
return fabs(fx - fy) < limit ? lx < ly : fx < fy;
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, compare);
}
};
struct nyar
{
static bool lexiSmaller(int i1, int i2)
{
int digits1 = count_digits(i1);
int digits2 = count_digits(i2);
double val1 = i1/pow(10.0, digits1-1);
double val2 = i2/pow(10.0, digits2-1);
while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
{
digits1--;
digits2--;
val1 = (val1 - (int)val1)*10;
val2 = (val2 - (int)val2)*10;
}
if (digits1 > 0 && digits2 > 0)
{
return (int)val1 < (int)val2;
}
return (digits2 > 0);
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, lexiSmaller);
}
};
struct notbad
{
static int up_10pow(int n) {
int ans = 1;
while (ans < n) ans *= 10;
return ans;
}
static bool compare(int v1, int v2) {
int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
while ( ceil1 != 0 && ceil2 != 0) {
if (v1 / ceil1 < v2 / ceil2) return true;
else if (v1 / ceil1 > v2 / ceil2) return false;
ceil1 /= 10;
ceil2 /= 10;
}
if (v1 < v2) return true;
return false;
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, compare);
}
};
这是另一种算法,它在排序之前进行一些计算。尽管有额外的复制,但它似乎相当快(参见比较)。
笔记:
std::numeric_limits<int>::max()/10
注意你可以优化count_digits
和my_pow10
;例如,请参阅Andrei Alexandrescu 的 C++ 的三个优化技巧和任何比 pow() 更快地在 C++ 中计算 10 的整数幂的方法?
帮手:
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <iostream>
#include <algorithm>
#include <limits>
#include <iterator>
// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
int res = 0;
for(; p != 0; ++res)
{
p /= 10;
}
return res;
}
// non-optimized version
int my_pow10(unsigned exp)
{
int res = 1;
for(; exp != 0; --exp)
{
res *= 10;
}
return res;
}
算法(注意 -不是就地):
// helper to provide integers with the same number of digits
template<class T, class U>
std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
{
auto const digits = count_digits(p);
// append zeros so that `l` has `maxDigits` digits
auto const l = static_cast<T>( p * my_pow10(maxDigits-digits) );
return {l, p};
}
template<class RaIt>
using pair_vec
= std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
typename std::iterator_traits<RaIt>::value_type>>;
template<class RaIt>
pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
{
if(p_beg == p_end) return {};
auto max = *std::max_element(p_beg, p_end);
auto maxDigits = count_digits(max);
pair_vec<RaIt> result;
result.reserve( std::distance(p_beg, p_end) );
for(auto i = p_beg; i != p_end; ++i)
result.push_back( lexicographic_pair_helper(*i, maxDigits) );
using value_type = typename pair_vec<RaIt>::value_type;
std::sort(begin(result), end(result),
[](value_type const& l, value_type const& r)
{
if(l.first < r.first) return true;
if(l.first > r.first) return false;
return l.second < r.second; }
);
return result;
}
使用示例:
int main()
{
std::vector<int> input = { 100, 21 , 22 , 99 , 1 , 927 };
// generate some numbers
/*{
constexpr int number_of_elements = 1E6;
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<>
dist(0, std::numeric_limits<int>::max()/10);
for(int i = 0; i < number_of_elements; ++i)
input.push_back( dist(gen) );
}*/
std::cout << "unsorted: ";
for(auto const& e : input) std::cout << e << ", ";
std::cout << "\n\n";
auto sorted = lexicographic_sort(begin(input), end(input));
std::cout << "sorted: ";
for(auto const& e : sorted) std::cout << e.second << ", ";
std::cout << "\n\n";
}
我相信以下作为正整数的排序比较函数,只要使用的整数类型比double
类型(例如, 32-bitint
和 64-bit double
)窄得多,并且log10
使用的例程返回精确的 10 次幂的完全正确的结果(其中一个好的实现确实):
static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));
double lx = log10(x);
double ly = log10(y);
double fx = lx - floor(lx); // Get the mantissa of lx.
double fy = ly - floor(ly); // Get the mantissa of ly.
return fabs(fx - fy) < limit ? lx < ly : fx < fy;
它通过比较对数的尾数来工作。尾数是对数的小数部分,它们表示没有大小的数字的有效数字的值(例如,31、3.1 和 310 的对数具有完全相同的尾数)。
的目的fabs(fx - fy) < limit
是允许在取对数时出现错误,这既是因为 的实现log10
不完善,也因为浮点格式会导致一些错误。(31 和 310 的对数的整数部分使用不同的位数,因此有效位数的位数不同,因此它们最终会被舍入到稍微不同的值。)只要整数类型明显更窄比double
类型,计算出来limit
的误差会远大于log10
. 因此,该测试fabs(fx - fy) < limit
基本上告诉我们如果精确计算,两个计算的尾数是否相等。
如果尾数不同,它们表示字典顺序,所以我们返回fx < fy
。如果它们相等,那么对数的整数部分告诉我们顺序,所以我们返回lx < ly
。
很容易测试是否log10
为每个 10 的幂返回正确的结果,因为它们太少了。如果没有,可以轻松进行调整:插入if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0;
。这允许log10
当它应该返回 5 时返回类似 4.99999 的东西。
这种方法的优点是不使用循环或除法(这在许多处理器上很耗时)。
如果您的所有数字都是非负数并且它们足够小以至于将它们乘以 10 不会导致溢出,则这是一个紧凑的解决方案:
template<class T> bool lex_less(T a, T b) {
unsigned la = 1, lb = 1;
for (T t = a; t > 9; t /= 10) ++la;
for (T t = b; t > 9; t /= 10) ++lb;
const bool ll = la < lb;
while (la > lb) { b *= 10; ++lb; }
while (lb > la) { a *= 10; ++la; }
return a == b ? ll : a < b;
}
像这样运行它:
#include <iostream>
#include <algorithm>
int main(int, char **) {
unsigned short input[] = { 100, 21 , 22 , 99 , 1 , 927 };
unsigned input_size = sizeof(input) / sizeof(input[0]);
std::sort(input, input + input_size, lex_less<unsigned short>);
for (unsigned i = 0; i < input_size; ++i) {
std::cout << ' ' << input[i];
}
std::cout << std::endl;
return 0;
}
这项任务听起来很适合带有填充的 Radix Sort 的 MSD 变体 ( http://en.wikipedia.org/wiki/Radix_sort )。
取决于你想扔多少代码。其他人显示的简单代码是 O(log n) 复杂度,而完全优化的基数排序是 O(kn)。
您可以尝试使用 % 运算符让您访问每个单独的数字,例如 121 % 100 会给您第一个数字并以这种方式检查,但您必须找到一种方法来解决它们具有不同大小的事实。
所以找到数组中的最大值。我不知道您是否可以尝试内置此功能。
int Max (int* pdata,int size)
{
int temp_max =0 ;
for (int i =0 ; i < size ; i++)
{
if (*(pdata+i) > temp_max)
{
temp_max = *(pdata+i);
}
}
return temp_max;
}
此函数将返回数字中的位数
int Digit_checker(int n)
{
int num_digits = 1;
while (true)
{
if ((n % 10) == n)
return num_digits;
num_digits++;
n = n/10;
}
return num_digits;
}
令 max 中的位数等于 n。一旦你以 for (int i = 1; i < n ; i++) 的格式打开一个 for 循环
然后你可以通过你的并使用“data[i] % (10^(ni))”来访问第一个数字然后对其进行排序,然后在下一次迭代中你将可以访问第二个数字。我不知道你会如何排序它们。
它不适用于负数,您必须绕过 data[i] % (10^(ni)) 为位数少于最大值的数字返回自身
重载 < 运算符以按字典顺序比较两个整数。对于每个整数,找到不小于给定整数的最小 10^k。而不是一一比较数字。
class CmpIntLex {
int up_10pow(int n) {
int ans = 1;
while (ans < n) ans *= 10;
return ans;
}
public:
bool operator ()(int v1, int v2) {
int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
while ( ceil1 != 0 && ceil2 != 0) {
if (v1 / ceil1 < v2 / ceil2) return true;
else if (v1 / ceil1 > v2 / ceil2) return false;
ceil1 /= 10;
ceil2 /= 10;
}
if (v1 < v2) return true;
return false;
}
int main() {
vector<int> vi = {12,45,12134,85};
sort(vi.begin(), vi.end(), CmpIntLex());
}
虽然这里的其他一些答案(Lightness's, notbad's)已经显示出相当不错的代码,但我相信我可以添加一个性能更高的解决方案(因为它在每个循环中既不需要除法也不需要幂;但它需要浮点运算,这又是可能会使它变慢,并且对于大量数据可能不准确):
#include <algorithm>
#include <iostream>
#include <assert.h>
// method taken from http://stackoverflow.com/a/1489873/671366
template <class T>
int numDigits(T number)
{
int digits = 0;
if (number < 0) digits = 1; // remove this line if '-' counts as a digit
while (number) {
number /= 10;
digits++;
}
return digits;
}
bool lexiSmaller(int i1, int i2)
{
int digits1 = numDigits(i1);
int digits2 = numDigits(i2);
double val1 = i1/pow(10.0, digits1-1);
double val2 = i2/pow(10.0, digits2-1);
while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
{
digits1--;
digits2--;
val1 = (val1 - (int)val1)*10;
val2 = (val2 - (int)val2)*10;
}
if (digits1 > 0 && digits2 > 0)
{
return (int)val1 < (int)val2;
}
return (digits2 > 0);
}
int main(int argc, char* argv[])
{
// just testing whether the comparison function works as expected:
assert (lexiSmaller(1, 100));
assert (!lexiSmaller(100, 1));
assert (lexiSmaller(100, 22));
assert (!lexiSmaller(22, 100));
assert (lexiSmaller(927, 99));
assert (!lexiSmaller(99, 927));
assert (lexiSmaller(1, 927));
assert (!lexiSmaller(927, 1));
assert (lexiSmaller(21, 22));
assert (!lexiSmaller(22, 21));
assert (lexiSmaller(22, 99));
assert (!lexiSmaller(99, 22));
// use the comparison function for the actual sorting:
int input[] = { 100 , 21 , 22 , 99 , 1 ,927 };
std::sort(&input[0], &input[5], lexiSmaller);
std::cout << "sorted: ";
for (int i=0; i<6; ++i)
{
std::cout << input[i];
if (i<5)
{
std::cout << ", ";
}
}
std::cout << std::endl;
return 0;
}
虽然我不得不承认我还没有测试过性能。
这是不使用任何浮点技巧的愚蠢解决方案。它与字符串比较几乎相同,但不使用字符串,也不处理负数,为此在顶部添加一个部分......
bool comp(int l, int r)
{
int lv[10] = {}; // probably possible to get this from numeric_limits
int rv[10] = {};
int lc = 10; // ditto
int rc = 10;
while (l || r)
{
if (l)
{
auto t = l / 10;
lv[--lc] = l - (t * 10);
l = t;
}
if (r)
{
auto t = r / 10;
rv[--rc] = r - (t * 10);
r = t;
}
}
while (lc < 10 && rc < 10)
{
if (lv[lc] == rv[rc])
{
lc++;
rc++;
}
else
return lv[lc] < rv[rc];
}
return lc > rc;
}
它很快,而且我敢肯定还有可能让它更快,但它确实有效,而且它很愚蠢,无法理解......
编辑:我吃了很多代码,但这是迄今为止所有解决方案的比较..
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>
std::pair<int, int> lexicographic_pair_helper(int p, int maxDigits)
{
int digits = std::log10(p);
int l = p*std::pow(10, maxDigits-digits);
return {l, p};
}
bool l_comp(int l, int r)
{
int lv[10] = {}; // probably possible to get this from numeric_limits
int rv[10] = {};
int lc = 10; // ditto
int rc = 10;
while (l || r)
{
if (l)
{
auto t = l / 10;
lv[--lc] = l - (t * 10);
l = t;
}
if (r)
{
auto t = r / 10;
rv[--rc] = r - (t * 10);
r = t;
}
}
while (lc < 10 && rc < 10)
{
if (lv[lc] == rv[rc])
{
lc++;
rc++;
}
else
return lv[lc] < rv[rc];
}
return lc > rc;
}
int up_10pow(int n) {
int ans = 1;
while (ans < n) ans *= 10;
return ans;
}
bool l_comp2(int v1, int v2) {
int n1 = up_10pow(v1), n2 = up_10pow(v2);
while ( v1 != 0 && v2 != 0) {
if (v1 / n1 < v2 / n2) return true;
else if (v1 / n1 > v2 / n2) return false;
v1 /= 10;
v2 /= 10;
n1 /= 10;
n2 /= 10;
}
if (v1 == 0 && v2 != 0) return true;
return false;
}
int main()
{
std::vector<int> numbers;
{
constexpr int number_of_elements = 1E6;
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<> dist;
for(int i = 0; i < number_of_elements; ++i) numbers.push_back( dist(gen) );
}
std::vector<int> lo(numbers);
std::vector<int> dyp(numbers);
std::vector<int> nim(numbers);
std::vector<int> nb(numbers);
std::cout << "starting..." << std::endl;
{
auto start = std::chrono::high_resolution_clock::now();
/**
* Sorts the array lexicographically.
*
* The trick is that we have to compare digits left-to-right
* (considering typical Latin decimal notation) and that each of
* two numbers to compare may have a different number of digits.
*
* This probably isn't very efficient, so I wouldn't do it on
* "millions" of numbers. But, it works...
*/
std::sort(
std::begin(lo),
std::end(lo),
[](int lhs, int rhs) -> bool {
// Returns true if lhs < rhs
// Returns false otherwise
const auto BASE = 10;
const bool LHS_FIRST = true;
const bool RHS_FIRST = false;
const bool EQUAL = false;
// There's no point in doing anything at all
// if both inputs are the same; strict-weak
// ordering requires that we return `false`
// in this case.
if (lhs == rhs) {
return EQUAL;
}
// Compensate for sign
if (lhs < 0 && rhs < 0) {
// When both are negative, sign on its own yields
// no clear ordering between the two arguments.
//
// Remove the sign and continue as for positive
// numbers.
lhs *= -1;
rhs *= -1;
}
else if (lhs < 0) {
// When the LHS is negative but the RHS is not,
// consider the LHS "first" always as we wish to
// prioritise the leading '-'.
return LHS_FIRST;
}
else if (rhs < 0) {
// When the RHS is negative but the LHS is not,
// consider the RHS "first" always as we wish to
// prioritise the leading '-'.
return RHS_FIRST;
}
// Counting the number of digits in both the LHS and RHS
// arguments is *almost* trivial.
const auto lhs_digits = (
lhs == 0
? 1
: std::ceil(std::log(lhs+1)/std::log(BASE))
);
const auto rhs_digits = (
rhs == 0
? 1
: std::ceil(std::log(rhs+1)/std::log(BASE))
);
// Now we loop through the positions, left-to-right,
// calculating the digit at these positions for each
// input, and comparing them numerically. The
// lexicographic nature of the sorting comes from the
// fact that we are doing this per-digit comparison
// rather than considering the input value as a whole.
const auto max_pos = std::max(lhs_digits, rhs_digits);
for (auto pos = 0; pos < max_pos; pos++) {
if (lhs_digits - pos == 0) {
// Ran out of digits on the LHS;
// prioritise the shorter input
return LHS_FIRST;
}
else if (rhs_digits - pos == 0) {
// Ran out of digits on the RHS;
// prioritise the shorter input
return RHS_FIRST;
}
else {
const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;
if (lhs_x < rhs_x)
return LHS_FIRST;
else if (rhs_x < lhs_x)
return RHS_FIRST;
}
}
// If we reached the end and everything still
// matches up, then something probably went wrong
// as I'd have expected to catch this in the tests
// for equality.
assert("Unknown case encountered");
}
);
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;
std::cout << "Lightness: " << elapsed.count() << '\n';
}
{
auto start = std::chrono::high_resolution_clock::now();
auto max = *std::max_element(begin(dyp), end(dyp));
int maxDigits = std::log10(max);
std::vector<std::pair<int,int>> temp;
temp.reserve(dyp.size());
for(auto const& e : dyp) temp.push_back( lexicographic_pair_helper(e, maxDigits) );
std::sort(begin(temp), end(temp), [](std::pair<int, int> const& l, std::pair<int, int> const& r)
{ if(l.first < r.first) return true; if(l.first > r.first) return false; return l.second < r.second; });
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;
std::cout << "Dyp: " << elapsed.count() << '\n';
}
{
auto start = std::chrono::high_resolution_clock::now();
std::sort (nim.begin(), nim.end(), l_comp);
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;
std::cout << "Nim: " << elapsed.count() << '\n';
}
// {
// auto start = std::chrono::high_resolution_clock::now();
// std::sort (nb.begin(), nb.end(), l_comp2);
// auto end = std::chrono::high_resolution_clock::now();
// auto elapsed = end - start;
// std::cout << "notbad: " << elapsed.count() << '\n';
// }
std::cout << (nim == lo) << std::endl;
std::cout << (nim == dyp) << std::endl;
std::cout << (lo == dyp) << std::endl;
// std::cout << (lo == nb) << std::endl;
}
根据@Oswald 的回答,下面是一些执行相同操作的代码。
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
bool compare(string a, string b){
// Check each digit
int i = 0, j = 0;
while(i < a.size() && j < b.size()){
// If different digits
if(a[i] - '0' != b[j] - '0')
return (a[i] - '0' < b[j] - '0');
i++, j++;
}
// Different sizes
return (a.size() < b.size());
}
int main(){
vector<string> array = {"1","2","3","4","5","6","7","8","9","10","11","12"};
sort(array.begin(), array.end(), compare);
for(auto value : array)
cout << value << " ";
return 0;
}
输入:1 2 3 4 5 6 7 8 9 10 11 12
输出:1 10 11 12 2 3 4 5 6 7 8 9