c# - 根据大写字母的位置对字符串数组进行排序

Question

我有一个这样的数组 -

string[] input = new string[] {"bRad", "Charles", "sam", "lukE", "vIctor"}

现在我想根据每个字符串中出现的大写字母的位置对其进行排序。第一次出现是排序时唯一要考虑的出现。如果两个字符串在同一位置有 CAP，则按字母顺序对它们进行排序，同样适用于没有任何 CAP 的字符串，按字母顺序对其进行排序。

到目前为止，我所做的还不够好。我已经尝试了无数次来改进它，但没有运气。将有大量数据对此进行测试。所以性能是最重要的。我正在使用 .NET 2.0，并且不允许使用任何更高版本。

public static int q, p, i, s;
public static Dictionary<string, int> a = new Dictionary<string, int>();

Array.Sort(input, delegate (string x, string y) {

    if (x == y)
        return 0;

    if (a.TryGetValue(x + "|" + y, out s))
        return s;

    if (a.TryGetValue(y + "|" + x, out s))
        return -s;

    q = x.Length;
    p = y.Length;

    for (i = 0; i < x.Length; i++)
    {
        if (x[i] < 91)
        {
            q = i;
            break;
        }
    }

    for (i = 0; i < y.Length; i++)
    {
        if (y[i] < 91)
        {
            p = i;
            break;
        }
    }

    if (q == x.Length && p == y.Length)
        s = x.CompareTo(y);
    else if (q > p)
        s = 1;
    else if (q < p)
        s = -1;
    else
        s = x.CompareTo(y);

    a.Add(x + "|" + y, s);

    return s;

});

score 2 · Accepted Answer

你需要在这里考虑你的算法:-)

How many elements are going to be in your dictionary? Well, since you're putting "{x} | {y}" into the dictionary for every element x, y in your array, that's n ^ 2 for an n-element array. Not a good idea.

This isn't necessarily the best solution (I've not thought about that yet), but for a start:

Only store the position of the first Capital in the dictionary for a particular word, not for combinations.

Now your delegate becomes:

delegate (string x, string y) {

    if (x == y)   // NOT A GOOD IDEA - 
                  // THE Sort method should not call a string with itself,
                  // and if this is doing string comparison
                  // (I'm rusty on C#), you're
                  // wasting a comparison if there's a mismatch
        return 0;

    int xCapitalPos, yCapitalPos;
    if (!a.TryGetValue(x, out xCapitalPos)) {
        // compute xCapitalPos and add it to dictionary a
    }
    if (!a.TryGetValue(y, out yCapitalPos)) {
        // compute yCapitalPos and add it to dictionary a
    }
    int delta = xCapitalPos - yCapitalPos;
    if (0!=delta) {
        return delta;
    } else {
        return x.compareTo(y);
    }
}

That's where I would start. See how you do, then consider how you might do better from there...

--- 5 minutes later, cup of coffee in hand

Ok, I've just thought of how I might improve it!

Don't use compareTo, which does a string comparison. Write your own string comparison function, that given 2 strings, will do a string comparison while taking capital location into account. Then you can drop the dictionary and everything else: it won't be necessary, since the Sort method (which I presume is implemented properly as a QuickSort or a MergeSort or something efficient) will ensure you don't do more comparisons than necessary.

All the best, C

score 2 · Accepted Answer

I guess there are a couple of optimisations (it should already be damned fast).

Firstly, as suggested above there's really no need for a dictionary since the Sort algorithm itself optimises the need for that out.

Second, loop through x and y at the same time, and break out as early as possible (that is, if one finds a capital and the other doesn't then exit early). Minor savings here.

That should just about do it (and be simpler). You're basically rewriting String.CompareTo only, and efficiently, and relying on Array.Sort to do the rest.

score 2 · Accepted Answer

Removing the dictionary for your cache alone sped it up (my example of 15000 values, with up to 500 chars per value) went from 2449.51ms with the dictionary, and after removing that went down to 58.72ms

I tried "craigmj"'s idea of caching the individual values position which is faster then doing the concat but it seems with my random data no cache was still faster.

Here is some code to test out... this runs in 30ms compared to 2559ms (original)

Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
Array.Sort(input, delegate(string x, string y)
{
    if (x == y)
        return 0;

    int shortestLength = Math.Min(x.Length, y.Length);

    for (i = 0; i < shortestLength; i++)
    {
        if (x[i] < 91 && y[i] < 91)
            return x.CompareTo(y);
        else if (x[i] < 91)
            return -1;
        else if (y[i] < 91)
            return 1;
    }
    return x.CompareTo(y);
});

stopWatch.Stop();
double ms = (stopWatch.ElapsedTicks * 1000.0) / Stopwatch.Frequency;
Debug.WriteLine("Optimized Time: " + ms);

code to continue checking for Capital

Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
Array.Sort(input, delegate(string x, string y)
{
    if (x == y)
        return 0;

    int xlen = x.Length;
    int ylen = y.Length;
    int longestLength = Math.Max(xlen, ylen);

    for (i = 0; i < longestLength; i++)
    {
        if (i < xlen && i < ylen && x[i] < 91 && y[i] < 91)
            return x.CompareTo(y);
        else if (i < xlen && x[i] < 91)
            return -1;
        else if (i < ylen && y[i] < 91)
            return 1;
    }
    return x.CompareTo(y);
});

stopWatch.Stop();
double ms = (stopWatch.ElapsedTicks * 1000.0) / Stopwatch.Frequency;
Debug.WriteLine("Optimized Time: " + ms);

score 1 · Accepted Answer

Here's some (pseudo) code for the delegate that does a Capital-and-String position match:

delegate (string lhs, string rhs) {
    int llength = lhs.Length
    int rlength = rhs.Length

    // The value we will return 
    // <0 => lhs < rhs
    // ==0 => lhs == rhs
    // >0 => lhs > rhs  
    int ret = 0;
    Boolean uppercaseFoundAndEqual = false;

    for (int i=0; i<llength; i++) {
        if (i>=rlength) {
            // We've exhausted the rhs, but not the lhs
            return (0==ret) ? -1 : ret;
        }

        Char l = lhs[i];
        Char r = rhs[i];

        // We only worry about the case position if we've not yet found
        // an uppercase char in either string
        if (!uppercaseFoundAndEqual) {
            Boolean lUpper = (('A'<=l) && ('Z'>=l));
            Boolean rUpper = (('A'<=r) && ('Z'>=r));

            // If we've encountered an upper-lower difference, we return 
            int delta = (lUpper ? 1 : 0) - (rUpper ? 1 : 0);
            if (0!=delta) return delta;

            if (lUpper) {   // Both are upper case - by our delta comparison, we know 
                            // lUpper==rUpper
                if (0!=ret) return ret; // Return based on previous case comparison

                // Otherwise we've found an uppercase, now we're just doing
                // standard string comparison
                uppercaseFoundAndEqual = true;
            }
        }

        if (0==ret) {   // If we're still equal to this point, standard char comparison
            ret = l-r;
        }
        if (uppercaseFoundAndEqual && (0!=ret)) {
            return ret;
        }
    }
    if (i<rlength) {
        // We've exhausted the lhs, but not the rhs
        return (0==ret) ? 1 : ret;
    }
    return 0;   // We exhausted both strings and they're identical
}

Please take careful note:

I've not tested this! (Hence the 'pseudo-code' comment) I don't have C#, so it's based on a little Googling for C# syntax. Please correct if I've made errors!
This is not using Internationalization. In other words, this is a very basic ASCII comparison. Yuck! You should look at using System.Globalization.StringInfo, but I'm afraid to code that just based on some Googling!
Your description doesn't describe what happens when two strings differ in length without capitalization. For example, is 'aa' greater than or less than 'aaA'? I've implemented based on the idea that we ignore capitalization outside the 'intersect' area, but it wouldn't be difficult to change that.
(for emphasis) I've not tested this! Mileage may vary, but I believe the general idea is good.

c# - 根据大写字母的位置对字符串数组进行排序

4 回答 4

Related

Reference