Abstract:
How to sort filenames like Picture17.png.

Created by Peter Kankowski
Last changed
Filed under Algorithms

Share on social sitesReddit Digg Delicious Buzz Facebook Twitter

Natural order sorting

Filenames often contain numbers, e.g., picture17.png. If you apply the usual string comparison, two-digit numbers will be mixed with one-digit numbers:

picture 1.png
picture 10.png
picture 100.png
picture 11.png
picture 2.png
picture 21.png
picture 2_10.png
picture 2_9.png
picture 3.png
picture 3b.png
picture A.png

The “natural” order for alphanumeric data would be:

picture 1.png
picture 2.png
picture 2_9.png
picture 2_10.png
picture 3.png
picture 3b.png
picture 10.png
picture 11.png
picture 21.png
picture 100.png
picture A.png

Windows Explorer and other file managers sort your filenames in this order.

Here is a simple comparison function to use with qsort:

int compare_naturally1(const void* a_ptr, const void* b_ptr) {
    const char * a = *(const char**)a_ptr;
    const char * b = *(const char**)b_ptr;

    while (*a == *b) {
        if(*a == '\0')
            return 0;
        a++, b++;
    }

    int diff = strspn(a, "0123456789") - strspn(b, "0123456789");
    if (diff)
        return diff;
    return *a - *b;
}

When the function finds the first difference, it counts the digits with strspn. The number with more digits is always larger (e.g., 100 > 2, even though '1' < '2'). If the count of digits is the same (e.g., 10 and 21), the values of characters are checked.

The problem is that the function undesirably sorts words before numbers (picture A < picture 1), because the words have zero digits. You cannot fix it by comparing only the items with digits, because this would break associativity (digits and letters would be compared incorrectly). A more complex algorithm is needed:

int count_digits(const char* s) {
    const char * p = s;
    while (isdigit(*p))
        p++;
    return (int)(p - s);
}

int compare_naturally(const void* a_ptr, const void* b_ptr) {
    const char * a = *(const char**)a_ptr;
    const char * b = *(const char**)b_ptr;

    for (;;) {
        if (isdigit(*a) && isdigit(*b)) {
            int a_count = count_digits(a);
            int diff = a_count - count_digits(b);
            if (diff)
                return diff;
            diff = memcmp(a, b, a_count);
            if (diff)
                return diff;
            a += a_count;
            b += a_count;
        }
        if (*a != *b)
            return *a - *b;
        if (*a == '\0')
            return 0;
        a++, b++;
    }
}

When the function finds a number in both strings, it counts the digits. Strspn is replaced here with a more effective custom function. If both numbers have an equal count of digits, their values are compared using memcmp.

Other solutions

Peter Kankowski
Peter Kankowski

About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP. He can be reached at kankowski@gmail.com.

Created by Peter Kankowski
Last changed

Leave your comment

Your name:


Comment:


Please ignore this field: