[prev in list] [next in list] [prev in thread] [next in thread] 

List:       apache-stdcxx-dev
Subject:    [jira] Created: (STDCXX-397) std::sort introsort implementation
From:       "Joshua Lehrer (JIRA)" <jira () apache ! org>
Date:       2007-04-20 5:31:15
Message-ID: 12581384.1177047075317.JavaMail.jira () brutus
[Download RAW message or body]

std::sort introsort implementation error
----------------------------------------

                 Key: STDCXX-397
                 URL: https://issues.apache.org/jira/browse/STDCXX-397
             Project: C++ Standard Library
          Issue Type: Bug
          Components: 25. Algorithms
    Affects Versions: 4.1.3
         Environment: Bug in algorithm.cc effects all platforms
            Reporter: Joshua Lehrer


introsort is designed to detect when an input set would push quicksort into its \
worst-case scenario N^2 and fall back on a slower, yet still NLogN algorithm.

The implementation in __introsort_loop has a bug, however, and it fails to catch all \
of the scenarios.  While I can not supply an exact input set to demonstrate the bug, \
I can explain the bug very easily.

First, allow me to paste in the code:


// David R. Musser's Introspective Sorting algorithm
// O(N * log (N)) worst case complexity
_EXPORT
template <class _RandomAccessIter, class _Dist, class _Compare>
void __introsort_loop (_RandomAccessIter __first, _RandomAccessIter __last,
                       _Dist __max_depth, _Compare __comp)
{
    for (; __last - __first > __rw_threshold; __max_depth /= 2) {
        if (0 == __max_depth) {
            __partial_sort (__first, __last, __last,
                            _RWSTD_VALUE_TYPE (_RandomAccessIter), __comp);
            break;
        }

        _RandomAccessIter __cut =
            __unguarded_partition (__first, __last,
                                   __median (*__first,
                                             *(__first + (__last - __first) /2),
                                             *(__last - 1), __comp), __comp);

        // limit the depth of the recursion tree to log2 (last - first)
        // where first and last are the initial values passed in from sort()
        __introsort_loop (__cut, __last, __max_depth, __comp);
        __last = __cut;
    }
}

the variable '__max_depth' is supposed to be cut in half on each subsequent \
"recursive" call.  Once it reaches zero, LogN recurisve calls have been made, and the \
algorithm falls back on a different sorting algorithm for the remainder.

The algorithm, as implemented, uses real recursion and tail recursion.

First, the pivot is selected, the pivot is done, and the algorithm has a left and a \
right half, hopefully balanced.

Consider what happens for the LEFT half, which is done using tail recursion.  \
'__last' gets assigned '__cut', then the code goes to the top of the 'for' loop.  The \
test condition of the loop is run, which divides '__max_depth' by two, bringing it \
closer to zero.

Now consider what happens for the RIGHT half, which is done using real recursion.  \
The function is called recurisvely on the right.  __max_depth is NOT cut in half.

What would happen if a poor pivot was selected causing the right half to be large and \
the left half to be small?  What if that happens again and again?  The real-recursion \
case is failing to decrement __max_depth until it starts working on the left half.  \
You can see how if the algorithm continually built right-halves that were relatively \
large that __max_depth never gets decremented, and the algorithm never detects that \
it has made LogN recurisve calls.

I believe the proper fix is as follows:


// David R. Musser's Introspective Sorting algorithm
// O(N * log (N)) worst case complexity
_EXPORT
template <class _RandomAccessIter, class _Dist, class _Compare>
void __introsort_loop (_RandomAccessIter __first, _RandomAccessIter __last,
                       _Dist __max_depth, _Compare __comp)
{
    for (; __last - __first > __rw_threshold; ) {
        if (0 == __max_depth) {
            __partial_sort (__first, __last, __last,
                            _RWSTD_VALUE_TYPE (_RandomAccessIter), __comp);
            break;
        }

        _RandomAccessIter __cut =
            __unguarded_partition (__first, __last,
                                   __median (*__first,
                                             *(__first + (__last - __first) /2),
                                             *(__last - 1), __comp), __comp);

        // limit the depth of the recursion tree to log2 (last - first)
        // where first and last are the initial values passed in from sort()
        __max_depth /= 2;
        __introsort_loop (__cut, __last, __max_depth, __comp);
        __last = __cut;
    }
}

"__max_depth/=2" is removed from the "for" loop and placed just above the two \
recursive calls.

This fixes the worst-case sample set that I have generated.

I look forward to your response,

joshua lehrer
http://www.lehrerfamily.com/


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic