## Analyzing STL lower_bound: why does it iterate the whole list twice?

I criticized an STL algorithm in a previous entry, calling it “unintuitive but correct.” That’s pretty arrogant of me, considering the STL has been published for fifteen or so years. You’d think that if an STL algorithm were incorrect, someone would have noticed by now. Put differently, the STL doesn’t need an obscure blogger’s approval.

Anyway, we’re here to criticize algorithms, not listen to me moan.

Here’s the naive algorithm (in C++, yet! And corrected to *cough* test the termination condition):

```cpp
template< class iter, class val >
inline iter naive_lower_bound( iter l, iter u, const val& v )
{
    iter r = l;

    // Walk forward while elements are still <= v; r trails one step behind,
    // so it ends up on the last element not greater than v.
    while( l != u && *l <= v )
    {
        r = l++;
    }

    return r;
}
```

Here’s the STL lower_bound algorithm from the latest version of the SGI STL (that would be v3.3, released in 2000):

```cpp
template <class _ForwardIter, class _Tp, class _Distance>
_ForwardIter __lower_bound(_ForwardIter __first, _ForwardIter __last,
                           const _Tp& __val, _Distance*)
{
1   _Distance __len = 0;
2   distance(__first, __last, __len);
3   _Distance __half;
4   _ForwardIter __middle;
5
6   while (__len > 0) {
7     __half = __len >> 1;
8     __middle = __first;
9     advance(__middle, __half);
10    if (*__middle < __val) {
11      __first = __middle;
12      ++__first;
13      __len = __len - __half - 1;
14    }
15    else
16      __len = __half;
17  }
18  return __first;
}
```

You can tell it’s an STL algorithm because it’s elegant, graceful, and full of $%! underscores. (Did Cfront need that horrible convention?)

So first, let’s look at line 2. Calculating the distance() between two iterators is O(1) for a random access iterator, but O(n) for every other iterator category. The naive algorithm doesn’t calculate the distance at all.

Next, think about the case where the correct lower bound is the first element, `l`. The naive algorithm finds this in two list accesses: as soon as *(l+1) is greater than v, it terminates and returns. STL instead walks all the way to the halfway element and inspects it first. It’s way too big, so we halve len and partition again. Summing up those advances, we actually wind up traversing about N elements to get an answer that was sitting at the front.

For the middle case or end case, a similar analysis shows that STL runs in Ω(n), about 2*N*t + C * lg N, where t and C are the average traversal and comparison costs. In contrast, the naive approach runs in an expected N*(C+t)/2 time. In the very worst case, drop the factor of two.

So what’s the big deal? Both algorithms are O(n), differing only in a constant factor. Besides, the STL one is documented to use O(n) time but perform only O(lg n) comparisons, a guarantee the naive algorithm notably lacks.

The problem, in a word: caching. Next time: data!