Main

Week 7 (cont.)

In this lecture we will look more closely at the STL algorithms and how they are used with containers and iterators. We will consider different types of iterators. Code is in week7 directory of the lecture code repository.

STL algorithms

Reference: TICPPv2 Chapter 6; A list of algorithms organized by purpose: http://cs.stmarys.ca/~porter/csc/ref/stl/alg_by_purpose.html

STL algorithms define computations over STL containers in a generic way, allowing us to reuse their functionality for a variety of data structures.

Let's start with a simple example -- computing the average value from a vector of ints.

  1. #include <iostream>
  2. #include <fstream>
  3. #include <set>
  4.  
  5. using namespace std;
  6. int main() {
  7.     ifstream input("data.txt");
  8.     multiset<int> values;
  9.     /* Read the data from the file. */
  10.     int currValue;
  11.     while (input >> currValue)
  12.         values.insert(currValue);
  13.     /* Compute the average. */
  14.     double total = 0.0;
  15.     for (multiset<int>::iterator itr = values.begin();
  16.             itr != values.end(); ++itr)
  17.         total += *itr;
  18.     cout << "Average is: " << total / values.size() << endl;
  19. }

As written, this code is perfectly legal and will work as intended. However, there's something slightly odd about it. If you were to describe what this program needs to do in plain English, it would probably be something like this:

  1. Read the contents of the file.
  2. Add the values together.
  3. Divide by the number of elements.

The above code matches this description, but not exactly. The first loop of the program reads in the contents of the file, the second loop sums together the values, and the last line divides by the number of elements. However, the code we've written is somewhat unsatisfactory. Consider this first loop:

 int currValue;
 while (input >> currValue)
 values.insert(currValue);

This implements the "Read contents of the file" step, however, the code itself is reading single integer at a time and inserting it into the multiset. We will deal with this in a later lecture (when we talk of iterator adaptors, but you can take a look at the "Version 3: iterator adaptors" portion of this tutorial.

Similarly for the summation loop:

  1.     for (multiset<int>::iterator itr = values.begin();
  2.             itr != values.end(); ++itr)
  3.         total += *itr;

We think of "adding the values together", but the above code goes through the values one by one and adds them to the total variable.

A better way to implement this example

Consider the loop to sum all values:

  1. double total = 0.0;
  2. for (multiset<int>::iterator itr = values.begin(); itr != values.end(); ++itr)
  3.     total += *itr;
  4. cout << "Average is: " << total / values.size() << endl;

This code is entirely equivalent to the following:

  1. cout << accumulate(values.begin(), values.end(), 0.0) / values.size() << endl;

We've replaced the entire for loop with a single call to accumulate, eliminating about a third of the code from our original program.

The accumulate function is defined in the <numeric> header and takes three parameters – two iterators that define a range of elements, and an initial value to use in the summation. It then computes the sum of all of the elements contained in the range of iterators, plus the base value. Like other STL algorithms, accumulate can be used with iterators of any type (e.g., iterators from a multiset, vector, or deque). This means that whenever you need to compute the sum of the elements in a container (assuming that the binary operator+ is defined on these elements), you can pass the begin() and end() iterators of the container to accumulate to get the sum. You can also specify an iterator range other than begin and end, for example, to sum only the elements with values between 50 and 75, we could write

 accumulate(values.lower_bound(50), values.upper_bound(75), 0);

Behind the scenes, accumulate is implemented as a template function that accepts two iterators and simply uses a loop to sum together the values. Here's one possible implementation of accumulate:

  1.  template <typename InputIterator, typename Type> inline
  2.  Type accumulate(InputIterator start, InputIterator stop, Type initial) {
  3.      while(start != stop) {
  4.          initial += *start;
  5.          ++start;
  6.       }
  7.       return initial;
  8.  }

The code is just a normal iterator loop, just like you would write if you are doing it on your own.

If STL algorithms are just functions that use loops that you could write, why bother with them?

  • Simplicity: With STL algorithms, you can leverage off of code that's already been written for you rather than reinventing the code from scratch.
  • Correctness: If you had to rewrite all the algorithms from scratch every time you needed to use them, odds are that at some point you'd slip up and make a mistake.
  • Speed: In general, you can assume that if there's an STL algorithm that performs a task, it's going to be faster than most code you could write by hand.
  • Clarity: With algorithms, you can immediately tell that a call to accumulate adds up numbers in a range. With a for loop that sums up values, you'd have to read each line in the loop before you understood what the code did.

Algorithm Naming Conventions

There are over fifty STL algorithms (defined either in <algorithm> or in <numeric>), and memorizing them all is not out goal here. What you should do, however, is when you are thinking of some operation over a container or multiple containers, go and check first whether an algorithm already exists before writing your own loops.

_if

STL algorithms follow certain naming conventions that make them even easier to use. The suffix _if on an algorithm (replace_if, count_if, etc.) means the algorithm will perform a task on elements only if they meet a certain criterion. Functions ending in _if require you to pass in a predicate function that accepts an element and returns a bool indicating whether the element matches the criterion. For example consider the count algorithm and its counterpart count_if. count accepts a range of iterators and a value, then returns the number of times that the value appears in that range. If we have a vector<int> of several integer values, we could print out the number of copies of the number 42 in that vector as follows:

    cout << count(myVec.begin(), myVec.end(), 42) << endl;

count_if, on the other hand, accepts a range of iterators and a predicate function, then returns the number of times the predicate evaluates to true in that range. If we were interested in how number of even numbers are contained in a vector<int>, we could could obtain the value as follows. First, we write a predicate function that takes in an int and returns whether it's even, as shown here:

    bool IsEven(int value) {
        return value % 2 == 0;
    }

We could then use count_if as follows:

 cout << count_if(myVec.begin(), myVec.end(), IsEven) << endl;

_copy

Algorithms whose names contain the word copy (remove_copy, partial_sort_copy, etc.) will perform some task on a range of data and store the result in the location pointed at by an extra iterator parameter. With copy functions, you'll specify all the normal data for the algorithm plus an extra iterator specifying a destination for the result.

_n

If an algorithm's name ends in _n (generate_n, search_n, etc), then it will perform a certain operation n times. These functions are useful for cases where the number of times you perform an operation is meaningful, rather than the range over which you perform it. To give you a better feel for what this means, consider the fill and fill_n algorithms. Each of these algorithms sets a range of elements to some specified value. For example, we could use fill as follows to set every element in a deque to have value 0:

    fill(myDeque.begin(), myDeque.end(), 0);

The fill_n algorithm is similar to fill, except that instead of accepting a range of iterators, it takes in a start iterator and a number of elements to write. For instance, we could set the first ten elements of a deque to be zero by calling

    fill_n(myDeque.begin(), 10, 0);

Types of iterators

Because not all STL iterators can efficiently or legally perform all of the functions of every other iterator, STL iterators are categorized based on their relative power. At the high end are random-access iterators that can perform all of the possible iterator functions, and at the bottom are the input and output iterators which guarantee only a minimum of functionality. There are five different types of iterators, each of which is discussed in short detail below.

  • Output Iterators. Output iterators are one of the two weakest types of iterators. With an output iterator, you can write values using the syntax *myItr = value and can advance the iterator forward one step using the ++ operator. However, you cannot read a value from an output iterator using the syntax value = *myItr, nor can you use the += or operators.
  • Input Iterators. Input iterators are similar to output iterators except that they read values instead of writing them. That is, you can write code along the lines of value = *myItr, but not *myItr = value. Moreover, input iterators cannot iterate over the same range twice.
  • Forward Iterators. Forward iterators combine the functionality of input and output iterators so that most intuitive operations are well-defined. With a forward iterator, you can write both *myItr = value and value = *myItr. Forward iterators, as their name suggests, can only move forward. Thus, ++myItr is legal, but --myItr is not.
  • Bidirectional Iterators. Bidirectional iterators are the iterators exposed by map and set and en- compass all of the functionality of forward iterators. Additionally, they can move backwards with the decrement operator. Thus it's possible to write --myItr to go back to the last element you visited, or even to traverse a list in reverse order. However, bidirectional iterators cannot respond to the + or += operators.
  • Random-Access Iterators. Don't get tripped up by the name – random-access iterators don't move around randomly. Random-access iterators get their name from their ability to move forward and backward by arbitrary amounts at any point. These are the iterators employed by vector and deque and represent the maximum possible functionality, including iterator-from-iterator subtraction, bracket syntax, and incrementing with + and +=.

If you'll notice, each class of iterators is progressively more powerful than the previous one – that is, the iterators form a functionality hierarchy. This means that when a library function requires a certain class of iterator, you can provide it any iterator that's at least as powerful. For example, if a function requires a forward iterator, you can provide either a forward, bidirectional, or random-access iterator. The iterator hierarchy is illustrated below:

Why categorize iterators?

  • Efficiency: In some cases, certain iterator operations cannot be performed efficiently. For instance, the STL map and set are layered on top of balanced binary trees, a structure in which it is simple to move from one element to the next but significantly more complex to jump from one position to another arbitrarily. By disallowing the + operator on map and set iterators, the STL designers prevent subtle sources of inefficiency where simple code like itr + 5 is unreasonably inefficient.
  • Better classification of the STL algorithms. For example, suppose that an algorithm takes as input a pair of input iterators. From this, we can tell that the algorithm will not modify the elements being iterated over, and so can feel free to pass in iterators to data that must not be modified under any circumstance. Similarly, if an algorithm has a parameter that is labeled as an output iterator, it should be clear from context that the iterator parameter defines where data generated by the algorithm should be written.

Lambda expressions

Lambda expressions are often used in conjunction with algorithms when some functionality is only needed in a single context (so if you were to write a function, it would only be called in one place in your code).

Reference: Lambda expressions intro (2 pages), and video (25mins). As always cppreference.com has the complete details!

Green Marinee theme adapted by David Gilbert, powered by PmWiki