Main

Week 5

In this lecture we will introduce templates and containers, starting with vector.

(Anti)-Motivational quote

“I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone.” – Bjarne Stroustrup

A first look at containers

So far we have mostly limited ourselves to C arrays for managing collections of objects. C++ provides a much larger set of options for managing collections through the C++ standard template library (STL), which is available with all C++ compilers.

We will spend much more time on templates, containers and related topics; this is just an introduction and not meant to be complete. We will use TICPP Vol. 2 for additional reading (Chapter 7).

Brief intro to templates

Templates are a way to make "generic" datatypes that can then be specialized for different specific types. It is a powerful mechanism for reusing logic that is the same for data of any type.

For example, suppose you want to print some values, that can be integers, doubles, or strings. You could use function overloading to implement this:

  1. void print(int value) {
  2.      std::cout << "The value is " << value << std::endl;
  3. }
  4.  
  5. void print(double value) {
  6.      std::cout << "The value is " << value << std::endl;
  7. }
  8.  
  9. void print(std::string value) {
  10.      std::cout << "The value is " << value << std::endl;
  11. }

These are almost identical, the only difference is the type of the arguments. Templates enable us to write a single function that can be used for all three (and more) cases:

  1. template<typename T>
  2. void print(T value) {
  3.      std::cout << "The value is " << value << std::endl;
  4. }

The template<typename T> part tells the compiler that the following construct (in this case a function) is a template and that T is a template parameter that designates a type.

Instantiating templates

You can instantiate templates by specifying a concrete type to use anywhere the template parameter T is used. For example:

  1. int i=5;
  2. double d=4.75;
  3. std::string s("hello");
  4. bool b=false;
  5. print(i);  // T is int
  6. print(d); // T is double
  7. print(s); // T is std::string
  8. print(b); // T is bool

You don't have to explicitly assign anything to T, the compiler figures it out automatically based on the type of the argument in the calls to the print function. Internally, you object code will have four distinct definitions of print, but in your source code you only have to implement a single templated print function.

Containers

Containers are all implemented as templates. C++ containers include vector, list, deque, set, multiset, map, multimap, hash_set, hash_multiset, hash_map, and hash_multimap. Each of these classes is a template, and can be instantiated to contain any type of object.

A brief introduction is also available in TICPPv2: A First Look at containers.

Vectors

Suppose that T is any type or class - say int, float, double, or the name of a class, then

 	         vector<T> v;

declares a new and empty vector called v. Given object v declare like the above you can do the following things with it:

  • test to see if v is empty:
 		v.empty()
  • find how many items are in v:
 		v.size()
  • push t in T onto the end of v:
 		v.push_back(t)
  • pop the back of v off v:
 		v.pop_back()
  • Access the ith item (0<=i<size()) without checking to see if it exists:
 		v[i]
  • Assign a copy of v1 to v:
 		v = v1

Iterators

Some containers (like vectors) allow random access through direct indexing with [] similar to arrays. However, others (like set) do not. To do something with each element, we introduce the concept of an iterator.

An iterator is an object that can traverse (iterate over) any container class without the user having to know how the container is implemented. With many classes (particularly lists and the associative classes), iterators are the primary way elements of these classes are accessed. We will consider some examples of iterators below and in the lecture exercises (see week5/ in the lecture code repository).

Iterate forwards then backwards through a vector of ints.

  1. #include <iostream>
  2. #include <vector>
  3.  
  4. int main ()
  5. {
  6.   std::vector<int> myvector (5);  // 5 default-constructed ints
  7.  
  8.   // Iterate forwards, initializing the vector
  9.   std::cout << "forward:";
  10.   int i=0;
  11.  
  12.   std::vector<int>::iterator it = myvector.begin();
  13.   for (it = myvector.begin(); it!= myvector.end(); ++it) {
  14.     // it is a pointer to the current object
  15.     *it = ++i;  
  16.     std::cout << ' ' << (*it);
  17.   }
  18.   std::cout << std::endl;
  19.  
  20.  
  21.   // Iterate backwards
  22.   std::cout << "backward:";
  23.  
  24.   std::vector<int>::reverse_iterator rit = myvector.rbegin();
  25.   for (rit = myvector.rbegin(); rit!= myvector.rend(); ++rit)
  26.     // rit is a pointer to the current object
  27.     std::cout << ' ' << (*rit);
  28.   std::cout << std::endl;
  29.  
  30.   return 0;
  31. }

The following operations are defined for iterators:

  • get value of an iterator, e.g., int x = *it;
  • increment and decrement iterators it1++, it2--;
  • compare iterators by != and by <
  • add an immediate to iterator it += 20; <=> shift 20 elements forward
  • get the distance between iterators, int n = it2 - it1;

Vectors (in more depth)

Perhaps the simplest STL container is vector. Vector is just an array with extended functionality. Vector is the only container that is backward-compatible to native C code – this means that vector can be treated as an array, but with some additional features.

In the examples below, we assume that we are using namespace std;.

  1.  vector<int> v(10);
  2.  for(int i = 0; i < 10; i++) {
  3.       v[i] = (i+1)*(i+1);
  4.  }
  5.  for(int i = 9; i > 0; i--) {
  6.       v[i] -= v[i-1];
  7.  }

When you type

 vector<int> v; 

the empty vector is created. Be careful with constructions like this one:

 vector<int> v[10]; 

Here we declare v as an array of 10 vector<int>s, that is 10 vectors, which are initially empty. If you instead want a vector of 10 elements, use parenthesis:

 vector<int> v(10);

Note that the values of the vector will be initialized to 0. All vectors containing values of basic types will be initialized (unlike C arrays).

Also unlike a C array, a vector knows its size:

 int elements_count = v.size();  

Two things to keep in mind:

  • size() returns an unsigned integer, which may sometimes cause warnings when compiling, you can cast to int with static_cast<int>.
  • It is not a good practice to compare v.size() to zero if you want to know whether the container is empty. Instead, use the empty() function:
 bool is_nonempty_notgood = (v.size() >= 0); // Avoid!
 bool is_nonempty_ok = !v.empty(); // OK

This is because not all the containers can report their size efficiently, and you definitely should not require counting all elements in a double-linked list just to ensure that it contains at least one.

Another very popular function to use in vector is push_back, which adds an element to the end of vector, increasing its size by one. Consider the following example:

  1.  vector<int> v;
  2.  for(int i = 1; i < 1000000; i *= 2) {
  3.       v.push_back(i);
  4.  }
  5.  int elements_count = v.size();

Don’t worry about memory allocation -- vector will not allocate just one element each time. Instead, vector allocates more memory then it actually needs when adding new elements with push_back.

If you allocated a vector of a certain size and wish to resize it later, use the resize() function:

  1.  vector<int> v(20);
  2.  for(int i = 0; i < 20; i++) {
  3.       v[i] = i+1;
  4.  }
  5.  v.resize(25);
  6.  for(int i = 20; i < 25; i++) {
  7.       v[i] = i*2;
  8.  }

The resize() function makes vector contain the required number of elements. If you require less elements than vector already contain, the last ones will be deleted. If you ask a vector to grow, it will enlarge its size and fill the newly created elements with zeroes.

Note that if you use push_back() after resize(), it will add elements after the newly allocated size, but not into it. In the example above the size of the resulting vector is 25, while if we use push_back() in a second loop, it would be 30.

  1.  vector<int> v(20);
  2.  for(int i = 0; i < 20; i++) {
  3.       v[i] = i+1;
  4.  }
  5.  v.resize(25);
  6.  for(int i = 20; i < 25; i++) {
  7.       v.push_back(i*2); // Writes to elements with indices [25..30), not [20..25) ! <
  8.  }

To clear a vector, use the clear() member function. After v.clear(), vector v contains 0 elements. Be careful -- it does not just set the element values to 0 but completely erases the container.

There are many ways to initialize vector. You may create vector from another vector:

 vector<int> v1; 
 // ... 
 vector<int> v2 = v1; 
 vector<int> v3(v1); 

The initialization of v2 and v3 in the example above are exactly the same.

If you want to create a vector of a specific size, use the following constructor:

 vector<int> data(1000); 

In the example above, the data vector will contain 1,000 zeroes after creation.

Remember to use parentheses (), not brackets to specify the number of elements. If you want a vector to be initialized with something else, use a constructor such as this one:

 vector<string> names(20, “Unknown”); 

Remember that you can create vectors of any type.

Multidimensional arrays are very important. The simplest way to create the two-dimensional array via vector is to create a vector of vectors.

 vector< vector<int> > Matrix; 

To create the two-dimensional vector of a given size:

 int N, N; 
 // ... 
 vector< vector<int> > Matrix(N, vector<int>(M, -1)); 

Here we create a matrix of size N*M and fill it with -1.

The simplest way to add data to vector is to use push_back(). But what if we want to add data somewhere other than the end? There is the insert() member function for this purpose. And there is also the erase() member function to erase elements, as well. But first, we need to say a few words about iterators.

You should remember one more very important thing: When a vector is passed as a parameter to some function (not by reference), a copy of the vector is created. It may take a lot of time and memory to create new vectors when they are not really needed. Actually, it’s hard to find a task where the copying of vector is really needed when passing it as a parameter. So, you should never write:

 void some_function(vector<int> v) { // Almost never needed 
      // ... 
 } 

Instead, use the following construction:

 void some_function(const vector<int>& v) { // OK 
      // ... 
 } 

If you are going to change the contents of vector in the function, just omit the ‘const’ modifier.

 int modify_vector(vector<int>& v) { // Correct 
      V[0]++; 
 } 

Pairs, maps

Pairs

The std::pair is just a pair of elements of any type. If you were to define it yourself, it may look like this:

  1.  template<typename T1, typename T2> struct pair {
  2.       T1 first;
  3.       T2 second;
  4.  };

In general pair<int,int> is a pair of integer values. At a more complex level, pair<string, pair<int, int> > is a pair of string and two integers. In the second case, the usage may be like this:

  1.  pair<string, pair<int,int> > P;
  2.  string s = P.first; // extract string
  3.  int x = P.second.first; // extract first int
  4.  int y = P.second.second; // extract second int

One of the very useful features of pairs is that they have built-in operations to compare themselves. Pairs are compared first-to-second element. If the first elements are not equal, the result will be based on the comparison of the first elements only; the second elements will be compared only if the first ones are equal. The array (or vector) of pairs can thus easily be sorted by STL internal functions.

Maps and multimaps

The map and the multimap are both containers that manage key/value pairs as single components. The essential difference between the two is that in a map the keys must be unique, while a multimap permits duplicate keys. In both containers, the sort order of components is the sort order of the keys, with the values corresponding to keys determining the order for duplicate keys in a multimap. Nearly everything said here about maps is also true of multimaps, with minor differences showing up in the count() member function and one of the insert() member functions. Also you cannot index a multimap using the [] notation.

  1.  map<string, int> students;
  2.  students["CIS330"] = 69;
  3.  students["CIS314"] = 60;
  4.  students["CIS315"] = 54;
  5.  
  6.  int all = students["CIS330"] + students["CIS314"] + students["CIS315"];
  7.  
  8.  if(students.find("CIS314") != students.end()) {
  9.       students.erase(students.find("CIS314")); // or even students.erase("CIS314")
  10.  }

The map::operator[]

There is one important difference between map::find() and map::operator []. While map::find() will never change the contents of map, operator [] will create an element if it does not exist. In some cases, this could be very convenient, but it can also be a bad idea to use operator [] many times in a loop, when you do not want to add new elements. That’s why operator [] may not be used if map is passed as a const reference parameter to some function:

  1.  void f(const map<string, int>& M) {
  2.       if(M["the meaning"] == 42) { // Error! Cannot use [] on const map objects!
  3.       }
  4.       if(M.find("the meaning") != M.end() && M.find("the meaning")->second == 42) { // Correct
  5.            cout << "Don't Panic!" << endl;
  6.       }
  7.  }

Green Marinee theme adapted by David Gilbert, powered by PmWiki