MindView Inc.
[ Viewing Hints ] [ Revision History ] [ Free Newsletter ]
[ Seminars ] [ Seminars on CD ROM ] [ Consulting ]

Thinking in C++, 2nd ed., Volume 2, Revision 3

©2000 by Bruce Eckel

[ Previous Chapter ] [ Short TOC ] [ Table of Contents ] [ Index ] [ Next Chapter ]

4: STL Containers & Iterators

Container classes are the solution to a specific kind of code reuse problem. They are building blocks used to create object-oriented programs – they make the internals of a program much easier to construct.

A container class describes an object that holds other objects. Container classes are so important that they were considered fundamental to early object-oriented languages. In Smalltalk, for example, programmers think of the language as the program translator together with the class library, and a critical part of that library is the container classes. So it became natural that C++ compiler vendors also include a container class library. You’ll note that the vector was so useful that it was introduced in its simplest form very early in this book.

Like many other early C++ libraries, early container class libraries followed Smalltalk’s object-based hierarchy, which worked well for Smalltalk, but turned out to be awkward and difficult to use in C++. Another approach was required.

This chapter attempts to slowly work you into the concepts of the C++ Standard Template Library (STL), which is a powerful library of containers (as well as algorithms, but these are covered in the following chapter). In the past, I have taught that there is a relatively small subset of elements and ideas that you need to understand in order to get much of the usefulness from the STL. Although this can be true it turns out that understanding the STL more deeply is important to gain the full power of the library. This chapter and the next probe into the STL containers and algorithms.

Containers and iterators

If you don’t know how many objects you’re going to need to solve a particular problem, or how long they will last, you also don’t know how to store those objects. How can you know how much space to create? You can’t, since that information isn’t known until run time.

The solution to most problems in object-oriented design seems flippant: you create another type of object. For the storage problem, the new type of object holds other objects, or pointers to objects. Of course, you can do the same thing with an array, but there’s more. This new type of object, which is typically referred to in C++ as a container (also called a collection in some languages), will expand itself whenever necessary to accommodate everything you place inside it. So you don’t need to know how many objects you’re going to hold in a collection. You just create a collection object and let it take care of the details.

Fortunately, a good OOP language comes with a set of containers as part of the package. In C++, it’s the Standard Template Library (STL). In some libraries, a generic container is considered good enough for all needs, and in others (C++ in particular) the library has different types of containers for different needs: a vector for consistent access to all elements, and a linked list for consistent insertion at all elements, for example, so you can choose the particular type that fits your needs. These may include sets, queues, hash tables, trees, stacks, etc.

All containers have some way to put things in and get things out. The way that you place something into a container is fairly obvious. There’s a function called “push” or “add” or a similar name. Fetching things out of a container is not always as apparent; if it’s an array-like entity such as a vector, you might be able to use an indexing operator or function. But in many situations this doesn’t make sense. Also, a single-selection function is restrictive. What if you want to manipulate or compare a group of elements in the container?

The solution is an iterator, which is an object whose job is to select the elements within a container and present them to the user of the iterator. As a class, it also provides a level of abstraction. This abstraction can be used to separate the details of the container from the code that’s accessing that container. The container, via the iterator, is abstracted to be simply a sequence. The iterator allows you to traverse that sequence without worrying about the underlying structure – that is, whether it’s a vector, a linked list, a stack or something else. This gives you the flexibility to easily change the underlying data structure without disturbing the code in your program.

From the design standpoint, all you really want is a sequence that can be manipulated to solve your problem. If a single type of sequence satisfied all of your needs, there’d be no reason to have different kinds. There are two reasons that you need a choice of containers. First, containers provide different types of interfaces and external behavior. A stack has a different interface and behavior than that of a queue, which is different than that of a set or a list. One of these might provide a more flexible solution to your problem than the other. Second, different containers have different efficiencies for certain operations. The best example is a vector and a list. Both are simple sequences that can have identical interfaces and external behaviors. But certain operations can have radically different costs. Randomly accessing elements in a vector is a constant-time operation; it takes the same amount of time regardless of the element you select. However, in a linked list it is expensive to move through the list to randomly select an element, and it takes longer to find an element if it is further down the list. On the other hand, if you want to insert an element in the middle of a sequence, it’s much cheaper in a list than in a vector. These and other operations have different efficiencies depending upon the underlying structure of the sequence. In the design phase, you might start with a list and, when tuning for performance, change to a vector. Because of the abstraction via iterators, you can change from one to the other with minimal impact on your code.

In the end, remember that a container is only a storage cabinet to put objects in. If that cabinet solves all of your needs, it doesn’t really matter how it is implemented (a basic concept with most types of objects). If you’re working in a programming environment that has built-in overhead due to other factors, then the cost difference between a vector and a linked list might not matter. You might need only one type of sequence. You can even imagine the “perfect” container abstraction, which can automatically change its underlying implementation according to the way it is used.

STL reference documentation

You will notice that this chapter does not contain exhaustive documentation describing each of the member functions in each STL container. Although I describe the member functions that I use, I’ve left the full descriptions to others: there are at least two very good on-line sources of STL documentation in HTML format that you can keep resident on your computer and view with a Web browser whenever you need to look something up. The first is the Dinkumware library (which covers the entire Standard C and C++ library) mentioned at the beginning of this book section (page XXX). The second is the freely-downloadable SGI STL and documentation, freely downloadable at http://www.sgi.com/Technology/STL/. These should provide complete references when you’re writing code. In addition, the STL books listed in Appendix XX will provide you with other resources.

The Standard Template Library

The C++ STL[15] is a powerful library intended to satisfy the vast bulk of your needs for containers and algorithms, but in a completely portable fashion. This means that not only are your programs easier to port to other platforms, but that your knowledge itself does not depend on the libraries provided by a particular compiler vendor (and the STL is likely to be more tested and scrutinized than a particular vendor’s library). Thus, it will benefit you greatly to look first to the STL for containers and algorithms, before looking at vendor-specific solutions.

A fundamental principle of software design is that all problems can be simplified by introducing an extra level of indirection. This simplicity is achieved in the STL using iterators to perform operations on a data structure while knowing as little as possible about that structure, thus producing data structure independence. With the STL, this means that any operation that can be performed on an array of objects can also be performed on an STL container of objects and vice versa. The STL containers work just as easily with built-in types as they do with user-defined types. If you learn the library, it will work on everything.

The drawback to this independence is that you’ll have to take a little time at first getting used to the way things are done in the STL. However, the STL uses a consistent pattern, so once you fit your mind around it, it doesn’t change from one STL tool to another.

Consider an example using the STL set class. A set will allow only one of each object value to be inserted into itself. Here is a simple set created to work with ints by providing int as the template argument to set:

//: C04:Intset.cpp
// Simple use of STL set
#include <set>
#include <iostream>
using namespace std;

int main() {
 set<int> intset;
  for(int i = 0; i < 25; i++)
    for(int j = 0; j < 10; j++)
      // Try to insert multiple copies:
      intset.insert(j);
  // Print to output:
 copy(intset.begin(), intset.end(),
   ostream_iterator<int>(cout, "\n"));

} ///:~

The insert( ) member does all the work: it tries putting the new element in and rejects it if it’s already there. Very often the activities involved in using a set are simply insertion and a test to see whether it contains the element. You can also form a union, intersection, or difference of sets, and test to see if one set is a subset of another.

In this example, the values 0 - 9 are inserted into the set 25 times, and the results are printed out to show that only one of each of the values is actually retained in the set.

The copy( ) function is actually the instantiation of an STL template function, of which there are many. These template functions are generally referred to as “the STL Algorithms” and will be the subject of the following chapter. However, several of the algorithms are so useful that they will be introduced in this chapter. Here, copy( ) shows the use of iterators. The set member functions begin( ) and end( ) produce iterators as their return values. These are used by copy( ) as beginning and ending points for its operation, which is simply to move between the boundaries established by the iterators and copy the elements to the third argument, which is also an iterator, but in this case, a special type created for iostreams. This places int objects on cout and separates them with a newline.

Because of its genericity, copy( ) is certainly not restricted to printing on a stream. It can be used in virtually any situation: it needs only three iterators to talk to. All of the algorithms follow the form of copy( ) and simply manipulate iterators (the use of iterators is the “extra level of indirection”).

Now consider taking the form of Intset.cpp and reshaping it to display a list of the words used in a document. The solution becomes remarkably simple.

//: C04:WordSet.cpp
#include "../require.h"
#include <string>
#include <fstream>
#include <iostream>
#include <set>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream source(argv[1]);
  assure(source, argv[1]);
  string word;
  set<string> words;
  while(source >> word)
    words.insert(word);
  copy(words.begin(), words.end(),
    ostream_iterator<string>(cout, "\n"));
  cout << "Number of unique words:" 
    << words.size() << endl;

} ///:~

The only substantive difference here is that string is used instead of int. The words are pulled from a file, but everything else is the same as in Intset.cpp. The operator>> returns a whitespace-separated group of characters each time it is called, until there’s no more input from the file. So it approximately breaks an input stream up into words. Each string is placed in the set using insert( ), and the copy( ) function is used to display the results. Because of the way set is implemented (as a tree), the words are automatically sorted.

Consider how much effort it would be to accomplish the same task in C, or even in C++ without the STL.

The basic concepts

The primary idea in the STL is the container (also known as a collection), which is just what it sounds like: a place to hold things. You need containers because objects are constantly marching in and out of your program and there must be someplace to put them while they’re around. You can’t make named local objects because in a typical program you don’t know how many, or what type, or the lifetime of the objects you’re working with. So you need a container that will expand whenever necessary to fill your needs.

All the containers in the STL hold objects and expand themselves. In addition, they hold your objects in a particular way. The difference between one container and another is the way the objects are held and how the sequence is created. Let’s start by looking at the simplest containers.

A vector is a linear sequence that allows rapid random access to its elements. However, it’s expensive to insert an element in the middle of the sequence, and is also expensive when it allocates additional storage. A deque is also a linear sequence, and it allows random access that’s nearly as fast as vector, but it’s significantly faster when it needs to allocate new storage, and you can easily add new elements at either end (vector only allows the addition of elements at its tail). A list the third type of basic linear sequence, but it’s expensive to move around randomly and cheap to insert an element in the middle. Thus list, deque and vector are very similar in their basic functionality (they all hold linear sequences), but different in the cost of their activities. So for your first shot at a program, you could choose any one, and only experiment with the others if you’re tuning for efficiency.

Many of the problems you set out to solve will only require a simple linear sequence like a vector, deque or list. All three have a member function push_back( ) which you use to insert a new element at the back of the sequence (deque and list also have push_front( )).

But now how do you retrieve those elements? With a vector or deque, it is possible to use the indexing operator[ ], but that doesn’t work with list. Since it would be nicest to learn a single interface, we’ll often use the one defined for all STL containers: the iterator.

An iterator is a class that abstracts the process of moving through a sequence. It allows you to select each element of a sequence without knowing the underlying structure of that sequence. This is a powerful feature, partly because it allows us to learn a single interface that works with all containers, and partly because it allows containers to be used interchangeably.

One more observation and you’re ready for another example. Even though the STL containers hold objects by value (that is, they hold the whole object inside themselves) that’s probably not the way you’ll generally use them if you’re doing object-oriented programming. That’s because in OOP, most of the time you’ll create objects on the heap with new and then upcast the address to the base-class type, later manipulating it as a pointer to the base class. The beauty of this is that you don’t worry about the specific type of object you’re dealing with, which greatly reduces the complexity of your code and increases the maintainability of your program. This process of upcasting is what you try to do in OOP with polymorphism, so you’ll usually be using containers of pointers.

Consider the classic “shape” example where shapes have a set of common operations, and you have different types of shapes. Here’s what it looks like using the STL vector to hold pointers to various types of Shape created on the heap:

//: C04:Stlshape.cpp
// Simple shapes w/ STL
#include <vector>
#include <iostream>
using namespace std;

class Shape {
public:
  virtual void draw() = 0;
  virtual ~Shape() {};
};

class Circle : public Shape {
public:
  void draw() { cout << "Circle::draw\n"; }
  ~Circle() { cout << "~Circle\n"; }
};

class Triangle : public Shape {
public:
  void draw() { cout << "Triangle::draw\n"; }
  ~Triangle() { cout << "~Triangle\n"; }
};

class Square : public Shape {
public:
  void draw() { cout << "Square::draw\n"; }
  ~Square() { cout << "~Square\n"; }
};

typedef std::vector<Shape*> Container;
typedef Container::iterator Iter;

int main() {
  Container shapes;
  shapes.push_back(new Circle);
  shapes.push_back(new Square);
  shapes.push_back(new Triangle);
  for(Iter i = shapes.begin();
      i != shapes.end(); i++)
    (*i)->draw();
  // ... Sometime later:
  for(Iter j = shapes.begin();
      j != shapes.end(); j++)
    delete *j;

} ///:~

The creation of Shape, Circle, Square and Triangle should be fairly familiar. Shape is a pure abstract base class (because of the pure specifier =0) that defines the interface for all types of shapes. The derived classes redefine the virtual function draw( ) to perform the appropriate operation. Now we’d like to create a bunch of different types of Shape object, but where to put them? In an STL container, of course. For convenience, this typedef:

typedef std::vector<Shape*> Container;

creates an alias for a vector of Shape*, and this typedef:

typedef Container::iterator Iter;

uses that alias to create another one, for vector<Shape*>::iterator. Notice that the container type name must be used to produce the appropriate iterator, which is defined as a nested class. Although there are different types of iterators (forward, bidirectional, reverse, etc., which will be explained later) they all have the same basic interface: you can increment them with ++, you can dereference them to produce the object they’re currently selecting, and you can test them to see if they’re at the end of the sequence. That’s what you’ll want to do 90% of the time. And that’s what is done in the above example: after creating a container, it’s filled with different types of Shape*. Notice that the upcast happens as the Circle, Square or Rectangle pointer is added to the shapes container, which doesn’t know about those specific types but instead holds only Shape*. So as soon as the pointer is added to the container it loses its specific identity and becomes an anonymous Shape*. This is exactly what we want: toss them all in and let polymorphism sort it out.

The first for loop creates an iterator and sets it to the beginning of the sequence by calling the begin( ) member function for the container. All containers have begin( ) and end( ) member functions that produce an iterator selecting, respectively, the beginning of the sequence and one past the end of the sequence. To test to see if you’re done, you make sure you’re != to the iterator produced by end( ). Not < or <=. The only test that works is !=. So it’s very common to write a loop like:

for(Iter i = shapes.begin(); i != shapes.end(); i++)

This says: “take me through every element in the sequence.”

What do you do with the iterator to produce the element it’s selecting? You dereference it using (what else) the ‘*’ (which is actually an overloaded operator). What you get back is whatever the container is holding. This container holds Shape*, so that’s what *i produces. If you want to send a message to the Shape, you must select that message with ->, so you write the line:

(*i)->draw();

This calls the draw( ) function for the Shape* the iterator is currently selecting. The parentheses are ugly but necessary to produce the proper order of evaluation. As an alternative, operator-> is defined so that you can say:

i->draw();

As they are destroyed or in other cases where the pointers are removed, the STL containers do not call delete for the pointers they contain. If you create an object on the heap with new and place its pointer in a container, the container can’t tell if that pointer is also placed inside another container. So the STL just doesn’t do anything about it, and puts the responsibility squarely in your lap. The last lines in the program move through and delete every object in the container so proper cleanup occurs.

It’s very interesting to note that you can change the type of container that this program uses with two lines. Instead of including <vector>, you include <list>, and in the first typedef you say:

typedef std::list<Shape*> Container;

instead of using a vector. Everything else goes untouched. This is possible not because of an interface enforced by inheritance (there isn’t any inheritance in the STL, which comes as a surprise when you first see it), but because the interface is enforced by a convention adopted by the designers of the STL, precisely so you could perform this kind of interchange. Now you can easily switch between vector and list and see which one works fastest for your needs.

Containers of strings

In the prior example, at the end of main( ), it was necessary to move through the whole list and delete all the Shape pointers.

for(Iter j = shapes.begin();
      j != shapes.end(); j++)

delete *j;

This highlights what could be seen as a flaw in the STL: there’s no facility in any of the STL containers to automatically delete the pointers they contain, so you must do it by hand. It’s as if the assumption of the STL designers was that containers of pointers weren’t an interesting problem, although I assert that it is one of the more common things you’ll want to do.

Automatically deleting a pointer turns out to be a rather aggressive thing to do because of the multiple membership problem. If a container holds a pointer to an object, it’s not unlikely that pointer could also be in another container. A pointer to an Aluminum object in a list of Trash pointers could also reside in a list of Aluminum pointers. If that happens, which list is responsible for cleaning up that object – that is, which list “owns” the object?

This question is virtually eliminated if the object rather than a pointer resides in the list. Then it seems clear that when the list is destroyed, the objects it contains must also be destroyed. Here, the STL shines, as you can see when creating a container of string objects. The following example stores each incoming line as a string in a vector<string>:

//: C04:StringVector.cpp
// A vector of strings
#include "../require.h"
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  vector<string> strings;
  string line;
  while(getline(in, line))
    strings.push_back(line);
  // Do something to the strings...
  int i = 1;
  vector<string>::iterator w;
  for(w = strings.begin();
      w != strings.end(); w++) {
    ostringstream ss;
    ss << i++;
    *w = ss.str() + ": " + *w;
  }
  // Now send them out:
  copy(strings.begin(), strings.end(),
    ostream_iterator<string>(cout, "\n"));
  // Since they aren't pointers, string 
  // objects clean themselves up! 

} ///:~

Once the vector<string> called strings is created, each line in the file is read into a string and put in the vector:

  while(getline(in, line))

strings.push_back(line);

The operation that’s being performed on this file is to add line numbers. A stringstream provides easy conversion from an int to a string of characters representing that int.

Assembling string objects is quite easy, since operator+ is overloaded. Sensibly enough, the iterator w can be dereferenced to produce a string that can be used as both an rvalue and an lvalue:

*w = ss.str() + ": " + *w;

The fact that you can assign back into the container via the iterator may seem a bit surprising at first, but it’s a tribute to the careful design of the STL.

Because the vector<string> contains the objects themselves, a number of interesting things take place. First, no cleanup is necessary. Even if you were to put addresses of the string objects as pointers into other containers, it’s clear that strings is the “master list” and maintains ownership of the objects.

Second, you are effectively using dynamic object creation, and yet you never use new or delete! That’s because, somehow, it’s all taken care of for you by the vector (this is non-trivial. You can try to figure it out by looking at the header files for the STL – all the code is there – but it’s quite an exercise). Thus your coding is significantly cleaned up.

The limitation of holding objects instead of pointers inside containers is quite severe: you can’t upcast from derived types, thus you can’t use polymorphism. The problem with upcasting objects by value is that they get sliced and converted until their type is completely changed into the base type, and there’s no remnant of the derived type left. It’s pretty safe to say that you never want to do this.

Inheriting from STL containers

The power of instantly creating a sequence of elements is amazing, and it makes you realize how much time you’ve spent (or rather, wasted) in the past solving this particular problem. For example, many utility programs involve reading a file into memory, modifying the file and writing it back out to disk. One might as well take the functionality in StringVector.cpp and package it into a class for later reuse.

Now the question is: do you create a member object of type vector, or do you inherit? A general guideline is to always prefer composition (member objects) over inheritance, but with the STL this is often not true, because there are so many existing algorithms that work with the STL types that you may want your new type to be an STL type. So the list of strings should also be a vector, thus inheritance is desired.

//: C04:FileEditor.h
// File editor tool
#ifndef FILEEDITOR_H
#define FILEEDITOR_H
#include <string>
#include <vector>
#include <iostream>

class FileEditor : 
public std::vector<std::string> {
public:
  FileEditor(char* filename);
  void write(std::ostream& out = std::cout);
};

#endif // FILEEDITOR_H ///:~

Note the careful avoidance of a global using namespace std statement here, to prevent the opening of the std namespace to every file that includes this header.

The constructor opens the file and reads it into the FileEditor, and write( ) puts the vector of string onto any ostream. Notice in write( ) that you can have a default argument for a reference.

The implementation is quite simple:

//: C04:FileEditor.cpp {O}
#include "FileEditor.h"
#include "../require.h"
#include <fstream>
using namespace std;

FileEditor::FileEditor(char* filename) {
  ifstream in(filename);
  assure(in, filename);
  string line;
  while(getline(in, line))
    push_back(line);
}

// Could also use copy() here:
void FileEditor::write(ostream& out) {
  for(iterator w = begin();  w != end(); w++)
    out << *w << endl;

} ///:~

The functions from StringVector.cpp are simply repackaged. Often this is the way classes evolve – you start by creating a program to solve a particular application, then discover some commonly-used functionality within the program that can be turned into a class.

The line numbering program can now be rewritten using FileEditor:

//: C04:FEditTest.cpp
//{L} FileEditor
// Test the FileEditor tool
#include "FileEditor.h"
#include "../require.h"
#include <sstream>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  FileEditor file(argv[1]);
  // Do something to the lines...
  int i = 1;
  FileEditor::iterator w = file.begin();
  while(w != file.end()) {
    ostringstream ss;
    ss << i++;
    *w = ss.str() + ": " + *w;
    w++;
  }
  // Now send them to cout:
  file.write();

} ///:~

Now the operation of reading the file is in the constructor:

FileEditor file(argv[1]);

and writing happens in the single line (which defaults to sending the output to cout):

file.write();

The bulk of the program is involved with actually modifying the file in memory.

A plethora of iterators

As mentioned earlier, the iterator is the abstraction that allows a piece of code to be generic, and to work with different types of containers without knowing the underlying structure of those containers. Every container produces iterators. You must always be able to say:

ContainerType::iterator

ContainerType::const_iterator

to produce the types of the iterators produced by that container. Every container has a begin( ) method that produces an iterator indicating the beginning of the elements in the container, and an end( ) method that produces an iterator which is the as the past-the-end value of the container. If the container is const¸ begin( ) and end( ) produce const iterators.

Every iterator can be moved forward to the next element using the operator++ (an iterator may be able to do more than this, as you shall see, but it must at least support forward movement with operator++).

The basic iterator is only guaranteed to be able to perform == and != comparisons. Thus, to move an iterator it forward without running it off the end you say something like:

while(it != pastEnd) {
  // Do something
  it++;

}

Where pastEnd is the past-the-end value produced by the container’s end( ) member function.

An iterator can be used to produce the element that it is currently selecting within a container by dereferencing the iterator. This can take two forms. If it is an iterator and f( ) is a member function of the objects held in the container that the iterator is pointing within, then you can say either:

(*it).f();

or

it->f();

Knowing this, you can create a template that works with any container. Here, the apply( ) function template calls a member function for every object in the container, using a pointer to member that is passed as an argument:

//: C04:Apply.cpp
// Using basic iterators
#include <iostream>
#include <vector>
#include <iterator>
using namespace std;

template<class Cont, class PtrMemFun>
void apply(Cont& c, PtrMemFun f) {
  typename Cont::iterator it = c.begin();
  while(it != c.end()) {
    (it->*f)(); // Compact form
    ((*it).*f)(); // Alternate form
    it++;
  }
}

class Z {
  int i;
public:
  Z(int ii) : i(ii) {}
  void g() { i++; }
  friend ostream& 
  operator<<(ostream& os, const Z& z) {
    return os << z.i;
  }
};

int main() {
  ostream_iterator<Z> out(cout, " ");
  vector<Z> vz;
  for(int i = 0; i < 10; i++)
    vz.push_back(Z(i));
  copy(vz.begin(), vz.end(), out);
  cout << endl;
  apply(vz, &Z::g);
  copy(vz.begin(), vz.end(), out);

} ///:~

Because operator-> is defined for STL iterators, it can be used for pointer-to-member dereferencing (in the following chapter you’ll learn a more elegant way to handle the problem of applying a member function or ordinary function to every object in a container).

Much of the time, this is all you need to know about iterators – that they are produced by begin( ) and end( ), and that you can use them to move through a container and select elements. Many of the problems that you solve, and the STL algorithms (covered in the next chapter) will allow you to just flail away with the basics of iterators. However, things can at times become more subtle, and in those cases you need to know more about iterators. The rest of this section gives you the details.

Iterators in reversible containers

All containers must produce the basic iterator. A container may also be reversible, which means that it can produce iterators that move backwards from the end, as well as the iterators that move forward from the beginning.

A reversible container has the methods rbegin( ) (to produce a reverse_iterator selecting the end) and rend( ) (to produce a reverse_iterator indicating “one past the beginning”). If the container is const then rbegin( ) and rend( ) will produce const_reverse_iterators.

All the basic sequence containers vector, deque and list are reversible containers. The following example uses vector, but will work with deque and list as well:

//: C04:Reversible.cpp
// Using reversible containers
#include "../require.h"
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
  ifstream in("Reversible.cpp");
  assure(in, "Reversible.cpp");
  string line;
  vector<string> lines;
  while(getline(in, line))
    lines.push_back(line);
  vector<string>::reverse_iterator r;
  for(r = lines.rbegin(); r != lines.rend(); r++)
    cout << *r << endl;

} ///:~

You move backward through the container using the same syntax as moving forward through a container with an ordinary iterator.

The associative containers set, multiset, map and multimap are also reversible. Using iterators with associative containers is a bit different, however, and will be delayed until those containers are more fully introduced.

Iterator categories

The iterators are classified into different “categories” which describe what they are capable of doing. The order in which they are generally described moves from the categories with the most restricted behavior to those with the most powerful behavior.

Input: read-only, one pass

The only predefined implementations of input iterators are istream_iterator and istreambuf_iterator, to read from an istream. As you can imagine, an input iterator can only be dereferenced once for each element that’s selected, just as you can only read a particular portion of an input stream once. They can only move forward. There is a special constructor to define the past-the-end value. In summary, you can dereference it for reading (once only for each value), and move it forward.

Output: write-only, one pass

This is the complement of an input iterator, but for writing rather than reading. The only predefined implementations of output iterators are ostream_iterator and ostreambuf_iterator, to write to an ostream, and the less-commonly-used raw_storage_iterator. Again, these can only be dereferenced once for each written value, and they can only move forward. There is no concept of a terminal past-the-end value for an output iterator. Summarizing, you can dereference it for writing (once only for each value) and move it forward.

Forward: multiple read/write

The forward iterator contains all the functionality of both the input iterator and the output iterator, plus you can dereference an iterator location multiple times, so you can read and write to a value multiple times. As the name implies, you can only move forward. There are no predefined iterators that are only forward iterators.

Bidirectional: operator--

The bidirectional iterator has all the functionality of the forward iterator, and in addition it can be moved backwards one location at a time using operator--.

Random-access: like a pointer

Finally, the random-access iterator has all the functionality of the bidirectional iterator plus all the functionality of a pointer (a pointer is a random-access iterator). Basically, anything you can do with a pointer you can do with a random-access iterator, including indexing with operator[ ], adding integral values to a pointer to move it forward or backward by a number of locations, and comparing one iterator to another with <, >=, etc.

Is this really important?

Why do you care about this categorization? When you’re just using containers in a straightforward way (for example, just hand-coding all the operations you want to perform on the objects in the container) it usually doesn’t impact you too much. Things either work or they don’t. The iterator categories become important when:

  1. You use some of the fancier built-in iterator types that will be demonstrated shortly. Or you graduate to creating your own iterators (this will also be demonstrated, later in this chapter).
  2. You use the STL algorithms (the subject of the next chapter). Each of the algorithms have requirements that they place on the iterators that they work with. Knowledge of the iterator categories is even more important when you create your own reusable algorithm templates, because the iterator category that your algorithm requires determines how flexible the algorithm will be. If you only require the most primitive iterator category (input or output) then your algorithm will work with everything (copy( ) is an example of this).

Predefined iterators

The STL has a predefined set of iterator classes that can be quite handy. For example, you’ve already seen reverse_iterator (produced by calling rbegin( ) and rend( ) for all the basic containers).

The insertion iterators are necessary because some of the STL algorithms – copy( ) for example – use the assignment operator= in order to place objects in the destination container. This is a problem when you’re using the algorithm to fill the container rather than to overwrite items that are already in the destination container. That is, when the space isn’t already there. What the insert iterators do is change the implementation of the operator= so that instead of doing an assignment, it calls a “push” or “insert” function for that container, thus causing it to allocate new space. The constructors for both back_insert_iterator and front_insert_iterator take a basic sequence container object (vector, deque or list) as their argument and produce an iterator that calls push_back( ) or push_front( ), respectively, to perform assignment. The shorthand functions back_inserter( ) and front_inserter( ) produce the same objects with a little less typing. Since all the basic sequence containers support push_back( ), you will probably find yourself using back_inserter( ) with some regularity.

The insert_iterator allows you to insert elements in the middle of the sequence, again replacing the meaning of operator=, but this time with insert( ) instead of one of the “push” functions. The insert( ) member function requires an iterator indicating the place to insert before, so the insert_iterator requires this iterator in addition to the container object. The shorthand function inserter( ) produces the same object.

The following example shows the use of the different types of inserters:

//: C04:Inserters.cpp
// Different types of iterator inserters
#include <iostream>
#include <vector>
#include <deque>
#include <list>
#include <iterator>
using namespace std;

int a[] = { 1, 3, 5, 7, 11, 13, 17, 19, 23 };

template<class Cont>
void frontInsertion(Cont& ci) {
  copy(a, a + sizeof(a)/sizeof(int), 
    front_inserter(ci));
  copy(ci.begin(), ci.end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
}

template<class Cont>
void backInsertion(Cont& ci) {
  copy(a, a + sizeof(a)/sizeof(int), 
    back_inserter(ci));
  copy(ci.begin(), ci.end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
}

template<class Cont>
void midInsertion(Cont& ci) {
  typename Cont::iterator it = ci.begin();
  it++; it++; it++;
  copy(a, a + sizeof(a)/(sizeof(int) * 2),
    inserter(ci, it));
  copy(ci.begin(), ci.end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
}

int main() {
  deque<int> di;
  list<int>  li;
  vector<int> vi;
  // Can't use a front_inserter() with vector
  frontInsertion(di);
  frontInsertion(li);
  di.clear();
  li.clear();
  backInsertion(vi);
  backInsertion(di);
  backInsertion(li);
  midInsertion(vi);
  midInsertion(di);
  midInsertion(li);

} ///:~

Since vector does not support push_front( ), it cannot produce a front_insertion_iterator. However, you can see that vector does support the other two types of insertion (even though, as you shall see later, insert( ) is not a very efficient operation for vector).

IO stream iterators

You’ve already seen some use of the ostream_iterator (an output iterator) in conjunction with copy( ) to place the contents of a container on an output stream. There is a corresponding istream_iterator (an input iterator) which allows you to “iterate” a set of objects of a specified type from an input stream. An important difference between ostream_iterator and istream_iterator comes from the fact that an output stream doesn’t have any concept of an “end,” since you can always just keep writing more elements. However, an input stream eventually terminates (for example, when you reach the end of a file) so there needs to be a way to represent that. An istream_iterator has two constructors, one that takes an istream and produces the iterator you actually read from, and the other which is the default constructor and produces an object which is the past-the-end sentinel. In the following program this object is named end:

//: C04:StreamIt.cpp
// Iterators for istreams and ostreams
#include "../require.h"
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;

int main() {
  ifstream in("StreamIt.cpp");
  assure(in, "StreamIt.cpp");
  istream_iterator<string> init(in), end;
  ostream_iterator<string> out(cout, "\n");
  vector<string> vs;
  copy(init, end, back_inserter(vs));
  copy(vs.begin(), vs.end(), out);
  *out++ = vs[0];
  *out++ = "That's all, folks!";

} ///:~

When in runs out of input (in this case when the end of the file is reached) then init becomes equivalent to end and the copy( ) terminates.

Because out is an ostream_iterator<string>, you can simply assign any string object to the dereferenced iterator using operator= and that string will be placed on the output stream, as seen in the two assignments to out. Because out is defined with a newline as its second argument, these assignments also cause a newline to be inserted along with each assignment.

While it is possible to create an istream_iterator<char> and ostream_iterator<char>, these actually parse the input and thus will for example automatically eat whitespace (spaces, tabs and newlines), which is not desirable if you want to manipulate an exact representation of an istream. Instead, you can use the special iterators istreambuf_iterator and ostreambuf_iterator, which are designed strictly to move characters[16]. Although these are templates, the only template arguments they will accept are either char or wchar_t (for wide characters). The following example allows you to compare the behavior of the stream iterators vs. the streambuf iterators:

//: C04:StreambufIterator.cpp
// istreambuf_iterator & ostreambuf_iterator
#include "../require.h"
#include <iostream>
#include <fstream>
#include <iterator>
#include <algorithm>
using namespace std;

int main() {
  ifstream in("StreambufIterator.cpp");
  assure(in, "StreambufIterator.cpp");
  // Exact representation of stream:
  istreambuf_iterator<char> isb(in), end;
  ostreambuf_iterator<char> osb(cout);
  while(isb != end)
    *osb++ = *isb++; // Copy 'in' to cout
  cout << endl;
  ifstream in2("StreambufIterator.cpp");
  // Strips white space:
  istream_iterator<char> is(in2), end2;
  ostream_iterator<char> os(cout);
  while(is != end2)
    *os++ = *is++;
  cout << endl;

} ///:~

The stream iterators use the parsing defined by istream::operator>>, which is probably not
what you want if you are parsing characters directly – it’s fairly rare that you would want all the whitespace stripped out of your character stream. You’ll virtually always want to use a streambuf iterator when using characters and streams, rather than a stream iterator. In addition, istream::operator>> adds significant overhead for each operation, so it is only appropriate for higher-level operations such as parsing floating-point numbers.[17]

Manipulating raw storage

This is a little more esoteric and is generally used in the implementation of other Standard Library functions, but it is nonetheless interesting. The raw_storage_iterator is defined in <algorithm> and is an output iterator. It is provided to enable algorithms to store their results into uninitialized memory. The interface is quite simple: the constructor takes an output iterator that is pointing to the raw memory (thus it is typically a pointer) and the operator= assigns an object into that raw memory. The template parameters are the type of the output iterator pointing to the raw storage, and the type of object that will be stored. Here’s an example which creates Noisy objects (you’ll be introduced to the Noisy class shortly; it’s not necessary to know its details for this example):

//: C04:RawStorageIterator.cpp
// Demonstrate the raw_storage_iterator
#include "Noisy.h"
#include <iostream>
#include <iterator>
#include <algorithm>
using namespace std;

int main() {
  const int quantity = 10;
  // Create raw storage and cast to desired type:
  Noisy* np = 
    (Noisy*)new char[quantity * sizeof(Noisy)];
  raw_storage_iterator<Noisy*, Noisy> rsi(np);
  for(int i = 0; i < quantity; i++)
    *rsi++ = Noisy(); // Place objects in storage
  cout << endl;
  copy(np, np + quantity,
    ostream_iterator<Noisy>(cout, " "));
  cout << endl;
  // Explicit destructor call for cleanup:
  for(int j = 0; j < quantity; j++)
    (&np[j])->~Noisy();
  // Release raw storage:
  delete (char*)np;

} ///:~

To make the raw_storage_iterator template happy, the raw storage must be of the same type as the objects you’re creating. That’s why the pointer from the new array of char is cast to a Noisy*. The assignment operator forces the objects into the raw storage using the copy-constructor. Note that the explicit destructor call must be made for proper cleanup, and this also allows the objects to be deleted one at a time during container manipulation.

Basic sequences:
vector, list & deque

If you take a step back from the STL containers you’ll see that there are really only two types of container: sequences (including vector, list, deque, stack, queue, and priority_queue) and associations (including set, multiset, map and multimap). The sequences keep the objects in whatever sequence that you establish (either by pushing the objects on the end or inserting them in the middle).

Since all the sequence containers have the same basic goal (to maintain your order) they seem relatively interchangeable. However, they differ in the efficiency of their operations, so if you are going to manipulate a sequence in a particular fashion you can choose the appropriate container for those types of manipulations. The “basic” sequence containers are vector, list and deque – these actually have fleshed-out implementations, while stack, queue and priority_queue are built on top of the basic sequences, and represent more specialized uses rather than differences in underlying structure (stack, for example, can be implemented using a deque, vector or list).

So far in this book I have been using vector as a catch-all container. This was acceptable because I’ve only used the simplest and safest operations, primarily push_back( ) and operator[ ]. However, when you start making more sophisticated uses of containers it becomes important to know more about their underlying implementations and behavior, so you can make the right choices (and, as you’ll see, stay out of trouble).

Basic sequence operations

Using a template, the following example shows the operations that all the basic sequences (vector, deque or list) support. As you shall learn in the sections on the specific sequence containers, not all of these operations make sense for each basic sequence, but they are supported.

//: C04:BasicSequenceOperations.cpp
// The operations available for all the 
// basic sequence Containers.
#include <iostream>
#include <vector>
#include <deque>
#include <list>
using namespace std;

template<typename Container>
void print(Container& c, char* s = "") {
  cout << s << ":" << endl;
  if(c.empty()) {
    cout << "(empty)" << endl;
    return;
  }
  typename Container::iterator it;
  for(it = c.begin(); it != c.end(); it++)
    cout << *it << " ";
  cout << endl;
  cout << "size() " << c.size() 
    << " max_size() "<< c.max_size() 
    << " front() " << c.front()
    << " back() " << c.back() << endl;
}
  
template<typename ContainerOfInt>
void basicOps(char* s) {
  cout << "------- " << s << " -------" << endl;
  typedef ContainerOfInt Ci;
  Ci c;
  print(c, "c after default constructor");
  Ci c2(10, 1); // 10 elements, values all 1
  print(c2, "c2 after constructor(10,1)");
  int ia[] = { 1, 3, 5, 7, 9 };
  const int iasz = sizeof(ia)/sizeof(*ia);
  // Initialize with begin & end iterators:
  Ci c3(ia, ia + iasz);
  print(c3, "c3 after constructor(iter,iter)");
  Ci c4(c2); // Copy-constructor
  print(c4, "c4 after copy-constructor(c2)");
  c = c2; // Assignment operator
  print(c, "c after operator=c2");
  c.assign(10, 2); // 10 elements, values all 2
  print(c, "c after assign(10, 2)");
  // Assign with begin & end iterators:
  c.assign(ia, ia + iasz);
  print(c, "c after assign(iter, iter)");
  cout << "c using reverse iterators:" << endl;
  typename Ci::reverse_iterator rit = c.rbegin();
  while(rit != c.rend())
    cout << *rit++ << " ";
  cout << endl;
  c.resize(4);
  print(c, "c after resize(4)");
  c.push_back(47);
  print(c, "c after push_back(47)");
  c.pop_back();
  print(c, "c after pop_back()");
  typename Ci::iterator it = c.begin();
  it++; it++;
  c.insert(it, 74);
  print(c, "c after insert(it, 74)");
  it = c.begin();
  it++;
  c.insert(it, 3, 96);
  print(c, "c after insert(it, 3, 96)");
  it = c.begin();
  it++;
  c.insert(it, c3.begin(), c3.end());
  print(c, "c after insert("
    "it, c3.begin(), c3.end())");
  it = c.begin();
  it++;
  c.erase(it);
  print(c, "c after erase(it)");
  typename Ci::iterator it2 = it = c.begin();
  it++;
  it2++; it2++; it2++; it2++; it2++;
  c.erase(it, it2);
  print(c, "c after erase(it, it2)");
  c.swap(c2);
  print(c, "c after swap(c2)");
  c.clear();
  print(c, "c after clear()");
}

int main() {
  basicOps<vector<int> >("vector");
  basicOps<deque<int> >("deque");
  basicOps<list<int> >("list");

} ///:~

The first function template, print( ), demonstrates the basic information you can get from any sequence container: whether it’s empty, its current size, the size of the largest possible container, the element at the beginning and the element at the end. You can also see that every container has begin( ) and end( ) methods that return iterators.

The basicOps( ) function tests everything else (and in turn calls print( )), including a variety of constructors: default, copy-constructor, quantity and initial value, and beginning and ending iterators. There’s an assignment operator= and two kinds of assign( ) member functions, one which takes a quantity and initial value and the other which take a beginning and ending iterator.

All the basic sequence containers are reversible containers, as shown by the use of the rbegin( ) and rend( ) member functions. A sequence container can be resized, and the entire contents of the container can be removed with clear( ).

Using an iterator to indicate where you want to start inserting into any sequence container, you can insert( ) a single element, a number of elements that all have the same value, and a group of elements from another container using the beginning and ending iterators of that group.

To erase( ) a single element from the middle, use an iterator; to erase( ) a range of elements, use a pair of iterators. Notice that since a list only supports bidirectional iterators, all the iterator motion must be performed with increments and decrements (if the containers were limited to vector and deque, which produce random-access iterators, then operator+ and operator- could have been used to move the iterators in big jumps).

Although both list and deque support push_front( ) and pop_front( ), vector does not, so the only member functions that work with all three are push_back( ) and pop_back( ).

The naming of the member function swap( ) is a little confusing, since there’s also a non-member swap( ) algorithm that switches two elements of a container. The member swap( ), however, swaps everything in one container for another (if the containers hold the same type), effectively swapping the containers themselves. There’s also a non-member version of this function.

The following sections on the sequence containers discuss the particulars of each type of container.

vector

The vector is intentionally made to look like a souped-up array, since it has array-style indexing but also can expand dynamically. vector is so fundamentally useful that it was introduced in a very primitive way early in this book, and used quite regularly in previous examples. This section will give a more in-depth look at vector.

To achieve maximally-fast indexing and iteration, the vector maintains its storage as a single contiguous array of objects. This is a critical point to observe in understanding the behavior of vector. It means that indexing and iteration are lighting-fast, being basically the same as indexing and iterating over an array of objects. But it also means that inserting an object anywhere but at the end (that is, appending) is not really an acceptable operation for a vector. It also means that when a vector runs out of pre-allocated storage, in order to maintain its contiguous array it must allocate a whole new (larger) chunk of storage elsewhere and copy the objects to the new storage. This has a number of unpleasant side effects.

Cost of overflowing allocated storage

A vector starts by grabbing a block of storage, as if it’s taking a guess at how many objects you plan to put in it. As long as you don’t try to put in more objects than can be held in the initial block of storage, everything is very rapid and efficient (note that if you do know how many objects to expect, you can pre-allocate storage using reserve( )). But eventually you will put in one too many objects and, unbeknownst to you, the vector responds by:

  1. Allocating a new, bigger piece of storage
  2. Copying all the objects from the old storage to the new (using the copy-constructor)
  3. Destroying all the old objects (the destructor is called for each one)
  4. Releasing the old memory

For complex objects, this copy-construction and destruction can end up being very expensive if you overfill your vector a lot. To see what happens when you’re filling a vector, here is a class that prints out information about its creations, destructions, assignments and copy-constructions:

//: C04:Noisy.h
// A class to track various object activities
#ifndef NOISY_H
#define NOISY_H
#include <iostream>

class Noisy {
  static long create, assign, copycons, destroy;
  long id;
public:
  Noisy() : id(create++) { 
    std::cout << "d[" << id << "]"; 
  }
  Noisy(const Noisy& rv) : id(rv.id) {
    std::cout << "c[" << id << "]";
    copycons++;
  }
  Noisy& operator=(const Noisy& rv) {
    std::cout << "(" << id << ")=[" <<
      rv.id << "]";
    id = rv.id;
    assign++;
    return *this;
  }
  friend bool 
  operator<(const Noisy& lv, const Noisy& rv) {
    return lv.id < rv.id;
  }
  friend bool 
  operator==(const Noisy& lv, const Noisy& rv) {
    return lv.id == rv.id;
  }
  ~Noisy() {
    std::cout << "~[" << id << "]";
    destroy++;
  }
  friend std::ostream& 
  operator<<(std::ostream& os, const Noisy& n) {
    return os << n.id;
  }
  friend class NoisyReport;
};

struct NoisyGen {
  Noisy operator()() { return Noisy(); }
};

// A singleton. Will automatically report the
// statistics as the program terminates:
class NoisyReport {
  static NoisyReport nr;
  NoisyReport() {} // Private constructor
public:
  ~NoisyReport() {
    std::cout << "\n-------------------\n"
      << "Noisy creations: " << Noisy::create
      << "\nCopy-Constructions: " 
      << Noisy::copycons
      << "\nAssignments: " << Noisy::assign
      << "\nDestructions: " << Noisy::destroy
      << std::endl;
  }
};

// Because of these this file can only be used
// in simple test situations. Move them to a 
// .cpp file for more complex programs:
long Noisy::create = 0, Noisy::assign = 0,
  Noisy::copycons = 0, Noisy::destroy = 0;
NoisyReport NoisyReport::nr;

#endif // NOISY_H ///:~

Each Noisy object has its own identifier, and there are static variables to keep track of all the creations, assignments (using operator=), copy-constructions and destructions. The id is initialized using the create counter inside the default constructor; the copy-constructor and assignment operator take their id values from the rvalue. Of course, with operator= the lvalue is already an initialized object so the old value of id is printed before it is overwritten with the id from the rvalue.

In order to support certain operations like sorting and searching (which are used implicitly by some of the containers), Noisy must have an operator< and operator==. These simply compare the id values. The operator<< for ostream follows the standard form and simply prints the id.

NoisyGen produces a function object (since it has an operator( )) that is used to automatically generate Noisy objects during testing.

NoisyReport is a type of class called a singleton, which is a “design pattern” (these are covered more fully in Chapter XX). Here, the goal is to make sure there is one and only one NoisyReport object, because it is responsible for printing out the results at program termination. It has a private constructor so no one else can make a NoisyReport object, and a single static instance of NoisyReport called nr. The only executable statements are in the destructor, which is called as the program exits and the static destructors are called; this destructor prints out the statistics captured by the static variables in Noisy.

The one snag to this header file is the inclusion of the definitions for the statics at the end. If you include this header in more than one place in your project, you’ll get multiple-definition errors at link time. Of course, you can put the static definitions in a separate cpp file and link it in, but that is less convenient, and since Noisy is just intended for quick-and-dirty experiments the header file should be reasonable for most situations.

Using Noisy.h, the following program will show the behaviors that occur when a vector overflows its currently allocated storage:

//: C04:VectorOverflow.cpp
// Shows the copy-construction and destruction
// That occurs when a vector must reallocate
// (It maintains a linear array of elements)
#include "Noisy.h"
#include "../require.h"
#include <vector>
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  int size = 1000;
  if(argc >= 2) size = atoi(argv[1]);
  vector<Noisy> vn;
  Noisy n;
  for(int i = 0; i < size; i++)
    vn.push_back(n);
  cout << "\n cleaning up \n";

} ///:~

You can either use the default value of 1000, or use your own value by putting it on the command-line.

When you run this program, you’ll see a single default constructor call (for n), then a lot of copy-constructor calls, then some destructor calls, then some more copy-constructor calls, and so on. When the vector runs out of space in the linear array of bytes it has allocated, it must (to maintain all the objects in a linear array, which is an essential part of its job) get a bigger piece of storage and move everything over, copying first and then destroying the old objects. You can imagine that if you store a lot of large and complex objects, this process could rapidly become prohibitive.

There are two solutions to this problem. The nicest one requires that you know beforehand how many objects you’re going to make. In that case you can use reserve( ) to tell the vector how much storage to pre-allocate, thus eliminating all the copies and destructions and making everything very fast (especially random access to the objects with operator[ ]). Note that the use of reserve( ) is different from using the vector constructor with an integral first argument; the latter initializes each element using the default copy-constructor.

However, in the more general case you won’t know how many objects you’ll need. If vector reallocations are slowing things down, you can change sequence containers. You could use a list, but as you’ll see, the deque allows speedy insertions at either end of the sequence, and never needs to copy or destroy objects as it expands its storage. The deque also allows random access with operator[ ], but it’s not quite as fast as vector’s operator[ ]. So in the case where you’re creating all your objects in one part of the program and randomly accessing them in another, you may find yourself filling a deque, then creating a vector from the deque and using the vector for rapid indexing. Of course, you don’t want to program this way habitually, just be aware of these issues (avoid premature optimization).

There is a darker side to vector’s reallocation of memory, however. Because vector keeps its objects in a nice, neat array (allowing, for one thing, maximally-fast random access), the iterators used by vector are generally just pointers. This is a good thing – of all the sequence containers, these pointers allow the fastest selection and manipulation. However, consider what happens when you’re holding onto an iterator (i.e. a pointer) and then you add the one additional object that causes the vector to reallocate storage and move it elsewhere. Your pointer is now pointing off into nowhere:

//: C04:VectorCoreDump.cpp
// How to break a program using a vector
#include <vector>
#include <iostream>
using namespace std;

int main() {
  vector<int> vi(10, 0);
  ostream_iterator<int> out(cout, " ");
  copy(vi.begin(), vi.end(), out);
  vector<int>::iterator i = vi.begin();
  cout << "\n i: " << long(i) << endl;
  *i = 47;
  copy(vi.begin(), vi.end(), out);
  // Force it to move memory (could also just add
  // enough objects):
  vi.resize(vi.capacity() + 1);
  // Now i points to wrong memory:
  cout << "\n i: " << long(i) << endl;
  cout << "vi.begin(): " << long(vi.begin());
  *i = 48;  // Access violation

} ///:~

If your program is breaking mysteriously, look for places where you hold onto an iterator while adding more objects to a vector. You’ll need to get a new iterator after adding elements, or use operator[ ] instead for element selections. If you combine the above observation with the awareness of the potential expense of adding new objects to a vector, you may conclude that the safest way to use one is to fill it up all at once (ideally, knowing first how many objects you’ll need) and then just use it (without adding more objects) elsewhere in the program. This is the way vector has been used in the book up to this point.

You may observe that using vector as the “basic” container in the earlier chapters of this book may not be the best choice in all cases. This is a fundamental issue in containers, and in data structures in general: the “best” choice varies according to the way the container is used. The reason vector has been the “best” choice up until now is that it looks a lot like an array, and was thus familiar and easy for you to adopt. But from now on it’s also worth thinking about other issues when choosing containers.

Inserting and erasing elements

The vector is most efficient if:

  1. You reserve( ) the correct amount of storage at the beginning so the vector never has to reallocate.
  2. You only add and remove elements from the back end.

It is possible to insert and erase elements from the middle of a vector using an iterator, but the following program demonstrates what a bad idea it is:

//: C04:VectorInsertAndErase.cpp
// Erasing an element from a vector
#include "Noisy.h"
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
  vector<Noisy> v;
  v.reserve(11);
  cout << "11 spaces have been reserved" << endl;
  generate_n(back_inserter(v), 10, NoisyGen());
  ostream_iterator<Noisy> out(cout, " ");
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "Inserting an element:" << endl;
  vector<Noisy>::iterator it = 
    v.begin() + v.size() / 2; // Middle
  v.insert(it, Noisy());
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "\nErasing an element:" << endl;
  // Cannot use the previous value of it:
  it = v.begin() + v.size() / 2;
  v.erase(it);
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << endl;

} ///:~

When you run the program you’ll see that the call to reserve( ) really does only allocate storage – no constructors are called. The generate_n( ) call is pretty busy: each call to NoisyGen::operator( ) results in a construction, a copy-construction (into the vector) and a destruction of the temporary. But when an object is inserted into the vector in the middle, it must shove everything down to maintain the linear array and – since there is enough space – it does this with the assignment operator (if the argument of reserve( ) is 10 instead of eleven then it would have to reallocate storage). When an object is erased from the vector, the assignment operator is once again used to move everything up to cover the place that is being erased (notice that this requires that the assignment operator properly cleans up the lvalue). Lastly, the object on the end of the array is deleted.

You can imagine how enormous the overhead can become if objects are inserted and removed from the middle of a vector if the number of elements is large and the objects are complicated. It’s obviously a practice to avoid.

deque

The deque (double-ended-queue, pronounced “deck”) is the basic sequence container optimized for adding and removing elements from either end. It also allows for reasonably fast random access – it has an operator[ ] like vector. However, it does not have vector’s constraint of keeping everything in a single sequential block of memory. Instead, deque uses multiple blocks of sequential storage (keeping track of all the blocks and their order in a mapping structure). For this reason the overhead for a deque to add or remove elements at either end is very low. In addition, it never needs to copy and destroy contained objects during a new storage allocation (like vector does) so it is far more efficient than vector if you are adding an unknown quantity of objects. This means that vector is the best choice only if you have a pretty good idea of how many objects you need. In addition, many of the programs shown earlier in this book that use vector and push_back( ) might be more efficient with a deque. The interface to deque is only slightly different from a vector (deque has a push_front( ) and pop_front( ) while vector does not, for example) so converting code from using vector to using deque is almost trivial. Consider StringVector.cpp, which can be changed to use deque by replacing the word “vector” with “deque” everywhere. The following program adds parallel deque operations to the vector operations in StringVector.cpp, and performs timing comparisons:

//: C04:StringDeque.cpp
// Converted from StringVector.cpp
#include "../require.h"
#include <string>
#include <deque>
#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <ctime>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  vector<string> vstrings;
  deque<string> dstrings;
  string line;
  // Time reading into vector:
  clock_t ticks = clock();
  while(getline(in, line))
    vstrings.push_back(line);
  ticks = clock() - ticks;
  cout << "Read into vector: " << ticks << endl;
  // Repeat for deque:
  ifstream in2(argv[1]);
  assure(in2, argv[1]);
  ticks = clock();
  while(getline(in2, line))
    dstrings.push_back(line);
  ticks = clock() - ticks;
  cout << "Read into deque: " << ticks << endl;
  // Now compare indexing:
  ticks = clock();
  for(int i = 0; i < vstrings.size(); i++) {
    ostringstream ss;
    ss << i;
    vstrings[i] = ss.str() + ": " + vstrings[i];
  }
  ticks = clock() - ticks;
  cout << "Indexing vector: " << ticks << endl;
  ticks = clock();
  for(int j = 0; j < dstrings.size(); j++) {
    ostringstream ss;
    ss << j;
    dstrings[j] = ss.str() + ": " + dstrings[j];
  }
  ticks = clock() - ticks;
  cout << "Indexing deqeue: " << ticks << endl;
  // Compare iteration
  ofstream tmp1("tmp1.tmp"), tmp2("tmp2.tmp");
  ticks = clock();
  copy(vstrings.begin(), vstrings.end(),
    ostream_iterator<string>(tmp1, "\n"));
  ticks = clock() - ticks;
  cout << "Iterating vector: " << ticks << endl;
  ticks = clock();
  copy(dstrings.begin(), dstrings.end(),
    ostream_iterator<string>(tmp2, "\n"));
  ticks = clock() - ticks;
  cout << "Iterating deqeue: " << ticks << endl;

} ///:~

Knowing now what you do about the inefficiency of adding things to vector because of storage reallocation, you may expect dramatic differences between the two. However, on a 1.7 Megabyte text file one compiler’s program produced the following (measured in platform/compiler specific clock ticks, not seconds):

Read into vector: 8350
Read into deque: 7690
Indexing vector: 2360
Indexing deqeue: 2480
Iterating vector: 2470

Iterating deqeue: 2410

A different compiler and platform roughly agreed with this. It’s not so dramatic, is it? This points out some important issues:

  1. We (programmers) are typically very bad at guessing where inefficiencies occur in our programs.
  2. Efficiency comes from a combination of effects – here, reading the lines in and converting them to strings may dominate over the cost of the vector vs. deque.
  3. The string class is probably fairly well-designed in terms of efficiency.

Of course, this doesn’t mean you shouldn’t use a deque rather than a vector when you know that an uncertain number of objects will be pushed onto the end of the container. On the contrary, you should – when you’re tuning for performance. But you should also be aware that performance issues are usually not where you think they are, and the only way to know for sure where your bottlenecks are is by testing. Later in this chapter there will be a more “pure” comparison of performance between vector, deque and list.

Converting between sequences

Sometimes you need the behavior or efficiency of one kind of container for one part of your program, and a different container’s behavior or efficiency in another part of the program. For example, you may need the efficiency of a deque when adding objects to the container but the efficiency of a vector when indexing them. Each of the basic sequence containers (vector, deque and list) has a two-iterator constructor (indicating the beginning and ending of the sequence to read from when creating a new object) and an assign( ) member function to read into an existing container, so you can easily move objects from one sequence container to another.

The following example reads objects into a deque and then converts to a vector:

//: C04:DequeConversion.cpp
// Reading into a Deque, converting to a vector
#include "Noisy.h"
#include <deque>
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdlib>
using namespace std;

int main(int argc, char* argv[]) {
  int size = 25;
  if(argc >= 2) size = atoi(argv[1]);
  deque<Noisy> d;
  generate_n(back_inserter(d), size, NoisyGen());
  cout << "\n Converting to a vector(1)" << endl;
  vector<Noisy> v1(d.begin(), d.end());
  cout << "\n Converting to a vector(2)" << endl;
  vector<Noisy> v2;
  v2.reserve(d.size());
  v2.assign(d.begin(), d.end());
  cout << "\n Cleanup" << endl;

} ///:~

You can try various sizes, but you should see that it makes no difference – the objects are simply copy-constructed into the new vectors. What’s interesting is that v1 does not cause multiple allocations while building the vector, no matter how many elements you use. You might initially think that you must follow the process used for v2 and preallocate the storage to prevent messy reallocations, but the constructor used for v1 determines the memory need ahead of time so this is unnecessary.

Cost of overflowing allocated storage

It’s illuminating to see what happens with a deque when it overflows a block of storage, in contrast with VectorOverflow.cpp:

//: C04:DequeOverflow.cpp
// A deque is much more efficient than a vector
// when pushing back a lot of elements, since it
// doesn't require copying and destroying.
#include "Noisy.h"
#include <deque>
#include <cstdlib>
using namespace std;

int main(int argc, char* argv[]) {
  int size = 1000;
  if(argc >= 2) size = atoi(argv[1]);
  deque<Noisy> dn;
  Noisy n;
  for(int i = 0; i < size; i++)
    dn.push_back(n);
  cout << "\n cleaning up \n";

} ///:~

Here you will never see any destructors before the words “cleaning up” appear. Since the deque allocates all its storage in blocks instead of a contiguous array like vector, it never needs to move existing storage (thus no additional copy-constructions and destructions occur). It simply allocates a new block. For the same reason, the deque can just as efficiently add elements to the beginning of the sequence, since if it runs out of storage it (again) just allocates a new block for the beginning. Insertions in the middle of a deque, however, could be even messier than for vector (but not as costly).

Because a deque never moves its storage, a held iterator never becomes invalid when you add new things to either end of a deque, as it was demonstrated to do with vector (in VectorCoreDump.cpp). However, it’s still possible (albeit harder) to do bad things:

//: C04:DequeCoreDump.cpp
// How to break a program using a deque
#include <queue>
#include <iostream>
using namespace std;

int main() {
  deque<int> di(100, 0);
  // No problem iterating from beginning to end,
  // even though it spans multiple blocks:
  copy(di.begin(), di.end(), 
    ostream_iterator<int>(cout, " "));
  deque<int>::iterator i = // In the middle:
    di.begin() + di.size() / 2;;
  // Walk the iterator forward as you perform 
  // a lot of insertions in the middle:
  for(int j = 0; j < 1000; j++) {
    cout << j << endl;
    di.insert(i++, 1); // Eventually breaks
  }

} ///:~

Of course, there are two things here that you wouldn’t normally do with a deque: first, elements are inserted in the middle, which deque allows but isn’t designed for. Second, calling insert( ) repeatedly with the same iterator would not ordinarily cause an access violation, but the iterator is walked forward after each insertion. I’m guessing it eventually walks off the end of a block, but I’m not sure what actually causes the problem.

If you stick to what deque is best at – insertions and removals from either end, reasonably rapid traversals and fairly fast random-access using operator[ ] – you’ll be in good shape.

Checked random-access

Both vector and deque provide two ways to perform random access of their elements: the operator[ ], which you’ve seen already, and at( ), which checks the boundaries of the container that’s being indexed and throws an exception if you go out of bounds. It does cost more to use at( ):

//: C04:IndexingVsAt.cpp
// Comparing "at()" to operator[]
#include "../require.h"
#include <vector>
#include <deque>
#include <iostream>
#include <ctime>
using namespace std;

int main(int argc, char* argv[]) {
  requireMinArgs(argc, 1);
  long count = 1000;
  int sz = 1000;
  if(argc >= 2) count = atoi(argv[1]);
  if(argc >= 3) sz = atoi(argv[2]);
  vector<int> vi(sz);
  clock_t ticks = clock();
  for(int i1 = 0; i1 < count; i1++)
    for(int j = 0; j < sz; j++)
      vi[j];
  cout << "vector[]" << clock() - ticks << endl;
  ticks = clock();
  for(int i2 = 0; i2 < count; i2++)
    for(int j = 0; j < sz; j++)
      vi.at(j);
  cout << "vector::at()" << clock()-ticks <<endl;
  deque<int> di(sz);
  ticks = clock();
  for(int i3 = 0; i3 < count; i3++)
    for(int j = 0; j < sz; j++)
      di[j];
  cout << "deque[]" << clock() - ticks << endl;
  ticks = clock();
  for(int i4 = 0; i4 < count; i4++)
    for(int j = 0; j < sz; j++)
      di.at(j);
  cout << "deque::at()" << clock()-ticks <<endl;
  // Demonstrate at() when you go out of bounds:
  di.at(vi.size() + 1);

} ///:~

As you’ll learn in the exception-handling chapter, different systems may handle the uncaught exception in different ways, but you’ll know one way or another that something went wrong with the program when using at( ), whereas it’s possible to go blundering ahead using operator[ ].

list

A list is implemented as a doubly-linked list and is thus designed for rapid insertion and removal of elements in the middle of the sequence (whereas for vector and deque this is a much more costly operation). A list is so slow when randomly accessing elements that it does not have an operator[ ]. It’s best used when you’re traversing a sequence, in order, from beginning to end (or end to beginning) rather than choosing elements randomly from the middle. Even then the traversal is significantly slower than either a vector or a deque, but if you aren’t doing a lot of traversals that won’t be your bottleneck.

Another thing to be aware of with a list is the memory overhead of each link, which requires a forward and backward pointer on top of the storage for the actual object. Thus a list is a better choice when you have larger objects that you’ll be inserting and removing from the middle of the list. It’s better not to use a list if you think you might be traversing it a lot, looking for objects, since the amount of time it takes to get from the beginning of the list – which is the only place you can start unless you’ve already got an iterator to somewhere you know is closer to your destination – to the object of interest is proportional to the number of objects between the beginning and that object.

The objects in a list never move after they are created; “moving” a list element means changing the links, but never copying or assigning the actual objects. This means that a held iterator never moves when you add new things to a list as it was demonstrated to do in vector. Here’s an example using the Noisy class:

//: C04:ListStability.cpp
// Things don't move around in lists
#include "Noisy.h"
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;

int main() {
  list<Noisy> l;
  ostream_iterator<Noisy> out(cout, " ");
  generate_n(back_inserter(l), 25, NoisyGen());
  cout << "\n Printing the list:" << endl;
  copy(l.begin(), l.end(), out);
  cout << "\n Reversing the list:" << endl;
  l.reverse();
  copy(l.begin(), l.end(), out);
  cout << "\n Sorting the list:" << endl;
  l.sort();
  copy(l.begin(), l.end(), out);
  cout << "\n Swapping two elements:" << endl;
  list<Noisy>::iterator it1, it2;
  it1 = it2 = l.begin();
  it2++;
  swap(*it1, *it2);
  cout << endl;
  copy(l.begin(), l.end(), out);
  cout << "\n Using generic reverse(): " << endl;
  reverse(l.begin(), l.end());
  cout << endl;
  copy(l.begin(), l.end(), out);
  cout << "\n Cleanup" << endl;

} ///:~

Operations as seemingly radical as reversing and sorting the list require no copying of objects, because instead of moving the objects, the links are simply changed. However, notice that sort( ) and reverse( ) are member functions of list, so they have special knowledge of the internals of list and can perform the pointer movement instead of copying. On the other hand, the swap( ) function is a generic algorithm, and doesn’t know about list in particular and so it uses the copying approach for swapping two elements. There are also generic algorithms for sort( ) and reverse( ), but if you try to use these you’ll discover that the generic reverse( ) performs lots of copying and destruction (so you should never use it with a list) and the generic sort( ) simply doesn’t work because it requires random-access iterators that list doesn’t provide (a definite benefit, since this would certainly be an expensive way to sort compared to list’s own sort( )). The generic sort( ) and reverse( ) should only be used with arrays, vectors and deques.

If you have large and complex objects you may want to choose a list first, especially if construction, destruction, copy-construction and assignment are expensive and if you are doing things like sorting the objects or otherwise reordering them a lot.

Special list operations

The list has some special operations that are built-in to make the best use of the structure of the list. You’ve already seen reverse( ) and sort( ), and here are some of the others in use:

//: C04:ListSpecialFunctions.cpp
#include "Noisy.h"
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;
ostream_iterator<Noisy> out(cout, " ");

void print(list<Noisy>& ln, char* comment = "") {
  cout << "\n" << comment << ":\n";
  copy(ln.begin(), ln.end(), out);
  cout << endl;
}

int main() {
  typedef list<Noisy> LN;
  LN l1, l2, l3, l4;
  generate_n(back_inserter(l1), 6, NoisyGen());
  generate_n(back_inserter(l2), 6, NoisyGen());
  generate_n(back_inserter(l3), 6, NoisyGen());
  generate_n(back_inserter(l4), 6, NoisyGen());
  print(l1, "l1"); print(l2, "l2");
  print(l3, "l3"); print(l4, "l4");
  LN::iterator it1 = l1.begin();
  it1++; it1++; it1++;
  l1.splice(it1, l2);
  print(l1, "l1 after splice(it1, l2)");
  print(l2, "l2 after splice(it1, l2)");
  LN::iterator it2 = l3.begin();
  it2++; it2++; it2++;
  l1.splice(it1, l3, it2);
  print(l1, "l1 after splice(it1, l3, it2)");
  LN::iterator it3 = l4.begin(), it4 = l4.end();
  it3++; it4--;
  l1.splice(it1, l4, it3, it4);
  print(l1, "l1 after splice(it1,l4,it3,it4)");
  Noisy n;
  LN l5(3, n);
  generate_n(back_inserter(l5), 4, NoisyGen());
  l5.push_back(n);
  print(l5, "l5 before remove()");
  l5.remove(l5.front());
  print(l5, "l5 after remove()");
  l1.sort(); l5.sort();
  l5.merge(l1);
  print(l5, "l5 after l5.merge(l1)");
  cout << "\n Cleanup" << endl;

} ///:~

The print( ) function is used to display results. After filling four lists with Noisy objects, one list is spliced into another in three different ways. In the first, the entire list l2 is spliced into l1 at the iterator it1. Notice that after the splice, l2 is empty – splicing means removing the elements from the source list. The second splice inserts elements from l3 starting at it2 into l1 starting at it1. The third splice starts at it1 and uses elements from l4 starting at it3 and ending at it4 (the seemingly-redundant mention of the source list is because the elements must be erased from the source list as part of the transfer to the destination list).

The output from the code that demonstrates remove( ) shows that the list does not have to be sorted in order for all the elements of a particular value to be removed.

Finally, if you merge( ) one list with another, the merge only works sensibly if the lists have been sorted. What you end up with in that case is a sorted list containing all the elements from both lists (the source list is erased – that is, the elements are moved to the destination list).

There’s also a unique( ) member function that removes all duplicates, but only if the list has been sorted first:

//: C04:UniqueList.cpp
// Testing list's unique() function
#include <list>
#include <iostream>
using namespace std;

int a[] = { 1, 3, 1, 4, 1, 5, 1, 6, 1 };
const int asz = sizeof a / sizeof *a;

int main() {
  // For output:
  ostream_iterator<int> out(cout, " ");
  list<int> li(a, a + asz);
  li.unique();
  // Oops! No duplicates removed:
  copy(li.begin(), li.end(), out);
  cout << endl;
  // Must sort it first:
  li.sort();
  copy(li.begin(), li.end(), out);
  cout << endl;
  // Now unique() will have an effect:
  li.unique();
  copy(li.begin(), li.end(), out);
  cout << endl;

} ///:~

The list constructor used here takes the starting and past-the-end iterator from another container, and it copies all the elements from that container into itself (a similar constructor is available for all the containers). Here, the “container” is just an array, and the “iterators” are pointers into that array, but because of the design of the STL it works with arrays just as easily as any other container.

If you run this program, you’ll see that unique( ) will only remove adjacent duplicate elements, and thus sorting is necessary before calling unique( ).

There are four additional list member functions that are not demonstrated here: a remove_if( ) that takes a predicate which is used to decide whether an object should be removed, a unique( ) that takes a binary predicate to perform uniqueness comparisons, a merge( ) that takes an additional argument which performs comparisons, and a sort( ) that takes a comparator (to provide a comparison or override the existing one).

list vs. set

Looking at the previous example you may note that if you want a sorted list with no duplicates, a set can give you that, right? It’s interesting to compare the performance of the two containers:

//: C04:ListVsSet.cpp
// Comparing list and set performance
#include <iostream>
#include <list>
#include <set>
#include <algorithm>
#include <ctime>
#include <cstdlib>
using namespace std;

class Obj {
  int a[20]; // To take up extra space
  int val;
public:
  Obj() : val(rand() % 500) {}
  friend bool 
  operator<(const Obj& a, const Obj& b) {
    return a.val < b.val;
  }
  friend bool 
  operator==(const Obj& a, const Obj& b) {
    return a.val == b.val;
  }
  friend ostream& 
  operator<<(ostream& os, const Obj& a) {
    return os << a.val;
  }
};

template<class Container>
void print(Container& c) {
  typename Container::iterator it;
  for(it = c.begin(); it != c.end(); it++)
    cout << *it << " ";
  cout << endl;
}

struct ObjGen {
  Obj operator()() { return Obj(); }
};

int main() {
  const int sz = 5000;
  srand(time(0));
  list<Obj> lo;
  clock_t ticks = clock();
  generate_n(back_inserter(lo), sz, ObjGen());
  lo.sort();
  lo.unique();
  cout << "list:" << clock() - ticks << endl;
  set<Obj> so;
  ticks = clock();
  generate_n(inserter(so, so.begin()), 
    sz, ObjGen());
  cout << "set:" << clock() - ticks << endl;
  print(lo);
  print(so);

} ///:~

When you run the program, you should discover that set is much faster than list. This is reassuring – after all, it is set’s primary job description!

Swapping all basic sequences

It turns out that all basic sequences have a member function swap( ) that’s designed to switch one sequence with another (however, this swap( ) is only defined for sequences of the same type). The member swap( ) makes use of its knowledge of the internal structure of the particular container in order to be efficient:

//: C04:Swapping.cpp
// All basic sequence containers can be swapped
#include "Noisy.h"
#include <list>
#include <vector>
#include <deque>
#include <iostream>
#include <algorithm>
using namespace std;
ostream_iterator<Noisy> out(cout, " ");

template<class Cont>
void print(Cont& c, char* comment = "") {
  cout << "\n" << comment << ": ";
  copy(c.begin(), c.end(), out);
  cout << endl;
}

template<class Cont>
void testSwap(char* cname) {
  Cont c1, c2;
  generate_n(back_inserter(c1), 10, NoisyGen());
  generate_n(back_inserter(c2), 5, NoisyGen());
  cout << "\n" << cname << ":" << endl;
  print(c1, "c1"); print(c2, "c2");
  cout << "\n Swapping the " << cname 
    << ":" << endl;
  c1.swap(c2);
  print(c1, "c1"); print(c2, "c2");
}  

int main() {
  testSwap<vector<Noisy> >("vector");
  testSwap<deque<Noisy> >("deque");
  testSwap<list<Noisy> >("list");

} ///:~

When you run this, you’ll discover that each type of sequence container is able to swap one sequence for another without any copying or assignments, even if the sequences are of different sizes. In effect, you’re completely swapping the memory of one object for another.

The STL algorithms also contain a swap( ), and when this function is applied to two containers of the same type, it will use the member swap( ) to achieve fast performance. Consequently, if you apply the sort( ) algorithm to a container of containers, you will find that the performance is very fast – it turns out that fast sorting of a container of containers was a design goal of the STL.

Robustness of lists

To break a list, you have to work pretty hard:

//: C04:ListRobustness.cpp
// lists are harder to break
#include <list>
#include <iostream>
using namespace std;

int main() {
  list<int> li(100, 0);
  list<int>::iterator i = li.begin();
  for(int j = 0; j < li.size() / 2; j++)
    i++;
  // Walk the iterator forward as you perform 
  // a lot of insertions in the middle:
  for(int k = 0; k < 1000; k++)
    li.insert(i++, 1); // No problem
  li.erase(i);
  i++;
  *i = 2; // Oops! It's invalid

} ///:~

When the link that the iterator i was pointing to was erased, it was unlinked from the list and thus became invalid. Trying to move forward to the “next link” from an invalid link is poorly-formed code. Notice that the operation that broke deque in DequeCoreDump.cpp is perfectly fine with a list.

Performance comparison

To get a better feel for the differences between the sequence containers, it’s illuminating to race them against each other while performing various operations.

//: C04:SequencePerformance.cpp
// Comparing the performance of the basic
// sequence containers for various operations
#include <vector>
#include <queue>
#include <list>
#include <iostream>
#include <string>
#include <typeinfo>
#include <ctime>
#include <cstdlib>
using namespace std;

class FixedSize {
  int x[20];
  // Automatic generation of default constructor,
  // copy-constructor and operator=
} fs;

template<class Cont>
struct InsertBack {
  void operator()(Cont& c, long count) {
    for(long i = 0; i < count; i++)
      c.push_back(fs);
  }
  char* testName() { return "InsertBack"; }
};

template<class Cont>
struct InsertFront {
  void operator()(Cont& c, long count) {
    long cnt = count * 10;
    for(long i = 0; i < cnt; i++)
      c.push_front(fs);
  }  
  char* testName() { return "InsertFront"; }
};

template<class Cont>
struct InsertMiddle {
  void operator()(Cont& c, long count) {
    typename Cont::iterator it;
    long cnt = count / 10;
    for(long i = 0; i < cnt; i++) {
      // Must get the iterator every time to keep
      // from causing an access violation with
      // vector. Increment it to put it in the
      // middle of the container:
      it = c.begin();
      it++;
      c.insert(it, fs);
    }
  }
  char* testName() { return "InsertMiddle"; }
};

template<class Cont>
struct RandomAccess { // Not for list
  void operator()(Cont& c, long count) {
    int sz = c.size();
    long cnt = count * 100;
    for(long i = 0; i < cnt; i++)
      c[rand() % sz];
  }
  char* testName() { return "RandomAccess"; }
};

template<class Cont>
struct Traversal {
  void operator()(Cont& c, long count) {
    long cnt = count / 100;
    for(long i = 0; i < cnt; i++) {
      typename Cont::iterator it = c.begin(),
        end = c.end();
      while(it != end) it++;
    }
  }
  char* testName() { return "Traversal"; }
};

template<class Cont>
struct Swap {
  void operator()(Cont& c, long count) {
    int middle = c.size() / 2;
    typename Cont::iterator it = c.begin(), 
      mid = c.begin();
    it++; // Put it in the middle
    for(int x = 0; x < middle + 1; x++)
      mid++;
    long cnt = count * 10;
    for(long i = 0; i < cnt; i++)
      swap(*it, *mid);
  }
  char* testName() { return "Swap"; }
};

template<class Cont>
struct RemoveMiddle {
  void operator()(Cont& c, long count) {
    long cnt = count / 10;
    if(cnt > c.size()) {
      cout << "RemoveMiddle: not enough elements"
        << endl;
      return;
    }
    for(long i = 0; i < cnt; i++) {
      typename Cont::iterator it = c.begin();
      it++;
      c.erase(it);
    }
  }
  char* testName() { return "RemoveMiddle"; }
};

template<class Cont>
struct RemoveBack {
  void operator()(Cont& c, long count) {
    long cnt = count * 10;
    if(cnt > c.size()) {
      cout << "RemoveBack: not enough elements"
        << endl;
      return;
    }
    for(long i = 0; i < cnt; i++)
      c.pop_back();
  }
  char* testName() { return "RemoveBack"; }
};

template<class Op, class Container>
void measureTime(Op f, Container& c, long count){
  string id(typeid(f).name());
  bool Deque = id.find("deque") != string::npos;
  bool List = id.find("list") != string::npos;
  bool Vector = id.find("vector") !=string::npos;
  string cont = Deque ? "deque" : List ? "list" 
    : Vector? "vector" : "unknown";
  cout << f.testName() << " for " << cont << ": ";
  // Standard C library CPU ticks:
  clock_t ticks = clock();
  f(c, count); // Run the test
  ticks = clock() - ticks;
  cout << ticks << endl;
}

typedef deque<FixedSize> DF;
typedef list<FixedSize> LF;
typedef vector<FixedSize> VF;

int main(int argc, char* argv[]) {
  srand(time(0));
  long count = 1000;
  if(argc >= 2) count = atoi(argv[1]);
  DF deq;
  LF lst;
  VF vec, vecres;
  vecres.reserve(count); // Preallocate storage
  measureTime(InsertBack<VF>(), vec, count);
  measureTime(InsertBack<VF>(), vecres, count);
  measureTime(InsertBack<DF>(), deq, count);
  measureTime(InsertBack<LF>(), lst, count);
  // Can't push_front() with a vector:
//! measureTime(InsertFront<VF>(), vec, count);
  measureTime(InsertFront<DF>(), deq, count);
  measureTime(InsertFront<LF>(), lst, count);
  measureTime(InsertMiddle<VF>(), vec, count);
  measureTime(InsertMiddle<DF>(), deq, count);
  measureTime(InsertMiddle<LF>(), lst, count);
  measureTime(RandomAccess<VF>(), vec, count);
  measureTime(RandomAccess<DF>(), deq, count);
  // Can't operator[] with a list:
//! measureTime(RandomAccess<LF>(), lst, count);
  measureTime(Traversal<VF>(), vec, count);
  measureTime(Traversal<DF>(), deq, count);
  measureTime(Traversal<LF>(), lst, count);
  measureTime(Swap<VF>(), vec, count);
  measureTime(Swap<DF>(), deq, count);
  measureTime(Swap<LF>(), lst, count);
  measureTime(RemoveMiddle<VF>(), vec, count);
  measureTime(RemoveMiddle<DF>(), deq, count);
  measureTime(RemoveMiddle<LF>(), lst, count);
  vec.resize(vec.size() * 10); // Make it bigger
  measureTime(RemoveBack<VF>(), vec, count);
  measureTime(RemoveBack<DF>(), deq, count);
  measureTime(RemoveBack<LF>(), lst, count);

} ///:~

This example makes heavy use of templates to eliminate redundancy, save space, guarantee identical code and improve clarity. Each test is represented by a class that is templatized on the container it will operate on. The test itself is inside the operator( ) which, in each case, takes a reference to the container and a repeat count – this count is not always used exactly as it is, but sometimes increased or decreased to prevent the test from being too short or too long. The repeat count is just a factor, and all tests are compared using the same value.

Each test class also has a member function that returns its name, so that it can easily be printed. You might think that this should be accomplished using run-time type identification, but since the actual name of the class involves a template expansion, this turns out to be the more direct approach.

The measureTime( ) function template takes as its first template argument the operation that it’s going to test – which is itself a class template selected from the group defined previously in the listing. The template argument Op will not only contain the name of the class, but also (decorated into it) the type of the container it’s working with. The RTTI typeid( ) operation allows the name of the class to be extracted as a char*, which can then be used to create a string called id. This string can be searched using string::find( ) to look for deque, list or vector. The bool variable that corresponds to the matching string becomes true, and this is used to properly initialize the string cont so the container name can be accurately printed, along with the test name.

Once the type of test and the container being tested has been printed out, the actual test is quite simple. The Standard C library function clock( ) is used to capture the starting and ending CPU ticks (this is typically more fine-grained than trying to measure seconds). Since f is an object of type Op, which is a class that has an operator( ), the line:

f(c, count);

is actually calling the operator( ) for the object f.

In main( ), you can see that each different type of test is run on each type of container, except for the containers that don’t support the particular operation being tested (these are commented out).

When you run the program, you’ll get comparative performance numbers for your particular compiler and your particular operating system and platform. Although this is only intended to give you a feel for the various performance features relative to the other sequences, it is not a bad way to get a quick-and-dirty idea of the behavior of your library, and also to compare one library with another.

set

The set produces a container that will accept only one of each thing you place in it; it also sorts the elements (sorting isn’t intrinsic to the conceptual definition of a set, but the STL set stores its elements in a balanced binary tree to provide rapid lookups, thus producing sorted results when you traverse it). The first two examples in this chapter used sets.

Consider the problem of creating an index for a book. You might like to start with all the words in the book, but you only want one instance of each word and you want them sorted. Of course, a set is perfect for this, and solves the problem effortlessly. However, there’s also the problem of punctuation and any other non-alpha characters, which must be stripped off to generate proper words. One solution to this problem is to use the Standard C library function strtok( ), which produces tokens (in our case, words) given a set of delimiters to strip out:

//: C04:WordList.cpp
// Display a list of words used in a document
#include "../require.h"
#include <string>
#include <cstring>
#include <set>
#include <iostream>
#include <fstream>
using namespace std;

const char* delimiters =
  " \t;()\"<>:{}[]+-=&*#.,/\\~";

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  set<string> wordlist;
  string line;
  while(getline(in, line)) {
    // Capture individual words:
    char* s = // Cast probably won’t crash:
      strtok((char*)line.c_str(), delimiters);
    while(s) {
      // Automatic type conversion:
      wordlist.insert(s); 
      s = strtok(0, delimiters);
    }
  }
  // Output results:
  copy(wordlist.begin(), wordlist.end(),
       ostream_iterator<string>(cout, "\n"));

} ///:~

strtok( ) takes the starting address of a character buffer (the first argument) and looks for delimiters (the second argument). It replaces the delimiter with a zero, and returns the address of the beginning of the token. If you call it subsequent times with a first argument of zero it will continue extracting tokens from the rest of the string until it finds the end. In this case, the delimiters are those that delimit the keywords and identifiers of C++, so it extracts these keywords and identifiers. Each word is turned into a string and placed into the wordlist vector, which eventually contains the whole file, broken up into words.

You don’t have to use a set just to get a sorted sequence. You can use the sort( ) function (along with a multitude of other functions in the STL) on different STL containers. However, it’s likely that set will be faster.

Eliminating strtok( )

Some programmers consider strtok( ) to be the poorest design in the Standard C library because it uses a static buffer to hold its data between function calls. This means:

  1. You can’t use strtok( ) in two places at the same time.
  2. You can’t use strtok( ) in a multithreaded program.
  3. You can’t use strtok( ) in a library that might be used in a multithreaded program.
  4. strtok( ) modifies the input sequence, which can produce unexpected side effects.
  5. strtok( ) depends on reading in “lines”, which means you need a buffer big enough for the longest line. This produces both wastefully-sized buffers, and lines longer than the “longest” line. This can also introduce security holes. (Notice that the buffer size problem was eliminated in WordList.cpp by using string input, but this required a cast so that strtok( ) could modify the data in the string – a dangerous approach for general-purpose programming).

For all these reasons it seems like a good idea to find an alternative for strtok( ). The following example will use an istreambuf_iterator (introduced earlier) to move the characters from one place (which happens to be an istream) to another (which happens to be a string), depending on whether the Standard C library function isalpha( ) is true:

//: C04:WordList2.cpp
// Eliminating strtok() from Wordlist.cpp
#include "../require.h"
#include <string>
#include <cstring>
#include <set>
#include <iostream>
#include <fstream>
#include <iterator>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  istreambuf_iterator<char> p(in), end;
  set<string> wordlist;
  while (p != end) {
    string word;
    insert_iterator<string> 
      ii(word, word.begin());
    // Find the first alpha character:
    while(!isalpha(*p) && p != end)
      p++;
    // Copy until the first non-alpha character:
    while (isalpha(*p) && p != end)
      *ii++ = *p++;
    if (word.size() != 0)
      wordlist.insert(word);
  } 
  // Output results:
  copy(wordlist.begin(), wordlist.end(),
    ostream_iterator<string>(cout, "\n"));

} ///:~

This example was suggested by Nathan Myers, who invented the istreambuf_iterator and its relatives. This iterator extracts information character-by-character from a stream. Although the istreambuf_iterator template argument might suggest to you that you could extract, for example, ints instead of char, that’s not the case. The argument must be of some character type – a regular char or a wide character.

After the file is open, an istreambuf_iterator called p is attached to the istream so characters can be extracted from it. The set<string> called wordlist will be used to hold the resulting words.

The while loop reads words until the end of the input stream is found. This is detected using the default constructor for istreambuf_iterator which produces the past-the-end iterator object end. Thus, if you want to test to make sure you’re not at the end of the stream, you simply say p != end.

The second type of iterator that’s used here is the insert_iterator, which creates an iterator that knows how to insert objects into a container. Here, the “container” is the string called word which, for the purposes of insert_iterator, behaves like a container. The constructor for insert_iterator requires the container and an iterator indicating where it should start inserting the characters. You could also use a back_insert_iterator, which requires that the container have a push_back( ) (string does).

After the while loop sets everything up, it begins by looking for the first alpha character, incrementing start until that character is found. Then it copies characters from one iterator to the other, stopping when a non-alpha character is found. Each word, assuming it is non-empty, is added to wordlist.

StreamTokenizer:
a more flexible solution

The above program parses its input into strings of words containing only alpha characters, but that’s still a special case compared to the generality of strtok( ). What we’d like now is an actual replacement for strtok( ) so we’re never tempted to use it. WordList2.cpp can be modified to create a class called StreamTokenizer that delivers a new token as a string whenever you call next( ), according to the delimiters you give it upon construction (very similar to strtok( )):

//: C04:StreamTokenizer.h
// C++ Replacement for Standard C strtok()
#ifndef STREAMTOKENIZER_H
#define STREAMTOKENIZER_H
#include <string>
#include <iostream>
#include <iterator>
  
class StreamTokenizer {
  typedef std::istreambuf_iterator<char> It;
  It p, end;
  std::string delimiters;
  bool isDelimiter(char c) {
    return 
      delimiters.find(c) != std::string::npos;
  }
public:
  StreamTokenizer(std::istream& is, 
    std::string delim = " \t\n;()\"<>:{}[]+-=&*#"
    ".,/\\~!0123456789") : p(is), end(It()),
    delimiters(delim) {}
  std::string next(); // Get next token
};

#endif STREAMTOKENIZER_H ///:~

The default delimiters for the StreamTokenizer constructor extract words with only alpha characters, as before, but now you can choose different delimiters to parse different tokens. The implementation of next( ) looks similar to Wordlist2.cpp:

//: C04:StreamTokenizer.cpp {O}
#include "StreamTokenizer.h"
using namespace std;

string StreamTokenizer::next() {
  string result;
  if(p != end) {
    insert_iterator<string>
      ii(result, result.begin());
    while(isDelimiter(*p) && p != end)
      p++;
    while (!isDelimiter(*p) && p != end)
      *ii++ = *p++;
  }
  return result;

} ///:~

The first non-delimiter is found, then characters are copied until a delimiter is found, and the resulting string is returned. Here’s a test:

//: C04:TokenizeTest.cpp
//{L} StreamTokenizer
// Test StreamTokenizer
#include "StreamTokenizer.h"
#include "../require.h"
#include <iostream>
#include <fstream>
#include <set>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  StreamTokenizer words(in);
  set<string> wordlist;
  string word;
  while((word = words.next()).size() != 0)
    wordlist.insert(word);
  // Output results:
  copy(wordlist.begin(), wordlist.end(),
    ostream_iterator<string>(cout, "\n"));

} ///:~

Now the tool is more reusable than before, but it’s still inflexible, because it can only work with an istream. This isn’t as bad as it first seems, since a string can be turned into an istream via an istringstream. But in the next section we’ll come up with the most general, reusable tokenizing tool, and this should give you a feeling of what “reusable” really means, and the effort necessary to create truly reusable code.

A completely reusable tokenizer

Since the STL containers and algorithms all revolve around iterators, the most flexible solution will itself be an iterator. You could think of the TokenIterator as an iterator that wraps itself around any other iterator that can produce characters. Because it is designed as an input iterator (the most primitive type of iterator) it can be used with any STL algorithm. Not only is it a useful tool in itself, the TokenIterator is also a good example of how you can design your own iterators.[18]

The TokenIterator is doubly flexible: first, you can choose the type of iterator that will produce the char input. Second, instead of just saying what characters represent the delimiters, TokenIterator will use a predicate which is a function object whose operator( ) takes a char and decides if it should be in the token or not. Although the two examples given here have a static concept of what characters belong in a token, you could easily design your own function object to change its state as the characters are read, producing a more sophisticated parser.

The following header file contains the two basic predicates Isalpha and Delimiters, along with the template for TokenIterator:

//: C04:TokenIterator.h
#ifndef TOKENITERATOR_H
#define TOKENITERATOR_H
#include <string>
#include <iterator>
#include <algorithm>
#include <cctype>

struct Isalpha { 
  bool operator()(char c) { 
    using namespace std; //[[For a compiler bug]]
    return isalpha(c); 
  }
};

class Delimiters {
  std::string exclude;
public:
  Delimiters() {}
  Delimiters(const std::string& excl) 
    : exclude(excl) {}
  bool operator()(char c) {
    return exclude.find(c) == std::string::npos;
  }
};

template <class InputIter, class Pred = Isalpha>
class TokenIterator: public std::iterator<
  std::input_iterator_tag,std::string,ptrdiff_t>{
  InputIter first;
  InputIter last;
  std::string word;
  Pred predicate;
public:
  TokenIterator(InputIter begin, InputIter end, 
    Pred pred = Pred()) 
    : first(begin), last(end), predicate(pred) {
      ++*this; 
  }
  TokenIterator() {} // End sentinel
  // Prefix increment:
  TokenIterator& operator++() {
    word.resize(0);
    first = std::find_if(first, last, predicate);
    while (first != last && predicate(*first))
      word += *first++;
    return *this;
  }
  // Postfix increment
  class Proxy { 
    std::string word;
  public:
    Proxy(const std::string& w) : word(w) {}
    std::string operator*() { return word; } 
  };
  Proxy operator++(int) { 
    Proxy d(word);
    ++*this; 
    return d; 
  }
  // Produce the actual value:
  std::string operator*() const { return word; }
  std::string* operator->() const {
    return &(operator*()); 
  }
  // Compare iterators:
  bool operator==(const TokenIterator&) { 
    return word.size() == 0 && first == last; 
  }
  bool operator!=(const TokenIterator& rv) { 
    return !(*this == rv);
  }
};

#endif // TOKENITERATOR_H ///:~

TokenIterator is inherited from the std::iterator template. It might appear that there’s some kind of functionality that comes with std::iterator, but it is purely a way of tagging an iterator so that a container that uses it knows what it’s capable of. Here, you can see input_iterator_tag as a template argument – this tells anyone who asks that a TokenIterator only has the capabilities of an input iterator, and cannot be used with algorithms requiring more sophisticated iterators. Apart from the tagging, std::iterator doesn’t do anything else, which means you must design all the other functionality in yourself.

TokenIterator may look a little strange at first, because the first constructor requires both a “begin” and “end” iterator as arguments, along with the predicate. Remember that this is a “wrapper” iterator that has no idea of how to tell whether it’s at the end of its input source, so the ending iterator is necessary in the first constructor. The reason for the second (default) constructor is that the STL algorithms (and any algorithms you write) need a TokenIterator sentinel to be the past-the-end value. Since all the information necessary to see if the TokenIterator has reached the end of its input is collected in the first constructor, this second constructor creates a TokenIterator that is merely used as a placeholder in algorithms.

The core of the behavior happens in operator++. This erases the current value of word using string::resize( ), then finds the first character that satisfies the predicate (thus discovering the beginning of the new token) using find_if( ) (from the STL algorithms, discussed in the following chapter). The resulting iterator is assigned to first, thus moving first forward to the beginning of the token. Then, as long as the end of the input is not reached and the predicate is satisfied, characters are copied into the word from the input. Finally, the TokenIterator object is returned, and must be dereferenced to access the new token.

The postfix increment requires a proxy object to hold the value before the increment, so it can be returned (see the operator overloading chapter for more details of this). Producing the actual value is a straightforward operator*. The only other functions that must be defined for an output iterator are the operator== and operator!= to indicate whether the TokenIterator has reached the end of its input. You can see that the argument for operator== is ignored – it only cares about whether it has reached its internal last iterator. Notice that operator!= is defined in terms of operator==.

A good test of TokenIterator includes a number of different sources of input characters including a streambuf_iterator, a char*, and a deque<char>::iterator. Finally, the original Wordlist.cpp problem is solved:

//: C04:TokenIteratorTest.cpp
#include "TokenIterator.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <vector>
#include <deque>
#include <set>
using namespace std;

int main() {
  ifstream in("TokenIteratorTest.cpp");
  assure(in, "TokenIteratorTest.cpp");
  ostream_iterator<string> out(cout, "\n");
  typedef istreambuf_iterator<char> IsbIt;
  IsbIt begin(in), isbEnd;
  Delimiters 
    delimiters(" \t\n~;()\"<>:{}[]+-=&*#.,/\\");
  TokenIterator<IsbIt, Delimiters> 
    wordIter(begin, isbEnd, delimiters),
    end;
  vector<string> wordlist;
  copy(wordIter, end, back_inserter(wordlist));
  // Output results:
  copy(wordlist.begin(), wordlist.end(), out);
  *out++ = "-----------------------------------";
  // Use a char array as the source:
  char* cp = 
    "typedef std::istreambuf_iterator<char> It";
  TokenIterator<char*, Delimiters>
    charIter(cp, cp + strlen(cp), delimiters),
    end2;
  vector<string> wordlist2;
  copy(charIter, end2, back_inserter(wordlist2));
  copy(wordlist2.begin(), wordlist2.end(), out);
  *out++ = "-----------------------------------";
  // Use a deque<char> as the source:
  ifstream in2("TokenIteratorTest.cpp");
  deque<char> dc;
  copy(IsbIt(in2), IsbIt(), back_inserter(dc));
  TokenIterator<deque<char>::iterator,Delimiters>
    dcIter(dc.begin(), dc.end(), delimiters),
    end3;
  vector<string> wordlist3;
  copy(dcIter, end3, back_inserter(wordlist3));
  copy(wordlist3.begin(), wordlist3.end(), out);
  *out++ = "-----------------------------------";
  // Reproduce the Wordlist.cpp example:
  ifstream in3("TokenIteratorTest.cpp");
  TokenIterator<IsbIt, Delimiters>
    wordIter2(IsbIt(in3), isbEnd, delimiters);
  set<string> wordlist4;
  while(wordIter2 != end)
    wordlist4.insert(*wordIter2++);
  copy(wordlist4.begin(), wordlist4.end(), out);

} ///:~

When using an istreambuf_iterator, you create one to attach to the istream object, and one with the default constructor as the past-the-end marker. Both of these are used to create the TokenIterator that will actually produce the tokens; the default constructor produces the faux TokenIterator past-the-end sentinel (this is just a placeholder, and as mentioned previously is actually ignored). The TokenIterator produces strings that are inserted into a container which must, naturally, be a container of string – here a vector<string> is used in all cases except the last (you could also concatenate the results onto a string). Other than that, a TokenIterator works like any other input iterator.

stack

The stack, along with the queue and priority_queue, are classified as adapters, which means they are implemented using one of the basic sequence containers: vector, list or deque. This, in my opinion, is an unfortunate case of confusing what something does with the details of its underlying implementation – the fact that these are called “adapters” is of primary value only to the creator of the library. When you use them, you generally don’t care that they’re adapters, but instead that they solve your problem. Admittedly there are times when it’s useful to know that you can choose an alternate implementation or build an adapter from an existing container object, but that’s generally one level removed from the adapter’s behavior. So, while you may see it emphasized elsewhere that a particular container is an adapter, I shall only point out that fact when it’s useful. Note that each type of adapter has a default container that it’s built upon, and this default is the most sensible implementation, so in most cases you won’t need to concern yourself with the underlying implementation.

The following example shows stack<string> implemented in the three possible ways: the default (which uses deque), with a vector and with a list:

//: C04:Stack1.cpp
// Demonstrates the STL stack
#include "../require.h"
#include <iostream>
#include <fstream>
#include <stack>
#include <list>
#include <vector>
#include <string>
using namespace std;

// Default: deque<string>:
typedef stack<string> Stack1;
// Use a vector<string>:
typedef stack<string, vector<string> > Stack2;
// Use a list<string>:
typedef stack<string, list<string> > Stack3;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1); // File name is argument
  ifstream in(argv[1]);
  assure(in, argv[1]);
  Stack1 textlines; // Try the different versions
  // Read file and store lines in the stack:
  string line;
  while(getline(in, line)) 
    textlines.push(line + "\n");
  // Print lines from the stack and pop them:
  while(!textlines.empty()) {
    cout << textlines.top();
    textlines.pop();
  }

} ///:~

The top( ) and pop( ) operations will probably seem non-intuitive if you’ve used other stack classes. When you call pop( ) it returns void rather than the top element that you might have expected. If you want the top element, you get a reference to it with top( ). It turns out this is more efficient, since a traditional pop( ) would have to return a value rather than a reference, and thus invoke the copy-constructor. When you’re using a stack (or a priority_queue, described later) you can efficiently refer to top( ) as many times as you want, then discard the top element explicitly using pop( ) (perhaps if some other term than the familiar “pop” had been used, this would have been a bit clearer).

The stack template has a very simple interface, essentially the member functions you see above. It doesn’t have sophisticated forms of initialization or access, but if you need that you can use the underlying container that the stack is implemented upon. For example, suppose you have a function that expects a stack interface but in the rest of your program you need the objects stored in a list. The following program stores each line of a file along with the leading number of spaces in that line (you might imagine it as a starting point for performing some kinds of source-code reformatting):

//: C04:Stack2.cpp
// Converting a list to a stack
#include "../require.h"
#include <iostream>
#include <fstream>
#include <stack>
#include <list>
#include <string>
using namespace std;

// Expects a stack:
template<class Stk>
void stackOut(Stk& s, ostream& os = cout) {
  while(!s.empty()) {
    os << s.top() << "\n";
    s.pop();
  }
}

class Line {
  string line; // Without leading spaces
  int lspaces; // Number of leading spaces
public:
  Line(string s) : line(s) {
    lspaces = line.find_first_not_of(' ');
    if(lspaces == string::npos)
      lspaces = 0;
    line = line.substr(lspaces);
  }
  friend ostream& 
  operator<<(ostream& os, const Line& l) {
    for(int i = 0; i < l.lspaces; i++)
      os << ' ';
    return os << l.line;
  }
  // Other functions here...
};

int main(int argc, char* argv[]) {
  requireArgs(argc, 1); // File name is argument
  ifstream in(argv[1]);
  assure(in, argv[1]);
  list<Line> lines;
  // Read file and store lines in the list:
  string s;
  while(getline(in, s)) 
    lines.push_front(s);
  // Turn the list into a stack for printing:
  stack<Line, list<Line> > stk(lines);
  stackOut(stk);

} ///:~

The function that requires the stack interface just sends each top( ) object to an ostream and then removes it by calling pop( ). The Line class determines the number of leading spaces, then stores the contents of the line without the leading spaces. The ostream operator<< re-inserts the leading spaces so the line prints properly, but you can easily change the number of spaces by changing the value of lspaces (the member functions to do this are not shown here).

In main( ), the input file is read into a list<Line>, then a stack is wrapped around this list so it can be sent to stackOut( ).

You cannot iterate through a stack; this emphasizes that you only want to perform stack operations when you create a stack. You can get equivalent “stack” functionality using a vector and its back( ), push_back( ) and pop_back( ) methods, and then you have all the additional functionality of the vector. Stack1.cpp can be rewritten to show this:

//: C04:Stack3.cpp
// Using a vector as a stack; modified Stack1.cpp
#include "../require.h"
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  vector<string> textlines;
  string line;
  while(getline(in, line)) 
    textlines.push_back(line + "\n");
  while(!textlines.empty()) {
    cout << textlines.back();
    textlines.pop_back();
  }

} ///:~

You’ll see this produces the same output as Stack1.cpp, but you can now perform vector operations as well. Of course, list has the additional ability to push things at the front, but it’s generally less efficient than using push_back( ) with vector. (In addition, deque is usually more efficient than list for pushing things at the front).

queue

The queue is a restricted form of a deque – you can only enter elements at one end, and pull them off the other end. Functionally, you could use a deque anywhere you need a queue, and you would then also have the additional functionality of the deque. The only reason you need to use a queue rather than a deque, then, is if you want to emphasize that you will only be performing queue-like behavior.

The queue is an adapter class like stack, in that it is built on top of another sequence container. As you might guess, the ideal implementation for a queue is a deque, and that is the default template argument for the queue; you’ll rarely need a different implementation.

Queues are often used when modeling systems where some elements of the system are waiting to be served by other elements in the system. A classic example of this is the “bank-teller problem,” where you have customers arriving at random intervals, getting into a line, and then being served by a set of tellers. Since the customers arrive randomly and each take a random amount of time to be served, there’s no way to deterministically know how long the line will be at any time. However, it’s possible to simulate the situation and see what happens.

A problem in performing this simulation is the fact that, in effect, each customer and teller should be run by a separate process. What we’d like is a multithreaded environment, then each customer or teller would have their own thread. However, Standard C++ has no model for multithreading so there is no standard solution to this problem. On the other hand, with a little adjustment to the code it’s possible to simulate enough multithreading to provide a satisfactory solution to our problem.

Multithreading means you have multiple threads of control running at once, in the same address space (this differs from multitasking, where you have different processes each running in their own address space). The trick is that you have fewer CPUs than you do threads (and very often only one CPU) so to give the illusion that each thread has its own CPU there is a time-slicing mechanism that says “OK, current thread – you’ve had enough time. I’m going to stop you and go give time to some other thread.” This automatic stopping and starting of threads is called pre-emptive and it means you don’t need to manage the threading process at all.

An alternative approach is for each thread to voluntarily yield the CPU to the scheduler, which then goes and finds another thread that needs running. This is easier to synthesize, but it still requires a method of “swapping” out one thread and swapping in another (this usually involves saving the stack frame and using the standard C library functions setjmp( ) and longjmp( ); see my article in the (XX) issue of Computer Language magazine for an example). So instead, we’ll build the time-slicing into the classes in the system. In this case, it will be the tellers that represent the “threads,” (the customers will be passive) so each teller will have an infinite-looping run( ) method that will execute for a certain number of “time units,” and then simply return. By using the ordinary return mechanism, we eliminate the need for any swapping. The resulting program, although small, provides a remarkably reasonable simulation:

//: C04:BankTeller.cpp
// Using a queue and simulated multithreading
// To model a bank teller system
#include <iostream>
#include <queue>
#include <list>
#include <cstdlib>
#include <ctime>
using namespace std;

class Customer {
  int serviceTime;
public:
  Customer() : serviceTime(0) {}
  Customer(int tm) : serviceTime(tm) {}
  int getTime() { return serviceTime; }
  void setTime(int newtime) {
    serviceTime = newtime;
  }
  friend ostream& 
  operator<<(ostream& os, const Customer& c) {
    return os << '[' << c.serviceTime << ']';
  }
};

class Teller {
  queue<Customer>& customers;
  Customer current;
  static const int slice = 5;
  int ttime; // Time left in slice
  bool busy; // Is teller serving a customer?
public:
  Teller(queue<Customer>& cq) 
    : customers(cq), ttime(0), busy(false) {}
  Teller& operator=(const Teller& rv) {
    customers = rv.customers;
    current = rv.current;
    ttime = rv.ttime;
    busy = rv.busy;
    return *this;
  }
  bool isBusy() { return busy; }
  void run(bool recursion = false) {
    if(!recursion)
      ttime = slice;
    int servtime = current.getTime();
    if(servtime > ttime) {
      servtime -= ttime;
      current.setTime(servtime);
      busy = true; // Still working on current
      return;
    }
    if(servtime < ttime) {
      ttime -= servtime;
      if(!customers.empty()) {
        current = customers.front();
        customers.pop(); // Remove it
        busy = true;
        run(true); // Recurse
      }
      return;
    }
    if(servtime == ttime) {
      // Done with current, set to empty:
      current = Customer(0);
      busy = false;
      return; // No more time in this slice
    }
  }
};

// Inherit to access protected implementation:
class CustomerQ : public queue<Customer> {
public:
  friend ostream& 
  operator<<(ostream& os, const CustomerQ& cd) {
    copy(cd.c.begin(), cd.c.end(), 
      ostream_iterator<Customer>(os, ""));
    return os;
  }
};

int main() {
  CustomerQ customers;
  list<Teller> tellers;
  typedef list<Teller>::iterator TellIt;
  tellers.push_back(Teller(customers));
  srand(time(0)); // Seed random number generator
  while(true) {
    // Add a random number of customers to the
    // queue, with random service times:
    for(int i = 0; i < rand() % 5; i++)
      customers.push(Customer(rand() % 15 + 1));
    cout << '{' << tellers.size() << '}' 
      << customers << endl;
    // Have the tellers service the queue:
    for(TellIt i = tellers.begin(); 
      i != tellers.end(); i++)
      (*i).run();
    cout << '{' << tellers.size() << '}' 
      << customers << endl;
    // If line is too long, add another teller:
    if(customers.size() / tellers.size() > 2)
      tellers.push_back(Teller(customers));
    // If line is short enough, remove a teller:
    if(tellers.size() > 1 && 
      customers.size() / tellers.size() < 2)
      for(TellIt i = tellers.begin();
        i != tellers.end(); i++)
        if(!(*i).isBusy()) {
          tellers.erase(i);
          break; // Out of for loop
        }
  }

} ///:~

Each customer requires a certain amount of service time, which is the number of time units that a teller must spend on the customer in order to serve that customer’s needs. Of course, the amount of service time will be different for each customer, and will be determined randomly. In addition, you won’t know how many customers will be arriving in each interval, so this will also be determined randomly.

The Customer objects are kept in a queue<Customer>, and each Teller object keeps a reference to that queue. When a Teller object is finished with its current Customer object, that Teller will get another Customer from the queue and begin working on the new Customer, reducing the Customer’s service time during each time slice that the Teller is allotted. All this logic is in the run( ) member function, which is basically a three-way if statement based on whether the amount of time necessary to serve the customer is less than, greater than or equal to the amount of time left in the teller’s current time slice. Notice that if the Teller has more time after finishing with a Customer, it gets a new customer and recurses into itself.

Just as with a stack, when you use a queue, it’s only a queue and doesn’t have any of the other functionality of the basic sequence containers. This includes the ability to get an iterator in order to step through the stack. However, the underlying sequence container (that the queue is built upon) is held as a protected member inside the queue, and the identifier for this member is specified in the C++ Standard as ‘c’, which means that you can inherit from queue in order to access the underlying implementation. The CustomerQ class does exactly that, for the sole purpose of defining an ostream operator<< that can iterate through the queue and print out its members.

The driver for the simulation is the infinite while loop in main( ). At the beginning of each pass through the loop, a random number of customers are added, with random service times. Both the number of tellers and the queue contents are displayed so you can see the state of the system. After running each teller, the display is repeated. At this point, the system adapts by comparing the number of customers and the number of tellers; if the line is too long another teller is added and if it is short enough a teller can be removed. It is in this adaptation section of the program that you can experiment with policies regarding the optimal addition and removal of tellers. If this is the only section that you’re modifying, you may want to encapsulate policies inside of different objects.

Priority queues

When you push( ) an object onto a priority_queue, that object is sorted into the queue according to a function or function object (you can allow the default less template to supply this, or provide one of your own). The priority_queue ensures that when you look at the top( ) element it will be the one with the highest priority. When you’re done with it, you call pop( ) to remove it and bring the next one into place. Thus, the priority_queue has nearly the same interface as a stack, but it behaves differently.

Like stack and queue, priority_queue is an adapter which is built on top of one of the basic sequences – the default is vector.

It’s trivial to make a priority_queue that works with ints:

//: C04:PriorityQueue1.cpp
#include <iostream>
#include <queue>
#include <cstdlib>
#include <ctime>
using namespace std;

int main() {
  priority_queue<int> pqi;
  srand(time(0)); // Seed random number generator
  for(int i = 0; i < 100; i++)
    pqi.push(rand() % 25);
  while(!pqi.empty()) {
    cout << pqi.top() << ' ';
    pqi.pop();
  }

} ///:~

This pushes into the priority_queue 100 random values from 0 to 24. When you run this program you’ll see that duplicates are allowed, and the highest values appear first. To show how you can change the ordering by providing your own function or function object, the following program gives lower-valued numbers the highest priority:

//: C04:PriorityQueue2.cpp
// Changing the priority
#include <iostream>
#include <queue>
#include <cstdlib>
#include <ctime>
using namespace std;

struct Reverse {
  bool operator()(int x, int y) {
    return y < x;
  }
};

int main() {
  priority_queue<int, vector<int>, Reverse> pqi;
  // Could also say:
  // priority_queue<int, vector<int>, 
  //   greater<int> > pqi;
  srand(time(0));
  for(int i = 0; i < 100; i++)
    pqi.push(rand() % 25);
  while(!pqi.empty()) {
    cout << pqi.top() << ' ';
    pqi.pop();
  }

} ///:~

Although you can easily use the Standard Library greater template to produce the predicate, I went to the trouble of creating Reverse so you could see how to do it in case you have a more complex scheme for ordering your objects.

If you look at the description for priority_queue, you see that the constructor can be handed a “Compare” object, as shown above. If you don’t use your own “Compare” object, the default template behavior is the less template function. You might think (as I did) that it would make sense to leave the template instantiation as priority_queue<int>, thus using the default template arguments of vector<int> and less<int>. Then you could inherit a new class from less<int>, redefine operator( ) and hand an object of that type to the priority_queue constructor. I tried this, and got it to compile, but the resulting program produced the same old less<int> behavior. The answer lies in the less< > template:

template <class T>
struct less : binary_function<T, T, bool> {
  // Other stuff...
  bool operator()(const T& x, const T& y) const {
    return x < y;
  }

};

The operator( ) is not virtual, so even though the constructor takes your subclass of less<int> by reference (thus it doesn’t slice it down to a plain less<int>), when operator( ) is called, it is the base-class version that is used. While it is generally reasonable to expect ordinary classes to behave polymorphically, you cannot make this assumption when using the STL.

Of course, a priority_queue of int is trivial. A more interesting problem is a to-do list, where each object contains a string and a primary and secondary priority value:

//: C04:PriorityQueue3.cpp
// A more complex use of priority_queue
#include <iostream>
#include <queue>
#include <string>
using namespace std;

class ToDoItem {
  char primary;
  int secondary;
  string item;
public:
  ToDoItem(string td, char pri ='A', int sec =1)
    : item(td), primary(pri), secondary(sec) {}
  friend bool operator<(
    const ToDoItem& x, const ToDoItem& y) {
    if(x.primary > y.primary) 
      return true;
    if(x.primary == y.primary)
      if(x.secondary > y.secondary) 
        return true;
    return false;
  }
  friend ostream& 
  operator<<(ostream& os, const ToDoItem& td) {
    return os << td.primary << td.secondary 
      << ": " << td.item;
  }
};

int main() {
  priority_queue<ToDoItem> toDoList;
  toDoList.push(ToDoItem("Empty trash", 'C', 4));
  toDoList.push(ToDoItem("Feed dog", 'A', 2));
  toDoList.push(ToDoItem("Feed bird", 'B', 7));
  toDoList.push(ToDoItem("Mow lawn", 'C', 3));
  toDoList.push(ToDoItem("Water lawn", 'A', 1));
  toDoList.push(ToDoItem("Feed cat", 'B', 1));
  while(!toDoList.empty()) {
    cout << toDoList.top() << endl;
    toDoList.pop();
  }

} ///:~

ToDoItem’s operator< must be a non-member function for it to work with less< >. Other than that, everything happens automatically. The output is:

A1: Water lawn
A2: Feed dog
B1: Feed cat
B7: Feed bird
C3: Mow lawn

C4: Empty trash

Note that you cannot iterate through a priority_queue. However, it is possible to emulate the behavior of a priority_queue using a vector, thus allowing you access to that vector. You can do this by looking at the implementation of priority_queue, which uses make_heap( ), push_heap( ) and pop_heap( ) (they are the soul of the priority_queue; in fact you could say that the heap is the priority queue and priority_queue is just a wrapper around it). This turns out to be reasonably straightforward, but you might think that a shortcut is possible. Since the container used by priority_queue is protected (and has the identifier, according to the Standard C++ specification, named c) you can inherit a new class which provides access to the underlying implementation:

//: C04:PriorityQueue4.cpp
// Manipulating the underlying implementation
#include <iostream>
#include <queue>
#include <cstdlib>
#include <ctime>
using namespace std;

class PQI : public priority_queue<int> {
public:
  vector<int>& impl() { return c; }
};

int main() {
  PQI pqi;
  srand(time(0));
  for(int i = 0; i < 100; i++)
    pqi.push(rand() % 25);
  copy(pqi.impl().begin(), pqi.impl().end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
  while(!pqi.empty()) {
    cout << pqi.top() << ' ';
    pqi.pop();
  }

} ///:~

However, if you run this program you’ll discover that the vector doesn’t contain the items in the descending order that you get when you call pop( ), the order that you want from the priority queue. It would seem that if you want to create a vector that is a priority queue, you have to do it by hand, like this:

//: C04:PriorityQueue5.cpp
// Building your own priority queue
#include <iostream>
#include <queue>
#include <cstdlib>
#include <ctime>
using namespace std;

template<class T, class Compare>
class PQV : public vector<T> {
  Compare comp;
public:
  PQV(Compare cmp = Compare()) : comp(cmp) {
    make_heap(begin(), end(), comp);
  }
  const T& top() { return front(); }
  void push(const T& x) {
    push_back(x);
    push_heap(begin(), end(), comp);
  }
  void pop() {
    pop_heap(begin(), end(), comp);
    pop_back();
  }  
};

int main() {
  PQV<int, less<int> > pqi;
  srand(time(0));
  for(int i = 0; i < 100; i++)
    pqi.push(rand() % 25);
  copy(pqi.begin(), pqi.end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
  while(!pqi.empty()) {
    cout << pqi.top() << ' ';
    pqi.pop();
  }

} ///:~

But this program behaves in the same way as the previous one! What you are seeing in the underlying vector is called a heap. This heap represents the tree of the priority queue (stored in the linear structure of the vector), but when you iterate through it you do not get a linear priority-queue order. You might think that you can simply call sort_heap( ), but that only works once, and then you don’t have a heap anymore, but instead a sorted list. This means that to go back to using it as a heap the user must remember to call make_heap( ) first. This can be encapsulated into your custom priority queue:

//: C04:PriorityQueue6.cpp
#include <iostream>
#include <queue>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;

template<class T, class Compare>
class PQV : public vector<T> {
  Compare comp;
  bool sorted;
  void assureHeap() {
    if(sorted) {
      // Turn it back into a heap:
      make_heap(begin(), end(), comp);
      sorted = false;
    }
  }    
public:
  PQV(Compare cmp = Compare()) : comp(cmp) {
    make_heap(begin(), end(), comp);
    sorted = false;
  }
  const T& top() {
    assureHeap();
    return front(); 
  }
  void push(const T& x) {
    assureHeap();
    // Put it at the end:
    push_back(x);
    // Re-adjust the heap:
    push_heap(begin(), end(), comp);
  }
  void pop() {
    assureHeap();
    // Move the top element to the last position:
    pop_heap(begin(), end(), comp);
    // Remove that element:
    pop_back();
  }
  void sort() {
    if(!sorted) {
      sort_heap(begin(), end(), comp);
      reverse(begin(), end());
      sorted = true;
    }
  }
};

int main() {
  PQV<int, less<int> > pqi;
  srand(time(0));
  for(int i = 0; i < 100; i++) {
    pqi.push(rand() % 25);
    copy(pqi.begin(), pqi.end(),
      ostream_iterator<int>(cout, " "));
    cout << "\n-----\n";
  }
  pqi.sort();
  copy(pqi.begin(), pqi.end(),
    ostream_iterator<int>(cout, " "));
  cout << "\n-----\n";
  while(!pqi.empty()) {
    cout << pqi.top() << ' ';
    pqi.pop();
  }

} ///:~

If sorted is true, then the vector is not organized as a heap, but instead as a sorted sequence. assureHeap( ) guarantees that it’s put back into heap form before performing any heap operations on it.

The first for loop in main( ) now has the additional quality that it displays the heap as it’s being built.

The only drawback to this solution is that the user must remember to call sort( ) before viewing it as a sorted sequence (although one could conceivably override all the methods that produce iterators so that they guarantee sorting). Another solution is to build a priority queue that is not a vector, but will build you a vector whenever you want one:

//: C04:PriorityQueue7.cpp
// A priority queue that will hand you a vector
#include <iostream>
#include <queue>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;

template<class T, class Compare>
class PQV {
  vector<T> v;
  Compare comp;
public:
  // Don't need to call make_heap(); it's empty:
  PQV(Compare cmp = Compare()) : comp(cmp) {}
  void push(const T& x) {
    // Put it at the end:
    v.push_back(x);
    // Re-adjust the heap:
    push_heap(v.begin(), v.end(), comp);
  }
  void pop() {
    // Move the top element to the last position:
    pop_heap(v.begin(), v.end(), comp);
    // Remove that element:
    v.pop_back();
  }
  const T& top() { return v.front(); }
  bool empty() const { return v.empty(); }
  int size() const { return v.size(); }
  typedef vector<T> TVec;
  TVec vector() {
    TVec r(v.begin(), v.end());
    // It’s already a heap
    sort_heap(r.begin(), r.end(), comp);
    // Put it into priority-queue order:
    reverse(r.begin(), r.end());
    return r;
  }
};

int main() {
  PQV<int, less<int> > pqi;
  srand(time(0));
  for(int i = 0; i < 100; i++)
    pqi.push(rand() % 25);
  const vector<int>& v = pqi.vector();
  copy(v.begin(), v.end(),
    ostream_iterator<int>(cout, " "));
  cout << "\n-----------\n"; 
  while(!pqi.empty()) {
    cout << pqi.top() << ' ';
    pqi.pop();
  }

} ///:~

PQV follows the same form as the STL’s priority_queue, but has the additional member vector( ), which creates a new vector that’s a copy of the one in PQV (which means that it’s already a heap), then sorts it (thus it leave’s PQV’s vector untouched), then reverses the order so that traversing the new vector produces the same effect as popping the elements from the priority queue.

You may observe that the approach of inheriting from priority_queue used in PriorityQueue4.cpp could be used with the above technique to produce more succinct code:

//: C04:PriorityQueue8.cpp
// A more compact version of PriorityQueue7.cpp
#include <iostream>
#include <queue>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;

template<class T>
class PQV : public priority_queue<T> {
public:
  typedef vector<T> TVec;
  TVec vector() {
    TVec r(c.begin(), c.end());
    // c is already a heap
    sort_heap(r.begin(), r.end(), comp);
    // Put it into priority-queue order:
    reverse(r.begin(), r.end());
    return r;
  }
};

int main() {
  PQV<int> pqi;
  srand(time(0));
  for(int i = 0; i < 100; i++)
    pqi.push(rand() % 25);
  const vector<int>& v = pqi.vector();
  copy(v.begin(), v.end(),
    ostream_iterator<int>(cout, " "));
  cout << "\n-----------\n"; 
  while(!pqi.empty()) {
    cout << pqi.top() << ' ';
    pqi.pop();
  }

} ///:~

The brevity of this solution makes it the simplest and most desirable, plus it’s guaranteed that the user will not have a vector in the unsorted state. The only potential problem is that the vector( ) member function returns the vector<T> by value, which might cause some overhead issues with complex values of the parameter type T.

Holding bits

Most of my computer education was in hardware-level design and programming, and I spent my first few years doing embedded systems development. Because C was a language that purported to be “close to the hardware,” I have always found it dismaying that there was no native binary representation for numbers. Decimal, of course, and hexadecimal (tolerable only because it’s easier to group the bits in your mind), but octal? Ugh. Whenever you read specs for chips you’re trying to program, they don’t describe the chip registers in octal, or even hexadecimal – they use binary. And yet C won’t let you say 0b0101101, which is the obvious solution for a language close to the hardware.

Although there’s still no native binary representation in C++, things have improved with the addition of two classes: bitset and vector<bool>, both of which are designed to manipulate a group of on-off values. The primary differences between these types are:

  1. The bitset holds a fixed number of bits. You establish the quantity of bits in the bitset template argument. The vector<bool> can, like a regular vector, expand dynamically to hold any number of bool values.
  2. The bitset is explicitly designed for performance when manipulating bits, and not as a “regular” container. As such, it has no iterators and it’s most storage-efficient when it contains an integral number of long values. The vector<bool>, on the other hand, is a specialization of a vector, and so has all the operations of a normal vector – the specialization is just designed to be space-efficient for bool.

There is no trivial conversion between a bitset and a vector<bool>, which implies that the two are for very different purposes.

bitset<n>

The template for bitset accepts an integral template argument which is the number of bits to represent. Thus, bitset<10> is a different type than bitset<20>, and you cannot perform comparisons, assignments, etc. between the two.

A bitset provides virtually any bit operation that you could ask for, in a very efficient form. However, each bitset is made up of an integral number of longs (typically 32 bits), so even though it uses no more space than it needs, it always uses at least the size of a long. This means you’ll use space most efficiently if you increase the size of your bitsets in chunks of the number of bits in a long. In addition, the only conversion from a bitset to a numerical value is to an unsigned long, which means that 32 bits (if your long is the typical size) is the most flexible form of a bitset.

The following example tests almost all the functionality of the bitset (the missing operations are redundant or trivial). You’ll see the description of each of the bitset outputs to the right of the output so that the bits all line up and you can compare them to the source values. If you still don’t understand bitwise operations, running this program should help.

//: C04:BitSet.cpp
// Exercising the bitset class
#include <iostream>
#include <bitset>
#include <cstdlib>
#include <ctime>
#include <climits>
#include <string>
using namespace std;
const int sz = 32;
typedef bitset<sz> BS;

template<int bits>
bitset<bits> randBitset() {
  bitset<bits> r(rand());
  for(int i = 0; i < bits/16 - 1; i++) {
    r <<= 16;
    // "OR" together with a new lower 16 bits:
    r |= bitset<bits>(rand()); 
  }
  return r;
}  

int main() {
  srand(time(0));
  cout << "sizeof(bitset<16>) = " 
    << sizeof(bitset<16>) << endl;
  cout << "sizeof(bitset<32>) = " 
    << sizeof(bitset<32>) << endl;
  cout << "sizeof(bitset<48>) = " 
    << sizeof(bitset<48>) << endl;
  cout << "sizeof(bitset<64>) = " 
    << sizeof(bitset<64>) << endl;
  cout << "sizeof(bitset<65>) = " 
    << sizeof(bitset<65>) << endl;
  BS a(randBitset<sz>()), b(randBitset<sz>());
  // Converting from a bitset:
  unsigned long ul = a.to_ulong();
  string s = b.to_string();
  // Converting a string to a bitset:
  char* cbits = "111011010110111";
  cout << "char* cbits = " << cbits <<endl;
  cout << BS(cbits) << " [BS(cbits)]" << endl;
  cout << BS(cbits, 2) 
    << " [BS(cbits, 2)]" << endl;
  cout << BS(cbits, 2, 11)
    << " [BS(cbits, 2, 11)]" << endl;
  cout << a << " [a]" << endl;
  cout << b << " [b]"<< endl;
  // Bitwise AND:
  cout << (a & b) << " [a & b]" << endl;
  cout << (BS(a) &= b) << " [a &= b]" << endl;
  // Bitwise OR:
  cout << (a | b) << " [a | b]" << endl;
  cout << (BS(a) |= b) << " [a |= b]" << endl;
  // Exclusive OR:
  cout << (a ^ b) << " [a ^ b]" << endl;
  cout << (BS(a) ^= b) << " [a ^= b]" << endl;
  cout << a << " [a]" << endl; // For reference
  // Logical left shift (fill with zeros):
  cout << (BS(a) <<= sz/2) 
    << " [a <<= (sz/2)]" << endl;
  cout << (a << sz/2) << endl;
  cout << a << " [a]" << endl; // For reference
  // Logical right shift (fill with zeros):
  cout << (BS(a) >>= sz/2) 
    << " [a >>= (sz/2)]" << endl;
  cout << (a >> sz/2) << endl;
  cout << a << " [a]" << endl; // For reference
  cout << BS(a).set() << " [a.set()]" << endl;
  for(int i = 0; i < sz; i++)
    if(!a.test(i)) {
      cout << BS(a).set(i) 
        << " [a.set(" << i <<")]" << endl;
      break; // Just do one example of this
    }
  cout << BS(a).reset() << " [a.reset()]"<< endl;
  for(int j = 0; j < sz; j++)
    if(a.test(j)) {
      cout << BS(a).reset(j) 
        << " [a.reset(" << j <<")]" << endl;
      break; // Just do one example of this
    }
  cout << BS(a).flip() << " [a.flip()]" << endl;
  cout << ~a << " [~a]" << endl;  
  cout << a << " [a]" << endl; // For reference
  cout << BS(a).flip(1) << " [a.flip(1)]"<< endl;
  BS c;
  cout << c << " [c]" << endl;
  cout << "c.count() = " << c.count() << endl;
  cout << "c.any() = " 
    << (c.any() ? "true" : "false") << endl;
  cout << "c.none() = " 
    << (c.none() ? "true" : "false") << endl;
  c[1].flip(); c[2].flip();
  cout << c << " [c]" << endl;
  cout << "c.count() = " << c.count() << endl;
  cout << "c.any() = " 
    << (c.any() ? "true" : "false") << endl;
  cout << "c.none() = " 
    << (c.none() ? "true" : "false") << endl;
  // Array indexing operations:
  c.reset();
  for(int k = 0; k < c.size(); k++)
    if(k % 2 == 0)
      c[k].flip();
  cout << c << " [c]" << endl;
  c.reset();
  // Assignment to bool:
  for(int ii = 0; ii < c.size(); ii++)
    c[ii] = (rand() % 100) < 25;
  cout << c << " [c]" << endl;
  // bool test:
  if(c[1] == true) 
    cout << "c[1] == true"; 
  else 
    cout << "c[1] == false" << endl;

} ///:~

To generate interesting random bitsets, the randBitset( ) function is created. The Standard C rand( ) function only generates an int, so this function demonstrates operator<<= by shifting each 16 random bits to the left until the bitset (which is templatized in this function for size) is full. The generated number and each new 16 bits is combined using the operator|=.

The first thing demonstrated in main( ) is the unit size of a bitset. If it is less than 32 bits, sizeof produces 4 (4 bytes = 32 bits), which is the size of a single long on most implementations. If it’s between 32 and 64, it requires two longs, greater than 64 requires 3 longs, etc. Thus you make the best use of space if you use a bit quantity that fits in an integral number of longs. However, notice there’s no extra overhead for the object – it’s as if you were hand-coding to use a long.

Another clue that bitset is optimized for longs is that there is a to_ulong( ) member function that produces the value of the bitset as an unsigned long. There are no other numerical conversions from bitset, but there is a to_string( ) conversion that produces a string containing ones and zeros, and this can be as long as the actual bitset. However, using bitset<32> may make your life simpler because of to_ulong( ).

There’s still no primitive format for binary values, but the next best thing is supported by bitset: a string of ones and zeros with the least-significant bit (lsb) on the right. The three constructors demonstrated show taking the entire string (the char array is automatically converted to a string), the string starting at character 2, and the string from character 2 through 11. You can write to an ostream from a bitset using operator<< and it comes out as ones and zeros. You can also read from an istream using operator>> (not shown here).

You’ll notice that bitset only has three non-member operators: and (&), or (|) and exclusive-or (^). Each of these create a new bitset as their return value. All of the member operators opt for the more efficient &=, |=, etc. form where a temporary is not created. However, these forms actually change their lvalue (which is a in most of the tests in the above example). To prevent this, I created a temporary to be used as the lvalue by invoking the copy-constructor on a; this is why you see the form BS(a). The result of each test is printed out, and occasionally a is reprinted so you can easily look at it for reference.

The rest of the example should be self-explanatory when you run it; if not you can find the details in your compiler’s documentation or the other documentation mentioned earlier in this chapter.

vector<bool>

vector<bool> is a specialization of the vector template. A normal bool variable requires at least one byte, but since a bool only has two states the ideal implementation of vector<bool> is such that each bool value only requires one bit. This means the iterator must be specially-defined, and cannot be a bool*.

The bit-manipulation functions for vector<bool> are much more limited than those of bitset. The only member function that was added to those already in vector is flip( ), to invert all the bits; there is no set( ) or reset( ) as in bitset. When you use operator[ ], you get back an object of type vector<bool>::reference, which also has a flip( ) to invert that individual bit.

//: C04:VectorOfBool.cpp
// Demonstrate the vector<bool> specialization
#include <iostream>
#include <sstream>
#include <vector>
#include <bitset>
#include <iterator>
using namespace std;

int main() {
  vector<bool> vb(10, true);
  vector<bool>::iterator it;
  for(it = vb.begin(); it != vb.end(); it++)
    cout << *it;
  cout << endl;
  vb.push_back(false);
  ostream_iterator<bool> out(cout, "");
  copy(vb.begin(), vb.end(), out);
  cout << endl;
  bool ab[] = { true, false, false, true, true, 
    true, true, false, false, true };
  // There's a similar constructor:
  vb.assign(ab, ab + sizeof(ab)/sizeof(bool));
  copy(vb.begin(), vb.end(), out);
  cout << endl;
  vb.flip(); // Flip all bits
  copy(vb.begin(), vb.end(), out);
  cout << endl;
  for(int i = 0; i < vb.size(); i++)
    vb[i] = 0; // (Equivalent to "false")
  vb[4] = true;
  vb[5] = 1;
  vb[7].flip(); // Invert one bit
  copy(vb.begin(), vb.end(), out);
  cout << endl;
  // Convert to a bitset:
  ostringstream os;
  copy(vb.begin(), vb.end(), 
    ostream_iterator<bool>(os, ""));
  bitset<10> bs(os.str());
  cout << "Bitset:\n" << bs << endl;

} ///:~

The last part of this example takes a vector<bool> and converts it to a bitset by first turning it into a string of ones and zeros. Of course, you must know the size of the bitset at compile-time. You can see that this conversion is not the kind of operation you’ll want to do on a regular basis.

Associative containers

The set, map, multiset and multimap are called associative containers because they associate keys with values. Well, at least maps and multimaps associate keys to values, but you can look at a set as a map that has no values, only keys (and they can in fact be implemented this way), and the same for the relationship between multiset and multimap. So, because of the structural similarity sets and multisets are lumped in with associative containers.

The most important basic operations with associative containers are putting things in, and in the case of a set, seeing if something is in the set. In the case of a map, you want to first see if a key is in the map, and if it exists you want the associated value for that key to be returned. Of course, there are many variations on this theme but that’s the fundamental concept. The following example shows these basics:

//: C04:AssociativeBasics.cpp
// Basic operations with sets and maps
#include "Noisy.h"
#include <iostream>
#include <set>
#include <map>
using namespace std;

int main() {
  Noisy na[] = { Noisy(), Noisy(), Noisy(), 
    Noisy(), Noisy(), Noisy(), Noisy() };
  // Add elements via constructor:
  set<Noisy> ns(na, na+ sizeof na/sizeof(Noisy));
  // Ordinary insertion:
  Noisy n;
  ns.insert(n);
  cout << endl;
  // Check for set membership:
  cout << "ns.count(n)= " << ns.count(n) << endl;
  if(ns.find(n) != ns.end())
    cout << "n(" << n << ") found in ns" << endl;
  // Print elements:
  copy(ns.begin(), ns.end(), 
    ostream_iterator<Noisy>(cout, " "));
  cout << endl;
  cout << "\n-----------\n";
  map<int, Noisy> nm;
  for(int i = 0; i < 10; i++)
    nm[i]; // Automatically makes pairs
  cout << "\n-----------\n";
  for(int j = 0; j < nm.size(); j++)
    cout << "nm[" << j <<"] = " << nm[j] << endl;
  cout << "\n-----------\n";
  nm[10] = n;
  cout << "\n-----------\n";
  nm.insert(make_pair(47, n));
  cout << "\n-----------\n";
  cout << "\n nm.count(10)= " 
    << nm.count(10) << endl;
  cout << "nm.count(11)= " 
    << nm.count(11) << endl;
  map<int, Noisy>::iterator it = nm.find(6);
  if(it != nm.end())
    cout << "value:" << (*it).second
      << " found in nm at location 6" << endl;
  for(it = nm.begin(); it != nm.end(); it++)
    cout << (*it).first << ":" 
      << (*it).second << ", ";
  cout << "\n-----------\n";

} ///:~

The set<Noisy> object ns is created using two iterators into an array of Noisy objects, but there is also a default constructor and a copy-constructor, and you can pass in an object that provides an alternate scheme for doing comparisons. Both sets and maps have an insert( ) member function to put things in, and there are a couple of different ways to check to see if an object is already in an associative container: count( ), when given a key, will tell you how many times that key occurs (this can only be zero or one in a set or map, but it can be more than one with a multiset or multimap). The find( ) member function will produce an iterator indicating the first occurrence (with set and map, the only occurrence) of the key that you give it, or the past-the-end iterator if it can’t find the key. The count( ) and find( ) member functions exist for all the associative containers, which makes sense. The associative containers also have member functions lower_bound( ), upper_bound( ) and equal_range( ), which actually only make sense for multiset and multimap, as you shall see (but don’t try to figure out how they would be useful for set and map, since they are designed for dealing with a range of duplicate keys, which those containers don’t allow).

Designing an operator[ ] always produces a little bit of a dilemma because it’s intended to be treated as an array-indexing operation, so people don’t tend to think about performing a test before they use it. But what happens if you decide to index out of the bounds of the array? One option, of course, is to throw an exception, but with a map “indexing out of the array” could mean that you want an entry there, and that’s the way the STL map treats it. The first for loop after the creation of the map<int, Noisy> nm just “looks up” objects using the operator[ ], but this is actually creating new Noisy objects! The map creates a new key-value pair (using the default constructor for the value) if you look up a value with operator[ ] and it isn’t there. This means that if you really just want to look something up and not create a new entry, you must use count( ) (to see if it’s there) or find( ) (to get an iterator to it).

The for loop that prints out the values of the container using operator[ ] has a number of problems. First, it requires integral keys (which we happen to have in this case). Next and worse, if all the keys are not sequential, you’ll end up counting from 0 to the size of the container, and if there are some spots which don’t have key-value pairs you’ll automatically create them, and miss some of the higher values of the keys. Finally, if you look at the output from the for loop you’ll see that things are very busy, and it’s quite puzzling at first why there are so many constructions and destructions for what appears to be a simple lookup. The answer only becomes clear when you look at the code in the map template for operator[ ], which will be something like this:

mapped_type& operator[] (const key_type& k) {
  value_type tmp(k,T()); 
  return (*((insert(tmp)).first)).second;

}

Following the trail, you’ll find that map::value_type is:

typedef pair<const Key, T> value_type;

Now you need to know what a pair is, which can be found in <utility>:

template <class T1, class T2>
struct pair { 
  typedef T1 first_type; 
  typedef T2 second_type; 
  T1 first; 
  T2 second; 
  pair(); 
  pair(const T1& x, const T2& y) 
    : first(x), second(y) {}
  // Templatized copy-constructor:
  template<class U, class V> 
   pair(const pair<U, V> &p);

};

It turns out this is a very important (albeit simple) struct which is used quite a bit in the STL. All it really does it package together two objects, but it’s very useful, especially when you want to return two objects from a function (since a return statement only takes one object). There’s even a shorthand for creating a pair called make_pair( ), which is used in AssociativeBasics.cpp.

So to retrace the steps, map::value_type is a pair of the key and the value of the map – actually, it’s a single entry for the map. But notice that pair packages its objects by value, which means that copy-constructions are necessary to get the objects into the pair. Thus, the creation of tmp in map::operator[ ] will involve at least a copy-constructor call and destructor call for each object in the pair. Here, we’re getting off easy because the key is an int. But if you want to really see what kind of activity can result from map::operator[ ], try running this:

//: C04:NoisyMap.cpp
// Mapping Noisy to Noisy
#include "Noisy.h"
#include <map>
using namespace std;

int main() {
  map<Noisy, Noisy> mnn;
  Noisy n1, n2;
  cout << "\n--------\n";
  mnn[n1] = n2;
  cout << "\n--------\n";
  cout << mnn[n1] << endl;
  cout << "\n--------\n";

} ///:~

You’ll see that both the insertion and lookup generate a lot of extra objects, and that’s because of the creation of the tmp object. If you look back up at map::operator[ ] you’ll see that the second line calls insert( ) passing it tmp – that is, operator[ ] does an insertion every time. The return value of insert( ) is a different kind of pair, where first is an iterator pointing to the key-value pair that was just inserted, and second is a bool indicating whether the insertion took place. You can see that operator[ ] grabs first (the iterator), dereferences it to produce the pair, and then returns the second which is the value at that location.

So on the upside, map has this fancy “make a new entry if one isn’t there” behavior, but the downside is that you always get a lot of extra object creations and destructions when you use map::operator[ ]. Fortunately, AssociativeBasics.cpp also demonstrates how to reduce the overhead of insertions and deletions, by not using operator[ ] if you don’t have to. The insert( ) member function is slightly more efficient than operator[ ]. With a set you only hold one object, but with a map you hold key-value pairs, so insert( ) requires a pair as its argument. Here’s where make_pair( ) comes in handy, as you can see.

For looking objects up in a map, you can use count( ) to see whether a key is in the map, or you can use find( ) to produce an iterator pointing directly at the key-value pair. Again, since the map contains pairs that’s what the iterator produces when you dereference it, so you have to select first and second. When you run AssociativeBasics.cpp you’ll notice that the iterator approach involves no extra object creations or destructions at all. It’s not as easy to write or read, though.

If you use a map with large, complex objects and discover there’s too much overhead when doing lookups and insertions (don’t assume this from the beginning – take the easy approach first and use a profiler to discover bottlenecks), then you can use the counted-handle approach shown in Chapter XX so that you are only passing around small, lightweight objects.

Of course, you can also iterate through a set or map and operate on each of its objects. This will be demonstrated in later examples.

Generators and fillers
for associative containers

You’ve seen how useful the fill( ), fill_n( ), generate( ) and generate_n( ) function templates in <algorithm> have been for filling the sequential containers (vector, list and deque) with data. However, these are implemented by using operator= to assign values into the sequential containers, and the way that you add objects to associative containers is with their respective insert( ) member functions. Thus the default “assignment” behavior causes a problem when trying to use the “fill” and “generate” functions with associative containers.

One solution is to duplicate the “fill” and “generate” functions, creating new ones that can be used with associative containers. It turns out that only the fill_n( ) and generate_n( ) functions can be duplicated (fill( ) and generate( ) copy in between two iterators, which doesn’t make sense with associative containers), but the job is fairly easy, since you have the <algorithm> header file to work from (and since it contains templates, all the source code is there):

//: C04:assocGen.h
// The fill_n() and generate_n() equivalents 
// for associative containers.
#ifndef ASSOCGEN_H
#define ASSOCGEN_H

template<class Assoc, class Count, class T>
void 
assocFill_n(Assoc& a, Count n, const T& val) {
  while(n-- > 0)
    a.insert(val);
}

template<class Assoc, class Count, class Gen>
void assocGen_n(Assoc& a, Count n, Gen g) {
  while(n-- > 0)
    a.insert(g());
}

#endif // ASSOCGEN_H ///:~

You can see that instead of using iterators, the container class itself is passed (by reference, of course, since you wouldn’t want to make a local copy, fill it, and then have it discarded at the end of the scope).

This code demonstrates two valuable lessons. The first lesson is that if the algorithms don’t do what you want, copy the nearest thing and modify it. You have the example at hand in the STL header, so most of the work has already been done.

The second lesson is more pointed: if you look long enough, there’s probably a way to do it in the STL without inventing anything new. The present problem can instead be solved by using an insert_iterator (produced by a call to inserter( )), which calls insert( ) to place items in the container instead of operator=. This is not simply a variation of front_insert_iterator (produced by a call to front_inserter( )) or back_insert_iterator (produced by a call to back_inserter( )), since those iterators use push_front( ) and push_back( ), respectively. Each of the insert iterators is different by virtue of the member function it uses for insertion, and insert( ) is the one we need. Here’s a demonstration that shows filling and generating both a map and a set (of course, it can also be used with multimap and multiset). First, some templatized, simple generators are created (this may seem like overkill, but you never know when you’ll need them; for that reason they’re placed in a header file):

//: C04:SimpleGenerators.h
// Generic generators, including
// one that creates pairs
#include <iostream>
#include <utility>

// A generator that increments its value:
template<typename T>
class IncrGen {
  T i;
public:
  IncrGen(T ii) : i (ii) {}
  T operator()() { return i++; }
};

// A generator that produces an STL pair<>:
template<typename T1, typename T2>
class PairGen {
  T1 i;
  T2 j;
public:
  PairGen(T1 ii, T2 jj) : i(ii), j(jj) {}
  std::pair<T1,T2> operator()() { 
    return std::pair<T1,T2>(i++, j++); 
  }
};

// A generic global operator<< 
// for printing any STL pair<>:
template<typename Pair> std::ostream&
operator<<(std::ostream& os, const Pair& p) {
  return os << p.first << "\t" 
    << p.second << std::endl;

} ///:~

Both generators expect that T can be incremented, and they simply use operator++ to generate new values from whatever you used for initialization. PairGen creates an STL pair object as its return value, and that’s what can be placed into a map or multimap using insert( ).

The last function is a generalization of operator<< for ostreams, so that any pair can be printed, assuming each element of the pair supports a stream operator<<. As you can see below, this allows the use of copy( ) to output the map:

//: C04:AssocInserter.cpp
// Using an insert_iterator so fill_n() and
// generate_n() can be used with associative 
// containers
#include "SimpleGenerators.h"
#include <iterator>
#include <iostream>
#include <algorithm>
#include <set>
#include <map>
using namespace std;

int main() {
  set<int> s;
  fill_n(inserter(s, s.begin()), 10, 47);
  generate_n(inserter(s, s.begin()), 10, 
    IncrGen<int>(12));
  copy(s.begin(), s.end(), 
    ostream_iterator<int>(cout, "\n"));
  
  map<int, int> m;
  fill_n(inserter(m, m.begin()), 10, 
    make_pair(90,120));
  generate_n(inserter(m, m.begin()), 10, 
    PairGen<int, int>(3, 9));
  copy(m.begin(), m.end(), 
    ostream_iterator<pair<int,int> >(cout,"\n"));

} ///:~

The second argument to inserter is an iterator, which actually isn’t used in the case of associative containers since they maintain their order internally, rather than allowing you to tell them where the element should be inserted. However, an insert_iterator can be used with many different types of containers so you must provide the iterator.

Note how the ostream_iterator is created to output a pair; this wouldn’t have worked if the operator<< hadn’t been created, and since it’s a template it is automatically instantiated for pair<int, int>.

The magic of maps

An ordinary array uses an integral value to index into a sequential set of elements of some type. A map is an associative array, which means you associate one object with another in an array-like fashion, but instead of selecting an array element with a number as you do with an ordinary array, you look it up with an object! The example which follows counts the words in a text file, so the index is the string object representing the word, and the value being looked up is the object that keeps count of the strings.

In a single-item container like a vector or list, there’s only one thing being held. But in a map, you’ve got two things: the key (what you look up by, as in mapname[key]) and the value that results from the lookup with the key. If you simply want to move through the entire map and list each key-value pair, you use an iterator, which when dereferenced produces a pair object containing both the key and the value. You access the members of a pair by selecting first or second.

This same philosophy of packaging two items together is also used to insert elements into the map, but the pair is created as part of the instantiated map and is called value_type, containing the key and the value. So one option for inserting a new element is to create a value_type object, loading it with the appropriate objects and then calling the insert( ) member function for the map. Instead, the following example makes use of the aforementioned special feature of map: if you’re trying to find an object by passing in a key to operator[ ] and that object doesn’t exist, operator[ ] will automatically insert a new key-value pair for you, using the default constructor for the value object. With that in mind, consider an implementation of a word counting program:

//: C04:WordCount.cpp
//{L} StreamTokenizer
// Count occurrences of words using a map
#include "StreamTokenizer.h"
#include "../require.h"
#include <string>
#include <map>
#include <iostream>
#include <fstream>
using namespace std;

class Count {
  int i;
public:
  Count() : i(0) {}
  void operator++(int) { i++; } // Post-increment  
  int& val() { return i; }
};

typedef map<string, Count> WordMap;
typedef WordMap::iterator WMIter;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  StreamTokenizer words(in);
  WordMap wordmap;
  string word;
  while((word = words.next()).size() != 0)
    wordmap[word]++;
  for(WMIter w = wordmap.begin(); 
      w != wordmap.end(); w++)
    cout << (*w).first << ": "
      << (*w).second.val() << endl;

} ///:~

The need for the Count class is to contain an int that’s automatically initialized to zero. This is necessary because of the crucial line:

wordmap[word]++;

This finds the word that has been produced by StreamTokenizer and increments the Count object associated with that word, which is fine as long as there is a key-value pair for that string. If there isn’t, the map automatically inserts a key for the word you’re looking up, and a Count object, which is initialized to zero by the default constructor. Thus, when it’s incremented the Count becomes 1.

Printing the entire list requires traversing it with an iterator (there’s no copy( ) shortcut for a map unless you want to write an operator<< for the pair in the map). As previously mentioned, dereferencing this iterator produces a pair object, with the first member the key and the second member the value. In this case second is a Count object, so its val( ) member must be called to produce the actual word count.

If you want to find the count for a particular word, you can use the array index operator, like this:

cout << "the: " << wordmap["the"].val() << endl;

You can see that one of the great advantages of the map is the clarity of the syntax; an associative array makes intuitive sense to the reader (note, however, that if “the” isn’t already in the wordmap a new entry will be created!).

A command-line argument tool

A problem that often comes up in programming is the management of program arguments that you can specify on the command line. Usually you’d like to have a set of defaults that can be changed via the command line. The following tool expects the command line arguments to be in the form flag1=value1 with no spaces around the ‘=‘ (so it will be treated as a single argument). The ProgVal class simply inherits from map<string, string>:

//: C04:ProgVals.h
// Program values can be changed by command line
#ifndef PROGVALS_H
#define PROGVALS_H
#include <map>
#include <iostream>
#include <string>

class ProgVals 
  : public std::map<std::string, std::string> {
public:
  ProgVals(std::string defaults[][2], int sz);
  void parse(int argc, char* argv[],
    std::string usage, int offset = 1);
  void print(std::ostream& out = std::cout);
};

#endif // PROGVALS_H ///:~

The constructor expects an array of string pairs (as you’ll see, this allows you to initialize it with an array of char*) and the size of that array. The parse( ) member function is handed the command-line arguments along with a “usage” string to print if the command line is given incorrectly, and the “offset” which tells it which command-line argument to start with (so you can have non-flag arguments at the beginning of the command line). Finally, print( ) displays the values. Here is the implementation:

//: C04:ProgVals.cpp {O}
#include "ProgVals.h"
using namespace std;

ProgVals::ProgVals(
  std::string defaults[][2], int sz) {
  for(int i = 0; i < sz; i++)
    insert(make_pair(
      defaults[i][0], defaults[i][1]));
}

void ProgVals::parse(int argc, char* argv[],
  string usage, int offset) {
  // Parse and apply additional
  // command-line arguments:
  for(int i = offset; i < argc; i++) {
    string flag(argv[i]);
    int equal = flag.find('=');
    if(equal == string::npos) {
      cerr << "Command line error: " <<
        argv[i] << endl << usage << endl;
      continue; // Next argument
    }
    string name = flag.substr(0, equal);
    string value = flag.substr(equal + 1);
    if(find(name) == end()) {
      cerr << name << endl << usage << endl;
      continue; // Next argument
    }
    operator[](name) = value;
  }
}

void ProgVals::print(ostream& out) {
  out << "Program values:" << endl;
  for(iterator it = begin(); it != end(); it++)
    out << (*it).first << " = "
        << (*it).second << endl;

} ///:~

The constructor uses the STL make_pair( ) helper function to convert each pair of char* into a pair object that can be inserted into the map. In parse( ), each command-line argument is checked for the existence of the telltale ‘=‘ sign (reporting an error if it isn’t there), and then is broken into two strings, the name which appears before the ‘=‘, and the value which appears after. The operator[ ] is then used to change the existing value to the new one.

Here’s an example to test the tool:

//: C04:ProgValTest.cpp
//{L} ProgVals
#include "ProgVals.h"
using namespace std;

string defaults[][2] = {
  { "color", "red" },
  { "size", "medium" },
  { "shape", "rectangular" },
  { "action", "hopping"},
};

const char* usage = "usage:\n"
"ProgValTest [flag1=val1 flag2=val2 ...]\n"
"(Note no space around '=')\n"
"Where the flags can be any of: \n"
"color, size, shape, action \n";

// So it can be used globally:
ProgVals pvals(defaults, 
  sizeof defaults / sizeof *defaults);

class Animal {
  string color, size, shape, action;
public:
  Animal(string col, string sz, 
    string shp, string act) 
    :color(col),size(sz),shape(shp),action(act){}
  // Default constructor uses program default
  // values, possibly change on command line:
  Animal() : color(pvals["color"]), 
    size(pvals["size"]), shape(pvals["shape"]),
    action(pvals["action"]) {}
  void print() {
    cout << "color = " << color << endl
      << "size = " << size << endl
      << "shape = " << shape << endl
      << "action = " << action << endl;
  }
  // And of course pvals can be used anywhere
  // else you'd like.
};

int main(int argc, char* argv[]) {
  // Initialize and parse command line values
  // before any code that uses pvals is called:
  pvals.parse(argc, argv, usage);
  pvals.print();
  Animal a;
  cout << "Animal a values:" << endl;
  a.print();

} ///:~

This program can create Animal objects with different characteristics, and those characteristics can be established with the command line. The default characteristics are given in the two-dimensional array of char* called defaults and, after the usage string you can see a global instance of ProgVals called pvals is created; this is important because it allows the rest of the code in the program to access the values.

Note that Animal’s default constructor uses the values in pvals inside its constructor initializer list. When you run the program you can try creating different animal characteristics.

Many command-line programs also use a style of beginning a flag with a hyphen, and sometimes they use single-character flags.

The STL map is used in numerous places throughout the rest of this book.

Multimaps and duplicate keys

A multimap is a map that can contain duplicate keys. At first this may seem like a strange idea, but it can occur surprisingly often. A phone book, for example, can have many entries with the same name.

Suppose you are monitoring wildlife, and you want to keep track of where and when each type of animal is spotted. Thus, you may see many animals of the same kind, all in different locations and at different times. So if the type of animal is the key, you’ll need a multimap. Here’s what it looks like:

//: C04:WildLifeMonitor.cpp
#include <vector>
#include <map>
#include <string>
#include <algorithm>
#include <iostream>
#include <sstream>
#include <ctime>
using namespace std;

class DataPoint {
  int x, y; // Location coordinates
  time_t time; // Time of Sighting
public:
  DataPoint() : x(0), y(0), time(0) {}
  DataPoint(int xx, int yy, time_t tm) :
    x(xx), y(yy), time(tm) {}
  // Synthesized operator=, copy-constructor OK
  int getX() { return x; }
  int getY() { return y; }
  time_t* getTime() { return &time; }
};

string animal[] = { 
  "chipmunk", "beaver", "marmot", "weasel",
  "squirrel", "ptarmigan", "bear", "eagle",
  "hawk", "vole", "deer", "otter", "hummingbird",
};
const int asz = sizeof animal/sizeof *animal;
vector<string> animals(animal, animal + asz);

// All the information is contained in a 
// "Sighting," which can be sent to an ostream:
typedef pair<string, DataPoint> Sighting;

ostream& 
operator<<(ostream& os, const Sighting& s) {
  return os << s.first << " sighted at x= " << 
    s.second.getX() << ", y= " << s.second.getY()
    << ", time = " << ctime(s.second.getTime());
}

// A generator for Sightings:
class SightingGen {
  vector<string>& animals;
  static const int d = 100;
public:
  SightingGen(vector<string>& an) :
    animals(an) { srand(time(0)); }
  Sighting operator()() {
    Sighting result;
    int select = rand() % animals.size();
    result.first = animals[select];
    result.second = DataPoint(
      rand() % d, rand() % d, time(0));
    return result;
  }
};

typedef multimap<string, DataPoint> DataMap;
typedef DataMap::iterator DMIter;

int main() {
  DataMap sightings;
  generate_n(
    inserter(sightings, sightings.begin()),
    50, SightingGen(animals));
  // Print everything:
  copy(sightings.begin(), sightings.end(),
    ostream_iterator<Sighting>(cout, ""));
  // Print sightings for selected animal:
  while(true) {
    cout << "select an animal or 'q' to quit: ";
    for(int i = 0; i < animals.size(); i++)
      cout <<'['<< i <<']'<< animals[i] << ' ';
    cout << endl;
    string reply;
    cin >> reply;
    if(reply.at(0) == 'q') return 0;
    istringstream r(reply);
    int i;
    r >> i; // Converts to int
    i %= animals.size();
    // Iterators in "range" denote begin, one 
    // past end of matching range:
    pair<DMIter, DMIter> range = 
      sightings.equal_range(animals[i]);
    copy(range.first, range.second,
      ostream_iterator<Sighting>(cout, ""));
  }

} ///:~

All the data about a sighting is encapsulated into the class DataPoint, which is simple enough that it can rely on the synthesized assignment and copy-constructor. It uses the Standard C library time functions to record the time of the sighting.

In the array of string animal, notice that the char* constructor is automatically used during initialization, which makes initializing an array of string quite convenient. Since it’s easier to use the animal names in a vector, the length of the array is calculated and a vector<string> is initialized using the vector(iterator, iterator) constructor.

The key-value pairs that make up a Sighting are the string which names the type of animal, and the DataPoint that says where and when it was sighted. The standard pair template combines these two types and is typedefed to produce the Sighting type. Then an ostream operator<< is created for Sighting; this will allow you to iterate through a map or multimap of Sightings and print it out.

SightingGen generates random sightings at random data points to use for testing. It has the usual operator( ) necessary for a function object, but it also has a constructor to capture and store a reference to a vector<string>, which is where the aforementioned animal names are stored.

A DataMap is a multimap of string-DataPoint pairs, which means it stores Sightings. It is filled with 50 Sightings using generate_n( ), and printed out (notice that because there is an operator<< that takes a Sighting, an ostream_iterator can be created). At this point the user is asked to select the animal that they want to see all the sightings for. If you press ‘q’ the program will quit, but if you select an animal number, then the equal_range( ) member function is invoked. This returns an iterator (DMIter) to the beginning of the set of matching pairs, and one indicating past-the-end of the set. Since only one object can be returned from a function, equal_range( ) makes use of pair. Since the range pair has the beginning and ending iterators of the matching set, those iterators can be used in copy( ) to print out all the sightings for a particular type of animal.

Multisets

You’ve seen the set, which only allows one object of each value to be inserted. The multiset is odd by comparison since it allows more than one object of each value to be inserted. This seems to go against the whole idea of “setness,” where you can ask “is ‘it’ in this set?” If there can be more than one of ‘it’, then what does that question mean?

With some thought, you can see that it makes no sense to have more than one object of the same value in a set if those duplicate objects are exactly the same (with the possible exception of counting occurrences of objects, but as seen earlier in this chapter that can be handled in an alternative, more elegant fashion). Thus each duplicate object will have something that makes it unique from the other duplicates – most likely different state information that is not used in the calculation of the value during the comparison. That is, to the comparison operation, the objects look the same but they actually contain some differing internal state.

Like any STL container that must order its elements, the multiset template uses the less template by default to determine element ordering. This uses the contained classes’ operator<, but you may of course substitute your own comparison function.

Consider a simple class that contains one element that is used in the comparison, and another that is not:

//: C04:MultiSet1.cpp
// Demonstration of multiset behavior
#include <iostream>
#include <set>
#include <algorithm>
#include <ctime>
using namespace std;

class X {
  char c; // Used in comparison
  int i; // Not used in comparison
  // Don't need default constructor and operator=
  X();
  X& operator=(const X&);
  // Usually need a copy-constructor (but the
  // synthesized version works here)
public:
  X(char cc, int ii) : c(cc), i(ii) {}
  // Notice no operator== is required
  friend bool operator<(const X& x, const X& y) {
    return x.c < y.c;
  }
  friend ostream& operator<<(ostream& os, X x) {
    return os << x.c << ":" << x.i;
  }
};

class Xgen {
  static int i;
  // Number of characters to select from:
  static const int span = 6;
public:
  Xgen() { srand(time(0)); }
  X operator()() {
    char c = 'A' + rand() % span;    
    return X(c, i++);
  }
};

int Xgen::i = 0;

typedef multiset<X> Xmset;
typedef Xmset::const_iterator Xmit;

int main() {
  Xmset mset;
  // Fill it with X's:
  generate_n(inserter(mset, mset.begin()), 
    25, Xgen());
  // Initialize a regular set from mset:
  set<X> unique(mset.begin(), mset.end());
  copy(unique.begin(), unique.end(), 
    ostream_iterator<X>(cout, " "));
  cout << "\n----\n";
  // Iterate over the unique values:
  for(set<X>::iterator i = unique.begin();
      i != unique.end(); i++) {
    pair<Xmit, Xmit> p = mset.equal_range(*i);
    copy(p.first, p.second, 
      ostream_iterator<X>(cout, " "));
    cout << endl;
  }

} ///:~

In X, all the comparisons are made with the char c. The comparison is performed with operator<, which is all that is necessary for the multiset, since in this example the default less comparison object is used. The class Xgen is used to randomly generate X objects, but the comparison value is restricted to the span from ‘A’ to ‘E’. In main( ), a multiset<X> is created and filled with 25 X objects using Xgen, guaranteeing that there will be duplicate keys. So that we know what the unique values are, a regular set<X> is created from the multiset (using the iterator, iterator constructor). These values are displayed, then each one is used to produce the equal_range( ) in the multiset (equal_range( ) has the same meaning here as it does with multimap: all the elements with matching keys). Each set of matching keys is then printed.

As a second example, a (possibly) more elegant version of WordCount.cpp can be created using multiset:

//: C04:MultiSetWordCount.cpp
//{L} StreamTokenizer
// Count occurrences of words using a multiset
#include "StreamTokenizer.h"
#include "../require.h"
#include <string>
#include <set>
#include <fstream>
#include <iterator>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  StreamTokenizer words(in);
  multiset<string> wordmset;
  string word;
  while((word = words.next()).size() != 0)
    wordmset.insert(word);
  typedef multiset<string>::iterator MSit;
  MSit it = wordmset.begin();
  while(it != wordmset.end()) {
    pair<MSit, MSit> p=wordmset.equal_range(*it);
    int count = distance(p.first, p.second);
    cout << *it << ": " << count << endl;
    it = p.second; // Move to the next word
  }

} ///:~

The setup in main( ) is identical to WordCount.cpp, but then each word is simply inserted into the multiset<string>. An iterator is created and initialized to the beginning of the multiset; dereferencing this iterator produces the current word. equal_range( ) produces the starting and ending iterators of the word that’s currently selected, and the STL algorithm distance( ) (which is in <iterator>) is used to count the number of elements in that range. Then the iterator it is moved forward to the end of the range, which puts it at the next word. Although if you’re unfamiliar with the multiset this code can seem more complex, the density of it and the lack of need for supporting classes like Count has a lot of appeal.

In the end, is this really a “set,” or should it be called something else? An alternative is the generic “bag” that has been defined in some container libraries, since a bag holds anything at all without discrimination – including duplicate objects. This is close, but it doesn’t quite fit since a bag has no specification about how elements should be ordered, while a multiset (which requires that all duplicate elements be adjacent to each other) is even more restrictive than the concept of a set, which could use a hashing function to order its elements, in which case they would not be in sorted order. Besides, if you wanted to store a bunch of objects without any special criterions, you’d probably just use a vector, deque or list.

Combining STL containers

When using a thesaurus, you have a word and you want to know all the words that are similar. When you look up a word, then, you want a list of words as the result. Here, the “multi” containers (multimap or multiset) are not appropriate. The solution is to combine containers, which is easily done using the STL. Here, we need a tool that turns out to be a powerful general concept, which is a map of vector:

//: C04:Thesaurus.cpp
// A map of vectors
#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <algorithm>
#include <ctime>
using namespace std;

typedef map<string, vector<string> > Thesaurus;
typedef pair<string, vector<string> > TEntry;
typedef Thesaurus::iterator TIter;

ostream& operator<<(ostream& os,const TEntry& t){
  os << t.first << ": ";
  copy(t.second.begin(), t.second.end(),
    ostream_iterator<string>(os, " "));
  return os;
}

// A generator for thesaurus test entries:
class ThesaurusGen {
  static const string letters;
  static int count;
public:
  int maxSize() { return letters.size(); }
  ThesaurusGen() { srand(time(0)); }
  TEntry operator()() {
    TEntry result;
    if(count >= maxSize()) count = 0;
    result.first = letters[count++];
    int entries = (rand() % 5) + 2;
    for(int i = 0; i < entries; i++) {
      int choice = rand() % maxSize();
      char cbuf[2] = { 0 };
      cbuf[0] = letters[choice];
      result.second.push_back(cbuf);
    }
    return result;
  }
};

int ThesaurusGen::count = 0;
const string ThesaurusGen::letters("ABCDEFGHIJKL"
  "MNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");

int main() {
  Thesaurus thesaurus;
  // Fill with 10 entries:
  generate_n(
    inserter(thesaurus, thesaurus.begin()), 
    10, ThesaurusGen());
  // Print everything:
  copy(thesaurus.begin(), thesaurus.end(),
    ostream_iterator<TEntry>(cout, "\n"));
  // Ask for a "word" to look up:
  while(true) {
    cout << "Select a \"word\", 0 to quit: ";
    for(TIter it = thesaurus.begin(); 
      it != thesaurus.end(); it++)
      cout << (*it).first << ' ';
    cout << endl;
    string reply;
    cin >> reply;
    if(reply.at(0) == '0') return 0; // Quit
    if(thesaurus.find(reply) == thesaurus.end())
      continue; // Not in list, try again
    vector<string>& v = thesaurus[reply];
    copy(v.begin(), v.end(), 
      ostream_iterator<string>(cout, " "));
    cout << endl;
  }

} ///:~

A Thesaurus maps a string (the word) to a vector<string> (the synonyms). A TEntry is a single entry in a Thesaurus. By creating an ostream operator<< for a TEntry, a single entry from the Thesaurus can easily be printed (and the whole Thesaurus can easily be printed with copy( )). The ThesaurusGen creates “words” (which are just single letters) and “synonyms” for those words (which are just other randomly-chosen single letters) to be used as thesaurus entries. It randomly chooses the number of synonym entries to make, but there must be at least two. All the letters are chosen by indexing into a static string that is part of ThesaurusGen.

In main( ), a Thesaurus is created, filled with 10 entries and printed using the copy( ) algorithm. Then the user is requested to choose a “word” to look up by typing the letter of that word. The find( ) member function is used to find whether the entry exists in the map (remember, you don’t want to use operator[ ] or it will automatically make a new entry if it doesn’t find a match!). If so, operator[ ] is used to fetch out the vector<string> which is displayed.

Because templates make the expression of powerful concepts easy, you can take this concept much further, creating a map of vectors containing maps, etc. For that matter, you can combine any of the STL containers this way.

Cleaning up
containers of pointers

In Stlshape.cpp, the pointers did not clean themselves up automatically. It would be convenient to be able to do this easily, rather than writing out the code each time. Here is a function template that will clean up the pointers in any sequence container; note that it is placed in the book’s root directory for easy access:

//: :purge.h
// Delete pointers in an STL sequence container
#ifndef PURGE_H
#define PURGE_H
#include <algorithm>

template<class Seq> void purge(Seq& c) {
  typename Seq::iterator i;
  for(i = c.begin(); i != c.end(); i++) {
    delete *i;
    *i = 0;
  }
}

// Iterator version:
template<class InpIt>
void purge(InpIt begin, InpIt end) {
  while(begin != end) {
    delete *begin;
    *begin = 0;
    begin++;
  }
}

#endif // PURGE_H ///:~

In the first version of purge( ), note that typename is absolutely necessary; indeed this is exactly the case that the keyword was added for: Seq is a template argument, and iterator is something that is nested within that template. So what does Seq::iterator refer to? The typename keyword specifies that it refers to a type, and not something else.

While the container version of purge must work with an STL-style container, the iterator version of purge( ) will work with any range, including an array.

Here is Stlshape.cpp, modified to use the purge( ) function:

//: C04:Stlshape2.cpp
// Stlshape.cpp with the purge() function
#include "../purge.h"
#include <vector>
#include <iostream>
using namespace std;

class Shape {
public:
  virtual void draw() = 0;
  virtual ~Shape() {};
};

class Circle : public Shape {
public:
  void draw() { cout << "Circle::draw\n"; }
  ~Circle() { cout << "~Circle\n"; }
};

class Triangle : public Shape {
public:
  void draw() { cout << "Triangle::draw\n"; }
  ~Triangle() { cout << "~Triangle\n"; }
};

class Square : public Shape {
public:
  void draw() { cout << "Square::draw\n"; }
  ~Square() { cout << "~Square\n"; }
};

typedef std::vector<Shape*> Container;
typedef Container::iterator Iter;

int main() {
  Container shapes;
  shapes.push_back(new Circle);
  shapes.push_back(new Square);
  shapes.push_back(new Triangle);
  for(Iter i = shapes.begin();
      i != shapes.end(); i++)
    (*i)->draw();
  purge(shapes);

} ///:~

When using purge( ), you must be careful to consider ownership issues – if an object pointer is held in more than one container, then you must be sure not to delete it twice, and you don’t want to destroy the object in the first container before the second one is finished with it. Purging the same container twice is not a problem, because purge( ) sets the pointer to zero once it deletes that pointer, and calling delete for a zero pointer is a safe operation.

Creating your own containers

With the STL as a foundation, it’s possible to create your own containers. Assuming you follow the same model of providing iterators, your new container will behave as if it were a built-in STL container.

Consider the “ring” data structure, which is a circular sequence container. If you reach the end, it just wraps around to the beginning. This can be implemented on top of a list as follows:

//: C04:Ring.cpp
// Making a "ring" data structure from the STL
#include <iostream>
#include <list>
#include <string>
using namespace std;

template<class T>
class Ring {
  list<T> lst;
public:
  // Declaration necessary so the following 
  // 'friend' statement sees this 'iterator' 
  // instead of std::iterator:
  class iterator;
  friend class iterator;
  class iterator : public std::iterator<
    std::bidirectional_iterator_tag,T,ptrdiff_t>{
    list<T>::iterator it;
    list<T>* r;
  public:
    // "typename" necessary to resolve nesting:
    iterator(list<T>& lst,
      const typename list<T>::iterator& i)
      : r(&lst), it(i) {}
    bool operator==(const iterator& x) const {
      return it == x.it;
    }
    bool operator!=(const iterator& x) const {
      return !(*this == x);
    }
    list<T>::reference operator*() const {
      return *it;
    }
    iterator& operator++() {
      ++it;
      if(it == r->end())
        it = r->begin();
      return *this;
    }
    iterator operator++(int) {
      iterator tmp = *this;
      ++*this;
      return tmp;
    }
    iterator& operator--() {
      if(it == r->begin())
        it = r->end();
      --it;
      return *this;
    }
    iterator operator--(int) {
      iterator tmp = *this;
      --*this; 
      return tmp;
    }
    iterator insert(const T& x){
      return iterator(*r, r->insert(it, x));
    }
    iterator erase() {
      return iterator(*r, r->erase(it));
    }
  };
  void push_back(const T& x) {
    lst.push_back(x);
  }
  iterator begin() {
    return iterator(lst, lst.begin());
  }
 int size() { return lst.size(); }
};

int main() {
  Ring<string> rs;
  rs.push_back("one");
  rs.push_back("two");
  rs.push_back("three");
  rs.push_back("four");
  rs.push_back("five");
  Ring<string>::iterator it = rs.begin();
  it++; it++;
  it.insert("six");
  it = rs.begin();
  // Twice around the ring:
  for(int i = 0; i < rs.size() * 2; i++)
    cout << *it++ << endl;

} ///:~

You can see that the iterator is where most of the coding is done. The Ring iterator must know how to loop back to the beginning, so it must keep a reference to the list of its “parent” Ring object in order to know if it’s at the end and how to get back to the beginning.

You’ll notice that the interface for Ring is quite limited; in particular there is no end( ), since a ring just keeps looping. This means that you won’t be able to use a Ring in any STL algorithms that require a past-the-end iterator – which is many of them. (It turns out that adding this feature is a non-trivial exercise). Although this can seem limiting, consider stack, queue and priority_queue, which don’t produce any iterators at all!

Freely-available
STL extensions

Although the STL containers may provide all the functionality you’ll ever need, they are not complete. For example, the standard implementations of set and map use trees, and although these are reasonably fast they may not be fast enough for your needs. In the C++ Standards Committee it was generally agreed that hashed implementations of set and map should have been included in Standard C++, however there was not considered to be enough time to add these components, and thus they were left out.

Fortunately, there are freely-available alternatives. One of the nice things about the STL is that it establishes a basic model for creating STL-like classes, so anything built using the same model is easy to understand if you are already familiar with the STL.

The SGI STL (freely available at http://www.sgi.com/Technology/STL/) is one of the most robust implementations of the STL, and can be used to replace your compiler’s STL if that is found wanting. In addition they’ve added a number of extensions including hash_set, hash_multiset, hash_map, hash_multimap, slist (a singly-linked list) and rope (a variant of string optimized for very large strings and fast concatenation and substring operations).

Let’s consider a performance comparison between a tree-based map and the SGI hash_map. To keep things simple, the mappings will be from int to int:

//: C04:MapVsHashMap.cpp
// The hash_map header is not part of the 
// Standard C++ STL. It is an extension that 
// is only available as part of the SGI STL:
#include <hash_map> 
#include <iostream>
#include <map>
#include <ctime>
using namespace std;

int main(){
  hash_map<int, int> hm;
  map<int, int> m;
  clock_t ticks = clock();
  for(int i = 0; i < 100; i++)
    for(int j = 0; j < 1000; j++)
      m.insert(make_pair(j,j));
  cout << "map insertions: " 
    << clock() - ticks << endl;
  ticks = clock();
  for(int i = 0; i < 100; i++)
    for(int j = 0; j < 1000; j++)
      hm.insert(make_pair(j,j));
  cout << "hash_map insertions: " 
    << clock() - ticks << endl;
  ticks = clock();
  for(int i = 0; i < 100; i++)
    for(int j = 0; j < 1000; j++)
      m[j];
  cout << "map::operator[] lookups: " 
    << clock() - ticks << endl;
  ticks = clock();
  for(int i = 0; i < 100; i++)
    for(int j = 0; j < 1000; j++)
      hm[j];
  cout << "hash_map::operator[] lookups: "
    << clock() - ticks << endl;
  ticks = clock();
  for(int i = 0; i < 100; i++)
    for(int j = 0; j < 1000; j++)
      m.find(j);
  cout << "map::find() lookups: "
    << clock() - ticks << endl;
  ticks = clock();
  for(int i = 0; i < 100; i++)
    for(int j = 0; j < 1000; j++)
      hm.find(j);
  cout << "hash_map::find() lookups: " 
    << clock() - ticks << endl;

} ///:~

The performance test I ran showed a speed improvement of roughly 4:1 for the hash_map over the map in all operations (and as expected, find( ) is slightly faster than operator[ ] for lookups for both types of map). If a profiler shows a bottleneck in your map, you should consider a hash_map.

Summary

The goal of this chapter was not just to introduce the STL containers in some considerable depth (of course, not every detail could be covered here, but you should have enough now that you can look up further information in the other resources). My higher hope is that this chapter has made you grasp the incredible power available in the STL, and shown you how much faster and more efficient your programming activities can be by using and understanding the STL.

The fact that I could not escape from introducing some of the STL algorithms in this chapter suggests how useful they can be. In the next chapter you’ll get a much more focused look at the algorithms.

Exercises

  1. Create a set<char>, then open a file (whose name is provided on the command line) and read that file in a char at a time, placing each char in the set. Print the results and observe the organization, and whether there are any letters in the alphabet that are not used in that particular file.
  2. Create a kind of “hangman” game. Create a class that contains a char and a bool to indicate whether that char has been guessed yet. Randomly select a word from a file, and read it into a vector of your new type. Repeatedly ask the user for a character guess, and after each guess display the characters in the word that have been guessed, and underscores for the characters that haven’t. Allow a way for the user to guess the whole word. Decrement a value for each guess, and if the user can get the whole word before the value goes to zero, they win.
  3. Modify WordCount.cpp so that it uses insert( ) instead of operator[ ] to insert elements in the map.
  4. Modify WordCount.cpp so that it uses a multimap instead of a map.
  5. Create a generator that produces random int values between 0 and 20. Use this to fill a multiset<int>. Count the occurrences of each value, following the example given in MultiSetWordCount.cpp.
  6. Change StlShape.cpp so that it uses a deque instead of a vector.
  7. Modify Reversible.cpp so it works with deque and list instead of vector.
  8. Modify Progvals.h and ProgVals.cpp so that they expect leading hyphens to distinguish command-line arguments.
  9. Create a second version of Progvals.h and ProgVals.cpp that uses a set instead of a map to manage single-character flags on the command line (such as -a -b -c etc) and also allows the characters to be ganged up behind a single hyphen (such as -abc).
  10. Use a stack<int> and build a Fibonacci sequence on the stack. The program’s command line should take the number of Fibonacci elements desired, and you should have a loop that looks at the last two elements on the stack and pushes a new one for every pass through the loop.
  11. Open a text file whose name is provided on the command line. Read the file a word at a time (hint: use >>) and use a multiset<string> to create a word count for each word.
  12. Modify BankTeller.cpp so that the policy that decides when a teller is added or removed is encapsulated inside a class.
  13. Create two classes A and B (feel free to choose more interesting names). Create a multimap<A, B> and fill it with key-value pairs, ensuring that there are some duplicate keys. Use equal_range( ) to discover and print a range of objects with duplicate keys. Note you may have to add some functions in A and/or B to make this program work.
  14. Perform the above exercise for a multiset<A>.
  15. Create a class that has an operator< and an ostream& operator<<. The class should contain a priority number. Create a generator for your class that makes a random priority number. Fill a priority_queue using your generator, then pull the elements out to show they are in the proper order.
  16. Rewrite Ring.cpp so it uses a deque instead of a list for its underlying implementation.
  17. Modify Ring.cpp so that the underlying implementation can be chosen using a template argument (let that template argument default to list).
  18. Open a file and read it into a single string. Turn the string into a stringstream. Read tokens from the stringstream into a list<string> using a TokenIterator.
  19. Compare the performance of stack based on whether it is implemented with vector, deque or list.
  20. Create an iterator class called BitBucket that just absorbs whatever you send to it without writing it anywhere.
  21. Create a template that implements a singly-linked list called SList. Provide a default constructor, begin( ) and end( ) functions (thus you must create the appropriate nested iterator), insert( ), erase( ) and a destructor.
  22. (More challenging) Create a little command language. Each command can simply print its name and its arguments, but you may also want to make it perform other activities like run programs. The commands will be read from a file that you pass as an command-line argument, or from standard input if no file is given. Each command is on a single line, and lines beginning with ‘#’ are comments. A line begins with the one-word command itself, followed by any number of arguments. Commands and arguments are separated by spaces. Use a map that maps string objects (the name of the command) to object pointers. The object pointers point to objects of a base class Command that has a virtual execute(string args) function, where args contains all the arguments for that command (execute( ) will parse its own arguments from args). Each different type of command is represented by a class that is inherited from Command.
  23. Add features to the above exercise so that you can have labels, if-then statements, and the ability to jump program execution to a label.

[15] Contributed to the C++ Standard by Alexander Stepanov and Meng Lee at Hewlett-Packard.

[16] These were actually created to abstract the “locale” facets away from iostreams, so that locale facets could operate on any sequence of characters, not only iostreams. Locales allow iostreams to easily handle culturally-different formatting (such as representation of money), and are beyond the scope of this book.

[17] I am indebted to Nathan Myers for explaining this to me.

[18] This is another example coached by Nathan Myers.

[ Previous Chapter ] [ Short TOC ] [ Table of Contents ] [ Index ] [ Next Chapter ]
Last Update:05/23/2000