Team LiB
Previous Section Next Section

11.2. A First Look at the Algorithms

11.2. 初窥算法

Before covering the structure of the algorithms library, let's look at a couple of examples. We've already seen the use of find; in this section we'll use a few additional algorithms. To use a generic algorithm, we must include the algorithm header:

在研究算法标准库的结构之前,先看一些例子。上一节已经介绍了 find 函数的用法;本节将要使用其他的一些算法。使用泛型算法必须包含 algorithm 头文件:

    #include <algorithm>

The library also defines a set of generalized numeric algorithms, using the same conventions as the generic algorithms. To use these algorithms we include the numeric header:

标准库还定义了一组泛化的算术算法(generalized numeric algorithm),其命名习惯与泛型算法相同。使用这些算法则必须包含 numeric 头文件:

    #include <numeric>

With only a handful of exceptions, all the algorithms operate over a range of elements. We'll refer to this range as the "input range." The algorithms that take an input range always use their first two parameters to denote that range. These parameters are iterators used to denote the first and one past the last element that we want to process.

除了少数例外情况,所有算法都在一段范围内的元素上操作,我们将这段范围称为“输出范围(input range)”。带有输入范围参数的算法总是使用头两个形参标记该范围。这两个形参是分别指向要处理的第一个元素和最后一个元素的下一位置的迭代器。

Although most algorithms are similar in that they operate over an input range, they differ in how they use the elements in that range. The most basic way to understand the algorithms is to know whether they read elements, write elements, or rearrange the order of the elements. We'll look at samples of each kind of algorithm in the remainder of this section.

尽管大多数算法对算法对输入范围的操作是类似的,但在该范围内如何操纵元素却有所不同。理解算法的最基本方法是了解该算法是否读元素、写元素或者对元素进行重新排序。在本节的余下内容中,将会观察到每种算法的例子。

11.2.1. Read-Only Algorithms

11.2.1. 只读算法

A number of the algorithms read, but never write to, the elements in their input range. find is one such algorithm. Another simple, read-only algorithm is accumulate, which is defined in the numeric header. Assuming vec is a vector of int values, the following code

许多算法只会读取其输入范围内的元素,而不会写这些元素。find 就是一个这样的算法。另一个简单的只读算法是 accumulate,该算法在 numeric 头文件中定义。假设 vec 是一个 int 型的 vector 对象,下面的代码:

     // sum the elements in vec starting the summation with the value 42
     int sum = accumulate(vec.begin(), vec.end(), 42);

sets sum equal to the sum of the elements in vec plus 42. accumulate takes three parameters. The first two specify a range of elements to sum. The third is an initial value for the sum. The accumulate function sets an internal variable to the initial value. It then adds the value of each element in the range to that initial value. The algorithm returns the result of the summation. The return type of accumulate is the type of its third argument.

将 sum 设置为 vec 的元素之和再加上 42。accumulate 带有三个形参。头两个形参指定要累加的元素范围。第三个形参则是累加的初值。accumulate 函数将它的一个内部变量设置为指定的初值,然后在此初值上累加输入范围 accumulate 

The third argument, which specifies the starting value, is necessary because accumulate knows nothing about the element types that it is accumulating. Therefore, it has no way to invent an appropriate starting value or associated type.

用于指定累加起始值的第三个实参是必要的,因为 accumulate 对将要累加的元素类型一无所知,因此,除此之外,没有别的办法创建合适的起始值或者关联的类型。

There are two implications of the fact that accumulate doesn't know about the types over which it sums. First, we must pass an initial starting value because otherwise accumulate cannot know what starting value to use. Second, the type of the elements in the container must match or be convertible to the type of the third argument. Inside accumulate, the third argument is used as the starting point for the summation; the elements in the container are successively added into this sum. It must be possible to add the element type to the type of the sum.

accumulate 对要累加的元素类型一无所知,这个事实有两层含义。首先,调用该函数时必须传递一个起始值,否则,accumulate 将不知道使用什么起始值。其次,容器内的元素类型必须与第三个实参的类型匹配,或者可转换为第三个实参的类型。在 accumulate 内部,第三个实参用作累加的起点;容器内的元素按顺序连续累加到总和之中。因此,必须能够将元素类型加到总和类型上。

As an example, we could use accumulate to concatenate the elements of a vector of strings:

考虑下面的例子,可以使用 accumulate 把 string 型的 vector 容器中的元素连接起来:

     // concatenate elements from v and store in sum
     string sum = accumulate(v.begin(), v.end(), string(""));

The effect of this call is to concatenate each element in vec onto a string that starts out as the empty string. Note that we explicitly create a string as the third parameter. Passing a character-string literal would be a compile-time error. If we passed a string literal, the summation type would be const char* but the string addition operator (Section 3.2.3, p. 86) for operands of type string and const char* yields a string not a const char*.

这个函数调用的效果是:从空字符串开始,把 vec 里的每个元素连接成一个字符串。注意:程序显式地创建了一个 string 对象,用该函数调用的第三个实参。传递一个字符串字面值,将会导致编译时错误。因为此时,累加和的类型将是 const char*,而 string 的加法操作符(第 3.2.3 节)所使用的操作数则分别是 string 和 const char* 类型,加法的结果将产生一个 string 对象,而不是 const char* 指针。

Using find_first_of
find_first_of 的使用

In addition to find, the library defines several other, more complicated searching algorithms. Several of these are similar to the find operations of the string class (Section 9.6.4, p. 343). One such is find_first_of. This algorithm takes two pairs of iterators that denote two ranges of elements. It looks in the first range for a match to any element from the second range and returns an iterator denoting the first element that matches. If no match is found, it returns the end iterator of the first range. Assuming that roster1 and roster2 are two lists of names, we could use find_first_of to count how many names are in both lists:

除了 find 之外,标准库还定义了其他一些更复杂的查找算法。当中的一部分类似 string 类的 find 操作(第 9.6.4 节),其中一个是 find_first_of 函数。这个算法带有两对迭代器参数来标记两段元素范围,在第一段范围内查找与第二段范围中任意元素匹配的元素,然后返回一个迭代器,指向第一个匹配的元素。如果找不到元素,则返回第一个范围的 end 迭代器。假设 roster1 和 roster2 是两个存放名字的 list 对象,可使用 find_first_of 统计有多少个名字同时出现在这两个列表中:

     // program for illustration purposes only:
     // there are much faster ways to solve this problem
     size_t cnt = 0;
     list<string>::iterator it = roster1.begin();
     // look in roster1 for any name also in roster2
     while   ((it = find_first_of(it, roster1.end(),
                  roster2.begin(), roster2.end()))
                     != roster1.end()) {
        ++cnt;
        // we got a match, increment it to look in the rest of roster1
        ++it;
     }
     cout << "Found " << cnt
          << " names on both rosters" << endl;

The call to find_first_of looks for any element in roster2 that matches an element from the first rangethat is, it looks for an element in the range from it to roster1.end(). The function returns the first element in that range that is also in the second range. On the first trip through the while, we look in the entire range of roster1. On second and subsequent trips, we look in that part of roster1 that has not already been matched.

调用 find_first_of 查找 roster2 中的每个元素是否与第一个范围内的元素匹配,也就是在 it 到 roster1.end() 范围内查找一个元素。该函数返回此范围内第一个同时存在于第二个范围中的元素。在 while 的第一次循环中,遍历整个 roster1 范围。第二次以及后续的循环迭代则只考虑 roster1 中尚未匹配的部分。

In the condition, we check the return from find_first_of to see whether we found a matching name. If we got a match, we increment our counter. We also increment it so that it refers to the next element in roster1. We know we're done when find_first_of returns roster1.end(), which it does if there is no match.

循环条件检查 find_first_of 的返回值,判断是否找到匹配的名字。如果找到一个匹配,则使计数器加 1,同时给 it 加 1,使它指向 roster1 中的下一个元素。很明显可知,当不再有任何匹配时,find_first_of 返回 roster1.end(),完成统计。

Key Concept: Iterator Argument Types

关键概念:迭代器实参类型

In general, the generic algorithms operate on iterator pairs that denote a range of elements in a container (or other sequence). The types of the arguments that denote the range must match exactly, and the iterators themselves must denote a range: They must refer to elements in the same container (or to the element just past the end of that container), and if they are unequal, then it must be possible to reach the second iterator by repeatedly incrementing the first iterator.

通常,泛型算法都是在标记容器(或其他序列)内的元素范围的迭代器上操作的。标记范围的两个实参类型必须精确匹配,而迭代器本身必须标记一个范围:它们必须指向同一个容器中的元素(或者超出容器末端的下一位置),并且如果两者不相等,则第一个迭代器通过不断地自增,必须可以到达第二个迭代器。

Some algorithms, such as find_first_of, take two pairs of iterators. The iterator types in each pair must match exactly, but there is no requirement that the type of the two pairs match each other. In particular, the elements can be stored in different kinds of sequences. What is required is that we be able to compare elements from the two sequences.

有些算法,例如 find_first_of,带有两对迭代器参数。每对迭代器中,两个实参的类型必须精确匹配,但不要求两对之间的类型匹配。特别是,元素可存储在不同类型序列中,只要这两序列的元素可以比较即可。

In our program, the types of roster1 and roster2 need not match exactly: roster1 could be a list while roster2 was a vector, deque, or other sequence that we'll learn about later in this chapter. What is required is that we be able to compare the elements from these two sequences using the == operator. If roster1 is a list<string>, then roster2 could be a vector<char*> because the string library defines == on a string and a char*.

在上述程序中,roster1 和 roster2 的类型不必精确匹配:roster1 可以是 list 对象,而 roster2 则可以是 vector 对象、 deque 对象或者是其他后面要学到的序列。只要这两个序列的元素可使用相等(==)操作符进行比较即可。如果 roster1 是 list<string> 对象,则 roster2 可以是 vector<char*> 对象,因为 string 标准库为 string 对象与 char* 对象定义了相等(==)操作符

Exercises Section 11.2.1

Exercise 11.3:

Use accumulate to sum the elements in a vector<int>.

用 accumulate 统计 vector<int> 容器对象中的元素之和

Exercise 11.4:

Assuming v is a vector<double> what, if anything, is wrong with calling accumulate<v.begin(), v.end(), 0)?

假定 v 是 vector<double> 类型的对象,则调用 accumulate<v.begin(), v.end(), 0) 是否有错?如果有的话,错在哪里?

Exercise 11.5:

What would happen if the program that called find_first_of did not increment it?

对于本节调用 find_first_of 的例程,如果不给 it 加 1,将会如何?

11.2.2. Algorithms that Write Container Elements

11.2.2. 写容器元素的算法

Some algorithms write element values. When using algorithms that write elements, we must take care to ensure that the sequence into which the algorithm writes is at least as large as the number of elements being written.

一些算法写入元素值。在使用这些算法写元素时要当心,必须确保算法所写的序列至少足以存储要写入的元素。

Some algorithms write directly into the input sequence. Others take an additional iterator that denotes a destination. Such algorithms use the destination iterator as a place in which to write output. Still a third kind writes a specified number of elements to some sequence.

有些算法直接将数据写入到输入序列,另外一些则带有一个额外的迭代器参数指定写入目标。这类算法将目标迭代器用作输出的位置。还有第三种算法将指定数目的元素写入某个序列。

Writing to the Elements of the Input Sequence
写入输入序列的元素

The algorithms that write to their input sequence are inherently safethey write only as many elements as are in the specified input range.

写入到输入序列的算法本质上是安全的——只会写入与指定输入范围数量相同的元素。

A simple example of an algorithm that writes to its input sequence is fill:

写入到输入序列的一个简单算法是 fill 函数,考虑如下例子:

     fill(vec.begin(), vec.end(), 0); // reset each element to 0
     // set subsequence of the range to 10
     fill(vec.begin(), vec.begin() + vec.size()/2, 10);

fill takes a pair of iterators that denote a range in which to write copies of its third parameter. It executes by setting each element in the range to the given value. Assuming that the input range is valid, then the writes are safe. The algorithm writes only to elements known to exist in the input range itself.

fill 带有一对迭代器形参,用于指定要写入的范围,而所写的值是它的第三个形参的副本。执行时,将该范围内的每个元素都设为给定的值。如果输入范围有效,则可安全写入。这个算法只会对输入范围内已存在的元素进行写入操作。

Algorithms Do Not Check Write Operations
不检查写入操作的算法

The fill_n function takes an iterator, a count, and a value. It writes the specified number of elements with the given value starting at the element referred to by the iterator. The fill_n function assumes that it is safe to write the specified number of elements. It is a fairly common beginner mistake to call fill_n (or similar algorithms that write to elements) on a container that has no elements:

fill_n 函数带有的参数包括:一个迭代器、一个计数器以及一个值。该函数从迭代器指向的元素开始,将指定数量的元素设置为给定的值。fill_n 函数假定对指定数量的元素做写操作是安全的。初学者常犯的错误的是:在没有元素的空容器上调用 fill_n 函数(或者类似的写元素算法)。

     vector<int> vec; // empty vector
     // disaster: attempts to write to 10 (nonexistent) elements in vec
     fill_n(vec.begin(), 10, 0);

This call to fill_n is a disaster. We specified that ten elements should be written, but there are no such elements vec is empty. The result is undefined and will probably result in a serious run-time failure.

这个 fill_n 函数的调用将带来灾难性的后果。我们指定要写入 10 个元素,但这些元素却不存在——vec 是空的。其结果未定义,很可能导致严重的运行时错误。

Algorithms that write a specified number of elements or that write to a destination iterator do not check whether the destination is large enough to hold the number of elements being written.

对指定数目的元素做写入运算,或者写到目标迭代器的算法,都不检查目标的大小是否足以存储要写入的元素。

Introducing back_inserter
引入 back_inserter

One way to ensure that an algorithm has enough elements to hold the output is to use an insert iterator. An insert iterator is an iterator that adds elements to the underlying container. Ordinarily, when we assign to a container element through an iterator, we assign to the element to which the iterator refers. When we assign through an insert iterator, a new element equal to the right-hand value is added to the container.

确保算法有足够的元素存储输出数据的一种方法是使用插入迭代器。插入迭代器是可以给基础容器添加元素的迭代器。通常,用迭代器给容器元素赋值时,被赋值的是迭代器所指向的元素。而使用插入迭代器赋值时,则会在容器中添加一个新元素,其值等于赋值运算的右操作数的值。

We'll have more to say about insert iterators in Section 11.3.1 (p. 406). However, in order to illustrate how to safely use algorithms that write to a container, we will use back_inserter. Programs that use back_inserter must include the iterator header.

第 11.3.1 节将会讨论更多关于插入迭代器的内容。然而,为了说明如何安全使用写容器的算法,下面将使用 back_inserter. 使用 back_inserter 的程序必须包含 iterator 头文件。

The back_inserter function is an iterator adaptor. Like the container adaptors (Section 9.7, p. 348), an iterator adaptor takes an object and generates a new object that adapts the behavior of its argument. In this case, the argument to back_inserter is a reference to a container. back_inserter generates an insert iterator bound to that container. When we attempt to assign to an element through that iterator, the assignment calls push_back to add an element with the given value to the container. We can use back_inserter to generate an iterator to use as the destination in fill_n:

back_inserter 函数是迭代器适配器。与容器适配器(第 9.7 节)一样,迭代器适配器使用一个对象作为实参,并生成一个适应其实参行为的新对象。在本例中,传递给 back_inserter 的实参是一个容器的引用。back_inserter 生成一个绑定在该容器上的插入迭代器。在试图通过这个迭代器给元素赋值时,赋值运算将调用 push_back 在容器中添加一个具有指定值的元素。使用 back_inserter 可以生成一个指向 fill_n 写入目标的迭代器:

     vector<int> vec; // empty vector
     // ok: back_inserter creates an insert iterator that adds elements to vec
     fill_n (back_inserter(vec), 10, 0); // appends 10 elements to vec

Now, each time fill_n writes a value, it will do so through the insert iterator generated by back_inserter. The effect will be to call push_back on vec, adding ten elements to the end of vec, each of which has the value 0.

现在,fill_n 函数每写入一个值,都会通过 back_inserter 生成的插入迭代器实现。效果相当于在 vec 上调用 push_back,在 vec 末尾添加 10 个元素,每个元素的值都是 0。

Algorithms that Write to a Destination Iterator
写入到目标迭代器的算法

A third kind of algorithm writes an unknown number of elements to a destination iterator. As with fill_n, the destination iterator refers to the first element of a sequence that will hold the output. The simplest such algorithm is copy. This algorithm takes three iterators: The first two denote an input range and the third refers to an element in the destination sequence. It is essential that the destination passed to copy be at least as large as the input range. Assuming ilst is a list holding ints, we might copy it into a vector:

第三类算法向目标迭代器写入未知个数的元素。正如 fill_n 函数一样,目标迭代器指向存放输出数据的序列中第一个元素。这类算法中最简单的是 copy 函数。copy 带有三个迭代器参数:头两个指定输入范围,第三个则指向目标序列的一个元素。传递给 copy 的目标序列必须至少要与输入范围一样大。假设 ilst 是一个存放 int 型数据的 list 对象,可如下将它 copy 给一个 vector 对象:

     vector<int> ivec; // empty vector
     // copy elements from ilst into ivec
     copy (ilst.begin(), ilst.end(), back_inserter(ivec));

copy reads elements from the input range, copying them to the destination.

copy 从输入范围中读取元素,然后将它们复制给目标 ivec。

Of course, this example is a bit inefficient: Ordinarily if we want to create a new container as a copy of an existing container, it is better to use an input range directly as the initializer for a newly constructed container:

当然,这个例子的效率比较差:通常,如果要以一个已存在的容器为副本创建新容器,更好的方法是直接用输入范围作为新构造容器的初始化式:

     // better way to copy elements from ilst
     vector<int> ivec(ilst.begin(), ilst.end());
Algorithm _copy Versions
算法的 _copy 版本

Several algorithms provide so-called "copying" versions. These algorithms do some processing on the elements of their input sequence but do not change the original elements. Instead, a new sequence is written that contains the result of processing the elements of the original.

有些算法提供所谓的“复制(copying)”版本。这些算法对输入序列的元素做出处理,但不修改原来的元素,而是创建一个新序列存储元素的处理结果。但不修改原来的元素,而是创建一个新序列存储元素的处理结果。

The replace algorithm is a good example. This algorithm reads and writes to an input sequence, replacing a given value by a new value. The algorithm takes four parameters: a pair of iterators denoting the input range and a pair of values. It substitutes the second value for each element that is equal the first:

replace 算法就是一个很好的例子。该算法对输入序列做读写操作,将序列中特定的值替换为新的值。该算法带有四个形参:一对指定输入范围的迭代器和两个值。每一个等于第一值的元素替换成第二个值。

     // replace any element with value of 0 by 42
     replace(ilst.begin(), ilst.end(), 0, 42);

This call replaces all instances of 0 by the 42. If we wanted to leave the original sequence unchanged, we would call replace_copy. That algorithm takes a third iterator argument denoting a destination in which to write the adjusted sequence:

这个调用将所有值为 0 的实例替换成 42。如果不想改变原来的序列,则调用 replace_copy。这个算法接受第三个迭代器实参,指定保存调整后序列的目标位置。

     // create empty vector to hold the replacement
     vector<int> ivec;

     // use back_inserter to grow destination as needed
     replace_copy (ilst.begin(), ilst.end(),
                  back_inserter(ivec), 0, 42);

After this call, ilst is unchanged, and ivec contains a copy of ilst with the exception that every element in ilst with the value 0 has the value 42 in ivec.

调用该函数后,ilst 没有改变,ivec 存储 ilst 一份副本,而 ilst 内所有的 0 在 ivec 中都变成了 42。

Exercises Section 11.2.2

Exercise 11.6:

Using fill_n, write a program to set a sequence of int values to 0.

使用 fill_n 编写程序,将一个 int 序列的值设为 0。

Exercise 11.7:

Determine if there are any errors in the following programs and, if so, correct the error(s):

判断下面的程序是否有错,如果有,请改正之:

     (a) vector<int> vec; list<int> lst; int i;
         while (cin >> i)
             lst.push_back(i);
         copy(lst.begin(), lst.end(), vec.begin());

     (b) vector<int> vec;
         vec.reserve(10);
         fill_n(vec.begin(), 10, 0);
Exercise 11.8:

We said that algorithms do not change the size of the containers over which they operate. Why doesn't the use of back_inserter invalidate this claim?

前面说过,算法不改变它所操纵的容器的大小,为什么使用 back_inserter 也不能突破这个限制?

11.2.3. Algorithms that Reorder Container Elements

11.2.3. 对容器元素重新排序的算法

Suppose we want to analyze the words used in a set of children's stories. For example, we might want know how many words contain six or more characters. We want to count each word only once, regardless of how many times it appears or whether it appears in multiple stories. We'd like to be able to print the words in size order, and we want the words to be in alphabetic order within a given size.

假设我们要分析一组儿童故事中所使用的单词。例如,可能想知道它们使用了多少个由六个或以上字母组成的单词。每个单词只统计一次,不考虑它出现的次数,也不考虑它是否在多个故事中出现。要求以长度的大小输出这些单词,对于同样长的单词,则以字典顺序输出。

We'll assume that we have read our input and stored the text of each book in a vector of strings named words. How might we solve the part of the problem that involves counting word occurrences? To solve this problem, we'd need to:

假定每本书的文本已经读入并保存在一个 string 类型的 vector 对象中,它的名字是 words。现在,应该怎么解决包括统计单词出现次数这个问题呢?为了解此问题,要做下面几项操作:

  1. Eliminate duplicate copies of each word

    去掉所有重复的单词。

  2. Order the words based on size

    按单词的长度排序。

  3. Count the words whose size is 6 or greater

    统计长度等于或超过 6 个字符的单词个数。

We can use generic algorithms in each of these steps.

上述每一步都可使用泛型算法实现。

For purposes of illustration, we'll use the following simple story as our input:

为了说清楚,使用下面这个简单的故事作为我们的输入:

     the quick red fox jumps over the slow red turtle

Given this input, our program should produce the following output:

对于这个输入,我们的程序应该产生如下输出:

     1 word 6 characters or longer
Eliminating Duplicates
去除重复

Assuming our input is in a vector named words, our first subproblem is to eliminate duplicates from the words:

假设我们的输入存储在一个名为 words 的 vector 对象中,第一个子问题是将 words 中重复出现的单词去除掉:

     // sort words alphabetically so we can find the duplicates
     sort(words.begin(), words.end());
     /* eliminate duplicate words:
      * unique reorders words so that each word appears once in the
      *    front portion of words and returns an iterator one past the unique range;
      * erase uses a vector operation to remove the nonunique elements
      */
     vector<string>::iterator end_unique =
                    unique(words.begin(), words.end());
     words.erase(end_unique, words.end());

Our input vector contains a copy of every word used in each story. We start by sorting this vector. The sort algorithm takes two iterators that denote the range of elements to sort. It uses the < operator to compare the elements. In this call we ask that the entire vector be sorted.

vector 对象包含每个故事中使用的所有单词。首先对此 vector 对象排序。sort 算法带有两个迭代器实参,指出要排序的元素范围。这个算法使用小于(<)操作符比较元素。在本次调用中,要求对整个 vector 对象排序。

After the call to sort, our vector elements are ordered:

调用 sort 后,此 vector 对象的元素按次序排列:

     fox jumps over quick red red slow the the turtle

Note that the words red and the are duplicated.

注意,单词 red 和 the 重复出现了。

Using unique
unique 的使用

Once words is sorted, our problem is to keep only one copy of each word that is used in our stories. The unique algorithm is well suited to this problem. It takes two iterators that denote a range of elements. It rearranges the elements in the input range so that adjacent duplicated entries are eliminated and returns an iterator that denotes the end of the range of the unique values.

单词按次序排列后,现在的问题是:让故事中所用到的每个单词都只保留一个副本。unique 算法很适合用于解决这个问题,它带有两个指定元素范围的迭代器参数。该算法删除相邻的重复元素,然后重新排列输入范围内的元素,并且返回一个迭代器,表示无重复的值范围的结束。

After the call to unique, the vector holds

调用 unique 后,vector 中存储内容是:

Note that the size of words is unchanged. It still has ten elements; only the order of these elements has changed. The call to unique "removes" adjacent duplicates. We put remove in quotes because unique doesn't remove any elements. Instead, it overwrites adjacent duplicates so that the unique elements are copied into the front of the sequence. The iterator returned by unique denotes one past the end of the range of unique elements.

注意,words 的大小并没有改变,依然保存着 10 个元素;只是这些元素的顺序改变了。调用 unique“删除”了相邻的重复值。给“删除”加上引号是因为 unique 实际上并没有删除任何元素,而是将无重复的元素复制到序列的前端,从而覆盖相邻的重复元素。unique 返回的迭代器指向超出无重复的元素范围末端的下一位置。

Using Container Operations to Remove Elements
使用容器操作删除元素

If we want to eliminate the duplicate items, we must use a container operation, which we do in the call to erase. This call erases the elements starting with the one to which end_unique refers through the end of words. After this call, words contains the eight unique words from the input.

如果要删除重复的项,必须使用容器操作,在本例中调用 erase 实现该功能。这个函数调用从 end_unique 指向的元素开始删除,直到 words 的最后一个元素也删除掉为止。调用之后,words 存储输入的 8 个不相同的元素。

Algorithms never directly change the size of a container. If we want to add or remove elements, we must use a container operation.

算法不直接修改容器的大小。如果需要添加或删除元素,则必须使用容器操作。

It is worth noting that this call to erase would be safe even if there were no duplicated words in our vector. If there were no duplicates, then unique would return words.end(). Both arguments in the call to erase would have the same value, words.end(). The fact that the iterators are equal would mean that the range to erase would be empty. Erasing an empty range has no effect, so our program is correct even if the input has no duplicates.

值得注意的是,对没有重复元素的 vector 对象,调用 erase 也是安全的。如果不存在重复的元素,unique 就会返回 words.end(),此时,调用 erase 的两个实参值相同,都是 words.end()。两个迭代器相等这个事实意味着 erase 函数要删除的范围是空的。删除一段空的范围没有任何作用,所以即使输入中没有重复的元素,我们的程序仍然正确。

Defining Needed Utility Functions
定义需要的实用函数

Our next subproblem is to count how many words are of length six or greater. To solve this problem, we'll use two additional generic algorithms: stable_sort and count_if. To use each of these algorithms we'll need a companion utility function, known as a predicates. A predicate is a function that performs some test and returns a type that can be used in a condition to indicate success or failure.

下一个子问题统计长度不小于 6 的单词个数。为了解决这个问题,需要用到另外两个泛型算法:stable_sort 和 count_if。使用这些算法,还需要一个配套的实用函数,称为谓词。谓词是做某些检测的函数,返回用于条件判断的类型,指出条件是否成立。

The first predicate we need will be used to sort the elements based on size. To do this sort, we need to define a predicate function that compares two strings and returns a bool indicating whether the first is shorter in length than the second:

我们需要的第一个谓词将用在基于大小的元素排序中。为了实现排序,必须定义一个谓词函数来实现两个 string 对象的比较,并返回一个 bool 值,指出第一个字符串是否比第二个短:

     // comparison function to be used to sort by word length
     bool isShorter(const string &s1, const string &s2)
     {
         return s1.size() < s2.size();
     }

The other function we need will determine whether a given string is of length six or greater:

另一个所需的谓词函数将判断给出的 string 对象的长度是否不小于 6:

     // determine whether a length of a given word is 6 or more
     bool GT6(const string &s)
     {
          return s.size() >= 6;
     }

Although this function solves our problem, it is unnecessarily limitedthe function hardwires the size into the function itself. If we wanted to find out how many words were of another length, we'd have to write another function. We could easily write a more general comparison function that took two parameters, the string and the size. However, the function we pass to count_if takes a single argument, so we cannot use the more general approach in this program. We'll see a better way to write this part of our solution in Section 14.8.1 (p. 531).

尽管这个函数能解决问题,但存在不必要限制——函数内部硬性规定了对长度大小的要求。如果要统计其他长度的单词个数,则必须编写另一个函数。其实很容易写出更通用的比较函数,使它带有两个形参,分别是 string 对象和一个长度大小值即可。但是,传递给 count_if 算法的函数只能带有一个实参,因此本程序不能使用上述更通用的方法。第 14.8.1 节将为这个问题提供更好的解决方案。

Sorting Algorithms
排序算法

The library defines four different sort algorithms, of which we've used the simplest, sort, tosort words into alphabetical order. In addition to sort, the library also defines a stable_sort algorithm. A stable_sort maintains the original order among equal elements. Ordinarily, we don't care about the relative order of equal elements in a sorted sequence. After all, they're equal. However, in this case, we have defined "equal" to mean "the same length." Elements that have the same length can still be distinct when viewed alphabetically. By calling stable_sort, we maintain alphabetic order among those elements that have the same length.

标准库定义了四种不同的排序算法,上面只使用了最简单的 sort 算法,使 words 按字典次序排列。除了 sort 之外,标准库还定义了 stable_sort 算法,stable_sort 保留相等元素的原始相对位置。通常,对于已排序的序列,我们并不关心其相等元素的相对位置,毕竟,这些元素是相等的。但是,在这个应用中,我们将“相等”定义为“相同的长度”,有着相同长度的元素还能以字典次序的不同而区分。调用 stable_sort 后,对于长度相同的元素,将保留其字典顺序。

Both sort and stable_sort are overloaded functions. One version uses the < operator for the element type to do the comparison. We used this version of sort to sort words before looking for duplicate elements. The second version takes a third parameter: the name of a predicate to use when comparing elements. That function must take two arguments of the same type as the element type and return a value that can be tested in a condition. We will use this second version, passing our isShorter function to compare elements:

sort 和 stable_sort 都是重载函数。其中一个版本使用元素类型提供的小于(<)操作符实现比较。在查找重复元素之前,我们就是用这个 sort 版本对元素排序。第二个重载版本带有第三个形参:比较元素所使用的谓词函数的名字。这个谓词函数必须接受两个实参,实参的类型必须与元素类型相同,并返回一个可用作条件检测的值。下面将比较元素的 isShorter 函数作为实参,调用第二个版本的排序函数:

     // sort words by size, but maintain alphabetic order for words of the same size
     stable_sort(words.begin(), words.end(), isShorter);

After this call, words is sorted by element size, but the words of each length are also still in alphabetical order:

调用后,words 中的元素按长度大小排序,而长度相同的单词则仍然保持字典顺序:

Counting Words of Length Six or More
统计长度不小于 6 的单词

Now that we've reordered our vector by word size, our remaining problem is to count how many words are of length six or greater. The count_if algorithm handles this problem:

现在此 vector 对象已经按单词长度排序,剩下的问题就是统计长度不小于 6 的单词个数。使用 count_if 算法处理这个问题:

     vector<string>::size_type wc =
                  count_if(words.begin(), words.end(), GT6);

count_if executes by reading the range denoted by its first two parameters. It passes each value that it reads to the predicate function represented by its third argument. That function must take a single argument of the element type and must return a value that can be tested as a condition. The algorithm returns a count of the number of elements for which the function succeeded. In this case, count_if passes each word to GT6, which returns the bool value true if the word's length is six or more.

执行 count_if 时,首先读取它的头两个实参所标记的范围内的元素。每读出一个元素,就将它传递给第三个实参表示的谓词函数。此谓词函数。此谓词函数需要单个元素类型的实参,并返回一个可用作条件检测的值。count_if 算法返回使谓词函数返回条件成立的元素个数。在这个程序中,count_if 将每个单词传递给 GT6,而 GT6 返回一个 bool 值,如果单词长度不小于 6,则该 bool 值为 true

Putting It All Together
将全部程序段放在一起

Having looked at the program in detail, here is the program as a whole:

了解程序的细节之后,下面是完整的程序:

     // comparison function to be used to sort by word length
     bool isShorter(const string &s1, const string &s2)
     {
         return s1.size() < s2.size();
     }
     // determine whether a length of a given word is 6 or more
     bool GT6(const string &s)
     {
         return s.size() >= 6;
     }
     int main()
     {
         vector<string> words;
         // copy contents of each book into a single vector
         string next_word;
         while (cin >> next_word) {
             // insert next book's contents at end of words
             words.push_back(next_word);
         }
         // sort words alphabetically so we can find the duplicates
         sort (words.begin(), words.end());
         /* eliminate duplicate words:
          * unique reorders words so that each word appears once in the
          *       front portion of words and returns an iterator one past the unique range;
          * erase uses a vector operation to remove the nonunique elements
          */
         vector<string>::iterator end_unique =
                     unique(words.begin(), words.end());
         words.erase(end_unique, words.end());
          // sort words by size, but maintain alphabetic order for words of the same size
          stable_sort(words.begin(), words.end(), isShorter);
          vector<string>::size_type wc =
                          count_if (words.begin(), words.end(), GT6);
          cout << wc << " " << make_plural(wc, "word", "s")
               << " 6 characters or longer" << endl;
          return 0;
     }

We leave as an exercise the problem of printing the words in size order.

最后,我们留下按长度顺序输出单词这个问题作为习题。

Exercises Section 11.2.3

Exercise 11.9:

Implement the program to count words of size 4 or greater, including printing the list of unique words in the input. Test your program by running it on the program's source file.

编写程序统计长度不小于 4 的单词,并输出输入序列中不重复的单词。在程序源文件上运行和测试你自己编写的程序。

Exercise 11.10:

The library defines a find_if function. Like find, the find_if function takes a pair of iterators that indicates a range over which to operate. Like count_if, it also takes a third parameter that names a predicate that can be used to test each element in the range. find_if returns an iterator that refers to the first element for which the function returns a nonzero value. It returns its second iterator argument if there is no such element. Use the find_if function to rewrite the portion of our program that counted how many words are greater than length six.

标准库定义了一个 find_if 函数。与 find 一样,find_if 函数带有一对迭代器形参,指定其操作的范围。与 count_if 一样,该函数还带有第三个形参,表明用于检查范围内每个元素的谓词函数。find_if 返回一个迭代器,指向第一个谓词函数返回非零值的元素。如果这样的元素不存在,则返回第二个迭代器实参。使用 find_if 函数重写上述例题中统计长度大于 6 的单词个数的程序部分。

Exercise 11.11:

Why do you think the algorithms don't change the size of containers?

你认为为什么算法不改变容器的大小?

Exercise 11.12:

Why was it be necessary to use erase rather than define a generic algorithm that could remove elements from the container?

为什么必须使用 erase,而不是定义一个泛型算法来删除容器中的元素?

Team LiB
Previous Section Next Section