Team LiB

4.3. C-Style Character Strings

Although C++ supports C-style strings, they should not be used by C++ programs. C-style strings are a surprisingly rich source of bugs and are the root cause of many, many security problems.

In Section 2.2 (p. 40) we first used string literals and learned that the type of a string literal is array of constant characters. We can now be more explicit and note that the type of a string literal is an array of const char. A string literal is an instance of a more general construct that C++ inherits from C: C-style character strings. C-style strings are not actually a type in either C or C++. Instead, C-style strings are null-terminated arrays of characters:

          char ca1[] = {'C', '+', '+'};        // no null, not C-style string
          char ca2[] = {'C', '+', '+', '\0'};  // explicit null
          char ca3[] = "C++";     // null terminator added automatically
          const char *cp = "C++"; // null terminator added automatically
          char *cp1 = ca1;   // points to first element of an array, but not C-style string
          char *cp2 = ca2;   // points to first element of a null-terminated char array

Neither ca1 nor cp1 is a C-style string. ca1 is a character array, but the array is not null-terminated. So cp1, which points to ca1, does not point to a null-terminated array. The other declarations all refer to C-style strings. Remember that an array name used in an expression is converted to a pointer to the array's first element. Thus, when used as values, ca2 and ca3 are pointers to the first elements of their respective arrays.
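
As a quick check, sizeof counts every element of an array, including a terminating null when one is present. So these arrays differ in size even though they hold the same three visible characters. This minimal sketch repeats the array declarations (made const here so they are read-only). The helper ends_with_null is hypothetical and only inspects the last element.

```cpp
#include <cstddef>

// Same initializers as the declarations above (const added here).
const char ca1[] = {'C', '+', '+'};        // 3 elements: no null
const char ca2[] = {'C', '+', '+', '\0'};  // 4 elements: explicit null
const char ca3[] = "C++";                  // 4 elements: null added automatically

// Hypothetical helper: true if the last element of the array is the null character.
template <std::size_t N>
bool ends_with_null(const char (&arr)[N]) {
    return arr[N - 1] == '\0';
}
```

For example, sizeof(ca1) is 3, while sizeof(ca2) and sizeof(ca3) are both 4.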

Using C-style Strings

C-style strings are manipulated through (const) char* pointers. One frequent usage pattern uses pointer arithmetic to traverse the C-style string. The traversal tests and increments the pointer until we reach the terminating null character:

          const char *cp = "some value";
          while (*cp) {
              // do something to *cp
              ++cp;
          }

The condition in the while dereferences the const char* pointer cp and the resulting character is tested for its true or false value. A true value is any character other than the null. So, the loop continues until it encounters the null character that terminates the array to which cp points. The body of the while does whatever processing is needed and concludes by incrementing cp to advance the pointer to address the next character in the array.

This loop will fail if the array that cp addresses is not null-terminated. In that case, the loop is apt to read characters starting at cp until it happens to encounter a null character somewhere in memory. Reading past the end of the array has undefined behavior.
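
The traversal pattern above can be packaged as a function that counts the characters before the null, which is essentially what the library's strlen does. This is an illustrative sketch. count_chars is a hypothetical name, and like the loop it assumes its argument is null-terminated.

```cpp
#include <cstddef>

// Counts characters up to (not including) the terminating null, using the
// same pointer-traversal pattern as the while loop above.
// Precondition: cp points to the first character of a null-terminated array.
std::size_t count_chars(const char *cp) {
    std::size_t n = 0;
    while (*cp) {   // the null character tests false and ends the loop
        ++n;
        ++cp;       // advance to the next character
    }
    return n;
}
```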

C Library String Functions

The Standard C library provides a set of functions, listed in Table 4.1, that operate on C-style strings. To use these functions, we must include the associated C header file:

          #include <cstring>

cstring is the C++ version of the string.h header from the C library.

These functions do no checking on their string parameters.

Table 4.1. C-Style Character String Functions

strlen(s)            Returns the length of s, not counting the null.
strcmp(s1, s2)       Compares s1 and s2 character by character. Returns 0 if s1 == s2,
                     a positive value if s1 > s2, a negative value if s1 < s2.
strcat(s1, s2)       Appends s2 to s1. Returns s1.
strcpy(s1, s2)       Copies s2 into s1. Returns s1.
strncat(s1, s2, n)   Appends at most n characters from s2 onto s1, then a null. Returns s1.
strncpy(s1, s2, n)   Copies at most n characters from s2 into s1, padding with nulls if s2 is
                     shorter than n; writes no null if s2 has n or more characters. Returns s1.

The pointer(s) passed to these routines must be nonzero and each pointer must point to the initial character in a null-terminated array. Some of these functions write to a string they are passed. These functions assume that the array to which they write is large enough to hold whatever characters the function generates. It is up to the programmer to ensure that the target string is big enough.
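
The following sketch uses strcpy, strcat, and strlen from Table 4.1 together. The caller is assumed to supply an array large enough for the result. make_greeting is a hypothetical helper, and nothing in it checks that assumption.

```cpp
#include <cstring>

// Illustrative only: dst must have room for at least 13 characters
// ("Hello, world" plus the null); the library functions never check.
char *make_greeting(char *dst) {
    std::strcpy(dst, "Hello");   // dst now holds "Hello"
    std::strcat(dst, ", ");      // dst now holds "Hello, "
    std::strcat(dst, "world");   // dst now holds "Hello, world"
    return dst;
}
```

With char buf[32], make_greeting(buf) leaves "Hello, world" in buf, and strlen(buf) is 12.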

When we compare library strings, we do so using the normal relational operators. We can use these operators to compare pointers to C-style strings, but the effect is quite different; what we're actually comparing is the pointer values, not the strings to which they point:

          if (cp1 < cp2) // compares addresses, not the values pointed to

Assuming cp1 and cp2 point to elements in the same array (or one past the end of that array), the effect of this comparison is to compare the address in cp1 with the address in cp2. If the pointers do not address the same array, the result of the comparison is undefined.

To compare the strings, we must use strcmp and interpret the result:

          const char *cp1 = "A string example";
          const char *cp2 = "A different string";
          int i = strcmp(cp1, cp2);    // i is positive
          i = strcmp(cp2, cp1);        // i is negative
          i = strcmp(cp1, cp1);        // i is zero

The strcmp function returns three possible values: 0 if the strings are equal; or a positive or negative value, depending on whether the first string is larger or smaller than the second.
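
Only the sign of the strcmp result is specified, so code should test it against zero rather than against particular values such as 1 or -1. The sketch below uses a hypothetical helper, compare_cstr, that normalizes the result to -1, 0, or 1.

```cpp
#include <cstring>

// Hypothetical helper: maps strcmp's result to -1, 0, or 1. Only the sign of
// strcmp's return value is specified, so never test it against 1 or -1 directly.
int compare_cstr(const char *a, const char *b) {
    int r = std::strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

Two separate arrays that both hold "abc" have different addresses, so pointers to them compare unequal. compare_cstr on those same arrays returns 0.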

Never Forget About the Null-Terminator

When using the C library string functions it is essential to remember the strings must be null-terminated:

          char ca[] = {'C', '+', '+'}; // not null-terminated
          cout << strlen(ca) << endl; // disaster: ca isn't null-terminated

In this case, ca is an array of characters but is not null-terminated. What happens is undefined. The strlen function assumes that it can rely on finding a null character at the end of its argument. The most likely effect of this call is that strlen will keep looking through the memory that follows wherever ca happens to reside until it encounters a null character. In any event, the return from strlen will not be the correct value.
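
When the size of the array is known, one defensive option is to stop scanning at that size instead of trusting that a null is present. bounded_length is a hypothetical helper written for illustration; it is not one of the functions in Table 4.1.

```cpp
#include <cstddef>

// Hypothetical helper: like strlen, but never examines more than max_len
// characters, so it stays inside an array whose size the caller knows even
// if the array lacks a null. Returns max_len when no null was found.
std::size_t bounded_length(const char *s, std::size_t max_len) {
    std::size_t n = 0;
    while (n < max_len && s[n] != '\0')
        ++n;
    return n;
}
```

For the array ca above, bounded_length(ca, sizeof(ca)) returns 3 without reading past the array.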

Caller Is Responsible for Size of a Destination String

The array that we pass as the first argument to strcat and strcpy must be large enough to hold the generated string. The code we show here, although a common usage pattern, is fraught with the potential for serious error:

          // Dangerous: What happens if we miscalculate the size of largeStr?
          char largeStr[16 + 18 + 2];         // room for cp1, a space, cp2, and the null
          strcpy(largeStr, cp1);              // copies cp1 into largeStr
          strcat(largeStr, " ");              // adds a space at end of largeStr
          strcat(largeStr, cp2);              // concatenates cp2 to largeStr
          // prints A string example A different string
          cout << largeStr << endl;

The problem is that we could easily miscalculate the size needed in largeStr. Similarly, if we later change the sizes of the strings to which either cp1 or cp2 point, then the calculated size of largeStr will be wrong. Unfortunately, programs similar to this code are widely distributed. Programs with such code are error-prone and often lead to serious security leaks.
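
One way to avoid hand-counting is to derive the size from the strings themselves. The sketch below uses a hypothetical name, joined_size, and computes the number of characters needed for cp1, a space, cp2, and the terminating null. Because this value is known only at run time, an array of that size would have to be allocated dynamically, as described in Section 4.3.1.

```cpp
#include <cstddef>
#include <cstring>

// Computes the array size needed for s1, one space, s2, and the null,
// from the strings themselves rather than from hand-counted constants.
std::size_t joined_size(const char *s1, const char *s2) {
    return std::strlen(s1) + 1     // characters of s1 plus the space
         + std::strlen(s2) + 1;    // characters of s2 plus the null
}
```

For the two strings above, this yields 36, the same as the hand-written 16 + 18 + 2.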

When Using C-Style Strings, Use the strn Functions

If you must use C-style strings, it is usually safer to use the strncat and strncpy functions instead of strcat and strcpy:

          char largeStr[16 + 18 + 2]; // room for cp1, a space, cp2, and the null
          strncpy(largeStr, cp1, 17); // size to copy includes the null
          strncat(largeStr, " ", 2);  // pedantic, but a good habit
          strncat(largeStr, cp2, 19); // appends at most 19 characters (cp2 has 18), then a null

The trick to using these versions is to properly calculate the value to control how many characters get copied. In particular, we must always remember to account for the null when copying or concatenating characters. We must allocate space for the null because that is the character that terminates largeStr after each call. Let's walk through these calls in detail:

  • On the call to strncpy, we ask to copy 17 characters: all the characters in cp1 plus the null. Leaving room for the null is necessary so that largeStr is properly terminated. After the strncpy call, largeStr has a strlen value of 16. Remember, strlen counts the characters in a C-style string, not including the null.

  • When we call strncat, we ask to copy two characters: the space and the null that terminates the string literal. After this call, largeStr has a strlen of 17. The null that had ended largeStr is overwritten by the space that we appended. A new null is written after that space.

  • When we append cp2 in the second call to strncat, we again ask to copy all the characters from cp2, including the null. After this call, the strlen of largeStr is 35: 16 characters from cp1, 18 from cp2, and 1 for the space that separates the two strings.

The array size of largeStr remains 36 throughout.
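
The walkthrough above can be checked by recording strlen after each call. This sketch repeats the three calls with the same literals. trace_strn_calls is a hypothetical name, and it writes its three measurements into a caller-supplied array.

```cpp
#include <cstddef>
#include <cstring>

// Repeats the strncpy/strncat calls above and records strlen(largeStr)
// after each one. lengths must point to an array of at least 3 elements.
void trace_strn_calls(std::size_t lengths[3]) {
    const char *cp1 = "A string example";     // 16 characters
    const char *cp2 = "A different string";   // 18 characters
    char largeStr[16 + 18 + 2];               // 36 elements
    std::strncpy(largeStr, cp1, 17);          // 16 characters plus the null
    lengths[0] = std::strlen(largeStr);
    std::strncat(largeStr, " ", 2);           // the space, then a new null
    lengths[1] = std::strlen(largeStr);
    std::strncat(largeStr, cp2, 19);          // 18 characters, then a null
    lengths[2] = std::strlen(largeStr);
}
```

The recorded lengths are 16, 17, and 35, matching the description above.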

These operations are safer than the simpler versions that do not take a size argument, as long as we calculate the size argument correctly. If we ask to copy or concatenate more characters than the target array can hold, we will still overrun that array. If the string we're copying or concatenating is longer than the requested size, the result is silently truncated. Moreover, when strncpy truncates, it does not write a terminating null, so the target is left without one; strncat, by contrast, always appends a null. Truncating is safer than overrunning the array, but it is still an error.
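
A common strncpy idiom assumes the destination size is known. It copies at most one character less than the array size and then stores the null explicitly, so the result is always a C-style string, even when the source is truncated. copy_truncated is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dst (an array of dst_size characters), truncating if needed.
// strncpy writes no null when src has dst_size - 1 or more characters,
// so the last element is set explicitly.
// Precondition: dst_size > 0.
char *copy_truncated(char *dst, const char *src, std::size_t dst_size) {
    std::strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return dst;
}
```

Copying "hello" into a 4-element array this way yields the truncated but properly terminated string "hel".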

Whenever Possible, Use Library strings

None of these issues matter if we use C++ library strings:

          string largeStr = cp1; // initialize largeStr as a copy of cp1
          largeStr += " ";       // add space at end of largeStr
          largeStr += cp2;       // concatenate cp2 onto end of largeStr

Now the library handles all memory management, and we need no longer worry if the size of either string changes.

For most applications, in addition to being safer, it is also more efficient to use library strings rather than C-style strings.


Exercises Section 4.3

Exercise 4.22:

Explain the difference between the following two while loops:

          const char *cp = "hello";
          int cnt;
          while (cp) { ++cnt; ++cp; }
          while (*cp) { ++cnt; ++cp; }

Exercise 4.23:

What does the following program do?

          const char ca[] = {'h', 'e', 'l', 'l', 'o'};
          const char *cp = ca;
          while (*cp) {
              cout << *cp << endl;
              ++cp;
          }

Exercise 4.24:

Explain the differences between strcpy and strncpy. What are the advantages of each? The disadvantages?

Exercise 4.25:

Write a program to compare two strings. Now write a program to compare the value of two C-style character strings.

Exercise 4.26:

Write a program to read a string from the standard input. How might you write a program to read from the standard input into a C-style character string?

4.3.1. Dynamically Allocating Arrays

A variable of array type has three important limitations: its size is fixed, the size must be known at compile time, and the array exists only until the end of the block in which it was defined. Real-world programs usually cannot live with these restrictions; they need a way to allocate an array dynamically at run time. Although all arrays have fixed size, the size of a dynamically allocated array need not be fixed at compile time. It can be (and usually is) determined at run time. Unlike an array variable, a dynamically allocated array continues to exist until it is explicitly freed by the program.

Every program has a pool of available memory it can use during program execution to hold dynamically allocated objects. This pool of available memory is referred to as the program's free store or heap. C programs use a pair of functions named malloc and free to allocate space from the free store. In C++ we use new and delete expressions.

Defining a Dynamic Array

When we define an array variable, we specify a type, a name, and a dimension. When we dynamically allocate an array, we specify the type and size but do not name the object. Instead, the new expression returns a pointer to the first element in the newly allocated array:

          int *pia = new int[10]; // array of 10 uninitialized ints

This new expression allocates an array of ten ints and returns a pointer to the first element in that array, which we use to initialize pia.

A new expression takes a type and optionally an array dimension specified inside a bracket-pair. The dimension can be an arbitrarily complex expression. When we allocate an array, new returns a pointer to the first element in the array. Objects allocated on the free store are unnamed. We use objects on the heap only indirectly through their address.

Initializing a Dynamically Allocated Array

When we allocate an array of objects of a class type, then that type's default constructor (Section 2.3.4, p. 50) is used to initialize each element. If the array holds elements of built-in type, then the elements are uninitialized:

          string *psa = new string[10]; // array of 10 empty strings
          int *pia = new int[10];       // array of 10 uninitialized ints

Each of these new expressions allocates an array of ten objects. In the first case, those objects are strings. After allocating memory to hold the objects, the default string constructor is run on each element of the array in turn. In the second case, the objects are a built-in type; memory to hold ten ints is allocated, but the elements are uninitialized.

Alternatively, we can value-initialize (Section 3.3.1, p. 92) the elements by following the array size by an empty pair of parentheses:

          int *pia2 = new int[10](); // array of 10 value-initialized ints, all 0

The parentheses are effectively a request to the compiler to value-initialize the array, which in this case sets its elements to 0.
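
The difference is observable: after new int[n](), every element is 0. This small sketch, with a hypothetical function name, allocates such an array, checks each element, and frees it with delete []. Reading the elements of an array allocated without () would be undefined, so the sketch does not attempt that comparison.

```cpp
#include <cstddef>

// Allocates a value-initialized array of n ints, reports whether every
// element is 0, and frees the array before returning.
bool value_initialized_all_zero(std::size_t n) {
    int *pia2 = new int[n]();        // () requests value-initialization
    bool all_zero = true;
    for (std::size_t i = 0; i != n; ++i)
        if (pia2[i] != 0)
            all_zero = false;
    delete [] pia2;                  // array form of delete
    return all_zero;
}
```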

The elements of a dynamically allocated array can be initialized only to the default value of the element type. The elements cannot be initialized to separate values as can be done for elements of an array variable.

Dynamic Arrays of const Objects

If we create an array of const objects of built-in type on the free store, we must initialize that array. Because the elements are const, there is no way to assign values to them afterward. The only way to initialize the elements is to value-initialize the array:

          // error: uninitialized const array
          const int *pci_bad = new const int[100];
          // ok: value-initialized const array
          const int *pci_ok = new const int[100]();

It is possible to have a const array of elements of a class type that provides a default constructor:

          // ok: array of 100 empty strings
          const string *pcs = new const string[100];

In this case, the default constructor is used to initialize the elements of the array.

Of course, once the elements are created, they may not be changed, which means that such arrays usually are not very useful.

It Is Legal to Dynamically Allocate an Empty Array

When we dynamically allocate an array, we often do so because we don't know the size of the array at compile time. We might write code such as

          size_t n = get_size(); // get_size returns number of elements needed
          int* p = new int[n];
          for (int* q = p; q != p + n; ++q)
               /* process the array */ ;

to figure out the size of the array and then allocate and process the array.

An interesting question is: What happens if get_size returns 0? The answer is that our code works fine. The language specifies that a call to new to create an array of size zero is legal. It is legal even though we could not create an array variable of size 0:

          char arr[0];            // error: cannot define zero-length array
          char *cp = new char[0]; // ok: but cp can't be dereferenced

When we use new to allocate an array of zero size, new returns a valid, nonzero pointer. This pointer will be distinct from any other pointer returned by new. The pointer cannot be dereferenced; after all, it points to no element. The pointer can be compared and so can be used in a loop such as the preceding one. It is also legal to add (or subtract) zero to such a pointer and to subtract the pointer from itself, yielding zero.

new 动态创建长度为 0 的数组时,new 返回有效的非零指针。该指针与 new 返回的其他指针不同,不能进行解引用操作,因为它毕竟没有指向任何元素。而允许的操作包括:比较运算,因此该指针能在循环中使用;在该指针上加(减)0;或者减去本身,得 0 值。

In our hypothetical loop, if the call to get_size returned 0, then the call to new would still succeed. However, p would not address any element; the array is empty. Because n is zero, the for loop effectively compares q to p. These pointers are equal; q was initialized to p, so the condition in the for fails and the loop body is not executed.
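
The behavior described above can be checked with a version of that loop in which get_size is replaced by a parameter. In this sketch, process_dynamic is a hypothetical name. It counts how many times the loop body runs and frees the array with delete [], which is still needed for a zero-length array.

```cpp
#include <cstddef>

// Allocates n ints, runs the loop from the text, counts iterations,
// and frees the array. With n == 0 the loop body never runs.
std::size_t process_dynamic(std::size_t n) {
    int *p = new int[n];             // legal even when n == 0
    std::size_t visits = 0;
    for (int *q = p; q != p + n; ++q) {
        *q = 0;                      // reached only when the array is nonempty
        ++visits;
    }
    delete [] p;                     // required for every new[], including size 0
    return visits;
}
```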

Freeing Dynamic Memory

When we allocate memory, we must eventually free it. Otherwise, memory is gradually used up and may be exhausted. When we no longer need the array, we must explicitly return its memory to the free store. We do so by applying the delete [] expression to a pointer that addresses the array we want to release:

          delete [] pia;

deallocates the array pointed to by pia, returning the associated memory to the free store. The empty bracket pair between the delete keyword and the pointer is necessary: It indicates to the compiler that the pointer addresses an array of elements on the free store and not simply a single object.

If the empty bracket pair is omitted, it is an error, but an error that the compiler is unlikely to catch; the program may fail at run time.



The least serious run-time consequence of omitting brackets when freeing an array is that too little memory will be freed, leading to a memory leak. On some systems and/or for some element types, more serious run-time problems are possible. It is essential to remember the bracket-pair when deleting pointers to arrays.

Contrasting C-Style Strings and C++ Library strings

The following two programs illustrate the differences in using C-style character strings versus using the C++ library string type. The string version is shorter, easier to understand, and less error-prone:



          // C-style character string implementation
          const char *pc = "a very long literal string";
          const size_t len = strlen(pc);          // characters to allocate, excluding the null
          // performance test on string allocation and copy
          for (size_t ix = 0; ix != 1000000; ++ix) {
              char *pc2 = new char[len + 1];      // allocate the space, including the null
              strcpy(pc2, pc);                    // do the copy
              if (strcmp(pc2, pc))                // use the new string
                  ;   // do nothing
              delete [] pc2;                      // free the memory
          }

          // string implementation
          string str("a very long literal string");
          // performance test on string allocation and copy
          for (int ix = 0; ix != 1000000; ++ix) {
              string str2 = str;   // do the copy, automatically allocated
              if (str != str2)     // use the new string
                  ;   // do nothing
          }                        // str2 is automatically freed


These programs are further explored in the exercises to Section 4.3.1 (p. 139).

Using Dynamically Allocated Arrays

A common reason to allocate an array dynamically is if its dimension cannot be known at compile time. For example, char* pointers are often used to refer to multiple C-style strings during the execution of a program. The memory used to hold the various strings typically is allocated dynamically during program execution based on the length of the string to be stored. This technique is considerably safer than allocating a fixed-size array. Assuming we correctly calculate the size needed at run time, we no longer need to worry that a given string will overflow the fixed size of an array variable.

Suppose we have the following C-style strings:

          const char *noerr = "success";
          // ...
          const char *err189 = "Error: a function declaration must "
                               "specify a function return type!";

We might want to copy one or the other of these strings at run time to a new character array. We could calculate the dimension at run time, as follows:

    const char *errorTxt;
    if (errorFound)
        errorTxt = err189;
    else
        errorTxt = noerr;
    // remember the 1 for the terminating null
    size_t dimension = strlen(errorTxt) + 1;
    char *errMsg = new char[dimension];
    // copy the text for the error into errMsg
    strncpy(errMsg, errorTxt, dimension);
    // ... use errMsg, then free it with: delete [] errMsg;

Recall that strlen returns the length of the string not including the null. It is essential to remember to add 1 to the length returned from strlen to accommodate the trailing null.
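
Packaged as a function, the same steps look like the sketch below. dup_cstr is a hypothetical name. It returns a pointer to a newly allocated array, so the caller becomes responsible for releasing it with delete [].

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a newly allocated copy of a C-style string.
// The caller owns the result and must release it with delete [].
char *dup_cstr(const char *src) {
    std::size_t dimension = std::strlen(src) + 1;  // + 1 for the null
    char *copy = new char[dimension];
    std::strncpy(copy, src, dimension);            // copies the null too
    return copy;
}
```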

Exercises Section 4.3.1

Exercise 4.27:

Given the following new expression, how would you delete pa?

     int *pa = new int[10];

Exercise 4.28:

Write a program to read the standard input and build a vector of ints from values that are read. Allocate an array of the same size as the vector and copy the elements from the vector into the array.

Exercise 4.29:

Given the two program fragments in the highlighted box "Contrasting C-Style Strings and C++ Library strings" (p. 138),

  1. Explain what the programs do.

  2. As it happens, on average, the string class implementation executes considerably faster than the C-style string functions. The relative average execution times on our more than five-year-old PC are as follows:

              user       0.47    # string class
              user       2.55    # C-style character string
    

Did you expect that? How would you account for it?

Exercise 4.30:

Write a program to concatenate two C-style string literals, putting the result in a C-style string. Write a program to concatenate two library strings that have the same value as the literals used in the first program.

4.3.2. Interfacing to Older Code

Many C++ programs exist that predate the standard library and so do not yet use the string and vector types. Moreover, many C++ programs interface to existing C programs that cannot use the C++ library. Hence, it is not infrequent to encounter situations where a program written in modern C++ must interface to code that uses arrays and/or C-style character strings. The library offers facilities to make the interface easier to manage.

Mixing Library strings and C-Style Strings

As we saw in Section 3.2.1 (p. 80), we can initialize a string from a string literal:

          string st3("Hello World");  // st3 holds Hello World

More generally, because a C-style string has the same type as a string literal and is null-terminated in the same way, we can use a C-style string anywhere that a string literal can be used:

  • We can initialize or assign to a string from a C-style string.

  • We can use a C-style string as one of the two operands to the string addition or as the right-hand operand to the compound assignment operators.

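
The sketch below exercises each of these uses with a single C-style string argument. mix_with is a hypothetical name chosen for illustration.

```cpp
#include <string>

// Each statement uses the C-style string cs where a string literal could appear.
std::string mix_with(const char *cs) {
    std::string s(cs);   // initialize from a C-style string
    s = cs;              // assign from a C-style string
    s = cs + s;          // C-style string as the left operand of +
    s += cs;             // C-style string as the right operand of +=
    return s;            // holds three copies of cs
}
```

For example, mix_with("ab") returns "ababab".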

The reverse functionality is not provided: there is no direct way to use a library string when a C-style string is required. For example, there is no way to initialize a character pointer from a string:

          char *str = st2; // compile-time type error

There is, however, a string member function named c_str that we can often use to accomplish what we want:

          char *str = st2.c_str(); // almost ok, but not quite

The name c_str indicates that the function returns a C-style character string. Literally, it says, "Get me the C-style string representation", that is, a pointer to the beginning of a null-terminated character array that holds the same data as the characters in the string.

This initialization fails because c_str returns a pointer to an array of const char. It does so to prevent changes to the array. The correct initialization is:

          const char *str = st2.c_str(); // ok

The array returned by c_str is not guaranteed to be valid indefinitely. Any subsequent use of st2 that might change the value of st2 can invalidate the array. If a program needs continuing access to the data, then the program must copy the array returned by c_str.
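
Here is a minimal sketch of that copy, assuming the caller will free the result with delete []. copy_of_c_str is a hypothetical name. Because the copy lives in its own array, later changes to the string do not affect it.

```cpp
#include <cstring>
#include <string>

// Copies the characters that st.c_str() points to into a new array, so the
// copy stays valid after st changes. The caller must delete [] the result.
char *copy_of_c_str(const std::string &st) {
    const char *str = st.c_str();            // pointer to const char
    char *copy = new char[st.size() + 1];    // + 1 for the null
    std::strcpy(copy, str);
    return copy;
}
```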

Using an Array to Initialize a vector

In Section 4.1.1 (p. 112) we noted that it is not possible to initialize an array from another array. Instead, we have to create the array and then explicitly copy the elements from one array into the other. It turns out that we can use an array to initialize a vector, although the form of the initialization may seem strange at first. To initialize a vector from an array, we specify the address of the first element and one past the last element that we wish to use as initializers:

          const size_t arr_size = 6;
          int int_arr[arr_size] = {0, 1, 2, 3, 4, 5};
          // ivec has 6 elements: each a copy of the corresponding element in int_arr
          vector<int> ivec(int_arr, int_arr + arr_size);

The two pointers passed to ivec mark the range of values with which to initialize the vector. The second pointer points one past the last element to be copied. The range of elements marked can also represent a subset of the array:

          // copies 3 elements: int_arr[1], int_arr[2], int_arr[3]
          vector<int> ivec(int_arr + 1, int_arr + 4);

This initialization creates ivec with three elements. The values of these elements are copies of the values in int_arr[1] through int_arr[3].
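
Both forms pass a pair of pointers that delimit a half-open range. The sketch below wraps that vector constructor in a hypothetical helper, from_range, to make the convention explicit: the first pointer addresses the first element copied, and the second points one past the last.

```cpp
#include <vector>

// Builds a vector from the elements in [first, last); last points one past
// the final element to copy, exactly as in the ivec initializations above.
std::vector<int> from_range(const int *first, const int *last) {
    return std::vector<int>(first, last);
}
```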

Exercises Section 4.3.2

Exercise 4.31:

Write a program that reads a string into a character array from the standard input. Describe how your program handles varying size inputs. Test your program by giving it a string of data that is longer than the array size you've allocated.

Exercise 4.32:

Write a program to initialize a vector from an array of ints.

Exercise 4.33:

Write a program to copy a vector of ints into an array of ints.

Exercise 4.34:

Write a program to read strings into a vector. Now, copy that vector into an array of character pointers. For each element in the vector, allocate a new character array and copy the data from the vector element into that character array. Then insert a pointer to the character array into the array of character pointers.

Exercise 4.35:

Print the contents of the vector and the array created in the previous exercise. After printing the array, remember to delete the character arrays.
