3.2. Library string Type3.2. 标准库 string 类型The string type supports variable-length character strings. The library takes care of managing the memory associated with storing the characters and provides various useful operations. The library string type is intended to be efficient enough for general use. string 类型支持长度可变的字符串,C++ 标准库将负责管理与存储字符相关的内存,以及提供各种有用的操作。标准库 string 类型的目的就是满足对字符串的一般应用。 As with any library type, programs that use strings must first include the associated header. Our programs will be shorter if we also provide an appropriate using declaration: 与其他的标准库类型一样,用户程序要使用 string 类型对象,必须包含相关头文件。如果提供了合适的 using 声明,那么编写出来的程序将会变得简短些: #include <string> using std::string; 3.2.1. Defining and Initializing strings3.2.1. string 对象的定义和初始化The string library provides several constructors (Section 2.3.3, p. 49). A constructor is a special member function that defines how objects of that type can be initialized. Table 3.1 on the facing page lists the most commonly used string constructors. The default constructor (Section 2.3.4, p. 52) is used "by default" when no initializer is specified. string 标准库支持几个构造函数(第 2.3.3 节)。构造函数是一个特殊成员函数,定义如何初始化该类型的对象。表 3.1 列出了几个 string 类型常用的构造函数。当没有明确指定对象初始化式时,系统将使用默认构造函数(第 2.3.4 节)。 Table 3.1. Ways to Initialize a string表 3.1. 几种初始化 string 对象的方式
3.2.2. Reading and Writing strings3.2.2. string 对象的读写我们已在第一章学习了用 iostream 标准库来读写内置类型的值,如 int double 等。同样地,也可以用 iostream 和 string 标准库,使用标准输入输出操作符来读写 string 对象: // Note: #include and using declarations must be added to compile this code int main() { string s; // empty string cin >> s; // read whitespace-separated string into s cout << s << endl; // write s to the output return 0; } This program begins by defining a string named s. The next line, 以上程序首先定义命名为 s 的 string 第二行代码: cin >> s; // read whitespace-separated string into s reads the standard input storing what is read into s. The string input operator: 从标准输入读取 string 并将读入的串存储在 s 中。string 类型的输入操作符:
So, if the input to this program is "Hello World!", (note leading and trailing spaces) then the output will be "Hello" with no extra spaces. 如果给定和上一个程序同样的输入,则输出的结果是"Hello World!"(注意到开头和结尾的空格),则屏幕上将输出"Hello",而不含任何空格。 The input and output operations behave similarly to the operators on the builtin types. In particular, the operators return their left-hand operand as their result. Thus, we can chain together multiple reads or writes: 输入和输出操作的行为与内置类型操作符基本类似。尤其是,这些操作符返回左操作数作为运算结果。因此,我们可以把多个读操作或多个写操作放在一起: string s1, s2; cin >> s1 >> s2; // read first input into s1, second into s2 cout << s1 << s2 << endl; // write both strings If we give this version of the program the same input as in the previous paragraph, our output would be 如果给定和上一个程序同样的输入,则输出的结果将是:
HelloWorld!
The programs presented from this point on will assume that the needed #include and using declarations have been made. 从本例开始的程序均假设程序中所有必须 #include 和 using 声明已给出。 Reading an Unknown Number of strings读入未知数目的 string 对象Like the input operators that read built-in types, the string input operator returns the stream from which it read. Therefore, we can use a string input operation as a condition, just as we did when reading ints in the program on page 18. The following program reads a set of strings from the standard input and writes what it has read, one string per line, to the standard output: 和内置类型的输入操作一样,string 的输入操作符也会返回所读的数据流。因此,可以把输入操作作为判断条件,这与我们在 1.4.4 节读取整型数据的程序做法是一样的。下面的程序将从标准输入读取一组 string 对象,然后在标准输出上逐行输出:
int main()
{
string word;
// read until end-of-file, writing each word to a new line
while (cin >> word)
cout << word << endl;
return 0;
}
In this case, we read into a string using the input operator. That operator returns the istream from which it read, and the while condition tests the stream after the read completes. If the stream is validit hasn't hit end-of-file or encountered an invalid inputthen the body of the while is executed and the value we read is printed to the standard output. Once we hit end-of-file, we fall out of the while. 上例中,用输入操作符来读取 string 对象。该操作符返回所读的 istream 对象,并在读取结束后,作为 while 的判断条件。如果输入流是有效的,即还未到达文件尾且未遇到无效输入,则执行 while 循环体,并将读取到的字符串输出到标准输出。如果到达了文件尾,则跳出 while 循环。 Using getline to Read an Entire Line使用 getline 读取整行文本There is an additional useful string IO operation: getline. This is a function that takes both an input stream and a string. The getline function reads the next line of input from the stream and stores what it read, not including the newline, in its string argument. Unlike the input operator, getline does not ignore leading newlines. Whenever getline encounters a newline, even if it is the first character in the input, it stops reading the input and returns. The effect of encountering a newline as the first character in the input is that the string argument is set to the empty string. 另外还有一个有用的 string IO 操作:getline。这个函数接受两个参数:一个输入流对象和一个 string 对象。getline 函数从输入流的下一行读取,并保存读取的内容到不包括换行符。和输入操作符不一样的是,getline 并不忽略行开头的换行符。只要 getline 遇到换行符,即便它是输入的第一个字符,getline 也将停止读入并返回。如果第一个字符就是换行符,则 string 参数将被置为空 string。 The getline function returns its istream argument so that, like the input operator, it can be used as a condition. For example, we could rewrite the previous program that wrote one word per line to write a line at a time instead: getline 函数将 istream 参数作为返回值,和输入操作符一样也把它用作判断条件。例如,重写前面那段程序,把每行输出一个单词改为每次输出一行文本:
int main()
{
string line;
// read line at time until end-of-file
while (getline(cin, line))
cout << line << endl;
return 0;
}
Because line does not contain a newline, we must write our own if we want the strings written one to a line. As usual, we use endl to write a newline and flush the output buffer. 由于 line 不含换行符,若要逐行输出需要自行添加。照常,我们用 endl 来输出一个换行符并刷新输出缓冲区。
3.2.3. Operations on strings3.2.3. string 对象的操作Table 3.2 on the next page lists the most commonly used string operations. 表 3.2 列出了常用的 string 操作。 Table 3.2. string Operations
The string size and empty Operationsstring 的 size 和 empty 操作The length of a string is the number of characters in the string. It is returned by the size operation: string 对象的长度指的是 string 对象中字符的个数,可以通过 size 操作获取: int main() { string st("The expense of spirit\n"); cout << "The size of " << st << "is " << st.size() << " characters, including the newline" << endl; return 0; } If we compile and execute this program it yields 编译并运行这个程序,得到的结果为: The size of The expense of spirit is 22 characters, including the newline Often it is useful to know whether a string is empty. One way we could do so would be to compare size with 0: 了解 string 对象是否空是有用的。一种方法是将 size 与 0 进行比较:
if (st.size() == 0)
// ok: empty
In this case, we don't really need to know how many characters are in the string; we are only interested in whether the size is zero. We can more directly answer this question by using the empty member: 本例中,程序员并不需要知道 string 对象中有多少个字符,只想知道 size 是否为 0。用 string 的成员函数 empty() 可以更直接地回答这个问题:
if (st.empty())
// ok: empty
The empty function returns the bool (Section 2.1, p. 34) value true if the string contains no characters; otherwise, it returns false. empty() 成员函数将返回 bool(2.1 节),如果 string 对象为空则返回 true 否则返回 false。 string::size_typestring::size_type 类型It might be logical to expect that size returns an int, or, thinking back to the note on page 38, an unsigned. Instead, the size operation returns a value of type string::size_type. This type requires a bit of explanation. 从逻辑上来讲,size() 成员函数似乎应该返回整形数值,或如 2.2 节“建议”中所述的无符号整数。但事实上,size 操作返回的是 string::size_type 类型的值。我们需要对这种类型做一些解释。
The string classand many other library typesdefines several companion types. These companion types make it possible to use the library types in a machine-independent manner. The type size_type is one of these companion types. It is defined as a synonym for an unsigned typeeither unsigned int or unsigned longthat is guaranteed to be big enough to hold the size of any string. To use the size_type defined by string, we use the scope operator to say that the name size_type is defined in the string class. string 类类型和许多其他库类型都定义了一些配套类型(companion type)。通过这些配套类型,库类型的使用就能与机器无关(machine-independent)。size_type 就是这些配套类型中的一种。它定义为与 unsigned 型(unsigned int 或 unsigned long)具有相同的含义,而且可以保证足够大能够存储任意 string 对象的长度。为了使用由 string 类型定义的 size_type 类型是由 string 类定义。
Although we don't know the precise type of string::size_type, wedo know that it is an unsigned type (Section 2.1.1, p. 34). We also know that for a given type, the unsigned version can hold a positive value twice as large as the corresponding signed type can hold. This fact implies that the largest string could be twice as large as the size an int can hold. 虽然我们不知道 string::size_type 的确切类型,但可以知道它是 unsigned 型(2.1.1 节)。对于任意一种给定的数据类型,它的 unsigned 型所能表示的最大正数值比对应的 signed 型要大倍。这个事实表明 size_type 存储的 string 长度是 int 所能存储的两倍。 Another problem with using an int is that on some machines the size of an int is too small to hold the size of even plausibly large strings. For example, if a machine has 16-bit ints, then the largest string an int could represent would have 32,767 characters. A string that held the contents of a file could easily exceed this size. The safest way to hold the size of a string is to use the type the library defines for this purpose, which is string::size_type. 使用 int 变量的另一个问题是,有些机器上 int 变量的表示范围太小,甚至无法存储实际并不长的 string 对象。如在有 16 位 int 型的机器上,int 类型变量最大只能表示 32767 个字符的 string 个字符的 string 对象。而能容纳一个文件内容的 string 对象轻易就会超过这个数字。因此,为了避免溢出,保存一个 stirng 对象 size 的最安全的方法就是使用标准库类型 string::size_type。
The string Relational Operatorsstring 关系操作符The string class defines several operators that compare two string values. Each of these operators works by comparing the characters from each string. string 类定义了几种关系操作符用来比较两个 string 值的大小。这些操作符实际上是比较每个 string
The equality operator compares two strings, returning true if they are equal. Two strings are equal if they are the same length and contain the same characters. The library also defines != to test whether two strings are unequal. == 操作符比较两个 string 对象,如果它们相等,则返回 true。两个 string 对象相等是指它们的长度相同,且含有相同的字符。标准库还定义了 != 操作符来测试两个 string 对象是否不等。 The relational operators <, <=, >, >= test whether one string is less than, less than or equal, greater than, or greater than or equal to another: 关系操作符 <,<=,>,>= 分别用于测试一个 string 对象是否小于、小于或等于、大于、大于或等于另一个 string 对象:
string big = "big", small = "small"; string s1 = big; // s1 is a copy of big if (big == small) // false // ... if (big <= s1) // true, they're equal, so big is less than or equal to s1 // ... The relational operators compare strings using the same strategy as in a (case-sensitive) dictionary: 关系操作符比较两个 string 对象时采用了和(大小写敏感的)字典排序相同的策略:
As an example, given the strings 举例来说,给定 string 对象; string substr = "Hello"; string phrase = "Hello World"; string slang = "Hiya"; then substr is less than phrase, and slang is greater than either substr or phrase. 则 substr 小于 phrase,而 slang 则大于 substr 或 phrase Assignment for stringsstring 对象的赋值In general the library types strive to make it as easy to use a library type as it is to use a built-in type. To this end, most of the library types support assignment. In the case of strings, we can assign one string object to another: 总体上说,标准库类型尽量设计得和基本数据类型一样方便易用。因此,大多数库类型支持赋值操作。对 string 对象来说,可以把一个 string 对象赋值给另一个 string 对象; // st1 is an empty string, st2 is a copy of the literal string st1, st2 = "The expense of spirit"; st1 = st2; // replace st1 by a copy of st2 After the assignment, st1 contains a copy of the characters in st2. 赋值操作后,st1 就包含了 st2 串所有字符的一个副本。 Most string library implementations go to some trouble to provide efficient implementations of operations such as assignment, but it is worth noting that conceptually, assignment requires a fair bit of work. It must delete the storage containing the characters associated with st1, allocate the storage needed to contain a copy of the characters associated with st2, and then copy those characters from st2 into this new storage. 大多数 string 库类型的赋值等操作的实现都会遇到一些效率上的问题,但值得注意的是,从概念上讲,赋值操作确实需要做一些工作。它必须先把 st1 占用的相关内存释放掉,然后再分配给 st2 足够存放 st2 副本的内存空间,最后把 st2 中的所有字符复制到新分配的内存空间。
Adding Two strings两个 string 对象相加Addition on strings is defined as concatenation. That is, it is possible to concatenate two or more strings through the use of either the plus operator (+) or the compound assignment operator (+=) (Section 1.4.1, p. 13). Given the two strings string 对象的加法被定义为连接(concatenation)。也就是说,两个(或多个)string 对象可以通过使用加操作符 + 或者复合赋值操作符 +=(1.4.1 节)连接起来。给定两个 string 对象:
string s1("hello, "); string s2("world\n"); we can concatenate the two strings to create a third string as follows: 下面把两个 string 对象连接起来产生第三个 string 对象:
string s3 = s1 + s2; // s3 is hello, world\n
If we wanted to append s2 to s1 directly, then we would use +=: 如果要把 s2 直接追加到 s1 的末尾,可以使用 += 操作符: s1 += s2; // equivalent to s1 = s1 + s2 Adding Character String Literals and strings和字符串字面值的连接The strings s1 and s2 included punctuation directly. We could achieve the same result by mixing string objects and string literals as follows: 上面的字符串对象 s1 和 s2 直接包含了标点符号。也可以通过将 string 对象和字符串字面值混合连接得到同样的结果: string s1("hello"); string s2("world"); string s3 = s1 + ", " + s2 + "\n"; When mixing strings and string literals, at least one operand to each + operator must be of string type: 当进行 string 对象和字符串字面值混合连接操作时,+ 操作符的左右操作数必须至少有一个是 string 类型的: string s1 = "hello"; // no punctuation string s2 = "world"; string s3 = s1 + ", "; // ok: adding a string and a literal string s4 = "hello" + ", "; // error: no string operand string s5 = s1 + ", " + "world"; // ok: each + has string operand string s6 = "hello" + ", " + s2; // error: can't add string literals The initializations of s3 and s4 involve only a single operation. In these cases, it is easy to determine that the initialization of s3 is legal: We initialize s3 by adding a string and a string literal. The initialization of s4 attempts to add two string literals and is illegal. s3 和 s4 的初始化只用了一个单独的操作。在这些例子中,很容易判断 s3 的初始化是合法的:把一个 string 对象和一个字符串字面值连接起来。而 s4 的初始化试图将两个字符串字面值相加,因此是非法的。 The initialization of s5 may appear surprising, but it works in much the same way as when we chain together input or output expressions (Section 1.2, p. 5). In this case, the string library defines addition to return a string. Thus, when we initialize s5, the subexpression s1 + ", " returns a string, which can be concatenated with the literal "world\n". It is as if we had written s5 的初始化方法显得有点不可思议,但这种用法和标准输入输出的串联效果是一样的(1.2 节)。本例中,string 标准库定义加操作返回一个 string 对象。这样,在对 s5 进行初始化时,子表达式 s1 + ", " 将返回一个新 string 对象,后者再和字面值 "world\n"连接。整个初始化过程可以改写为:
string tmp = s1 + ", "; // ok: + has a string operand s5 = tmp + "world"; // ok: + has a string operand On the other hand, the initialization of s6 is illegal. Looking at each subexpression in turn, we see that the first subexpression adds two string literals. There is no way to do so, and so the statement is in error. 而 s6 的初始化是非法的。依次来看每个子表达式,则第一个子表达式试图把两个字符串字面值连接起来。这是不允许的,因此这个语句是错误的。 Fetching a Character from a string从 string 对象获取字符The string type uses the subscript ([ ]) operator to access the individual characters in the string. The subscript operator takes a size_type value that denotes the character position we wish to fetch. The value in the subscript is often called "the subscript" or "an index." string 类型通过下标操作符([ ])来访问 string 对象中的单个字符。下标操作符需要取一个 size_type 类型的值,来标明要访问字符的位置。这个下标中的值通常被称为“下标”或“索引”(index)
It is an error to use an index outside this range. 引用下标时如果超出下标作用范围就会引起溢出错误。 We could use the subscript operator to print each character in a string on a separate line: 可用下标操作符分别取出 string 对象的每个字符,分行输出: string str("some string"); for (string::size_type ix = 0; ix != str.size(); ++ix) cout << str[ix] << endl; On each trip through the loop we fetch the next character from str, printing it followed by a newline. 每次通过循环,就从 str 对象中读取下一个字符,输出该字符并换行。 Subscripting Yields an Lvalue下标操作可用作左值Recall that a variable is an lvalue (Section 2.3.1, p. 45), and that the left-hand side of an assignment must be an lvalue. Like a variable, the value returned by the subscript operator is an lvalue. Hence, a subscript can be used on either side of an assignment. The following loop sets each character in str to an asterisk: 前面说过,变量是左值(2.3.1 节),且赋值操作的左操作的必须是左值。和变量一样,string 对象的下标操作返回值也是左值。因此,下标操作可以放于赋值操作符的左边或右边。通过下面循环把 str 对象的每一个字符置为 ‘*’: for (string::size_type ix = 0; ix != str.size(); ++ix) str[ix] = '*'; Computing Subscript Values计算下标值Any expression that results in an integral value can be used as the index to the subscript operator. For example, assuming someval and someotherval are integral objects, we could write 任何可产生整型值的表达式可用作下标操作符的索引。例如,假设 someval 和 someotherval 是两个整形对象,可以这样写: str[someotherval * someval] = someval; Although any integral type can be used as an index, the actual type of the index is string::size_type, which is an unsigned type. 虽然任何整型数值都可作为索引,但索引的实际数据类型却是类型 unsigned 类型 string::size_type。
When we subscript a string, we are responsible for ensuring that the index is "in range." By in range, we mean that the index is a number that, when assigned to a size_type, is a value in the range from 0 through the size of the string minus one. By using a string::size_type or another unsigned type as the index, we ensure that the subscript cannot be less than zero. As long as our index is an unsigned type, we need only check that it is less than the size of the string. 在使用下标索引 string 对象时,必须保证索引值“在上下界范围内”。“在上下界范围内”就是指索引值是一个赋值为 size_type 类型的值,其取值范围在 0 到 string 对象长度减 1 之间。使用 string::size_type 类型或其他 unsigned 类型,就只需要检测它是否小于 string 对象的长度。
3.2.4. Dealing with the Characters of a string3.2.4. string 对象中字符的处理Often we want to process the individual characters of a string. For example, we might want to know if a particular character is a whitespace character or whether the character is alphabetic or numeric. Table 3.3 on the facing page lists the functions that can be used on the characters in a string (or on any other char value). These functions are defined in the cctype header. 我们经常要对 string 对象中的单个字符进行处理,例如,通常需要知道某个特殊字符是否为空白字符、字母或数字。表 3.3 列出了各种字符操作函数,适用于 string 对象的字符(或其他任何 char 值)。这些函数都在 cctype 头文件中定义。 Table 3.3. cctype Functions
These functions mostly test the given character and return an int, which acts as a truth value. Each function returns zero if the test fails; otherwise, they return a (meaningless) nonzero value indicating that the character is of the requested kind. 表中的大部分函数是测试一个给定的字符是否符合条件,并返回一个 int 作为真值。如果测试失败,则该函数返回 0 ,否则返回一个(无意义的)非 0 ,表示被测字符符合条件。 For these functions, a printable character is a character with a visible representation; whitespace is one of space, tab, vertical tab, return, newline, and formfeed; and punctuation is a printable character that is not a digit, a letter, or (printable) whitespace character such as space. 表中的这些函数,可打印的字符是指那些可以表示的字符,空白字符则是空格、制表符、垂直制表符、回车符、换行符和进纸符中的任意一种;标点符号则是除了数字、字母或(可打印的)空白字符(如空格)以外的其他可打印字符。 As an example, we could use these functions to print the number of punctuation characters in a given string: 这里给出一个例子,运用这些函数输出一给定 string 对象中标点符号的个数: string s("Hello World!!!"); string::size_type punct_cnt = 0; // count number of punctuation characters in s for (string::size_type index = 0; index != s.size(); ++index) if (ispunct(s[index])) ++punct_cnt; cout << punct_cnt << " punctuation characters in " << s << endl; The output of this program is 这个程序的输出结果是:
3 punctuation characters in Hello World!!!
Rather than returning a truth value, the tolower and toupper functions return a charactereither the argument unchanged or the lower- or uppercase version of the character. We could use tolower to change s to lowercase as follows: 和返回真值的函数不同的是,tolower 和 toupper 函数返回的是字符,返回实参字符本身或返回该字符相应的大小写字符。我们可以用 tolower 函数把 string 对象 s 中的字母改为小写字母,程序如下: // convert s to lowercase for (string::size_type index = 0; index != s.size(); ++index) s[index] = tolower(s[index]); cout << s << endl; which generates 得到的结果为:
hello world!!!
![]() |