The New C: Integers, Part 2

新的C语言:整型,第二部分

By Randy Meyers, June 01, 2001


The new C Standard has a novel idea: just accept that machine-word sizes will grow. Randy explains C's proactive strategy for accommodating the inevitable.

    新的C语言标准有一个新概念:接受机器字将会变长这个事实。Randy 解释了C为了适应这个不可避免趋势而主动出击的策略。


In December’s column, I began discussing the new integer features of C99, in particular, the long long data type. In this month’s column, I discuss how the C99 Standard generalizes the support for integers and allows your compiler to support additional integer data types. While it might not be immediately obvious, these C99 features were motivated in part by the introduction of 64-bit machines.


    在12月的专栏里,我开始讨论了C99中新整型的特性,特别是 long long 数据类型。在本月的专栏中,我将讨论C99标准如何推广对整型的支持,使你的编译器能够支持额外的数据类型。虽然不是非常明显,采用64位机器某种程度上推动了这些C99特性。


As the C99 Standard was being developed, 64-bit machines started to appear, and with them came the contentious issue of the mapping from C keywords to hardware integer data types. An informal group of vendors of 64-bit hardware and software met a few times to reach consensus on a common mapping, and failed to do so for a period of two or three years. Many of the proposed mappings called for a new integer type to be added to C. For example, one proposal mapped int and long to 32-bit integers and specified a new C type for 64-bit integers. Another proposal mapped int and long to 64-bit integers and specified a new C type for 32-bit integers.


    在开发C99标准的过程中,64位机器开始出现,伴随它们而来的是一个有争议的问题:如何把C的关键字映射到硬件整数数据类型。一个由64位硬件和软件厂商组成的非正式团体举行了几次会议,试图就一个通用的映射达成共识,但在两三年的时间里都没有成功。许多提议的映射需要在C中添加一个新的整型。例如,一个提议把 int 和 long 映射为32位整型,并为64位整型指定一个新的C类型。另一个提议把 int 和 long 映射为64位整型,并为32位整型指定一个新的C类型。


Eventually, two 64-bit mappings won. Vendors most concerned about compatibility with their 32-bit offerings mapped short to 16 bits, int and long to 32 bits, and added a new 64-bit integer type (long long). Vendors most concerned about elegant access to 64-bit hardware mapped short to 16 bits, int to 32 bits, and long to 64 bits. (This is also the mapping used in Java.)


    最终,两种64位映射胜出。最关心与自家32位产品兼容的厂商把 short 映射为16位,int 和 long 映射为32位,并增加了一个新的64位整型(long long)。最关心以简洁方式访问64位硬件的厂商把 short 映射为16位,int 映射为32位,long 映射为64位。(这也是Java所使用的映射。)


The C standards committee realized that there was a lesson to be learned from the 64-bit vendors: C probably would gain more integer types in the future. Someday, there will be interest in 128-bit integers. Occasionally, there is interest in adding unusual integer types, such as specialized counters for digital signal processors or integers with different "endian-ness" (byte ordering). The committee decided that it would be best if the Standard addressed the issue of adding new integer types to C in order to give direction to both implementations and programmers. (Some programmers complained about the addition of long long to the language since, rather than use typedefs or macros, they had hard-coded the information about the largest integer type in their programs.)


    C标准委员会意识到,可以从64位厂商那里汲取一个教训:C 将来很可能会获得更多的整型。总有一天,人们会对128位的整数感兴趣。偶尔,有人会有兴趣添加不常见的整数类型,例如数字信号处理器专用的计数器,或是拥有不同“端序”(字节序)的整数。委员会决定,最好在标准中解决往C中添加新整型的问题,以给C实现和程序员指明方向。(有些程序员抱怨语言中增加了 long long,因为他们没有使用 typedef 或宏,而是把最大整型的信息硬编码在了自己的程序里。)

Generalized Integers

泛化的整型

The model for extended integer types (as the Standard calls implementation-defined integers) should seem pretty natural to most C programmers. Like the standard integer types (as the Standard calls the integers with which you are familiar, such as int, signed char, and unsigned long), extended integers come in pairs of types. For every extended signed integer type, there is a corresponding extended unsigned integer type. Together the standard integer types and the extended integer types (along with enum types and char) are collectively known as the integer types. Thus, all of the statements in the Standard about the integer types also apply to any extended integer types supported by an implementation.


    扩展整数类型(标准对实现定义的整型的称呼)的模型对大多数C程序员来说应该相当自然。如同标准整数类型(标准对你所熟悉的整型的称呼,例如 int、signed char 和 unsigned long),扩展整数类型成对出现。对于每一个有符号的扩展整型,都有一个对应的无符号扩展整型。标准整数类型和扩展整数类型(连同 enum 类型和 char)一起统称为整型。因此,标准中所有关于整型的说明也适用于实现所支持的任何扩展整型。


The Standard does not say what the names of any extended integer types are. An implementation might name the types using a combination of current keywords (e.g., long long long int) or invent new keywords spelled using the name patterns that the Standard reserves for use by implementations (e.g., __int24 or _BigEndian32).


    标准没有规定任何扩展整型的名字。某个实现可能使用现有关键字的组合来命名这些类型(例如 long long long int),或者发明新的关键字,其拼写采用标准保留给实现使用的名字模式(例如 __int24 或 _BigEndian32)。


Since C89, integers in C have been required to be binary numbers, and signed integers are required to be represented in one’s complement, two’s complement, or sign and magnitude notation. Integers (except unsigned char) may have unused bits in their representation. (On some machines, integers have the same storage representation as floating-point numbers with the exponent bits ignored.) Integers might have representations for both positive and negative zero. Extended integers obey these same rules. Thus, extended integers cannot be binary coded decimal. Since integers in C have fixed sizes, extended integers cannot be LISP-like bignums whose storage dynamically grows and shrinks to hold numbers of unlimited size. Extended integers are like standard integers. If a programmer was given a typedef defined to be an integer, it would not make much difference whether the typedef was an extended integer or an unspecified standard integer.


    从C89开始,C中的整数就要求是二进制数,有符号整数要求使用1的补码(one’s complement)[a]、2的补码(two’s complement)[b]、或是符号数值表示法[c]来表示。整数(除了 unsigned char )在它们的表示中可能有未使用的位。(在一些机器中,整数跟浮点数拥有相同的存储表示,只是忽略了指数位。)整数可能有正零和负零两种表示。扩展整数遵守同样的规则。因此,扩展整型不能是二-十进制编码(BCD码)。因为C中的整数有固定的大小,扩展整数不能是类似LISP中那种存储空间动态增长和收缩、能保存任意大小数字的 bignum。扩展整数就像标准整数一样。如果程序员拿到一个被定义为整数的 typedef,那么这个 typedef 是扩展整数还是某个未指明的标准整数,都不会有太大的区别。

Expression Evaluation

表达式求值

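The rules for how extended and standard integers work in expressions are determined by a new concept called the integer conversion rank. The Standard requires every implementation to rank all of its integer types according to the following rules (paraphrased here from C99 subclause 6.3.1.1):

- No two signed integer types have the same rank, even if they have the same representation.
- The rank of a signed integer type is greater than the rank of any signed integer type with less precision.
- The rank of long long is greater than the rank of long, which is greater than the rank of int, which is greater than the rank of short, which is greater than the rank of signed char.
- The rank of any unsigned integer type equals the rank of the corresponding signed integer type.
- The rank of any standard integer type is greater than the rank of any extended integer type with the same width.
- The rank of char equals the rank of signed char and unsigned char.
- The rank of _Bool is less than the rank of all other standard integer types.
- The rank of any enumerated type equals the rank of its compatible integer type.
- The relative rank of two extended signed integer types with the same precision is implementation-defined, subject to the other rules.
- Rank is transitive: if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3.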


    扩展和标准整数如何在表达式中工作的规则,由一个叫整数转换优先级的新概念决定。标准要求每一个实现按照下面的规则给它的所有整数类型划分优先级:

As you probably suspect, the integer conversion rank is used to define the rules for expression evaluation. Those rules are named the integer promotions and the usual arithmetic conversions.


    就如你可能猜想的那样,整数转换优先级用于定义表达式求值的规则。这些规则叫做整型提升和常用算术转换。


The integer promotions rule states that an integer with rank less than int may be used in an expression anywhere an int or unsigned int may be used (so can a bitfield). The integer value used in the expression is converted to int if int can hold all of the values of the original type, and to unsigned int otherwise. For example, if short and int have the same representation, an unsigned short promotes to unsigned int when used in an expression.


    整型提升规则规定,一个优先级比 int 低的整数可以用于任何 int 或 unsigned int 能使用的地方(位段也可以)。表达式中该整数的值被转换成 int 或 unsigned int,取决于哪一个能够存储它原来类型的所有值。例如,如果 short 和 int 有相同的表示,一个 unsigned short 在表达式中被提升为 unsigned int。


The usual arithmetic conversions are the rules that determine the result type of most of the two-operand operators in C when they are operating on integer or floating-point operands. The usual arithmetic conversion rules state that first you perform the integer promotions on any integer operands. Then you determine a common type, convert the operands to the common type, and the result of the operator has that common type.


    常用算术转换是这样一组规则:当C中大多数双操作数运算符作用于整数或浮点数操作数时,它们决定运算结果的类型。常用算术转换规则规定,首先对所有整数操作数执行整型提升。然后确定一个公共类型,把操作数转换为该公共类型,运算的结果也具有该公共类型。

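The first few rules for determining the common type for cases involving integers are pretty simple (paraphrased here from C99 subclause 6.3.1.8):

- If both operands have the same type after the integer promotions, no further conversion is needed.
- If both operands have signed integer types, or both have unsigned integer types, the operand whose type has the lesser integer conversion rank is converted to the type of the operand with the greater rank.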


    涉及整型的情况时,用于确定公共类型的前几个规则非常简单:   

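Unfortunately, the Standard permits an implementation to make all of the integer types the same size, and that complicates the rules for when a signed integer type meets an unsigned integer type. Normally, if you multiplied a long by an unsigned int, you would expect the common type (and thus the result type) to be long. However, if int and long have the same representation, an unsigned int behaves like an unsigned long in terms of expression evaluation. Thus, when int and long are "the same," multiplying a long by an unsigned int is like multiplying a long by an unsigned long: the common type is unsigned long. The rules for when one operand has signed integer type and the other operand has unsigned integer type are (paraphrased here from C99 subclause 6.3.1.8):

- If the rank of the unsigned type is greater than or equal to the rank of the signed type, the signed operand is converted to the unsigned type.
- Otherwise, if the signed type can represent all of the values of the unsigned type, the unsigned operand is converted to the signed type.
- Otherwise, both operands are converted to the unsigned integer type that corresponds to the signed type.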


    不幸的是,标准允许实现把所有整数类型都做成同样的大小,这使一个有符号整型遇到一个无符号整型时的规则变得复杂了。通常,如果你把一个 long 乘以一个 unsigned int,你会期望公共类型(也就是结果的类型)是 long。然而,如果 int 和 long 有相同的表示,从表达式求值方面来说,一个 unsigned int 的行为就像 unsigned long 那样。因此,当 int 和 long 是“相同”的时候,把一个 long 乘以一个 unsigned int 就像把一个 long 乘以一个 unsigned long:公共类型是 unsigned long。当一个操作数是有符号整型而另一个是无符号整型时,规则如下:

While the integer conversion rank and expression evaluation rules may seem abstract, they are nothing more than the rules programmers assume about integers every day, particularly when dealing with typedefs. Consider a simple statement like: x = y + z; where x, y, and z all have the same typedef type and the only thing you know about that type is that it is a signed integer. You assume that the result of the addition has the same type as the variables, or that it is int if the type of the variables is smaller than int. You assume that the result does not overflow unless the true result would not fit in x. These are the sort of properties that result from the rules. They guarantee consistent and natural expression evaluation.


    虽然整型转换优先级和表达式求值规则可能看起来很抽象,它们只不过是程序员每天对整型所做的假设,特别是在处理 typedef 的时候。考虑一个简单的语句,如:x = y + z; 其中 x、y 和 z 都有相同的 typedef 类型,而你唯一知道的是这个类型是一个有符号整型。你假设这次加法的结果的类型跟变量相同,或者在变量的类型比 int 小时是 int。你假设除非真实的结果无法放入 x,否则结果不会溢出。这些正是由这些规则带来的性质。它们保证了一致而自然的表达式求值。

Constants

常量

C99 does not introduce any new special syntax for integer constants that have an extended integer type. However, implementations might introduce such syntax as an extension. For example, if an implementation has a 128-bit integer type named long long long, it might allow you to suffix a decimal constant with LLL to indicate its type.


    C99没有为扩展整数类型的整数常量引入任何新的特殊语法。然而,实现可能会引入这样的语法作为一个扩展。例如,如果一个实现有一个128位的整型叫做 long long long, 它可能允许你给十进制常量加上 LLL 的后缀来指明类型。


In C, integer constants have a type based on their value. For example, a decimal integer constant without a suffix has the first type from this list that can represent its value: int, long, or long long. The Standard permits a constant to have an extended integer type if its value cannot be represented by long long (or unsigned long long, if appropriate) and the extended type can represent its value.


    在C中,整型常量的类型取决于它们的值。例如,一个不带后缀的十进制整型常量的类型是这个列表中第一个能表示它的值的类型:int、long 或是 long long。如果一个常量的值不能以 long long(或是 unsigned long long,如果适用的话)表示,而某个扩展类型能表示它的值,标准允许该常量具有这个扩展整型。


The extended integer type used to represent a constant must be a signed type if the constant is normally signed (e.g., a decimal constant without a U suffix). The extended integer type must be unsigned if the U suffix appears in the constant. For octal or hexadecimal constants without a U suffix, the extended type may be signed or unsigned. (Unsuffixed octal and hexadecimal constants normally have the first type from this list that can represent their value: int, unsigned int, long, unsigned long, long long, or unsigned long long.)


    如果常量通常是有符号的(例如,一个不带后缀 U 的十进制常量),用来表示该常量的扩展整型必须是有符号类型。如果后缀 U 出现在常量中,扩展整型必须是无符号的。对于没有后缀 U 的八进制和十六进制常量,扩展类型可以是有符号的或是无符号的。(无后缀的八进制和十六进制常量的类型通常是这个列表中第一个能表示它的值的类型:int、unsigned int、long、unsigned long、long long 或是 unsigned long long。)

Next Month

下一个月

Next month’s column wraps up integer support in C99 by discussing the new headers <stdint.h> and <inttypes.h>. These headers allow you to use both standard integer types and extended integer types in a more portable fashion.

    

    下一个月的专栏将讨论新的头文件 <stdint.h> 和 <inttypes.h>,以此结束对C99中整数支持的介绍。这些头文件允许你以移植性更好的方式使用标准整型和扩展整型。

Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.


    Randy Meyers 是为C、C++和JAVA提供培训和指导的顾问。他目前是ANSI C委员会J11的主席,之前是J16(ANSI C++)和ISO JAVA学习小组(ISO Java Study Group)的成员。他曾经在DEC公司(Digital Equipment Corporation)研究编译器长达16年,并且是DEC C和C++的项目架构师。可以通过以下地址与他联系:rmeyers@ix.netcom.com。


注释

[a] 即反码,见:http://zh.wikipedia.org/zh-cn/%E8%A1%A5%E7%A0%81
[b] 即补码。
[c] 最高有效位是符号位,确定剩下的位应该取负权还是正权(这句话是从cs:app里面抄来的)。

原文地址

http://www.ddj.com/cpp/184401339