The New C: Integers in C99, Part 1

By Randy Meyers, December 01, 2000


C has its roots in typeless languages, but it has come a long long way from its humble beginnings.


Back in the 1980s before the ANSI C Standard and function prototypes, there were occasional flame wars in the comp.lang.c newsgroup over the C type system. The wars would start when Pascal proponents would berate C for lacking strict type checking. Out of pure contrariness, the C camp would sometimes disingenuously argue against type checking in general: "C doesn't need strong typing because C programs only use three types: int, char, and pointer to char," wrote one poster.

This exaggeration has some humor to it because it has some truth to it as well. In much of C programming, particularly systems programming, integers of different sizes are the fundamental data type. Sure, you have arrays of them and structures of them; sometimes they represent numbers and sometimes they represent characters, but almost everything is some type of integer. The primary data type for SNOBOL is the string; for FORTRAN, the floating-point number; and for C, it is the int.

That is not to say that C99 does not have significant new floating-point features, because it does. However, to many C programmers, the integer is still king, and this month and the next I will cover the new integer features of C99.


No Longer the Default

C grew out of the typeless languages BCPL and B, and for a brief time was a typeless language itself [1]. It is not that those languages had no types; it is more accurate to say that they had one type, the machine word, which most operators treated as an int. Needless to say, declarations in those languages did not need to include a type specifier since there was only one (unnamed) type available. When Dennis Ritchie first added types to C, there were only two: int and char. I do not know if it was a nod to C's roots as a typeless language, or the popularity of int, or C's notable brevity, but C from its beginnings, until changed by C99, made int the default data type. If you declared an object or function and did not specify a type, or you implicitly declared a function by just calling it, the default type was int.

For example, before C99, if the following was a complete translation unit:

extern x;
f(y)
{
    register z = g(x) + y;
    return z;
}

then the variables x, y, and z all had type int, and the functions f and g had return type int. C99 requires that an implementation issue a diagnostic whenever a type would have defaulted to int under earlier definitions of C. The motivation for the diagnostic is that many uses of "implicit int" are errors that can be hard to spot, as in this complete translation unit:

int main()
{
    double d;
    d = sqrt(2.0);
    return 0;
}

Since sqrt is implicitly declared, the compiler treats it as returning an int. While sqrt will correctly store its double return value somewhere, main will not necessarily know where that return value is stored [a]. (On some machines, integers and floating-point values use different registers.) After main loads the return value (from possibly the wrong location), it will then convert the value from int to double in order to assign it to d. Since the result value was already a double (assuming that main correctly found it), this completely scrambles the result from sqrt.

This sort of error can take many forms. For example, you might include the wrong header and not get a needed function declaration, or you might forget to include a type specifier when declaring an extern variable. The convenience of implicit int is outweighed by the difficulty of spotting the bugs it introduces. Even Ritchie himself reports he is glad to use a C compiler that forces function prototypes to be declared [2].

Note that the C99 Standard does not require that the diagnostic about implicit int be an error that stops the compilation. A wise implementation will make the diagnostic merely a warning, and have options to make the message an error or turn it off completely at the programmer's discretion.

long long and unsigned long long

The type long long int (a 64-bit or greater integer) has been an extension in some C compilers since the mid-1980s. It was added to C99 for several reasons.

Note that long long is not part of the C++98 Standard, but it is increasingly common in C++ compilers. It is likely that a future revision of the C++ Standard will incorporate long long for compatibility with C.

The types long long and unsigned long long are integer data types with at least 64 bits. They may be used wherever any integer type can be used. The header <limits.h> now defines the macros LLONG_MIN, LLONG_MAX, and ULLONG_MAX.

Since long long and unsigned long long are at least 64 bits long, LLONG_MAX expands into a 19-digit number that starts with 9. LLONG_MIN expands into a negative 19-digit number. ULLONG_MAX expands into a 20-digit number. Of course, an implementation may use more than 64 bits for long long, and the above limits would be adjusted up as necessary.

long long constants

The new suffix ll or LL may be added to the end of an integer constant. Some examples: 7ll, 7LL, 07LL, 0x7ll. When added to a decimal integer constant, the constant has type long long. When added to an octal or hexadecimal integer constant, the constant has type long long if the constant can be represented, or unsigned long long if the constant is too big for long long. The ll and LL suffixes can be combined with the u and U suffixes to force the constant to have type unsigned long long. Some examples: 7Ull, 7LLu, 0x7llu, 07ULL. You need not use the new suffixes to have constants of type long long or unsigned long long. If an integer constant is too big to fit in any other type, it will have type long long or unsigned long long depending upon whether the constant was decimal versus octal or hexadecimal, and whether the u or U suffix was used or not. Next month's column will discuss this topic more fully, and explain how in a few very rare cases this may break old programs.


Conversions

The usual arithmetic conversions work with long long and unsigned long long as you would expect. If you add a long long and a smaller integer type, the result is long long. If you add unsigned long long to a smaller integer type, the result is unsigned long long. If you add long long and unsigned long long, the result is unsigned long long. If you add long long or unsigned long long to a floating-point type, the result has the same floating-point type.

Operations

Since long long and unsigned long long are integer types, they may be used wherever any integer type can be used. This means that all of the usual arithmetic operators work on them, and they can be converted to floating-point types, etc.


printf and scanf

In C99, the printf and scanf families of functions support an optional ll length modifier (note that it must be in lower case). The ll length modifier can appear immediately before the d, i, o, u, x, or X format conversion specifier characters. For the printf functions, this means that the item being printed is long long or unsigned long long. For the scanf functions, this means that the corresponding argument is a pointer to long long or unsigned long long. You may also use the ll length modifier before the n conversion specifier character in printf functions in order to store a count of the characters written thus far into a long long pointed to by the corresponding printf argument [b]. For example [c]:

long long x;
scanf("%lld", &x);
printf("%19lld is hex %#llX\n", x, x);

Next Month

I have glossed over a few of the details concerning the rules giving the types of constants and the usual arithmetic conversions because they are best dealt with in next month's column. C99 permits implementations to add extra integer data types to the language, and the various rules concerning integers were generalized to handle the traditional integer types (char, short, int, long) and the new C99 integer types (long long and _Bool) as well as implementation-defined integers. Next month this will be covered, along with the new headers <inttypes.h> and <stdint.h>, which allow access to extended integers and which aid in increasing program portability.

Own a copy of the Standard

You can now download a copy of the C99 Standard in Adobe PDF format for $18. (In contrast, a paper copy of the Standard costs $220!). Visit http://www.techstreet.com/ncitsgate.html, and enter 9899 in the first search box.

You might also want to pick up a copy of the C++ Standard for another $18. Search for standard 14882.


References

[1] Dennis Ritchie. "The Development of the C Programming Language," In Bergin and Gibson, editors, History of Programming Languages (Addison Wesley, 1996). Originally in ACM SIGPLAN Notices, Vol. 28, No. 3 (March 1993).

[2] Dennis Ritchie. "Transcript of Question and Answer Session," page 696. In Bergin and Gibson, editors, History of Programming Languages (Addison Wesley, 1996). (Not part of the earlier ACM SIGPLAN Notices publication.)


Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.

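Translator's Notes

[a] The calling convention fixes where a return value lives, so a caller with the right declaration never has to search for it. With gcc on ia32, for example, an integer return value comes back in eax (a 64-bit long long in the edx:eax pair), while a double comes back on the x87 floating-point stack in st(0). A caller that believes sqrt returns int therefore reads eax, which sqrt never set, and that is exactly the mismatch the article describes.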

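[b] The original sentence is a long chain of relative clauses, and it took me a while to work out roughly what it says.

The C reference manual 《C语言参考手册》 has a passage that helps (paraphrased here):
    When the ll length specifier is used with the d, i, o, u, x, and X conversions, it indicates that the argument has type long long or unsigned long long. When it is used with the n conversion, it specifies that the argument has type long long *. The ll length specifier is new in C99.

Taking gcc on ia-32 as an example, my guess is that it means the following.

Roughly speaking, adding ll says that the argument that was pushed is 8 bytes in size. For example, %d denotes a 4-byte integer and %lld denotes an 8-byte integer.

The explanation starts with argument passing. The prototype of printf is int printf(const char *restrict format, ...). The first parameter is a const char *, a pointer to a string, and it is followed by a variable number of arguments. The arguments are pushed before printf is called, as in the following code: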
int i = 1;
long long l = 1;

printf ("%d %lld \n", i, l);
i++;
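Arguments are pushed from right to left: first l, which occupies 8 bytes; then i, which occupies 4 bytes; and finally format, the address of the string "%d %lld \n", which occupies 4 bytes.
Next, the call instruction pushes the return address (the address of the instruction that follows the printf call, here the i++) and jumps to printf.

With gcc, a function that sets up a frame pointer typically begins with these two instructions:
pushl %ebp            ; save the caller's frame pointer (ebp) on the stack
movl %esp, %ebp    ; make the current stack pointer (esp) this function's frame pointer
So the 4 bytes starting at %ebp hold the caller's saved frame pointer, the 4 bytes starting at %ebp + 4 hold the return address, and the first argument starts at %ebp + 8.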

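Now for the implementation of printf.
Note that what follows is pure guesswork with no evidence behind it. I tried to confirm it against the printf implementation in glibc, but I could not work out from the pile of macros which function printf finally calls, so I could not read its implementation.

How does printf know which variable arguments it was given? The string that format points to tells it. In the example above, format is stored at %ebp + 8, is 4 bytes long (the size of a pointer on a 32-bit machine), and holds the address of "%d %lld \n". If printf has further arguments, the next one is at %ebp + 12. As printf parses the string that format points to, it reaches %d, learns that an integer argument of 4 bytes follows, and reads 4 bytes from %ebp + 12. It then reaches %lld, learns that a long long follows, and reads 8 bytes from %ebp + 16. If there were another argument after that, it would start at %ebp + 24.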

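[c] With mingw32-5.1.4 (gcc version 3.4.5), this code printed wrong values whether or not the --std=c99 option was given. With gcc version 4.4.1 on Linux, it printed correct values whether or not --std=c99 was given. A likely explanation is that MinGW links against the Microsoft C runtime, whose printf at the time did not recognize the ll length modifier, whereas glibc does.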

Original article

http://www.ddj.com/dept/cpp/184401323