In C99, the bounds of an array can now be a run-time expression. Such arrays are called variable length arrays or VLAs for short. VLAs can simplify storage management in a program and allow the use of the normal array notation even when the problem to be solved requires arrays to have different sizes at different times. To appreciate VLAs, you need to understand the problems in C90 that they remedy. To understand the problems and the way that VLAs work, you need to understand the relationship between pointers and arrays in C. This month’s column concentrates on the above topics. Next month, VLAs will be presented in their entirety.
C99中,数组的边界现在可以是运行时表达式。这样的数据叫做变长数组,缩写为VLA。VLA可以简化程序的内存管理,并且可以使用通常的数组表示法,即使是在解决问题的时候需要数组在不同的时候有不同的大小。想要领会VLA,你需要了解他们所补救的C90中的问题。想要明白这个问题以及VLA工作的方式,你需要理解C语言中指针和数组的关系。本月的专栏集中于以上话题。下一个月,VLA将以完整的形式出现。
If you ask programmers to name the greatest strength of C, many of them will say pointers. There is good reason for this, since pointers in C have the following four powerful features:
While other languages have one or more of these pointer features, C was the first language to have them all. For example, some earlier systems implementation languages had pointer arithmetic, but that pointer arithmetic involved adding or subtracting the correct offset in bytes between objects. If C had adopted this model, then incrementing a pointer to an int would look like p + sizeof(int). , pointer arithmetic in C automatically multiplies the integer added or subtracted from a pointer by the size of the object being pointed to. When two pointers are subtracted, the result is automatically divided by the size of the objects. Without the automatic multiplying and dividing in C pointer arithmetic, programmers would have a little more work to do, have a few more chances for error, and have a needless temptation to write non-portable code. (Lazy programmers might tire of typing sizeof (int) when they could just type 4.)
在其他语言只有这些指针特性中的一个或更多的时候,C是第一个包含以上全部特性的语言。例如,例如,一些早期的系统实现语言拥有指针运算,但是涉及加法和减法的指针运算结果是对象之间以字节计算的偏移量。如果C采用这种模型,那么一个指向 int 的指针的增量将看起来 p + sizeof(int)。作为代替,C中的指针运算自动为这个与指针相加或者相加的整数乘以该指针指向对象的大小。C中的指针运算没有这种自动的乘和除的话,程序员将有更多工作要做,将有更多的几率导致错误,并有不必要的诱惑来写不可移植的代码(懒惰的程序员也许厌倦了键入 sizeof(int),当他可以直接键入 4 的时候)。
One of the consequences of C’s pointer arithmetic was that arrays and pointers became interrelated concepts. In the definition of C, the [] operator is defined in terms of pointer arithmetic: a[i] means the same as *(a + i). In turn, pointer arithmetic is defined in terms of the storage layout of arrays. If a pointer p points to an element of an array, then adding n to the pointer makes it point at the nth element following the original one. The equiVLAence of indexing and pointer arithmetic answers one of the common questions asked by new C programmers, “Why do array indexes start at zero rather than one?” If pointer p points at the first element of an array, then p + 0 still points at that first element. Since an array name is just a pointer to the first element of the array when used with the index operator, and since indexing is by definition pointer arithmetic, a[0] must reference the first element of the array a.
C指针运算的一个重要结果就是,数组和指针变成了相互关联的概念。在C的定义中,[] 运算符是依据指针运定义的:a[i] 的意义与 *(a + i) 相同。相应的,指针运算是依照数组的存储布局来定义的。如果指针 p 指向一个数组中的某个元素,那么对这个指针加 n 使它指向原来元素后面的第 n 个元素。索引和指针运算的等价回答了一个新的C程序员常问的问题,“为什么数组索引以零开始而不是一?”如果指针 p 指向数组的第一个元素,那么 p + 0 仍然指向第一个元素。因为在使用索引运算符的时候,数据的名字就是一个指向数组第一个元素的指针,又因为索引是由指针运算定义的,a[0] 一定引用了数组 a 的第一个元素。
Pointers are also very important to the processing of arrays. Since arrays cannot be passed by VLAue to a function, a function that operates on an array is passed a pointer to the first element of the array. The language facilitates this by automatically rewriting the type of a parameter declared to be an array into a pointer to the array’s element type. An array name used as an argument to a function is automatically converted to a pointer to the first element of the array. A pointer may be indexed by the [] operator since pointer arithmetic is the same for both arrays and pointers. The relationship between arrays and pointers is such that it really doesn’t matter whether code operating on an “array” is really dealing with an array or a pointer to the first element of an array.
指针在处理数组的时候也是非常重要的,因为数组不能按值传递给一个函数,操作一个数组的函数被传入一个指向该数组第一个元素的指针。语言通过自动把申明成数组参数的类型重写为一个指向该数组元素的类型。函数中作为参数使用的数组名称自动转换成指向数组第一个元素的指针。指针也可以通过 [] 运算符索引,因为指针运算对数组和指针都是相同的。数组和指针的关系是这样的:它确实不关心代码操作的一个“数组”是真正按照数组处理的还是按照指向数组第一个元素的指针处理的。
Given that pointers are one of the strongest features of C, and that arrays are closely tied to pointers, it is surprising that C90 arrays are somewhat weak compared to other languages. The source of this weakness is two places in C90 where the size of objects must be known at compile time. Interestingly, these places hardly affect single dimension arrays, but greatly impact multidimensional arrays.
考虑到指针是C中最强大的特性之一,并且数组与指针紧密相关,令人惊讶的是 C90 的数组有点弱于其他语言的。这个弱点来自两个地方,在C90中对象的大小必须在编译时确定。有趣的是,这些地方对一维数组几乎没有影响,但是极大地影响了多维数组。
The first place is that the size of all types must be known at compile time. (There are incomplete types, not discussed further in this article, whose size need not be known at compile time, but their use is so restricted that you can neither fetch or store a VLAue of incomplete type.) This means that you cannot declare an array whose size varies with the needs of the program as it runs. However, if you only need a single dimensional array, you can get the effect of an array with variable bounds by dynamically allocating it. For example, the following dynamically allocates an “array” of n floats and initializes all of the elements with 1.0.
第一个地方是所有类型的大小必须在编译时已知。(存在不完整类型,但是不会在本文中讨论,它们的大小不必在编译时确定,但是他们的用法如此受限以至于你不能存取一个不完整类型的值)。这意味着你不能声明一个在程序运行时根据需要大小不同的数组。然而,如果你只需要一个一维数组,你可以达到这个效果,通过动态分配得到一个可变边界的数组。例如,下面动态分配了一个 n 个浮点数的“数组”,并把所有元素初始化为 1.0 。
float *a;Note that even though a is a pointer, it can be indexed like an array using the [] operator because of the definition of [] in terms of pointer arithmetic. This approach works well for single dimensional arrays because, even though we have to dynamically allocate and free the storage ourselves, we can pretend that our pointer is an array in the rest of the code. Unfortunately, this does not work as well for multidimensional arrays, which brings us to the to the second place where C90 requires the size of objects to be a compile-time constant: pointer arithmetic. The above code works because we do not need to know the size of the entire array to index into it, only the size of an element. However, if you have a two-dimensional array with bounds m and n, int a[m][n], that means that a is an array of [m] elements each of which is an array of [n] ints. EVLAuating the expression a[i] requires knowing the size of the element type, which is the size of an array of [n] ints. Since n is a run-time expression, the size of the array that is an element of the multidimensional array is a run-time expression, and it becomes impossible to calculate the first index operator given the restrictions in C90. The workaround in C90 for this problem is to take over not only the storage management of the multidimensional array, but its index calculations as well. The code looks like:
a = malloc(n * sizeof(float));
for (i = 0; i < n; ++i)
a[i] = 1.0;
float *a;Note that uses of this “multidimensional array” do not look very natural. Instead of having two indexes for the two-dimensional array, we only have one. We manually have to multiply the “first” index by the number of elements in the second dimension, since each index VLAue of the first dimension means skipping over the number of elements in the second dimension. The index expression looks ugly and gets uglier if there are more than two dimensions. In a three-dimensional array with bounds [L][M][N], the first index is multiplied by M*N and the second index is multiplied by N. The C90 requirements for compile-time constant array bounds results in exactly the sorts of manual multiplies, verbosity, and opportunities for error that C so cleverly avoided in pointer arithmetic.
a = malloc(m * n * sizeof(float));
for (i = 0; i < m; ++i)
for (j = 0; j < n; ++j)
a[i*n + j] = 1.0;
void f(int m, int n)The correct amount of storage for a VLA is automatically allocated when the block containing the array is entered and the declaration of the VLA is reached; the storage is automatically deallocated when leaving the block. Thus, VLAs can simplify storage management of programs since some uses that required manual use of malloc and free in C90 can be replaced by VLAs. For example, instead of code like:
{
float a[m][n];
int i, j;
for (i = 0; i < m; ++i)
for (j = 0; j < n; ++j)
a[i][j] = 1.0;
}
void process(int n)you could write:
{
// Set up a buffer of n characters 设置 n 个字符的缓冲区
char *b = malloc(n);
// do the work 工作
...
// free the buffer 释放缓冲区
free(b);
}
void process(int n)
{
// Set up a buffer of n characters 设置 n 个字符的缓冲区
char b[n];
// do the work 工作
...
}
The addition of variable length arrays to C99 finally gives arrays the same flexibility and elegance that pointer types in C have traditionally had. The compiler now automatically handles the storage management and the complexities of index calculations of arrays whose bounds are not compile-time constants. The result is that programmers can use the natural array notation even when the size of arrays is only known at run time. Next month’s column will cover the full semantics of variable length arrays and pointers to variably length arrays, including variable length arrays that are function parameters.
C99加入的变长数组最终给数组带来与C的指针类型传统上相同的灵活和优雅。编译器现在可以自动处理存储管理,以及计算边界为非编译时常量数组索引的复杂过程。结果是,程序要可以在只有运行才时知道数组大小的时候使用自然的数组表示法。下一个月的专栏将涵盖变长数组的完整语义,以及指向变长数组的指针,包括作为函数参数的变长数组。
Randy Meyers is consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.
Randy Meyers 是为C、C++和JAVA提供培训和指导的顾问。他目前是ANSI C委员会J11的主席,之前是J16(ANSI C++)和ISO JAVA学习小组(ISO Java Study Group)的成员。他曾经在DEC公司(Digital Equipment Corporation)研究编译器长达16年,并且是DEC C和C++的项目架构师。可以通过以下地址与他联系:rmeyers@ix.netcom.com。
http://www.drdobbs.com/cpp/184401444