In my last column [1], I discussed the deficiencies of arrays in C before C99. The definitions of pointer arithmetic and the index operator in C intertwine the concepts of pointer and array. Before C99, C required that the size of objects be a compile-time constant. Since pointer arithmetic, and thus array indexing, depends on the size of objects, this restriction made arrays (particularly multidimensional arrays) less flexible in C than in other languages.
C99 removed the restriction that the size of arrays needs to be a compile-time constant by allowing run-time expressions to be used as the bounds of an array. Arrays with run-time bounds are called VLAs (variable length arrays). VLAs have three major benefits:
Some C programmers might be surprised that the third benefit above is lacking in pre-C99 C. The reason is that single-dimensional arrays are more common than multidimensional arrays in the traditional application areas of C (systems programming, application programming, embedded programming). C has always been able to handle arrays whose first (leftmost) or only dimension is not known, because the pointer arithmetic underlying the index operator needs only the size of the array’s elements (which might be sub-arrays in a multidimensional array), not the size of the array as a whole. However, if an array has more than one dimension and the size of the second or a later dimension is not known at compile time, then arrays in C before C99 become almost useless. Unfortunately, this situation frequently occurs in numerical programming (“you have N equations with M variables...”). My previous column [1] explains this problem in more detail. While I believe that column will make you appreciate VLAs more, this month’s column is understandable without that background.
As we will see in my next column, the new VLA feature affects not only arrays, but also pointers, function arguments and parameters [a], and even typedefs. However, in this column, we will look at arrays of variable length themselves.
If an array has a run-time expression as its bounds, then it is a VLA. If an array has a constant expression [2] (an expression that can be evaluated at compile time) as its bounds, or has empty brackets (only permitted in a few contexts), then the array is not a VLA and has the same semantics such arrays have always had in C.
The bounds of a VLA must have integer type and must evaluate to a value greater than zero. If the bounds is less than one, then the program has a run-time error that the C implementation has no obligation to catch: the program might terminate immediately, continue to run with mysterious behavior, or even appear to work. The bounds expression of a VLA can use any operator, even assignment and function call. However, if the bounds expression is a comma expression, it must be enclosed in parentheses.
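The rules above for bounds expressions can be sketched as follows. This is only an illustration: `next_size` and `bounds_demo` are hypothetical names that do not appear in the column, and the `assert` is just one way of guarding against a bound below one, which the implementation itself is not obliged to catch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper standing in for any function that yields a size. */
static int next_size(void)
{
    return 5;
}

/* Stores the element counts of four VLAs in counts[0..3]. */
void bounds_demo(int n, size_t counts[4])
{
    assert(n > 0);         /* every bound below must evaluate to > 0 */

    int m;
    int a[n * 2];          /* arithmetic on a run-time value */
    int b[next_size()];    /* function call */
    int c[m = n + 1];      /* assignment: m is also set to n + 1 */
    int d[(m++, m)];       /* comma expression: parentheses required */

    counts[0] = sizeof a / sizeof a[0];
    counts[1] = sizeof b / sizeof b[0];
    counts[2] = sizeof c / sizeof c[0];
    counts[3] = sizeof d / sizeof d[0];
}
```

Calling `bounds_demo(3, counts)` stores 6, 5, 4, and 5. Writing `int d[m++, m];` without the inner parentheses is a syntax error, because the grammar for an array bound accepts an assignment expression but not a bare comma expression.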
VLAs must be auto (as opposed to static or extern) variables in a block. When the block containing the array is entered and the declaration of the array is reached, the implementation evaluates the bounds expressions and allocates the array with those bounds. When the array goes out of scope, the C implementation automatically deallocates it.
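As a minimal sketch of this allocation and deallocation behavior, the function below uses a VLA as scratch space. The name `reverse` and its parameters are assumptions made for illustration, not something taken from the column.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reverses data[0..n-1] in place. */
void reverse(int n, int data[])
{
    assert(n > 0);           /* a VLA bound must be greater than zero */

    int scratch[n];          /* allocated when this declaration is reached */
    /* static int bad[n];       would be rejected: a VLA cannot be static */

    for (int i = 0; i < n; ++i)
        scratch[i] = data[n - 1 - i];

    memcpy(data, scratch, sizeof scratch);   /* sizeof computed at run time */
}                            /* scratch is deallocated automatically here */
```

No call to `malloc` or `free` is needed; the array exists only between its declaration and the end of the enclosing block.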
    #include <stdio.h>
    #define BOUNDS(a) ((sizeof (a))/(sizeof ((a)[0])))
    void example1(int n)
    {
        double array[n][n+1];
        printf("sizeof array = %u\n",
            (unsigned) sizeof array);
        printf("1st dimension = %u\n",
            (unsigned) BOUNDS(array));
        printf("2nd dimension = %u\n",
            (unsigned) BOUNDS(array[0]));
    }
In the above example, array is a VLA since at least one of its dimensions is a run-time expression (in fact, both bounds are). When example1 is called and reaches the declaration of array, the implementation will evaluate the bounds expressions, calculate the sizes of the dimensions of the array, and save those results for later use. It will then allocate space for array. If the argument n to the function had the value 3, then array will act as if it had been declared as:
    double array[3][4];
Typical implementations will allocate the space for the array on the stack. This is a very efficient operation since it only involves adding or subtracting a value from the stack pointer. However, at least one implementation stores VLAs on a heap and generates function calls to allocate and deallocate VLAs.
Unlike non-variable length arrays, sizeof is a run-time operation on VLAs. The C implementation will use the bounds information it saves when it allocates the array to compute sizeof when needed. If you assume that sizeof (double) is 8 and that the parameter n has the value 3, then example1(3) prints:
    sizeof array = 96
    1st dimension = 3
    2nd dimension = 4
The BOUNDS macro is a C idiom used by many programmers to compute the bounds of an array by dividing the size of the array by the size of its element. The BOUNDS macro works for any array, whether variable length or not, and uses no C99-specific features. If the array is not variable length, then BOUNDS is a compile-time constant expression. If the array is variable length, then BOUNDS is a run-time expression. BOUNDS can calculate the bounds of any dimension of a multidimensional array. If x is a multidimensional array, then BOUNDS(x) is the first dimension, BOUNDS(x[0]) is the second dimension, BOUNDS(x[0][0]) is the third dimension, etc. In general, it is more efficient for you to save your own copy of the bounds of a VLA, but if you fail to do so, you can use the BOUNDS macro to calculate the bounds based on the C implementation’s saved information. The size and bounds of a VLA are fixed from the point the declaration of the array is executed until the array goes out of scope.
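To make the BOUNDS idiom concrete for more than two dimensions, here is a small sketch that recovers all three bounds of a three-dimensional VLA. The function name `bounds3` and its parameters are hypothetical; only the BOUNDS macro comes from the column.

```c
#include <assert.h>
#include <stddef.h>

#define BOUNDS(a) ((sizeof (a))/(sizeof ((a)[0])))

/* Stores the three bounds of a VLA declared as x[p][q][r] in dims[0..2]. */
void bounds3(int p, int q, int r, size_t dims[3])
{
    assert(p > 0 && q > 0 && r > 0);   /* each bound must be positive */

    short x[p][q][r];

    dims[0] = BOUNDS(x);        /* first dimension:  p */
    dims[1] = BOUNDS(x[0]);     /* second dimension: q */
    dims[2] = BOUNDS(x[0][0]);  /* third dimension:  r */
}
```

After `bounds3(2, 3, 4, dims)`, the array `dims` holds 2, 3, and 4. Each BOUNDS use here is computed at run time from the sizes the implementation saved when x was declared.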
    void example2(int n)
    {
        double array[n][n+1];
        n += 10;
        printf("sizeof array = %u\n",
            (unsigned) sizeof array);
        printf("1st dimension = %u\n",
            (unsigned) BOUNDS(array));
        printf("2nd dimension = %u\n",
            (unsigned) BOUNDS(array[0]));
    }
In example2, the size of array does not change even though the variable originally used to calculate its dimensions has a new value. The call example2(3) prints the same results as example1(3). The C implementation uses the information it saves when the array is declared whenever it needs to know the size of the array or any of its dimensions. It does not reevaluate the original bounds expressions.
There is a sequence point at the end of a full declarator of an object in a declaration, but there are no sequence points associated with nested array declarators. This means that a compiler is free to evaluate the bounds expressions of a multidimensional VLA in any order it chooses, but all of the bounds expressions of one VLA must complete their evaluations before beginning the evaluations of bounds expressions of the next object. For example,
    int a1[f()][g()][h()], a2[k()];
The compiler might call f, g, and h in any order, but it must have called all three before calling k.
When a VLA goes out of scope or its lifetime ends, the VLA is deallocated. Two more common ways for this to happen are by returning from a function or by exiting the block containing the VLA. However, C99 permits mixing statements and declarations [2], and the lifetime of an object in C99 ends if a goto branches backwards to a label before the object’s declaration [2].
    void example3(void)
    {
        int i = 1;
        char a1[i];
        ++i;
    loop: ;  /* null statement: before C23, a label cannot directly precede a declaration */
        char a2[i];
        ++i;
        printf("sizeof a1 = %u\n",
            (unsigned) sizeof a1);
        printf("sizeof a2 = %u\n",
            (unsigned) sizeof a2);
        printf("i = %d\n", i);
        if (i < 4)
            goto loop;
        printf("last sizeof a2 = %u\n",
            (unsigned) sizeof a2);
    }
In example3, goto loop causes a2’s lifetime to end immediately before branching to loop. In contrast, the lifetimes of a1 and i do not end since their declarations appear before the label loop. After branching to loop, the declaration of a2 is executed for the second time with the latest value for i. This causes the program to print the following:
    sizeof a1 = 1
    sizeof a2 = 2
    i = 3
    sizeof a1 = 1
    sizeof a2 = 3
    i = 4
    last sizeof a2 = 3
Note that the dynamic ending of an object’s lifetime does not mean that it is out of scope and cannot be referenced by later code. In example3, the lifetime of a2 only ends if the goto is executed. If the goto is not executed, the lifetime does not end, and since a2 is still in scope, it may legitimately be used by later code. a2 has the size and contents that it had before the if statement skipped over the goto. (Interestingly, C++ constructors and destructors behave in exactly the same way: a backwards goto causes an automatic object’s destructor to execute, and the object will be constructed again when its declaration is executed.)
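A structured loop behaves much like the backward goto in example3: the loop body is a block, so a VLA declared inside it ends its lifetime at the end of each iteration and is created afresh, with freshly evaluated bounds, on the next one. The sketch below assumes a hypothetical function `last_size`, which is not part of the column.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the size of the VLA created in the
   final iteration of a loop that runs count times. */
size_t last_size(int count)
{
    size_t seen = 0;

    assert(count > 0);        /* guarantees at least one iteration */

    for (int i = 1; i <= count; ++i) {
        char buf[i];          /* new array each iteration, sized by the current i */
        seen = sizeof buf;    /* 1, 2, ..., count */
    }                         /* buf's lifetime ends here on every iteration */

    return seen;
}
```

For instance, `last_size(3)` returns 3, the size of the array created on the third pass through the loop body.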
Next month’s column will cover pointers to VLAs and VLAs as function parameters (a special case of pointers to VLAs).
[1] Randy Meyers. “The New C: Why Variable Length Arrays?,” C/C++ Users Journal, October 2001.
[2] Randy Meyers. “The New C: Declarations and Initializations,” C/C++ Users Journal, April 2001, <www.cuj.com/reference/articles/2001/0104/0104d/0104d.htm>.
[3] Randy Meyers. “The New C: Compound Literals,” C/C++ Users Journal, June 2001.
Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.
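[a] Translator's note: the original reads "arguments and parameters". An argument and a parameter are not quite the same concept: an argument is the value actually passed in when a function is called, while a parameter is the one named in the function declaration. Many translations, including other articles in this series, render both words with the same term, but here the two need to be distinguished. See: http://www.devx.com/tips/Tip/13049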