The New C: Variable Length Arrays, Part 2


By Randy Meyers, December 01, 2001


Unlike C89 or C++, C99 lets you define the bounds of multidimensional arrays at run time, much to scientific programmers' delight.


In my last column [1], I discussed the deficiencies of arrays in C before C99. The definitions of pointer arithmetic and the index operator in C intertwine the concepts of pointer and array. Before C99, C required that the size of objects be a compile-time constant. Since pointer arithmetic, and thus array indexing, depends on the size of objects, this restriction made arrays (particularly multidimensional arrays) less flexible in C than in other languages.




C99 removed the restriction that the size of arrays needs to be a compile-time constant by allowing run-time expressions to be used as the bounds of an array. Arrays with run-time bounds are called VLAs (variable length arrays). VLAs have three major benefits:

  1. The size of arrays (even multidimensional arrays) can be appropriate for the problem at hand, even if that size cannot be known until run time.
  2. The C implementation automatically provides storage management for VLAs.
  3. Functions or other code that processes multidimensional arrays can be more flexible. The bounds of any dimension of the array can be passed as an argument rather than fixed at compile time.


Some C programmers might be surprised that the third benefit above is lacking in pre-C99 C. The reason is that single-dimensional arrays are more common than multidimensional arrays in the traditional application areas of C (systems programming, application programming, embedded programming). C has always been able to handle arrays whose first (leftmost) or only dimension is not known, because the pointer arithmetic underlying the index operator needs only the size of the array’s elements (which might be sub-arrays in a multidimensional array), not the size of the array as a whole. However, if an array has more than one dimension and the size of the second or a later dimension is not known at compile time, then arrays in C before C99 become almost useless. Unfortunately, this situation frequently occurs in numerical programming (“you have N equations with M variables...”). My previous column [1] explains this problem in more detail. While I believe that column will make you appreciate VLAs more, this month’s column is understandable without that background.
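
To illustrate the pre-C99 situation described above, here is a minimal sketch of a function that accepts a two-dimensional array whose first dimension is left open. The names COLS and sum_rows are hypothetical and chosen only for this illustration; the point is that the second dimension has to be a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

#define COLS 4  /* second dimension: a compile-time constant, as C89 requires */

/* Sums a rows-by-COLS matrix. The first dimension is left open ([]): to find
   a[i][j] the compiler only needs sizeof(double[COLS]), the size of one row. */
double sum_rows(double a[][COLS], int rows)
{
    double total = 0.0;
    int i, j;

    assert(a != NULL && rows >= 0);
    for (i = 0; i < rows; ++i)
        for (j = 0; j < COLS; ++j)
            total += a[i][j];
    return total;
}
```

The same function can process a 2-by-4 array and a 3-by-4 array, but a 3-by-5 array would need a second function or hand-written index arithmetic on a flat array, which is the limitation that VLAs remove.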



VLAs


As we will see in my next column, the new VLA feature affects not only arrays, but also pointers, function arguments and parameters, and even typedefs. However, in this column, we will look at arrays of variable length themselves.




If an array has a run-time expression as its bounds, then it is a VLA. If an array has a constant expression [2] (an expression that can be evaluated at compile time) as its bounds or empty brackets (only permitted in a few contexts), then the array is not a VLA and has the same semantics such arrays have always had in C.




The bounds of a VLA must have integer type and must evaluate to a value greater than zero. If the bounds is less than one, then the program has a run-time error that the C implementation has no obligation to catch: the program might terminate immediately, continue to run with mysterious behavior, or even appear to work. The bounds expression of a VLA can use any operator, even assignment and function call. However, if the bounds expression is a comma expression, it must be enclosed in parentheses.
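
Since the implementation need not diagnose a bound below one, it seems prudent to check a run-time bound before the declaration is reached. The following is a minimal sketch under stated assumptions: sum_squares is a hypothetical function, and MAX_ELEMS is an arbitrary cap chosen for illustration, not something C99 specifies. It also shows a comma expression used as a bound, which has to be parenthesized.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_ELEMS = 1024 };  /* hypothetical cap, not something C99 specifies */

/* Returns 0*0 + 1*1 + ... + (n-1)*(n-1), computed through a VLA,
   or -1 if n is not a usable bound. */
long sum_squares(int n)
{
    int count = 0;

    if (n < 1 || n > MAX_ELEMS)
        return -1;              /* never reach the declaration with a bad bound */

    /* A comma expression as a bound must be parenthesized; its value is n. */
    long squares[(count = n, n)];
    long total = 0;

    for (int i = 0; i < count; ++i)
        squares[i] = (long)i * i;
    for (int i = 0; i < count; ++i)
        total += squares[i];

    assert(sizeof squares == (size_t)n * sizeof(long));
    return total;
}
```

Returning before the declaration is reached means the VLA is never allocated for a rejected bound.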




VLAs must be auto (as opposed to static or extern) variables in a block. When the block containing the array is entered and the declaration of the array is reached, the implementation evaluates the bounds expressions and allocates the array with those bounds. When the array goes out of scope, the C implementation automatically deallocates it.
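
As a sketch of the allocation rule above: a VLA declared inside a loop body is allocated afresh each time its declaration is reached and is released when the block ends, so each iteration may use a different bound. The function name triangle_cells is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the total number of chars allocated over all iterations:
   1 + 2 + ... + rows. */
size_t triangle_cells(int rows)
{
    size_t total = 0;

    assert(rows >= 0);
    for (int r = 1; r <= rows; ++r) {
        char row[r];            /* bound evaluated here, on every iteration */

        total += sizeof row;    /* sizeof reports this iteration's size */
    }                           /* row is deallocated at the end of the block */
    return total;
}
```

With rows equal to 4, the loop declares arrays of 1, 2, 3, and 4 chars in turn, for a total of 10.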




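#include <stdio.h>

/* Number of elements in array a: its total size divided by its element size. */
#define BOUNDS(a) ((sizeof (a))/(sizeof ((a)[0])))

void example1(int n)
{
    double array[n][n+1];   /* both bounds are run-time expressions */

    printf("sizeof array = %u\n",
        (unsigned) sizeof array);
    printf("1st dimension = %u\n",
        (unsigned) BOUNDS(array));
    printf("2nd dimension = %u\n",
        (unsigned) BOUNDS(array[0]));
}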

In the above example, array is a VLA since at least one of its dimensions is a run-time expression (in fact, both bounds are). When example1 is called and reaches the declaration of array, the implementation evaluates the bounds expressions, calculates the sizes of the dimensions of the array, and saves those results for later use. It then allocates space for array. If the argument n to the function has the value 3, then array acts as if it had been declared as:

double array[3][4];

Typical implementations allocate the space for the array on the stack. This is a very efficient operation since it only involves adding a value to or subtracting a value from the stack pointer. However, at least one implementation stores VLAs on a heap and generates function calls to allocate and deallocate them. Unlike sizeof applied to a fixed-length array, sizeof applied to a VLA is a run-time operation. The C implementation uses the bounds information it saved when it allocated the array to compute sizeof when needed. If you assume that sizeof (double) is 8 and that the parameter n has the value 3, then example1(3) prints:

sizeof array = 96
1st dimension = 3
2nd dimension = 4
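
Because sizeof on a VLA is evaluated at run time, the same sizeof expression can yield different values on different calls. A minimal sketch, using the hypothetical function name vla_bytes; the result is best compared against a multiple of sizeof (double) rather than assuming that size is 8:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size in bytes of an n-by-(n+1) array of double. */
size_t vla_bytes(int n)
{
    assert(n > 0);              /* a VLA bound must be greater than zero */

    double array[n][n + 1];

    return sizeof array;        /* computed at run time from the saved bounds */
}
```

Calling vla_bytes(3) yields 12 * sizeof (double), while vla_bytes(5) yields 30 * sizeof (double), even though both results come from the same line of source code.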

The BOUNDS macro is a C idiom used by many programmers to compute the bounds of an array by dividing the size of the array by the size of its element. The BOUNDS macro works for any array, whether variable length or not, and uses no C99-specific features. If the array is not variable length, then BOUNDS is a compile-time constant expression. If the array is variable length, then BOUNDS is a run-time expression. BOUNDS can calculate the bounds of any dimension of a multidimensional array. If x is a multidimensional array, then BOUNDS(x) is the first dimension, BOUNDS(x[0]) is the second dimension, BOUNDS(x[0][0]) is the third dimension, etc. In general, it is more efficient to save your own copy of the bounds of a VLA, but if you have not done so, you can use the BOUNDS macro to calculate the bounds from the C implementation’s saved information.

The size and bounds of a VLA are fixed from the point the declaration of the array is executed until the array goes out of scope.

void example2(int n)
{
    double array[n][n+1];
    n += 10;
    printf("sizeof array = %u\n",
        (unsigned) sizeof array);
    printf("1st dimension = %u\n",
        (unsigned) BOUNDS(array));
    printf("2nd dimension = %u\n",
        (unsigned) BOUNDS(array[0]));
}

In example2, the size of array does not change even though the variable originally used to calculate its dimensions has a new value. The call example2(3) prints the same results as example1(3). The C implementation uses the information it saves when the array is declared whenever it needs to know the size of the array or any of its dimensions. It does not reevaluate the original bounds expressions.

There is a sequence point at the end of a full declarator of an object in a declaration, but there are no sequence points associated with nested array declarators. This means that a compiler is free to evaluate the bounds expressions of a multidimensional VLA in any order it chooses, but all of the bounds expressions of one VLA must complete their evaluations before beginning the evaluations of bounds expressions of the next object. For example:

int a1[f()][g()][h()], a2[k()];
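
The ordering rule for this declaration can be observed with bound functions that record when they are called. In the sketch below, f, g, h, and k are hypothetical stand-ins that append a letter to a log (the helper note and the function evaluation_order are likewise invented for illustration). Only the position of k in the log is guaranteed.

```c
#include <assert.h>
#include <stddef.h>

static char call_log[8];
static size_t calls;

/* Records one call and returns the bound to use. */
static int note(char name, int bound)
{
    call_log[calls++] = name;
    return bound;
}

static int f(void) { return note('f', 2); }
static int g(void) { return note('g', 3); }
static int h(void) { return note('h', 4); }
static int k(void) { return note('k', 5); }

/* Declares the two VLAs and returns the order in which the bounds were
   evaluated, for example "fghk" or "hgfk". */
const char *evaluation_order(void)
{
    calls = 0;

    int a1[f()][g()][h()], a2[k()];

    assert(sizeof a1 == 24 * sizeof(int) && sizeof a2 == 5 * sizeof(int));
    call_log[calls] = '\0';
    return call_log;
}
```

The first three letters of the returned string may differ between compilers, but the fourth is always k.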

The compiler might call f, g, and h in any order, but it must have called all three before calling k.

When a VLA goes out of scope or its lifetime ends, the VLA is deallocated. The two most common ways for this to happen are returning from a function and exiting the block containing the VLA. However, C99 permits mixing statements and declarations [2], and the lifetime of a VLA in C99 also ends if a goto branches backwards to a label before the object’s declaration [2].

void example3(void)
{
    int i = 1;
    char a1[i];
    ++i;
loop: ;     /* C99 requires a statement after a label, hence the empty statement */
    char a2[i];
    ++i;
    printf("sizeof a1 = %u\n",
        (unsigned) sizeof a1);
    printf("sizeof a2 = %u\n",
        (unsigned) sizeof a2);
    printf("i = %d\n", i);
    if (i < 4)
        goto loop;
    printf("last sizeof a2 = %u\n",
        (unsigned) sizeof a2);
}

In example3, goto loop causes a2’s lifetime to end immediately before branching to loop. In contrast, the lifetimes of a1 and i do not end since their declarations appear before the label loop. After branching to loop, the declaration of a2 is executed for the second time with the latest value for i. This causes this program to print the following:

sizeof a1 = 1
sizeof a2 = 2
i = 3
sizeof a1 = 1
sizeof a2 = 3
i = 4
last sizeof a2 = 3

Note that the dynamic ending of an object’s lifetime does not mean that it is out of scope and cannot be referenced by later code. In example3, the lifetime of a2 only ends if the goto is executed. If the goto is not executed, the lifetime does not end, and since a2 is still in scope, it may legitimately be used by later code. a2 has the size and contents that it had before the if statement skipped over the goto. (Interestingly, C++ constructors and destructors behave exactly this same way: a backwards goto causes an automatic object’s destructor to execute, and the object will be constructed again when its declaration is executed.)



Limitations on VLAs




Next Month

Next month’s column will cover pointers to VLAs and VLAs as function parameters (a special case of pointers to VLAs).

References

[1] Randy Meyers. “The New C: Why Variable Length Arrays?,” C/C++ Users Journal, October 2001.

[2] Randy Meyers. “The New C: Declarations and Initializations,” C/C++ Users Journal, April 2001, <www.cuj.com/reference/articles/2001/0104/0104d/0104d.htm>.

[3] Randy Meyers. “The New C: Compound Literals,” C/C++ Users Journal, June 2001.


Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.



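Notes

[a] Translator's note on “function arguments and parameters” in the VLAs section: argument and parameter are not quite the same concept. An argument is the value actually passed in when a function is called, whereas a parameter is the variable named in the function's declaration. Many translations, including others in this series, render both words with the same term, but here they need to be distinguished. See: http://www.devx.com/tips/Tip/13049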


Original article

http://www.drdobbs.com/cpp/184401468