Every C programmer knows that pointers and arrays are closely related. In fact, many students learning C wonder how they differ once they are told that you can apply the square bracket indexing operator to both arrays and pointers, and that an array name becomes a pointer to the first element of the array except when the array name is the operand of sizeof or of the address-of operator (unary &).
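The two exceptions mentioned above are easy to check directly. The following is only a minimal sketch; the function name array_vs_pointer_demo is made up for illustration and is not part of the original article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: checks the array/pointer facts described in the text.
   Returns 1 if every check passes. */
int array_vs_pointer_demo(void)
{
    int a[10];
    int *p = a;                 /* here the array name becomes &a[0] */

    /* The index operator works the same way on the array and the pointer. */
    a[2] = 7;
    assert(p[2] == 7);

    /* Exception 1: as the operand of sizeof, a is still the whole array. */
    assert(sizeof a == 10 * sizeof(int));

    /* Exception 2: &a is a pointer to the whole array, type int (*)[10],
       so adding one to it steps over all ten elements. */
    int (*pa)[10] = &a;
    assert((char *)(pa + 1) - (char *)pa == (ptrdiff_t)sizeof a);

    return 1;
}
```

Calling array_vs_pointer_demo() returns 1 when all three observations hold.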
Pointer arithmetic is one of the reasons why arrays and pointers are intertwined [1]; another reason is that many operations on arrays cannot be performed on arrays directly. In particular, you cannot pass an entire array as an argument to a function. Instead, a pointer to the array is passed, and the function operates on the array indirectly through the pointer. So close is the relationship between arrays and pointers that C syntax and semantics somewhat obscure the fact that C lacks array parameters.
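To make the missing array parameters concrete, here is a small sketch. The function names sum10, fill10, and array_param_demo are invented for this example. Both functions are declared with array syntax, yet each receives only a pointer to the first element.

```c
#include <assert.h>

/* Written with array syntax, but the compiler treats a as int *a. */
int sum10(int a[10])
{
    int total = 0;
    for (int i = 0; i < 10; ++i)
        total += *a++;      /* incrementing a is legal only because it is a pointer */
    return total;
}

/* Stores made through the parameter land in the caller's array. */
void fill10(int a[10], int value)
{
    for (int i = 0; i < 10; ++i)
        a[i] = value;
}

/* Hypothetical driver: fills an array with 3s and sums it. */
int array_param_demo(void)
{
    int arr[10];
    fill10(arr, 3);         /* arr becomes &arr[0]; the array is not copied */
    assert(arr[9] == 3);
    return sum10(arr);      /* 10 * 3 */
}
```

array_param_demo() returns 30: the callee's stores are visible in the caller because only a pointer was passed.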
It should not be surprising that the new VLA (variable length array) feature in C99 [2] has the companion feature of pointers to variable length arrays, and that one of the useful places to use a pointer to a VLA is as a parameter to a function.
A pointer to a VLA can be declared using syntax similar to that for a pointer to a (normal) array:
    int (*pa)[10];
    int (*pvla)[f()];
pa is a pointer to an array of 10 ints. pvla is a pointer to a VLA of the number of ints given by the expression f() when the declaration is reached in the normal flow of control in the program. The difference between a pointer to an array and a pointer to a VLA is that the bounds of the (normal) array is a constant expression [3] while the bounds of the VLA is a run-time expression.
Normally such a small difference between a new language feature and an old feature would mean that programmers would have little trouble understanding the new feature. Unfortunately, even though pointers to arrays date back to early C, many programmers are unfamiliar with them. There are two reasons for this unfamiliarity. First, as we will see in the next section, pointers to arrays most naturally occur as function parameters, and C syntax and semantics handle this with so much grace that many programmers fail to notice. Second, while pointers to arrays can be used to process a single dimensional array, it is more natural in C to process such an array using a pointer to the element type.
Consider Listing 1, which initializes the elements of an array and a VLA to 1 indirectly through pointers. The pattern in this code should look familiar even to a programmer with little experience with pointers to arrays. A pointer is declared. The pointer is assigned or initialized with a pointer to the object that it is to operate on, usually by applying the & operator to the object to be accessed indirectly. The * operator is applied to the pointer, and the result of the * operator is treated as if it were the original object referenced by the pointer. The only unusual thing about Listing 1 is that applying * to the pointers yields arrays.
Listing 2 is the more common way to write the function in Listing 1. In Listing 2, pointers to int are used to process the arrays of ints rather than using pointers to (normal or variable length) arrays of ints.
In C, because of pointer arithmetic and the fact that the index operator is defined in terms of pointer arithmetic [1], a pointer to type T can also be used to process an array of type T. As Listing 2 shows, a pointer can process all of the elements of an array merely by indexing the pointer. (Remember, E[i] means *(E + i) regardless of whether E is an array or a pointer expression.)
Contrast the initializations of the pointers in Listings 1 and 2. In Listing 1, the initializations of pa and pvla use the & operator on arrays, yielding respectively a pointer to an array of three ints and a pointer to a VLA of bounds ints. In Listing 2, the initializations of p1 and p2 just use the array names without the & operator. Whenever an array name appears in an expression except as the operand of unary & or the sizeof operator, the value of the array name becomes a pointer to the first element of the array. More formally, if A is an expression with type array, then except when it is the operand of unary & or sizeof, A has the value and type of &((A)[0]). Thus in Listing 2, p1 and p2 are initialized with pointers to ints. Note that a single dimensional VLA yields a pointer type that carries no hint that it came from a VLA.
Given Listings 1 and 2, why are pointers to arrays needed at all? The answer is that pointers to arrays are useful when processing multidimensional arrays. Consider Listing 3.
Listing 3 seems to be a cross between Listing 1 and Listing 2, for good reasons. Listing 1 uses pointers to arrays, as does Listing 3. Listing 2 shows how a pointer to type T can be used to process an array with elements of type T, as does Listing 3. The difference between Listing 2 and Listing 3 is that in Listing 3 type T is an array type rather than a basic type like int. In Listing 3, the pointers are pointers to arrays (normal or variable length). When you dereference a pointer to an array, the result is an array (which might then become a pointer to its first element, as described above).
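The identities used in this discussion can be checked mechanically. This is a small sketch rather than one of the article's listings; the function name index_equivalence_demo and the array contents are chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: verifies E[i] == *(E + i) and A == &A[0]. */
int index_equivalence_demo(void)
{
    int a[3] = {10, 20, 30};
    int *p = a;                       /* same as int *p = &a[0]; */

    assert(p == &a[0]);
    assert(a[1] == *(a + 1));         /* indexing an array */
    assert(p[1] == *(p + 1));         /* indexing a pointer */
    assert(&a[2] == p + 2);

    /* With a two-dimensional array, the element type T is itself an array,
       so the array name becomes a pointer to an array of three ints. */
    int m[2][3] = {{1, 2, 3}, {4, 5, 6}};
    int (*row)[3] = m;                /* m becomes &m[0] */
    assert(row[1][2] == *(*(row + 1) + 2));

    return row[1][2];
}
```

index_equivalence_demo() returns 6, the last element of the second row.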
When you add one to a pointer to an array, you move the pointer to the next entire array that follows the one the pointer originally pointed to. When you index a pointer to an array, each index selects an array object. Thus in Listing 3, pa[i] or pvla[i] yields an array object that may be further indexed. As I wrote above, in C, if you have a pointer to type T, you can use it to process an array of type T, even if type T is an array type.
Note that when pa and pvla are initialized in Listing 3, just the array names are used (no & operator). As explained above, the array names become pointers to their first elements. Thus, pa is initialized with &(a[0]), a pointer to an array of three ints. pvla is initialized with &(vla[0]), a pointer to an array of bounds ints.
As I discussed in [1], pointer arithmetic in C requires knowing the size of the object that the pointer is pointing to. In Listing 3, the size of the objects pointed to by pa is known at compile time: it is sizeof(int [3]). In contrast, the size of the objects pointed to by pvla is not known at compile time: it is sizeof(int [bounds]).
As I discussed in [2], the result of a sizeof operator is computed at run time for a VLA. Not surprisingly, sizeof is also a run-time operation if you ask for the size of a VLA reached indirectly through a pointer. Thus in Listing 3, sizeof(*pvla) is an expression whose value is computed at run time and is equal to sizeof(int [bounds]). If sizeof(int) is 4 and bounds had the value 3, the result of both sizeof expressions would be 12. Note that sizeof(pvla) is not a run-time operation since it is just the size of the pointer pvla itself, which is known at compile time.
The value of sizeof(*pvla) is needed whenever pointer arithmetic or indexing is done on pvla. This means that the C implementation must record the size of the VLA type that the pointer type points to.
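The row-sized steps described above can be observed by measuring the distance between pvla and pvla + 1. The sketch below assumes a C99 compiler with VLA support; the function name row_stride is made up, and the array shape mirrors Listing 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the distance in bytes between consecutive
   rows, as seen by arithmetic on a pointer to VLA. bounds must be positive. */
size_t row_stride(int bounds)
{
    int vla[3][bounds];
    int (*pvla)[bounds] = vla;        /* initialized with &vla[0] */

    /* sizeof *pvla is computed at run time from the saved bounds. */
    assert(sizeof *pvla == (size_t)bounds * sizeof(int));

    /* Adding one moves to the next entire row. */
    assert(pvla + 1 == &vla[1]);

    /* Indexing the pointer selects a row, which can be indexed again. */
    pvla[2][bounds - 1] = 1;
    assert(vla[2][bounds - 1] == 1);

    return (size_t)((char *)(pvla + 1) - (char *)pvla);
}
```

With a 4-byte int, row_stride(3) returns 12, the same value as sizeof(int [3]).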
Like other VLA types, the size information associated with a pointer to VLA is saved when the declaration is executed and does not change during the declaration’s lifetime. The expression that is the bounds of the VLA is not reevaluated until the next time the declaration is executed. Consider Listing 4. (By the way, the “z” in the format is a new C99 modifier that means the argument is size_t or the corresponding signed integer type. Thus, “%zu” prints a size_t number as unsigned.) When run, the program in Listing 4 prints out 10 20 30 even though the value of n changes between the declaration of the pointer to VLA and the sizeof expression that yields the size of the array pointed to. However, since each pass through the loop enters and exits the block that is the loop body, each pass of the loop picks up a new value of n for the bounds of the pointer to VLA.
Listings 1, 2, and 3 show a useful coding technique. Although from the C implementation’s point of view the bounds of a VLA are fixed from the time its declaration is executed until the lifetime of the declaration ends, that does not mean that the programmer can conveniently compute the bounds later in the program. If the bounds expression of a VLA is complex or might change value, you might want to assign the value of the bounds expression to a local variable and use the local variable as the bounds in the declaration. If you fail to do this, all is not lost: see the discussion of sizeof in [2].
Listing 4 also shows another point about the size information that the C implementation saves for VLAs. That size information is associated with the type and not the value of the pointer to VLA or even the VLA object itself. In Listing 4, pvla is uninitialized stack trash (that is OK since the sizeof expression does not actually evaluate its operand: pvla is never actually dereferenced). Clearly, the size of the array that pvla is supposed to point to is not part of the value of pvla.
Likewise, there is no array to which pvla points in Listing 4, so the size is not part of the array object. Instead, every VLA type in a program causes the C implementation to set aside an unnamed variable to hold the size of arrays of that type. (The optimizer might combine several such variables into one if it proves that they hold the same value.) Note that this approach uses less memory than making the size information part of the array object itself. Consider the declaration:
int (*pvla)[f()];
int x[m][n];
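For a two-dimensional VLA such as int x[m][n], the saved size information can be read back with sizeof at each level of the type. The following is a minimal sketch of that idea only; the function name total_elements is made up, and both bounds are assumed to be positive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: recovers both bounds of a two-dimensional VLA
   from sizeof and returns the total number of elements. */
size_t total_elements(int m, int n)
{
    int x[m][n];

    /* Each sizeof below is computed at run time from size information
       saved when the declaration of x was executed. */
    size_t rows = sizeof x / sizeof x[0];        /* m */
    size_t cols = sizeof x[0] / sizeof x[0][0];  /* n */

    assert(rows == (size_t)m);
    assert(cols == (size_t)n);
    return rows * cols;
}
```

total_elements(2, 3) returns 6. Changing m or n after the declaration would not change the result, because the sizes belong to the types fixed when the declaration was executed.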
    void f(int n, int a[n])
after the compiler automatically rewrites the function, it becomes:
    void f(int n, int *a)
However, multiple dimensional VLA parameters become pointers to VLAs after the rewrite. For example,
    void g(int n, int a[n][n+1])
becomes:
    void g(int n, int (*a)[n+1])
Of course, multiple dimensional normal array parameters become pointers to normal arrays. It is in this context that most C programmers have used pointers to arrays without realizing it.
The act of passing an “array” argument to an “array” function parameter is really a form of pointer assignment and works as described in the previous section. Thus you can pass either a normal array or a VLA to a function whose parameter is a normal array. You can also pass either a normal array or a VLA to a function whose parameter is a VLA.
Listing 6 shows a function that sets the diagonal of its square array parameter to one and sets all other elements to zero. This function can be called on any n by n array of ints since the bounds of the array is passed as an argument. It is fairly common for the bounds of VLA parameters to be another parameter to the same function as in Listing 6, but this is not required. The run-time expression that is the bounds of the VLA may be any expression involving any variables or functions that are in scope at the time the parameter is declared. The bounds expression is evaluated each time the function is called since calling the function causes its parameter’s declarations to be executed, and the lifetime of the parameters ends when the function returns.
Function prototypes with VLA parameters can be written just like the function definition. For example, a prototype for the function in Listing 6 could be:
    void diag(int n, int a[n][n]);
There is an advantage to writing the prototype that way since it makes clear the relationship between the parameter n and the bounds of a. However, the bounds expression is not really needed for the prototype, and sometimes the bounds expressions might be complex or reference identifiers only in scope at the point of the function definition. Because of this, the bounds expression of a VLA in a function prototype may be replaced with a “*” character. In this context, the asterisk is just a placeholder for a run-time expression that will appear in the function definition. Thus, the function prototype for the function in Listing 6 can also be written as:
    void diag(int n, int a[*][*]);
As far as the compiler is concerned, the two prototypes for diag given above are identical.
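As a quick check of both points, the sketch below declares diag with each prototype form and then calls it on a normal array and on a VLA. The driver function diag_demo is invented for this example; diag itself follows Listing 6. A C99 compiler with VLA support is assumed.

```c
#include <assert.h>

/* The two prototype forms from the text declare the same function,
   so they can appear side by side. */
void diag(int n, int a[n][n]);
void diag(int n, int a[*][*]);

/* Definition as in Listing 6: ones on the diagonal, zeros elsewhere. */
void diag(int n, int a[n][n])
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i][j] = (i == j) ? 1 : 0;
}

/* Hypothetical driver: n must be positive. Returns the sum of all
   elements of a 3 by 3 normal array and an n by n VLA after diag. */
int diag_demo(int n)
{
    int fixed[3][3];
    int vla[n][n];
    int sum = 0;

    diag(3, fixed);     /* normal array passed to a VLA parameter */
    diag(n, vla);       /* VLA passed to a VLA parameter */

    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            sum += fixed[i][j];
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            sum += vla[i][j];
    return sum;
}
```

diag_demo(4) returns 7: three ones from the 3 by 3 array plus four from the 4 by 4 VLA.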
Like VLAs, pointers to VLAs cannot appear at file scope. They must be either parameters to function prototypes or local variables of a block. (The C Standard considers a function’s parameters to be locals of the block that is the function body.)
Pointers to VLAs may not be static or extern. Such objects have a lifetime that starts before main is called and ends when the program exits. Since the size information for a VLA or pointer to VLA is fixed during its lifetime, such objects would have a size fixed during the running of the program. That sort of takes the variable out of variable length.
[1] Randy Meyers. “The New C: Why Variable Length Arrays?” C/C++ Users Journal, October 2001.
[2] Randy Meyers. “The New C: Variable Length Arrays, Part 2,” C/C++ Users Journal, December 2001.
[3] Randy Meyers. “The New C: Declarations and Initializations,” C/C++ Users Journal, April 2001, <www.cuj.com/reference/articles/2001/0104/0104d/0104d.htm>.
Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.
Listing 1: Initializing an array and a VLA through pointers to arrays
extern int f(void);  /* assumed to be defined elsewhere; supplies the VLA bounds */

void ex1()
{
    int i;
    int a[3];
    int (*pa)[3] = &a;

    for (i = 0; i < 3; ++i)
        (*pa)[i] = 1;

    // Save the result of calling f()
    // so the bounds of vla, pvla, and
    // the loop will be consistent even
    // if f() returns a different value
    // each time it is called
    int bounds = f();
    int vla[bounds];
    int (*pvla)[bounds] = &vla;

    for (i = 0; i < bounds; ++i)
        (*pvla)[i] = 1;
}
— End of Listing —
Listing 2: Initializing an array and a VLA through pointers to int
extern int f(void);  /* assumed to be defined elsewhere; supplies the VLA bounds */

void ex2()
{
    int i;
    int a[3];
    int *p1 = a;

    for (i = 0; i < 3; ++i)
        p1[i] = 1;

    // Save the result of calling f()
    // so the bounds of vla and
    // the loop will be consistent
    int bounds = f();
    int vla[bounds];
    int *p2 = vla;

    for (i = 0; i < bounds; ++i)
        p2[i] = 1;
}
— End of Listing —
Listing 3: Processing two-dimensional arrays through pointers to arrays
extern int f(void);  /* assumed to be defined elsewhere; supplies the VLA bounds */

void ex3()
{
    int i, j;
    int a[3][3];
    int (*pa)[3] = a;

    for (i = 0; i < 3; ++i)
        for (j = 0; j < 3; ++j)
            pa[i][j] = 1;

    // Save the result of calling f()
    // so the bounds of vla, pvla, and
    // the loop will be consistent
    int bounds = f();
    int vla[3][bounds];
    int (*pvla)[bounds] = vla;

    for (i = 0; i < 3; ++i)
        for (j = 0; j < bounds; ++j)
            pvla[i][j] = 1;
}
— End of Listing —
Listing 4: The size recorded for a pointer to VLA is fixed when its declaration is executed
#include <stdio.h>

int main()
{
    int n = 10;

    for (int i = 0; i < 3; ++i) {
        char (*pvla)[n];
        n += 10;
        printf("%zu ", sizeof *pvla);
    }
    return 0;
}
— End of Listing —
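Listing 5: Assigning the addresses of arrays and VLAs to pointers to arrays and pointers to VLAs
void ex5(int n)
{
    int a[10];
    int vla[n];
    int (*pa)[10];
    int (*pvla)[n];

    pa = &a;        // same type on both sides
    pa = &vla;      // behavior is undefined unless n is 10
    pvla = &a;      // behavior is undefined unless n is 10
    pvla = &vla;    // both sides have the same bounds n
}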
— End of Listing —
Listing 6: Setting the diagonal of a square array to one and all other elements to zero
void diag(int n, int a[n][n])
{
    int i, j;

    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            a[i][j] = i == j ? 1 : 0;
}
— End of Listing —