The New C: VLAs, Part 4: VLA typedefs and Flexible Array Members

Randy Meyers


The Rest of the Story on variable-length arrays in C99. Yes, they’re well-behaved and very flexible, but use them with caution.


My last few columns have dealt with VLAs (Variable Length Arrays) in C99 [1, 2, 3]. VLAs are arrays with run-time expressions instead of compile-time constant expressions for the bounds of the array. The bounds expression is evaluated when the declaration of a VLA is reached inside of a block, and the array has the calculated bounds until its lifetime ends (usually by exiting the block).


This column discusses the remaining feature of VLAs, VLA typedefs. I will also discuss flexible array members, a C99 feature similar to VLAs.


VLA typedefs

As I discussed in previous columns, the size of a VLA is needed at run time to perform indexing and address arithmetic, so the compiler must make arrangements to store the size of the array somewhere. However, the size is not stored in the array object itself. It is not stored as part of the pointer if you have a pointer to a VLA. The size of a VLA is an attribute of the VLA type [3].


Consider the following:


void ex1(int n)
{
    char (*pvla)[n];
    n += 10;
    printf("%zu", sizeof *pvla);
}

pvla is a pointer to a VLA of n chars. In order to do pointer arithmetic with pvla or in order to be able to return the size of the objects to which pvla points, the compiler must calculate the size of a VLA of n elements of type char. Since the C99 rules say that a VLA’s size is fixed at the point the declaration of its type is encountered, the compiler must perform the calculation of the array’s size at the point of the declaration to protect against the value of the bounds expression changing later in the program.

The function ex1 prints the size of the array to which pvla points. Since the size of an array of n elements of type char is just n, that is the value that the function prints. However, it prints the original value of n passed to the function, not the value of n after 10 has been added to it.

Note that the function ex1 works even though pvla is uninitialized stack trash. The program is perfectly valid because sizeof does not actually evaluate its argument: the uninitialized pointer pvla is not actually dereferenced. The sizeof operator only inspects its operand in order to determine the resulting type, and in C, size is an attribute of the type of an expression.

The function ex1 makes this clear. pvla does not actually point at an array, so the size information could not be stored as part of the array object. Likewise, pvla is uninitialized stack trash, so the size information could not be part of its value. Instead, compilers generate code to record the size of VLA types in the program, not the VLA objects themselves.

For every VLA type that occurs in a block, the compiler creates an unnamed automatic temporary variable that holds the size of that VLA type during its lifetime. When the type is executed by program flow of control reaching a declaration or cast involving a VLA type, the size of the VLA type is stored in the temporary variable. If the size of a VLA is needed, then the value is fetched from the temporary variable associated with the VLA type. When the block containing the VLA type exits, then the temporary variable is deallocated along with all of the other automatic variables.

Of course, a clever compiler might not create a temporary for every VLA type in a block. If the compiler can prove that several of the temporaries always hold the same value or that the temporaries are not used later in the block, the compiler might optimize them away.

Clearly, C99 compilers are proficient in handling the bookkeeping associated with VLA types. The C99 language builds upon that by allowing VLA typedefs.

void ex2(int n)
{
    typedef int VARRAY[n];
    n += 10;
    VARRAY a1, a2;
}

The typedef declares VARRAY to be the name of the type “variable length array of n elements of type int,” where n has the value it had at the point the typedef declaration was executed. VARRAY is used to declare a1 and a2 to be VLAs of n elements of type int where n has the value it had when the typedef was executed. Thus, if you make the call ex2(5), a1 and a2 are both VLAs of five ints even though the value of n has been changed to 15 by the time a1 and a2 are declared. Of course, a1 and a2 can be used like any other arrays of five ints.

VLA typedefs follow the same rules as other VLA types. They can only appear in a block: they cannot appear at file scope. (VLA parameters are permitted because parameters are considered to be local to the function body.) The size of a VLA typedef is constant during its lifetime. The size is fixed when the typedef is executed. The size is no longer associated with the VLA typedef when the lifetime ends by either exiting the block or branching backwards in the block to a point before the typedef declaration [2]. VLA typedefs, like other VLAs, cannot be struct or union members.

Flexible Array Members

The last rule above about VLAs probably disappoints some of you. There are times when it would be useful for a VLA to be a struct member. While C99 does not permit that, it does permit a similar feature that standardizes an extension that some pre-C99 compilers permit in one form or another. In C99, the last member of a struct may be an array with no bounds expression, called a flexible array member. A struct ending with a flexible array member allows you to have a struct object that ends with an array of any size you choose, if you are willing to do a little extra work. In fact, every different object with that struct type may end in a different-sized array.

The C99 compiler treats the flexible array member mostly like it is a zero-length array (ignoring the fact that zero-length arrays are invalid in C). So, the size of a struct containing a flexible array member is normally identical to the offset in bytes of the flexible array member (an implementation may add trailing padding, in which case the size is larger). If you just declare an object of a struct type with a flexible array member, you get an object that behaves normally except that no space is allocated for the elements of the flexible array member, and thus it is invalid to attempt to use those array elements.

If that were the full story of flexible array members, they would not be very useful. But, that brings us to that matter of extra work: if you allocate a struct with a flexible array member yourself on the heap, you control how much memory the object uses. If you allocate extra memory, it can be used for the elements of the flexible array member. It is valid to access any flexible array elements for which you allocated space. For example, if you allocate enough extra space for a three-element array, you can access elements zero through two of the flexible array member.

Listing 1 shows the use of a flexible array member. Some programming languages store strings not as a zero-terminated sequence of bytes like C, but as a count followed by the number of bytes specified by the value of the count. PL/I uses such a representation for its “varying strings.” Java uses a similar representation (including an extra descriptor member) for string literals in class files. In Listing 1, the struct PLIstring gives the layout of a PL/I string. The member s is the flexible array member whose elements hold the characters in the string.

The function toPLI converts the C string that is its argument into a newly allocated PL/I string on the heap. Note that the call to malloc passes not just the size of struct PLIstring, which is the size of the struct without any array elements, but adds the size of the array that is to appear at the end of this particular PLIstring object, which is the value len. If you run the program in Listing 1 using the command:

listing1 this is a test
you get the output (the first line is system specific):

count=12, s="listing1.exe"
count=4, s="this"
count=2, s="is"
count=1, s="a"
count=4, s="test"

There are various rules that flexible array members must follow:


Unlike VLAs, the C implementation keeps no run-time information about the size of a flexible array member. It is the programmer’s responsibility to allocate the space for the array and remember the number of elements in the array. If you assign a struct with a flexible array member or pass it as an argument to a function (not through a pointer), then the compiler generates code based on its compile-time information about the struct type. Since the compiler believes that the flexible array member has no elements, no elements will be copied during assignment. If you want to assign structs that contain flexible array elements, you must make sure the target has the proper amount of memory allocated and then use memcpy or a loop to copy the flexible array elements.


As mentioned before, some pre-C99 compilers permit flexible array members. Some of those compilers use a slightly different syntax: rather than the flexible array member having no bounds inside the [], the compilers permit the array bounds to be zero. (Officially, arrays of zero elements are not permitted in C.) Programs that use the [0] form of the extension can be converted to C99 merely by removing the 0.


Unfortunately, in some cases, programmers relied on tricks before C99 to get the effect of flexible array members. Perhaps the most common form of that trick is to declare the fake flexible array with bounds 1. When allocating the struct with malloc, extra space for an array of one less than the desired number of elements was allocated since the struct already had one element built in. While this technique is likely to work for most C and C++ implementations, it does break the rules. A small number of C implementations generate code to check if array indexes are in bounds, and they will complain about any index other than 0 being used with the fake flexible array. (Such checking is automatically turned off when using a real C99 flexible array.)


References

[1] Randy Meyers. “The New C: Why Variable Length Arrays?” C/C++ Users Journal, October 2001.

[2] Randy Meyers. “The New C: Variable Length Arrays, Part 2,” C/C++ Users Journal, December 2001.

[3] Randy Meyers. “The New C: Variable Length Arrays, Part 3: Pointers and Parameters,” C/C++ Users Journal, January 2002.


Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.


Listing 1: A flexible array member

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// The representation of a PL/I string
struct PLIstring {
    unsigned short count;
    // s is a flexible array member
    char s[];
};

// Convert the C language string cstr to a PL/I string
// allocated on the heap. The caller must free the result.
struct PLIstring *toPLI(const char *cstr)
{
    struct PLIstring *pli;
    size_t len = strlen(cstr);
    // count is an unsigned short, so longer strings cannot be represented
    assert(len <= USHRT_MAX);
    // We allocate len extra bytes as storage for the s array
    pli = malloc(sizeof (struct PLIstring) + len);
    assert(pli != NULL);
    pli->count = (unsigned short)len;
    // Copy len bytes into the flexible array s. Note the zero byte
    // ending the C string is not copied.
    memcpy(pli->s, cstr, len);
    return pli;
}

int main(int argc, char **argv)
{
    int i;

    // Convert our program arguments to PL/I strings and print them
    for (i = 0; i < argc; ++i) {
        struct PLIstring *pli = toPLI(argv[i]);
        // Print the PL/I string. By specifying a precision for %s, we
        // can force it to stop printing before finding a zero byte.
        // By making the precision be *, we can pass it as an int argument
        // to printf.
        printf("count=%hu, s=\"%.*s\"\n", pli->count, (int)pli->count,
            pli->s);
        // Release the string now that it has been printed
        free(pli);
    }

    return EXIT_SUCCESS;
}
— End of Listing —

Original article

http://www.drdobbs.com/cpp/184401497