The New C: Compound Literals


By Randy Meyers, June 01, 2001


Structs in C are not quite "first class" types, but with the help of compound literals, they are at least a lot easier to use.



The creators and critics of programming languages sometimes classify the data types in a programming language as to whether they are first class types or not. A first class type is one that has the full set of reasonable operations and possible uses defined for it. For example, arrays in C are not first class types because you cannot perform array assignment using the assignment operator or pass an entire array by value as an argument or return an array as a result from a function. In contrast, int in most programming languages is the quintessential first class type: Not only are all reasonable operators defined upon int, but you can also have arrays of int, pass int as an argument, return int from a function, and so on.




Originally, structs in C suffered many of the same deficiencies as arrays, but it was commonplace even before the ANSI C Standard for compilers to support struct assignments, struct arguments, and struct function return values. Structs in modern C are almost first class types, but they still lack support for comparisons for equality or inequality using the == and != operators. The C committee has entertained proposals for supporting == and != for structs, but the debate over how to treat union members of structs caused the proposal to be shelved.




You might wonder, if structs need the equality operators defined in order to be first class types, do they also need the relational operators, e.g. < or >, to be defined in order to be first class types? This brings us to the “reasonable” in the definition of first class types above. Consider:



struct S {int a, b;};
struct S x = {1,2};
struct S y = {2,1};

Given that x.a < y.a but x.b > y.b, is it more reasonable to say that x < y, or that x > y, or that no automatic, general definition of < and > on structs is reasonable? I would argue that programmers lay out structs to minimize padding, or to match an externally declared layout, or simply in the order that members occur to them, and not in an order that yields a natural comparison for < and >. It is therefore unreasonable to provide a standard definition of < and > in the C language.

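
Because the language defines neither == nor < for structs, a program that needs such comparisons has to write them itself. The following is only a sketch: the function names s_equal and s_less_by_a_then_b are invented for illustration, and the ordering shown (a first, then b) is one arbitrary choice among several equally defensible ones.

```c
#include <stdbool.h>

struct S { int a, b; };

/* Member-wise equality: the program, not the language, decides
   which members take part in the comparison. */
static bool s_equal(struct S l, struct S r)
{
    return l.a == r.a && l.b == r.b;
}

/* One possible ordering: compare a first, then b.
   Ordering by b first would be just as valid, which is the
   reason no built-in definition exists. */
static bool s_less_by_a_then_b(struct S l, struct S r)
{
    if (l.a != r.a)
        return l.a < r.a;
    return l.b < r.b;
}
```

With x = {1,2} and y = {2,1} as above, this particular ordering reports x as less than y, while an ordering that looked at b first would report the opposite.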

Not surprisingly, students of programming language design at times disagree whether a particular operation or use of a type is reasonable or necessary in order to be a first class type. Dennis Ritchie pointed out [1] that some might not consider structs in C90 to be first class types because there are no constants of type struct.


C99 [2] added exactly that feature: constants of almost any type including struct, union, and array. This feature, called compound literals, is based on the brace-enclosed initializer syntax. The motivation for adding this feature to C99 was its notational conciseness, convenience, and usefulness, rather than an abstract desire to make struct a first class type.


Constant Versus Literal


Compound literals are not true constants in that the value of the literal might change, as is shown later. This brings us to a bit of terminology. The C99 and C90 Standards [2, 3] use the word “constant” for tokens that represent truly unchangeable values that are impossible to modify in the language. Thus, 10 and 3.14 are an integer decimal constant and a floating constant of type double, respectively. The word “literal” is used for the representation of a value that might not be so constant. For example, early C implementations permitted the values of quoted strings to be modified. C90 and C99 banned the practice by saying that any program that modified a string literal had undefined behavior, which is the Standard’s way of saying it might work, or the program might fail in a mysterious way. This allowed implementations to pool strings and place them in read-only storage. However, the committee knew that some implementations might continue allowing quoted strings to be modified (sometimes a compiler option must be used), and so the Standard calls tokens like "ABC" string literals rather than string constants. Unfortunately, the C++ Standard [4] does not use the word “literal” with the same meaning as the C Standard. In C++, 10 is called an integer literal, for example.


Compound literals might or might not be constant depending upon whether their programmer-specified type is const or not. Unlike string literals, it is portable to modify a non-const compound literal.

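
The contrast with string literals can be sketched as follows. The function name edit_compound_literal is made up for this illustration; the syntax it relies on is described in the next section.

```c
#include <string.h>

/* Returns 1 if the modified compound literal reads back as "XBC". */
int edit_compound_literal(void)
{
    /* A non-const compound literal: an unnamed, writable array of 4 chars. */
    char *s = (char[]){"ABC"};
    s[0] = 'X';   /* portable: the unnamed array is an ordinary object */

    /* The same store through a string literal would be undefined behavior:
       char *t = "ABC"; t[0] = 'X';  */
    return strcmp(s, "XBC") == 0;
}
```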

Compound Literals

Syntactically, a compound literal looks like a cast followed by a brace-enclosed initializer. Given the following two types:


struct POINT {int x, y;};
union U {float f; int i;};

Here are some examples of compound literals:


(int) {1}
(const int) {2}
(float[2]) {2.7, 3.1}
(struct POINT) {0, 0}
(union U) {1.4}

The value of the compound literal is an anonymous object whose type is specified by the “cast.” The anonymous object has been initialized by the brace-enclosed initializer list. As the last three compound literals in the above example show, compound literals give you a constant-like notation for arrays, structs, unions, as well as any other object type (except for C99 variable length arrays).

A compound literal can be used anywhere an object with the same type as the compound literal could be used. For example,


int x;
x = (int) {1} + (int) {3};

is equivalent to


int x;
int unnamed1 = {1};
int unnamed2 = {3};
x = unnamed1 + unnamed2;
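
The same idea applies to the aggregate types. As a small sketch (the function names here are invented for illustration, and the values are arbitrary), each compound literal below is used directly inside an expression, with no named temporary:

```c
struct POINT { int x, y; };
union U { float f; int i; };

/* Two scalar literals used as operands of +. */
int add_ints(void)
{
    return (int){1} + (int){3};
}

/* Subscript the unnamed array directly. */
float second_elem(void)
{
    return (float[2]){2.5f, 3.5f}[1];
}

/* Select a member of the unnamed struct. */
int point_y(void)
{
    return ((struct POINT){4, 9}).y;
}

/* A union literal initializes its first member, here f. */
float union_first(void)
{
    return ((union U){1.5f}).f;
}
```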

Compound literals are particularly useful as function arguments. For example, suppose you were using a graphics library that used struct POINTs to express coordinates. You might draw a pixel in a window like this:
    

extern void drawpixel(struct POINT where);
drawpixel((struct POINT) {5, 5});
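
To try this without a real graphics library, a stand-in for drawpixel is enough. The version below is purely hypothetical: it prints the coordinates and remembers the last point in a variable named last_drawn, which exists only for this sketch.

```c
#include <stdio.h>

struct POINT { int x, y; };

/* Hypothetical stand-in for the graphics library routine:
   it prints the point and records it in last_drawn. */
static struct POINT last_drawn;

void drawpixel(struct POINT where)
{
    last_drawn = where;
    printf("pixel at (%d, %d)\n", where.x, where.y);
}

void draw_example(void)
{
    /* Without compound literals, a named temporary is needed. */
    struct POINT tmp = {5, 5};
    drawpixel(tmp);

    /* With a compound literal, the argument is built in place. */
    drawpixel((struct POINT){6, 7});
}
```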

Compound literals yield lvalues. This means that you can take the address of a compound literal, which is the address of the unnamed object declared by the compound literal. As long as the compound literal does not have a const-qualified type, you can use the pointer to modify it.


struct POINT *p;
p = &(struct POINT) {1, 1};
p->x = 2; p->y = 2;
printf("*p = %d, %d\n", p->x, p->y);

causes *p = 2, 2 to be printed.


Compound literals are in effect declarations and initializations of unnamed objects that can appear in expressions. The unnamed objects and their initializations follow the same rules [5] as normal declarations, and have the same special treatment depending upon whether the compound literal appears within a function body or not.


If a compound literal appears outside of a function body, then the unnamed object has static storage duration, just like all other objects declared outside of a function. It is allocated and initialized once before the program begins to run and remains allocated as long as the program is running. Since the initialization occurs before running the program, all of the initializers in the brace-enclosed list must be constant expressions [5].

If a compound literal appears inside the body of a function, then the unnamed object has automatic storage duration and acts like a local variable of the immediately enclosing block. It is allocated and initialized when its “declaration” is reached in the block and deallocated upon exiting the block [5]. The expressions in the brace-enclosed initializer list can be any run-time expressions.


void f()
{
    int *p;
    extern int g(void);
    {
        p = &(int) {g()};
        *p = 1;     //OK
    }
    // p points to deallocated
    // stack space
    *p = 2;     //BAD
}
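
For comparison, here is a sketch of the two safe patterns: a compound literal at file scope, whose storage lasts for the whole program, and a block-scope literal whose value is copied out before the block ends. The names table, table_sum, and scaled are invented for this example.

```c
/* File scope: the unnamed array has static storage duration, so its
   initializers must be constant expressions, and the pointer stays
   valid for the whole run of the program. */
static int *table = (int[]){10, 20, 30};

int table_sum(void)
{
    return table[0] + table[1] + table[2];
}

/* Block scope: the unnamed object lives only until the enclosing
   block ends, so copy the value out instead of keeping the pointer. */
int scaled(int factor)
{
    int result;
    {
        int *p = &(int){factor * 2};  /* run-time initializer is fine here */
        result = *p;                  /* copy while the object is alive */
    }
    return result;
}
```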

In the same way that the declaration and initialization of an automatic variable acts like an assignment to that variable [5], every time control passes through the body of a compound literal with automatic storage duration, the unnamed variable is reinitialized. Thus, the following function draws a diagonal line from (0,0) to (9, 9).


void line()
{
    int i;
    for (i = 0; i < 10; ++i)
        drawpixel((struct POINT) {i, i});
}
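
The reinitialization can be made visible by modifying the unnamed object inside the loop. In this sketch (sum_with_reinit is an invented name), the object is set back to 10 on every pass, so changes made in one iteration do not carry over to the next:

```c
int sum_with_reinit(void)
{
    int total = 0;
    for (int i = 0; i < 3; ++i) {
        int *q = &(int){10};  /* set back to 10 on every pass */
        *q += i;              /* modify the unnamed object */
        total += *q;          /* adds 10, then 11, then 12 */
    }
    return total;
}
```

The function returns 33. If the object kept its value between iterations, the running values would be 10, 11, and 13 instead.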

The brace-enclosed initializer list for a compound literal has the same semantics as a brace-enclosed initializer list in a declaration. If you only provide initializers in the list for some of the members of a struct or elements of an array, the other members or elements are implicitly initialized with zeros of the appropriate type. Thus, (int [10]) {0} is an array of ten integers all initialized to zero. This means that it might be safer to assign to a struct using a compound literal rather than assigning its members individually. Contrast the following lines in a function:


struct POINT p;
p.x = x; p.y = y;

versus:


struct POINT p;
p = (struct POINT) {x, y};

Suppose in the future you add a z member to POINT to make it a three-dimensional point. When you assign the members individually, the z member never receives a value and contains stack trash. When a compound literal is used to assign p, the z member is assigned the default value of zero (probably a reasonable default for a 3-D graphics package).

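
A sketch of the extended struct makes the point concrete. The type name POINT3 and the function make_point are assumptions made for this illustration only:

```c
struct POINT3 { int x, y, z; };  /* hypothetical 3-D version of POINT */

struct POINT3 make_point(int x, int y)
{
    struct POINT3 p;
    p = (struct POINT3){x, y};   /* z is not listed, so it becomes 0 */
    return p;
}
```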

Like any other brace-enclosed initializer list, the initializer list in a compound literal may use the new C99 feature of designated initializers [5], where the member or array element being initialized may be named. When a function takes a struct as an argument, compound literals and designated initializers can be used to call the function with a poor man’s version of keyword arguments and default argument values, as in:
    

drawpixel((struct POINT) {.y=12});

Here, the designated initializer .y acts like a keyword argument to the function, and the .x “argument” to the function receives a default value of zero.

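
The technique scales to functions with several optional settings. The sketch below is hypothetical throughout: struct LINE_OPTS, set_line_style, and the convention that a width of zero means "use the default" are all invented to illustrate the pattern.

```c
struct LINE_OPTS {      /* hypothetical option block */
    int width;
    int dashed;
    int color;
};

/* Records the options most recently applied, for inspection. */
static struct LINE_OPTS last_opts;

void set_line_style(struct LINE_OPTS opts)
{
    if (opts.width == 0)
        opts.width = 1;  /* an omitted width arrives as 0: use default */
    last_opts = opts;
}

void style_example(void)
{
    /* Members may be named in any order; width is left out. */
    set_line_style((struct LINE_OPTS){.color = 7, .dashed = 1});
}
```

One limitation of this approach is that an omitted member cannot be told apart from one explicitly set to zero, so zero has to be an acceptable "not specified" value.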

Like normal declarations, if the type inside of the “cast” of a compound literal is an array of unknown size, then the number of elements of the array is determined by the brace-enclosed initializer. A compound literal with type array has the same semantics as a variable with type array. Except when used as the operand of sizeof or &, an array used in an expression is converted to a pointer to the first element of the array. In the following, p points to the first element of an array of three ints.


int *p;
p = (int []) {1, 2, 3};
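
Both behaviors, the array seen whole by sizeof and the conversion to a pointer elsewhere, can be sketched like this. The helper sum and the function names are assumptions for the example:

```c
#include <stddef.h>

/* Sums n ints starting at v. */
static int sum(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

size_t literal_length(void)
{
    /* As a sizeof operand the literal is still an array of 3 ints. */
    return sizeof((int[]){1, 2, 3}) / sizeof(int);
}

int sum_example(void)
{
    /* Here the array is converted to a pointer to its first element. */
    return sum((int[]){1, 2, 3, 4}, 4);
}
```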

Normally, every compound literal that you write results in a distinct unnamed object. However, if the type of the compound literal is const-qualified, and the compound literal is initialized with constant expressions, then the compiler is free to pool the compound literals (only store one copy) and to place the unnamed object(s) in write-locked storage. Such compound literals are true constants, not just literals.


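
Since pooling is permitted but not required, a program should not depend on whether two identical const compound literals share storage. In the sketch below (function names invented for illustration), literals_pooled may return either 0 or 1 depending on the compiler, while reading the contents is always well defined:

```c
/* Returns 1 if the two const literals share storage, 0 otherwise.
   Either answer is allowed; the result depends on the compiler. */
int literals_pooled(void)
{
    const int *a = (const int[]){1, 2, 3};
    const int *b = (const int[]){1, 2, 3};
    return a == b;
}

/* Reading through the pointer is always fine; writing would not be. */
int read_const_literal(void)
{
    const int *c = (const int[]){4, 5, 6};
    return c[0] + c[1] + c[2];
}
```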


Thus, those programmers who worry about whether their types are first class types, and who consider “having a constant representation” to be a requirement, have one less thing to worry about.



References

[1] Dennis Ritchie. “The Development of the C Programming Language,” in Bergin and Gibson, editors, History of Programming Languages (Addison Wesley, 1996).

[2] ANSI/ISO/IEC 9899:1999, Programming Languages - C. 1999. Available in Adobe PDF format for $18 from <http://www.techstreet.com/ncitsgate.html>.

[3] ANSI/ISO/IEC 9899:1990, Programming Languages - C. 1990.

[4] ANSI/ISO/IEC 14882:1998, Programming Languages - C++. 1998. Available in Adobe PDF format for $18 from <http://www.techstreet.com/ncitsgate.html>.

[5] Randy Meyers. “The New C: Declarations and Initializations,” C/C++ Users Journal, April 2001.

Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.





Original article

http://www.drdobbs.com/cpp/184401404