When C was first invented, the register keyword was a good idea. Back in those days, many compilers for most languages ran in 64 KB or less of memory. Even on mainframes, optimizing compilers (and optimizing compilers were unusual) for large languages like PL/I might run in only 256 KB of memory. The algorithms for register allocation were somewhat new and had a tendency to dramatically increase the effort to write compilers as well as the memory and execution time that compilers required. Due to the tight memory constraints, compilers tended to process each source statement in isolation from other statements. Such compilers would do all of the work of compiling a statement, from parsing to code generation, before moving to the next statement.
That sort of compiler organization precludes good register allocation since good register allocation requires analyzing all of the statements in a function before making any decisions. For example, the best register allocation might be to allocate no registers to variables used in the current statement because the statements that follow need the registers more for other purposes. The register keyword in C was a great help to such compilers, since it allowed the programmer to tell the compiler something that the compiler might not be able to figure out on its own.
Modern compilers no longer compile a statement at a time. Taking advantage of the megabytes of memory now available, compilers translate the entire source module into an internal representation, which is then repeatedly analyzed in order to make good decisions about code generation. These days, compilers are as good as or better than programmers at register allocation, and thus most modern compilers ignore the register keyword. (Actually, the C Standard requires that compilers issue a diagnostic if the address-of operator is applied to a variable declared register. Compilers note that a variable was declared register only to produce that message.)
The subject of this month’s column is the modern equivalent of the register keyword: the inline keyword allows the programmer to tell the compiler something it might have a hard time figuring out automatically. However, in the future, compilers may be able to do a better job of making inline decisions than programmers. When that happens, the inline keyword might be regarded as a quaint reminder of when programmers were forced to worry about details of code generation. Until that happens, programmers should be aware of how to use the new C99 inline keyword.
The optimization underlying the inline keyword is an inline function call substitution. This optimization is similar in some ways to macro expansion in that the code for a function is inserted inline at the point a function is called. Given a function:

    void f(int *x, int y)
    {
        *x = 10*y;
    }

and a call to that function:

    extern int a, b, c;
    void caller1()
    {
        a = 10*b;
        f(&c, b);
    }

The body of that function can be substituted for the call to that function, in effect rewriting the caller as:

    void caller1()
    {
        // after inline substitution
        a = 10*b;
        *&c = 10*b;
    }

However, unlike macro expansion, inline substitution is not textual replacement. The compiler must be very careful to preserve the exact semantics of the function call so that the program cannot tell whether the optimization was performed. This includes such properties of function calls as the arguments being evaluated exactly once, variable names in the called function being distinct from those in the caller, and the parameters of the called function being distinct variables from the arguments passed. Thus, the rewrite of caller1() above is actually an optimized version of the inline substitution that the compiler first performs. The compiler probably first rewrote caller1() as:

    void caller1()
    {
        a = 10*b;
        {
            int *_F_x = &c;
            int _F_y = b;
            *_F_x = 10*_F_y;
        }
    }

Note several things about this rewrite. All parameters of f became local variables of a new block representing the inline expansion. (The _F_ prefix added to variable names local to the inline avoids conflicts between the names from the inline substitution and names used in the argument expressions.) Those parameters/local variables are initialized with the values of the arguments when the block is entered. Thus, the arguments are evaluated exactly once. The local variables representing the parameters perfectly capture the semantics of function parameters: they act as distinct local variables that, if assigned to, do not alter the original arguments "passed" to the function. Further optimizations performed by a compiler may dramatically simplify the code. For example, a common optimization is for a compiler to recognize that a variable only exists to be a copy of another variable. A similar optimization is to sometimes replace a variable with the expression that gave the variable its last value. Since the function f contains no assignments to its parameters, a compiler is likely to optimize caller1() into:

    void caller1()
    {
        a = 10*b;
        {
            int *_F_x;
            int _F_y;
            *&c = 10*b;
        }
    }

Further common optimizations are to eliminate variables that are not used, to remove blocks that have no local variables, and to eliminate pairs of indirection operators immediately followed by address-of operators. Thus, after inline substitution and further optimization, caller1() performs as if it were written:

    void caller1()
    {
        a = 10*b;
        c = 10*b;
    }

It is important to realize that the compiler was careful to perform the initial inline substitution in a way that preserved the semantics of a function call. It then performed general optimizations, which are applied both to code resulting from inlining and to code written directly by the programmer, to transform the program in ways that do not alter the results. For example, if f had assigned to its parameter y, then the local variable for the parameter y and its initialization would not have been optimized away. Likewise, if the arguments had been expressions with side effects, the compiler would not have eliminated the local variables for the parameters (except if very special conditions existed). All of the optimizations are careful to preserve the meaning of the function call. Thus, a call to f is the same whether an actual call is made or the body of f is inline substituted.
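
The contrast with macro expansion can be checked with a small sketch. The names below (next_value, SQUARE_MACRO, square_fn, bump_copy, demo_call_semantics) are hypothetical and do not come from the column; the sketch only illustrates that a function call, whether or not it is inlined, evaluates each argument exactly once and treats parameters as variables distinct from the arguments.

```c
#include <assert.h>

static int calls;                       /* counts how often next_value() runs */

static int next_value(void)
{
    return ++calls;
}

/* Textual replacement: the argument expression is pasted in twice. */
#define SQUARE_MACRO(v) ((v) * (v))

/* Function call semantics: v is a distinct variable, initialized once. */
static inline int square_fn(int v)
{
    return v * v;
}

/* Assigning to a parameter changes only the callee's copy. */
static inline int bump_copy(int v)
{
    v = v + 1;
    return v;
}

void demo_call_semantics(void)
{
    int n = 5;
    int unused;

    calls = 0;
    unused = SQUARE_MACRO(next_value());   /* next_value() runs twice */
    (void)unused;
    assert(calls == 2);

    calls = 0;
    unused = square_fn(next_value());      /* next_value() runs once */
    (void)unused;
    assert(calls == 1);

    assert(bump_copy(n) == 6);
    assert(n == 5);                        /* the caller's n is unchanged */
}
```

Whether a given compiler actually inlines square_fn or bump_copy is not observable from these checks, which is exactly the property the column describes.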

    if ((ptr = malloc(100)) == NULL)
        die();

where the die() function prints an error message, performs a little cleanup, and then exits the program. There might be lots of calls to die(), and die() might even be a very short function, but it would never pay to inline it since it is never executed. Since the function calls are not executed, you want the calls to be as short as possible in order to minimize page faults and cache misses.
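
A minimal sketch of that pattern follows. The message parameter of die() and the checked_malloc wrapper are assumptions added for illustration (the die() in the text takes no arguments); the point is only that the error handler stays an ordinary, non-inline function so that each call site costs a compare and a call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Cold path: deliberately an ordinary function, not declared inline. */
static void die(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Hypothetical wrapper: the error branch is a short call, not a copy
   of the body of die(). */
void *checked_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr == NULL)
        die("out of memory");
    return ptr;
}
```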

    inline float cube(float x) {return x*x*x;}
    static int inline h();
    inline extern void g();

Either static or extern functions may be declared inline. Unlike C++, a function declared inline without a storage-class specifier is an extern function, not a static function (more on this later). Either a function definition or a function prototype may be declared inline. If a function prototype is declared inline, a separate definition of the function must appear somewhere else in the module if the function is called or if the function is extern.

Like register, the inline keyword is only a suggestion that an optimization be performed. Some compilers might ignore it completely and never inline. Others might ignore it and inline based on criteria that usually result in best performance. Still other compilers might only honor the keyword if additional requirements are met by the program.

Inlining a function call is an optimization that a compiler may perform on any call at any time. About the only requirement from the compiler's point of view is that the compiler needs a copy of the body of the function if it is to inline a call to it. Since the optimization produces the same result as a normal call to the function, compilers do not need any special permission to perform the optimization. In fact, for a number of years now, most compilers have done inline substitution as a normal optimization.

Therefore, you might find it surprising that C99 added the inline keyword. There are three reasons for this. First, while most compilers have the modern organization described at the start of this article and attempt to do some inlining automatically, there are still some compilers that are written to minimize memory use during compilation or do not attempt any automatic inlining. These compilers benefit from having an inlining hint from the programmer. For example, a small memory footprint compiler might compile a source file a function at a time and normally discard its internal representation of a function being compiled after generating code for that function. The inline keyword can inform such a compiler to save its internal representation of the function so that it can inline it later. Such compilers might only honor inline for calls that appear after the definition (body) of an inline function is seen. (Most modern compilers do not have any ordering requirements.)

Second, since inlining has a potential downside, compilers try to be reasonable in making decisions about which functions to inline. The programmer might determine that inlining is useful for a large function that the compiler would not automatically inline. Some compilers might honor an explicit inline request from the programmer for such functions.

Third, compilers need help from programmers to handle extern inline functions because of limitations due to linkers and separate compilation. Unlike normal extern functions, where the definition (body) of the function appears in only one module, extern inline functions need their definitions duplicated in every module that contains calls to the function if those calls are to be inlined. Normally this is done by putting the function definition in a header file and including it where needed so that you only have to maintain a single textual copy of the function. The ramifications of this make up the rest of this column.
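
As a minimal sketch of the declaration rules above, assuming a single source file and the hypothetical names scale and scaled_sum, a static function can be declared inline on its prototype, called, and then defined later in the same module:

```c
/* Prototype declared inline; because the function is called, a
   definition must appear elsewhere in this module. */
static inline int scale(int v);

/* These calls precede the definition. A compiler with ordering
   requirements might make real calls here instead of inlining. */
int scaled_sum(int a, int b)
{
    return scale(a) + scale(b);
}

/* The definition that the prototype promised. */
static inline int scale(int v)
{
    return 10 * v;
}
```

Either way the result is the same, since inline is only a hint: scaled_sum(1, 2) yields 30 whether or not the calls are substituted.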

    // mymath.h
    inline float square(float x) {return x*x;}
    inline float cube(float x) {return x*x*x;}

That header can be included in as many modules as you wish. In exactly one module, you should include the header file and then declare prototypes for the functions using the extern keyword in order to get callable copies of the functions (those prototypes need not repeat the inline keyword):

    // mymath.c
    #include "mymath.h"
    extern float square(float x);
    extern float cube(float x);

C99 places a few restrictions on extern inline functions (static inline functions have no restrictions). Because the body of an extern inline function will appear in many different modules, an extern inline function may not reference static functions or objects from the surrounding scope since such objects would be different in every module:

    static int x;
    static void f();
    inline void g()
    {
        x = 0; // invalid
        f(); // invalid
    }

C99 also prohibits extern inline functions from declaring static objects unless they are not modifiable:

    inline void h()
    {
        static int x; // bad
        static const float pi=3.1; // ok
    }
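
For contrast, here is a sketch of the unrestricted static inline case. The names reset_count, note_reset, reset, and next_id are hypothetical; the sketch relies only on the statement above that static inline functions have no such restrictions, because every module gets its own private copy of both the function and the statics it uses.

```c
static int reset_count;                 /* file-scope static object */

static void note_reset(void)            /* file-scope static function */
{
    reset_count++;
}

/* Allowed: a static inline function may reference the statics above. */
static inline void reset(int *value)
{
    *value = 0;
    note_reset();
}

/* Allowed: a static inline function may declare a modifiable static object. */
static inline int next_id(void)
{
    static int id;
    return ++id;
}
```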
Inline substitution is a general optimization that can be controlled to some extent using the C99 inline keyword. Inlining a function produces the same results as a normal call to the function, but may run faster and may permit more optimization than a normal call. Static inline functions have no special considerations, but extern inline functions require that the programmer pick one module to contain a real callable version of the function and follow some restrictions about accessing statics.
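
The extern inline arrangement can be tried out in a single file. This is only a sketch: it collapses the contents of mymath.h and mymath.c into one translation unit, assumes a compiler that follows the C99 inline rules, and adds two hypothetical helpers (apply and cube_via_pointer) that need a real, callable copy of cube.

```c
/* What mymath.h would contain: inline definitions. */
inline float square(float x) { return x * x; }
inline float cube(float x)   { return x * x * x; }

/* What mymath.c would add after including the header: extern
   prototypes, which make this module supply the callable copies. */
extern float square(float x);
extern float cube(float x);

/* Hypothetical helper that calls through a function pointer. */
float apply(float (*fn)(float), float x)
{
    return fn(x);
}

/* Taking the address of cube requires an actual function to exist. */
float cube_via_pointer(float x)
{
    return apply(cube, x);
}
```

In a real project the first two definitions stay in the header and only the module chosen to hold the callable copies contains the extern prototypes.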
Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.
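Translator's note, given the following code:

    int g (int i)
    {
        return i * 2;
    }
    void f (void)
    {
        int i = 1;
        int j = g (i);
    }

As I understand the author's explanation, the argument is the variable i in function f, while the parameter is a separate variable, pushed onto the stack, whose value equals i. I am not sure whether this understanding is correct.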