The New C: Inline Functions

By Randy Meyers, July 01, 2002

As if C weren't fast enough already, C99 supports inline functions. Faster code, anyone?

When C was first invented, the register keyword was a good idea. Back in those days, many compilers for most languages ran in 64 KB or less of memory. Even on mainframes, optimizing compilers (and optimizing compilers were unusual) for large languages like PL/I might run in only 256 KB of memory. The algorithms for register allocation were somewhat new and had a tendency to dramatically increase the effort to write compilers as well as the memory and execution time that compilers required. Due to the tight memory constraints, compilers tended to process each source statement in isolation from other statements. Such compilers would do all of the work of compiling a statement, from parsing to code generation, before moving to the next statement.

That sort of compiler organization precludes good register allocation since good register allocation requires analyzing all of the statements in a function before making any decisions. For example, the best register allocation might be to allocate no registers to variables used in the current statement because the statements that follow need the registers more for other purposes. The register keyword in C was a great help to such compilers, since it allowed the programmer to tell the compiler something that the compiler might not be able to figure out on its own.

Modern compilers no longer compile a statement at a time. Taking advantage of the megabytes of memory now available, compilers translate the entire source module into an internal representation, which is then repeatedly analyzed in order to make good decisions about code generation. These days, compilers are as good as or better than programmers at register allocation. And thus, most modern compilers ignore the register keyword. (Actually, the C Standard requires that compilers issue a diagnostic if the address-of operator is applied to a variable declared register. Compilers note that a variable was declared register only to produce that message.)
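
The parenthetical point can be illustrated with a small sketch. The function name sum_to is hypothetical and not from the article; the commented-out line shows the one use of register that a conforming compiler still has to diagnose.

```c
#include <assert.h>

/* Hypothetical helper: sums 1..n. The register keyword is only a hint;
   a modern compiler is free to allocate registers the same way without it. */
int sum_to(int n)
{
    register int total = 0;

    assert(n >= 0);
    for (register int i = 1; i <= n; i++)
        total += i;

    /* int *p = &total; */  /* would be diagnosed: & applied to a register object */
    return total;
}
```

Removing both register keywords should not change what the function computes; only the diagnostic for the commented-out line depends on them.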

The subject of this month’s column is the modern equivalent of the register keyword: the inline keyword allows the programmer to tell the compiler something it might have a hard time figuring out automatically[a]. However, in the future, compilers may be able to do a better job of making inline decisions than programmers. When that happens, the inline keyword might be regarded as a quaint reminder of when programmers were forced to worry about details of code generation. Until that happens, programmers should be aware of how to use the new C99 inline keyword.

Inline Substitution Optimization

The optimization underlying the inline keyword is an inline function call substitution. This optimization is similar in some ways to macro expansion in that the code for a function is inserted inline at the point a function is called. Given a function:

void f(int *x, int y)
{
   *x = 10*y;
}

and a call to that function:

extern int a, b, c;
void caller1()
{
   a = 10*b;
   f(&c, b);
}

The body of that function can be substituted for the call to that function, in effect rewriting the caller as:

void caller1()
{
   // after inline substitution
   a = 10*b;
   *&c = 10*b;
}

However, unlike macro expansion, inline substitution is not textual replacement. The compiler must be very careful to preserve the exact semantics of the function call so that the program cannot tell if the optimization was performed or not. This includes such properties of function calls as the arguments being evaluated exactly once, that variable names in the called function are distinct from those in the caller, and that the parameters of the called function are distinct variables from the arguments passed[b]. Thus, the rewrite of caller1() above is actually an optimized version of the inline substitution that the compiler first performs. The compiler probably first rewrote caller1() as:

void caller1()
{
   a = 10*b;
   {
       int *_F_x = &c;
       int _F_y = b;
       *_F_x = 10*_F_y;
   }
}

Note several things about this rewrite. All parameters of f became local variables of a new block representing the inline expansion. (The _F_ prefix added to variable names local to the inline avoids conflicts between the names from the inline substitution and names used in the argument expressions.) Those parameters/local variables were initialized with the values of the arguments when the block is entered. Thus, the arguments are evaluated exactly once. The local variables representing the parameters perfectly capture the semantics of parameters of functions: they act as distinct local variables that, if assigned to, do not alter the original arguments “passed” to the function.

Further optimizations performed by a compiler may dramatically simplify the code. For example, a common optimization is for a compiler to recognize that a variable only exists to be a copy of another variable. A similar optimization is to sometimes replace a variable with the expression that gave the variable its last value. Since the function f contains no assignments to its parameters, a compiler is likely to optimize caller1() into:

void caller1()
{
   a = 10*b;
   {
       int *_F_x;
       int _F_y;
       *&c = 10*b;
   }
}

Further common optimizations are to eliminate variables that are not used, remove blocks that have no local variables, and to eliminate pairs of indirection operators immediately followed by address-of operators. Thus, after inline substitution and further optimization, caller1() performs as if it was written:

void caller1()
{
   a = 10*b;
   c = 10*b;
}

It is important to realize that the compiler first performed the inline substitution in a way that preserved the semantics of a function call, and only then applied general optimizations. Those optimizations are applied both to code resulting from inlining and to code written directly by the programmer, and they transform the program only in ways that do not alter its results. For example, if f had assigned to its parameter y, then the local variable for the parameter y and its initialization would not have been optimized away. Likewise, if the arguments had been expressions with side effects, the compiler would not have eliminated the local variables for the parameters (except if very special conditions existed). All of the optimizations are careful to preserve the meaning of the function call. Thus, a call to f is the same whether an actual call is made or the body of f is inline substituted.
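
The difference from textual macro expansion can be made concrete with a small sketch. All names here (SQUARE_MACRO, square_fn, next, and the two evals_ functions) are hypothetical and not from the article; the point is only that a function argument is evaluated once, whether or not the call is inlined, while a macro pastes the argument expression in wherever the parameter appears.

```c
/* Hypothetical names throughout; a sketch, not the article's code. */
#define SQUARE_MACRO(x) ((x) * (x))     /* textual: x is pasted in twice */

static inline int square_fn(int x)      /* x is a distinct local, initialized once */
{
    return x * x;
}

static int calls;                       /* counts evaluations of next() */

static int next(void)
{
    return ++calls;
}

/* Returns how many times the argument expression was evaluated
   when it is passed to a function (inlined or not). */
int evals_with_function(void)
{
    calls = 0;
    (void)square_fn(next());
    return calls;
}

/* Same measurement when the argument is passed to the macro. */
int evals_with_macro(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next());
    return calls;
}
```

Calling evals_with_function() reports one evaluation and evals_with_macro() reports two, regardless of optimization level, because inlining is required to behave like the call it replaces.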

Inline Advantages

Inline substitution can pay off in several ways. First, it can eliminate the overhead in doing a function call. When a function is called, the following steps are usually taken:

Argument values are copied to the stack or special registers.
A return address is created and stored on the stack or to a register.
The program branches to the function.
A stack frame is set up for the local variables of the function.
After the function finishes, the stack frame is torn down.
The return address is retrieved.
A branch is made to the return address.

This overhead can be a sizable percentage of the execution of very small functions. As we have already seen, inlining can frequently eliminate the copies of the arguments. There is no return address or branches since the inline substitution is straight-line code. (On many modern machines, branches are expensive because they tend to disrupt the instruction pipeline.) Even allocating the space for the stack frame may be eliminated or folded into setting up the stack frame for the enclosing block by the optimizer.

Inline substitution allows the optimizer to do a better job. For example, the expression 10*b is a common sub-expression that occurs both in the original body of caller1() and the inline substituted code. After inlining, the optimizer can recognize that 10*b has the same value in both places and compute that expression only once and use it twice. Likewise, if the call f(&c, 10) were made, the compiler could perform the arithmetic in the assignment to c at compile time.

Inline substitution can also aid register allocation. First, by analyzing both the caller and the inlined body of a called function, the register allocator can do a better job. Second, many calling standards set aside some number of temporary registers that are not saved and restored when calling a function[c]. These registers are used to hold intermediate results and common sub-expressions, but their values are considered to have been lost if a function call occurs since the called function might have used those temporary registers for its own purposes. If a function call is inlined, then there is no actual function call, and the compiler can determine whether any temporary register actually was reused for another purpose. This allows the compiler to manage the temporary registers more efficiently.

Inlining also enables more opportunities for superscalar optimizations [1]. Modern processors run at several times the speed of memory, and loading a value from memory can be the slowest instruction. To get around this problem, the load instruction does not wait for the fetch from memory to complete before allowing the next instruction to execute. The processor will execute instructions following the load while the memory system produces the requested value in parallel. If any instruction attempts to use the register that the memory was fetched into before the load completes, the processor will stall waiting on the load to finish. On the other hand, if the load finishes before any instruction attempts to use the register, then the machine never slows down. To maximize the speed of the machine, compilers attempt to move load instructions to earlier points in the program. A compiler cannot move the first loads in a function to before the function is called, but it can move the first loads in an inline substitution into code before the inlined call.
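
The two optimizer examples above can be written out as a self-contained sketch. It reuses the article's f and caller1(); caller2() is a hypothetical addition, f is marked static inline here only so the file stands alone, and whether a given compiler actually inlines or folds anything is an assumption that depends on the compiler and its options.

```c
/* The article's f, marked static inline so this single file is self-contained. */
static inline void f(int *x, int y)
{
    *x = 10 * y;
}

int a, b, c;

void caller1(void)
{
    a = 10 * b;     /* 10*b is computed here ...                              */
    f(&c, b);       /* ... and again inside f; once inlined, the optimizer    */
                    /* can treat it as a common sub-expression                */
}

/* Hypothetical second caller for the constant-argument case. */
void caller2(void)
{
    f(&c, 10);      /* once inlined, 10*10 can be folded at compile time      */
}
```

The observable results are the same with or without inlining: after caller1() both a and c hold 10*b, and after caller2() c holds 100. Only the generated code differs.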

Inline Disadvantages

The primary disadvantage to inline substitution is that it usually makes the program code bigger. In extreme cases, this can degrade program performance by increasing page faults and cache misses. Reading a page from the disk may take as long as executing hundreds of thousands of instructions. Poor cache performance may slow down a program by a factor of two. Reasonable care must be taken when inlining not to make the program so big that either paging or caching problems dominate the execution time. There are also functions that it does not pay to inline. Consider:

if ((ptr = malloc(100)) == NULL)
   die();

where the die() function prints an error message, performs a little cleanup, and then exits the program. There might be lots of calls to die(), and die() might even be a very short function, but it would never pay to inline the function since it is never executed[d]. Since the function calls are not executed, you want the calls to be as short as possible in order to minimize page faults and cache misses.
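
One way to arrange this is sketched below. The xmalloc wrapper and the message parameter of die() are assumptions added for illustration (the article's die() takes no arguments): the rarely executed error handler stays an ordinary function, while the small wrapper on the hot path is the inline candidate.

```c
#include <stdio.h>
#include <stdlib.h>

/* Ordinary (non-inline) function: one shared copy, reached by a short call.
   The msg parameter is an assumption; the article's die() takes none. */
void die(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Hypothetical wrapper: small and frequently executed, so a reasonable
   candidate for inlining. The error path remains a plain call to die(). */
static inline void *xmalloc(size_t n)
{
    void *ptr = malloc(n);

    if (ptr == NULL)
        die("out of memory");
    return ptr;
}
```

A caller would write ptr = xmalloc(100); and release the memory with free(ptr) as usual; each call site then carries only the short call to die() rather than a copy of its body.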

inline Keyword

C99 has added a new keyword, inline, which allows the programmer to hint that calls to a function should be inlined. The inline keyword may appear anywhere among the storage class specifiers, type specifiers, or type modifiers at the start of a declaration of a function. Some examples:

inline float cube(float x) {return x*x*x;}
static int inline h();
inline extern void g();

Either static or extern functions may be declared inline. Unlike C++, a function declared inline without a storage-class specifier is an extern function, not a static function (more on this later). Either a function definition or a function prototype may be declared inline. If a function prototype is declared inline, a separate definition of the function must appear somewhere else in the module if the function is called or if the function is extern.

Like register, the inline keyword is only a suggestion that an optimization be performed. Some compilers might ignore it completely and never inline. Others might ignore it and inline based on criteria that usually result in best performance. Still other compilers might only honor the keyword if additional requirements are met by the program.

Inlining a function call is an optimization that a compiler may perform on any call at any time. About the only requirement from the compiler’s point of view is that the compiler needs a copy of the body of the function if it is to inline a call to it. Since the optimization produces the same result as a normal call to the function, compilers do not need any special permission to perform the optimization. In fact, for a number of years now, most compilers have done inline substitution as a normal optimization. Therefore, you might find it surprising that C99 added the inline keyword. There are three reasons for this.

First, while most compilers have the modern organization described at the start of this article and attempt to do some inlining automatically, there are still some compilers that are written to minimize memory use during compilation or do not attempt any automatic inlining. These compilers benefit from having an inlining hint from the programmer. For example, a small memory footprint compiler might compile a source file a function at a time and normally discard its internal representation of a function being compiled after generating code for that function. The inline keyword can inform such a compiler to save its internal representation of the function so that it can inline it later. Such compilers might only honor inline for calls that appear after the definition (body) of an inline function is seen. (Most modern compilers do not have any ordering requirements.)

Second, since inlining has a potential downside, compilers try to be reasonable in making decisions about which functions to inline. The programmer might determine that inlining is useful for a large function that the compiler would not automatically inline. Some compilers might honor an explicit inline request from the programmer for such functions.

Third, compilers need help from programmers to handle extern inline functions because of limitations due to linkers and separate compilation. Unlike normal extern functions where the definition (body) of the function appears in only one module, extern inline functions need their definitions duplicated in every module that contains calls to the function if those calls are to be inlined. Normally this is done by putting the function definition in a header file and including it where needed so that you only have to maintain a single textual copy of the function. The ramifications of this make up the rest of this column.

extern inline

The duplication of the extern inline function causes problems for the compiler. Under some circumstances, the compiler needs to produce a real, callable copy of an inline function. This might happen because some of the calls were not inlined, because the address of the function was taken so that it could be called through a pointer, or because the inline function was recursive. (Even if a compiler inlines a recursive function in itself a few times, at some point the compiler must generate a real call to the function or compile forever.) The problem is how to pick an object file to contain the code for the callable function, which must have a real external name for the linker and a unique address in the program.

C99 and C++ have solved this problem differently. C++ requires that the C++ implementation find some way to pick automatically. In the long term, this solution is probably the right one since it is convenient for the programmer, and eventually the tools used to build programs will handle this gracefully. But currently this approach has some rough edges.

Some C++ compilers solve this problem by always generating a real callable copy of the function in every module that contains a copy of the definition. The linker is modified to silently throw away all but one copy of the function code. The disadvantages of this approach are that it slows down every compilation to produce the callable copies of the functions, it makes object files larger with the redundant copies, and it slows down the linker, which must read and discard the extra copies.

Other C++ compilers generate a callable copy in the first module ever compiled that contains a definition of the function. The compiler must then maintain a database, consulted when compiling every module, that records the compilation decisions made in all of the other modules. The contents of an object file then depend not only on the source code of the module, but also on which other modules have been compiled and the order in which they were compiled. If the same module is part of two different programs, you may need two different object files for the same module in order to record different decisions about who is responsible for the callable version of an extern inline function.

C99 took an alternative approach: it requires the programmer to pick a module to contain the callable copy of an extern inline function. By default in C99, inline functions without any storage class are extern. If all of the declarations of an inline function in a module lack a storage class specifier, then the function is an extern inline function, and the module will not produce a callable copy of the function. On the other hand, if one declaration of the inline function in the module explicitly contains the keyword extern, then that module will produce a callable copy of the function.

This leads to the following source code organization for C99. Put definitions of extern inline functions in header files and do not use the keyword extern. For example, mymath.h might include:

// mymath.h
inline float square(float x) {return x*x;}
inline float cube(float x) {return x*x*x;}

That header can be included in as many modules as you wish. In exactly one module, you should include the header file and then declare prototypes for the functions using the extern keyword in order to get callable copies of the functions (those prototypes need not repeat the inline keyword):

// mymath.c
#include "mymath.h"
extern float square(float x);
extern float cube(float x);
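
The header and source file can be collapsed into one translation unit to try the arrangement out. This is only a sketch under C99 inline semantics: the comments mark which part would normally live in which file, and the apply function is a hypothetical caller added to show a use that needs a real, callable function.

```c
/* --- would normally live in mymath.h --- */
inline float square(float x) { return x * x; }
inline float cube(float x)   { return x * x * x; }

/* --- would normally live in mymath.c, after #include "mymath.h" --- */
extern float square(float x);   /* this module now provides the callable copies */
extern float cube(float x);

/* Hypothetical caller: calling through a pointer needs a real function
   with an address, which the extern declarations above provide. */
float apply(float (*fn)(float), float v)
{
    return fn(v);
}
```

Without the two extern declarations, a program that takes the address of square, or whose calls are not inlined, would be expected to fail at link time, because no module would contain a callable copy.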

C99 places a few restrictions on extern inline functions (static inline functions have no restrictions). Because the body of an extern inline function will appear in many different modules, an extern inline function may not reference static functions or objects from the surrounding scope since such objects would be different in every module:

static int x;
static void f();
inline void g()
{
   x = 0; // invalid
   f(); // invalid
}
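
Since static inline functions carry no such restriction, one way to keep this code is to give g internal linkage as well. The sketch below assumes that every module is meant to have its own private x, f, and g; the body of f and the run_g accessor are hypothetical additions so the effect can be observed.

```c
static int x;                   /* file-local state */

static void f(void)             /* hypothetical body: bumps the counter */
{
    x += 1;
}

/* static inline: each module gets its own private g, so referring to
   that module's own x and f is consistent. */
static inline void g(void)
{
    x = 0;      /* valid for a static inline function */
    f();        /* valid for a static inline function */
}

/* Hypothetical accessor so a caller can observe the result. */
int run_g(void)
{
    g();
    return x;
}
```

The trade-off is that there is no longer a single program-wide g: a module that includes this code works only on its own copy of x.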

C99 also prohibits extern inline functions from declaring static objects unless they are not modifiable:

inline void h()
{
   static int x; // bad
   static const float pi=3.1; // ok
}
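
If an extern inline function really does need modifiable state, one option is to move that state into an object with external linkage, so that every inlined copy and the callable copy share it. The sketch below shows this in a single file; the names count_call and call_count are hypothetical, and the comments mark which part would normally go in a header and which in exactly one source file.

```c
/* --- would normally live in a header --- */
extern int call_count;                  /* one object for the whole program */

inline int count_call(void)
{
    static const float pi = 3.1f;       /* allowed: not modifiable */

    (void)pi;                           /* unused here; kept to mirror the article */
    return ++call_count;                /* allowed: call_count has external linkage */
}

/* --- would normally live in exactly one source file --- */
int call_count = 0;                     /* the single definition of the counter */
extern int count_call(void);            /* this module provides the callable copy */
```

Each call to count_call() returns the next value of the shared counter, whether the call was inlined or went through the callable copy.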

Summary

Inline substitution is a general optimization that can be controlled to some extent using the C99 inline keyword. Inlining a function produces the same results as a normal call to the function, but may run faster and may permit more optimization than a normal call. Static inline functions have no special considerations, but extern inline functions require that the programmer pick one module to contain a real callable version of the function and follow some restrictions about accessing statics.

Reference

[1] Randy Meyers. “The New C: Restricted Pointers,” C/C++ Users Journal, November 2000.

Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.

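Translator's Notes

[a] On inline as the modern equivalent of register: I did not fully understand in what sense register and inline are analogous here. If anyone happens to know, please tell me.
[b] On parameters being distinct variables from the arguments passed: reading this far, I had to dig further into how parameters and arguments relate. Suppose there are two functions:

int g (int i)
{
    return i * 2;
}

void f (void)
{
    int i = 1;
    int j = g (i);
}

Following the author's explanation, I take the argument to be the i in function f, and the parameter to be the variable that is pushed onto the stack with a value equal to i. I am not sure whether this understanding is correct.
[c] On temporary registers that are not saved and restored across calls: in gcc's x86 function calls, eax, ecx, and edx are caller-saved registers, that is, the registers the article refers to; ebx, esi, and edi are callee-saved registers.
[d] On die() never being executed: with virtual memory, malloc() has never failed in any situation I have seen.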

Original article

http://www.drdobbs.com/cpp/184401540