
Certificate Translation for Optimizing Compilers
(extended abstract)
Gilles Barthe, Benjamin Grégoire, César Kunz, and Tamara Rezk
INRIA Sophia-Antipolis, Project EVEREST, {Gilles.Barthe,Benjamin.Gregoire,Cesar.Kunz,Tamara.Rezk}@sophia.inria.fr
Abstract. Certifying compilation provides a means to ensure that untrusted mobile code satisfies its functional specification. A certifying compiler generates code as well as a machine-checkable "certificate", i.e. a formal proof that establishes adherence of the code to specified properties. While certificates for safety properties can be built fully automatically, certificates for more expressive and complex properties often require the use of interactive code verification. We propose a technique to provide code consumers with the benefits of interactive source code verification. Our technique, certificate translation, extends program transformations by offering the means to turn certificates of functional correctness for programs in high-level languages into certificates for executable code. The article outlines the principles of certificate translation, using specifications written in first-order logic. This translation is instantiated for standard compiler optimizations in the context of an intermediate RTL language.
1 Introduction
Program verification environments provide a means to establish that programs meet their specifications, and are increasingly being used to validate safety-critical or security-critical software. Most often, such environments target high-level languages. However, it is usually required to achieve correctness guarantees for compiled programs, especially in the context of mobile code, because code consumers may not have access to the source program or, if they do, may not trust the compiler. Yet there is currently no mechanism for bringing the benefits of interactive source code verification to code consumers. The objective of our work is precisely to propose such a mechanism, called certificate translation, which allows us to transfer evidence from source programs to compiled programs.
The starting point of our work is Proof Carrying Code (PCC) [9], which provides a means to establish trust in a mobile code infrastructure by requiring that mobile code be sent along with a formal proof (a.k.a. certificate) showing its adherence to a property agreed upon by the code consumer. While PCC does not preclude generating certificates from interactive verification of source programs, the prominent approach to certificate generation is certifying compilation [11], which automatically constructs certificates for safety properties such as memory safety or type safety. Certifying compilation is by design restricted to a specific class of properties and programs, a deliberate choice of the authors [11], whose primary goal was to reduce the burden of verification on the code producer side. In contrast, certificate translation is by design very general and can be used to enforce arbitrary properties on arbitrary programs. Of course, generality comes at the cost of automation, so we must assume that programs have been annotated and proved interactively.

This work is partially funded by the IST European Project Mobius.
Thus the primary goal of certificate translation is to transform certificates of original programs into certificates of compiled programs. Given a compiler ⌊·⌋, a function ⌊·⌋spec to transform specifications, and certificate checkers (expressed as a ternary relation "c is a certificate that P adheres to φ", written c : P |= φ), a certificate translator is a function ⌊·⌋cert such that for all programs p, policies φ, and certificates c,

  c : p |= φ  ⟹  ⌊c⌋cert : ⌊p⌋ |= ⌊φ⌋spec
The paper outlines the principles of certificate translation, and illustrates its mechanisms in the context of an optimizing compiler for a Register Transfer Language (RTL). The compiler proceeds in a step-by-step fashion. For each optimization step, we build an appropriate certificate translator, and combine them to obtain a certificate translator for the complete compilation process.
A certificate translator for a non-optimizing compiler is relatively simple to construct, since proof obligations are preserved (up to minor differences). Dealing with optimizations is more challenging. The major difficulty arises from the fact that certificate translators for optimizations often take as argument, in addition to the certificate of the original program, a certificate of the results of the analysis that justifies the optimization. In order to enable such an aggregation, one must therefore express the results of the analysis in the logic of the PCC architecture, and enhance the analyzer so that it produces a certificate of the analysis for each program. The overall architecture of a certificate translator is given in Figure 1.
Contents. Section 2 introduces our programming language RTL and our PCC infrastructure. Section 3 provides a high-level overview of the principles and components that underlie certificate translation, whereas Section 4 describes certificate translators for several standard optimizations (at RTL level). In Section 5 we compare our work with recent related developments. We conclude in Section 6 with future work.
2 Setting
RTL Language. Our language RTL (Register Transfer Language) is a low-level, side-effect free language with conditional jumps and function calls, extended with annotations drawn from a suitable assertion language.

[Figure 1 shows the overall flow: the program f, its specification, and its certificate of correctness are given to the certifying analyzer and the optimizing compiler, which produce the optimized program f̂, the results of the analysis together with a proof of the analysis for f, and, through the certificate translator, a certificate of correctness for f̂. The TCB consists of the VCGen and the proof checker.]

Fig. 1. Overall picture of certificate translation

The choice of the assertion language does not affect our results, provided assertions are closed under the connectives and operations that are used by the verification condition generator.
The syntax of expressions, formulas and RTL programs (suitably extended to accommodate certificates, see below) is shown in Figure 2, where n ∈ N and r ∈ R, with R an infinite set of register names. We let φ and ψ range over assertions.

  comparison     ⋈   ::= < | ≤ | = | ≥ | >
  expressions    e   ::= n | r | −e | e + e | e × e | ...
  assertions     φ   ::= true | e ⋈ e | ¬φ | ∀r. φ | ...
  comparisons    cmp ::= r ⋈ r | r ⋈ n
  operators      op  ::= n | r | cmp | n + r | ...
  instr. desc.   ins ::= rd := op, L | rd := f(r⃗), L | cmp ? Lt : Lf | return r | nop, L
  instructions   I   ::= (φ, ins) | ins
  fun. decl.     F   ::= {r⃗; Φ; G; ψ; λ; Λ}
  program        p   ::= f ↦ F

Fig. 2. Syntax of RTL

A program p is defined as a function from RTL function identifiers to function declarations. We assume that every program comes equipped with a special function, namely main, and its declaration. A declaration F for a function f includes its formal parameters r⃗, a precondition Φ, a (closed) graph code G, a postcondition ψ, a certificate λ, and a function Λ from reachable labels to certificates (the notion of reachable label is defined below). For clarity, we often use in the sequel a subscript f to refer to elements in the declaration of a function f, e.g. the graph code of a function f is written Gf.
As will be defined below, the VCGen generates one proof obligation for each program point containing an annotation, and one proof obligation for the entry point Lsp. The component λ certifies the latter proof obligation, and Λ maps every program point that contains an assertion to the proof of its related proof obligation.
Formal parameters are a list of registers from the set R, which we suppose to be local to f. For specification purposes, we introduce for each register r in r⃗ a (pseudo-)register r*, not appearing in the code of the function, which represents the initial value of a register declared as formal parameter. We let r⃗* denote the set {r* ∈ R | r ∈ r⃗}. We also introduce a (pseudo-)register res, not appearing in the code of the function, which represents the result or return value of the function. The annotations Φ and ψ provide the specification of the function, and are subject to well-formedness constraints. The precondition of a function f, denoted by pre(f), is an assertion in which the only registers to occur are the formal parameters r⃗; in other words, the precondition of a function can only talk about the initial values of its parameters. The postcondition of a function f, denoted by post(f), is an assertion¹ in which the only registers to occur are res and registers from r⃗*; in other words, the postcondition of a function can only talk about its result and the initial values of its parameters.

¹ Notice that a postcondition is not exactly an assertion, in the sense that it uses register names from r⃗*, which must not appear in preconditions or annotations of the program.
A graph code of a function is a partial function from labels to instructions. We assume that every graph code includes a special label, namely Lsp, corresponding to the starting label of the function, i.e. the first instruction to be executed when the function is called. Given a function f and a label L in the domain of its graph code, we will often use f[L] instead of Gf(L), i.e. the application of the code graph of f to L.
Instructions are either instruction descriptors or pairs consisting of an annotation and an instruction descriptor. An instruction descriptor can be an assignment, a function call, a conditional jump or a return instruction. Operations on registers are those of standard processors, such as movement of registers or values into registers rd := r, and arithmetic operations between registers or between a register and a value. Furthermore, every instruction descriptor carries explicitly its successor label(s); thanks to this mechanism, we do not need to include unconditional jumps, i.e. "goto" instructions, in the language. The immediate successors of a label L in the graph of a function f are denoted by the set succf(L). We assume that the graph is closed; in particular, if L is associated with a return instruction, succf(L) = ∅.
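To make the shape of RTL concrete, the following OCaml sketch transcribes the syntax of Figure 2 into datatypes; all names, the string representation of registers and labels, and the use of a map for graph code are our own illustrative choices rather than notation from the paper.

```ocaml
(* A rough OCaml transcription of the RTL syntax of Figure 2.
   Type and constructor names are illustrative, not taken from the paper. *)

type reg = string                      (* register names r ∈ R *)
type label = string                    (* code labels L *)

type cmp_op = Lt | Le | Eq | Ge | Gt   (* comparison operators *)

type expr =                            (* expressions e of the assertion language *)
  | Const of int
  | Reg of reg
  | Neg of expr
  | Add of expr * expr
  | Mul of expr * expr

type assertion =                       (* assertions φ, ψ (a fragment) *)
  | True
  | Cmp of cmp_op * expr * expr
  | Not of assertion
  | And of assertion * assertion
  | Imp of assertion * assertion
  | Forall of reg * assertion

type comparison =                      (* comparisons cmp *)
  | CmpRR of cmp_op * reg * reg
  | CmpRN of cmp_op * reg * int

type op =                              (* operators op (a fragment) *)
  | OpConst of int
  | OpReg of reg
  | OpCmp of comparison
  | OpAddC of int * reg                (* n + r *)
  | OpAdd of reg * reg

type instr_desc =                      (* instruction descriptors ins *)
  | Assign of reg * op * label         (* rd := op, L *)
  | Call of reg * string * reg list * label  (* rd := g(r⃗), L *)
  | CondJump of comparison * label * label   (* cmp ? Lt : Lf *)
  | Return of reg
  | Nop of label

type instr = instr_desc * assertion option  (* (φ, ins) or a bare ins *)

module LabelMap = Map.Make (String)

type certificate = unit                (* certificates λ, Λ kept abstract here *)

type fun_decl = {                      (* F = {r⃗; Φ; G; ψ; λ; Λ} *)
  params : reg list;
  pre : assertion;
  graph : instr LabelMap.t;
  post : assertion;
  cert : certificate;
  cert_annot : certificate LabelMap.t;
}

type program = (string * fun_decl) list
```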
Verification Condition Generator. Verification condition generators (VCGens) are partial functions that compute, from a partially but sufficiently annotated program, a fully annotated program in which all labels of the program have an explicit precondition attached to them. Programs in the domain of the VCGen function are called well annotated and can be characterized by an inductive definition. Our definition is decidable and does not impose any specific structure on programs.
Definition 1 (Well Annotated Program).
– A label L in a function f reaches annotated labels if its associated instruction contains an assertion, or if its associated instruction is a return (in that case the annotation is the postcondition), or if all its immediate successors reach annotated labels:

    f[L] = (φ, ins) ⇒ L ∈ reachAnnotf
    f[L] = return r ⇒ L ∈ reachAnnotf
    (∀L′ ∈ succf(L), L′ ∈ reachAnnotf) ⇒ L ∈ reachAnnotf
– A function f is well annotated if every reachable point from starting point Lsp reaches annotated labels. A program p is well annotated if all its functions are well annotated.
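Definition 1 is indeed decidable; the sketch below is one possible way to check it, assuming the graph is presented by a successor function and a predicate marking labels that carry an assertion or a return instruction (function names are ours).

```ocaml
(* Decidable check of Definition 1.  [visiting] detects cycles that contain
   no annotation, which must not count as reaching annotated labels. *)

module LabelSet = Set.Make (String)

let reaches_annot (succ : string -> string list)
                  (annotated : string -> bool) (start : string) : bool =
  let rec go visiting l =
    annotated l                                   (* assertion or return at l *)
    || (not (LabelSet.mem l visiting)             (* unannotated cycle: fail *)
        && List.for_all (go (LabelSet.add l visiting)) (succ l))
  in
  go LabelSet.empty start

(* A function is well annotated when every label reachable from its starting
   label Lsp reaches annotated labels. *)
let well_annotated succ annotated start_label =
  let rec reachable seen l =
    if LabelSet.mem l seen then seen
    else List.fold_left reachable (LabelSet.add l seen) (succ l)
  in
  LabelSet.for_all (reaches_annot succ annotated)
    (reachable LabelSet.empty start_label)
```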
Given a well-annotated program, one can generate an assertion for each label, using the assertions that were given or previously computed for its successors. This assertion represents the precondition that an initial state before the execution of the corresponding label should satisfy for the function to terminate in a state satisfying its postcondition.
The computation of the assertions for the labels of a function f is performed by a function vcgf, which proceeds in a modular way, using annotations from the function f under consideration, as well as the preconditions and postconditions of the functions called by f. The definition of vcgf(L) proceeds by cases: if L points to an instruction that carries an assertion φ, then vcgf(L) is set to φ; otherwise, vcgf(L) is computed by the function vcgidf. The formal definitions of vcgf and vcgidf are given in Figure 3, where e{r ← e′} stands for the substitution of all occurrences of register r in expression e by e′. The definition of vcgidf is standard for assignments and conditional jumps, where op and cmp are the obvious interpretations of RTL operators and comparisons as expressions in the language of assertions. For a function invocation, vcgidf(rd := g(r⃗), L) is defined as the conjunction of the precondition in the declaration of g, where formal parameters are replaced by actual parameters, and of the assertion ∀res. post(g){r⃗g* ← r⃗} ⇒ vcgf(L){rd ← res}. The second conjunct ensures that the information in vcgf(L) about registers other than rd is propagated to the other preconditions. In the remainder of the paper, we abuse notation and write vcgidf(ins) or vcgidf(L) instead of vcgidf(ins, L′) when f[L] = ins, L′ and neither L′ nor ins is relevant in the context.

  vcgf(L) = φ              if Gf(L) = (φ, ins)
  vcgf(L) = vcgidf(ins)    if Gf(L) = ins

  vcgidf(rd := op, L)    = vcgf(L){rd ← op}
  vcgidf(rd := g(r⃗), L)  = pre(g){r⃗g ← r⃗} ∧ (∀res. post(g){r⃗g* ← r⃗} ⇒ vcgf(L){rd ← res})
  vcgidf(cmp ? Lt : Lf)  = (cmp ⇒ vcgf(Lt)) ∧ (¬cmp ⇒ vcgf(Lf))
  vcgidf(return r)       = post(f){res ← r}
  vcgidf(nop, L)         = vcgf(L)

Fig. 3. Verification condition generator
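A possible OCaml rendering of Figure 3 is sketched below; it assumes the datatypes of the RTL sketch above are in scope, encodes a starred register r* as the string r ^ "*", and leaves out boolean-valued operators. It terminates only on well-annotated functions.

```ocaml
(* Sketch of vcgf / vcgidf (Figure 3), over the RTL datatypes above. *)

let rec subst_expr r e' = function
  | Const n -> Const n
  | Reg r0 -> if r0 = r then e' else Reg r0
  | Neg e -> Neg (subst_expr r e' e)
  | Add (a, b) -> Add (subst_expr r e' a, subst_expr r e' b)
  | Mul (a, b) -> Mul (subst_expr r e' a, subst_expr r e' b)

let rec subst r e' = function            (* a{r ← e'}, naive (no capture check) *)
  | True -> True
  | Cmp (c, a, b) -> Cmp (c, subst_expr r e' a, subst_expr r e' b)
  | Not p -> Not (subst r e' p)
  | And (p, q) -> And (subst r e' p, subst r e' q)
  | Imp (p, q) -> Imp (subst r e' p, subst r e' q)
  | Forall (x, p) -> if x = r then Forall (x, p) else Forall (x, subst r e' p)

let star r = r ^ "*"                     (* pseudo-register for r's initial value *)

let expr_of_op = function                (* interpretation of op as an expression *)
  | OpConst n -> Const n
  | OpReg r -> Reg r
  | OpAddC (n, r) -> Add (Const n, Reg r)
  | OpAdd (r1, r2) -> Add (Reg r1, Reg r2)
  | OpCmp _ -> failwith "boolean-valued operators are not covered by this sketch"

let assert_of_cmp = function             (* interpretation of cmp as an assertion *)
  | CmpRR (c, r1, r2) -> Cmp (c, Reg r1, Reg r2)
  | CmpRN (c, r, n) -> Cmp (c, Reg r, Const n)

let rec vcg (prog : program) (f : fun_decl) (l : label) : assertion =
  match LabelMap.find l f.graph with
  | (_, Some phi) -> phi                 (* annotated label *)
  | (ins, None) -> vcgid prog f ins

and vcgid prog f = function
  | Assign (rd, op, l) -> subst rd (expr_of_op op) (vcg prog f l)
  | Call (rd, g, args, l) ->
      let gd = List.assoc g prog in
      let inst pick phi =
        List.fold_left2 (fun p r a -> subst (pick r) (Reg a) p) phi gd.params args in
      (* pre(g){r⃗g ← r⃗} ∧ ∀res. post(g){r⃗g* ← r⃗} ⇒ vcgf(L){rd ← res} *)
      And (inst (fun r -> r) gd.pre,
           Forall ("res",
                   Imp (inst star gd.post, subst rd (Reg "res") (vcg prog f l))))
  | CondJump (cmp, lt, lf) ->
      And (Imp (assert_of_cmp cmp, vcg prog f lt),
           Imp (Not (assert_of_cmp cmp), vcg prog f lf))
  | Return r -> subst "res" (Reg r) f.post
  | Nop l -> vcg prog f l
```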
Certified Programs. Certificates provide a formal representation of proofs, and are used to verify that the proof obligations generated by the VCGen hold. For the purpose of certificate translation, we do not need to commit to a specific format for certificates. Instead, we assume that certificates are closed under specific operations, which are captured by an abstract notion of proof algebra.
Recall that a judgment is a pair consisting of a list of assertions, called the context, and of an assertion, called the goal. Then a proof algebra is given by a set-valued function P over judgments, and by a set of operations, all implicitly quantified in the obvious way. The operations are standard (given in Figure 4), with the possible exception of the substitution operator, which allows substituting selected instances of equals by equals, and of the operator ring, which establishes all the ring equalities that are used to justify the optimizations.
  axiom  : P(Γ; A; Δ ⊢ A)
  ring   : P(Γ ⊢ n1 = n2)   if n1 = n2 is a ring equality
  intro⇒ : P(Γ; A ⊢ B) → P(Γ ⊢ A ⇒ B)
  elim⇒  : P(Γ ⊢ A ⇒ B) → P(Γ ⊢ A) → P(Γ ⊢ B)
  elim=  : P(Γ ⊢ e1 = e2) → P(Γ ⊢ A{r ← e1}) → P(Γ ⊢ A{r ← e2})
  subst  : P(Γ ⊢ A) → P(Γ{r ← e} ⊢ A{r ← e})
Fig. 4. Proof Algebra (excerpts)
As a result of working at an abstract level, we do not provide an algorithm for checking certificates. Instead, we take P(Γ ⊢ φ) to be the set of valid certificates of the judgment Γ ⊢ φ. In the sequel, we write λ : Γ ⊢ φ to express that λ is a valid certificate for Γ ⊢ φ, and use proof as a synonym of valid certificate.
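One way to phrase this abstract proof algebra in OCaml is as a module type; the signatures below are only one plausible typing of the rules of Figure 4, with certificates kept abstract.

```ocaml
(* The proof algebra of Figure 4 as an OCaml module type.  The representation
   of proofs stays abstract, as in the paper; these signatures are a sketch. *)
module type PROOF_ALGEBRA = sig
  type assertion
  type expr
  type reg
  type judgment = { context : assertion list; goal : assertion }
  type proof                                   (* certificates λ *)

  val axiom : assertion list -> assertion -> proof
      (* P(Γ; A; Δ ⊢ A): the goal already appears in the context *)
  val ring : assertion list -> expr * expr -> proof option
      (* Some p when the two sides are equal up to ring equalities *)
  val intro_imp : proof -> proof               (* P(Γ; A ⊢ B) → P(Γ ⊢ A ⇒ B) *)
  val elim_imp : proof -> proof -> proof       (* P(Γ ⊢ A ⇒ B) → P(Γ ⊢ A) → P(Γ ⊢ B) *)
  val elim_eq : proof -> proof -> proof        (* rewrite with a proof of Γ ⊢ e1 = e2 *)
  val subst : reg -> expr -> proof -> proof    (* P(Γ ⊢ A) → P(Γ{r ← e} ⊢ A{r ← e}) *)

  val check : proof -> judgment -> bool        (* λ : Γ ⊢ φ *)
end
```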
Definition 2 (Certified Program).
– A function f with declaration {r⃗; Φ; G; ψ; λ; Λ} is certified if:
  • λ is a proof of ⊢ Φ ⇒ vcgf(Lsp){r⃗* ← r⃗},
  • Λ(L) is a proof of ⊢ φ ⇒ vcgidf(ins) for all reachable labels L in f such that f[L] = (φ, ins).
– A program is certi?ed if all its functions are.

The verification condition generator is sound, in the sense that if the program p is called with registers set to values that satisfy the precondition of the function main, and p terminates normally, then the final state will satisfy the postcondition of main.
3 Principles of Certificate Translation
In a classical compiler, transformations operate on unannotated programs, and are performed in two phases: first, a data flow analysis gathers information about the program. Then, on the basis of this information, (blocks of) instructions are rewritten. In certificate translation, we may also rewrite assertions, and we must also generate certificates for the optimized programs.
Certificate translation is tightly bound to the optimizations considered. Depending on the optimization, certificate translators fall into one of three categories:
– PPO/IPO (Preservation/Instantiation of Proof Obligations): PPO deals with transformations for which the annotations are not rewritten, and where the proof obligations (for the original and transformed programs) coincide. This category covers transformations such as non-optimizing compilation and unreachable code elimination. IPO deals with transformations where the annotations and proof obligations for the transformed program are instances of the annotations and proof obligations for the original program, so certificate translation amounts to instantiating certificates. This category covers dead register elimination and register allocation;
– SCT (Standard Certificate Translation): SCT deals with transformations for which the annotations are not rewritten, but where the verification conditions do not coincide. This category covers transformations such as loop unrolling and inlining;
– CTCA (Certificate Translation with Certifying Analyzers): CTCA deals with transformations for which the annotations need to be rewritten, and for which certificate translation relies on having previously certified the analysis results used by the transformation. This category covers constant propagation, common subexpression elimination, loop induction, and other optimizations that rely on arithmetic.
For simplicity, assume for a moment that the transformation does not modify the set of reachable annotated labels. Then certificate translation may be achieved by defining two functions:
  T0 : P(⊢ pre(f) ⇒ vcgidf(Lsp)) → P(⊢ pre(f̂) ⇒ vcgidf̂(Lsp))
  Tλ : ∀L, P(⊢ φL ⇒ vcgidf(L)) → P(⊢ φ̂L ⇒ vcgidf̂(L))
where f̂ is the optimized version of f, φL is the original assertion at label L, and φ̂L is the rewritten assertion at label L. Here the function T0 transforms the proof that the function precondition implies the verification condition at program point Lsp for f into a proof of the same fact for f̂, and likewise, the function Tλ transforms, for each reachable annotated label L, the proof that its annotation implies the verification condition at program point L for f into a proof of the same fact for f̂.
In the remainder of this section, we justify the need for certifying analyzers, and show how they can be used for specific transformations. The following example, which will be used as a running example in the subsequent paragraphs, illustrates the need for certifying analyzers.

Example 1. Let f be a certified function with specification pre(f) ≡ true and post(f) ≡ res ≥ b × n, where b and n are constants. The graph code of f and its proof obligations are given by:

  L1 : ri := 0, L2
  L2 : ξ, r1 := b + ri, L3
  L3 : ri := c + ri, L4
  L4 : rj := r1 × ri, L5
  L5 : θ, (ri = n) ? L6 : L3
  L6 : return rj

with proof obligations

  true ⇒ 0 ≤ 0
  ξ ⇒ φ
  θ ⇒ (ri = n ⇒ φt) ∧ (ri ≠ n ⇒ φf)

where ξ ≡ 0 ≤ ri, θ ≡ rj = r1 × ri ∧ r1 ≥ b ∧ ri ≥ 0, and

  φ  ≡ (b + ri) × (c + ri) = (b + ri) × (c + ri) ∧ b + ri ≥ b ∧ c + ri ≥ 0
  φt ≡ rj ≥ b × n
  φf ≡ r1 × (c + ri) = r1 × (c + ri) ∧ r1 ≥ b ∧ c + ri ≥ 0
Suppose that constant propagation is applied to the original program, substituting the occurrence of r1 with b and b + ri with b, as shown in program (a) in Figure 5. If we do not rewrite assertions, that is, if we let ξcp = ξ and θcp = θ, then the third proof obligation is θ ⇒ (ri = n ⇒ φt) ∧ (ri ≠ n ⇒ φf), where φf ≡ b × (c + ri) = r1 × (c + ri) ∧ r1 ≥ b ∧ c + ri ≥ 0 cannot be proved, since there is no information about the relation between r1 and b. A fortiori, the certificate of the original program cannot be used to obtain a certificate for the optimized program.

Motivated by the example above, optimized programs are defined by strengthening annotations with the results of the analysis, expressed as an assertion and denoted RESA(L) below.
Definition 3. The optimized graph code of a function f is defined as follows:

  Gf̂(L) = (φ ∧ RESA(L), ins′)   if Gf(L) = (φ, ins)
  Gf̂(L) = ins′                  if Gf(L) = ins

where ins′ is the optimized version of instruction ins. In the sequel, we write φ̂L for φL ∧ RESA(L). In addition, we define the precondition and postcondition of f̂ to be those of f.

Then one can encode elementary reasoning with the rules of the proof algebra to obtain a valid certificate for the optimized function f̂ from a function

  TLins : ∀L, P(⊢ vcgidf(L) ∧ RESA(L) ⇒ vcgidf̂(L))

and a certified function

  fA = {r⃗f; true; GA; true; λA; ΛA}

where GA is a new version of Gf annotated with the results of the analysis, i.e. such that GA(L) = (RESA(L), ins) for every label L in f, with ins the instruction descriptor of Gf(L).
Thus, certificate translation is reduced to two tasks: defining the function TLins, and producing the certified function fA. The definition of TLins depends upon the program optimization; in the next section we show that TLins can be built for many common program optimizations, using the induction principle attached to the definition of reachAnnotf. As to the second task, it is delegated to a procedure, called certifying analyzer, that produces for each function f the certified function fA. There are two approaches to building certifying analyzers: one can either perform the analysis and build the certificate simultaneously, or use a standard analysis and a decision procedure to generate the certificate post-analysis. The merits of both approaches will be reported elsewhere; here we have followed the second approach.
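Continuing the OCaml sketch (with the earlier datatypes in scope), the annotation strengthening of Definition 3 and the construction of fA might be coded as follows; res_a and optimize_ins stand for RESA and for the per-optimization instruction rewrite, and are parameters since they change with each transformation. The certificates λA and ΛA are not built here.

```ocaml
(* Definition 3: strengthen every annotation with RESA(L) and rewrite the
   instruction descriptor. *)
let optimized_graph
    (res_a : label -> assertion)
    (optimize_ins : label -> instr_desc -> instr_desc)
    (g : instr LabelMap.t) : instr LabelMap.t =
  LabelMap.mapi
    (fun l (ins, annot) ->
       let ins' = optimize_ins l ins in
       match annot with
       | Some phi -> (ins', Some (And (phi, res_a l)))
       | None -> (ins', None))
    g

(* The companion function fA: the original code, annotated at every label with
   RESA(L), and specified by trivial pre/postconditions.  Its certificates
   (λA, ΛA) are what the certifying analyzer has to produce. *)
let analysis_function (res_a : label -> assertion) (f : fun_decl) : fun_decl =
  { f with
    pre = True;
    post = True;
    graph = LabelMap.mapi (fun l (ins, _) -> (ins, Some (res_a l))) f.graph }
```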
As shown in Figure 1, certifying analyzers do not form part of the Trusted Computing Base. In particular, no security threat is caused by applying an erroneous analyzer, or by verifying a program whose assertions are too weak (e.g. taking RESA(L5) = true in the above example), too strong (by adding unprovable assertions), or erroneous. In these cases, it will either be impossible to generate the certificate of the analysis, or that of the optimized program.

(a) Constant propagation

  L1 : ri := 0, L2
  L2 : ξcp, r1 := b, L3
  L3 : ri := c + ri, L4
  L4 : rj := b × ri, L5
  L5 : θcp, (ri = n) ? L6 : L3
  L6 : return rj

(b) Loop induction

  L1 : ri := 0, L2
  L2 : ξli, r1 := b, L3
  L3 : r′j := b × ri, L′3
  L′3 : ri := c + ri, L″3
  L″3 : r′j := m + r′j, L4
  L4 : rj := r′j, L5
  L5 : θli, (ri = n) ? L6 : L′3
  L6 : return rj

(c) Dead register elimination

  L1 : ri := 0, L2
  L2 : ξdr, set r̂1 := b, L3
  L3 : r′j := b × ri, L′3
  L′3 : ri := c + ri, L″3
  L″3 : r′j := m + r′j, L4
  L4 : set r̂j := r′j, L5
  L5 : θdr, (ri = n) ? L6 : L′3
  L6 : return r′j

Fig. 5. Example of different optimizations

4 Instances of Certificate Translation
This section provides instances of certificate translation for common RTL optimizations. The order of optimizations is chosen for clarity of exposition and does not necessarily reflect the order in which the optimizations are performed by a compiler. Due to space constraints, we only describe certificate translators for constant propagation, loop induction, and dead register elimination. Other transformations (common subexpression elimination, inlining, register allocation, loop unrolling, unreachable code elimination) will be described in the full version of the article.

4.1 Constant Propagation
Goal. Constant propagation aims at minimizing run-time evaluation of expressions and access to registers with constant values.

Description. Constant propagation relies on a data flow analysis that returns a function A : PP × R → Z⊥ (PP denoting the set of program points) such that A(L, r) = n if r holds the value n every time execution reaches label L. The optimization consists in replacing an instruction by an equivalent one that exploits the information provided by A. For example, if r1 is known to hold n1 at label L and the instruction is r := r1 + r2, then the instruction is rewritten into r := n1 + r2. Likewise, conditionals that can be evaluated are replaced with nop instructions.
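As an illustration, and still over the instruction descriptors of the earlier OCaml sketch, a rewrite of one instruction under the analysis A could look as follows; lookup plays the role of A, and only the operator fragment declared earlier is covered.

```ocaml
(* Constant propagation on one instruction descriptor at label l.
   [lookup l r] returns [Some n] when A(L, r) = n and [None] when A(L, r) = ⊥. *)
let eval_cmp c a b =
  match c with Lt -> a < b | Le -> a <= b | Eq -> a = b | Ge -> a >= b | Gt -> a > b

let propagate (lookup : label -> reg -> int option) (l : label) (ins : instr_desc) : instr_desc =
  let known r = lookup l r in
  match ins with
  | Assign (rd, OpReg r, l') ->
      (match known r with Some n -> Assign (rd, OpConst n, l') | None -> ins)
  | Assign (rd, OpAdd (r1, r2), l') ->
      (match known r1, known r2 with
       | Some n1, Some n2 -> Assign (rd, OpConst (n1 + n2), l')
       | Some n1, None -> Assign (rd, OpAddC (n1, r2), l')  (* r := r1 + r2 becomes r := n1 + r2 *)
       | None, Some n2 -> Assign (rd, OpAddC (n2, r1), l')
       | None, None -> ins)
  | Assign (rd, OpAddC (n, r), l') ->
      (match known r with Some m -> Assign (rd, OpConst (n + m), l') | None -> ins)
  | CondJump (CmpRN (c, r, n), lt, lf) ->
      (match known r with
       | Some m -> Nop (if eval_cmp c m n then lt else lf)  (* decided conditionals become nop *)
       | None -> ins)
  | CondJump (CmpRR (c, r1, r2), lt, lf) ->
      (match known r1, known r2 with
       | Some m1, Some m2 -> Nop (if eval_cmp c m1 m2 then lt else lf)
       | _ -> ins)
  | _ -> ins
```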

Certifying Analyzer. We have implemented a certifying analyzer for constant propagation as an extension of the standard data flow algorithm. First, we attach to each reachable label L the assertion EQA(L) (since the result of the analysis is a conjunction of equations, we now write EQA(L) instead of RESA(L)):

  EQA(L) ≡ ⋀ { r = A(L, r) | r ∈ R, A(L, r) ≠ ⊥ }

To derive a certificate for the analysis, we have to prove that, for each reachable label L,

  EQA(L) ⇒ vcgidfA(L)

After moving all hypotheses to the context and rewriting all equalities from the context in the goal, one is left to prove closed equalities of the form n = n′ (i.e. n and n′ are numbers, not arithmetic expressions with variables). If the assertions are correct, then the certificate is obtained by applying reflexivity of equality (an instance of the ring rule). If the assertions are not correct, the program cannot be certified.
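Over the sketched assertion syntax, EQA(L) itself is a simple fold; lookup and the register list are assumed parameters.

```ocaml
(* EQ_A(L): conjunction of r = A(L, r) over the registers on which the
   analysis is defined at L. *)
let eq_a (lookup : label -> reg -> int option) (registers : reg list) (l : label) : assertion =
  List.fold_left
    (fun acc r ->
       match lookup l r with
       | Some n -> And (acc, Cmp (Eq, Reg r, Const n))
       | None -> acc)
    True registers
```

With the helpers of Section 3, the annotated function fA would then be analysis_function (eq_a lookup rs) f, and discharging its proof obligations amounts to rewriting with the equalities of EQA(L) and closing the remaining ground equalities with the ring rule, as described above.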

Certificate Translation. The function TLins is defined by case analysis, using the fact that the transformation of operations is correct relative to the results of the analysis:

  Top : ∀L, ∀op, P(⊢ EQA(L) ⇒ op = ⌊op⌋L)

The expression ⌊op⌋L represents the substitution of variables by constants in op. The function Top is built using the ring axiom of the proof algebra; a similar result is required for comparisons and branching instructions.

Example 2. Recall the function f defined in Example 1. Using the compiler and transforming the assertions as explained before, we obtain the optimized program shown in Figure 5 (a), where the assertions at L2 and L5 have been transformed into ξcp ≡ ξ ∧ ri = 0 and θcp ≡ θ ∧ r1 = b. It is left to the reader to check that all proof obligations become provable with the new annotations.

4.2 Loop Induction
Goal. Loop induction register strength reduction aims at reducing the number of multiplication operations inside a loop, which in many processors are more costly than addition operations.
Description. Loop induction depends on two analyses. The first one is a loop analysis that detects loops and returns, for each loop, its set of labels {L1, . . . , Ln} and its header LH, a distinguished label in this set such that any jump that goes inside the loop from an instruction outside the loop is a jump to LH.
The second analysis detects inside a loop an induction register ri (defined in the loop by an instruction of the form ri := ri + c) and its derived induction register rd (defined in the loop by an instruction of the form rd := ri × b). More precisely, the analysis returns: an induction register ri and the label Li at which its definition appears, a derived induction register rd and the label Ld at which its definition appears, a new register name r′d not used in the original program, two new labels L′i and L′H not in the domain of Gf, and two constant values b and c that correspond to the coefficient of rd and the increment of ri.
The transformation replaces the assignment to the derived induction register rd with less costly assignments to an equivalent induction register r′d. Then rd is defined as a copy of r′d.
Certifying Analyzer. Only the second analysis needs to be certified. First, we define EQA(L) ≡ r′d = b × ri if L ∈ {L′H, L1, . . . , Ln} \ {LH}, and EQA(L) ≡ true if L is a label outside the loop or equal to LH. Then, we need to create a certificate that the analysis is correct. One (minor) novelty w.r.t. constant propagation is that the definition of fA includes two extra labels L′H and L′i, not present in the original function f. The definition of fA is given by the clauses:

  fA[LH]  = (EQA(LH), r′d := b × ri, L′H)
  fA[L′H] = (EQA(L′H), insLH)
  fA[L]   = (EQA(L), insL)              if L ∈ dom(Gf) and L ∉ {LH, Li}
  fA[Li]  = (EQA(Li), insLi{L″i ← L′i})
  fA[L′i] = r′d := r′d + b × c, L″i

where insL is the instruction descriptor of f[L], and L″i (resp. L″d below) is the successor label of Li (resp. Ld) in f. Interestingly, the certifying analyzer must use the fact that the loop analysis is correct, in the sense that one can only enter a loop through its header. If the loop analysis is not correct, then the certificate cannot be constructed.

  f′[LH]  = r′d := b × ri, L′H
  f′[L′H] = f[LH]
  f′[Li]  = ri := ri + c, L′i
  f′[L′i] = r′d := r′d + b × c, L″i{LH ← L′H}
  f′[Ld]  = rd := r′d, L″d{LH ← L′H}
  f′[L]   = (φ ∧ r′d = b × ri, ins{LH ← L′H})   if f[L] = (φ, ins)
  f′[L]   = f[L]{LH ← L′H}                      in any other case inside the loop
Fig. 6. Loop Induction
Certificate Translation. Figure 6 shows how the instructions at labels L1 . . . Ln of a function f are transformed into instructions of the optimized function f′. As expected, the transformation for instructions outside the loop is the identity, i.e. f′[L] = f[L] for L ∉ {L1, . . . , Ln}.
Certificate translation proceeds as with constant propagation, using the induction principle attached to the definition of reachAnnotf, and the certificate of the analysis, to produce a certificate for f′.
Example 3. Applying loop induction to program (a) in Figure 5, we obtain program (b), where m denotes the result of the product b × c, ξli ≡ ξcp, and θli ≡ θcp ∧ r′j = b × ri.
4.3 Dead Register Elimination
Goal. Dead register elimination aims at deleting assignments to registers that are not live at the label where the assignment is performed. As mentioned in the introduction, we propose a transformation that performs simultaneously dead variable elimination in instructions and in assertions.
Description. A register r is live at label L if r is read at label L, or if there is a path from L that reaches a label L′ where r is read and that does not go through an instruction that defines r (including L, but not L′). A register r is read at label L if it appears as an argument of a function call, in a conditional jump, in a return instruction, or on the right-hand side of an assignment operation to a register that is live. In the following, we write L(L, r) = ⊤ when register r is live at L.
In order to deal with assertions, we extend the definition of liveness to assertions. A register r is live in an assertion at label L, denoted L(L, r) = φ, if it is not live at label L and there is a path from L that reaches a label L′ such that r appears in the assertion at L′, or where r is used to define a register which is live in an assertion.
By abuse of notation, we use L(L, r) = ⊥ if r is dead in the code and in assertions.
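The code-level part of this liveness analysis is a standard backward dataflow fixpoint; a sketch over the earlier OCaml types is given below, with the registers read and defined at each label passed as parameters. Liveness in assertions (L(L, r) = φ) would be computed in the same way from the registers read by annotations.

```ocaml
(* Liveness in the code:
   live(L) = reads(L) ∪ ((∪_{L' ∈ succ(L)} live(L')) \ defs(L)),
   iterated to a fixpoint. *)
module RegSet = Set.Make (String)

let liveness
    (labels : label list)
    (succ : label -> label list)
    (reads : label -> RegSet.t)
    (defs : label -> RegSet.t) : label -> RegSet.t =
  let live = Hashtbl.create 17 in
  let get l = try Hashtbl.find live l with Not_found -> RegSet.empty in
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter
      (fun l ->
         let live_out =
           List.fold_left (fun s l' -> RegSet.union s (get l')) RegSet.empty (succ l) in
         let live_in = RegSet.union (reads l) (RegSet.diff live_out (defs l)) in
         if not (RegSet.equal live_in (get l)) then begin
           Hashtbl.replace live l live_in;
           changed := true
         end)
      labels
  done;
  get
```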
The transformation deletes assignments to registers that are not live. In order to deal with dead registers in assertions, we rely on the introduction of ghost variables. Ghost variables are expressions in our language of assertions (we assume that the set of ghost variable names and R are disjoint). We introduce, as part of RTL, "ghost assignments" of the form set v̂ := op, L, where v̂ is a ghost variable. Ghost assignments do not affect the semantics of RTL, but they affect the calculus of vcg in the same way as normal assignments.
The transformation is shown below, where σL = {r ← r̂ | L(L, r) = φ} and deadc(L, L′) = {r | L(L, r) = ⊤ ∧ L(L′, r) = φ}.

  ghostL((φ, ins)) = (φσL, ghostidL(ins))
  ghostL(ins)      = ghostidL(ins)

The function ghostidL(ins) is defined in Figure 7. We use set r̂⃗ := r⃗, L as syntactic sugar for a sequence of ghost assignments set r̂i := ri, one for each register ri in r⃗, where r̂i is the ghost variable corresponding to ri. The function ghost transforms each instruction of f into a set of instructions of f̂. Intuitively, it introduces, for any instruction ins (with successor L′) at label L, a ghost assignment set r̂ := r, L′ immediately after L (at a new label L″) if the register r is live at L but not live at the immediate successor L′ of L. In addition, the function ghostL performs dead register elimination if ins is of the form rd := op and the register rd is not live at L′.

Instantiation of Proof Obligations. Certificate translation for dead register elimination falls in the IPO category, i.e. the certificate of the optimized program is an instance of the certificate of the source program. This is shown by proving that ghost variable introduction preserves vcg up to substitution.

Lemma 1. ∀L, vcgf̂(L) = vcgf(L)σL
A consequence of this lemma is that if the function f is certified, then it is possible to reuse the certificate of f to certify f̂: from each proof p : ⊢ φL ⇒ vcgf(L) we can obtain a proof p′ : ⊢ φ̂L ⇒ vcgf̂(L) by applying the subst rule of Figure 4 to p with the substitution σL.
After ghost variable introduction has been applied, the registers that occur free in vcgf̂(L) are live at L, i.e. L(L, r) = ⊤.
Example 4. In Figure 5, applying first copy propagation to program (b) and then ghost variable introduction, we obtain program (c), where ξdr ≡ ξli and θdr ≡ r̂j = r̂1 × ri ∧ r̂1 ≥ b ∧ ri ≥ 0 ∧ r̂1 = b ∧ r′j = b × ri ∧ r′j = r̂j.

5 Related Work
Certified Compilation. Compiler correctness [6] aims at showing that a compiler preserves the semantics of programs. Because compilers are complex programs, the task of compiler verification can be daunting; in order to tame the complexity of verification and bring stronger guarantees on the validity of compiler correctness proofs, certified compilation [8] advocates the use of a proof assistant for machine-checking compiler correctness results.

  ghostidL(return r)       = return r

  ghostidL(rd := f(r⃗), L′) = L : rd := f(r⃗), L″
                             L″ : set t̂ := t, L′          for each t ∈ deadc(L, L′)

  ghostidL(nop, L′)        = nop, L′

  ghostidL(cmp ? L1 : L2)  = L : cmp ? L′1 : L′2
                             L′1 : set t̂⃗1 := t⃗1, L1       where t⃗1 = deadc(L, L1)
                             L′2 : set t̂⃗2 := t⃗2, L2       where t⃗2 = deadc(L, L2)

  ghostidL(rd := op, L′)   = nop, L′                      if L(L′, rd) = ⊥
                           = set r̂d := opσL, L′           if L(L′, rd) = φ
                           = L : rd := op, L″             if L(L′, rd) = ⊤
                             L″ : set t̂ := t, L′          for each t ∈ deadc(L, L′)

Fig. 7. Ghost Variable Introduction (Dead Register Elimination)

Section 2 of [8] shows that it is theoretically possible to derive certificate translation from certifying compilation. However, we think that this approach is restrictive and impractical:
– certificates encapsulate the definition of the compiler and its correctness proof on the one hand, and the source code and its certificate on the other hand. Thus certificates are large and costly to check;
– with the above notion of certified compiler, the approach is necessarily confined to properties about the input/output behavior of programs, and rules out interesting properties involving intermediate program points that are expressed with assertions or ghost variables;
– and a further difficulty with this approach is that it requires that the source code be accessible to the code consumer, which is in general not the case.
For similar reasons, it is not appropriate to take as certificates of optimized programs pairs that consist of a certificate for the unoptimized program and of a proof that the optimizations are semantics-preserving.
Certifying Compilation. Certifying compilation is concerned with automatically generating safety certificates. The Touchstone compiler [11] is a notable example of a certifying compiler; it generates type-safety certificates for a fragment of C. In Chapter 6 of [10], Necula studies the impact of program optimizations on certifying compilation. For most standard optimizations, an informal analysis is made, indicating whether the transformation requires reinforcing the program invariants, or whether the transformation does not change proof obligations.
There are many commonalities between his work and ours, but also some notable differences. First, the VCGen used by Necula propagates invariants backwards, whereas ours generates a proof obligation for each invariant; this has subtle implications for the modifications required for the invariants. A main difference is that we not only have to strengthen invariants, but also have to transform the certificate; further, where he observes that the transformation produces a logically equivalent proof obligation, we have to define a function that maps proofs of the original proof obligation into proofs of the new proof obligation after optimization.
Provable Optimizations through Sound Elementary Rules. Rhodium [7] is a domain-specific language for declaring and proving correct program optimizations. The domain-specific language is used to declare local transformation rules and to combine them into the optimization. Transformations written in Rhodium are given a semantic interpretation that is used to generate sufficient conditions for the correctness of the transformation. The proof obligations are in turn submitted to an automatic prover that attempts to discharge them automatically. The idea also underlies the work of Benton [4], who proposes to use a relational Hoare logic to justify transformation rules from which optimizations can be built. The perspective of decomposing optimizations through sound elementary rules is appealing, but left for future work.
Spec# and BML Project. The Spec# project [2] defines an extension of C# with annotations, and a compiler from annotated programs to annotated .NET files, which can be run on the .NET platform and checked against their specifications at run-time or verified statically with an automatic prover. The Spec# project implicitly assumes some relation between the source and bytecode levels, but does not attempt to formalize this relation. There is no notion of certificate, and thus no need to transform certificates. A similar line of work for Java was pursued independently by Pavlova and Burdy [5], who define a Bytecode Modeling Language, into which annotations written in the Java Modeling Language can be translated, and a VCGen for annotated bytecode programs; the generated proof obligations are sent to an automatic theorem prover. They partially formalize the relation between proof obligations at the source code and bytecode levels, but they do not consider certificates.
In a similar spirit, Bannwart and Müller [1] provide Hoare-like logics for significant sequential fragments of Java source code and bytecode, and illustrate how derivations of correctness can be mapped from source programs to bytecode programs obtained by non-optimizing compilation.
Certifying Analyzers. Specific instances of certifying analyzers have been studied independently by Wildmoser, Chaieb and Nipkow [13] in the context of a bytecode language, and by Seo, Yang and Yi [12] in the context of a simple imperative language. Seo, Yang and Yi propose an algorithm that automatically constructs safety proofs in Hoare logic from abstract interpretation results.
6 Concluding Remarks
Certificate translation provides a means to bring the benefits of source code verification to code consumers using PCC architectures. Certificate translation significantly extends the scope of PCC in that it allows complex security policies and complex programs to be considered, at the cost of requiring interactive verification. The primary motivation for certificate translation is mobile code scenarios, possibly involving several code producers and intermediaries, where security-sensitive applications justify interactive verification. One important constraint in these scenarios (which originate from the mobile phone industry) is that only the code after compilation and optimization is available to the code consumer or a trusted third party: this assumption makes it impossible to use ideas from certified compilation, or to use as certificates for optimized programs a pair consisting of a certificate of the unoptimized program and a proof of correctness of the optimizations.
There are many directions for future work, including:
– On a theoretical side, we would like to build a generic certificate translator, instead of developing a translator per optimization. One natural approach would be to describe standard program optimizations as a combination of more elementary transformations, in the style of Rhodium.
– On a practical side, we have developed a prototype certificate translator for our RTL language. This prototype generates proof obligations for the initial program that are sent to the Coq theorem prover. Once the proof obligations are solved, the proofs are sent to the certificate translator, which automatically optimizes the program and transforms the proofs. In the medium term, we intend to extend our prototype to the compilation of a mainstream programming language, such as C or Java, to an assembly language.
– On an experimental side, we would like to gather metrics about the size of certificates, which is an important issue, although not always central in the scenarios we have in mind. Preliminary experiments using λ-terms as certificates indicate that their size does not explode during translation, provided we perform, after certificate translation, a reduction pass that eliminates all the redexes created by the translation. For example, the size of certificates remains unchanged for dead register elimination. For constant propagation, the size of certificates grows linearly w.r.t. the size of the code. There are other opportunities to reduce certificate size; in particular, not all annotations generated by certifying analyzers are used to build the certificate for the optimized program, so we could use enriched analyses with dependency information to eliminate all annotations that are not used to prove the optimized program, i.e. annotations that are not directly used to justify an optimization, and annotations that are not used (recursively) to justify such annotations;
– On an applicative side, we would like to experiment with certificate translation in realistic settings. For example, certificate translation could be useful in the component-based development of security-sensitive software, as the software integrator, who will be liable for the resulting product, could reasonably require that components originating from untrusted third parties be certified against their requirements, and use certificate translation to derive a certificate for the overall software from the certificates of each component. The benefits of certificate translation seem highest in situations where the integration of components involves advanced compilation techniques, e.g. compilation from Domain-Specific Languages to conventional languages.
References
1. F. Bannwart and P. Müller. A program logic for bytecode. In F. Spoto, editor, Proceedings of Bytecode'05, Electronic Notes in Theoretical Computer Science. Elsevier Publishing, 2005.
2. M. Barnett, K.R.M. Leino, and W. Schulte. The Spec# Programming System: An Overview. In G. Barthe, L. Burdy, M. Huisman, J.-L. Lanet, and T. Muntean, editors, Proceedings of CASSIS’04, volume 3362 of Lecture Notes in Computer Science, pages 50–71. Springer-Verlag, 2005.
3. G. Barthe, T. Rezk, and A. Saabas. Proof obligations preserving compilation. In Proceedings of FAST'05, volume 3866 of Lecture Notes in Computer Science, pages 112–126. Springer-Verlag, 2005.
4. N. Benton. Simple relational correctness proofs for static analyses and program transformations. In Proceedings of POPL’04, pages 14–25. ACM Press, 2004.
5. L. Burdy and M. Pavlova. Annotation carrying code. In Proceedings of SAC’06. ACM Press, 2006.
6. J. D. Guttman and M. Wand. Special issue on VLISP. Lisp and Symbolic Computation, 8(1/2), March 1995.
7. S. Lerner, T. Millstein, E. Rice, and C. Chambers. Automated soundness proofs for dataflow analyses and transformations via local rules. In Proceedings of POPL'05, pages 364–377. ACM Press, 2005.
8. X. Leroy. Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In Proceedings of POPL'06, pages 42–54. ACM Press, 2006.
9. G.C. Necula. Proof-Carrying Code. In Proceedings of POPL’97, pages 106–119. ACM Press, 1997.
10. G.C. Necula. Compiling with Proofs. PhD thesis, Carnegie Mellon University, October 1998. Available as Technical Report CMU-CS-98-154.
11. G.C. Necula and P. Lee. The Design and Implementation of a Certifying Compiler. In Proceedings of PLDI’98, pages 333–344. ACM Press, 1998.
12. S. Seo, H. Yang, and K. Yi. Automatic Construction of Hoare Proofs from Abstract Interpretation Results. In A. Ohori, editor, Proceedings of APLAS’03, volume 2895 of Lecture Notes in Computer Science, pages 230–245. Springer-Verlag, 2003.
13. M. Wildmoser, A. Chaieb, and T. Nipkow. Bytecode analysis for proof carrying code. In F. Spoto, editor, Proceedings of Bytecode'05, Electronic Notes in Theoretical Computer Science. Elsevier Publishing, 2005.



