A brief introduction to CLI/CLR


Massimo Ancona
DISI, Università di Genova

Texts:
J. Gough, Compiling for the .NET Common Language Runtime (CLR), .NET Series, B. Meyer Editor
J. Richter, CLR via C#, Microsoft Press

CLR - Common Language Runtime

The CLR has been designed with three objectives:

1. portability (write once, run anywhere),
2. reliability (make operations predictable),
3. reusability (object-orientation and parametric code [generics]).

GenCLI has the objective of meeting all three of the objectives above.

CLR 2

The CLR machine is composed of the CTS specification (.NET Common Type System) and the CLR instructions.

The CTS defines all possible data types and constructs supported by the .NET Run Time Environment (RTE), while the CLR instructions define a virtual stack-based machine.
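To make the idea of a stack-based virtual machine concrete, here is a minimal sketch in Python. The opcode names (`ldarg`, `ldc.i4`, `add`, `mul`, `ret`) are borrowed from CIL, but the tuple encoding and the `run` interpreter are assumptions made for illustration only and do not reflect how the CLR is implemented.

```python
def run(code, args=()):
    """Interpret a tiny, made-up subset of stack instructions.

    Each instruction is a tuple (opcode, *operands). Operands are pushed on
    an evaluation stack and operators pop their inputs from it.
    """
    stack = []
    for instr in code:
        op = instr[0]
        if op == "ldc.i4":        # push an integer constant
            stack.append(instr[1])
        elif op == "ldarg":       # push the argument with the given index
            stack.append(args[instr[1]])
        elif op == "add":         # pop two values, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":         # pop two values, push their product
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == "ret":         # return the value on top of the stack
            return stack.pop()
        else:
            raise ValueError("unknown opcode: " + op)
    raise RuntimeError("code ended without ret")


# Computes arg0 * 2 + 1 using only stack operations.
program = [("ldarg", 0), ("ldc.i4", 2), ("mul",), ("ldc.i4", 1), ("add",), ("ret",)]
result = run(program, args=(20,))
```

Note that no instruction names a register: every operator finds its inputs on the stack, which is what makes the machine "stack-based".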

Execution Model CLR 3

Code generators for .NET emit CIL (IL for short), either in the form of a text file for subsequent assembly or directly into a file or memory buffer.

CIL code consists of instructions for a virtual machine, which are always executed indirectly by means of a Just-In-Time compiler (JIT).

The JIT translates IL instructions into machine code for the specific computer on which the program is to be executed.

Program executable modules, called assemblies, are usually demand-loaded and are just-in-time compiled (JIT-ed) at the time of loading.

Execution Model CLR 4

At load time each assembly is subject to some form of checking.

The execution engine is able to ensure that the assembly is memory-safe.

Programs that are intended to pass the checks [of verification] are said to be written in verifiable code.

Verifiable Code CLR 5

Verifiable code must conform to several requirements.

First of all, dynamically allocated memory must be managed data. This means that all objects must be allocated from the garbage-collected heap, and must be self-describing. The GC must be able to discern the exact type of the object from inspection of the object encoding.

Verifiable Code CLR 6

Operations on data must be performed in such a way that the verifier is able to statically prove that the operation is safe for the type of object.

Method calls must pass arguments that conform to the statically specified method signature.

For most programming languages not all programs can be translated into verifiable code. In such cases, programmers who wish their programs to pass verification must restrict themselves to a subset of the language.

Programming constructs that can cause problems are, for example, union types (variant types) and pointer arithmetic.

As well as speaking of managed data we speak of managed code. Managed code is code that is executed by the CLR as opposed to ordinary native-code execution.

An erroneous address computation allows an arbitrary memory location to be overwritten.

An erroneous address computation can be generated by:

• Accessing a deallocated memory location
• Accessing a non-existing array element
• Treating a pointer of one type as another
• Sending wrongly typed arguments to a function

Memory Safety by Design 0

How to design languages and RTEs for which every semantically correct source program may be compiled into a memory-safe executable program.

One approach is to define a statically typed (or strongly typed) programming language, e.g. Modula-2.

The .NET system provides a framework for memory-safe programming. There are a number of different aspects of .NET that contribute toward this outcome:

1. Dynamically allocated data in verifiable code is garbage collected.

2. Every datum is of known type at runtime.

Memory Safety by Design 1

Objects of reference type are allocated from a heap called the managed heap. The managed heap is garbage collected and the CLR provides instructions for managing it in a safe way. Value types are not allocated on the managed heap. However, an object of value type can be converted to a reference type by using the boxing mechanism: a copy of the object value is allocated on the managed heap and its address is returned as a reference type.
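The copy semantics of boxing can be sketched with a toy model in Python. Everything here is hypothetical: `Point` stands in for a value type, `ManagedHeap` is a dictionary playing the role of the managed heap, and the integer handles play the role of references. Python itself has no value types, so this only illustrates the idea that boxing stores a self-describing copy.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    """Stands in for a value type (hypothetical example type)."""
    x: int
    y: int


class ManagedHeap:
    """Toy heap: maps integer handles to (type name, boxed copy) pairs."""

    def __init__(self):
        self._cells = {}
        self._next_handle = 0

    def box(self, value):
        # Boxing copies the value onto the heap and returns a reference to it.
        handle = self._next_handle
        self._next_handle += 1
        self._cells[handle] = (type(value).__name__, copy.copy(value))
        return handle

    def unbox(self, handle, expected_type):
        # The boxed object is self-describing, so the type can be checked.
        type_name, boxed = self._cells[handle]
        if type_name != expected_type.__name__:
            raise TypeError("boxed value is a " + type_name)
        return copy.copy(boxed)


heap = ManagedHeap()
p = Point(1, 2)
ref = heap.box(p)          # a copy of p now lives on the toy heap
p.x = 99                   # changing the original does not affect the box
q = heap.unbox(ref, Point)
```

After these statements `q` still holds the coordinates that `p` had at boxing time, because the heap cell is a separate copy.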

Memory Safety by Design 2

How to design languages and run-times for which every semantically correct source program may be compiled into a memory-safe executable program.

The .NET execution engine is able to ensure that the generated code is safe by performing a verification process. It checks that every method is called with the correct number of parameters, and that each parameter passed is of the correct type.

Memory Safety by Design 3

In order to be safe, the generated code must allocate dynamic objects only as managed data on the managed heap by means of specific CLR instructions. The code generated for .NET is always executed indirectly via a JIT (Just-In-Time translator) that translates the code generated by a .NET compiler into native machine code, while safety checks are performed at load time, just before the JIT translation.

Memory Safety by Design 4

.NET resolves these problems by a combination of load-time and runtime checking.

The load-time verifier computes the types of all data used by the IL code of a program. This involves significant computations based on the control flow graph: the verifier checks that all data [...]. This involves access to multiple assemblies because consistency of argument types between method caller and callee may cut across PEM (program executable module) boundaries.
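The flavor of load-time verification can be sketched as a simulation of the evaluation stack that tracks types instead of values. The Python sketch below is a deliberate simplification: it handles straight-line code only (no control flow graph), the type names are plain strings, and the signature table with the method name `Demo::Max` is hypothetical. The opcode names are borrowed from CIL.

```python
class VerificationError(Exception):
    """Raised when the toy verifier cannot prove an instruction type-safe."""


# Hypothetical signature table: method name -> (parameter types, result type).
SIGNATURES = {"Demo::Max": (("int32", "int32"), "int32")}


def verify(code, signatures):
    """Track the type of every stack slot; return the final type stack."""
    stack = []
    for op, *operands in code:
        if op == "ldc.i4":
            stack.append("int32")
        elif op == "ldstr":
            stack.append("string")
        elif op == "add":
            if len(stack) < 2:
                raise VerificationError("stack underflow at add")
            right = stack.pop()
            left = stack.pop()
            if (left, right) != ("int32", "int32"):
                raise VerificationError("add applied to %s, %s" % (left, right))
            stack.append("int32")
        elif op == "call":
            params, result = signatures[operands[0]]
            first_arg = len(stack) - len(params)
            if first_arg < 0:
                raise VerificationError("too few arguments for " + operands[0])
            actual = tuple(stack[first_arg:])
            if actual != params:
                raise VerificationError("argument types %s do not match %s"
                                        % (actual, params))
            del stack[first_arg:]
            if result != "void":
                stack.append(result)
        else:
            raise VerificationError("unknown opcode " + op)
    return stack


good = [("ldc.i4", 1), ("ldc.i4", 2), ("call", "Demo::Max")]
bad = [("ldc.i4", 1), ("ldstr", "two"), ("call", "Demo::Max")]
good_types = verify(good, SIGNATURES)
```

Calling `verify(bad, SIGNATURES)` raises `VerificationError`, because a `string` is passed where the signature requires an `int32`. A real verifier additionally has to merge the type states arriving along different control-flow edges.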

CTS

CTS provides three sets of types:

• primitive types, managed by the compiler,
• reference types, allocated on the managed heap, and
• value types

CTS Types Hierarchy

CTS 3

The CLS (Common Language Specification, a subset of CTS) defines the requirements to be met by a language in order to be classified as a safe .NET language. Programs generated by a compiler for such a language, in order to pass the verification process, must be written in verifiable code. Example: GenCLI generates only verifiable high-level IL, making the RPython compiler a de facto .NET compiler.

CTS Generics 1

The CTS allows the creation of generic reference types as well as generic value types. In addition, the CLR allows the creation of generic classes, interfaces, and delegates. Moreover, the CLR allows the creation of generic methods that are defined in a reference type, value type, or interface.

CTS Generics 2

Adding generics to the CLR required:

• creating new IL instructions that are aware of type arguments
• inserting type names and methods with generic parameters in metadata tables
• modifying languages, compilers and the JIT compiler to process the new type-argument-aware IL instructions.

CLR Assemblies 1

Combining managed modules into assemblies (p. 6)

The CLR does not actually work with modules; it works with assemblies. An assembly is a logical grouping of one or more modules or resource files. An assembly is the smallest unit of reuse, security and versioning. It supports the separation of types and resources into separate files used by users of the assembly.


Mapping Oberon-2 to CLR

The record types of Oberon-2 need to be mapped in some way to the class constructs of the CTS. Oberon-2 does not make a declarative distinction between value and reference aggregate types. Record types always have value semantics, and pointer types always have reference semantics. Our choice is the following.

Mapping Oberon

One of the most relevant features of the CLR (.NET 2.0) is generics. With generics, it is now possible for the .NET languages to easily create type-safe, reusable code.

The term generics means parameterized types. A parameterized type is a class, interface, method, or delegate in which the type of data upon which it operates is specified as a parameter.

A class, interface, method, or delegate that operates on a parameterized type is called a generic class, interface, method, or delegate.
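The notion of a parameterized type can be illustrated with Python's `typing` module. This is only an analogy: the `Stack` class and `first` function are hypothetical names, and Python type parameters are checked by static type checkers rather than carried through to runtime as they are in the CLR.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # the type parameter


class Stack(Generic[T]):
    """A generic class: the element type is specified as a parameter."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


def first(items: List[T]) -> T:
    """A generic function: works for a list of any element type."""
    return items[0]


ints: Stack[int] = Stack()   # Stack instantiated with T = int
ints.push(1)
ints.push(2)
top = ints.pop()
head = first(["a", "b"])     # here T is inferred as str
```

The same `Stack` definition serves for integers, strings, or any other element type, which is the reuse that generics are meant to provide.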

Mapping Oberon-2 to CLR

Record types that are neither extensible [i.e., heirless] nor extensions of another type are implemented as value classes. If a program declares a pointer to such a record type, the pointer type is implemented as a reference class with a single field of the type of the value class.

This reference class is an explicit boxed occurrence of the embedded value class. It has at least one advantage over the automatically boxed classes manipulated by the “box” and “unbox” instructions: we may access the fields of the boxed value without unboxing.

CTS X+1

Procedures that are bound to such a record type [equivalent to a method in Oberon-2] are implemented as (non-virtual) instance methods of the value class. Procedures bound to a type that is a pointer to the record are implemented as (non-virtual) instance methods of the explicitly boxed class:

MODULE ValCls;
  IMPORT CPmain;
  TYPE
    RecTyp = RECORD c: CHAR END;
    PtrTyp = POINTER TO RecTyp;

  (* bound to the record type: instance method of the value class *)
  PROCEDURE (IN r: RecTyp) Foo(), NEW;
  END Foo;

  (* bound to the pointer type: instance method of the explicitly boxed class *)
  PROCEDURE (r: PtrTyp) Bar(), NEW;
  END Bar;

END ValCls.

There is an interesting artefact of this design. Procedures [methods] bound to the record type, and to the pointer-to-record type, are bound to the same underlying type in the source semantics but are bound to separate types in the implementation. It seems curious, but no ambiguity can arise [example].

PL0 29

ssym['+']:=plus;      ssym['-']:=minus;
ssym['*']:=times;     ssym['/']:=slash;
ssym['(']:=lparen;    ssym[')']:=rparen;
ssym['=']:=eql;       ssym[',']:=comma;
ssym['.']:=period;    ssym['#']:=neq;
ssym['<']:=lss;       ssym['>']:=gtr;
ssym['%']:=leq;       ssym['@']:=geq;
ssym['<']:=lss;       ssym['>']:=gtr;
ssym[';']:=semicolon;

PL0 30

mnemonic[lit]:='LIT '; mnemonic[opr]:='OPR ';
mnemonic[lod]:='LOD '; mnemonic[sto]:='STO ';
mnemonic[cal]:='CAL '; mnemonic[int]:='INT ';
mnemonic[jmp]:='JMP '; mnemonic[jpc]:='JPC ';
declbegsys:=[constsym,varsym,procsym];
statbegsys:=[beginsym,callsym,ifsym,whilesym];
facbegsys:=[ident,number,lparen];
RESET(in,'pl0','pgm');
err:=0; cc:=0; ll:=0; ch:=' '; kk:=al;
REWRITE(cout,'PL0','asm');
getsym;
mysys:=[period]+declbegsys+statbegsys;
block(0,0,mysys(*[period]+declbegsys+statbegsys*));
WRITELN('END COMPILATION');
IF sym<>period THEN error(9) FI;
WRITECODE; CLOSE(cout);
IF err = 0 THEN
  WRITE('CICCIO'); interpret
ELSE
  WRITE('Errors IN PL/0 PROGRAM')
FI;
WRITELN
END.

Hendren93register

Interference graph G=(V,E) (Chaitin)

Each vertex in G corresponds to a live range of a program variable.

An edge joins two vertices of the graph if the two vertices interfere, that is, if the corresponding live ranges overlap in time. More precisely, one is live at a definition point of the other.


Hen 93

Definition. An interval graph (intersection graph) G (IG = Interval Graph) is defined by a set of intervals on the line as follows:

• Each interval I is associated with a vertex v of V.
• There is an edge $e \in E$, $e = (v, w)$, if and only if the intervals $I_v$ and $I_w$, associated with v and w respectively, have a non-empty intersection: $I_v \cap I_w \neq \emptyset$.
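The definition of an interval graph translates directly into a pairwise overlap test. The sketch below is a minimal Python version that assumes closed intervals given as `(start, end)` pairs; the function name `interval_graph` and the sample live ranges are made up for illustration.

```python
from itertools import combinations


def interval_graph(intervals):
    """Build the edge set of the interval graph.

    intervals maps a vertex name to a closed interval (start, end).
    Two vertices are joined exactly when their intervals intersect.
    """
    edges = set()
    for (v, (v_start, v_end)), (w, (w_start, w_end)) in combinations(
            intervals.items(), 2):
        # Closed intervals intersect iff the later start is not past
        # the earlier end.
        if max(v_start, w_start) <= min(v_end, w_end):
            edges.add(frozenset((v, w)))
    return edges


# Hypothetical live ranges of four variables, as instruction index ranges.
ranges = {"a": (0, 4), "b": (3, 6), "c": (5, 9), "d": (10, 12)}
edges = interval_graph(ranges)
```

With these ranges only the pairs `a`/`b` and `b`/`c` overlap, so the graph has two edges and `d` is an isolated vertex.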

Hen 93

A vertex of the graph has degree k if it has k neighboring vertices (directly connected to it).

Chaitin's method colors the graph with m colors such that any two adjacent vertices have different colors.

A coloring of the interference graph with k colors defines a feasible solution with k registers.
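A common way to present Chaitin-style coloring is the simplify/select scheme: repeatedly remove a vertex of degree less than k, then color the vertices in reverse order of removal. The Python sketch below follows that scheme under simplifying assumptions: the function name `chaitin_color` and the sample graph are hypothetical, and where a real allocator would choose a live range to spill, this version simply returns `None`.

```python
def chaitin_color(graph, k):
    """Try to color graph with k colors (registers 0 .. k-1).

    graph maps each vertex to the set of its neighbors (symmetric).
    Returns a dict vertex -> color, or None when the simplify phase
    gets stuck (a real allocator would spill at that point).
    """
    remaining = {v: set(ns) for v, ns in graph.items()}
    removed = []

    # Simplify: a vertex with fewer than k neighbors can always be colored
    # later, so take it out of the graph and remember the order.
    while remaining:
        candidate = next(
            (v for v in sorted(remaining) if len(remaining[v]) < k), None)
        if candidate is None:
            return None
        for neighbor in remaining[candidate]:
            remaining[neighbor].discard(candidate)
        del remaining[candidate]
        removed.append(candidate)

    # Select: reinsert in reverse order, picking the lowest free color.
    colors = {}
    while removed:
        v = removed.pop()
        used = {colors[n] for n in graph[v] if n in colors}
        colors[v] = min(c for c in range(k) if c not in used)
    return colors


# Hypothetical interference graph: a triangle a-b-c plus d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
three = chaitin_color(graph, 3)
two = chaitin_color(graph, 2)
```

With three registers a coloring is found; with two the triangle cannot be simplified, so the function reports failure by returning `None`.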
