Rev2 HPPS Project 2007

Post on 13-Jan-2015


POLITECNICO DI MILANO

High Performance Processors and Systems

PdM – UIC joint master 2007

Instructor: Prof. Donatella Sciuto

HPPS @ PdM – March 2007

Outline

DReAMS: Alessandro Panella, Matteo Murgida

CITiES: Alessio Montone, Alessandro Meroni, Simone Corbetta

Operating System: Ivan Beretta

Design Flow: Antonio Piazzi

Polaris: Massimo Morandi, Marco Novati

HLR: Marco Maggioni


Dynamic Reconfigurability Applied to Multi-FPGA Systems

DReAMS

Branch of the DRESD project
  Inherits its architectures and tools

Automatic workflow from VHDL system description to FPGA implementation
  VHDL parsing and system simulation
  System creation over a specific architecture
  Bitstream creation and download onto FPGAs

Multi-FPGA Partitioning

Alessandro Panella, alessandro.panella@dresd.org

Project Organization

First Phase (15 Mar – 15 Apr) [DONE]
  Goals:
    State of the art analysis
    Proposed approach: basic idea

Second Phase (15 Apr – 15 May) [PARTIALLY DONE]
  Goal:
    Partitioning algorithm: development and implementation

Third Phase (15 May – 15 June) [TODO]
  Goals:
    Algorithm experimental evaluation
    Physical evaluation using the DReAMS architecture

Partitioning

Two kinds of multi-FPGA partitioning:

Topology-aware
  Architecture topology is an input
  No optimization of the number of FPGAs
  Association between the (larger) system graph and the (smaller) architecture graph => PARTITIONING

Topology-free
  Architecture topology is not provided
  Input: dimension and communication features of the FPGAs
  Minimization of the number of FPGAs
  Place and route after partitioning

The algorithm (1)

Copes with the topology-free problem

Structural approach
  Exploits the design hierarchy
  Tries to preserve module integrity
  Several advantages, less work to be done

Objectives
  Minimize the number of FPGAs
  Minimize inter-FPGA communication

Greedy set-covering algorithm

The algorithm (2)

Nodes can be COVERED, UNCOVERED, or PARTIALLY COVERED
Stop condition: TOP = COVERED
When exploring the tree, siblings take precedence over children => keeps module integrity

Procedure cover(set of nodes)
  Called recursively, starting from TOP
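
The recursive cover procedure can be illustrated with a minimal Python sketch. This is not the project's C++ implementation: the `Node` class, the single `capacity` value shared by all FPGAs, the largest-module-first ordering and the first-fit choice of FPGA are all assumptions made for illustration, and only leaf modules are assumed to carry area. A module that fits is placed whole (COVERED); a module that is too large is split only after all of its siblings have been handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A module in the design hierarchy (hypothetical structure)."""
    name: str
    area: int = 0                      # own area; assumed non-zero only for leaves
    children: list = field(default_factory=list)

    def total_area(self):
        return self.area + sum(c.total_area() for c in self.children)


def cover(nodes, capacity, free, placement):
    """Greedily cover `nodes`; siblings are handled before any child."""
    deferred = []
    for node in sorted(nodes, key=Node.total_area, reverse=True):
        need = node.total_area()
        # First already-open FPGA with enough room, if any.
        target = next((i for i, room in enumerate(free) if room >= need), None)
        if target is None and need <= capacity:
            free.append(capacity)      # open a new FPGA
            target = len(free) - 1
        if target is not None:
            free[target] -= need
            placement[node.name] = target   # node is COVERED as a whole
        elif node.children:
            deferred.append(node)      # too big: split it after its siblings
        else:
            raise ValueError(f"leaf {node.name} exceeds FPGA capacity")
    for node in deferred:
        cover(node.children, capacity, free, placement)
    return placement


def partition(top, capacity):
    """Return {module name: FPGA index} for the hierarchy rooted at `top`."""
    return cover([top], capacity, [], {})
```

Calling `partition(top, capacity)` starts the recursion from TOP, as on the slide; the number of FPGAs used is one more than the largest index in the returned mapping. Inter-FPGA communication is not modeled in this sketch.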

What’s next?

Data structure development
Algorithm C++ implementation
First verification and “tuning”
Obtain hierarchical trees from the synthesis tool (Synplify)
Verification
Physical evaluation
  Linked with the other branch of DReAMS


Chimera

Multi-FPGA Architecture Definition

Matteo Murgida, matteo.murgida@dresd.org

Project Organization

1st Phase
  Goals:
    Study of the Digilent Spartan-3 Starter Board
    Connection between boards

2nd Phase
  Goals:
    Communication between two MicroBlaze soft processors
    GPIO integration in the architecture

3rd Phase
  Goals:
    Interrupt handling
    Design of a simple distributed application to verify the correctness of the proposed approach

Second Phase: results (1/2)

Communication between two MicroBlaze soft processors
Development of a display controller to visualize the data flow

Second Phase: results (2/2)

Higher architecture portability through the use of the GPIO IP-Core.

What’s next ...

Interrupt handling, also through the use of the Interrupt Controller

Development of a simple application to verify the correctness of the proposed approach


CITiES

Processing Elements REconfiguration In Reconfigurable Architectures

Alessio Montone, alessio.montone@dresd.org

Second Phase Goals

Create a software tool that
  takes as input a .bmm file (BRAMs used) and an .elf file (code)
  outputs a memory configuration bitstream
  is device parametric
  is tailored for Xilinx Virtex-II Pro family FPGAs

Second Phase: results - I

Second Phase: results - II

Second Phase: results - III

Output binary file is a downloadable bitstream

Target FPGA  Processor   #BRAM blocks  #BRAM columns involved  marBram execution time (ms)  Commands overhead (approx. %)  Bitstream size (Kbytes)
VP7          MicroBlaze  4             2                       179                          1.5                            56
VP7          PPC-405     8             3                       203                          1.5                            84
VP7          MicroBlaze  8             5                       263                          1.5                            136
VP20         PPC-405     8             3                       248                          1.5                            112
VP20         MicroBlaze  8             5                       326                          1.5                            160
VP20         MicroBlaze  16            5                       326                          1.5                            160

(on a Core 2 Duo @ 2.33 GHz)

What’s next…

Third phase in detail
  Perform functional tests on a single output bitstream
    Debug both the bitstream structure and the software structure
  Test a complete processing element
    Configure it independently from the rest of the architecture
    Swap its memory content


Reconfiguration Oriented Metrics

Alessandro Meroni, alessandro.meroni@dresd.org

Second Phase Objectives

Real-world applications analysis
  Applications analysis
  Common scenarios identification
  Characteristics evaluation

Metrics evaluation through graphs, supported by a prototype analyzer (C/C++)
  Performance/Area
  Master/Slave

Analysis of different network simulators
  NS-2
  OMNeT++
  SSFNet
  OPNET

Application Analysis

It’s possible to make a classification that binds together the majority of these applications:

Metrics Evaluation

Different metrics must be considered for different scenarios
  Which FPGAs?
  How many elements?
  Which configuration?

So far, there is a qualitative estimate of the trends of some metrics, supported by a prototype analyzer
  Throughput and area w.r.t. the number of elements in the system (Master/Slave)
  No configuration information
  No FPGA information
  ...

Simulators Analysis

NS-2
  Good hardcoded modules
  Poor flexibility
  Models are “flat”: subnetworks cannot be created
  Difficult separation of concepts: different parameters in the same Tcl script

OMNeT++
  Good not only for networks (also MP systems and hardware architectures)
  Very flexible
  Support for hierarchical module structure
  Enforces the separation between model and experiments
    All parameters in the omnetpp.ini file

SSFNet
  No longer supported: last release on January 15, 2004

OPNET
  Not free

Next Phase...

Simulator exploitation
  Use of OMNeT++ to gather information on throughput and other useful metrics

Graphics Redefinition and Expansion

Analyzer Improvement


REconfigurable Communication Infrastructure For Embedded-systems

Simone Corbetta, simone.corbetta@dresd.org

April 2007/May 2007: objectives

Extend survey

Reconfigurable communication infrastructure exploration
  De Micheli Verilog description analysis
  XPIPES architecture analysis
  XPIPES synthesis on Xilinx FPGAs
    Area requirements

Applications and scenarios of dynamic reconfigurability

Communication infrastructure model
  First ideas
  Basis for next-step implementation

April 2007/May 2007: work (1/3)

XPIPES Architecture
  Layered approach to decouple communication from computation
  Network switches and network interfaces

XPIPES Methodology
  XpipesCompiler used to automatically generate a synthesizable Verilog-based architecture

Table 1: Area requirements of a single-master/single-slave Network-on-Chip

April 2007/May 2007: work (2/3)

Scenarios and applications
  RATIONALE: a concrete term of comparison is needed to evaluate the performance of our solution w.r.t. third-party ones
    NO existing standard benchmark!

  Different applications and market segments
    Automotive
    Aerospace & defense
    Industrial
    Scientific & medical

April 2007/May 2007: work (3/3)

Communication infrastructure model (first ideas)

Layered approach
  Flexibility and independent optimization

Decoupling communication from computation
  Switching and interfacing elements are crucial

Physical and logical addressing methods
  Useful for task relocation

Adaptive architecture
  Achieving fault tolerance

Integrable with legacy systems
  A bridge is required

Plugging IP-Cores in and out

May 2007/June 2007: objectives

XPIPES
  Possible improvements in the context of dynamic reconfiguration

Implementation (Verilog)

Basic essential elements for the communication infrastructure (reconfigurable switch)

Testing


Operating System support for Reconfigurable SoC

Development of an OS architecture-independent layer for dynamic reconfiguration

Ivan Beretta, Ivan.beretta@dresd.org

Project Overview

Study of current operating system support for dynamically reconfigurable architectures
  Two solutions inside the DRESD group

Definition of an intermediate layer for dynamic reconfiguration support
  Architecture independent
  Distribution independent

Second Phase: Goals

Implementation of the DRESD operating system solution
  Old kernel recovery
  Hardware architecture replication using ISE and EDK version 9.1, on Xilinx Virtex-II Pro VP7

Layer definition
  Comparison between existing solutions
  Basic definition of the boundaries of the new intermediate layer

Second Phase: Results (1 of 2)

Recovery of the DRESD solution for Caronte
  Static hardware architecture
  Bootmanager recovery
  Bootstrap from flash memory
  Base kernel

Hardware architecture upgrade
  New synthesis tools (Xilinx ISE and EDK 9) and new cores

Kernel compilation
  Recovery of dynamic-reconfiguration support

Second Phase: Results (2 of 2)

Basic definition of the architecture-independent layer

Factorization of existing solutions
  Interface to the reconfiguration controller driver
  Address space manager module
  Driver loader module
  Core caching and placement module

Introduction of new elements
  Reconfiguration scheduler

What’s next…

Third phase:

Complete definition of the boundaries of the new intermediate layer

Full implementation of the existing DRESD solutions
  Module-based reconfigurable architecture on Virtex-II Pro VP7
  Synthesis flow based on Xilinx ISE and EDK 8.2 and 9.1
  Porting of the YaRA solution to Virtex-II Pro VP7


Design Flow

Antonio Piazzi, antonio.piazzi@dresd.org

Project Organization

1st phase (15 March – 15 April): Budgeting
  Study of the state of the art

2nd phase (15 April – 15 May): Realization phase
  Construction of the complete tool from previously separate tools
  Implementation of an innovative workflow

3rd phase (15 May – 15 June): Project validation
  Validation on a real architecture and performance evaluation

Second Phase: results

Output files: system.vhd, inserted device wrappers, NGC project files

system.vhd decomposition (ArchGen based)
  Output files: fix.vhd and top.vhd

Communication infrastructure generation (ComIC based)
  Output files: <file name>.nmc, <file name>.xdl

Collect information about the communication infrastructure from the xdl file
  Output file: port.cfg

Adding information to top.vhd

Start the related flow tool

Generation of the UCF file

Second Phase: results

Previously existing basic tools:
  ArchGen
  ComIC
  YaRA script
  InCA script

Generated tool
  Editing of the ArchGen output file (top.vhd)
  Parsing of the xdl file to collect information on the bus macros
  Translation of the YaRA script into a sequence of C++ instructions to be included in the Earendil tool chain

State of progress

[Figure: tool phases (Planning, VHDL generation, UCF and communication infrastructure generation, Bitstream generation, Merging phase), each marked as a manual process or an automated process]

What’s next…

Automated switching
  The tool must be able to recognize, from the device type, the kind of communication infrastructure to create and the appropriate design flow

Upgrade of the communication infrastructure with a deep integration of the ComIC tool in the project
  ComIC may be considered an extension of ArchGen; this leads to a different approach that removes the need to parse the top file and the xdl file, which describe the bus

Patch for ComIC to create a Wishbone-compatible bus
  The idea is to create a complete bus which presents all the signals defined by the Wishbone protocol


Polaris

Create an integrated HW/SW system to manage 2D reconfiguration

SW side:
  Maintain information on the FPGA status
  Decide how to allocate tasks efficiently

HW side:
  Provide support for effective task allocation
  Perform 2D bitstream relocation

Effects of 2D Reconfiguration in a Reconfigurable System

Massimo Morandi, massimo.morandi@dresd.org

2nd Phase Goals

Definition of a 2D reconfiguration allocation manager:
  Evaluation of the desired features
  Definition of its structure

State of the art analysis:
  Investigation of literature solutions
  Comparison of their costs, effectiveness, versatility… to propose a novel solution that represents a good compromise

Allocation manager

Allocation manager desired features:
  Low TRR
  Low management overhead
  High routing efficiency
  Low fragmentation

Allocation manager structure:
  Empty space manager
    Complete space
    Heuristic selection
  Fitter
    General (FF, BL, BF, WF…)
    Focused (FA, RA…)
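
To make the role of a general fitter concrete, here is a minimal Python sketch, assuming that FF, BF and WF stand for the usual first-fit, best-fit and worst-fit policies and that the empty space manager hands the fitter a plain list of free rectangles. The `Rect` type and the function names are hypothetical, not part of the project.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """An empty rectangle on the device grid (hypothetical representation)."""
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self):
        return self.w * self.h


def candidates(empty, w, h):
    """Empty rectangles large enough to host a w-by-h task."""
    return [r for r in empty if r.w >= w and r.h >= h]


def first_fit(empty, w, h):
    # First rectangle in list order that can host the task.
    fits = candidates(empty, w, h)
    return fits[0] if fits else None


def best_fit(empty, w, h):
    # Smallest rectangle that still fits: leaves the least leftover area.
    return min(candidates(empty, w, h), key=lambda r: r.area, default=None)


def worst_fit(empty, w, h):
    # Largest rectangle: leaves the biggest leftover region.
    return max(candidates(empty, w, h), key=lambda r: r.area, default=None)
```

All three policies look only at rectangle sizes, which is why they are "general": none of them takes into account where the task's communication partners are placed.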

Most relevant works

Maintain complete information on empty space:
  KAMER:
    Keep All Maximally Empty Rectangles
    Apply a general fitting strategy
  CUR:
    Maintain the Contour of a Union of Rectangles
    Apply a focused fitting strategy

Heuristically prune part of the information:
  KNER:
    Keep Non-overlapping Empty Rectangles
    Apply a general fitting strategy
  2D-HASHING:
    Keep Non-overlapping Empty Rectangles in an optimized data structure
    Apply (exclusively) a general fitting strategy

Evaluation

High placement quality => high complexity

Lowest complexity => no focused fitting (which is bad especially for routing)

Next Phase

The chosen approach is heuristic (KNER-like), but with a fitting strategy focused on minimizing routing costs

To be done:
  Clearly define the interface for the allocation manager
  Design the KNER-like empty space manager
  Integrate a routing-aware fitting strategy (with a Manhattan distance metric)
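
A routing-aware fitting strategy with a Manhattan distance metric could look like the following Python sketch. It is an illustration under stated assumptions, not the allocation manager's actual design: empty rectangles are `(x, y, width, height)` tuples, the task is anchored at the origin of the chosen rectangle, and routing cost is approximated by the summed Manhattan distance between the task's center and the centers of the already placed tasks it communicates with (`peers`).

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def routing_aware_fit(empty, w, h, peers):
    """Pick the empty rectangle minimizing total Manhattan distance to peers.

    empty: list of (x, y, width, height) free rectangles
    peers: list of (x, y) centers of placed tasks the new task talks to
    Returns the (x, y) origin chosen for the task, or None if nothing fits.
    """
    best_origin, best_cost = None, None
    for (x, y, rw, rh) in empty:
        if rw < w or rh < h:
            continue                      # task does not fit here
        center = (x + w / 2, y + h / 2)   # task anchored at the rectangle origin
        cost = sum(manhattan(center, p) for p in peers)
        if best_cost is None or cost < best_cost:
            best_origin, best_cost = (x, y), cost
    return best_origin
```

With no peers every candidate has cost zero, so the function degenerates to first fit; ties are always resolved in favor of the earlier rectangle in the list.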


Relocation for 2D Reconfigurable Systems

Marco Novati, marco.novati@dresd.org

Goals of 2nd phase

Implementation of BiRF²:
  Define the functionality:
    Create the new bitstream parser
    Determine formulae for:
      – FAR calculation
      – CRC calculation
  Design the structure of BiRF²
  Hw implementation

New Parser

CRC Calculation

Particular CRC value, used by Xilinx tools

Two versions of BiRF Square:
  Using the “predefined” value
  With actual CRC calculation
    An optimized algorithm has been used

Synthesis results

On a Virtex-4 with speed grade -12
  General purpose version: max frequency of 160 MHz
  Specific version: max frequency of 290 MHz

What’s next…

Simulation of BiRF Square

Interfacing with the OPB bus

Creation of a toy architecture for the validation

Actual validation on the new Virtex-4


High Level Reconfiguration

Marco Maggioni, marco.maggioni@dresd.org

Project Organization

First Phase
  Time window: 1st month
  Goal: Clustering

Second Phase
  Time window: 2nd month
  Goal: Coloring

Third Phase
  Time window: 3rd month
  Goal: Scheduling

[Figure: project flow. Labels: Gcc Frontend, PandA, Circuit Representation, Partitioning Algorithm, Isomorphic, Clustered Graph, Metric (Area, Latency, Rec. Time, Power), Target Architecture Database, Reconfigurable Clustered Graph, Scheduling Algorithm]

Second Phase: Coloring

Theoretical work
  From clusters to the reconfigurable graph
    Definition of the interfaces for the Coloring phase
    Study of a metric for cluster execution time

Implementation of the Coloring phase
  Coloring based on the delay of nodes
    Applied to the results of isomorphic clustering

GraphGen on Earendil
  Produces a graph from the specification
  Automatically integrated with PandA

Second Phase: Coloring

Adds useful information for the next steps
  Execution time is mandatory for scheduling
  Area/Power/Rec. Time can optimize the final result
  Based on a target architecture
    Interchangeable metrics

[Figure: clustered graph annotated with Latency (needed) and with Area, Rec. Time and Power (useful)]
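
As an illustration of what coloring adds to the clustered graph, the Python sketch below attaches metrics to each cluster by looking up its nodes in a per-architecture database. Everything here is hypothetical: the `ClusterMetrics` record, the dictionary-based database, and in particular the choice of summing node delays to obtain the cluster latency, which assumes the nodes of a cluster execute one after another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterMetrics:
    latency: int                       # execution time: needed by the scheduler
    area: Optional[int] = None         # optional metrics: can improve the result
    rec_time: Optional[int] = None
    power: Optional[float] = None


def color(clusters, database):
    """Attach metrics to each cluster by looking up its node types.

    clusters: {cluster name: list of node (operation) types}
    database: {node type: ClusterMetrics} for one target architecture
    """
    colored = {}
    for name, node_types in clusters.items():
        entries = [database[t] for t in node_types]
        colored[name] = ClusterMetrics(
            # Assumption: nodes run sequentially, so delays add up.
            latency=sum(e.latency for e in entries),
            area=sum(e.area or 0 for e in entries),
        )
    return colored
```

Because the database is just an argument, switching target architecture or swapping in a different metric only changes the lookup table, which is one way to read "interchangeable metrics".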

Second Phase: GraphGen

Basically a tool for graph generation (DFG, SDG, CDF, BB)...

Writes .dot files...

Some benchmarks:
  AES
  Whetstone
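
For readers unfamiliar with the .dot format, the Python sketch below shows how a small directed graph such as a DFG can be serialized to DOT text. It is not GraphGen's code; the `to_dot` function and the edge-list input are assumptions made for illustration.

```python
def to_dot(name, edges):
    """Render a directed graph as DOT text.

    edges: iterable of (source, destination) node-name pairs
    """
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quote node names so that arbitrary labels stay valid DOT identifiers.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file such as `dfg.dot` and rendered with Graphviz, for example with `dot -Tpng dfg.dot -o dfg.png`.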

What’s next…

Third phase in detail

Apply reconfigurable scheduling
  Adapts the specification to the reconfigurable architecture
  Uses the information obtained from coloring
  Different algorithms are possible

Define a schedule result structure

Implement the Salomone algorithm

Publish the entire work on Earendil

Questions