MAIS Project - WP5: Exploration of Alternative Architectures. Report on Activities Carried Out. Andrea Pagni, STMicroelectronics, Advanced System Architectures Group. Milan, 17-18 November 2004.

MAIS Project - WP5: Exploration of Alternative Architectures

Report on Activities Carried Out

Andrea Pagni

STMicroelectronics

Advanced System Architectures Group

Milan, 17-18 November 2004


Topics

Part 1: VLIW-SIM Overview.

Part 2: VLIW-SIM Performance.

Part 3: VLIW-SIM Library.

Part 4: Next Steps.


Part 1: VLIW-SIM Overview


Simulation Approach (1-7).

Modeled Target Architectures.

Supported platforms.

Simulation functionalities.


Simulation Approach 1/7: Overview

• Interpretative simulation approach
• Simulation technology based on a set of re-usable sub-blocks:
  - Pipeline modeling
  - Instruction execution
  - Memory modeling
  - Register file management
  - I/O simulation
• Efficient host resource allocation
• Target architecture description capability (IS, TAD)
• Challenging compromise between speed and accuracy

[Figure: simulator block diagram. Blocks: BIN loader, Executable file, Program Memory, Data Memory, Pipeline, Instruction Fetch, Instruction Decode, Instruction Execute, Instruction Set, Register File. Labels: Initialization, Load Instructions, Write Instructions into Pipeline, "Instructions are decoded and passed to next phase", "Instructions are executed and passed to next phase", Read Operands / Write Results, Load/Store, Processor Instruction Simulation.]


Simulation Approach 2/7: Pipeline modelling

During simulation, the pipeline is represented as a three-dimensional space (phase, operation, time): operation is the position of the instruction in the bundle, phase is the pipeline phase, and time is the time stamp.

[Figure: three-dimensional view of the pipeline. Axes: Phase (F, D, R, E1, E2, W), Operation (op1 to op4 and op5 to op8), Time (t-1, t, t+1).]


Simulation Approach 3/7: Pipeline modelling

The pipeline status is modelled as a two-dimensional array:

The first index is the pipeline phase and the second is the position of an instruction in the fetch packet.

The simulation process uses two such arrays, representing the current and the following pipeline status.

At each machine cycle the pipeline status is processed: for each phase, the actions required by the instructions currently in that phase are performed, and the instructions are then moved to the next pipeline phase.
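The two-array scheme described above can be sketched as follows. This is a minimal illustration, not the VLIW-SIM code: the names `empty_status`, `step` and `run` are hypothetical, the phase names and the 4-wide fetch packet are taken from the ST210 slides, and the per-phase actions are left out so that only the current/following mechanics are visible.

```python
from typing import List, Optional

PHASES = ["F", "D", "R", "E1", "E2", "W"]  # phase names as shown on the slides
ISSUE_WIDTH = 4                            # operations per fetch packet (4-issue core)

Slot = Optional[str]       # an operation name, or None for an empty slot
Status = List[List[Slot]]  # indexed as status[phase][operation]


def empty_status() -> Status:
    """Return a pipeline status with every slot empty."""
    return [[None] * ISSUE_WIDTH for _ in PHASES]


def step(current: Status, fetch_packet: List[Slot]) -> Status:
    """Build the following status from the current one (one machine cycle)."""
    following = empty_status()
    # The newly loaded fetch packet enters the first phase.
    following[0] = list(fetch_packet)
    # Each later phase receives what the previous phase held in the current status.
    # (A real simulator would perform the per-phase actions here; omitted.)
    for phase in range(1, len(PHASES)):
        following[phase] = list(current[phase - 1])
    return following


def run(packets: List[List[Slot]], cycles: int) -> Status:
    """Advance the pipeline for a number of cycles and return the final status."""
    current = empty_status()
    for cycle in range(cycles):
        packet = packets[cycle] if cycle < len(packets) else [None] * ISSUE_WIDTH
        following = step(current, packet)
        current = following  # the following status becomes the current one
    return current
```

Because `step` only reads `current` and only writes `following`, the order in which the phases are visited inside the loop does not affect the result.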


Simulation Approach 4/7: Pipeline status update

At each machine cycle the pipeline status is processed.

[Figure: two arrays, Current Status and Following Status, each with rows for the pipeline phases (F, D, R, E1, E2, W) and columns for the operations (op1 to op4), together with a "Load fetch packet" step.]


Simulation Approach 5/7: Instruction execution

Instruction execution is simulated through an Instruction Table, which holds the address of each instruction routine and the latency of each instruction.

[Figure: instruction dispatch. Labels: Dispatch unit, Decode, instr, opfield, Index, Instruction Table (index 1, 2, 3, ..., n; opcode; category), Pointer to instruction routine, Instruction execute.]
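A table-driven dispatch of this kind can be sketched as below. The opcodes, latencies, routine names and the `execute` helper are all hypothetical and chosen only for illustration; the slide states only that each table entry holds a routine address and a latency value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Regs = List[int]
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def op_add(regs: Regs, dst: int, a: int, b: int) -> None:
    regs[dst] = (regs[a] + regs[b]) & MASK32


def op_sub(regs: Regs, dst: int, a: int, b: int) -> None:
    regs[dst] = (regs[a] - regs[b]) & MASK32


def op_mul(regs: Regs, dst: int, a: int, b: int) -> None:
    regs[dst] = (regs[a] * regs[b]) & MASK32


@dataclass(frozen=True)
class InstructionEntry:
    routine: Callable[[Regs, int, int, int], None]  # plays the role of the routine pointer
    latency: int                                    # latency in cycles


# Hypothetical opcodes and latencies, not taken from any real instruction set.
INSTRUCTION_TABLE: Dict[int, InstructionEntry] = {
    0x01: InstructionEntry(op_add, 1),
    0x02: InstructionEntry(op_sub, 1),
    0x03: InstructionEntry(op_mul, 3),
}


def execute(regs: Regs, opcode: int, dst: int, a: int, b: int) -> int:
    """Look up the opcode, run its routine, and return its latency."""
    entry = INSTRUCTION_TABLE[opcode]
    entry.routine(regs, dst, a, b)
    return entry.latency
```

Adding an instruction then means adding one routine and one table entry, without touching the dispatch code.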


Simulation Approach 6/7: Register file status update

The simulation environment progressively updates the pipeline status while preserving data coherence in memory locations and in the register file.

To support data coherence, two register files are used: one holds the current register file status and the other holds the following one.

Each time an instruction is executed, its operands are read from the current register file and its results are written to the following one.

This allows parallel instruction execution to be simulated sequentially.

[Figure: the pipeline phases (F, D, R, E1, E2, W) read from the Current State register file (Reg 0 to Reg 63, time t) and write to the Next State register file (Reg 0 to Reg 63, time t+1).]
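The effect of the two register files can be illustrated with a small sketch. The function name `execute_bundle`, the tuple encoding of an operation and the two mnemonics are assumptions made for this example; only the read-from-current, write-to-following rule comes from the slide.

```python
from typing import List, Tuple

NUM_REGS = 64  # Reg 0 to Reg 63, as on the slide

# (mnemonic, destination, source 1, source 2): a hypothetical encoding
Op = Tuple[str, int, int, int]


def execute_bundle(current: List[int], bundle: List[Op]) -> List[int]:
    """Execute the operations of one bundle one after another.

    Operands are always read from `current`; results go to `following`,
    so no operation sees a result produced by another operation of the
    same bundle.
    """
    following = list(current)
    for mnemonic, dst, a, b in bundle:
        x, y = current[a], current[b]
        if mnemonic == "add":
            result = x + y
        elif mnemonic == "sub":
            result = x - y
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        following[dst] = result & 0xFFFFFFFF
    return following


regs = [0] * NUM_REGS
regs[1], regs[2] = 5, 9
# Two operations issued together: r1 = r2 + r0 and r2 = r1 + r0 (r0 holds 0).
next_regs = execute_bundle(regs, [("add", 1, 2, 0), ("add", 2, 1, 0)])
```

With a single register file updated in place, the second operation would read the new value of r1 and both registers would end up holding 9. With the two-file scheme the values are exchanged, which is what parallel execution of the bundle would produce.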


Simulation Approach 7/7: I/O simulation

[Figure: I/O handling flow. An I/O instruction (syscall) between bundle (i-1) and bundle (i+1) generates an exception. The I/O Handling routine calls save_core, calls I/O operation Selection (which in turn calls Host File System I/O execution), calls restore_core, and returns with rfi.]

I/O features specific to the target architecture are kept separate from the simulation kernel.

The SYSCALL pseudo-instruction manages the interface between the internal I/O instruction (processor side) and the file system I/O calls (OS side).

SYSCALL also handles general exception handling.

This mechanism is transparent to the other simulator modules:

Performance and data flow are not affected when no I/O operations are present.
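A rough sketch of such a SYSCALL bridge is given below. Everything specific in it is an assumption made for illustration: the `Core` class, the syscall number `SYS_WRITE`, the register convention (address in r1, length in r2, result in r0) and the use of an in-memory stream as the host side. Only the sequence save core state, select the I/O operation, execute it on the host, restore core state follows the slide.

```python
import io
from typing import Callable, Dict, List, TextIO


class Core:
    """Toy processor state: a register file and a small data memory."""

    def __init__(self) -> None:
        self.regs: List[int] = [0] * 64
        self.memory = bytearray(256)


SYS_WRITE = 1  # hypothetical syscall number


def host_write(core: Core, host_out: TextIO) -> int:
    """Host-side I/O: write a buffer from simulated memory to a host stream."""
    addr, length = core.regs[1], core.regs[2]
    host_out.write(bytes(core.memory[addr:addr + length]).decode("ascii"))
    return length


# I/O operation selection table, kept outside the simulation kernel.
IO_OPERATIONS: Dict[int, Callable[[Core, TextIO], int]] = {
    SYS_WRITE: host_write,
}


def syscall(core: Core, number: int, host_out: TextIO) -> None:
    saved = list(core.regs)           # save_core
    handler = IO_OPERATIONS[number]   # I/O operation selection
    result = handler(core, host_out)  # host file system I/O execution
    core.regs = saved                 # restore_core
    core.regs[0] = result             # assumed convention: result returned in r0


core = Core()
core.memory[16:21] = b"hello"
core.regs[1], core.regs[2] = 16, 5
out = io.StringIO()
syscall(core, SYS_WRITE, out)
```

A program that never issues SYSCALL never reaches this code, which matches the claim that performance and data flow are unaffected when no I/O operations are present.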



Modeled Target Architectures

ST210
• Multi-cluster architecture
• 4-issue VLIW core
• I/D-cache memories
• 6-stage pipeline
• RISC-like instruction set
• 64 32-bit general registers, 8 1-bit special registers

TI C62x
• 8-issue VLIW core
• Optional I-cache memory
• 11-stage pipeline
• RISC-like instruction set
• 32 32-bit general registers

TI C64x
• 8-issue VLIW core
• I/D-cache memories
• 11-stage pipeline
• RISC/SIMD instruction set
• 64 32-bit general registers


Supported Platforms

Windows OS (Visual C++):
• text mode: project file in vliw_sim/vliw_sim
• graphical mode: project file in vliw_sim/gui/gui

Windows OS (Cygwin, gcc):
• text mode: makefile in vliw_sim/vliw_sim
• graphical mode (with XWindows on Cygwin)

Linux OS (RedHat, gcc):
• text mode: makefile in vliw_sim/vliw_sim
• graphical mode: makefile in vliw_sim/gui/gui

Sun OS (Solaris, gcc):
• text mode: makefile in vliw_sim/vliw_sim
• graphical mode: makefile in vliw_sim/gui/gui

Directories under vliw_sim:
• bin_loader
• cache
• gui/gui
• instruction_set
• io_interf
• memory
• pipeline
• profdebug
• registers
• vliw_sim
• vliw_sim_dll


Simulation functionalities

Debug Support
• Step-by-step execution
• Breakpoint
• Register & Memory access
• Pipeline visibility (instructions & addresses)

Profiling
• Application code region profile
• Statistics extraction for profiled code

Simulator Dynamic Library
• Simulation API
• SoC simulation facilities

Exception Handling simulation

Efficient I/O interface simulation


Part 2: VLIW-SIM Performance


Tested Applications.

SW apps on ST210.

SW apps on TI C62x.

SW apps on TI C64x.

SW apps on ST210 (1-2).


Tested Applications

ST210:
• MPEG-2 Intra Video Encoder (0.2 s, 5 frames, 15 Mbit/s)
• MPEG-1 Layer 2 Audio Encoder (1 s, 32 kHz, 256 kbit/s)
• MPEG-2 M=3 MP@ML Video Decoder (1 s, 25 frames/s, 15 Mbit/s)
• MPEG-4 QCIF SP@L3 Video Decoder (1 s, 25 frames/s, 512 kbit/s)
• MPEG-4 QCIF SP@L3 Video Encoder (27 frames, 64 kbit/s, QP=12)
• H.263+ QCIF Video Encoder (10 frames, no rate control)
• G.723.1 Audio Enc-Dec (20 frames, 8 kHz, 5.3 kbit/s)
• Automatic Speech Recognition (HMM, 5 words, 8 MEL, 50 active words)

TI C62x & C64x:
• H.263+ QCIF Video Encoder (5 frames, no rate control)
• G.726 Audio Enc-Dec (10 frames, 8 kHz, 32 kbit/s)


SW apps on TI C62x

H.263+ Video Enc (5 fr QCIF, no rate control), with caches

Simulator               CPU time (s)  MOPS  Bundles   Operations  I$ misses  D$ misses
ME2DYA ISS              104           0.61  36670008  63448050    -          -
TI C62x Fast Simulator  509           0.12  36670011  -           -          -
Difference              -             -     3         -           0          0

G.726 Audio Enc-Dec (10 fr, 8 kHz, 32 kbps), with caches

Simulator               CPU time (s)  MOPS  Bundles   Operations  I$ misses  D$ misses
ME2DYA ISS              7             0.59  2366855   4164831     -          -
TI C62x Fast Simulator  35            0.12  2366858   -           -          -
Difference              -             -     3         -           0          0

A dash marks a value not reported on the slide.

Simulation platform: Pentium II 400 MHz, 128 MB RAM, Windows NT4 SP6

Operation = one syllable (elementary 32-bit RISC instruction)


SW apps on TI C64x

H.263+ Video Enc (5 fr QCIF, no rate control), with caches

Simulator   CPU time (s)  MOPS  Bundles   Operations  I$ misses  D$ misses
ME2DYA ISS  95            0.55  28533664  51919843    330117     684905
C64_CSIM    897           0.06  28533667  -           387578     770661
Difference  -             -     3         -           57461      85756

G.726 Audio Enc-Dec (10 fr, 8 kHz, 32 kbps), with caches

Simulator   CPU time (s)  MOPS  Bundles   Operations  I$ misses  D$ misses
ME2DYA ISS  8             0.52  2316134   4123399     45277      4286
C64_CSIM    78            0.05  2316137   -           60623      4286
Difference  -             -     3         -           15346      0

A dash marks a value not reported on the slide.

Simulation platform: Pentium II 400 MHz, 128 MB RAM, Windows NT4 SP6

Bundle = one or more syllables (max 8 for TI C6xx, max 4 for ST210) per clock cycle


SW apps on ST210 1/3

MPEG-1 Layer 2 Audio Enc (1 s, 32 kHz, 256 kbps), with caches

Simulator      CPU time (s)  MOPS  Bundles   Operations  I$ misses  D$ misses
ME2DYA ISS     33            1.69  22244270  55912864    44833      2576
HP ISS v 2.34  36            1.55  22250327  55922242    47087      101529
Difference %   -             -     0.03      0.02        4.79       97.46

MPEG-4 QCIF Video Dec (1 s, 25 fr/s, 512 kbps), with caches

Simulator      CPU time (s)  MOPS  Bundles   Operations  I$ misses  D$ misses
ME2DYA ISS     27            1.63  20937469  43948023    110051     157526
HP ISS v 2.34  27.2          1.62  20946900  43945855    103745     227471
Difference %   -             -     0.05      0.00        -6.08      30.75

MPEG-2 Intra Video Enc (5 fr, 15 Mbps), with caches

Simulator      CPU time (s)  MOPS  Bundles   Operations  I$ misses  D$ misses
ME2DYA ISS     64            1.92  41767235  122788664   27438      412745
HP ISS v 2.34  67            1.83  41793358  122782323   27537      446544
Difference %   -             -     0.06      -0.01       0.36       7.57

Simulation platform: Pentium II 800 MHz, 256 MB RAM, Windows 2000

HP ISS configured with:
• ignore_non_cacheable_areas TRUE
• profile_gprof_on FALSE
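The DIFFERENCE% figures on this slide are reproduced by taking the HP ISS value as the reference, that is (HP - ME2DYA) / HP x 100. The slide does not state this formula; it is inferred from the numbers, and the function name below is hypothetical. A short check using the MPEG-1 Layer 2 Audio Encoder counters:

```python
def difference_percent(me2dya: float, hp: float) -> float:
    """Relative difference in percent, with the HP ISS value as reference (inferred)."""
    return (hp - me2dya) / hp * 100.0


# (ME2DYA ISS, HP ISS v 2.34) counters for the MPEG-1 Layer 2 Audio Encoder run
mpeg1_audio = {
    "bundles": (22244270, 22250327),
    "operations": (55912864, 55922242),
    "I$ misses": (44833, 47087),
    "D$ misses": (2576, 101529),
}

for name, (ours, reference) in mpeg1_audio.items():
    print(f"{name}: {difference_percent(ours, reference):.2f} %")
```

Rounded to two decimals this gives 0.03, 0.02, 4.79 and 97.46, matching the row on the slide. A negative value means that ME2DYA ISS reported the larger count.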


SW apps on ST210 2/3

MPEG-2 Video Decoder MP@ML (Renata, M=3, 25 fr/s), with caches

Simulator      CPU time (s)  MOPS  Bundles    Operations  I$ misses  D$ misses
ME2DYA ISS     361           1.93  210332372  698079303   114211     2149829
HP ISS v 2.34  365           1.91  210408042  698072522   133596     2704002
Difference %   -             -     0.04       0.00        14.51      -

MPEG-4 QCIF Video Enc SP@L3 (27 fr, 64 kbps, QP=12), with caches

Simulator      CPU time (s)  MOPS  Bundles    Operations  I$ misses  D$ misses
ME2DYA ISS     829           1.44  638253918  1190289114  3387100    186212
HP ISS v 2.34  740           1.61  638264837  1190299380  3427876    406397
Difference %   -             -     0.05       0.00        -6.08      -

A dash marks a value not reported on the slide.

Simulation platform: Pentium II 800 MHz, 256 MB RAM, Windows 2000

HP ISS configured with:
• ignore_non_cacheable_areas TRUE
• profile_gprof_on FALSE


SW apps on ST210 3/3

H.263+ Video Enc (10 fr QCIF, no rate control), with caches

Simulator      CPU time (s)  MOPS  Bundles    Operations  I$ misses  D$ misses
ME2DYA ISS     24            2.93  30381273   70272426    118303     33190
HP ISS v 2.22  29            2.42  30381792   70272489    116161     129534
Difference     -             -     519        63          -2142      96344

G.723.1 Audio Enc-Dec (20 fr, 8 kHz, 5.3 kbps), with caches

Simulator      CPU time (s)  MOPS  Bundles    Operations  I$ misses  D$ misses
ME2DYA ISS     11            2.99  12323550   32934110    25160      791
HP ISS v 2.22  12.8          2.57  12323977   32934196    25229      3529
Difference     -             -     427        86          69         2738

Speech Rec. (5 words, 8 Mels, 50 active words), with caches

Simulator      CPU time (s)  MOPS  Bundles    Operations  I$ misses  D$ misses
ME2DYA ISS     305           2.86  386584108  872624864   349878     1372934
HP ISS v 2.22  360           2.42  386585313  872624303   354724     2845644
Difference     -             -     1205       -561        4846       1472710

Simulation platform: Pentium IV 2.00 GHz, 256 MB RAM, Windows 2000 SP3

MOPS = Millions of Operations Per Second
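The MOPS column is consistent with dividing the simulated operation count by the host CPU time. The small helper below (a hypothetical name, used only for this check) recomputes the H.263+ figures from this slide:

```python
def mops(operations: int, cpu_time_s: float) -> float:
    """Simulation speed in millions of simulated operations per host CPU second."""
    return operations / cpu_time_s / 1e6


# H.263+ Video Enc, 10 frames QCIF: operation counts and CPU times from the slide
me2dya_speed = mops(70272426, 24)  # ME2DYA ISS
hp_speed = mops(70272489, 29)      # HP ISS v 2.22

print(f"ME2DYA ISS: {me2dya_speed:.2f} MOPS, HP ISS: {hp_speed:.2f} MOPS")
```

The results round to 2.93 and 2.42 MOPS, the values reported in the table.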


Part 3: VLIW-SIM Library


VLIW-SIM Library (1-2).


VLIW-SIM Library 1/2

VLIW-SIM can be built either as a stand-alone simulator or as a dynamic library (DLL).

• The library form is extremely useful for interfacing VLIW-SIM with other applications (system-on-chip simulation environment, graphical user interface, etc.).

The functionalities exported by the simulator fall into two groups:

• Command functionalities: used to control the simulation (Run, Stop, Insert/Remove breakpoint, Continue, Step, etc.)

• Status functionalities: used to retrieve the simulator's internal status and resource allocation (pipeline status and size, register file content and size, etc.)


VLIW-SIM Library 2/2

The simulator DLL exports the following functionalities:

Control functions
• Load
• Init
• Step / Step N / Stall
• Run
• Restart

Debug support
• View simulator status (Pipeline, Register File, Memory)
• Breakpoint

Utility functions
• Code profiling
• Simulated program arguments
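To show how a host application might drive such an interface, here is a toy stand-in written for this transcript. It is not the VLIW-SIM DLL: the class `ToySimulator`, its method names and its behaviour are assumptions that only mirror the function groups listed above (control, debug support), and the simulated "program" is just a list of labels.

```python
from typing import Dict, List, Set


class ToySimulator:
    """Hypothetical facade mirroring the control and debug function groups."""

    def __init__(self) -> None:
        self.program: List[str] = []
        self.pc = 0
        self.cycles = 0
        self.breakpoints: Set[int] = set()

    # Control functions
    def load(self, program: List[str]) -> None:
        self.program = list(program)

    def init(self) -> None:
        self.pc = 0
        self.cycles = 0

    def step(self, n: int = 1) -> None:
        """Advance by up to n bundles, stopping at the end of the program."""
        for _ in range(n):
            if self.pc >= len(self.program):
                break
            self.pc += 1
            self.cycles += 1

    def run(self) -> None:
        """Run until a breakpoint is reached or the program ends."""
        while self.pc < len(self.program):
            self.step()
            if self.pc in self.breakpoints:
                break

    def restart(self) -> None:
        self.init()

    # Debug support
    def set_breakpoint(self, address: int) -> None:
        self.breakpoints.add(address)

    def status(self) -> Dict[str, int]:
        return {"pc": self.pc, "cycles": self.cycles}


sim = ToySimulator()
sim.load(["b0", "b1", "b2", "b3"])
sim.init()
sim.set_breakpoint(2)
sim.run()                  # stops at the breakpoint
at_breakpoint = sim.status()
sim.run()                  # continues to the end of the program
at_end = sim.status()
```

A graphical front end or a system-on-chip environment would call the same kind of entry points, which is the use case named on the previous slide.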


Part 4: Next Steps


Where we are.

VLIW-SIM Developments.


Where we are

Released versions 2.0 and 3.0 of VLIW-SIM.

A lot of SW engineering work to improve:
• Modularity
• Readability (doxygen-generated documentation)
• Simulation speed
• Architectural accuracy:
  - ST210: IPU, DPU, Interrupt Controller, Core Memory Controller, I-cache, D-cache
  - TI C6x: I-cache and D-cache for CPU style, program memory and data memory for DSP style
• Accurate and non-invasive flat profiling (GNU format compatible)
• Flexible architectural re-configurability
• Host platform independence
• Future integration into high-level system tools


VLIW-SIM developments

• ST220 accurate modelling
• Integration into the MaxSim system simulation tools and related experiments


The End

Questions?