Studies and Experiences on the Design Methodology with Network Processors

Laura Vellante

Limited Internal 2

Summary

• Reference scenario for the ALL-IP convergence network

• Design methodology for a re-configurable node based on Network Processors

• Activity carried out within CoRiTeL. Data plane: implementation of a packet aggregator for Ethernet and ATM

Reference network scenario

Smooth evolution towards an ALL-IP context

[Diagram: access segments (PSTN access, IP BB access, 2G/3G, WLAN/WiMax, wireline access) connect through nodes such as BSC, RNC, MSG, BRAS, WSN, MGW and GGSN, across the PSTN and PLMN domains, over a common transport backbone]

• Convergence: multi-service networks and nodes are required to support different technologies and services

• Low cost: co-existence with the existing infrastructure

• Flexibility

Flexibility: an example 1/2

Today: ATM support

Near future: coexistence of ATM & Ethernet

Future: Ethernet support

Evolution steps towards new functionality & coexistence with the existing ones

Flexibility: an example 2/2

The evolution towards IP-based functions is not clear: it depends on the type of node.

[Diagram: depending on the node, ATM traffic and Ethernet traffic are served by ATM switching with QoS and Ethernet switching with QoS, either combined in one node or kept in separate nodes]

Low-cost key issues:
• Minimize time to market (TTM)
• Maximize time in market (TIM)

Network equipment requirements:
• High reusability of the HW platform
• High re-configurability

Network Processor: a consistent answer to meet such requirements.

Network Processors

• A programmable processor optimized for use in networking devices, providing steady-state packet-processing functionality
• High flexibility and programmability
• Complex to implement while keeping the TTM short and the performance high


Network Processor: a Wide World

• THE Network Processor does not exist: there are many families with different characteristics.

• Existing Network Processors can be classified into two groups:
1. High programmability but complex design
2. Easier design but a low level of flexibility (dedicated hardware)

NP: the same name for different families

AGERE: very high throughput for switching with QoS
– Not flexible for implementing proprietary functions
– Not suitable for protocol handling

INTEL / FREESCALE C-Port family: flexible for implementing proprietary functions
– Complex to implement
– Less performing for switching-with-QoS functions

An example of two NP families.

Network Processors: feasible applications

Agere (data path, OSI layers 2-3)
– Main features: low programmability level (dedicated hardware), high performance for QoS applications
– Available applications: switching, routing, policing, scheduling, QoS applications
– Limitations: not flexible for implementing proprietary functions and not suitable for protocol handling

Intel (OSI layers 2-7)
– Main features: high programmability, high performance
– Available applications: switching, routing, policing, scheduling
– Limitations: complex to implement, lower performance than Freescale

Mindspeed (OSI layers 2-7)
– Main features: medium programmability level, high performance
– Available applications: switching, routing, policing, scheduling
– Limitations: lower performance than Freescale

Freescale C-5/C-5e (OSI layers 2-7)
– Main features: high programmability, high performance
– Available applications: switching, policing, MPLS, and IMA for wireless applications in RBS, RNC (WNI IP and WNI ATM applications)
– Limitations: complex to implement, less performing for QoS applications

Activity on NP

1. Definition of a SW design methodology, and implementation of a case study on a suitable NP platform in order to test the methodology and to build the NP design expertise required to define an automatic code-generation process.

2. SCTP on Freescale C5 NP

3. Packet Aggregator with Agere NP

SCTP on Freescale C5

• SCTP: layer-4 application
• Freescale C5: 16 non-dedicated processors

DESIGN METHODOLOGY
• SW development: DFD and UML
• Mapping of the functions onto the device

Methodology for SW design on embedded systems

1. Conceptual SW model (Platform Independent Model), expressed in UML
2. Embedded-system HW model
3. Mapping onto a Platform Dependent Model, expressed in xUML
4. Automatic code generation, producing code suitable for the platform
5. Implementation
6. Performance analysis, with refinements fed back into the models

Methodology for SW design: features

PRO
• Shorter TTM (performance analysis at system level)
• The implementation is obtained automatically by converting the language description used for mapping into machine code.
• The validation is based on the test benches developed during the system simulation phase.

CONS
• A suitable trade-off is needed between the accuracy of the models and the time to produce them.
• In the case of an NP, the HW model can be very complex and take a long time, which impacts the TTM of the products.

Methodology for SW design on NP

The same flow applies: conceptual SW model (Platform Independent Model, UML), embedded-system HW model, mapping (xUML), automatic code generation producing code suitable for the platform, implementation, performance analysis and refinements. However, NP HW modeling is complex and time consuming.

SCTP on Freescale C5

• SCTP: layer-4 application
• Freescale C5: 16 non-dedicated processors

DESIGN METHODOLOGY
• SW development: DFD and UML
• Mapping of the functions onto the device

Traffic aggregator with Agere

Agnostic handling of QoS switching

Packet QoS switching pipeline: Classification -> Policing -> Scheduling -> Switching

ATM handling of QoS switching

Classes of service:
• CBR
• rt-VBR
• nrt-VBR 1, 2, 3
• UBR (UBR+)

Packet QoS switching pipeline: Classification -> Policing -> Scheduling -> Switching, with one queue per CoS (CBR, rt-VBR, nrt-VBR, UBR)

Ethernet handling of QoS switching

Classes of service:
• Best effort
• Background (bulk transfers, games, etc.)
• Spare
• Excellent effort (BE for important users)
• Controlled load (important applications)
• Video (< 100 ms delay)
• Voice (< 10 ms delay)
• Network control

Packet QoS switching pipeline: Classification -> Policing -> Scheduling -> Switching, with queues per CoS

AGERE APP300: BLOCKS OVERVIEW


Agere APP300: blocks overview

AGERE APP300: PROGRAMMING LANGUAGE

What is the Functional Programming Language (FPL)?

A rules-based high-level language:
– Designed for high-speed protocol processing
– Fast pattern matching on the data stream
– Easy-to-understand statement semantics
– Dynamic updating of FPL programs
– A complete software development tool set

Two types of functions:
– Control functions: executed in the Flow Engine; provide the instruction flow (static)
– Tree functions: executed in the Tree Engine; used for pattern matching (dynamic)
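FPL itself is proprietary, so as an illustration only, the kind of decision a tree function encodes (match a header pattern, return a destination or class ID) can be sketched in plain C. Field names and ID values here are hypothetical, not FPL syntax:

```c
/* Illustrative sketch of what an FPL tree function computes: fast
 * pattern matching on header fields to pick a flow/destination ID.
 * The IDs and the mapping are invented for the example. */
#define ETH_IPV4 0x0800u
#define ETH_VLAN 0x8100u

int classify(unsigned ethertype, unsigned pcp)
{
    if (ethertype == ETH_VLAN)
        return 16 + (int)(pcp & 0x7u); /* one ID per 802.1p value */
    if (ethertype == ETH_IPV4)
        return 1;                      /* routed traffic           */
    return 0;                          /* default / best effort    */
}
```

In the real device this decision tree is compiled into the Tree Engine's pattern-matching structures rather than executed as sequential code.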

APP300 blocks and their programming languages:

• Classification (FPL): classifies PDUs and generates the destination ID, reassembles PDUs and processes cells; statistics are generated in classification with FPL; uses the external classification and scheduling PDU buffer.
• Statistics, Policing and OAM Engine (C-NP): maintains per-flow policing statistics and OAM statistics, generates the policing decision (policing script, OAM script) and makes the drop-PDU decision; uses the on-chip statistics, policing and OAM memory.
• Traffic Management (C-NP): Buffer Manager and Traffic Shaper (first pass / second pass); manages the CoS queues, the dynamic scheduler, the programmable round-robin scheduler and the shared dynamic scheduler.
• Stream Editor (C-NP): modifies headers and trailers; uses the on-chip SED context memory.

Compute Engine functions

Four Compute Engines:
– Stream Editor (SED)
– Buffer Manager / Traffic Manager (TM)
– Traffic Shaper (TS)
– Policing/Statistics Engine

C-NP:
• A subset of C
• Each script is one function long
• No subroutine calls
• Does not support:
– floating-point math
– character constants (strings)
– subscripted variables (arrays)
– reference variables (pointers)

C-NP Script Structure

• A C-NP script is composed of:
– data declarations, followed by the keyword script,
– a name for the script,
– a compound statement that contains the body of the script.

• Standard preprocessor directives are available.
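To give a feel for these constraints, here is a plain-C approximation of a one-function, integer-only policing script (no floats, arrays, pointers, or subroutine calls in the body). The `script` keyword of real C-NP is replaced by an ordinary function definition, and all names and parameters are invented for the example:

```c
/* Data declarations (in real C-NP these precede the `script` keyword). */
static long bucket = 0;         /* current bucket fill, in bytes */
static const long limit = 1500; /* bucket depth, in bytes        */
static const long leak  = 100;  /* bytes drained per invocation  */

/* `script drop_decision { ... }` becomes a plain C function here.
 * Returns 1 to drop the PDU, 0 to forward it. */
int drop_decision(long pdu_len)
{
    bucket = bucket - leak;
    if (bucket < 0)
        bucket = 0;
    if (bucket + pdu_len > limit)
        return 1;               /* drop: bucket would overflow */
    bucket = bucket + pdu_len;
    return 0;                   /* forward */
}
```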

CASE STUDY: PACKET AGGREGATOR ON APP300

Case study on APP300

Packet Aggregator:
• Switching with QoS for ATM cells
• Switching with QoS for Ethernet packets
• ATM over Ethernet (according to AF-FBATM-0139.001)

Goals: flexibility of the NP with respect to traffic changes; performance evaluation.

ATM Switching with QoS

Classes of service:
• CBR
• rt-VBR
• nrt-VBR 1, 2, 3
• UBR (UBR+)

Functions to perform:
• Classification (VPI/VCI)
• Policing (GCRA, ...)
• Scheduling (static, WRR, ...)
• Traffic shaping
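As a reminder of what the GCRA policing step computes, here is a minimal single-leaky-bucket GCRA in its virtual-scheduling form (as defined in ITU-T I.371), sketched in plain C. Time units and parameter names are assumptions for the example; T is the emission interval (1/PCR) and tau the tolerance (CDVT):

```c
/* Minimal GCRA (virtual scheduling algorithm) sketch for ATM policing.
 * State: theoretical arrival time of the next conforming cell. */
static long tat = 0;

/* Returns 1 if the cell arriving at time t conforms, 0 otherwise. */
int gcra_conforming(long t, long T, long tau)
{
    if (t < tat - tau)
        return 0;                      /* too early: non-conforming */
    tat = (t > tat ? t : tat) + T;     /* update for the next cell  */
    return 1;
}
```

With T = 10 and tau = 2, a cell at t = 0 conforms, a cell at t = 7 arrives too early and is tagged non-conforming, and a cell at t = 8 conforms again. On the APP300 this logic is configured in the policing engine rather than hand-coded.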

Performed steps

• According to the supported classes of service, programming of the Policing and Classification engines (i.e. definition of the tree functions in FPL code): low workload.

• Based on the Quality of Service requirements, configuration of the Traffic Management engine, in terms of:
– implementation of the scheduling algorithms (e.g. GCRA)
– configuration of the queues (WRR, PRR and so on)

HIGH WORKLOAD AND "HEURISTIC" TUNING OF THE DEVICE

Performance comparisons

Type of traffic       Input (Mbit/s)  FPP rate (Mbit/s)  FPP throughput (%)  RSP rate (Mbit/s)  RSP throughput (%)
ATM (CBR)                  1600             1434                89.4              1248               87.3
ATM (VBR+UBR)              1600             1432                89.3              1249               87.1
ATM (CBR+VBR+UBR)          1600             1429                89.1              1277               89.4
ATM+FATE                   1600             1330                83.1              1330              100
ATM+FATE+Ethernet          1600             1424                88.9               825               57.3
Ethernet                   1900             1900               100                1847               97.2
ATM+Ethernet               1600             1450                89.8               920               63.5

(FPP and RSP aggregate rates in Mbit/s; each throughput figure is relative to that stage's input.)

Conclusions about APP300 and the Packet Aggregator

Main issues and conclusions
• Learning the FPL and C-NP languages
• Configuring the NP (in terms of hardware parameters)
• Difficulties in understanding the device behaviour
• Difficulties in implementing proprietary functions without impacting the device performance
• ATM over Ethernet: mapping the ATM CoS onto the Ethernet priority bits; whether and how to apply policing

How can the device be configured in the best way? A methodology could be useful.
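The open point of mapping ATM classes of service onto Ethernet 802.1p priority bits can be illustrated with a lookup table. The deck leaves the actual mapping open, so the PCP values below are only one plausible choice (higher PCP means higher priority), not the mapping used in the product:

```c
/* Hypothetical mapping of ATM service classes onto 802.1p PCP values. */
enum atm_cos { ATM_CBR, ATM_RT_VBR, ATM_NRT_VBR, ATM_UBR };

int atm_cos_to_pcp(enum atm_cos c)
{
    switch (c) {
    case ATM_CBR:     return 5; /* voice-like: low delay */
    case ATM_RT_VBR:  return 4; /* video-like            */
    case ATM_NRT_VBR: return 2; /* excellent effort      */
    case ATM_UBR:     return 0; /* best effort           */
    }
    return 0;
}
```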


Ethernet bridging using APP540

… Layer 2 card product experience

An all-network comprehensive portfolio, focused on market growth areas and the move to packets

[Diagram: product portfolio across the Customer, Access, Metro and Core segments; residential and business DSLAMs, mobile access, OMS840, OMS UC/EX, OMS1664, OMS2400, OMS3200, AXX9100/9200/9300, EPE and MHL3000 platforms; interfaces from STM-1/4 and FE/GE up to STM-16/64 and GE/10GE; business IP/MPLS services, ASTN/OTN core and WDM; voice, video, data and mobile Internet services; ServiceOn Optical network management]

Features on Ethernet L2 cards

Layer 2 card OMS16xx:
– EPL according to ITU-T G.8011.1
– EVPL according to ITU-T G.8011.2
– Bridging & switching:
• VLAN-aware bridge according to IEEE 802.1Q
• STP & RSTP according to IEEE 802.1D
– SDWRR scheduling, 4 service queues
– LLF and Pause
– Priority handling (IEEE 802.1p)
– Multiple STP (IEEE 802.1s)
– EFM with MRV devices, in compliance with IEEE 802.3ah
– Per-VLAN policing

Hardware:
• 2x Galazar STM-16 Ethernet Mapper
• Agere APP540 for packet processing
• 2x GigE (optical SFP) on card
• Supports 16x FE interfaces in the LTU area and STM-16 backplane capacity

Architecture and modules

[Diagram: on the ELS 1000(S) card, GbE and FE interfaces feed the Packet Switching Engine, the new functionality of the "ELS1000(S)" layer-2 cards; the PHYs, framers (GFP, VCG, LCAS), TDM switch and SDH line interfaces on the SMA/OMS1664 side are functions already available on the Ethernet mapper products]

APP540 SW architecture: overview

[Diagram: the Management Plane (Qx Manager) and the Control Plane (L2 stack: CFA, RSTP/MSTP, Learning & Ageing, GARP, VLAN, L2-IW) sit on a Stack Adaptation Layer (CP-API layer, PDU-API, FS-HW-API, PduInterface/NpApi adaptation, PDU forwarding configuration); the Data Plane (NP-API layer) implements Ethernet classification, policing, scheduling and switching; the HAL and drivers interface the SDH framer and the SFP (FE) devices]

Example 1: scheduling configuration on APP540

[Diagram: per-ingress-port policers feed the bridge; on the egress port a TQ demux distributes the traffic to the queues: a High Priority Queue (BPDUs from the host, weight w_1, read-only) served in strict priority ahead of a per-service SDWRR over the Expedited Forwarding (w_2), Assured Forwarding (w_3), Best Effort (w_4) and Less than Best Effort queues, all drained at the interface rate R]

Example 1: scheduling configuration

• XML configuration to define the number of queues and the scheduling policies
• Mapping of the services onto the available queues
• The dynamic behaviour also depends on NP HW constraints (i.e. NP port priorities, port rates, fixed/non-fixed port-manager priorities)
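To show what the per-service SDWRR stage computes, here is one round of a plain deficit-weighted round robin over three queues, sketched in C. Queue contents, weights and the fixed PDU size are invented for the example; the real APP540 scheduler is a configured hardware block (and "smoothed" DWRR interleaves service differently), not code like this:

```c
/* Deficit-weighted round robin over a few CoS queues. */
#define NQ  3
#define PDU 100                           /* bytes per queued PDU */

static int weight[NQ]  = {300, 200, 100}; /* quantum per round, bytes */
static int deficit[NQ] = {0, 0, 0};
static int backlog[NQ] = {700, 400, 250}; /* queued bytes */

/* Serve one full round; return the total bytes sent. */
int dwrr_round(void)
{
    int q, sent = 0;
    for (q = 0; q < NQ; q++) {
        deficit[q] += weight[q];
        while (backlog[q] >= PDU && deficit[q] >= PDU) {
            backlog[q] -= PDU;
            deficit[q] -= PDU;
            sent += PDU;
        }
        if (backlog[q] == 0)
            deficit[q] = 0;   /* empty queues do not bank credit */
    }
    return sent;
}
```

With the weights above, each round serves the queues roughly in a 3:2:1 byte ratio while any of them remains backlogged, which is the behaviour the XML weight configuration controls.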

Example 1: scheduling configuration (continued)

[Diagram: APP540 scheduling hierarchy; the physical ports (one 32-bit I/F, four 8-bit configurable data ports, an 8/16-bit configurable host port) map onto port managers (256 total), which map onto logical ports (512 total); each logical port has up to 4 schedulers (1024 total, from 3 types: static, dynamic, PRR) serving the queues (2k total, max 16 CoS per QoS)]

Example 2: throughput measurements

• The following throughput limits refer to the net bit rate.
• The displayed throughput limit is defined as the point of first packet loss; note that burstiness of the traffic will lead to a lower throughput limit.
• The values are derived from measurements with equal utilization on all interfaces.
• There are several internal throughput limits (rate limit based on the maximum packet throughput (64-byte granularity), classification limit, scheduling limit). Which of the limits cuts in first depends on the traffic type and the packet size. The "white box" test shows the "envelope" of all limits.

Performance test scenario (EVPL & EVPLAN services)

[Diagram: two ELS-1000 cards and an SMA-1664 as SUT; FE0..FE16 and GE1/GE2 test ports driven by HBT2/HBT4 testers and an SMB; VCGs over the TDM switch: VC-4-7v (max 90% / 95% load) and VC-3-2v / VC-3-1v (max 45%)]

Classification performance

[Chart: measured classification performance (data rate in Mbit/s, 0..3500) vs. frame size (0..1600 bytes), compared with the best-case classification performance and the maximum arriving traffic; SMA4 at 133 MHz. A problem appears for frames of size 65..100 bytes]

Classification problem summary

• The interfaces provide 3 Gbit/s gross traffic (4x FE, 2x GbE, 600M VCG)
• Requirement: classify all frames that can arrive at all interfaces
• The classification performance is below the maximum arriving traffic only if:
– frames of size 65..100 bytes arrive, AND
– on all interfaces concurrently, AND
– at a load > 73%
• Effect: uncontrolled loss of frames may happen only on GbE (VCGs have priority on ingress)

Throughput performance

[Chart: measured throughput (data rate in Mbit/s, 0..3000) vs. frame size (0..1600 bytes), compared with the calculated worst-case throughput, the traffic to be transported over the backplane and the maximum arriving traffic; SMA4 at 133 MHz, no local switching. A problem may be present for frames of size 65..88 bytes]

Throughput problem summary

• The interfaces provide 3 Gbit/s gross traffic (4x FE, 2x GbE, 600M VCG)
• Requirement: gross throughput >= backplane capacity, 600 Mbit/s bidirectional = 1.2 Gbit/s unidirectional (gross rate)
• Reality: the throughput may be below the backplane capacity, in the worst case 827 Mbit/s, but only if:
– frames of size 65..88 bytes are served, AND
– on all interfaces concurrently
• Effect: uncontrolled loss of frames on VCG (FE and GbE have priority on egress)

Net throughput ELS1000(S)

[Chart: net throughput in Mbit/s (0..4000) vs. packet size (64..1472 bytes), for four configurations: ELS1000 with EPL/EVPL, ELS1000 with EVPLAN, ELS1000S with EPL/EVPL, ELS1000S with EVPLAN]

Conclusions based on design experience

• NP programming & configuration is extremely complex and time consuming:
– the expected TTM benefits are not easily achieved;
– the level of knowledge of the NP building blocks is not detailed enough to predict the effect of a design choice;
– strong cooperation with the supplier is needed during the design;
– debugging is extremely complex.
• TIM can be prolonged, but maintaining the know-how is a key issue.

Backup slides

L2C software building blocks, SMA Series 1.2, 3, 4

[Diagram: pSOS-based software architecture. Common domain: pSOS BSP, PPCBOOT, flash layout, NVDB (only SMA), pSOS-Sim, HAL / device drivers (LEDs and diverse HW, backplane I/F, SFP control), CBus termination (CBus-A: AM/PM-SDH; CBus-B: CM, QX), "Qx-Agent", MUX management (message router, verification, reports) and a testing agent. SDH domain: Galazar device driver and SDH management (2nd Galazar only on the 1664). Data domain: the Network Processor behind the NP-API and the Agere API, with the Ethernet bridge (fast path on the NP, slow path in SW), link aggregation, PAUSE/PSF, EFM OAM, RMON, and the L2 stack (STP/RSTP, MSTP, GVRP, GMRP, LACP, 802.1X, IGS, QoS) over the DBAL and L2S adaptation layers. CM/AM/PM tasks (sample, report and counter tasks, RT task). Development split across R1, MR and R2]

Features on Ethernet L2 cards

Layer 2 card for the SMA, UC & EX family:
• Galazar STM-16 Ethernet Mapper, Agere NP APP540 for aggregation
• 2x GigE (optical SFP) on card, 4x FE on card, STM-4 backplane capacity

• R3.1, April 2005:
– release of the card HW and SW
– EPL according to ITU-T G.8011.1
– EVPL according to ITU-T G.8011.2
• R3.1.1, Dec. 2005:
– jumbo frames
– Pause and LLF
• Rel. 3.2, July 2006:
– bridging & switching: VLAN-aware bridge according to IEEE 802.1Q, STP & RSTP according to IEEE 802.1D
– SDWRR scheduling, 4 service queues
– priority handling (IEEE 802.1p)
– EFM with MRV devices, in compliance with IEEE 802.3ah

APP540 multistage pipeline architecture

[Diagram: Receive Interface -> Classification subcomponent: Stage #1/2 classification (ETAC, FPL, with reassembly buffer) and Stage #1 statistics and policing (CE, C-NP) -> Traffic Management subcomponent: Stage #3 Buffer Manager (en-queue; CE, C-NP, with ROB) and Stage #4 Traffic Shaping (de-queue; CE, C-NP) -> Modification stage: Stage #5 Stream Editor (CE, C-NP, with SED context) -> Transmit Interface; a packet generator engine and the PDU assembler work on the PDU memory]

Block diagram: OMS1664 with LTU

[Diagram: Agere APP540 network processor (classification processor, traffic manager, stream editor) with DDR FCRAM program, buffer and SED-parameter memories and DDR SDRAM queuing linked lists; Galazar MSF-250 (framing, mapping, LO/HO SDH, VCx, LCAS, virtual concatenation, GFP/LAPS/POS, VC-12/3/4-nv) connected via SPI-3 and an SPI mux; MPC8250 management controller (PCI, SCC, TSA); 2x GbE and 16x FastE PHYs (SFP, SMII/GMII, MDIs to/from the LTU, ALS, I²C); backplane interfaces LVDS @ 622 MHz (working and protecting); BPIF FPGA (pSOH, SDM, J2 trace, switch protection); PLL and SysClock; control, alarm and overhead buses; serial communication bus to/from the LTU; DC/DC supply from +48 V]

Performance test scenario ELS1000S, EPL and EVPL services

[Diagram: ELS-1000s product test, load testing scenario 1; SMA 16 Series 3 (release 1.3.8 rev2) as SUT; two ELS-1000s cards with FE0..FE3 and GE1/GE2 ports, driven by HBT2/HBT4 testers and SMBs; offered loads of 200,000 kbit/s per FE, 2,000,000 kbit/s on GE and 599,040 kbit/s over 2x VC-4 through the TDM switch]

Net / gross throughput

• All bits of the frame, from the first byte of the Ethernet destination address up to the last byte of the CRC, determine the "net bit rate".
• When comparing the throughput against a percentage of wirespeed, the gross/net overhead factor is important.
• Example:
– frame length L = 80 bytes (including the 4-byte CRC); note that some measurement equipment displays "76 bytes without CRC";
– overhead of an 8-byte preamble and a minimum 12-byte interframe gap;
– the minimum gross bandwidth required to transmit an 80 Mbit/s net rate of 80-byte packets is net rate * (L + 20 bytes) / L = 100 Mbit/s.
• Note that SDH transport via VCs requires an additional overhead due to the FCS (frame check sequence).
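The example above can be checked with a one-liner in plain C (the helper name is invented; 20 bytes is the 8-byte preamble plus the 12-byte minimum interframe gap):

```c
/* Gross Ethernet bandwidth needed for a given net bit rate:
 * each L-byte frame carries 20 extra bytes on the wire. */
double gross_from_net(double net_mbps, int frame_len_bytes)
{
    return net_mbps * (frame_len_bytes + 20.0) / frame_len_bytes;
}
```

For 80 Mbit/s of 80-byte frames this gives 80 * 100 / 80 = 100 Mbit/s, matching the example.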