Post on 20-Feb-2019
Limited Internal 2
Summary
• Reference scenario for ALL-IP convergence network
• Methodology design for a re-configurable node based on a Network Processor
• Activity carried out within CoRiTeL – Data Plane: implementation of a packet aggregator for Ethernet and ATM
Reference network scenario
Soft evolution towards an ALL-IP context
[Diagram: transport backbone interconnecting PSTN and IP broadband access (via MSG and BRAS), 2G/3G access (BSC, RNC, GGSN, MGW), WLAN/WiMax and wireline access; the PLMN and PSTN domains converge onto the common backbone]
• Convergence: multi-service networks and nodes must support different technologies and services
• Low cost: co-existence with the existing infrastructure
• Flexibility
Flexibility: an example 1/2
Today: ATM support
Near future: coexistence of ATM & Ethernet
Future: Ethernet support
Evolution step towards new functionality & coexistence with existing ones
Flexibility: an example 2/2
The evolution towards IP-based functions is not clear; it depends on the type of node.
[Diagram: nodes carrying ATM traffic and Ethernet traffic in different combinations, each requiring ATM switching with QoS and Ethernet switching with QoS]
Low cost – key issues:
• Minimize time to market (TTM)
• Maximize time in market (TIM)
Network equipment requirements:
• High reusability of the HW platform
• High re-configurability
The network processor is a consistent answer to meet such requirements.
Network Processors
• A programmable processor optimized for use in networking devices to provide steady-state packet processing functionality
• High flexibility and programmability
• Complex to implement while keeping TTM short and performance high
Network Processor: a Wide World
• There is no such thing as THE Network Processor; there are many families with different characteristics.
• Existing Network Processors can be classified into two groups:
1. High programmability but complex design
2. Easier design but a low level of flexibility (dedicated hardware)
NP: the same name for different families
An example of two NP families:
AGERE: very high throughput for switching with QoS
– Not flexible for implementing proprietary functions
– Not suitable for protocol handling
INTEL / FREESCALE C family: flexible for implementing proprietary functions
– Complex to implement
– Lower performance for switching-with-QoS functions
Network Processors

| Network Processor | Main features | Data path | Feasible applications | Limitations |
|---|---|---|---|---|
| Agere | Low programmability level (dedicated hardware), high performance for QoS applications | OSI layers 2-3 | Switching, routing, policing, scheduling, QoS applications | Not flexible for implementing proprietary functions; not suitable for protocol handling |
| Intel | High programmability, high performance | OSI layers 2-7 | Switching, routing, policing, scheduling | Complex to implement, lower performance than Freescale |
| Mindspeed | Medium programmability level, high performance | OSI layers 2-7 | Switching, routing, policing, scheduling | Lower performance than Freescale |
| Freescale C-5/C-5e | High programmability, high performance | OSI layers 2-7 | Switching, policing, MPLS, and IMA for wireless applications in RBS, RNC (WNI IP and WNI ATM applications) | Complex to implement, lower performance for QoS applications |
Activity on NP
1. SW design methodology: definition and implementation of a case study on a suitable NP platform, in order to test the methodology and to build the NP design expertise required to define an automatic code-generation process.
2. SCTP on Freescale C5 NP
3. Packet Aggregator with Agere NP
SCTP on Freescale C5
• SCTP: level-4 application
• Freescale C5: 16 "non-dedicated" processors
DESIGN METHODOLOGY
• SW development: DFD and UML
• Function mapping onto the device
Methodology for SW design on embedded systems
[Flow: conceptual SW model (Platform Independent Model, UML/xUML) → mapping onto the embedded-system HW model (Platform Dependent Model) → automatic code generation (code suitable for the platform) → implementation → performance analysis, with refinements fed back along the chain]
Methodology for SW design: features
• PRO
– Improved TTM (performance analysis at system level)
– Implementation is obtained automatically by converting the language description used for mapping into machine code.
– Validation is based on the test benches developed during the system-simulation phase.
• CONS
– A suitable trade-off is needed between the accuracy of the models and the time to produce them.
– In the case of an NP, the HW model can be very complex and take a long time to build, which impacts the TTM of the products.
Methodology for SW design on NP
[Flow: same chain as for embedded systems (conceptual SW model (PIM, UML/xUML) → mapping onto the HW model → automatic code generation → implementation → performance analysis, with refinements), but NP HW modelling is complex and time-consuming]
SCTP on Freescale C5
• SCTP: level-4 application
• Freescale C5: 16 "non-dedicated" processors
DESIGN METHODOLOGY
• SW development: DFD and UML
• Function mapping onto the device
Agnostic handling of QoS switching
[Diagram: packet QoS switching pipeline – classification, policing, scheduling, switching]
ATM handling of QoS switching
Classes of service:
• CBR
• rt-VBR
• nrt-VBR 1, 2, 3
• UBR (UBR+)
[Diagram: QoS switching pipeline – classification, policing, scheduling, switching – with one queue per CoS (CBR, rt-VBR, nrt-VBR, UBR)]
Ethernet handling of QoS switching
Classes of service:
• Best Effort
• Background (bulk transfers, games, etc.)
• Spare
• Excellent Effort (BE for important users)
• Controlled Load (important applications)
• Video (<100 ms delay)
• Voice (<10 ms delay)
• Network Control
[Diagram: QoS switching pipeline – classification, policing, scheduling, switching – with queues for the CoS]
Agere APP300: blocks overview
What is the Functional Programming Language (FPL)?
• Rules-based high-level language
– Designed for high-speed protocol processing
– Fast pattern matching on the data stream
– Easy-to-understand statement semantics
– Dynamic updating of FPL programs
– A complete software development tool set
• Two types of functions:
– Control functions: executed in the Flow Engine; provide the instruction flow; static
– Tree functions: executed in the Tree Engine; used for pattern matching; dynamic
APP300 Programming Language
[Diagram: APP300 processing blocks and the language each is programmed in –
• Classification (FPL, with external classification/scheduling PDU buffer): classify and generate the destination ID; reassemble PDUs; process cells; statistics generated during classification with FPL
• Statistics, Policing and OAM Engine (C-NP, with on-chip statistics/policing/OAM memory): policing script and OAM script; maintain per-flow policing statistics and OAM statistics; generate policing decisions
• Traffic Management (C-NP): Buffer Manager (first pass/second pass; makes the drop-PDU decision), Traffic Shaper (manages CoS queues; dynamic scheduler, programmable round-robin scheduler and shared dynamic scheduler), Stream Editor (on-chip SED context memory; modifies headers and trailers)]
Compute Engine Function
• Four compute engines:
– Stream Editor (SED)
– Buffer Manager / Traffic Manager (TM)
– Traffic Shaper (TS)
– Policing/Statistics Engine
• C-NP:
– Subset of C
– Each script is one function long
– No subroutine calls
– Does not support: floating-point math, character constants (strings), subscripted variables (arrays), reference variables (pointers)
C-NP Script Structure
• A C-NP script is composed of:
– data declarations, followed by
– the keyword script,
– a name for the script, and
– a compound statement that contains the body of the script.
• Standard preprocessor directives are available.
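Putting the pieces above together (and respecting the restrictions listed on the previous slide: no floats, strings, arrays or pointers), a C-NP script might look like the following sketch. The variable and script names are invented for illustration; this is not code from the APP300 toolset:

```
/* illustrative C-NP-style policing script (sketch; names are assumed) */
int cellCount;          /* data declarations come first           */
int dropFlag;

script policeFlow       /* keyword 'script' followed by its name  */
{                       /* one compound statement: the body       */
    cellCount = cellCount + 1;
    if (cellCount > 1000)   /* no arrays, pointers or floats      */
        dropFlag = 1;       /* signal a drop decision             */
    else
        dropFlag = 0;
}
```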
Case study on APP300
• Packet Aggregator:
– Switching with QoS for ATM cells
– Switching with QoS for Ethernet packets
– ATM over Ethernet (according to AF-FBATM-0139.001)
• Goals: flexibility of the NP under changing traffic; performance evaluation
ATM Switching with QoS
Classes of service:
• CBR
• rt-VBR
• nrt-VBR 1, 2, 3
• UBR (UBR+)
Functions to perform:
• Classification (VPI/VCI)
• Policing (GCRA, ...)
• Scheduling (static, WRR, ...)
• Traffic shaping
Performed steps
• According to the supported classes of service: programming of the Policing and Classification engines, i.e. definition of tree functions in FPL code (low workload)
• Based on the Quality of Service requirements: configuration of the Traffic Management engine, in terms of:
– scheduling algorithm implementation (e.g. GCRA)
– queue configuration, i.e. WRR, PRR and so on
HIGH WORKLOAD AND "HEURISTIC" TUNING OF THE DEVICE
Performance comparisons

| Type of traffic | Input (Mbit/s) | FPP aggregate rate (Mbit/s) | RSP aggregate rate (Mbit/s) | Throughput FPP (%) | Throughput RSP (%) |
|---|---|---|---|---|---|
| ATM (VBR+UBR) | 1600 | 1432 | 1249 | 89.3 | 87.1 |
| ATM (CBR) | 1600 | 1434 | 1248 | 89.4 | 87.3 |
| ATM (CBR+VBR+UBR) | 1600 | 1429 | 1277 | 89.1 | 89.4 |
| ATM+FATE | 1600 | 1330 | 1330 | 83.1 | 100 |
| ATM+FATE+Ethernet | 1600 | 1424 | 825 | 88.9 | 57.3 |
| ATM+Ethernet | 1600 | 1450 | 920 | 89.8 | 63.5 |
| Ethernet | 1900 | 1900 | 1847 | 100 | 97.2 |
Main Issues and Conclusion
• Learning the FPL and C-NP languages
• Configuring the NP (in terms of hardware parameters)
• Difficulty in understanding device behaviour
• Difficulty in implementing proprietary functions without impacting device performance
• ATM over Ethernet: mapping the ATM CoS onto Ethernet priority bits; whether and how to apply policing
How can the device be configured in the best way? The methodology could be useful here.
An all-network comprehensive portfolio
Focused on market growth areas and the move to packets
[Diagram: product portfolio from customer access through metro to core – DSLAM (residential and business), mobile and business access over OMS UC/EX, OMS840, OMS1664, OMS2400, OMS3200 and AXX9100/9200/9300; EPE and IP/MPLS; ASTN/OTN core and WDM (MHL3000); interfaces from STM-1/4 and FE/GE up to STM-16/64, GE and 10GE; voice, video, data and mobile Internet services; ServiceOn Optical network management]
Features on Ethernet L2 cards
Layer 2 card OMS16xx:
– EPL according to ITU-T G.8011.1
– EVPL according to ITU-T G.8011.2
– Bridging & switching:
• VLAN-aware bridge according to IEEE 802.1Q
• STP & RSTP according to IEEE 802.1D
– SDWRR scheduling, 4 service queues
– LLF and Pause
– Priority handling (IEEE 802.1p)
– Multiple STP (IEEE 802.1s)
– EFM towards MRV devices, in compliance with IEEE 802.3ah
– Per-VLAN policing
Hardware:
• 2x Galazar STM16 Ethernet mapper
• Agere APP540 for packet processing
• 2x GigE (optical SFP) on card
• Supports 16x FE interfaces in the LTU area and STM16 backplane capacity
Architecture and Modules
[Diagram: SMA/OMS1664 – SDH line interfaces and TDM switch; framers (GFP, VCG, LCAS); ELS1000(S) card with packet switching engine, PHYs, GbE and FE ports. Functions already available on Ethernet mapper products vs. new functionality of the layer-2 card "ELS1000(S)"]
APP 540
SW Architecture – Overview
[Diagram: management plane (Qx manager) and control plane (L2 stack: CFA, RSTP/MSTP, learning & ageing, GARP, VLAN, L2-IW) over a stack adaptation layer (CP-API, PDU-API, FS-HW-API; configuration and PDU forwarding); data plane (NP-API layer: Ethernet classification, policing, scheduling, switching; PduInterface/NpApi adaptation); HAL and drivers towards the SDH framer and SFP (FE)]
Example 1: scheduling configuration on APP540
[Diagram: ingress ports with per-interface policers feed a bridge and TQ demux towards the egress port; scheduling at interface rate R uses strict priority between a high-priority queue (BPDUs from the host; weight w_1, read-only) and a per-service SDWRR (weights w_2..w_4) over the Expedited Forwarding, Assured Forwarding, Best Effort and Less than Best Effort queues]
Example 1: scheduling configuration
• XML configuration to define the number of queues and the scheduling policies
• Mapping of services onto the available queues
• The dynamic behaviour also depends on NP HW constraints (i.e. NP port priorities, port rates, fixed/non-fixed port-manager priorities)
Example 1: scheduling configuration
[Diagram: APP540 scheduling hierarchy – physical ports (one 32-bit I/F, one 16-bit I/F, four 8-bit configurable data ports, one 8/16-bit configurable host port) → port managers (256 total) → logical ports (512 total) → schedulers (1024 total; max 4 per logical port, from 3 types: static, dynamic, PRR) → queues (2k total; max 16 CoS per QoS)]
Example 2: throughput measurements
• The following throughput limits refer to the net bitrate.
• The displayed throughput limit is defined as the point of first packet loss; note that traffic burstiness will lead to a lower throughput limit.
• The values are derived from measurements with equal utilization on all interfaces.
• There are several internal throughput limits (rate limit based on max. packet throughput (64-byte granularity), classification limit, scheduling limit). Which of the limits cuts in first depends on the traffic type and packet size. The "white box" test shows the "envelope" of all limits.
Performance test scenario (EVPL & EVPLAN services)
[Diagram: two ELS-1000s connected through an SMA-1664 (SUT) over the TDM switch; FE0..FE16 and GE1/GE2 test ports driven by HBT2/HBT4 generators; virtual concatenation groups VC-4-7v (max 90% / max 95%), VC-3-2v (max 45%) and VC-3-1v]
Classification Performance
[Chart: data rate (Mbit/s, 0..3500) vs. frame size (bytes, 0..1600) – measured classification performance, best-case classification performance and maximum arriving traffic; SMA4 at 133 MHz; problem for frames of size 65..100 bytes]
Classification problem summary
• Interfaces provide 3 Gbit/s gross traffic (4x FE, 2x GbE, 600M VCG)
• Requirement: classify all frames that can arrive on all interfaces
• Classification performance falls below the maximum arriving traffic if:
– frames of size 65..100 bytes arrive AND
– on all interfaces concurrently AND
– at a load > 73%
• Effect: uncontrolled loss of frames may happen, but only on GbE (VCGs have priority on ingress)
Throughput Performance
[Chart: data rate (Mbit/s, 0..3000) vs. frame size (bytes, 0..1600) – measured throughput, calculated worst-case throughput, traffic to be transported over the backplane, and maximum arriving traffic; SMA4 at 133 MHz; no local switching; a problem may be present for frames of size 65..88 bytes]
Throughput problem summary
• Interfaces provide 3 Gbit/s gross traffic (4x FE, 2x GbE, 600M VCG)
• Requirement: gross throughput >= backplane capacity: 600 Mbit/s bidirectional = 1.2 Gbit/s unidirectional (gross rate)
• Reality: the throughput may be below the backplane capacity, in the worst case 827 Mbit/s, but only if:
– frames of size 65..88 bytes are served AND
– on all interfaces concurrently
• Effect: uncontrolled loss of frames on VCG (FE and GbE have priority on egress)
Net Throughput ELS1000(S)
[Chart: net throughput (Mbit/s, 0..4000) vs. packet size (64..1472 bytes) for four configurations: ELS1000 EPL/EVPL, ELS1000 EVPLAN, ELS1000S EPL/EVPL, ELS1000S EVPLAN]
Conclusions based on design experience
• NP programming & configuration is extremely complex and time-consuming.
• The expected TTM benefits are not easily achieved.
• The level of knowledge of the NP building blocks is not detailed enough to predict the effect of design choices.
• Strong cooperation with the supplier is needed during design.
• Debugging is extremely complex.
• TIM can be prolonged, but maintaining the know-how is a key issue.
R1 development – L2C software building blocks, SMA Series 1.2, 3, 4
[Diagram: software building blocks split across R1, MR and R2 development – pSOS environment (pSOS BSP, PPC boot, flash layout, NVDB on SMA only, pSOS-Sim, Vx2pSOS); common domain with the common management framework (CM/AM/PM tasks: sample, report, counter; "Qx agent"; MUX management with message router, verification and reports; testing agent; LED and diverse HW via HAL/DD; EM SFP control; CBus termination: CBus-A (AM/PM-SDH), CBus-B (CM, Qx)); SDH domain (Galazar device driver, 2nd Galazar on 1664 only, SDH management); data domain over the network processor (NP-API, Agere API, bridge fast/slow path, link aggregation, PAUSE/PSF, EFM, OAMinP/ALS, OAMinF, RMON) and the L2 stack (STP/RSTP, MSTP, GVRP, GMRP, LACP, 802.1X, IGS, QoS) over the DBAL and L2S adaptation layers; backplane with CBus A/B, control and data paths]
Features on Ethernet L2 cards
Layer 2 card for the SMA, UC & EX families:
• Galazar STM16 Ethernet mapper
• Agere NP APP540 for aggregation
• 2x GigE (optical SFP) on card
• 4x FE on card
• STM4 backplane capacity
Releases:
• R3.1 (April 2005): release of card HW and SW; EPL according to ITU-T G.8011.1; EVPL according to ITU-T G.8011.2
• R3.1.1 (Dec. 2005): jumbo frames; Pause and LLF
• R3.2 (July 2006): bridging & switching (VLAN-aware bridge according to IEEE 802.1Q; STP & RSTP according to IEEE 802.1D); SDWRR scheduling with 4 service queues; priority handling (IEEE 802.1p); EFM towards MRV devices, in compliance with IEEE 802.3ah
APP540 Multistage Pipeline Architecture
[Diagram: receive/input interface → classification & policing stage (stage #1/2: ETAC classification in FPL with reassembly buffer; stage #1: statistics and policing compute engine in C-NP) → buffer management (en-queue) stage (stage #3: buffer manager compute engine, C-NP, with ROB) → traffic shaping (de-queue) stage (stage #4: compute engine, C-NP) → modification stage (stage #5: stream editor compute engine, C-NP, with SED context) → output/transmit interface; PDU memory, PDU assembler and packet-generator engine attached; the classification subcomponent is programmed in FPL, the traffic-management subcomponents in C-NP]
Block Diagram OMS1664 with LTU
[Diagram: Agere APP540 network processor (classification processor and traffic manager; DDR FCRAM program and SED-parameter memories, DDR FCRAM buffer memory, DDR SDRAM for queuing linked lists, virtual concatenation, GFP/LAPS/POS, VC-12/3/4-nv) connected via SPI-3/SPI-MUX to two Galazar MSF-250 framers (DRA-CDR, framing, mapping, LO/HO SDH, VCx, LCAS, TSA) and via SMII/GMII MACs to 2x GbE PHYs (SFP, ALS, I²C) and 16x FastE MDI (PECL) to/from the LTU; MPC8250 management controller (PCI, SCC, TSA, DDR SDRAM); LVDS backplane interfaces at 622 MHz (working and protecting), 4x STM-4 at 622 MHz and 16x pSTM-1 at 38 MHz; BPIF with pSOH, SDM, J2 trace and switch protection (FPGA, 3.3 V); control, alarm and overhead buses; system clock (in-band); serial communication bus to/from the LTU; DC/DC from the 48 V supply]
Performance test scenario ELS1000S – EPL and EVPL services
[Diagram: ELS-1000s product-test load scenario 1 – two ELS-1000s in an SMA-16 Series 3 (release 1.3.8 rev2) as SUT, connected over the TDM switch via 2x VC-4 (599,040 kbit/s); FE0..FE3 test ports at 200,000 kbit/s each and GE1/GE2 at 2,000,000 kbit/s, driven by HBT2/HBT4 generators]
Net / Gross Throughput
• All bits of the frame, from the first byte of the Ethernet destination address up to the last byte of the CRC, determine the "net bit rate".
• When comparing throughput as a percentage of wire speed, the gross/net overhead factor is important.
• Example:
– 80-byte frame length (including the 4-byte CRC); note that some measurement equipment displays this as "76 bytes without CRC".
– Overhead of 8 bytes preamble and a minimum 12-byte inter-frame gap.
– The minimum gross bandwidth required to transmit an 80 Mbit/s net rate of 80-byte packets is NetRate · (L + 20 bytes) / L = 80 · 100/80 = 100 Mbit/s.
• Note that SDH transport via VCs requires an additional overhead due to the FCS (frame check sequence).