S.Veneziano, 2010, Roma
ASIC designTecnologie disponibili, flusso di progettazione.
Panoramica dei linguaggi utilizzati e librerie. Specifiche, partizionamento del progetto. Simulazione e
“testbench”. Sintesi logica. Vincoli di progetto. Analisi statica dei tempi, “floorplan” e vincoli alla sintesi fisica.
Piazzamento. Routing. Estrazione di resistenze e capacita’ parassite.
1
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Design domains
• una sola gerarchia non e’ sufficiente a descrivere il problema. Si distinguono tre domini:
• dominio comportamentale
• dominio strutturale
• dominio fisico
BEHAVIORAL DOMAIN
PHYSICAL DOMAIN
Physical partitions
Floorplans
Module layout
Cell layout
Transistor layout
Systems
Algorithms
Register transfers
Logic
Transfer functions
Processors
ALU’s, RAM, etc.
Gates, flip-flops, etc.
Transistors
STRUCTURAL DOMAIN
2
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Design actions• e’ possibile descrivere una
metodologia di progetto descrivendo le transizioni tra i tre domini
• le transizioni tra i domini o all’interno di uno stesso dominio sono di vari tipi, e fanno uso di diversi strumenti di:• sintesi (correct by
construction o verificati)• verifica• ottimizzazione• design management
BEHAVIORAL DOMAIN
PHYSICAL DOMAIN
Physical partitions
Floorplans
Module layout
Cell layout
Transistor layout
Systems
Algorithms
Register transfers
Logic
Transfer functions
Processors
ALU’s, RAM, etc.
Gates, flip-flops, etc.
Transistors
STRUCTURAL DOMAIN
3
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Design automation tools
• algorithmic and system design
• structural and logic design• transistor-level design• layout design• verification• design management• le soluzioni CMOS
disponibili oggi si differenziano per l’uso di un diverso set di tools di design automation
BEHAVIORAL DOMAIN
PHYSICAL DOMAIN
STRUCTURAL DOMAIN
Algorithmic and system design
Structural and logic design
Transistor-level design
Layout design
4
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
verification methods
• prototyping
• simulation
• formal verification (specification vs implementation)
5
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Technologies and design approach
• soluzioni CMOS disponibili oggi:
• Full custom• Standard cell• Structured ASIC• Embedded array• Gate Array• Field Programmable Gate Array
• Hanno in comune la tecnologia della singola cella ma cambia il numero di maschere dedicate e il flusso di progettazione
6
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Full custom designs• Wikipedia:
• Full-Custom ASIC Design defines all the photo lithographic layers of the device. Full Custom Design is used for both ASIC design and for Standard Product design.
•The benefits of Full custom Design usually include reduced area (and therefore recurring component cost), performance improvements and also the ability to integrate (include) analog components and other pre-designed (and thus fully verified) components such as microprocesser cores etc that form a System-On-Chip.
•The disadvantages of Full-Custom can include increased manufacturing and design time, increased non-recurring engineering costs, more complexity in the Computer Aided Design (CAD) system and a much higher skill requirement on the part of the design team.
• La progettazione full custom e’ ancora utilizzata e necessaria per la realizzazione di celle di librerie standard, memorie, parti analogiche, celle IO, PLL.
7
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Progetti full-custom
• in un progetto full-custom si fa uso di:
• layout editors (symbolic editor+compactors)
• design rule checkers
• circuit extractors (parasitics)
36 IC LAYOUT Chapter 3
Design-Rule Checking
Design rules were introduced in Chapter 2 as a set of layout restrictions that ensure the
manufactured design will operate as desired with no short or open circuits. A prime
requirement of the physical layout of a design is that it adhere to these rules. This can be
verified with the aid of a design-rule checker (DRC), which uses as inputs the physical
layout of a design and a description of the design rules presented in the form of a technol-
ogy file. Since a complex circuit can contain millions of polygons that must be checked
against each other, efficiency is the most important property of a good DRC tool. The ver-
ification of a large chip can take hours or days of computation time. One way of expedit-
ing the process is to preserve the design hierarchy at the physical level. For instance, if a
cell is used multiple times in a design, it should be checked only once. Besides speeding
up the process, the use of hierarchy can make error messages more informative by retain-
ing knowledge of the circuit structure.
DRC tools come in two formats: (1) The on-line DRC runs concurrent with the lay-
out editor and flags design violations during the cell layout. For instance, max has a built-
in design-rule checking facility. An example of on-line DRC is shown in Figure 3.4. (2)
Batch DRC is used as a post-design verifier, and is run on a complete chip prior to ship-
ping the mask descriptions to the manufacturer.
Circuit Extraction
Another important tool in the custom-design methodology is the circuit extractor, which
derives a circuit schematic from a physical layout. By scanning the various layers and
their interactions, the extractor reconstructs the transistor network, including the sizes of
the devices and the interconnections. The schematic produced can be used to verify that
the artwork implements the intended function. Furthermore, the resulting circuit diagram
contains precise information on the parasitics, such as the diffusion and wiring capaci-
tances and resistances. This allows for a more accurate simulation and analysis. The com-
plexity of the extraction depends greatly upon the desired information. Most extractors
extract the transistor network and the capacitances of the interconnect with respect to
GND or other network nodes. Extraction of the wiring resistances already comes at a
Figure 3.4 On-line design rule checking. The white dots
indicate a design rule violation. The violated rule can be
obtained with a simple mouse click.
poly_not_fet to all_diff minimum spacing = 0.14 um.
DMIA.fm Page 36 Monday, September 4, 2000 11:19 AM
Section 35
wires). The absolute coordinates of these elements are determined automatically by the
editor using a compactor [Hsueh79, Weste93]. The compactor translates the design rules
into a set of constraints on the component positions, and solves a constrained optimization
problem that attempts to minimize the area or another cost function.
An example of a symbolic notation for a circuit topology, called a sticks diagram, is
shown in Figure 3.2. The different layout entities are dimensionless, since only position-
ing is important. The advantage of this approach is that the designer does not have to
worry about design rules, because the compactor ensures that the final layout is physically
correct. Thus, she can avoid cumbersome polygon manipulations. Another plus of the
symbolic approach is that cells can adjust themselves automatically to the environment.
For example, automatic pitch-matching of cells is an attractive feature in module genera-
tors. Consider the case of Figure 3.3 (from [Croes88]), in which the original cells have dif-
ferent heights, and the terminal positions do not match. Connecting the cells would require
extra wiring. The symbolic approach allows the cells to adjust themselves and connect
without any overhead.
The disadvantage of the symbolic approach is that the outcome of the compaction
phase is often unpredictable. The resulting layout can be less dense than what is obtained
with the manual approach. Notwithstanding, symbolic layout tools have improved consid-
erably over the years and are currently a part of the mainstream design process.
Figure 3.2 Sticks representation of CMOS inverter. The numbers
represent the (Width/Length)-ratios of the transistors.
1
3
In Out
VDD
GND
Figure 3.3 Automatic pitch matching of datapath cells based on symbolic layout.
BEFORE
AFTER
DMIA.fm Page 35 Monday, September 4, 2000 11:19 AM
8
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Design rules
• Lambda=0.25 um per un processo 0.5um
9
MOSIS SCMOS - Poly http://www.mosis.org/Technical/Designrules/scmos/scmos-poly.html
1 of 1 01/01/2005 16:13
SCMOS Layout Rules - Poly
Rule DescriptionLambda
SCMOS SUBM DEEP
3.1 Minimum width 2 2 2
3.2 Minimum spacing over field 2 3 3
3.2.a Minimum spacing over active 2 3 4
3.3 Minimum gate extension of active 2 2 2.5
3.4 Minimum active extension of poly 3 3 4
3.5 Minimum field poly to active 1 1 1
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Standard cell design
• utilizziamo strumenti di piazzamento e routing, a partire da librerie di celle (gate-level)
10
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Macrocelle e compilatori
• utilizzati nella progettazione standard cell
The eSi-RAM/2P two-port register file compiler is a high performance
memory solution for embedded applications. "What-if" scenarios are easy to
explore using the eSi-RAM/2P SRAM compiler helping designers to
perform optimal floorplanning of their designs.
Typical applications include register files, buffer memories, fall-through
FIFO’s, palette RAM’s, tag RAM etc. Wide bits make the memory very
suitable for address matching of physical and virtual addresses. Low area
overhead of decode-logic and sense-amplifiers are the salient features of the
two port compilers.
Key FeaturesKey Features! High performance – 650MHz for a 128x64 array! Synchronous read/write operations
! Dedicated read and write address ports
! Ports may be operated synchronously or independently
! Data retention at low voltage
! Zero standby current
! 3-State output buffers
! Byte-write control for selective data write! Ability to compile to multiple aspect ratios
! Interface to industry standard BIST controllers
Features Capability
Maximum Macro Size 64K
Maximum Words 1024
Minimum Words 8
Word Increment 2 * column mux
Maximum Data Width 128 bits
Minimum Data Width 4 bits
Data Width Increment 1 bit
Column Mux 1, 2, 4
Synthesis .libs (3 corners)
Behavioral Verilog, VITAL
Static Timing .libs
Place and Route LEF
Physical Views GDSII
Netlist HSPICE
Data Sheets Postscript, HTML
BenefitsBenefits
! High performance registerfiles
! Dedicated read & writeports operating withindependent clocks
! Java based compilerenhances ease-of-use and
automates EDA viewgeneration
! Accurate EDA modelingfor design integration
Data Preliminary Subject to Change Rev. 3.0
Copyright © 2000 Virtual Silicon Technology
All rights reserved
0.180.18mm eSiliconeSilicon Embedded Components Embedded Components
eSi-RAM/2P!!Two Port Register Compiler
0.18µµ L180 1.8V
High Performance Process
Silicon ReadySilicon Ready
! Engineered to foundry’sexact process for optimum
density and performance
! Proven test chips insure firsttime silicon success
! High resolution EDA modelsassure rapid timingconvergence
EDA VIEWS SUPPORT
0.18µµ L180, 1.8V
High Performance Process
In order to take advantage of the high intrinsic speed of advanced silicon
technologies, off-chip external clocks need to be multiplied to obtain high
on-chip frequencies. Many system-on-chip designs require multiple
application frequencies and therefore multiple PLL (Phase Locked Loop)
instances. The eSi-PLL is a compiler-based technology that allows systemdesigners to choose the center frequency, duty cycle, and create a
customized PLL. The PLL is created to be fully modular and embedded in
the I/O pad ring with no area overhead on the core. This also offers
complete modularity, eliminating manual wiring of analog power supply.
The PLL is best suited for applications such as frequency multiplication,
multi-phase clock generation, clock tree delay cancellation, clock
regeneration functions, duty cycle correction and jitter removal. The PLL
offers very high stability and good noise rejection characteristics, low power
consumption requiring no additional band-gaps or external components.
Key FeaturesKey Features! Compiler technology enhances ease-of-use! User provides 3 parameters – input, output frequencies and duty cycle
! Custom PLL has zero area overhead on core logic
! PLL module entirely located in the I/O pad rings
! Uses two dedicated analog power supply pine (AVDD, AVSS)
! Built-in ESD and latch-up protection structures
! Integrated loop filter and oscillator
! Integrated dividers are automatically programmed for duty cycle andinput clock division.
Parameters Capability
Input Frequency Range 14MHz to 200 MHz
Output Frequency Range 40MHz to 800MHz
Duty Cycle Range 33%, 50%, 66%
Max. long term jitter < 1% or 100ps
Synthesis (Black Box) .libs (3 corners)
Behavioral Verilog, VITAL
Static Timing .libs, TLF
Place and Route LEF, Frame-GDSII
Physical Views GDSII
Netlist HSPICE
Data Sheets PDF, HTML
BenefitsBenefits
! Wide frequency ranges to
meet multiple applicationrequirements
! Compiler technologygenerates customized PLL
! Module located inside I/Opad ring reduces die area
! Integrated loop filterensures high stability andnoise immunity
! Low Power
Data Preliminary Subject to Change Rev.3..0
Copyright © 2000 Virtual Silicon Technology
All rights reserved
0.180.18mm eSiliconeSilicon Embedded Components Embedded Components
eSi-PLL!!Phase Locked Loop Compiler
Silicon ReadySilicon Ready
! Engineered to foundry’sexact process for optimum
density and performance
! Proven test chips insure firsttime silicon success
! Ultra-high resolution of EDAmodels assures rapid timingconvergence
EDA Views Support
11
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Standard cell layout
routing area
routing arearout
ing
area
routing area
12
standardcell
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Gate array designs
• 2-4 layers di metallo dedicate al routing
• le dimensioni del die e il pincount sono predeterminati
Strong Technical SupportClose customer support provided by
NEC Electronics' global sales network.
Stable Delivery !Efficient production line operation
!Small-lot orders also delivered quickly
No.1 global
share since 1993Note
with
shipments exceeding
10,000 codes
Gate arrays are used to create a specific LSI by executing routing on a master wafer on which gates have been pre-placed in an
array formation.
The most beneficial feature of gate arrays is that a chip with the transistors embedded already exists, so only the routing process
needs to be carried out in order to realize (customize) the circuit. This means that a device can be developed much faster than a
full-custom or cell-based IC.
Products known as embedded arrays are also included in the gate array category. Embedded arrays are high-performance semi-
custom LSIs that use basic gate array cells in the internal area, but that feature embedded memories and cores of the same type
as cell-based ICs. When creating circuits with high-capacity memories, embedded arrays are the better option in terms of unit
price, but development will cost more in order to build the embedded array into the base wafer like cell-based ICs.
Don't hesitate to contact NEC Electronics for a quote to determine whether a gate array or embedded array is the better choice for
your design.
The Gate Array solution you don't want
No.1 global
share since 1993Note
with
shipments exceeding
10,000 codes
Extensive Product Lineup!Usable gates: 1.5 K to 1.5 M
!Power supply voltage: 5 V to 1.8 V
!Process: 0.5 m to 0.25 mµ µ
Note Source: Gartner Dataquest (April 2003)
2 Pamphlet A16841EJ2V0PF
"Low Cost" and "Time-to-Market" are the key demands that drive new
Now, master-slice-type products such as gate arrays have re-emerged as a
Your rivals are using gate array products. You can gain the edge by che
NEC
Product outlines
Basic Performance
PLL Macro Lineup
High-Speed I/O Lineup
Remark !: Not supported
µ
Parameter
Process
Wiring layers
Power supply voltage
Operating frequency
Usable gates
Package pin count
CMOS-N5
0.5 m
2-layer
2.7 to 5.5 V
5 V:60 MHz
3 V:33 MHz
1536 to 92538
20 to 304
EA-9HD
0.35 m
3/4-layer
2.7 to 3.6 V
100 MHz
9712 to 1386739
48 to 696
CMOS-10HD
0.25 m
3/4-layer
2.5/3.3 V
1.8/3.3 V
2.5 V:133 MHz
1.8 V:66 MHz
38143 to 1563458
44 to 680
Remark All series come complete with CTS and oscillation blocks. Scan path and boundary scan tests are also supported.
Note The input frequency is restricted to the range of 45 MHz " input frequency # internal multiplication factor setting " 100
MHz. Therefore, the minimum input frequency of approximately 6 MHz is supported only when the internal multiplication
factor is set to 8, and the maximum input frequency of 100 MHz is supported only when the internal multiplication factor is
set to 1.
Remark !: Not supported
Item
Skew adjustment DPLL
(F9E4)
Multiplication DPLL (F9H2)
Multiplication DPLL (F9H3)
Analog PLL
PLL for EMI noise
reduction:SSCG macro
CMOS-N5
!
!
!
!
!
CMOS-9HD
33 to 80
33 to 80
25 to 100
(Customer-specific support)
!
6 to 100Note
EA-9HD
33 to 80
33 to 80
25 to 100
(Customer-specific support)
4 to 160
6 to 100Note
CMOS-10HD
2.5 V: 33 to 133
1.8 V: 25 to 66
!
2.5 V: 33 to 133
1.8 V: 25 to 66
!
Under development
Input frequency (Unit: MHz)
Input/output frequency (Unit: MHz)
Item
PCI
GTL+
PECL
SSTL2
SSTL3
LVDS
CMOS-N5
VDD = 5.0 V
!
!
!
!
!
!
CMOS-9HD
VDD = 3.3 V
33/33
100/100
156/!
!
!
!
EA-9HD
VDD = 3.3 V
33/33
100/100
156/!
!
!
!
CMOS-10HD
VDD = 2.5 V
66/66
250/100
200/!
250/200
200/200
200/156
CMOS-9HD
0.35 m
3/4-layer
2.7 to 3.6 V
100 MHz
11207 to 1505570
48 to 696
µ µ µ
6 Pamphlet A16841EJ2V0PF
13
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Embedded arrays, Structured ASICs
• come nei gate-arrays le maschere di diffusione sono comuni a tutti i progetti
• memorie, dll, microprocessori etc sono prediffusi.
• Migliori prestazioni rispetto ai gate array14
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Metal layers
• i layers di metallo utilizzabili per le connessioni tra celle sono ridotti rispetto ad un progetto custom
• flessibilita’ sulle prestazioni (high-speed, low-power)
0.18µm ASIC Technology
Features
: SC82 and EA82 series
Overview
EA82 Embedded Array series is based on a 2P and 2N-
transistor basic cells (BC). It supports high-speed RAMs, ROMs,
CAM, mixed-signal macros, and a variety of other embedded
functions.
The EA82 series offers density (over 23 million total gates) and
provides the time-to-market advantage of gate arrays.
This series of devices include 44µm, 66µm, or 88µm pad pitch for
a cost-effective solution for both pad-limited and core-limited
designs.
With a nominal 1.1V to 1.8V core operation and with 2.5V
and 3.3Vand 5V tolerant I/Os, the EA82 series features a very
low-power consumption of 0.8 nW/gate/MHz.
Clk I/O
Hard
Macro
RAMs
SC82 Standard Cell library is optimized for synthesis-based
designs, and is designed for low power by improving transistor
count and minimizing crowbar current. Using optimized gates
instead of stacked transistors further minimizes power consump-
tion.
The core process operates at 1.1V to 1.8V with I/Os operating at
2.5V, 3.3V or 5V tolerant conditions. The library supports the
most popular third-party tools and data exchange file standards.
As with EA82, both standard and staggered I/O pad configura-
tions are supported. Interface options include low-swing, high-
speed I/Os and high-speed bus interface I/Os.
In addition to the traditional QFP packages, the SC82 family is
available in Ball Grid Array and Flip Chip packages.
CS81 offers a rich set of ADCs and DACs, digital and ana-log
PLLs, high-speed RAMs, ROMs and DRAMs, as well as
a variety of other embedded functions.
Standard Cell
(High Speed)
IP/Macro
(High Speed)
SRAM
Standard Cell
(Low Power)
IP/Macro
(Low Power)
Design Methodology
Fujitsu’s design methodology ensures first-silicon success by
integrating proprietary point tools with the most popular,
sign-off quality, industry-standard CAD tools. The following
are among these tools:
• Simultaneous Switching Output analysis
• Logic design rule checker
• 0.13µm effective channel length
• Up to 28 million gates per chip
• 3 to 5 layers of metal interconnects
• Very high density: 110K raw gates/mm 2
• 5 nW/gate/MHz power dissipation at 1.1V
• Core supply voltage: 1.1V to 1.8V
• 11 ps gate delay at 1.8V and 1 fan-out
• Junction temperation: -40 to 125oC
• I/Os: 3.3V, 2.5V, 1.8V, 5V tolerant
• High density diffused RAM, ROM, CAM
• High speed mixed signal macros, APLL
• Comprehensive list of high speed IO’s and popular standard
bus interface
• IPWareTM Soft IP and ARM7TDMI macro
• In-line and staggered IO in various pitches
• Advanced packaging support
• Proven design methodology and tool support
1.8V CMOS
2.5V CMOS
3.3V CMOS
5.0V TTL
1.8V Device
2.5V Device
3.3V Device
5.0V Device
1.8VCMOS
2.5VCMOS
3.3VCMOS
5.0VTolerant
PCI IEEE1394 USB
ADC/DAC
Cx81 Core
( 1.8V )
T-LVTTLP-CMLLVDSSDRAM I/FSSTLHSTLGTL +
High-speedInterface
High-speedDevices
AnalogInterface
PCI Bus IEEE1394 USB Devices
15
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
SOC architettureT Y P I C A L S O C D I E L A Y O U T
Embedded Microcontroller Coresfor system control and application support
MCU Peripheralscompatible with the MCU core
SRAM Cache Memoryfor rapid access to operational data
Industry-standard Interfacesfor networking and communication
Analog Interfacing and Processingfor direct connection to external devices
EEPROMData Memorygives instant
access to data
Flash ProgramMemory
eliminates the costand delay of ROM
code changes
EmbeddedDSP Core
for high-bandwidthsignal processing
Customer-specificLogic
that makesthe SoC unique
4B24 Silicon City 12 pg.xp 16/04/04 15:33 Page 10
16
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
MOSIS 0.13 um MOSIS WAFER ACCEPTANCE TESTS RUN: T59M (8RF_8LM_DM) VENDOR: IBM-BURLINGTON TECHNOLOGY: SCN013 FEATURE SIZE: 0.13 microns Run type: SKD
INTRODUCTION: This report contains the lot average results obtained by MOSIS from measurements of MOSIS test structures on each wafer of this fabrication lot. SPICE parameters obtained from similar measurements on a selected wafer are also attached.
COMMENTS: 8RF_IBM-BURLIN
TRANSISTOR PARAMETERS W/L N-CHANNEL P-CHANNEL UNITS MINIMUM 0.16/0.12 Vth 0.41 -0.45 volts SHORT 20.0/0.12 Idss 462 -179 uA/um Vth 0.45 -0.45 volts Vpt 3.0 -3.9 volts WIDE 20.0/0.12 Ids0 277.7 -91.4 pA/um LARGE 50/50 Vth 0.11 -0.22 volts Vjbkd 2.6 -2.9 volts Ijlk <50.0 <50.0 pA Gamma 0.27 0.23 V^0.5 K' (Uo*Cox/2) 289.0 -49.3 uA/V^2 Low-field Mobility 535.64 91.37 cm^2/V*s
PROCESS PARAMETERS N+ P+ POLY M1 M2 M3 M4 UNITS Sheet Resistance 3.6 7.4 7.0 0.13 0.10 0.12 0.06 ohms/sq Contact Resistance 13.1 15.4 9.9 0.77 1.40 1.72 ohms Gate Oxide Thickness 32 angstrom PROCESS PARAMETERS M5 N+BLK PPLY+BLK M8 M6 POLY_NON TaN M7 N_W UNITS Sheet Resistance 0.06 75.7 328.2 0.01 0.09 1537.9 61.4 0.01 553 ohms/sq Contact Resistance 1.93 2.72 2.25 2.41 ohms COMMENTS: BLK is silicide block.
18
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Elementi parassiti
• C, R
p-tub
poly poly
n+n+
metal 1
metal 3
metal 2
vias
19
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Diffusioni
• formati dalle giunzioni p e n• n-type (0.5 μm process):
• bottomwall: 0.6 fF/μm2
• sidewall: 0.2 fF/μm
• p-type:
• bottomwall: 0.9 fF/μm2
• sidewall: 0.3 fF/μm
n+ (ND)
depletion region
substrate (NA)bottomwallcapacitance
sidewallcapacitances
20
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
wire-wire capacitance
• sullo stesso layer o su layers adiacenti
metal 2
metal 1 metal 1
21
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Resistenza
• R/square
Poly: 4 ohms/squaremetal 1: 0.08 ohms/squaremetal 2: 0.07 ohms/squaremetal 3: 0.03 ohms/squarendiff: 2 ohms/squarepdiff: 2 ohms/square
22
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Propagation delays
• i modelli di timing possono essere piu’ complessi, come vedremo quando parleremo di timing verification.
23
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Propagation delays 2
y = 0,7342x + 1,7317
R2 = 0,9999
0,00
5,00
10,00
15,00
20,00
25,00
30,00
35,00
0,00 10,00 20,00 30,00 40,00 50,00
timeLinear (time)
.OP
.TRAN 1NS 100NScl0 output0 Ground 5pFcl1 output1 Ground 10pFcl2 output2 Ground 20pFcl3 output3 Ground 40pF
VIN input Ground 0.01 PWL(0us 5V 10ns 5V 12ns 0V 20ns 0V)VGround 0 Ground DC 0VVdd +5V 0 DC 5Vm10 output0 input Ground Ground NMOS W=100u L=2um20 output0 input +5V +5V PMOS W=200u L=2um11 output1 input Ground Ground NMOS W=100u L=2um21 output1 input +5V +5V PMOS W=200u L=2um12 output2 input Ground Ground NMOS W=100u L=2um22 output2 input +5V +5V PMOS W=200u L=2um13 output3 input Ground Ground NMOS W=100u L=2um23 output3 input +5V +5V PMOS W=200u L=2u
.model nmos nmos level=2 vto=0.78 tox=400e-10 nsub=8.0e15 xj=-0.15e-6+ ld=0.20e-6 uo=650 ucrit=0.62e5 uexp=0.125 vmax=5.1e4 neff=4.0+ delta=1.4 rsh=37 cgso=2.95e-10 cgdo=2.95e-10 cj=195e-6 cjsw=500e-12+ mj=0.76 mjsw=0.30 pb=0.80.model pmos pmos level=2 vto=-0.8 tox=400e-10 nsub=6.0e15 xj=-0.05e-6+ ld=0.20e-6 uo=255 ucrit=0.86e5 uexp=0.29 vmax=3.0e4 neff=2.65+ delta=1 rsh=125 cgso=2.65e-10 cgdo=2.65e-10 cj=250e-6 cjsw=350e-12+ mj=0.535 mjsw=0.34 pb=0.80.end
time
voltage
XXX
0.0 5.0 10.0 15.0 20.0 25.0 30.0 35.0 40.0
ns
-1
0
1
2
3
4
5
V output3
time
voltage
XXX
0.0 5.0 10.0 15.0 20.0 25.0 30.0 35.0 40.0
ns
-1
0
1
2
3
4
5
V output0
24
Spice
Cload [pF]
delay [ns]
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
drive strenghts
• normally we require bn = bp -> W/L p = 2X W/L n
• celle caratterizzate da X1(minimum size), X2, X4, X8
AOI21M20D1AND OR Gate
© Virtual Silicon Technology, Inc. Any information in this databook is subject to change without notice.
0.18um Standard Cell Library 84
Description:
And-Or-Invert, which has drive strength of 1x, with three data inputs (not(A1), not(A2), and B), and one output (Z).
Cell Attributes
Area Static power (uW) VDD Temp Process
24.394 0.000 1.8V 25°C Typical
Truth Table
A1 A2 B Z
dc dc 1 0
0 0 dc 0
1 dc 0 1
dc 1 0 1
Propagation Delays(ns) for Sample Loads, standard load = 0.08pF slew = 0.8ns
Fanout 1 2 3 4 8 16
Edge Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall
A1!"Z 0.305 0.424 0.528 0.532 0.750 0.637 0.972 0.742 1.853 1.164 3.613 2.008
A2!"Z 0.342 0.391 0.564 0.501 0.786 0.607 1.007 0.712 1.894 1.134 3.666 1.976
B!"Z 0.498 0.252 0.761 0.429 1.002 0.570 1.229 0.705 2.118 1.199 3.889 2.186
Power Dissipation(pJ) for Sample Loads, standard load = 0.08pF slew = 0.8ns
Fanout 1 2 3 4 8 16
Edge Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall
A2!"Z 0.198 0.213 0.327 0.341 0.456 0.471 0.585 0.600 1.105 1.119 2.143 2.156
B!"Z 0.181 0.155 0.304 0.277 0.431 0.405 0.557 0.532 1.070 1.045 2.095 2.072
A1!"Z 0.193 0.206 0.322 0.334 0.452 0.464 0.581 0.593 1.100 1.111 2.135 2.148
Pin Descriptions
Pin Capacitance pF Descritpion
A1 input 0.006 inverted data input
A2 input 0.006 inverted data input
B input 0.008 data input
Z output data output
ZA2
A1
B
25
AOI21M20D4AND OR Gate
© Virtual Silicon Technology, Inc. Any information in this databook is subject to change without notice.
0.18um Standard Cell Library 86
Description:
And-Or-Invert, which has drive strength of 4x, with three data inputs (not(A1), not(A2), and B), and one output (Z).
Cell Attributes
Area Static power (uW) VDD Temp Process
48.787 0.000 1.8V 25°C Typical
Truth Table
A1 A2 B Z
dc dc 1 0
0 0 dc 0
1 dc 0 1
dc 1 0 1
Propagation Delays(ns) for Sample Loads, standard load = 0.08pF slew = 0.8ns
Fanout 1 2 3 4 8 16
Edge Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall
A1!"Z 0.185 0.412 0.226 0.451 0.262 0.483 0.297 0.512 0.431 0.616 0.700 0.813
A2!"Z 0.217 0.357 0.259 0.396 0.295 0.428 0.330 0.457 0.464 0.562 0.732 0.760
B!"Z 0.329 0.179 0.364 0.210 0.398 0.238 0.432 0.264 0.566 0.364 0.836 0.560
Power Dissipation(pJ) for Sample Loads, standard load = 0.08pF slew = 0.8ns
Fanout 1 2 3 4 8 16
Edge Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall Rise Fall
A2!"Z 0.287 0.334 0.411 0.455 0.537 0.580 0.662 0.704 1.177 1.214 2.212 2.244
B!"Z 0.305 0.289 0.433 0.416 0.563 0.545 0.691 0.673 1.211 1.191 2.248 2.228
A1!"Z 0.277 0.328 0.401 0.448 0.528 0.570 0.654 0.692 1.169 1.201 2.204 2.232
Pin Descriptions
Pin Capacitance pF Descritpion
A1 input 0.009 inverted data input
A2 input 0.009 inverted data input
B input 0.006 data input
Z output data output
ZA2
A1
B
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Livelli logici: simulazione
type std_ulogic is ( 'U', -- Uninitialized 'X', -- Forcing Unknown '0', -- Forcing 0 '1', -- Forcing 1 'Z', -- High Impedance 'W', -- Weak Unknown 'L', -- Weak 0 'H', -- Weak 1 '-' -- Don't care );
03/02/2006 11:02 AMVerilog HDL On-line Quick Reference body
Page 3 of 31http://www.sutherland-hdl.com/on-line_ref_guide/vlog_ref_body.html
x or X unknown or uninitialized
4.6 Logic Strengths
The Verilog HDL has 8 logic strengths: 4 driving, 3 capacitive, and high impedance (no strength).
Strength
LevelStrength Name
Specification
Keyword
Display
Mnemonic
7 Supply Drive supply0 supply1 Su0 Su1
6 Strong Drive strong0 strong1 St0 St1
5 Pull Drive pull0 pull1 Pu0 Pu1
4 Large Capacitive large La0 La1
3 Weak Drive weak0 weak1 We0 We1
2 Med. Capacitive medium Me0 Me1
1 Small Capacitive small Sm0 Sm1
0 High Impedance highz0 highz1 HiZ0 HiZ1
4.7 Literal Integer Numbers
Syntax
size'base valueSized integer in a specific radix
(base)
size (optional) is the number of bits in the number. Unsized integers default to at least 32-
bits.
'base (optional) represents the radix. The default base is decimal.
Base Symbol Legal Values
binary b or B 0, 1, x, X, z, Z, ?, _
octal o or O 0-7, x, X, z, Z, ?, _
decimal d or D 0-9, _
hexadecimal h or H 0-9, a-f, A-F, x, X, z, Z, ?, _
The ? is another way of representing the Z logic value.
An _ (underscore) is ignored (used to enhance readability).
Values are expanded from right to left (lsb to msb).
When size is less than value, the upper bits are truncated.
When size is larger than value, and the left-most bit of value is 0 or 1, zeros are left-
extended to fill the size.
When size is larger than value, and the left-most bit of value is Z or X, the Z or X is left-
26
Verilog
VHDL
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Design libraries
• simulation
• synthesis
• place and routing
• testvector generation
27
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Introduzione ai linguaggi
• definizione, simulazione, sintesi, test:
• Verilog, VHDL ...
• scripting languages:
• tcl, perl, awk ...
• descrizione del circuito
• EDIF, SDF ...
28
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Verilogmodule counter; reg clock; // Declare a reg data type for the clock. integer count; // Declare an integer data type for the count.initial // Initialize things; this executes once at t=0. begin clock = 0; count = 0; // Initialize signals. #340 ; // Finish after 340 time ticks. end /* An always statement to generate the clock; only one statement follows the always so we don't need a begin and an end. */always #10 clock = ~ clock; // Delay (10ns) is set to half the clock cycle./* An always statement to do the counting; this executes at the same time (concurrently) as the preceding always statement. */always begin // Wait here until the clock goes from 1 to 0. @ (negedge clock); // Now handle the counting. if (count == 7) count = 0; else count = count + 1; $monitor("time = ",," count = ", count); end endmodule
29
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Esempi di celle di libreria// // Copyright (C) 2001 Virtual Silicon Technology Inc.. All Rights Reserved. // "eSilicon", "eSi", "eSi-Route", "eSi-Pad", "eSi-RAM", "eSi-ROM", "eSi-PLL", // "The Heart of Great Silicon(R)", "Silicon Ready(R)", "IP Ambassador", // "Design Service Ambassador", and "Virtual Silicon" are trademarks // of Virtual Silicon Technology Inc. // // Virtual Silicon Technology Inc. // 1200 Crossman Ave Suite 200 // Sunnyvale, CA 94089-1116 // Phone : 408-548-2700 // Fax : 408-548-2750 // Web Site : www.virtual-silicon.com // // File Name: DFFPB1.v // Library Name: umcl18u250t2 // Library Release: 2.1 // Process: eSi/Route-11 UMC L180 // verigen patch-level 1.132-m 02/27/2000 13:49:14`celldefine// Positive Edge, D Flip-Flop; Q, QB Outputs// Q = rising(CK) ? D : 'p';QB = !Qmodule DFFPB1 (Q, QB, CK, D); output Q; output QB; input CK; input D; reg notifier; p_ff _i0 (Q, D, CK, notifier); not _i1 (QB,Q); specify (CK => Q) = (1,1); (CK => QB) = (1,1);`ifdef no_tchk`else $hold(posedge CK,negedge D,0,notifier); $hold(posedge CK,posedge D,0,notifier); $setup(negedge D,posedge CK,0,notifier); $setup(posedge D,posedge CK,0,notifier); $width(negedge CK,1,0,notifier); $width(posedge CK,1,0,notifier);`endif endspecifyendmodule`endcelldefine
// POSITIVE EDGE TRIGGERED D FLIP-FLOPprimitive p_ff (Q, D, CP,notifier); output Q; reg Q; input D, CP,notifier;
table // D CP No : Qt : Qt+1 1 (01) ? : ? : 1; // clocked data 0 (01) ? : ? : 0; 1 (x1) ? : 1 : 1; // reducing pessimism 0 (x1) ? : 0 : 0; 1 (bx) ? : 1 : 1; 0 (bx) ? : 0 : 0; ? (?0) ? : ? : -; * b ? : ? : -; // ignore edges on data ? ? * : ? : x; endtableendprimitive
30
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Esempi di file SDF(DELAYFILE(SDFVERSION "OVI 2.1")...(DIVIDER /)// OPERATING CONDITION "BEST::WORST"(VOLTAGE 1.98::1.62)(PROCESS "1.000::1.000")(TEMPERATURE 0.00::125.00)(TIMESCALE 1ns)(CELL (CELLTYPE "cma_io") (INSTANCE) (DELAY (ABSOLUTE (INTERCONNECT U110/Z out_k_31/DO (0.016::0.016) (0.015::0.016)) (INTERCONNECT U116/Z out_k_30/DO (0.022::0.022) (0.021::0.022)) (INTERCONNECT U122/Z out_k_29/DO (0.022::0.023) (0.022::0.023)) (INTERCONNECT U134/Z out_k_28/DO (0.022::0.022) (0.022::0.023)) (INTERCONNECT U41/Z out_k_27/DO (0.013::0.013) (0.013::0.013))...(CELL (CELLTYPE "DFFRPB2") (INSTANCE muxb/clockb/divout_reg)
(DELAY (ABSOLUTE (IOPATH RB Q () (0.278:0.278:0.278)) (IOPATH CK Q (0.481:0.481:0.481) (0.452:0.452:0.452)) (IOPATH RB QB (0.381:0.381:0.381) ()) (IOPATH CK QB (0.566:0.566:0.566) (0.595:0.595:0.595)) ) ) (TIMINGCHECK (SETUP (posedge D) (COND shcheckCKDlh === 1'b1 (posedge CK)) (0.115:0.115:0.115)) (SETUP (negedge D) (COND shcheckCKDlh === 1'b1 (posedge CK)) (0.017:0.017:0.017)) (HOLD (posedge D) (COND shcheckCKDlh === 1'b1 (posedge CK)) (0.000:0.000:0.000)) (HOLD (negedge D) (COND shcheckCKDlh === 1'b1 (posedge CK)) (0.010:0.010:0.010)) (WIDTH (negedge RB) (0.338:0.338:0.338)) (RECOVERY (posedge RB) (posedge CK) (-0.304:-0.304:-0.304)) (HOLD (posedge RB) (posedge CK) (0.356:0.356:0.356)) (WIDTH (negedge CK) (0.267:0.267:0.267)) (WIDTH (posedge CK) (0.236:0.236:0.236)) ))
31
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
high-level libraries
• esempio:FIFO
DesignWare IP Family
Memory – FIFO Overview
192 Synopsys, Inc. December 12, 2005
Memory – FIFO OverviewThe FIFOs in this category address a broad array of design requirements. FIFOs, which include dual-port RAM memory arrays, are offered for both synchronous and asynchronous interfaces. The memory arrays are offered in two configurations: latch-based to minimize area, and D flip-flop-based to maximize testability. These two configurations also offer flexibility when working under design constraints, such as a requirement that no latches be employed. Flip-flop-based designs employ no clock gating to minimize skew and maximize performance. All FIFOs employ a FIFO RAM controller architecture in which there is no extended “fall-through” time required before reading contents just written.
Also offered are FIFO Controllers without the RAM array. They consist of control and flag logic and an interface to common ASIC dual port RAMs. Choosing between the two is typically based on the required size of the FIFO. For shallow FIFOs (less than 256 bits), synchronous or asynchronous FIFOs are available which include both memory and control in a single macro. These macros can be programmed via word width, depth, and level (almost-full flag) parameters.
For larger applications (greater than 256 bits), you can use the asynchronous FIFO Controller with a diffused or metal programmable RAM. See Figure 1.
Figure 1: Memory: FIFOs and FIFO Controllers
All FIFOs and Controllers support full, empty, and programmable flag logic. Programmable flag logic may be statically or dynamically programmed. When statically programmed, the threshold comparison value is hardwired at synthesis compile time. When dynamically programmed, it may be changed during FIFO operation.
DiffusedorMetal ProgrammableRAM(on-chip or off-chip)
Synthetic DesignsFIFO RAM Controller
Synthetic Designs FIFO(includes control and memory)
Controller
LatchorFlip-FlopBased RAM
FIFO Controller to be used with a technology-specific vendor supplied RAM
Technology-independent FIFOthat includes control and memory
•For large FIFOs (> 256 bits)•Interfaces to dual port static RAMs
•For shallow FIFOs (< 256 bits)•Self-contained RAM storage array
December 12, 2005 Synopsys, Inc. 193
DesignWare IP Family
DW_asymfifo_s1_dfAsymmetric I/O Synchronous (Single Clock) FIFO with Dynamic Flag
DW
L Sythesizable IP
DW_asymfifo_s1_dfAsymmetric I/O Synchronous (Single Clock) FIFO with Dynamic Flag
! Fully registered synchronous flag output ports
! D flip-flop-based memory array for high testability
! All operations execute in a single clock cycle
! FIFO empty, half full, and full flags
! Parameterized asymmetric input and output bit widths (must be integer-multiple relationship)
! Word integrity flag for data_in_width < data_out_width
! Flushing out partial word for data_in_width < data_out_width
! Parameterized byte (or subword) order within a word
! FIFO error flag indicating underflow, overflow, and pointer corruption
! Parameterized word depth
! Dynamically programmable almost full and almost empty flags
! Parameterized reset mode (synchronous or asynchronous, memory array initialized or not)
push_req_n
full
emptyerror
data_out
pop_req_n
diag_n
rst_nclk
almost_full
half_full
almost_empty
data_in
flush_n
ram_full
part_wdae_level
af_thresh
32
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Macrocell libraries
• encapsulationlibrary IEEE;use IEEE.std_logic_1164.all;use IEEE.VITAL_timing.all;
library umc;use umc.all;
library work;use work.cma_common.all;
entity cma_pll is port ( clk40 : in STD_LOGIC; -- Input clock bypass : in STD_LOGIC; -- pass thru enable clr : in STD_LOGIC; -- Reset not lock : out STD_LOGIC; -- PLL is in lock clk : out STD_LOGIC -- PLL clock out );end cma_pll;
architecture rtl of cma_pll is component PLL40x320x6LM port ( REF : in STD_LOGIC; -- Input clock FB : in STD_LOGIC; -- Feedback from a delayed place on the chip BYPASS : in STD_LOGIC; -- pass thru mode enable RESET : in STD_LOGIC; -- Reset not LOCK : out STD_LOGIC; -- PLL is in lock PLLOUT : out STD_LOGIC -- PLL clock out ); end component; signal int_clr : STD_LOGIC; signal int_clk : STD_LOGIC;begin clk <= int_clk; clear : process(clr) begin if CLRVAL = '1' then int_clr <= clr; else int_clr <= not clr; end if; end process; pll : PLL40x320x6LM port map ( REF => clk40, FB => int_clk, BYPASS => bypass, RESET => int_clr, LOCK => lock, PLLOUT => int_clk);end rtl;
33
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Il progetto di un ASIC
• Documento specifiche
• “package” di parametri
• utilizzato anche nella simulazione comportamentale
--*****************************************************--* file: cma_common.vhd--* Package: cma_common--*--* basic package for cma, contains main parameters--* @author : S. Veneziano, R.Vari--* @version 1.0 : 20001001 : birth--*****************************************************--/
library IEEE;library DWARE;use IEEE.NUMERIC_STD.all;use DWARE.DWpackages.all;use IEEE.STD_LOGIC_1164.all;
package cma_common is
--* ASIC identification bit size constant CMA_DEVID_SIZE : integer := 7;
constant SERIALWSIZE : integer := 16; -- serialized data word size
--* I2C register sizes are modulo 8 constant I2CMOD : integer := 8;
--* BC Counter size constant FEBCID_WIDTH : integer := 12;
--* L1ID Counter size constant FEL1ID_WIDTH : integer := 9;
--* Frontend grouping --* in number of strips constant FE_GROUP : integer := 8;
--* deadtime setting of input signals --* in number of time slices constant FE_DEADTIME : integer := 32;
34
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Behavioral modelmodule fifo( dout, // head of fifo full, // no more space, no shift-in allowed half, // fifo is >=50% full quarter, // fifo is >=25% full empty, // head of fifo is invalid data clk, res_, din, // data to store shiftin, // store data from din in fifo shiftout); // i've read the head of fifo, show me next
parameter WIDTH = 32; // bit width parameter DEPTH = 16; // depth of fifo
output [(WIDTH-1):0] dout; output full, half, quarter, empty; reg full, half, quarter, empty; input clk, res_, shiftin, shiftout; input [(WIDTH-1):0] din;
reg [(WIDTH-1):0] entry [0:(DEPTH-1)]; // the register stages integer wp; // write pointer (points to first free slot) integer i; // loop count
// first stage of fifo is output assign dout = entry[0];
always @(posedge clk or negedge res_) // trigger new evaluation begin if (res_ == 0) // it's a reset begin wp = 0; // the initial values full <= 1'b0; half <= 1'b0; quarter <= 1'b0; empty <= 1'b1; end else // it's a posedge of clk begin case ({shiftin, shiftout}) 2'b00: ; // nothing to do 2'b01: // shift-out(assumes: at least 1 valid value in fifo) begin for (i=1; i<wp; i=i+1) // shift all valid entries entry[i-1] <= entry[i]; wp = wp-1; full <= 1'b0; half <= (wp >= (DEPTH/2)) ? 1'b1 : 1'b0; quarter <= (wp >= (DEPTH/4)) ? 1'b1 : 1'b0; empty <= (wp == 0) ? 1'b1 : 1'b0; end 2'b10: // shift-in (assumes: at least 1 entry free) begin entry[wp] <= din; wp = wp+1; full <= (wp == DEPTH) ? 1'b1 : 1'b0; half <= (wp >= (DEPTH/2)) ? 1'b1 : 1'b0; quarter <= (wp >= (DEPTH/4)) ? 1'b1 : 1'b0; empty <= 1'b0; end 2'b11: //simultaneous shift-in and -out begin // (at least 1 valid entry) for (i=1; i<wp; i=i+1) entry[i-1] <= entry[i]; entry[wp-1] <= din; end endcase // case ({shiftin, shiftout}) end // else: !if(res_ == 0) end // always @ (posedge clk or negedge res_)
endmodule35
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Architecture design
Block diagrams specify structure: wire bundleof width 5
36
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
System Partitioning
• Il sistema va partizionato in base a:
• funzionalita’
• constraints di sintesi
• constraints fisici
• risorse umane disponibili
37
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
rtl model
A register-transfer machine has combinationallogic connecting registers:
DQ combinationallogic
D QD Q combinationallogic
combinationallogic
38
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Parametrized blocks
• Per rendere il codice riutilizzabile nello stesso o in altri progetti
ENTITY in_reg_gen IS GENERIC(WIDTH : integer); PORT( a : IN std_logic_vector(WIDTH-1 DOWNTO 0); serout : OUT std_logic; clk40 : IN std_logic; clken_I2C : IN std_logic; clr : IN std_logic; sel : IN std_logic; serin : IN std_logic; shift : IN std_logic );END in_reg_gen;
ARCHITECTURE rtl of in_reg_gen IS
signal q : std_logic_vector(WIDTH-1 downto 0);
BEGIN
serout <= q(q'HIGH); reg: process (clk40,clr) begin if(clr= CLRVAL) then q<=(others=>'0'); elsif rising_edge(clk40) then if(clken_I2C = CLKENVAL) then if(sel = '1' AND shift = '1') then q <= q(WIDTH-2 downto 0) & serin; else q <= a; end if; END if; end if; end process reg;
END rtl;39
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
La simulazione
• possiamo distinguere almeno due tipi:
• Compiler driven
• Event driven
40
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Unit-delay
• tutti i ritardi dei gate sono unitari
• utile per osservare l’evoluzione temporale (glitches)
time
’0’
’1’
’0’
’1’
0 1 2 3 4
n1
n2
n3
n4
n5
n6
n7
n8
n9
’0’
’1’
’0’
’1’
’0’
’1’
’0’
’1’
’0’
’1’
’0’
’1’
’0’
’1’
for (t ! tstart ; t " tend ; t ! t + 1) {new[1]! A;
new[2]! B;
new[3]! C;
new[4]! D;
new[5]! E;
new[6]! OR(old[1], old[2]);
new[7]! AND(old[4], old[5]);
new[8]! AND(old[6], old[3]);
new[9]! OR(old[7], old[8]);
F! new[9];
old! new;
}
41
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Simulazione event-driven
• utilizzata nei simulatori gate-level.
• un evento e’ un cambiamento di livello di un segnale ad un certo tempo
• un evento puo’ causare la variazione di livello di altri segnali
struct event {int time;
struct net *node;
struct signal value value;
. . .
};
struct event queue {. . .
};
struct event queue *new queue();
{“create an empty event queue”;
}
struct event *first event(struct event queue *queue)
{“remove the earliest event from an event queue and return it”;
}
insert event(struct event queue *queue, struct event *new event)
{“add a given event to an event queue”;
}
42
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Delta-cycles
• Eventi vengono inseriti nella coda, ordinati secondo il tempo di aggiornamento del segnale
• Eventi possono essere distanziati di un tempo di simulazione determinato dai gate delays o nullo (delta cycle)
time
current
time
(n! 1)!t
n!t
(n" 1)!t
(n" 2)!t
(n" 3)!t
43
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Testbench organization
• modello idealeEE495D Fall 2004 Test and Debug Page 13
Gold Model Approach
GoldModel
DUT
InputPatternGenerator
Fail
XOR each pair ofoutput bits individually
Test Bench
44
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Analisi statistica• Esempio: Un programma di simulazione Montecarlo
produce una lista di stimoli casuali, ma con una struttura vicina alla realta’, in input al dispositivo.
RUN 1 EVENT 1 WINDOW 5 LINES 4 TIME 122.8 I0 0 I1 0 J0 1 28 J1 0 TIME 122.9 I0 0 I1 0 J0 0 J1 1 28 TIME 125.1 I0 1 31 I1 0 J0 0 J1 0 TIME 125.2 I0 0 I1 1 31 J0 0 J1 0 RUN 1 EVENT 2 WINDOW 5 LINES 3 TIME 122.8 I0 0 I1 0 J0 1 21 J1 0 TIME 122.9 I0 0 I1 0 J0 0 J1 1 21 TIME 125.1 I0 1 24 I1 1 24 J0 0 J1 0 RUN 1 EVENT 3 WINDOW 5 LINES 2 TIME 122.7 I0 0 I1 0 J0 1 33 J1 0 TIME 122.8 I0 0 I1 0 J0 0 J1 1 33
45
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Analisi statistica 2• Il programma di
simulazione contiene anche un modello comportamentale del dispositivo. L’output del modello comportamentale viene confrontato con quello del dispositivo
C001 -- CMID 0 FEL1ID 185A2 -- FEBCID 14421A1F -- BC 3 TIME 2 IJK 0 STRIP 311A3F -- BC 3 TIME 2 IJK 1 STRIP 31195C -- BC 3 TIME 1 IJK 2 STRIP 28199C -- BC 3 TIME 1 IJK 4 STRIP 281ADF -- BC 3 TIME 2 IJK 6 STRIP 311AEB -- BC 3 TIME 2 OVL 2 THR 34025 -- CODE 0 CRC 25
C002 -- CMID 0 FEL1ID 28620 -- FEBCID 15681A18 -- BC 3 TIME 2 IJK 0 STRIP 241A38 -- BC 3 TIME 2 IJK 1 STRIP 241955 -- BC 3 TIME 1 IJK 2 STRIP 211995 -- BC 3 TIME 1 IJK 4 STRIP 211AD8 -- BC 3 TIME 2 IJK 6 STRIP 241AEB -- BC 3 TIME 2 OVL 2 THR 3408B -- CODE 0 CRC 8B
C003 -- CMID 0 FEL1ID 3869E -- FEBCID 16941961 -- BC 3 TIME 1 IJK 3 STRIP 119A1 -- BC 3 TIME 1 IJK 5 STRIP 1406C -- CODE 0 CRC 6C
C001 -- CMID 0 FEL1ID 185A2 -- FEBCID 14421A1F -- BC 3 TIME 2 IJK 0 STRIP 311A3F -- BC 3 TIME 2 IJK 1 STRIP 31195C -- BC 3 TIME 1 IJK 2 STRIP 28199C -- BC 3 TIME 1 IJK 4 STRIP 281ADF -- BC 3 TIME 2 IJK 6 STRIP 311AEB -- BC 3 TIME 2 OVL 2 THR 34025 -- CODE 0 CRC 25
C002 -- CMID 0 FEL1ID 28620 -- FEBCID 15681A18 -- BC 3 TIME 2 IJK 0 STRIP 241A38 -- BC 3 TIME 2 IJK 1 STRIP 241955 -- BC 3 TIME 1 IJK 2 STRIP 211995 -- BC 3 TIME 1 IJK 4 STRIP 211AD8 -- BC 3 TIME 2 IJK 6 STRIP 241AEB -- BC 3 TIME 2 OVL 2 THR 3408B -- CODE 0 CRC 8B
C003 -- CMID 0 FEL1ID 3869E -- FEBCID 16941961 -- BC 3 TIME 1 IJK 3 STRIP 119A1 -- BC 3 TIME 1 IJK 5 STRIP 1406C -- CODE 0 CRC 6C
46
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Synthesis
Flusso tradizionale: piu’ avanti parleremo del legame tra sintesi e layout
!
"#$!%&'()
*(+(,-.&/+
0(.12
$&3,-,&()
!4(-5&+6
%&'()
71&'5&+6
8/+).,-&+.)
8/92&'(
*(+(,-.(
4(2/,.)
:,&.(!%&'()
!
!"#$%&'()'*+,"-'./012&,",'!345'
!!
6$"-7'8&9"&5'7(;/,(!5&)<1))&+6!</92&'(=!'(.>)!?1&<@'A!,(B&(C!CD-.!A/1!'(-,+(5!&+!E-,.!FG!!HD(!;&,).!).(2!&)!./!)2(+5!)/9(!.&9(!</''(<.&+6!&+;/,9-.&/+!/+!</5&+6!61&5('&+()!(B(+!&;!A/1!-,(+>.!C,&.&+6!.D(!</5(G!!HD(!,(-)/+!A/1!C-+.!./!5/!.D&)!&)!./!-&5!C&.D!)A+.D()&)!3(<-1)(!.D(,(!-,(!.&9()!CD(+!&.!&)!+(<())-,A!./!,(C,&.(!.D(!</5(G!I+/C&+6!-!;(C!61&5('&+()!<-+!D('2!A/1!-<</92'&)D!.D&)!9/,(!?1&<@'AG!HD(!;/''/C&+6!,()/1,<()!C&''!3(!D('2;1'JG!!
!" $K#L!8/5(!<D(<@(,!!
!" %-.M%,((!"#$!0/'BN(.!-,.&<'(!OPPQQPQ!!
!" 0A+/2)A)!,(;(,(+<(!9-+1-')G!!$&3,-,&()!+((5!./!3(!5(;&+(5!3(;/,(!,(-5&+6!&+!A/1,!</5(G!!HD(!'&+@R'&3,-,A=!.-,6(.R'&3,-,A=!)A93/'R'&3,-,A=!-+5!)A+.D(.&<R'&3,-,A!B-,&-3'()!+((5!./!3(!)(.!./!5(;&+(!A/1,!'&3,-,&()G!!HA2&<-''A!.D()(!B-,&-3'(!5(;&+&.&/+)!-,(!'/<-.(5!&+!A/1,!G)A+/2)A)R5<G)(.12!;&'(G!!S/1!<-+!,(-5!&+!A/1,!</5(!3A!1)&+6!.D(!-+-'AT(U('-3/,-.(!/,!,(-5R;&'(!</99-+5)G!HD()(!</99-+5)!3(D-B(!5&;;(,(+.'A!)/!A/1!)D/1'5!,(B&(C!!
! "
47
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Sintesi top-down!"#$%"&'
!$$$$()"& *+,-#.
"#$%!&'!#'(&)#
%#*&+'
"#*,-.#!/0-(&1-#
&'*($'2#*!,3!$'4
%#*&+'!)#3#)#'2#*5
611-4
7,'*()$&'(*
7,/1&-#
8!)#$%!&'!(9#!#'(&)#!%#*&+'
)#$%!:3!.#)&-,+!;5.
)#$%!:3!.#)&-,+!<5.
)#$%!:3!.#)&-,+!75.
)#$%!:3!.#)&-,+!=5.
)#$%!:3!.#)&-,+!65.
)#$%!:3!.#)&-,+!>?@5.
20))#'(A%#*&+'!>?@
-&'B
8!)#*,-.#!/0-(&1-#!)#3#)#'2#*
0'&C0&34
8$11-4!2,'*()$&'(*!$'%!$(()&D0(#*
*,0)2#!2,'*()$&'(*52,'
8!2,/1&-#!(9#!%#*&+'
2,/1&-#
!
!
(-/0,1$23$!"#$%"&'$%14-/'$()"&$
5".."678#$9"6#-)1$
E,0!*9,0-%!0*#!$!D,((,/:01!$11),$29!3,)!-$)+#)!%#*&+'*5!F#)#!$)#!*,/#!$%.$'($+#*G!!
!" "#C0&)#*!-#**!/#/,)45!!
!" 7,/1&-#*!-$)+#!%#*&+'*!D4!0*&'+!(9#!%&.&%#:$'%:2,'C0#)!$11),$29!!
!" H0&2B-4!&%#'(&3&#*!(9#!2)&(&2$-!1$(9*!3,)!1,**&D-#!)#2,%&'+5!
!
5".."678#$()"&$
6!D,((,/:01!3-,I!)#C0&)#*!$!-,(!/,)#!(&/#!(9$'!$!(,1:%,I'!$11),$295!!F#)#!$)#!(9#!)#C0&)#%!*(#1*J!&'2-0%&'+!*0++#*(#%!*(#1*!(,!3,--,I!&3!(&/&'+!&*!',(!/#(G!!K5! L#'#)$(#!$!%#3$0-(!2,'*()$&'(!3&-#!$'%!*0D%#*&+':*1#2&3&2!2,'*()$&'(!3&-#*5!!>9#!%#3$0-(!2,'*()$&'(!3&-#!*9,0-%!&'2-0%#!+-,D$-!2,'*()$&'(*J!*029!$*!(9#!
! "
48
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
traditional design constraints
• clock definitions
• false paths
• timing checks
49
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
IO constraints
1-10
Chapter 1: Getting Started
The set_output_delay command specifies the amount of
delay from each output to the external device that captures the
output data. It establishes a timing constraint at the outputs and
specifies the required time for output data with respect to the CLK
clock.
Figure 1-3 shows the input and output timing constraints.
Figure 1-3 Input and Output Timing Constraints
clk1
6.6663.3330.000 9.999
create_clock
set_input_delay
set_output_delay
ex1
CLK
CLK
CLK
design
Data arrival time
Data required time
Delay
Delay
...
...
......
in4
in3
in2
in1
op1
op2
CLK
CLK
2.0
1.0
set_clock_latency2.0
Edge arrival time
CLKexternalclock
at input pin
at output pin
at flip-flops
50
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
loads
1-12
Chapter 1: Getting Started
2. To specify the capacitive load at the design outputs:
pt_shell> set_load -pin_load 1 [all_outputs]1
The all_outputs command creates a collection of all output
ports of the design. The set_load command sets the external
capacitive load to 1.0 unit for these ports. The units are defined
in the library to be 1 pF.
Figure 1-4 shows the driving cell and loads that have been
specified.
Figure 1-4 Input and Output Timing Constraints
clk1
ex1design
in4
in3
in2
in1
op1
op2
set_driving_cell
IV
set_loadIV
IV
IV
IV
1 pF
1 pF
52
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
timing verification basics
• path delay
• constraints
• analisi
53
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Static Timing Analysis
• Esempio: Synopsys Primetime
56
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Esempio: static timing analysis
1-4
Chapte
r 1: G
ettin
g S
tarte
d
Figure 1-2Sim
ple Design “ex1_design.v”
57
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Analysis
• esempio di slack negativo (violazione)1-25
Graphical User Interface
3. Check the clock definition again:
pt_shell> report_clock...Clock Period Waveform Attrs Sources-----------------------------------------------------CLK 4.00 {0 2} {clk1}
The clock period is now 4.00 ns.
Analyze the Design Again
To find out whether the faster clock causes any timing violations, use
the report_timing command.
1. First run the command without any options:
pt_shell> report_timing... Point Incr Path ---------------------------------------------------- clock CLK (rise edge) 0.00 0.00 clock network delay (ideal) 2.00 2.00 g9/CP (FD1) 0.00 2.00 r g9/Q (FD1) 1.38 3.38 f g11/Z (IV) 2.00 5.38 r op2 (out) 0.00 5.38 r data arrival time 5.38
clock CLK (rise edge) 4.00 4.00 clock network delay (ideal) 2.00 6.00 output external delay -1.00 5.00 data required time 5.00 ---------------------------------------------------- data required time 5.00 data arrival time -5.38 ---------------------------------------------------- slack (VIOLATED) -0.38
58
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Path slacks
1-27
Graphical User Interface
3. In the menu bar of the top-level GUI window, choose Timing >
Histogram > Path Slack. This opens the Path Slack dialog box.
Leave the options at their default settings. Click OK to generate
the histogram.
4. Resize the histogram window and move the vertical pane border
to get the approximate dimensions shown in Figure 1-8. To resize
the window, point to outside border and drag. To move the pane
border, point to it and drag left or right.
Figure 1-8 Path Slack Histogram
The histogram shows the distribution of slack values for different
paths, grouped into eight bins. The red bars represent timing
violations (negative slack values), while the green bars represent
slack values that are not violations (positive slack values).
5. Click the leftmost bar to select it. The selected bar is highlighted
in yellow. The slack values and paths for that bin are listed in the
pane on the right. See Figure 1-9.
Pane border
59
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
worst path analysis
1-32
Chapter 1: Getting Started
Waveform Viewer
The waveform viewer shows the timing of clock and data transitions
of the path.
1. Click the Waveform tab. This displays the clock waveform for the
path. To make the whole waveform visible, resize the window and
move the border between the waveform and the schematic. See
Figure 1-13.
Figure 1-13 Waveform Display
1-31
Graphical User Interface
Path Schematic
The path schematic shows the cells and nets that are in the path.
See Figure 1-12. The data path starts at flip-flop g8, goes through
inverter g10, and ends at output op1. The purple highlight shows the
data path traversal. You can also highlight the clock launch path and
clock capture path; you will see in later tutorial chapters.
Figure 1-12 Path Schematic
1. Rest the mouse pointer on the wire between the flip-flop and the
inverter. An InfoTip (pop-up text box) shows the net name,
capacitance, and fanout.
2. Rest the mouse pointer on the inverter. An InfoTip shows the cell
instance name, library name, pins, and signal timing.
3. Experiment with the commands in the path inspector’s View
menu and the equivalent toolbar buttons. Try using the scroll bars
at different viewing scales. When you are done, choose View >
Zoom Full.
60
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Modelli di timing usati
• viene utilizzata l’informazione con il dettaglio disponibile, che aumenta man mano che il progetto cresce:
• pre-layout -> wire loads
• placement -> routing load estimation
• post-route -> detailed parasitics, RC extraction
61
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Wire loads esempio
• sintesi tradizionale vs sintesi “fisica”.
/* $RCSfile: n18_wc.wireload,v $ *//* $Source: /tmp/covst4386/RCS/n18_wc.wireload,v $ *//* $Date: 2001/05/18 18:19:33 $ *//* $Revision: 1.1 $ *//* *//* ---------------------- *//* Units: Capacitance - pF, Resistance - KOhm, Length - Micron. 1/7/2000 */ wire_load_table ("suggested_default") {/*Eval at 10K gates*/ fanout_length( 1, 8.89 ) ; fanout_length( 2, 20.70 ) ; fanout_length( 3, 31.46 ) ; fanout_length( 4, 38.39 ) ; fanout_length( 5, 62.43 ) ; fanout_length( 6, 68.05 ) ; fanout_length( 7, 95.55 ) ; fanout_length( 16, 520.96 ) ; fanout_length( 32, 732.71 ) ;" fanout_capacitance( 1, 0.0012 ) ;"" fanout_capacitance( 2, 0.0028 ) ;"" fanout_capacitance( 3, 0.0042 ) ;"" fanout_capacitance( 4, 0.0052 ) ;"" fanout_capacitance( 5, 0.0082 ) ;"" fanout_capacitance( 6, 0.0092 ) ;"" fanout_capacitance( 7, 0.0127 ) ;"" fanout_capacitance( 16, 0.0677 ) ;"" fanout_capacitance( 32, 0.0936 ) ;"" fanout_resistance( 1, 0.02120 ) ;" fanout_resistance( 2, 0.04238 ) ;" fanout_resistance( 3, 0.06233 ) ;" fanout_resistance( 4, 0.08017 ) ;" fanout_resistance( 5, 0.11067 ) ;" fanout_resistance( 6, 0.13520 ) ;" fanout_resistance( 7, 0.16679 ) ;" fanout_resistance( 16, 0.51272 ) ;" fanout_resistance( 32, 0.85764 ) ;
wire_load_table ("suggested_20K") { fanout_length( 1, 59.0417 ) ; fanout_length( 2, 75.1218 ) ; fanout_length( 3, 89.7723 ) ; fanout_length( 4, 99.208 ) ; fanout_length( 5, 131.94 ) ; fanout_length( 6, 139.592 ) ; fanout_length( 7, 177.035 ) ; fanout_length( 16, 756.261 ) ; fanout_length( 32, 1044.57 ) ; fanout_capacitance( 1, 0.00763318 ) ; fanout_capacitance( 2, 0.0098117 ) ; fanout_capacitance( 3, 0.0117179 ) ; fanout_capacitance( 4, 0.0130795 ) ; fanout_capacitance( 5, 0.0171642 ) ; fanout_capacitance( 6, 0.0185257 ) ; fanout_capacitance( 7, 0.0232912 ) ; fanout_capacitance( 16, 0.0981776 ) ; fanout_capacitance( 32, 0.133442 ) ; fanout_resistance( 1, 0.0492636 ) ; fanout_resistance( 2, 0.0781016 ) ; fanout_resistance( 3, 0.105265 ) ; fanout_resistance( 4, 0.129555 ) ; fanout_resistance( 5, 0.171083 ) ; fanout_resistance( 6, 0.204482 ) ; fanout_resistance( 7, 0.247494 ) ; fanout_resistance( 16, 0.718502 ) ; fanout_resistance( 32, 1.18814 ) ;
62
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Sintesi logicaALGORITHMS FOR VLSI DESIGN AUTOMATION
LOGIC SYNTHESIS AND VERIFICATION
1
July 19, 1999
LOGIC SYNTHESIS AND VERIFICATION
Logic synthesis:
* Starts from a register-transfer level (RTL) description, given in e.g.VHDL or given as a set of Boolean expressions.
* Three different tasks: two-level combinational synthesis, multilevelcombinational synthesis and sequential synthesis.
* Outputs a standard-cell netlist or some other form of realization suchas a PLA.
Verification:
* Checks the equivalence of a specification and an implementation.
63
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Esempio VHDLlibrary ieee;
use ieee.std logic 1164.all;
entity example is
port (x1, x2, x3, x4, x5: in std logic;
y1, y2: out std logic);
end example;
architecture behavioral of example is
begin
react: process (x1, x2, x3, x4, x5)
begin
if x1 = ’1’ and x2 = ’0’
then
y1<= x3 and x4;
y2<= x3 or x4;
elsif x2 = ’1’
then
y1<= not (x3 and (x4 or x5));
y2<= ’-’;
else
y1 <= ’-’;
y2 <= ’0’;
end if;
end process react;
end behavioral; 64
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Sintesi di alto livelloALGORITHMS FOR VLSI DESIGN AUTOMATION
HIGH-LEVEL SYNTHESIS
1
June 10, 1999
HIGH-LEVEL SYNTHESIS (HLS)
VHDL synthesis:
* Starts from a register-transfer level (RTL) description; circuit behav-ior in each clock cycle is fixed.
* Uses logic synthesis techniques to optimize the design.
* Generates a standard-cell netlist.
High-level synthesis (also called architectural synthesis):
* Starts from an abstract behavioral description.
* Generates an RTL description.
BEHAVIORAL DOMAIN
PHYSICAL DOMAIN
Physical partitions
Floorplans
Module layout
Cell layout
Transistor layout
Systems
Algorithms
Register transfers
Logic
Transfer functions
Processors
ALU’s, RAM, etc.
Gates, flip-flops, etc.
Transistors
STRUCTURAL DOMAIN
65
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Modelli hardware
f
i1 i2
o! f (i1, i2)
i1 i2 i3i0
c1
c0
o! ik , k! 2c1" c0
i
o
enable
bus
(a) (b) (c)
(d) (e)
• unita’ funzionali
• registri
• (de)multiplexers
• buses
66
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Vincoli alla sintesi di alto livello
• ai constraint di sintesi e di timing standard, si aggiungono vincoli sul tipo di unita’ funzionali utilizzabili, sulla possibilita’ di fare pipeline retiming, multicycle operations...
• piu’ in generale sara’ la strategia di clocking, la scelta della connettivita’ (bus vs mux), a determinare l’architettura.
67
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Esempio di adder
December 12, 2005 Synopsys, Inc. 51
DesignWare IP Family
DW01_addAdder
Arith
DW
L Sythesizable IP
DW01_addAdder
! Parameterized word length
! Carry-in and carry-out signals
! Module Compiler Architectures
Table 1: Pin DescriptionPin Name Width Direction Function
A width bit(s) Input Input dataB width bit(s) Input Input dataCI 1 bit Input Carry-inSUM width bit(s) Output Sum of (A + B + CI)CO 1 bit Output Carry-out
Table 2: Parameter DescriptionParameter Values Description
width !1 Word length of A, B, and SUM
CI
COB
A
SUM
DesignWare IP Family
DW01_addAdder
52 Synopsys, Inc. December 12, 2005
Arith
Table 3: Synthesis Implementationsa
Implementation Name Function License Feature Requiredrpl Ripple-carry synthesis model nonecla Carry-look-ahead synthesis model noneclf Fast carry-look-ahead synthesis model DesignWarebk Brent-Kung architecture synthesis model DesignWare
csmb Conditional-sum synthesis model DesignWare
rpcs Ripple-carry-select architecture DesignWare
clsac MC-inside-DW carry-look-ahead-select DesignWare
csac MC-inside-DW carry-select DesignWare
fastclac MC-inside-DW fast carry-look-ahead DesignWare
pprefixc MC-inside-DW flexible parallel-prefix DesignWare
pparchd Delay-optimized flexible parallel-prefix DesignWare
a. During synthesis, Design Compiler will select the appropriate architecture for your constraints. However, you may force Design Compiler to use one of the architectures described in this table. For more details, please refer to the DesignWare Building Block IP User Guide.
b. The performance of the csm implementation is heavily dependent on the use of a high-performance inverting 2-to-1 multiplexer in the technology library. In such libraries, the csm implementation exhibits a superior area-delay product. Although the csm implementation does not always surpass the delay performance of the clf implementation, it is much lower in area.
c. This architecture is specially generated using Module Compiler technology. It is normally used as a replacement for, rather than in conjunction with, the HDL architectures available for the same DesignWare part. To use this architecture during synthesis, the dc_shell-t variable ‘dw_prefer_mc_inside’ must be set to ‘true.’ From the DC 2004.12 release onward, the MC architectures are not available by default. For more information, refer to the DesignWare Building Block IP Users Guide.
d. This delay-optimized parallel-prefix architecture is generated using Datapath generator technology DW "gensh.” This is ON by default in the Design Compiler flow. The DC variable ‘synlib_enable_dpgen’ must be set to ‘true’ (the default) to make use of this Datapath technology.
68
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Esempio di sintesi di alto livello// corso master// esempio di codice per sintesi di alto livello//module addera (clk,a,b,d,e,c);input clk;input [7:0] a,b;input [8:0] d,e;output [15:0] c;reg [15:0] c;reg [8:0] banka, bankb;
always @(posedge clk)begin banka <= a + b; bankb <= d - banka; c <= bankb * e;endendmodule
69
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
floorplan
• ciascun blocco puo’ essere piazzato manualmente, definendone la geometria (area fissa)
72
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
floorplan
• output: file di floorplan (DEF), piazzamento “hard” di celle di IO e macrocelle, piazzamento “soft” di blocchi di celle standard.
74
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
macrocells placement
75
• in questa fase viene creata anche la rete di distribuzione di potenza (analisi IR dopo il piazzamento).
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
clock tree planning and implementation
• sintesi
**** Clock Tree muxb/clockb/U18/Z Stat ****Total Clock Level : 10***** Top Nodes *****muxb/clockb/U18/Z delay[0(ps) 0(ps)] ( muxb/int_clk320__L1_I0/A )Level 10 (Total=9267 Sink=9267)Level 9 (Total=183 Sink=0 CLKBD32=183)Level 8 (Total=24 Sink=0 CLKBD32=24)Level 7 (Total=4 Sink=0 CLKBD32=4)Level 6 (Total=2 Sink=0 CLKBD32=2)Level 5 (Total=1 Sink=0 CLKBD32=1)Level 4 (Total=1 Sink=0 CLKBD20=1)Level 3 (Total=1 Sink=0 CLKBD16=1)Level 2 (Total=1 Sink=0 CLKBD16=1)Level 1 (Total=1 Sink=0 CLKBD16=1)Total Sinks : 9267
********** Clock muxb/clockb/U18/Z Pre-Route Timing Analysis **********Nr. of Subtrees : 1Nr. of Sinks : 9267Nr. of Buffer : 218Nr. of Level (including gates) : 9Max trig. edge delay at sink(R): muxb/coreb/trigb/cm_trig_declu_j0/JREGS/sizeeq34_reg_17_/CK 1887.4(ps)Min trig. edge delay at sink(R): muxb/coreb/trigb/cm_core_slice_1/cm_i_18/j_ge1_16bit_reg_2_/CK 1713.5(ps)
(Actual) (Required) Rise Phase Delay : 1713.5~1887.4(ps) 1720~1720(ps) Fall Phase Delay : 1781.1~1965.6(ps) 1720~1720(ps) Trig. Edge Skew : 173.9(ps) 200(ps) Rise Skew : 173.9(ps) Fall Skew : 184.5(ps) Max. Rise Buffer Tran. : 181.2(ps) 300(ps) Max. Fall Buffer Tran. : 188.2(ps) 300(ps) Max. Rise Sink Tran. : 84.5(ps) 200(ps) Max. Fall Sink Tran. : 78.4(ps) 200(ps)
77
## clk320#AutoCTSRootPin muxb/clockb/U18/ZMinDelay 1720psMaxDelay 1720psSinkMaxTran 200psBufMaxTran 300psMaxSkew 200psBuffer CLKBD2 CLKBD4 CLKBD8 CLKBD12 CLKBD16 CLKBD20 CLKBD32 INVBD2 INVBD4 INVBD8 INVBD12 INVBD16 INVBD20 INVBD32 ExcludedPin+ muxb/clockb/U18/Z#MaxCap#+ CLKBD2 0.13pf#+ CLKBD4 0.13pf#+ CLKBD8 0.13pf#+ CLKBD12 0.13pf#+ CLKBD16 0.13pf#+ CLKBD20 0.13pf#+ CLKBD32 0.13pf#+ INVBD2 0.13pf#+ INVBD4 0.13pf#+ INVBD8 0.13pf#+ INVBD12 0.13pf#+ INVBD16 0.13pf#+ INVBD20 0.13pf#+ INVBD32 0.13pfEnd
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Il piazzamento• E’ il problema di assegnare automaticamente le
posizioni corrette delle celle sul chip, ottimizzando una funzione costo.
• problemi diversi di piazzamento appaiono nel caso di:
• celle standard;
• building blocks;
• celle e building blocks.
• puo’ essere parte integrante del processo di sintesi (sintesi fisica).
79
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Rappresentazione del circuito
• esempio
S
R
Q
Q
g1
g2
80
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Cell-port-net modelALGORITHMS FOR VLSI DESIGN AUTOMATION
PLACEMENT AND PARTITIONING
2
July 22, 1999
CELL-PORT-NET MODEL
S
R
Q
Q
g1
g2
n1
n2
n3
n4
struct cell {struct cell master *cell type; /* Access to cell type, e.g. NAND
gate and other generic properties */
char id[ ]; /* A string that uniquely identifies the cell, e.g. g1 */
set of struct port in ports, out ports;
};
struct port {struct port master *port type; /* Access to generic port information */
char id[ ]; /* Unique identification */
struct cell *parent cell; /* To which cell does this port belong? */
struct net *connected net; /* To which net is this port connected? */
};
struct net {char id[ ]; /* Unique identification */
set of struct port joined ports; /* Ports connected by the net */
};
81
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Standard cell
• piazzamento non vincolato.
VDD
GND
CLK
CELL 1 CELL 2
82
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Relazione col routing• Idealmente, il piazzamento e il routing
dovrebbe essere fatto simultaneamente perche’ ognuno dipende dai risultati dell’altro. Attualmente non e’ possibile fare routing dettagliato e piazzamento con lo stesso tool.
• In pratica il piazzamento viene effettuato in una fase preliminare. Viene fatta una stima delle lunghezze usando una metrica (minimum spanning tree, steiner tree).
83
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Relazione con la sintesi
• wire lenght estimation
• dal floorplan al piazzamento85
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Routing
• Il routing locale e’ il processo volto a determinare la struttura che connette un insieme di terminali in una data area di routing
• Il routing locale e’ diverso dal routing globale, dedicato all ricerca delle aree di routing dove verranno realizzate le connessioni. Quest’ultimo tipo di routing non fissa le strutture di connessione al’interno delle aree di routing.
86
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Caratterizzazione• Il routing locale e’ caratterizzato da un numero di parametri (ogni set di
valori definisce un tipo di problema):
• numero di layer di routing
• l’orientazione dei segmenti per ogni layers, orizzontale, verticale, altro
• griglia presente o assente
• presenza o assenza di ostacoli nell’area di routing
• aree di routing fisse o estendibili
• posizione dei terminali: su due linee parallele, su un rettangolo, su un area arbitraria.
• terminali con posizione fissa o mobile.
87
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Routing all’interno di un blocco di standard cell
• griglia di densita’ di routing, e’ utilizzata anche in fase di piazzamento, per identificare problemi di congestione
(a) (b)
6 (= n)1 2 3 40 5 6 (= n1 2 3 40 5
0
1
2
3
4
5 (= m)
88
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
La sintesi fisica • Procede per fasi (semplificando):
• prima fase di sintesi, senza alcuna informazione fisica. Viene generata la netlist iniziale.
• piazzamento delle celle standard, utilizzando un floorplan (dimensioni die, IO, macrocelle), guidato dai constraints di sintesi (timing-driven)
• iterazione sui due punti seguenti:
• routing approssimato e stima degli RC di ciascuna net
• ottimizzazione delle net piu’ critiche, che violano i constraints di timing/area, attraverso una ottimizzazione logica e il nuovo piazzamento delle celle interessate.
89
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
physical synthesis output
• netlist e piazzamento del circuito, manca il routing dettagliato.90
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
legame tra sintesi e routing dettagliato
• la sintesi fisica fa uso di un router per stimare i valori RC di ciascuna connessione.
• e’ possibile confrontare i valori stimati con i valori ottenuti dal router dettagliato, ed applicare delle costanti di calibrazione per rendere piu’ simili i risultati.
91
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
analisi RC• problema: stima sbagliata del
tool di sintesi, che effettua solo una stima del routing e non il routing dettagliato, offset e distribuzione allargata
• soluzione: introduzione delle costanti di calibrazione
0
no. nets
DeltaRCset_delay_estimation_options -max_unit_horizontal_capacitance 0.00016set_delay_estimation_options -max_unit_horizontal_resistance 0.00046set_delay_estimation_options -max_unit_vertical_capacitance 0.00013set_delay_estimation_options -max_unit_vertical_resistance 0.00050set_delay_estimation_options -min_unit_horizontal_capacitance 0.00015set_delay_estimation_options -min_unit_horizontal_resistance 0.00034set_delay_estimation_options -min_unit_vertical_capacitance 0.00012set_delay_estimation_options -min_unit_vertical_resistance 0.00040
92
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Stima e routing finale
• piccole congestioni risolte nella fase di routing dettagliato
93
Friday, January 22, 2010
S.Veneziano, gennaio 2010, Roma
Post-route parasitics
• tools 3-D di estrazione
DATASHEET
FIRE & ICE QXTFULL-CHIP 3-D EXTRACTION
Fire & Ice® QXT is a fast, 3-D accurate transistor-level extractor with digital full-chip
capacity. It integrates seamlessly with GDSII and SPICE flows — and with Calibre
through a HCCI interface. Fire & Ice QXT generates parasitic data files in DSPF
and SPICE formats for sign-off timing, power, or signal integrity analysis with
VoltageStorm™ power analysis and CeltIC™ crosstalk analysis.
Figure 1: Fire & Ice QXT uses a halo to account for all near-body and multi-level interconnectcapacitive effects, including the impact of crossover fringe, corners, and capacitive shading
KEY BENEFITS• Transistor-level extraction for final
tapeout verification:
– Extracts data for power grid sign-offwith VoltageStorm
– Device-level and mixed modeextraction
– Integrated with GDSII, LEF/DEFand DEF/GDSII inputs and DSPF,SPICE output
• RC reduction
• Second-generation 3-D adaptiveanalytical extraction modeling
• Distributed, coupled RC data
• Validated in silicon by leading foundries
• Unlimited capacity
SILICON-VALIDATED ACCURACYFire & Ice QXT incorporates an enhancedversion of the production-proven 3-Dadaptive analytical extraction modelingtechnology, which enables distributedand coupled RC extraction faster thanever before. A suite of analyticalmodels is created once per process.
During extraction, parameters aregenerated based on very specific 3-Dregions, then the parameters arepassed to the analytical models forcapacitance calculation. The modelsemploy a special influence region,known as a “dynamic halo.” The halo
accounts for all near-body and multi-level interconnect capacitive effects,including the impact of crossover fringe,corners, and capacitive shading (seeFigure 1). Coupling capacitance playsa dominant role in determining theperformance of UDSM designs, so 3-D
© 2003 Cadence Design Systems, Inc. All rights reserved.Cadence, the Cadence logo, and Fire & Ice are registeredtrademarks and VoltageStorm and CeltIC are trademarksof Cadence Design Systems, Inc. All others are propertiesof their respective holders.
4977 10/03
adaptive analytical modeling extractsboth lumped and fully distributed net-to-net coupled capacitance.
In contrast to less accurate 2-D, 2-D +2-D, or “Quasi 3-D” methods, 3-Dadaptive analytical extraction modelingidentifies in three dimensions all theinteracting objects within the contextof the conductor being evaluated.Because 3-D adaptive analyticalextraction models do not rely onpattern-matching techniques, they donot suffer from boundary errors. 3-Dadaptive analytical extraction modelsscale with shrinking processes andincreasing design sizes and can beused across a broad spectrum ofdesign styles — without “tuning”for each design.
Cadence foundry partners havevalidated the accuracy of 3-D adaptiveanalytical extraction in actual silicon,so users can have confidence that theywill meet their timing budgets anddesign goals. Through the CadenceFoundry Partners program, theseIceCaps models are pre-validated forsix of the world’s leading foundries —Chartered Semiconductor, IBM, NEC,Toshiba, TSMC, and UMC. Models areavailable directly from the foundry.Or you can build your own modelsfor any process technology using thebuilt-in, fully automated IceCapsmodel generation capability inFire & Ice QXT, which makes processtransitions fast and easy (see Figure 2).
3-D ACCURATE EXTRACTIONFire & Ice QXT promotes greater accuracyin timing closure and reduces the
number of layout iterations to achievetargeted design performance. It alsoeliminates the need for excessive over-design in order to meet timing goals.Fire & Ice QXT provides a simplifiedset-up process and single-stepexecution that give designers the fastturnaround and ease of use neededfor transistor-level ASIC design flows.
PERFORMANCE ON 300 MHZ CPUFire & Ice QXT is a transistor-levelextraction product that providesovernight, full-chip, 3-D-accurate,transistor-level extraction for finaltapeout verification. It takes advantageof multiple-CPU processing, by sectioninga design into stripes, then running theresistance and capacitance engines onthe stripes before re-combining thedata to efficiently extract multi-milliontransistor chips. It also extracts the datarequired by VoltageStorm for powergrid sign-off—an essential sign-offrequirement for UDSM designs.
INTEGRATION WITHSTANDARD DESIGN FLOWS
• Fire & Ice QXT includes SPICE andGDSII readers that enable easyintegration with physical design tools
• Produces DSPF and SPICE outputs
• Calibre users can enter design datathrough the HCCI interface (see Figure 2)
• Cadence works closely with EDApartners like Synopsys, Mentor, andMagma to provide support forcommon design flows
PLATFORM SUPPORTFire & Ice runs on standard UNIXworkstations from Sun Microsystemsand Hewlett-Packard.
• OS Support: 32- and 64-bit
• LINUX: Red Hat 7.2 (32-bit)
• Solaris 7 and 8
• HP-UX 11
SYSTEM REQUIREMENTSSystem requirements will vary dependingon your circuit size. Here are somegeneral guidelines:
MINIMUM CONFIGURATION
DRAM: 512 Mb
Swap space: 2 Gb
Disk space for software: 50 Mb
Disk space for 1 million-gate design: 2 Gb
RECOMMENDED CONFIGURATION
DRAM: 1 Gb
Swap space: 4 Gb
Disk space for software: 50 Mb
Disk space for 1 million-gate design: 4 Gb
FOR MORE INFORMATIONLog on to www.cadence.com or emailus at [email protected]
DSPF
Technology file
Cell library
Calibre database
GDSII
LEF or GDSII
SPICE
Fire & Ice QXT
IceCapsTextual process data
PROCESS DATA
CELL DATA
DESIGN DATA
LibGen
XTC
H – HierarchicalC – CalibreC – ConnectivityI – Interface
Figure 2: Fire & Ice QXT dataflow. Foundry-specific processinformation is fed into IceCapsto create a technology file.LibGen reads in LEF and GDSIIdesign data to create Fire & IceQXT cell library information.Interconnect data can be inputvia a hierarchical Calibredatabase or annotated GDSII.These are the only threeinputs to Fire & Ice QXT
7-6
Chapter 7: Parasitic Back-Annotation
Annotating Detailed Parasitics
You can annotate detailed parasitics into PrimeTime and annotate
each physical segment of the routed netlist in the form of resistance
and capacitance (see Figure 7-3). Annotating detailed parasitics is
very accurate but more time-consuming than annotating lumped
parasitics. Because of the potential complexity of the RC network,
PrimeTime takes longer to calculate the pin-to-pin delays in the
netlist.
This RC network is used to compute effective capacitance
(Ceffective), slew, and delays at each subnode of the net.
PrimeTime can read detailed RC in DSPF and SPEF formats.
Figure 7-3 Detailed RC
You can use this model for netlists that have critical timing delays,
such as clock trees. This model can produce more accurate results,
especially in deep submicron designs where net delays are more
significant compared to cell delays.
The detailed RC network supports meshes, as shown in Figure 7-4.
C1
R1
C2
R2
R3
R4
R6
R5
C4
C6
C3
C5
C7
95
Friday, January 22, 2010
Top Related