Log_prog1


  • Programmable devices

  • Programmable devices
    Hardware devices (chips) that provide logic elements of varying complexity, which can be interconnected in different configurations according to the design specifications.
    They provide:
    - Logic components (logic gates, flip-flops, buffers)
    - Connection lines
    - Interconnection systems (multiplexers, connections)
    - I/O ports
    Types of programmable circuits:
    - PLA, PAL, ROM
    - CPLD
    - FPGA

  • Programmable devices
    The devices can be classified according to several aspects.
    Programming method:
    - mask-programmable (MPGA)
    - one-time programmable (fuse or antifuse)
    - reprogrammable (EEPROM, SRAM)
    - reconfigurable (SRAM)
    Connections:
    - global
    - local and distributed

  • Programming methods
    FUSE
    - The connections between lines are initially all active
    - Programming permanently disables the connections that are not needed
    ANTIFUSE
    - The connections between lines are initially all inactive
    - Programming permanently enables the connections that are needed
    EEPROM
    - The connections are initially all inactive
    - Programming enables or disables them electrically, in a non-destructive way
    - The state is retained when power is removed

  • Programming methods
    SRAM
    - The connections are initially all inactive
    - Programming enables or disables them electrically, in a non-destructive way
    - The state is NOT retained when power is removed
    - Programming is faster than with EEPROM technology

    Depending on the technology, programming can take place:
    - while the device is not operating (reprogrammable)
    - while the device is operating (reconfigurable); separate parts of the device are programmed independently

  • Fuse
    - The lines of the device are originally all connected
    - Programming consists of BLOWING (fuse) some connections so that only the necessary ones remain
    - Programming uses a voltage higher than the normal operating voltage

  • Antifuse
    - The lines of the device are originally all disconnected
    - Programming consists of CREATING (antifuse) the necessary connections
    - Programming uses a voltage higher than the normal operating voltage

  • EEPROM
    - The lines of the device are originally all disconnected
    - Programming consists of DEPOSITING a charge on the floating gate of the transistor so that it conducts
    - Erasure is electrical (EEPROM) or by exposure to UV light (EPROM)

  • SRAM (static RAM)
    - The lines of the device are originally all disconnected
    - Programming consists of STORING a logic value (0 or 1) in a static RAM cell

  • Connections
    Global connection
    - A line that crosses a large part of the device and is shared by many logic elements
    - High delays
    - Can be driven by only one logic element (low flexibility)
    Local connection
    - A line that crosses a small part of the device and is shared by few elements
    - Lower delays
    - High flexibility

  • Connections
    Global connections are characteristic of:
    - Two-level logic devices: PAL, PLA, ROM
    - CPLD (Complex Programmable Logic Device)
    Local connections are characteristic of:
    - FPGA (Field Programmable Gate Array)

  • Two-level programmable logic
    Used to implement two-level logic functions.
    - NOTE 1: any combinational function can be expressed as a sum of minterms
    - NOTE 2: multi-level functions can be implemented by exploiting feedback
    These devices provide:
    - A fixed number of inputs (input buffers)
    - An AND plane (to form the minterms)
    - An OR plane (to form the sums)
    - A fixed number of outputs (output buffers)

  • Two-level programmable logic
    There are three main types:
    - PLA (Programmable Logic Array)
      - Programmable AND plane: implements only the minterms that are needed
      - Programmable OR plane
    - PAL (Programmable Array Logic)
      - Programmable AND plane
      - Fixed OR plane: constrains the number of minterms a function may contain
    - ROM (Read Only Memory)
      - Fixed AND plane: implements all possible minterms (DECODER)
      - Programmable OR plane

  • Programmable Logic Array (PLA)
    It can implement any logic function, expressed as a sum of implicants.

  • Programmable Logic Array (PLA)
    Logic diagram of a PLA: example with 3 inputs and 2 outputs (not programmed)

  • Programmable Logic Array (PLA)
    EXAMPLE 1: implementation of the functions
      f1 = ab + ac' + a'b'c
      f2 = ab + ac + a'b'c
    (x' denotes the complement of x)
    Products: p1 = ab, p2 = ac, p3 = ac', p4 = a'b'c
    Sums: f1 = p1 + p3 + p4, f2 = p1 + p2 + p4
    PLA format (inputs a b c, outputs f1 f2):
      11- 10
      1-0 10
      001 10
      11- 01
      1-1 01
      001 01
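    The AND/OR structure of Example 1 can be sketched in software. The Python below is a minimal illustration, not a model of any real device: the rows are one reading of the PLA-format listing on this slide (three input characters followed by two output characters per row), and the names PLA_ROWS, cube_matches and pla_eval are hypothetical.

```python
# Each row: (input cube over a, b, c ; output mask for f1, f2).
# '1' = true literal, '0' = complemented literal, '-' = input not used.
# Assumption: this is one reading of the slide's PLA-format listing.
PLA_ROWS = [
    ("11-", "10"),
    ("1-0", "10"),
    ("001", "10"),
    ("11-", "01"),
    ("1-1", "01"),
    ("001", "01"),
]


def cube_matches(cube, bits):
    """AND plane: a product term is 1 when every used literal agrees."""
    return all(c == "-" or c == b for c, b in zip(cube, bits))


def pla_eval(rows, bits, n_outputs=2):
    """OR plane: each output is the OR of the product terms wired to it."""
    outputs = [0] * n_outputs
    for cube, mask in rows:
        if cube_matches(cube, bits):
            for i, m in enumerate(mask):
                if m == "1":
                    outputs[i] = 1
    return tuple(outputs)


if __name__ == "__main__":
    for value in range(8):
        bits = format(value, "03b")  # bits[0] = a, bits[1] = b, bits[2] = c
        print(bits, pla_eval(PLA_ROWS, bits))
```

    Programming a real PLA corresponds to choosing the rows: the cube column sets the AND-plane connections and the mask column sets the OR-plane connections.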

  • Programmable Array Logic (PAL)
    - Programmable AND plane and fixed OR plane
    - Implements sums of products
    - There may be a limit on the maximum number of products that can contribute to a function

  • Programmable Array Logic (PAL)
    Logic diagram of a PAL: example with 3 inputs and 2 outputs (not programmed)

  • Read Only Memory (ROM)
    - Can be built from a fixed, complete AND plane and a programmable OR plane
    - In practice it implements m functions of n inputs
    - Each input configuration (ADDRESS) is associated with an output configuration (WORD)
    - The AND plane acts as the address DECODER

  • Read Only Memory (ROM)
    AND plane (address decoder):
    - implements all possible minterms
    - for each input configuration it activates one and only one output line

  • Read Only Memory (ROM)
    Logic diagram of the AND plane

  • Read Only Memory (ROM)
    Logic diagram of a ROM: example with 3 inputs and 4 outputs (not programmed)

  • Read Only Memory (ROM)
    Example: starting from the truth table of the multiple-output function

    a b c   f1 f2 f3 f4 f5
    0 0 0   0  0  0  1  0
    0 0 1   1  1  1  1  0
    0 1 0   0  0  1  0  1
    0 1 1   1  0  1  1  0
    1 0 0   1  0  1  1  1
    1 0 1   1  1  1  1  1
    1 1 0   1  0  1  1  1
    1 1 1   1  1  0  1  1

  • Read Only Memory (ROM)
    Implementation of the function
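    The ROM view of this example (a fixed decoder followed by a programmable OR plane) can be sketched as follows. This is an illustrative Python model only; the words are taken from the truth table of the example, while the names ROM_WORDS, decode and rom_read are hypothetical.

```python
# Address (a b c) -> word (f1 f2 f3 f4 f5), copied from the example truth table.
ROM_WORDS = {
    "000": "00010",
    "001": "11110",
    "010": "00101",
    "011": "10110",
    "100": "10111",
    "101": "11111",
    "110": "10111",
    "111": "11011",
}


def decode(address, n_inputs=3):
    """Fixed AND plane: one-hot decoder, exactly one minterm line is active."""
    index = int(address, 2)
    return [1 if i == index else 0 for i in range(2 ** n_inputs)]


def rom_read(address):
    """Programmable OR plane: each output ORs the minterm lines wired to it."""
    lines = decode(address)
    word = []
    for k in range(5):
        bit = 0
        for i, line in enumerate(lines):
            programmed = int(ROM_WORDS[format(i, "03b")][k])
            bit |= line & programmed
        word.append(str(bit))
    return "".join(word)


if __name__ == "__main__":
    for address in sorted(ROM_WORDS):
        print(address, rom_read(address))
```

    Because the decoder activates exactly one line per address, reading the ROM reduces to returning the word stored for that address.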

  • Advanced PLAs and PALs
    PLAs and PALs can implement only two-level combinational networks. This limit can be overcome:
    - Adding a feedback network makes it possible to implement combinational networks with more than two levels
    - Adding memory elements (flip-flops) makes it possible to implement (synchronous) sequential machines

  • Advanced PLAs and PALs
    Example of a multi-level combinational network implemented through feedback

  • Advanced PLAs and PALs
    Example: implementation

  • Advanced PLAs and PALs
    Adding memory elements at the outputs can further extend the capabilities of the device

  • CPLD (Complex Programmable Logic Device)
    CPLDs are the logical evolution of PALs and PLAs.
    They are characterized by:
    - Global connections
    - Concentrated logic
    Compared with PALs and PLAs:
    - They are more complex and larger
    - They achieve higher performance

  • CPLD - XC9500 Architecture
    - 5 volt in-system programmable (ISP) CPLDs
    - 5 ns pin-to-pin
    - 36 to 288 macrocells (6400 gates)
    - Industry's best pin-locking architecture
    - 10,000 program/erase cycles
    - Complete IEEE 1149.1 JTAG capability
    [Figure: XC9500 block diagram. JTAG port and JTAG controller, in-system programming controller, I/O blocks, FastCONNECT switch matrix, function blocks 1 to 4, global clocks (3), global set/reset (1), global tri-states (2 or 4)]

  • XC9500 Function Block
    Each function block is like a 36V18!

  • XC9500 structure
    Structure of the macrocell

  • XC9500 structure
    Structure of the product term allocator

  • XC9500 structure
    Connection options offered by the product term allocator

  • XC9500 structure
    Global clock, set and reset lines

  • XC9500 structure
    I/O cell

  • XC9500 Product Family

    Device          9536   9572   95108  95144  95216  95288
    Macrocells      36     72     108    144    216    288
    Usable Gates    800    1600   2400   3200   4800   6400
    tPD (ns)        5      7.5    7.5    7.5    10     10
    Registers       36     72     108    144    216    288
    Max I/O         34     72     108    133    166    192

    Packages:
    - 9536: VQ44, PC44
    - 9572: PC44, PC84, TQ100, PQ100
    - 95108: PC84, TQ100, PQ100, PQ160
    - 95144: PQ100, PQ160
    - 95216: PQ160, HQ208, BG352
    - 95288: HQ208, BG352

  • XC9500 - Q & A
    Q: Can the following architecture be implemented with an XC9500 CPLD?
    A: NO. A CPLD can implement only DIGITAL circuits.

  • XC9500 - Q & A
    Q: Can the following function, a predetermined asynchronous DELAY (In -> Out, DELAY = 20 ns), be implemented with an XC9500 CPLD?
    A: NO. There is no method and no element that can implement this function.
    - Note 1: in digital circuits, delay is an (unwanted) consequence of the structure of the circuit itself, not a parameter to be met
    - Note 2: a delay element can, however, be implemented SYNCHRONOUSLY

  • XC9500 - Q & A
    Q: Can the following logic function be implemented with an XC9500 CPLD (with inputs and outputs connected directly to the I/O pins of the device)?
    A: NO. There is no direct connection between the pins and the internal logic; signals must pass through the I/O buffers (the development tool corrects this kind of error automatically).

  • XC9500 - Q & A
    Q: Can the following function be implemented with an XC9500 CPLD (one logic block, Logic 1, drives other logic blocks, Logic 2 and Logic 3, through a tri-state bus)?
    A: NO! The only tri-state buffers in the device are in the I/O blocks, so they cannot be used to drive logic inside the device itself (a loop back from the output pin would have to pass through an input buffer, which would cancel its effect).

  • FPGA: Introduction
    FPGAs (Field Programmable Gate Arrays) are programmable devices made of a matrix of logic components that can be connected to one another.

                    Complex Programmable Logic Device (CPLD)   Field-Programmable Gate Array (FPGA)
    Architecture    PAL/22V10-like                             Gate array-like
                    More combinational                         More registers + RAM
    Density         Low-to-medium                              Medium-to-high
                    0.5-10K logic gates                        1K to 3.2M system gates
    Performance     Predictable timing                         Application dependent
                    Up to 250 MHz today                        Up to 200 MHz today
    Interconnect    Crossbar switch                            Incremental

  • FPGA
    FPGAs provide the user with:
    - Logic components (CLB - slice), made of logic, small memories, flip-flops, buffers, multiplexers
    - Connection lines, both local (short) and distributed (long)
    - Interconnection matrices, to connect lines to one another and to the logic blocks
    - I/O blocks: special logic blocks dedicated to I/O, providing buffers, protection, fan-out, pull-up and pull-down resistors, impedance matching
    - Special blocks: memories, multipliers, PLLs, decoders

  • FPGA
    Strengths and weaknesses:
    - Extremely versatile
    - High computational complexity
    - Slower than CPLDs and ASICs
    - High cost per component (but particularly inexpensive families exist)
    - Low prototype cost
    - Very short time to market
    - The circuit can be upgraded (even remotely)
    - Excellent for prototyping (but increasingly used for high production volumes as well)
    - Able to host embedded systems

  • XC4000 Architecture and Features

  • XC4000 Architecture
    - Programmable Interconnect
    - I/O Blocks (IOBs)
    - Configurable Logic Blocks (CLBs)

  • XC4000E/X Configurable Logic Blocks
    - 2 four-input function generators (Look Up Tables)
      - 16x1 RAM or logic function
    - 2 registers
      - Each can be configured as flip-flop or latch
      - Independent clock polarity
      - Synchronous and asynchronous Set/Reset

  • Look Up Tables
    - Combinatorial logic is stored in 16x1 SRAM Look Up Tables (LUTs) in a CLB
    - Capacity is limited by number of inputs, not complexity
    - Choose to use each function generator as 4-input logic (LUT) or as high-speed sync. dual-port RAM
    - Example: a Look Up Table with a 4-bit address can hold any of 2^(2^4) = 64K functions!
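    A look-up table can be pictured as a 16x1 memory addressed by the four inputs. The sketch below is a plain Python illustration of that idea, not Xilinx code; make_lut and lut_read are hypothetical names.

```python
def make_lut(func):
    """Fill a 16x1 memory with the truth table of a 4-input function."""
    return [
        int(func((addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1)) & 1
        for addr in range(16)
    ]


def lut_read(lut, a, b, c, d):
    """Evaluating the logic is a memory read: the inputs form the address."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]


# One distinct function per possible 16-bit memory content.
N_FUNCTIONS = 2 ** (2 ** 4)

if __name__ == "__main__":
    parity = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
    print(parity)
    print(lut_read(parity, 1, 0, 1, 1))
    print(N_FUNCTIONS)
```

    The cost of a function is the same whatever its Boolean complexity: any function of four inputs occupies exactly one 16-bit table.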

  • XC4000X I/O Block Diagram
    Shaded areas are not included in XC4000E family.

  • Xilinx FPGA Routing
    1) Fast Direct Interconnect - CLB to CLB
    2) General Purpose Interconnect - Uses switch matrix
    3) Double Lines
    4) Long Lines
       - Segmented across chip
       - Global clocks, lowest skew
       - 2 Tri-states per CLB for busses

  • What's Really In that Chip?
    [Figure labels: CLB (red), switch matrix, long lines (purple), direct interconnect (green)]

  • Spartan-II Architecture and Features

  • Xilinx: Your Programmable Logic Solution
    [Figure: Xilinx product families plotted by features against density (system gates; axis marks 10K, 600K, 10M)]
    - CPLDs: low power
    - Spartan-IIE FPGAs: SRAM-based, feature rich, low cost
    - Virtex-II FPGAs: SRAM-based, feature rich, high performance

    For Academic Use Only

  • Features
    - Plentiful logic and memory resources
      - 15K to 200K system gates (up to 5,292 logic cells)
      - Up to 57 Kb block RAM storage
    - Flexible I/O interfaces
      - From 86 to 284 I/Os
      - 16 signal standards
    - Advanced 0.25/0.22 um 6-layer metal process
    - High performance
      - System frequency as high as 200 MHz
    - Advanced clock control with 4 dedicated DLLs
    - Unlimited re-programmability
    - Fully PCI compliant

  • Spartan-II Top-level Architecture
    - Configurable logic blocks: implement logic here!
    - I/O blocks: communicate with other chips; choose from 16 signal standards
    - Block RAM: on-chip memory for higher performance

  • Spartan-II Top-level Architecture
    - Clocks and delay locked loops: synchronize to clock on and off chip
    - Rich interconnect resources
    - Three-state internal buses
    - Power down mode: lower quiescent power

  • CLB Structure
    - Each slice has 2 LUT-FF pairs with associated carry logic
    - Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

  • Xilinx fast adder

  • CLB Slice (Simplified)
    - 1 CLB holds 2 slices
    - Each slice contains two sets of the following:
      - Four-input LUT
        - Any 4-input logic function
        - Or 16-bit x 1 RAM
        - Or 16-bit shift register

  • CLB Slice (cont'd)
    - Each slice contains two sets of the following:
      - Carry & control
        - Fast arithmetic logic
        - Multiplier logic
        - Multiplexer logic
      - Storage element
        - Latch or flip-flop
        - Set and reset
        - True or inverted inputs
        - Sync. or async. control

  • Four-Input LUT
    - Implements combinatorial logic
    - Any 4-input logic function
    - Cascaded for wide-input functions

    Truth Table

    Inputs (ABCD)   Output (Z)
    0000            0
    0001            0
    0010            1
    0011            0
    ...             ...
    1110            1
    1111            1

    Configuration mode pins

    Configuration Mode   M0 M1 M2   Pre-configuration pullups   Direction of CCLK
    Master Serial        0  0  0    No                          Out
    Master Serial        0  0  1    Yes                         Out
    Slave Parallel       0  1  0    Yes                         In
    Slave Parallel       0  1  1    No                          In
    Boundary Scan        1  0  0    Yes                         N/A
    Boundary Scan        1  0  1    No                          N/A
    Slave Serial         1  1  0    Yes                         In
    Slave Serial         1  1  1    No                          In

    Block RAM aspect ratios

    ADDR     DATA     Width   Depth
    (11:0)   (0:0)    1       4096
    (10:0)   (1:0)    2       2048
    (9:0)    (3:0)    4       1024
    (8:0)    (7:0)    8       512
    (7:0)    (15:0)   16      256

    Block RAM per device

    Device    No. of Blocks   Block RAM Bits
    XC2S15    4               16,384
    XC2S30    6               24,576
    XC2S50    8               32,768
    XC2S100   10              40,960
    XC2S150   12              49,152
    XC2S200   14              57,344

    One Time Programmable configuration PROMs

    Device    PROM        PD8   VO8   SO20
    XC2S15    XC17S15A    Y     Y     --
    XC2S30    XC17S30A    Y     Y     --
    XC2S50    XC17S50A    Y     Y     Y
    XC2S100   XC17S100A   Y     Y     Y
    XC2S150   XC17S150A   Y     Y     Y
    XC2S200   XC17S200A   Y     Y     Y

    ISP configuration PROMs

    Device    PROM       PC20   SO20   VQ44
    XC2S15    XC18V256   Y      Y      Y
    XC2S30    XC18V512   Y      Y      Y
    XC2S50    XC18V01    Y      Y      Y
    XC2S100   XC18V01    Y      Y      Y
    XC2S150   XC18V01    Y      Y      Y
    XC2S200   XC18V02    Y      --     Y

    I/O banking rules

    Input Standards Compatibility Requirements
    - Rule 1: All differential amplifier input signals within a bank are required to be of the same standard
    - Rule 2: There are no placement restrictions for inputs with standards that require a single-ended input

    I/O Standard: SSTL, HSTL, GTL+
    No. of I/O Pins: 12, 32, 192
    200 Mbps, 400 Mbps, 6 Gbps, 40 Gbps

    Output Standards Compatibility Requirements
    - Rule 1: Only outputs with standards which share compatible
    - Rule 2: There are no placement restrictions for outputs

    VCCO    Vref     Compatible Standards
    3.3V    1.5V     LVTTL, PCI, SSTL3, CTT
    3.3V    1.32V    LVTTL, PCI, AGP
    2.5V    1.25V    LVCMOS2, SSTL2

    DLL parameters

    Parameter                Low Frequency   High Frequency
    Input clock frequency    25-90 MHz       60-180 MHz
    Pulse width              3 ns            2.4 ns
    Period tolerance         1 ns            1 ns
    Lock time                20-120 us       20 us
    Output jitter            +/- 60 ps       +/- 60 ps

    External memory types

    External Memory Type   SelectI/O Standard      External Memory Type   SelectI/O Standard
    SRAM                                           EDO                    TTL
    Asynchronous           TTL                     Synchronous            TTL, LVTTL
    Synchronous            TTL                     FPM                    TTL
    NoBL/ZBT               LVTTL                   DDR                    SSTL
    PB                     TTL                     PC100/133              LVTTL, SSTL
    SGRAM                  HSTL
    QDR                    HSTL

    I/O standards

    Standard          Vref   VCCO   Application
    LVTTL             na     3.3    General Purpose
    LVCMOS2           na     2.5
    PCI 33MHz 3.3V    na     3.3
    PCI 33MHz 5.0V    na     3.3    PCI
    PCI 66MHz 3.3V    na     3.3
    GTL               0.80   na     Back-Plane
    GTL+              1.00   na
    HSTL-I            0.75   1.5
    HSTL-III          0.90   1.5
    HSTL-IV           0.90   1.5    Hitachi SRAM
    SSTL3-I           1.50   3.3
    SSTL3-II          1.50   3.3    SDRAM
    SSTL2-I           1.25   2.5
    SSTL2-II          1.25   2.5
    CTT               1.50   3.3    Memory
    AGP               1.32   3.3    Graphics

    Download cables

    Cable            Software Support    Configuration Mode (1)         Readback Support (2)
    MultiLINX        Hardware Debugger   Slave Serial, Slave Parallel   Yes
    MultiLINX        JTAG Programmer     JTAG                           Yes
    Parallel Cable   Hardware Debugger   Slave Serial                   No
    Parallel Cable   JTAG Programmer     JTAG                           Yes

    NOTES:
    1. JTAG Mode also supports ISP PROMs
    2. Only in JTAG and Slave Parallel Modes

    JTAG commands

    Command      Support
    BYPASS       X
    SMPL/PRLD    X
    EXTEST       X
    INTEST       X
    IDCODE       X
    USERCODE     X
    HIGHZ        X
    CLAMP        --
    RUNBIST      --

  • Distributed RAM
    - CLB LUT configurable as Distributed RAM
      - A LUT equals 16x1 RAM
      - Implements single and dual ports
      - Cascade LUTs to increase RAM size
    - Synchronous write
    - Synchronous/asynchronous read
      - Accompanying flip-flops used for synchronous read
    [Figure: one LUT = RAM16X1S (ports D, WE, WCLK, A0-A3, O); two LUTs = RAM32X1S (ports D, WE, WCLK, A0-A4, O) or RAM16X1D (ports D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO)]

  • Shift Register
    - Each LUT can be configured as shift register
      - Serial in, serial out
    - Dynamically addressable delay up to 16 cycles
      - For programmable pipeline
    - Cascade for greater cycle delays
    - Use CLB flip-flops to add depth
    - Use for programmable clock delay

  • Shift Register
    - Register-rich FPGA
      - Allows for addition of pipeline stages to increase throughput
    - Data paths must be balanced to keep desired functionality

  • Shift Register
    - LUT as shift register
      - Used to add pipeline stages
    - Increase overall register count
      - 16 bit shift register per LUT
      - 64 bit shift register per CLB
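    The behaviour of a LUT used as an addressable shift register can be sketched with a small Python model. It is a behavioural illustration only: the class name LutShiftRegister and its methods are hypothetical, and the convention that a depth value of n selects a delay of n + 1 cycles follows the DEPTH[3:0] example in the speaker notes of this deck.

```python
class LutShiftRegister:
    """16-stage serial-in/serial-out shift register with an addressable tap."""

    def __init__(self):
        self.stages = [0] * 16  # stages[0] holds the most recently shifted bit

    def clock(self, data_in):
        """One clock edge: shift every stage by one and load the new bit."""
        self.stages = [data_in & 1] + self.stages[:-1]

    def read(self, depth):
        """depth 0..15 selects a delay of depth + 1 clock cycles."""
        return self.stages[depth]


if __name__ == "__main__":
    sr = LutShiftRegister()
    for bit in [1, 0, 0]:
        sr.clock(bit)
    # With depth = 2 the register behaves as 3 bits long:
    # the bit shifted in three clocks ago is visible at the tap.
    print(sr.read(2))
```

    Changing the depth argument at run time corresponds to the dynamically addressable delay mentioned on the slide; no flip-flops are consumed for the 16 stages.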

  • CLB Arithmetic Logic
    - Dedicated carry logic
      - Provides high performance for counters & arithmetic functions
    - Discrete XOR component for single level sum completion
    - Two separate carry chains in CLB allow for 3 operand functions
    - Can also be used to cascade LUTs for wide-input logic functions

  • 3 Operand Adder Function
    - A, B, C are two bits wide
    - SUM = A + B + C or PARTIAL + C, where PARTIAL = A + B
    - Implementation
      - First 2-operand sum A + B is performed in Slice 0
      - Second 2-operand sum PARTIAL + C is performed in Slice 1
      - Fast local feedback connection within the CLB
      - Very small delay on PARTIAL

  • 4-bit adder
    [Figure labels: Overflow, Carry Out, Carry In]
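    The 3-operand adder can be sketched as two ripple-carry additions in sequence, one per carry chain. The Python below is a bit-level illustration under that assumption, not a description of the actual CLB wiring; carry_chain_add and three_operand_sum are hypothetical names.

```python
def carry_chain_add(x, y, width):
    """Ripple the carry bit by bit; an XOR forms each sum bit."""
    carry, total = 0, 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        total |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    return total | (carry << width)  # the carry out becomes the top bit


def three_operand_sum(a, b, c, width=2):
    """SUM = A + B + C computed as PARTIAL + C, where PARTIAL = A + B."""
    partial = carry_chain_add(a, b, width)          # first chain (Slice 0)
    return carry_chain_add(partial, c, width + 1)   # second chain (Slice 1)


if __name__ == "__main__":
    for a in range(4):
        for b in range(4):
            for c in range(4):
                assert three_operand_sum(a, b, c) == a + b + c
    print(three_operand_sum(3, 3, 3))
```

    The second addition is one bit wider than the first because PARTIAL carries the extra bit produced by A + B.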

  • Dedicated Expansion Multiplexers
    - MUXF5 combines 2 LUTs to form
      - 4x1 multiplexer
      - Or any 5-input function
    - MUXF6 combines 2 slices to form
      - 8x1 multiplexer
      - Or any 6-input function
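    Why two 4-input LUTs plus one multiplexer give any 5-input function follows from splitting the function on its fifth input. The Python below sketches that decomposition; it is an illustration only, and lut4, muxf5 and split_function are hypothetical names rather than library primitives.

```python
def lut4(table, a, b, c, d):
    """Read a 16-entry look-up table addressed by four inputs."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def muxf5(lut_low, lut_high, a, b, c, d, e):
    """The fifth input e selects between the outputs of two 4-input LUTs."""
    return lut4(lut_high, a, b, c, d) if e else lut4(lut_low, a, b, c, d)


def split_function(func5):
    """Shannon expansion on e: one table for e = 0 and one for e = 1."""
    def table(e):
        return [
            int(func5((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1, e))
            for i in range(16)
        ]
    return table(0), table(1)


if __name__ == "__main__":
    majority = lambda a, b, c, d, e: a + b + c + d + e >= 3
    low, high = split_function(majority)
    print(muxf5(low, high, 1, 1, 0, 0, 1))
```

    MUXF6 applies the same idea one level up, selecting between two such 5-input structures with a sixth input.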

  • Memory Bandwidth and Flexibility
    - 200 MHz memory continuum
    - Highest performance FPGA memory system
    - Spartan-II on-chip SelectRAM+(TM) memory: deep/wide block RAM (kilobytes), configurable as 4Kx1, 2Kx2, 1Kx4, 512x8 or 256x16
    - Uses: large FIFOs, packet buffers, video line buffers, cache tag memory

  • Block RAM Provides 4K Bits Each
    - Dual read/write ports, each with:
      - Independent clock, R/W, and enable
      - Independently configurable data width from 4Kx1 to 256x16

    Data Flow   Spartan-II
    A to B      Yes
    B to A      Yes
    A to A      Yes
    B to B      Yes
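    The available aspect ratios all describe the same 4096 bits, traded between depth and width. A short Python sketch of that arithmetic follows; BLOCK_BITS and aspect_ratios are hypothetical names used only for illustration.

```python
BLOCK_BITS = 4096  # one block RAM holds 4K bits


def aspect_ratios(widths=(1, 2, 4, 8, 16)):
    """Return (width, depth, address bits) for each data width."""
    rows = []
    for width in widths:
        depth = BLOCK_BITS // width
        addr_bits = depth.bit_length() - 1  # depth is a power of two
        rows.append((width, depth, addr_bits))
    return rows


if __name__ == "__main__":
    for width, depth, addr_bits in aspect_ratios():
        print(f"{depth}x{width}: ADDR({addr_bits - 1}:0) DATA({width - 1}:0)")
```

    Each doubling of the data width removes one address bit, which is why the 4Kx1 configuration uses a 12-bit address and the 256x16 configuration an 8-bit one.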

  • Local Routing
    - Interconnect among LUTs, FFs, GRM
    - CLB feedback path for connections to LUTs in same CLB
    - Direct path between horizontally adjacent CLBs

  • General Purpose Routing
    - 24 single-length lines
      - Route GRM signals to adjacent GRMs in 4 directions
    - 96 buffered hex lines
      - Route GRM signals to other GRMs six blocks away in each of the four directions
    - 12 buffered long lines
      - Routing across top and bottom, left and right
    [Figure labels: direct connections, single-length lines, buffered hex lines, long lines and global lines, internal 3-state bus]

  • Internal Three-state Buses

  • Routing Summary
    - Vector-based routing
      - Predictable routing delays independent of device size and routing direction
      - Core-friendly architecture
    - Quick place and route times
      - Design to system at 100,000 gates per minute
      - Easier re-routing
    - Internal 3-state bussing
      - Eliminates bus routing contention
      - Improves density and performance

  • Clock Distribution Nets
    - High speed
    - Low skew
    - 4 distribution nets
    - 4 dedicated input pads
    - 4 dedicated global buffers with inputs
      - from clock pad, or
      - from internal signal

  • System Clock Management
    - Delay Lock Loops (DLLs) lower board costs
    - De-skew clocks
    - 4 low-skew global clocks
    - Mirror clock for board distribution
    - Multiply, divide, shift
    - Convert clock to different I/O standards using SelectI/O
    [Figure: DLL1, DLL2, DLL3 and DLL4 driving the system clocks]

  • DLL Capabilities
    - Easy clock duplication: system clock distribution; cleans and reconditions incoming clock
    - Quick and easy frequency adjustment: single crystal easily generates multiple clocks
    - Faster state machines utilizing different clock phases; excellent for advanced memory types
    - De-skew incoming clock: generate fast setup and hold time or fast clock-to-outs

  • Generic DLL Operation
    - A DLL inserts delay on the clock net until the clock input rising edge is in phase with the clock feedback rising edge
    - Requires a well-designed clock distribution network: the clock edges arrive simultaneously everywhere in the part
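    The locking principle can be illustrated with a toy model: delay taps are added until the inserted delay plus the distribution delay equals a whole number of clock periods. The Python below is a simplified sketch, not the actual DLL circuit; the function name dll_lock, the 50 ps tap size and the tap limit are all hypothetical.

```python
def dll_lock(period_ps, distribution_delay_ps, tap_ps=50, max_taps=1000):
    """Add delay taps until the fed-back edge lines up with the input edge.

    Returns the inserted delay in picoseconds once the phase error is
    smaller than one tap (toy model; tap_ps is an assumed value).
    """
    for taps in range(max_taps + 1):
        inserted = taps * tap_ps
        phase_error = (inserted + distribution_delay_ps) % period_ps
        if phase_error < tap_ps:
            return inserted
    raise RuntimeError("no lock within the available delay range")


if __name__ == "__main__":
    # 100 MHz clock (10,000 ps period) with 3,200 ps of distribution delay:
    # the loop pads the path so the total delay is exactly one period.
    print(dll_lock(10_000, 3_200))
```

    Once locked, the feedback edge coincides with the input edge, so the distribution delay is no longer visible to the rest of the system.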

  • Delay-locked Loop Functions
    - Eliminate clock distribution delay
    - System synchronization (e.g., clock mirrors)
    - Phase-shifted clocks
    - Clock multiplication and division
    - Clean up clocks with 50/50 duty cycle correction
    - Clock lock for internal & external synchronization
      - DLL feedback connected internally or externally
    - Can synchronize configuration to DLL lock

  • Improved Clock-to-out Using DLL
    - Spartan-II clock-to-out delays reduced over 50%
    - Output standard = LVTTL Fast 16 mA (OBUF_F_16)
    - Temp = room, Vdd = 2.5 V, Vcco = 3.3 V

    Waveforms: 1: CLKIN  2: DATA OUT (no DLL)  3: DATA OUT (DLL deskewed)

    Timing     r->r     r->f
    w/o DLL    3.6 ns   3.5 ns
    w/ DLL     1.4 ns   1.4 ns

  • DLL Macros
    - Two DLL versions available
      - Controlled by macro choice
    - CLKDLL (low frequency)
      - Input frequency: 25 MHz to 100 MHz
      - All 6 outputs available: CLK0, CLK90, CLK180, CLK270, CLK2X & CLKDV
    - CLKDLLHF (high frequency)
      - Input frequency: 60 MHz to 200 MHz
      - 3 outputs available: CLK0, CLK180 & CLKDV
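    The macro choice can be written down as a small lookup. The Python below only encodes the frequency ranges and outputs listed on this slide; the names DLL_MACROS and macros_for are hypothetical and are not part of any Xilinx tool.

```python
# (min MHz, max MHz, available outputs), as listed on the slide.
DLL_MACROS = {
    "CLKDLL": (25, 100, ("CLK0", "CLK90", "CLK180", "CLK270", "CLK2X", "CLKDV")),
    "CLKDLLHF": (60, 200, ("CLK0", "CLK180", "CLKDV")),
}


def macros_for(freq_mhz):
    """Return the macros whose input frequency range covers freq_mhz."""
    return [
        name
        for name, (low, high, _outputs) in DLL_MACROS.items()
        if low <= freq_mhz <= high
    ]


if __name__ == "__main__":
    for freq in (30, 80, 150):
        print(freq, macros_for(freq))
```

    In the overlap between 60 MHz and 100 MHz both macros qualify; CLKDLL then offers all six outputs, while CLKDLLHF offers three.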

  • I/O Block (Simplified)
    - Registered input, output, 3-state control
    - Programmable slew rate, pull-up, pull-down, keeper and input delay

  • IOBs Organized As Independent Banks
    - As many as eight banks on a device
      - Package dependent
    - Each bank can be assigned any of the 16 signal standards

  • Programmable Output Driver
    - Significant EMI reduction benefit
    - Programmable driver strength
      - Pull-up and pull-down drivers can be individually controlled
      - 16 different settings for each
    - 2 slew rate settings

    Simultaneous Switching Output Guidelines

  • System Interfaces - SelectI/O
    - Supports multiple voltage and signal standards simultaneously
    - Eliminates costly bus transceivers
    - 19 different standards supported!

  • SelectI/O(TM) Standards
    - VCCO defines output voltage
    - VREF defines input threshold reference voltage
      - Available as user I/O when using internal reference

    Standard            VREF   VCCO
    Chip to Chip Interface
    LVTTL               na     3.3
    LVCMOS2             na     2.5
    LVCMOS18            na     1.8
    LVDS                na     2.5
    LVPECL              na     3.3
    Backplane Interface
    PCI 33MHz 3.3V      na     3.3
    PCI 66MHz 3.3V      na     3.3
    GTL                 0.80   na
    GTL+                1.00   na
    AGP-2X              1.32   3.3
    Bus LVDS            na     2.5
    Memory Interface
    HSTL-I              0.75   1.5
    HSTL-III & IV       0.90   1.5
    SSTL3-I & II        1.50   3.3
    SSTL2-I & II        1.25   2.5
    CTT                 1.50   3.3

  • Spartan-II As Center for Signal Translation
    - Chip to chip: LVTTL, LVCMOS
    - Chip to memory: SSTL2-I, SSTL2-II, SSTL3-I, SSTL3-II, HSTL-I, HSTL-III, HSTL-IV, CTT
    - Chip to backplane: PCI33-5V, PCI33-3.3V, GTL, GTL+, AGP
    - Allows support for future standards!

  • Partial Reconfiguration
    - Frame by frame reconfiguration supported while device is running
      - Routing changes affect device operation
      - Re-initializing a block RAM requires stopping all access in that column
    - Can dynamically load the required logic at a given time
    - Minimizes cost further by time-multiplexing the logic resources

  • Power-down Mode
    - Controlled by single power down pin
    - All inputs blocked, appear low internally
    - All outputs disabled
    - All register states preserved
    - Power-down status pin
    - Synchronous wake up
    - 100 uA typical

  • Configuration Modes
    There are four ways to program a Spartan-II FPGA.

    Slave Serial
    - Config. data format: Serial
    - Direction of synchronizing clock: FPGA receives CCLK
    - Use: Processor or CPLD or another FPGA (in Master mode) controls configuration of slave FPGA. Also for configuring multiple slave FPGAs in a daisy chain (2nd, 3rd FPGA, etc.).

    Master Serial
    - Config. data format: Serial
    - Direction of synchronizing clock: FPGA generates CCLK
    - Use: FPGA in Master mode configures itself from a serial PROM. Also, 1st FPGA (master) in daisy chain controls configuration of slave FPGA(s) in a daisy chain.

    Slave Parallel
    - Config. data format: Byte
    - Direction of synchronizing clock: FPGA receives CCLK
    - Use: Processor or CPLD controls the fast configuration of slave FPGA.

    JTAG
    - Config. data format: Serial
    - Direction of synchronizing clock: FPGA receives TCK
    - Use: Make use of existing boundary scan port

  • Spartan-II Family Overview

    Device            XC2S15   XC2S30   XC2S50   XC2S100   XC2S150   XC2S200
    Logic Cells       432      972      1728     2700      3888      5292
    Block RAM Bits    16,384   24,576   32,768   40,960    49,152    57,344
    Block RAM Qty.    4        6        8        10        12        14
    Max. User I/Os    86       132      176      196       260       284

    Packages (number of devices offered in each): VQ100 (2), CS144 (2), TQ144 (4), PQ208 (5), FG256 (4), FG456 (3)

  • Spartan-II Architecture Summary
    Delivers all the key requirements for ASIC replacement:
    - 200,000 gates
    - 200 MHz
    - Flexible I/O interfaces
    - On-chip distributed and block RAM
    - Clock management
    - Low power
    - Complete development system support

    Let's look at the current offering of FPGA and CPLD families from Xilinx. Xilinx has multiple product families, from the low-power market-leading CoolRunner CPLD family to the advanced Virtex FPGA families. The Spartan-IIE FPGA and XC9500XL CPLD families are targeted at low-cost, high-volume system-level designs that require up to 200K system gates. The CoolRunner CPLD family addresses the high-performance, low-power market segment. The Virtex/E/EM families are designed to meet the needs of high-performance, high-level design solutions. The focus of the winning edge is the Spartan-IIE and XC9500XL families.

    Spartan FPGAs provide the low cost and high feature content required for consumer electronics applications.

    Now we will look at the details of the architecture. Each of these sections will be examined in more detail later in the presentation.

    The configurable logic block (CLB) contains two slices. Each slice contains two 4-input look-up tables (LUTs), carry & control logic and two registers. There are two 3-state buffers associated with each CLB, which can be accessed by all the outputs of a CLB.

    Xilinx is the only major FPGA vendor that provides dedicated resources for on-chip 3-state bussing. This feature can increase the performance and lower the CLB utilization for wide multiplex functions. The Xilinx internal bus can also be extended off chip.

    The FPGA is made up of an array of Configurable Logic Blocks (CLBs). Each CLB is made up of two slices, and each slice has two Look-Up Tables (LUTs) and two flip-flops.

    The LUT is also known as a function generator. It can be used to form any function of its four inputs. The software automatically cascades these LUTs to build wide-input logic functions.

    When the CLB LUT is configured as memory, it can implement 16x1 synchronous RAM. One LUT can implement 16x1 Single-Port RAM. Two LUTs are used to implement 16x1 dual port RAM. The LUTs can be cascaded for desired memory depth and width.

    The write operation is synchronous. The read operation is asynchronous and can be made synchronous by using the accompanying flip flops of the CLB LUT.

    The distributed RAM is compact and fast, which makes it ideal for small RAM-based functions.

    The LUT can be configured as a shift register (serial in, serial out) with a length programmable from 1 to 16 bits. For example, DEPTH[3:0] = 0010 (binary) means that the shift register is 3 bits long. In the simplest case, a 16-bit shift register can be implemented in a LUT, eliminating the need for 16 flip-flops and also eliminating extra routing resources that would otherwise have lowered the performance.

    In this example, there is a cycle imbalance, which must be fixed. Let's think about how the shift register can fix the imbalanced cycles. As seen from the slide, the logic will be off by nine clock cycles.

    The shift register functionality (Operation D) can be used in the path to obtain valid output by adding nine additional cycles. Alternatively, this design would have required additional logic and a counter to hold the data until the right time.

    You will see two carry chains present in a CLB. The discrete exclusive-OR gates (XOR) are used to compute the sum of inputs in one logic level. Note that the Altera FLEX 10K and ACEX 1K families have only one carry chain in their LAB.

    Two carry chains are used to implement the SUM = A + B + C function within a CLB. The first carry chain is used to perform the partial sum, A + B. The second carry chain is used to perform the sum of PARTIAL and C. Note that the synthesis tools may not be able to infer both carry chains in the CLB. In that case, you will need to place the logic using the floorplanner before implementing the design.

    In the above example, two carry chains within a CLB would give higher performance than Altera's solution with just one carry chain in their Logic Array Block (LAB).

    Special logic in the CLB allows logic expansion beyond just using the lookup tables.

    Three types of memory are supported: small distributed RAM in the LUTs, large block RAM, or very large external RAM.

    The local routing provides:
    - connections between logic within the CLB
    - a path to connect to horizontally adjacent CLBs

    Xilinx provides automatic place and route tools that efficiently use these routing resources. Designers do not need to worry about this process.

    The Spartan-IIE family has a vector-based routing structure that provides routing delays independent of direction and device size. This feature is very useful for IP cores because it gives predictable timing for the IP, regardless of the number of IPs, their placement and the device being used.

    The routing structure is abundant in the Spartan-IIE family. This in turn helps reduce the compile times. For example, a design of 100K gates targeting a Spartan-IIE can be routed in 1 minute with the release of the 3.1i software. Future versions of the software should reduce these times even further.

    Xilinx is the only major FPGA vendor that provides dedicated resources for on-chip 3-state bussing. This feature can increase the performance and lower the CLB utilization for wide multiplexor functions. The Altera FPGAs do not have these resources. To emulate this functionality in their FPGAs would require extra LUTs with multiple levels of logic. The Xilinx internal bus can also be extended off chip.

    Spartan-IIE includes powerful chip and board level clock management with DLLs. DLLs are the 100% digital implementation of the old analog PLLs. Spartan-IIE contains 4 DLLs in each device. The DLLs perform the following functions:

    - Remove on-chip as well as off-chip clock delays (de-skew)

    - Clock multiplication, division and phase shift

    - Clock duplication for distribution to other chips on the system board

    - Clock output conversion to a different IO standard e.g. SSTL, using SelectI/O feature of the Spartan-IIE family.

    XAPP174 on the Xilinx web site explains DLLs in further detail.

    The Altera FLEX 10KE and ACEX 1K families provide only one PLL, only on the fastest speed grade as an option at added cost.

    Clock Mirror duplicates incoming clock and performs system synchronization.

    Multiply and Divide functions allow simple frequency adjustments for distribution throughout the board. By using inexpensive crystals, clock frequencies can be multiplied internally to the FPGA, reducing board EMI.

    Clock Phase Shift provides coarse phase shifts of 0, 90, 180 and 270 degrees. This is excellent for the fast clocking of state machines, which can use each of the different clock phases.

    Clock de-skew allows for faster setup, hold and clock-to-out times allowing higher overall system performance.

    DLL inserts a delay until the delayed feedback clock aligns with the input clock. At that point the DLL is locked.

    A key benefit of the DLL is the ability to remove delay from the clock path, and improve the effective clock-to-out delay.

    The I/O block features are automatically used according to the design entered into the development system.

    The SelectI/O technology provides a universal I/O translation capability across many voltage levels and signaling standards. This capability is unique in the FPGA world and facilitates easy communication in chip-to-chip, chip-to-memory and chip-to-backplane applications.

    Each SelectI/O pin can support any standard and each Spartan-IIE FPGA can support multiple standards simultaneously. Up to eight different standards can be supported simultaneously. SelectI/O helps to reduce the number of translator chips on the board by incorporating industry standards on the same chip. Traditionally, a bus transceiver chip would have been used to interface the FPGA to other chips such as high speed DDR RAM. Now that the functionality of the transceiver chip is already incorporated into the Spartan-IIE chip, translation chips are no longer needed. Thus the chip count on the board is reduced, thereby saving board space, reducing board costs and improving overall reliability. It is worth noting that in the past we usually used only LVTTL or LVCMOS as a basic signaling standard. Spartan-IIE still supports these standards. These are the standards usually used in legacy systems to interface to older devices. Note that the Altera FLEX 10KE and ACEX 1K families do NOT support the latest I/O standards. They have only five standards, and only one can be used at a given time.

    The table shows the reference voltage and the output source voltage for various standards. Note that the Altera FLEX 10KE and ACEX 1K families do NOT support the latest I/O standards. They have only five standards, and only one can be used at a given time.

    No external translators are necessary when using the Spartan-II family.

    A dedicated Power Down pin helps conserve power.

    The user can choose the configuration mode that best suits the particular application.

    The XC2S200 was recently added to extend the family to 200,000 system gates.