1 Delledonnebioinformatica

  • 7/30/2019 1 Delledonnebioinformatica

    1/67

    Impact of new sequencing technologies on the structural and functional analysis of genomes

    (of agricultural interest)

    Massimo Delledonne

    Functional Genomics Center, Department of Biotechnologies,

    University of Verona - Italy

    http://www.scienze.univr.it/fol/main
  • 2/67

    13 years and $3 billion were required to produce the Human Genome Project's reference genome

  • 3/67

    Sanger sequencing (chain-termination method, introduced in 1977)

    96 samples, 800 nt per sample

    1,600,000 nt per day

    Human genome: 3,000,000,000 bp

    Minimum coverage for an accurate analysis: 8X = 24,000,000,000 nt

    24,000,000,000 nt / 1,600,000 nt per day = 15,000 days!
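    The slide's estimate is a single division: total bases required (genome size × coverage) over the daily output. A minimal Python sketch using the slide's figures; the function name is illustrative:

```python
def days_to_sequence(genome_bp: int, coverage: int, nt_per_day: int) -> float:
    """Days needed to reach a given coverage at a fixed daily sequencing output."""
    total_nt = genome_bp * coverage  # bases that must be read in total
    return total_nt / nt_per_day

# Slide figures: 3 Gbp human genome, 8X coverage, 1,600,000 nt/day (Sanger)
days = days_to_sequence(3_000_000_000, 8, 1_600_000)
print(f"{days:,.0f} days, roughly {days / 365:.0f} years")  # 15,000 days, roughly 41 years
```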

  • 4/67

  • 5/67

    2 months and USD 2,000,000 with 454 Life Sciences

    April 2008

    apoE unknown

  • 6/67

    454 REVOLUTION

    >00405_2045_2005

    CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG

    GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCCATC

    GS20 (2005): 100 bp × 200,000 reads = 20 Mbp

    >00343_3489_2007

    CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG

    GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCC

    ATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG

    CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT

    CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTATGCATC

    GS FLX (2007): 200-250 bp × 400,000 reads ≈ 100 Mbp

    x5

    >01384_3992_2008

    CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG

    GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCCATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG

    CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT

    CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTAT

    GCATCCTCGGTATTCGGCGTACGATCGCTGACTGCTCGATTTTCGATCG

    TACGTACCGTCAGCTAGCTAAAAAAAAAGCTGCGCGGTCAGCATCGTTG

    CATGCGCTACTGCTAGTACGTAGTAGTGCTACGTTTTATATCGTCGCCAAATCTCTTCGGCTTTCG

    GS FLX Titanium (2008): 400-500 bp × 1 M reads ≈ 500 Mbp

    x5

  • 7/67

    GS Junior: 100,000 reads (400-500 nt); one run costs 1,000

    GS FLX: 1,100,000 reads (400-500 nt); one run costs 6,000

  • 8/67

    For individuals, the new price will be $19,500, while groups of five or more customers using the same

    ordering physician will pay $14,500 per person. In addition, individuals with serious medical conditions

    for whom whole-genome sequencing could be of clinical value will pay $9,500 to have their genome

    sequenced.

    As Illumina Cuts Price of Personal Sequencing Service, CEO Says Market Growth Hinges on Analysis (June 08, 2010)

    Price per 30X human genome

    Version | Date | Throughput | Cost/genome* (30X coverage) | Accuracy
    SOLiD 3Plus System | Oct 2009 | 60 Gb | 27,500 | 99.94%
    SOLiD 4 System | April 2010** | 100 Gb | 5,000 | 99.94%
    SOLiD 4hq System | Q4 2010** | 300 Gb | 2,500 | 99.99%

    *Approximate costs for sequencing reagents at optimal running efficiency

    **Expected release date

  • 9/67

  • 10/67

    Illumina Genome Analyzer II(x)

    End of 2007: 30 million reads, 33 nucleotides each

    End of 2008: 50 million reads, 50 nucleotides each

    2010: 300 million single reads, 150 nucleotides each; 600 million paired reads, 150 nucleotides each

  • 11/67

    HiSeq 2000

    HIGHEST OUTPUT

    Initially capable of up to 200 Gb per run

    2nd quarter 2011: 600 Gb per run!

    FASTEST DATA RATE

    ~25 Gb/day

    7-8 days for 2 x 100 bp

    HIGHEST NUMBER OF READS

    One billion single-end reads*

    Two billion paired-end reads*

    *Based on one billion clusters passing filter

  • 12/67

    5500xl SOLiD System

  • 13/67

    [Chart: Reads (M) and Bases (Gb), Fall 2008 to Spring 2010]

  • 14/67

  • 15/67

    Copyright 2008 Pacific Biosciences

    Just launched: Pacific Biosciences

    The company said the commercial PacBio RS instrument will generate average read

    lengths of 1,000 bases, with a small fraction of reads longer than 10 kilobases, and

    have a typical run time of 30 minutes. The commercial instrument will also run chips

    with 75,000 wells, or zero-mode waveguides, whereas the development instrument

    uses chips with 45,000 wells. Chief technology officer Steve Turner added this week

    that the expected commercial raw read accuracy will be between 85 and 90 percent.
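    To put an 85-90% raw accuracy in perspective, it can be converted to the Phred scale (Q = -10·log10 of the error rate) and to an expected number of wrong bases per read. A minimal sketch, assuming errors are independent and spread evenly along the read; this is a simplification that ignores the type of error (substitution vs. insertion/deletion):

```python
import math

def phred_quality(accuracy: float) -> float:
    """Convert per-base accuracy (0-1) to a Phred score Q = -10 * log10(error rate)."""
    error_rate = 1.0 - accuracy
    return -10 * math.log10(error_rate)

def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of wrong bases in one read, assuming independent errors."""
    return read_length * (1.0 - accuracy)

for acc in (0.85, 0.90):
    print(f"accuracy {acc:.0%}: Q{phred_quality(acc):.1f}, "
          f"~{expected_errors(1000, acc):.0f} errors per 1,000-base read")
```

    The same conversion puts the SOLiD accuracies quoted earlier (99.94% and 99.99%) at roughly Q32 and Q40.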

  • 16/67

  • 17/67

    CARLSBAD, Calif., December 14, 2010. Life Technologies Corporation (NASDAQ: LIFE), a provider of innovative life science solutions, today announced that it has launched its Ion Personal Genome Machine (PGM) sequencer.

    A run takes approximately two hours, and several runs can be performed in a day.

    The first version of the PGM will sell for $49,500, plus a $16,500 server to

    analyze the data.

    Initially, the machine will produce about 10 megabases of data per run, or

    about 100,000 reads of 100 base pairs each, using the so-called 314 chip,

    which has about 1.5 million wells and will cost $250. Reagent kits for template

    preparation, library preparation, and sequencing will cost another $250,

    bringing the total consumables cost per run to approximately $500.

    In the first half of 2011, Ion Torrent plans to launch the 316 chip, with about 6

    million wells, which will increase the output per run to 100 megabases and

    which will cost about twice as much as the 314. Additional chip upgrades will

    follow, with details to be revealed next year.

  • 18/67

    Illumina's Low-Cost MiSeq Promises to Speed up Next-Gen Sequencing (January 18, 2011)

    Illumina last week announced a new low-cost sequencing platform called MiSeq,

    which promises to go from purified DNA to analyzed data in as few as eight hours,

    and to generate more than a gigabase of sequence in slightly more than a day of

    total experimental time.

    The instrument has a list price of under $125,000 and runs Illumina's existing

    TruSeq sequencing-by-synthesis chemistry.

    MiSeq will perform both single and paired-end sequencing with read lengths of up to 2 x 150 base pairs, which might increase over time. It generates more than 3.4 million single reads, or more than 6.8 million paired-end reads, per run. The maximum output per run is about 1.5 gigabases.

    Cluster generation and sequencing take about 4.5 hours, and data analysis and "demultiplexing" about two hours.

  • 19/67

    CARLSBAD, Calif., Feb. 23, 2011. Life Technologies Corporation (NASDAQ: LIFE) today announced that the Ion 318 semiconductor sequencing chip will be available for early access in September of this year, complete with RNA-Seq kits and analysis software, providing up to 1 Gb of data output, 100 times more than the original Ion 314 chip introduced two months ago with the Ion Personal Genome Machine (PGM) sequencer.

    Ion Torrent has increased the number of accessible sensors on the Ion 318 chip to 11 million, from 1.2 million on the Ion 314 chip and 6.1 million on the Ion 316 chip, said Gregg Fergus, President of Ion Torrent. The Ion 318 is ideal for applications like transcriptome sequencing, miRNA sequencing and ChIP-Seq. The Ion 318 requires no changes or upgrades to the Ion PGM sequencer: with Ion Torrent, "The Chip is the Machine", so sequencing is easier and more economical than ever before.

    Internally, Ion Torrent has already achieved read lengths in excess of 300bp and

    improvements in chemistry and loading are expected to increase chip utilization by five-

    fold. By combining additional reads with longer read lengths, the company will achieve

    exceptional scalability on its Ion PGM sequencing platform and expects to reach read

    lengths of up to 400bp in 2012.

  • 20/67

    Genome (re)sequencing

  • 21/67

    Classical sequencing: 700-800 bases per read

    Short-read sequencing against a reference genome: 35-100 bases per read
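    Re-sequencing with short reads means locating each read on the reference genome. Production aligners such as Bowtie, BWA or MAQ use compressed indexes and tolerate mismatches; the sketch below only illustrates the idea with an exact-match, seed-and-verify search over a k-mer hash table. Function names, the value of k and the toy reference are illustrative assumptions:

```python
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return every 0-based position where the whole read matches the reference exactly.

    The first k bases of the read act as a seed; each seed hit is then
    verified by comparing the full read against the reference.
    """
    if len(read) < k:
        return []
    return [pos for pos in index.get(read[:k], [])
            if reference[pos:pos + len(read)] == read]

# Toy 36-nt reference (borrowed from the Corvina read example)
reference = "TGTTGGAATACCTGAAGATTGCTCAGGACCTGGAGA"
k = 8
index = build_kmer_index(reference, k)
print(map_read("GAAGATTGCTCAG", reference, index, k))   # [13]
print(map_read("GATTACAGATTACA", reference, index, k))  # []
```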

  • 22/67

    Why (Re)Sequencing

  • 23/67

    Individual affected by Miller syndrome

    Pharmacogenomics

    Cancer genome

  • 24/67

    Functional Genomics Center: Joint Project

    PowerEdge™ R900

    4 × Intel Xeon 7450 6-core

    128 GB RAM

    Illumina Genome Analyzer II machine time

  • 25/67

    Sequencing of the Corvina genome

    TGTTGGAATACCTGAAGATTGCTCAGGACCTGGAGA

    CAACATTGGATCAAATGGATCTGATAGACCTTTACA

    GAAAAAATGTAGTCAGCCTTTAACTTGGCCTGATAA

    CACAGCTGGGGCTGTAGCAACCCTTTCCAACCCCTT

    TAGTCGGTTGTTGATGAGATATTTGGAGGTGGGGAT

    GTCAAAGGCAAAGGAAAAAATGTTCAATATAGTTAA

    Genomic DNA

    Fragmentation

  • 26/67

    Structural characterization: duplications and deletions

    Duplication / Deletion (relative to the PN40024 reference)

    Deleted regions (mean length 46 kb): 500

    Duplicated regions (mean length 43 kb): 20

    The duplicated genes belong to the following functional classes:

    Metabolism

    Signal transduction

    Response to biotic stress

    Unknown function
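    The slide does not say how the deletions and duplications were called. One common approach with short reads is read depth: a region missing from the sample attracts almost no reads, while a duplicated region attracts about twice the expected number. A minimal sketch of that idea; the window depths and the two ratio thresholds are hypothetical:

```python
from statistics import median

def call_copy_number(window_depths: list[float],
                     del_ratio: float = 0.2,
                     dup_ratio: float = 1.8) -> list[str]:
    """Label each genomic window by comparing its depth with the genome-wide median.

    Windows with almost no reads are labelled 'deletion'; windows with roughly
    twice the median depth are labelled 'duplication'.
    """
    baseline = median(window_depths)
    calls = []
    for depth in window_depths:
        ratio = depth / baseline
        if ratio <= del_ratio:
            calls.append("deletion")
        elif ratio >= dup_ratio:
            calls.append("duplication")
        else:
            calls.append("normal")
    return calls

# Hypothetical mean read depth in consecutive windows along a chromosome
depths = [30, 29, 31, 0, 1, 30, 62, 58, 30]
print(call_copy_number(depths))
```

    Real callers also merge adjacent windows into regions and correct depth for GC content and mappability; both are omitted here.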

  • 27/67

    Polymorphisms between Corvina and PN40024

    SNPs identified: 392,775

    SNPs in coding regions: 156,601

    Genes with SNPs in their coding sequence: 13,854 (46%)

    Heterozygous SNPs: 288,204 (75%)

    [Pie chart, biological process: unknown 49%; lipid metabolism 2%; metabolism 8%; carbohydrate metabolism 4%; amino acid metabolism 2%; protein modification 6%; energy 1%; other metabolic processes 15%; reproduction 1%; cell death 0%; stress response 3%; translation 2%; transcription 6%; signal transduction 1%]

  • 28/67

    Verona, 9 February 2007

    Functional Genomics Center

    PowerEdge™ R900

    4 × Intel Xeon 7450 6-core

    128 GB RAM

    Illumina HiSeq 1000

  • 29/67

  • 30/67

  • 31/67

    The protein-coding genome: ~30 megabases (1% of the genome)

    ~20,000 genes

    ~180,000 discontinuous sequences (exons)

  • 32/67

    Microarray Sequence

  • 33/67

    Microarray Sequence Capture

    Genomic DNA → fragment & add linkers → hybridize to the microarray probes → wash away background DNA → elute target DNA → sequencing on the 454 Genome Sequencer

    [Figure labels: probes, target DNA, background DNA, Exon 1 to Exon 6]

  • 34/67

    Exome sequencing

    Amplicon sequencing

  • 35/67

    Harismendy et al., Genome Biology 2009, 10:R32. doi:10.1186/gb-2009-10-3-r32

    Amplicon sequencing

  • 36/67

    NimbleGen Sequence Capture

  • 37/67

    NimbleGen Sequence Capture

    gDNA library preparation: DNA fragmentation by nebulization, then adaptor ligation

    Hybridization on a 2.1M NimbleGen array

    Enrichment confirmation by qPCR on internal control genomic loci (captured DNA vs. genomic DNA)

    We obtained a 100-fold enrichment (3 Gb / 30 Mb).

    Specific probes can capture regions totalling up to 30 Mb (thick coloured lines). The remaining regions are washed away (thin black lines).
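    The 100-fold figure is the ratio of genome size to target size, which is also the upper bound on enrichment (every sequenced base on target). Enrichment at a control locus is commonly estimated from qPCR as the per-cycle amplification factor raised to the difference in threshold cycles (Ct) between genomic and captured DNA. A sketch, assuming equal DNA input for both samples; the Ct values are hypothetical:

```python
def max_enrichment(genome_bp: float, target_bp: float) -> float:
    """Upper bound on fold enrichment: all sequenced DNA comes from the target."""
    return genome_bp / target_bp

def qpcr_fold_enrichment(ct_genomic: float, ct_captured: float,
                         efficiency: float = 2.0) -> float:
    """Fold enrichment at one control locus from qPCR threshold cycles (Ct).

    Assumes equal DNA input for both samples; `efficiency` is the
    amplification factor per cycle (2.0 means perfect doubling).
    """
    return efficiency ** (ct_genomic - ct_captured)

print(max_enrichment(3e9, 30e6))         # 100.0 (3 Gb genome / 30 Mb target)
print(qpcr_fold_enrichment(28.0, 21.5))  # about 90.5 (hypothetical Ct values)
```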

  • 38/67

    Transcriptome Sequencing

    (RNA-Seq)

    TGTTGGAATACCTGAAGATTGC

    CAACATTGGATCAAATGGATCT

    GAAAAAATGTAGTCAGCCTTTA

    CACAGCTGGGGCTGTAGCAACC

    TAGTCGGTTGTTGATGAGATAT

    GTCAAAGGCAAAGGAAAAAATG

    Steps shown: RNA extraction from tissues, messenger RNAs, cDNA synthesis, mRNA fragmentation (sonication), data analysis against the reference genome and gene predictions

  • 39/67

    First step: alignment to the genome

    [Figure tracks: gene predictions, coverage, read alignments]

    We use different programs to map the reads onto the reference genome:

    Bowtie: well suited to RNA-Seq because it can keep track of multiple possible mappings for each read (multi-reads), which are needed for accurate expression estimation.

    TopHat: built on Bowtie, adds the detection of splice sites.

    MAQ, BWA: geared towards DNA-Seq analyses; they report fewer hits, keeping only the best ones. Not suited to expression analysis; used for SNP detection.

    The output of all these programs can be converted into the emerging standard SAM format.
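    The distinction between uniquely mapping reads and multi-reads can be read directly from SAM output. A minimal sketch for single-end data that counts mapped alignment records per read name, assuming the aligner was asked to report all hits; for paired-end data the two mates would need to be counted separately:

```python
from collections import Counter

def alignments_per_read(sam_lines: list[str]) -> Counter:
    """Count mapped alignment records per read name (single-end SAM text)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # FLAG bit 0x4: read is unmapped
            continue
        counts[qname] += 1
    return counts

def split_unique_multi(counts: Counter) -> tuple[list[str], list[str]]:
    """Separate uniquely mapping reads from multi-reads."""
    unique = sorted(name for name, n in counts.items() if n == 1)
    multi = sorted(name for name, n in counts.items() if n > 1)
    return unique, multi

# Toy SAM: r1 maps once, r2 twice (second record flagged secondary, 0x100), r3 is unmapped
sam = (
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t100\t255\t36M\t*\t0\t0\t*\t*\n"
    "r2\t0\tchr1\t200\t0\t36M\t*\t0\t0\t*\t*\n"
    "r2\t256\tchr1\t700\t0\t36M\t*\t0\t0\t*\t*\n"
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n"
).splitlines()
print(split_unique_multi(alignments_per_read(sam)))  # (['r1'], ['r2'])
```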

  • 40/67

    DIFFERENTIAL GENE EXPRESSION

  • 41/67

    NEW REGIONS DETECTION AND POSSIBLE

    ALTERNATIVE SPLICING

  • 42/67

    GENES MISSED

  • 43/67

    EXONS MISSED

    INTRONS MISSED

  • 44/67

    Biologists want more than that, but how can we get more?

  • 45/67

    Vitis vinifera whole-genome expression lab

    [Chart: microarray hybridizations]

  • 46/67

    SNP detection in human leukemia

    RNA sample of leukemia at diagnosis stage

    GeneChip SNP Array 5 (> 500,000 SNPs): SNP detection by hybridization

    GAIIx: SNP detection by analysis with:

    - MAQ cns2snp

    - ERANGE snp module

    - SAMtools

  • 47/67

    59,372,544 single reads, totalling 2.2 Gb of sequence (30X coverage)

    TGTTGGAATACCTGAAGATTGCTCAG

    CAACATTGGATCAAATGGATCTGATA

    GAAAAAATGTAGTCAGCCTTTAACTT

    CACAGCTGGGGCTGTAGCAACCCTTT

    TAGTCGGTTGTTGATGAGATATTTGG

    GTCAAAGGCAAAGGAAAAAATGTTCA

    Corvina transcriptome sequencing (mRNA-Seq)

    Steps shown: mRNA, mRNA fragmentation, cDNA synthesis

    Plant Physiol. 152: 1787-1795 (2010)

    Stages: post fruit-set, veraison, ripening

  • 48/67

    Detection of new genes

    ERANGE: searches for candidate transcribed regions, i.e. islands of mapped reads far from annotated genes. It cannot estimate the presence of different isoforms.

    Cufflinks: reconstructs transcripts from the read alignments, bypassing the need for an annotation. Its more sophisticated analyses can distinguish even between different isoforms; paired-end data are strongly recommended for a proper analysis.

    [Figure: new gene, reads in unannotated regions]
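    The ERANGE-style search described above, looking for islands of mapped reads away from annotated genes, can be sketched as follows. This is not ERANGE's code: the interval merging, the minimum read count and the distance threshold are illustrative assumptions, and coordinates are hypothetical:

```python
def read_islands(reads: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Merge overlapping or touching read intervals into islands (start, end, n_reads).

    Intervals are 0-based and half-open: [start, end).
    """
    islands: list[tuple[int, int, int]] = []
    for start, end in sorted(reads):
        if islands and start <= islands[-1][1]:
            s, e, n = islands[-1]
            islands[-1] = (s, max(e, end), n + 1)
        else:
            islands.append((start, end, 1))
    return islands

def candidate_new_genes(reads, genes, min_reads=3, min_distance=1_000):
    """Islands with >= min_reads reads lying at least min_distance bp from every annotated gene."""
    candidates = []
    for s, e, n in read_islands(reads):
        if n < min_reads:
            continue
        far_from_genes = all(e + min_distance <= g_start or s >= g_end + min_distance
                             for g_start, g_end in genes)
        if far_from_genes:
            candidates.append((s, e, n))
    return candidates

genes = [(1_000, 3_000)]                                  # one annotated gene
reads = [(1_200, 1_236), (1_220, 1_256),                  # reads inside the gene
         (8_000, 8_036), (8_010, 8_046), (8_030, 8_066)]  # island far from it
print(candidate_new_genes(reads, genes))  # [(8000, 8066, 3)]
```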

  • 49/67

    Identified 479 new genes

    [Figure: new gene, reads mapping to unannotated regions]

    Plant Physiol. 152: 1787-1795 (2010)

  • 50/67

    SNP detection: methods

    We have developed a pipeline (SRTK) for detecting SNPs. The SAMtools suite is used to create a summary file for the whole genome (the pileup).

    An in-house script then removes data from reads that map to multiple locations.

    Finally, SNPs are filtered by:

    - mapping quality (minimum 20)

    - read coverage (minimum 2 reads)

    - percentage of reads supporting the mutant allele vs. the reference base (minimum 25%)
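    The three filters above can be expressed as a simple predicate over per-position summaries. A minimal sketch, not the SRTK code: the `Site` record is a hypothetical simplification of a pileup line (one mapping-quality value per position), and multi-mapping reads are assumed to have been removed already:

```python
from dataclasses import dataclass

@dataclass
class Site:
    """One candidate SNP position summarised from the pileup (hypothetical fields)."""
    chrom: str
    pos: int
    ref_base: str
    alt_base: str
    mapping_quality: int  # mapping quality of the reads at this position
    depth: int            # reads covering the position, multi-mapping reads removed
    alt_reads: int        # reads supporting alt_base

def passes_filters(site: Site, min_mapq: int = 20, min_depth: int = 2,
                   min_alt_fraction: float = 0.25) -> bool:
    """Apply the thresholds: mapping quality, read coverage, mutant-allele fraction."""
    if site.mapping_quality < min_mapq or site.depth < min_depth:
        return False
    return site.alt_reads / site.depth >= min_alt_fraction

sites = [
    Site("chr1", 1050, "A", "G", 37, 12, 6),   # kept: 50% of reads carry G
    Site("chr1", 2210, "C", "T", 15, 20, 10),  # dropped: mapping quality below 20
    Site("chr2", 880, "G", "A", 40, 1, 1),     # dropped: coverage below 2 reads
    Site("chr2", 990, "T", "C", 40, 20, 3),    # dropped: only 15% of reads carry C
]
print([(s.chrom, s.pos) for s in sites if passes_filters(s)])  # [('chr1', 1050)]
```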

  • 51/67

    Identified alternative splicing sites

    [Figure: constitutive splicing joins exons 1, 2, 3 and 4 (removing introns 1-3) and yields the protein; alternative splicing joins exons 1, 2 and 4 and yields an alternative protein]

    Constitutive splicing sites | Alternative splicing sites | Genes undergoing alternative splicing
    41,447 | 447 | 385

  • 52/67

    Alternative splicing

    Junction database approach (default approach in ERANGE): reads are mapped to a database of exon junctions built from the gene annotations. Only exon-skipping events can be detected.

    De novo recognition with TopHat: recognizes canonical splice sites only; integrated in the TopHat/Cufflinks pipeline.

    Supersplat: performs a gapped alignment of all reads against the genome and also recognizes non-canonical splice sites, but is not as well integrated with other software (e.g. Cufflinks, SAMtools).
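    The junction database approach can be sketched by joining the end of each annotated exon to the start of every downstream exon of the same gene; including non-consecutive pairs is what lets exon-skipping reads find a match. A minimal sketch on a toy gene (plus strand only, coordinates and junction names hypothetical); ERANGE's own construction may differ in details such as flank length:

```python
from itertools import combinations

def junction_sequences(genome: str, exons: list[tuple[int, int]], flank: int) -> dict[str, str]:
    """Build exon-exon junction sequences for one gene.

    exons: 0-based half-open (start, end) intervals on the + strand, in order.
    Every pair (i, j) with i < j is joined, so junctions that skip exons
    (e.g. exon1-exon3) are included alongside consecutive ones.
    Each junction keeps up to `flank` bases on either side of the splice point.
    """
    junctions = {}
    for (i, (s1, e1)), (j, (s2, e2)) in combinations(enumerate(exons, start=1), 2):
        left = genome[max(s1, e1 - flank):e1]   # end of the upstream exon
        right = genome[s2:min(e2, s2 + flank)]  # start of the downstream exon
        junctions[f"exon{i}-exon{j}"] = left + right
    return junctions

genome = "AAAAAtttttCCCCCtttttGGGGG"  # toy gene: 3 exons, lowercase introns
junctions = junction_sequences(genome, [(0, 5), (10, 15), (20, 25)], flank=3)
read = "AAGGG"                         # spans exon 1 joined directly to exon 3
print([name for name, seq in junctions.items() if read in seq])  # ['exon1-exon3']
```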

  • 53/67

    Expression analysis

    ERANGE: reads Bowtie, ELAND or BLAT output. Reads are assigned to gene models; multi-mapping reads are assigned to gene models in proportion to the density of uniquely mapping reads. Capable of detecting genes in previously unannotated regions.

    Cufflinks: reads standard SAM input. It either looks for annotated genes or tries to reconstruct the original RNA fragments, and uses the probability that an alignment is correct to assign multi-mapping reads. Capable of discriminating between different isoforms. Currently in test phase, with promising results.

    SRTK: in-house collection of scripts. MapReadsLocation.py performs a raw count of the uniquely mapping reads on each gene; used for statistical analysis of RNA-Seq data.

    http://ddlab.sci.univr.it/srtk/
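    The ERANGE rule for multi-reads described above spreads each multi-mapping read across its candidate genes in proportion to the density of uniquely mapping reads. A minimal sketch of that rule, not ERANGE's code: density is taken as unique reads per bp, and the equal split when no candidate has unique reads is an assumption of this sketch:

```python
def allocate_multireads(unique_counts: dict[str, int],
                        gene_lengths: dict[str, int],
                        multireads: list[list[str]]) -> dict[str, float]:
    """Per-gene read totals with multi-reads split by unique-read density.

    unique_counts: gene -> number of uniquely mapping reads
    gene_lengths:  gene -> length in bp (turns counts into densities)
    multireads:    one list of candidate genes per multi-mapping read
    """
    totals = {g: float(unique_counts.get(g, 0)) for g in gene_lengths}
    for candidates in multireads:
        density = {g: unique_counts.get(g, 0) / gene_lengths[g] for g in candidates}
        norm = sum(density.values())
        for g in candidates:
            # Equal split when no candidate has unique reads (assumption of this sketch)
            totals[g] += density[g] / norm if norm > 0 else 1 / len(candidates)
    return totals

unique = {"geneA": 90, "geneB": 10}
lengths = {"geneA": 1_000, "geneB": 1_000}
multi = [["geneA", "geneB"] for _ in range(10)]  # ten reads hitting both paralogs
totals = allocate_multireads(unique, lengths, multi)
print({g: round(v, 2) for g, v in totals.items()})  # {'geneA': 99.0, 'geneB': 11.0}
```

    With 90 vs. 10 unique reads on equal-length genes, the ten shared reads are split 9:1 rather than evenly.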

  • 54/67

    ID stress_A stress_B stress_C control_A control_B control_C log2FC_stress_vs_control

    AB007870 976.7419 1106.0225 958.8752 978.6526 724.0019 717.6652 0.342

    AB007877 52.4236 48.9145 61.4969 70.6655 83.2429 111.1115 -0.684

    AB007878 91.737 92.8117 46.1988 77.2719 101.3615 166.1109 -0.575

    AB007895 220.5646 190.4141 260.4116 324.2564 317.1986 273.5752 -0.454

    AB007921 179.6105 224.7979 205.405 195.7703 139.9824 176.5451 0.259

    AB007923 135.6203 93.2452 133.135 102.7447 125.3144 97.5942 0.141

    AB007928 1991.7719 1867.8622 1763.1777 1063.0456 854.2858 835.9119 1.037

    AB007937 1188.4386 1042.3558 1347.4832 1245.6073 1343.2552 1412.5417 -0.167

    AB007940 167.5594 158.4953 123.5362 262.8919 251.2853 237.4419 -0.752

    AB008790 1461.0386 1433.4358 1197.2452 696.7486 950.1929 877.7196 0.703

    AB010419 1992.4652 2830.9386 2415.7361 2450.9499 2204.4398 2272.9572 0.05

    AB010962 525.973 336.3452 572.9752 1090.8052 1704.4986 1163.3986 -1.472

    AB011088 13193.9809 16559.3745 11695.7919 13953.8194 14616.103 18189.4252 -0.179

    AB011097 18.6059 15.7499 18.4248 17.8529 18.0056 17.402 -0.017

    AB011103 1736.1662 1783.1319 1795.1724 1876.8319 2027.4452 1701.4922 -0.073

    AB011123 1692.9638 1439.6005 1639.5755 1084.6886 1042.3516 1144.1105 0.542

    AB011154 567.9596 523.3949 631.1821 1177.9344 1258.5535 1287.5093 -1.116

    AB011157 1964.3402 1926.9519 2525.3286 1579.0752 1702.1312 1445.1081 0.433

    AB011163 363.8747 491.1119 159.7169 583.7652 670.0606 981.6137 -1.25

    AB011174 3063.8319 2914.2986 3317.7625 3211.0308 3421.1982 3267.3989 -0.092

    AB011180 374.4583 416.0086 346.6452 504.8333 299.0182 417.3016 -0.074

    AB011539 539.2045 499.2931 677.8508 714.6286 993.7229 826.6439 -0.562
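    The last column appears consistent with the difference between the mean log2 signal of the three stress replicates and that of the three control replicates (row AB007928, for example, gives about 1.037). This is a reading of the table, not a documented description of the pipeline; a sketch of that calculation:

```python
import math
from statistics import mean

def log2_fold_change(treated: list[float], control: list[float]) -> float:
    """Difference between the mean log2 signal of two groups of replicates."""
    return mean(math.log2(x) for x in treated) - mean(math.log2(x) for x in control)

# Row AB007928 from the table above (three stress and three control replicates)
stress = [1991.7719, 1867.8622, 1763.1777]
control = [1063.0456, 854.2858, 835.9119]
print(round(log2_fold_change(stress, control), 3))  # 1.037
```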

    MICROARRAY ANALYSIS

  • 55/67

    RPKM

    GENE ID | Post fruit-set | Veraison | Ripening
    GSVIVT00027957001 | 83.1 | 21.09 | 12.72
    GSVIVT00018861001 | 0.08 | 0.06 | 0.24
    GSVIVT00023496001 | 0 | 0.67 | 1.69
    GSVIVT00018860001 | 4.92 | 10.71 | 10.84
    GSVIVT00023540001 | 30.52 | 23.03 | 0.54
    GSVIVT00023541001 | 27.84 | 21.11 | 0.48
    GSVIVT00027954001 | 71.23 | 17.45 | 10.63
    GSVIVT00006661001 | 0 | 0.07 | 0
    GSVIVT00000218001 | 55.89 | 203.62 | 172.55
    GSVIVT00024208001 | 47.21 | 30.47 | 1.45
    GSVIVT00013910001 | 31.11 | 22.89 | 4.31
    GSVIVT00012187001 | 14.45 | 45.82 | 57.02
    GSVIVT00009770001 | 14.43 | 38.41 | 44.9
    GSVIVT00012190001 | 12.25 | 26.79 | 34.05
    GSVIVT00012566001 | 12.1 | 40.6 | 47.42
    GSVIVT00006640001 | 33.06 | 126.91 | 200.55
    GSVIVT00006638001 | 10.46 | 23.92 | 28.36
    GSVIVT00012567001 | 9.99 | 19.5 | 23.04
    GSVIVT00006659001 | 9.68 | 18.97 | 21.07
    GSVIVT00012193001 | 6.29 | 74.39 | 51.35
    GSVIVT00006633001 | 4.87 | 11.72 | 14.54
    GSVIVT00025776001 | 0.51 | 1.72 | 1.87
    GSVIVT00028957001 | 0.07 | 0.13 | 0.18

    RPKM allows transcript levels to be compared both within and between samples.

    Plant Physiol. 152: 1787-1795 (2010)
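    RPKM (reads per kilobase of exon model per million mapped reads) is defined as RPKM = 10^9 · C / (L · N), where C is the number of reads mapped to the gene, L its exon length in bp and N the total number of mapped reads. Dividing by L makes genes comparable within a sample; dividing by N makes samples comparable. A minimal sketch with hypothetical counts:

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads."""
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)

# Hypothetical counts: the same expression level measured three ways
print(rpkm(500, 2_000, 20_000_000))  # 12.5  (2 kb gene, 20 M mapped reads)
print(rpkm(250, 2_000, 10_000_000))  # 12.5  (same gene, half-size library)
print(rpkm(250, 1_000, 20_000_000))  # 12.5  (half-length gene, 20 M mapped reads)
```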

  • 56/67

    Validation of RNA-Seq based gene expression: qRT-PCR of selected genes

    GSVIVT00034646001 (chitinase), RNA-Seq: post fruit-set 0, veraison 24.05, ripening 66.56

    [Figure: real-time RT-PCR]

    Plant Physiol. 152: 1787-1795 (2010)

  • 57/67

    Validation of RNA-Seq based gene expression: qRT-PCR of selected genes

    GSVIVT0002441001 (polygalacturonase), RNA-Seq: post fruit-set 0, veraison 0.19, ripening 0.11

    [Figure: real-time RT-PCR]

    Plant Physiol. 152: 1787-1795 (2010)

  • 58/67

    OUTPUT: Exhaustive overview of gene expression dynamics

  • 59/67

    GENE ID | RPKM Post fruit-set | RPKM Veraison | RPKM Ripening | p-value Veraison/Post fruit-set | p-value Veraison/Ripening | p-value Ripening/Post fruit-set | Cluster | Gene description | Functional category
    GSVIVT00000001001 | 0 | 0.34 | 0 | 1.00E+00 | 5.97E-01 | 1.00E+00 | 7 | No Hit Found | No Hit Found
    GSVIVT00000003001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | No Hit Found | No Hit Found
    GSVIVT00000004001 | 0.23 | 0.21 | 0.06 | 8.15E-01 | 7.77E-02 | 7.77E-02 | 4 | Sulfotransferase | GO:0008150
    GSVIVT00000005001 | 4.16 | 3.57 | 6.15 | 3.61E-01 | 2.62E-03 | 1.63E-03 | - | Glycosyl transferase, family 8 | GO:0005975
    GSVIVT00000007001 | 0.56 | 0.52 | 0.15 | 8.15E-01 | 7.77E-02 | 7.77E-02 | 4 | Sulfotransferase | GO:0008150
    GSVIVT00000008001 | 4.69 | 13.01 | 8.46 | 0.00E+00 | 0.00E+00 | 3.77E-10 | 7 | Glycosyl transferase, family 8 | GO:0005975
    GSVIVT00000009001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | Mitochondrial substrate carrier | GO:0051234
    GSVIVT00000010001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | UspA | GO:0050896
    GSVIVT00000011001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | Transcription factor, MADS-box | GO:0050789
    GSVIVT00000012001 | 0.36 | 0.67 | 0.23 | 4.40E-01 | 6.33E-02 | 7.41E-01 | 7 | Transcription factor, MADS-box | GO:0050789
    GSVIVT00000013001 | 6.1 | 5.12 | 6.23 | 8.47E-01 | 1.59E-07 | 3.03E-09 | - | No Hit Found | No Hit Found
    GSVIVT00000015001 | 1.3 | 0.76 | 0.79 | 6.56E-02 | 1.00E+00 | 1.82E-01 | - | Alpha-1,4-glucan-protein synthase | GO:0044036
    GSVIVT00000017001 | 65.7 | 30.8 | 24.26 | 0.00E+00 | 1.00E-15 | 0.00E+00 | 2 | Glycosyl-phosphatidyl inositol-anchored | GO:0005975
    GSVIVT00000018001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | No Hit Found | No Hit Found
    GSVIVT00000019001 | 0 | 0.03 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | 7 | Glycosyl-phosphatidyl inositol-anchored | GO:0005975
    GSVIVT00000020001 | 2.49 | 0.03 | 0 | 0.00E+00 | 1.00E+00 | 0.00E+00 | 2 | Glycosyl-phosphatidyl inositol-anchored | GO:0005975
    GSVIVT00000021001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | No Hit Found | No Hit Found
    GSVIVT00000022001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | WD40 repeat | GO:0050789
    GSVIVT00000023001 | 0.68 | 0.58 | 1.4 | 6.21E-01 | 5.37E-03 | 5.37E-03 | 3 | Pentatricopeptide repeat | GO:0008150
    GSVIVT00000024001 | 7.38 | 10.5 | 19.98 | 9.85E-02 | 5.33E-04 | 2.23E-09 | 1 | No Hit Found | No Hit Found
    GSVIVT00000025001 | 1.48 | 5.63 | 22.44 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 1 | Cytochrome P450 | GO:0015979
    GSVIVT00000027001 | 0.09 | 0.1 | 0.16 | 1.00E+00 | 5.64E-01 | 3.61E-01 | - | ABC transporter, transmembrane region | GO:0051234
    GSVIVT00000028001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | Mitochondrial transcription termination factor | GO:0044238
    GSVIVT00000029001 | 0 | 0 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | - | Peptidase S54, rhomboid | GO:0008150
    GSVIVT00000032001 | 2.69 | 2.05 | 1.51 | 1.57E-01 | 9.79E-02 | 1.90E-02 | - | No Hit Found | No Hit Found
    GSVIVT00000033001 | 3.45 | 5.28 | 2.22 | 1.66E-08 | 0.00E+00 | 4.83E-04 | 7 | No Hit Found | No Hit Found
    GSVIVT00000034001 | 0.46 | 0.13 | 2.02 | 3.39E-01 | 1.92E-06 | 9.68E-05 | 8 | No Hit Found | No Hit Found
    GSVIVT00000035001 | 0.84 | 0.87 | 0 | 9.00E-01 | 2.76E-11 | 5.26E-09 | 4 | NLI interacting factor | GO:0008150
    GSVIVT00000036001 | 3.85 | 10.7 | 2.36 | 0.00E+00 | 0.00E+00 | 2.70E-04 | 7 | No Hit Found | No Hit Found
    GSVIVT00000037001 | 5.65 | 0 | 0 | 0.00E+00 | 1.00E+00 | 0.00E+00 | 6 | No Hit Found | No Hit Found
    GSVIVT00000038001 | 0 | 0.06 | 0 | 1.00E+00 | 1.00E+00 | 1.00E+00 | 7 | Transcription factor, TCP | GO:0050789

    OUTPUT: Exhaustive overview of gene expression dynamics

    2010

  • 60/67

    Identify Corvina's proprietary set of genes

  • 61/67

    Identify Corvina's proprietary set of genes (missing in PN40024)

    Reference genome

  • 62/67

    Uncorking Corvina's secrets: we identified a set of 187 proprietary genes (154 already present in the VvGI database)

    [Pie chart, functional categories: ATP-binding 3%; cell wall organization 3%; cytoskeleton organization 3%; defense response 15%; flower development 3%; hydrolase 6%; transport 15%; NA 12%; primary metabolic process 19%; protein amino acid phosphorylation 6%; regulation of transcription 3%; signal transduction 12%]

  • 63/67

  • 64/67

  • 65/67

  • 66/67

  • 67/67

    Plant Biology: Mario Pezzotti, Diana Bellin, Sara Zenoni

    Medicine: Giovanni Martinelli (UniBO), Marina Noris (Mario Negri), Aldo Scarpa

    Bioinformatics: Luca Venturini, Luciano Xumerle, Stefano Barbi, Alberto Ferrarini

    Statistics: Giovanni Malerba

    Paola Tononi, Alberto Ferrarini, Noel Dago, Luca Venturini, Luciano Xumerle, Giovanni Malerba, Genny Bruson
