1 Delledonnebioinformatica
7/30/2019 1 Delledonnebioinformatica
1/67
Impact of new sequencing technologies
on the structural and functional analysis of genomes
(of agricultural interest)
Massimo Delledonne
Functional Genomics Center, Department of Biotechnologies,
University of Verona - Italy
http://www.scienze.univr.it/fol/main -
13 years and $3 billion required for the Human Genome Project's
reference genome
-
Sanger sequencing (developed in the late 1970s)
96 samples
800 nt / sample
1.600.000 nt / day
Human genome: 3.000.000.000 bp
Minimum coverage for an accurate analysis: 8X = 24.000.000.000 nt
24.000.000.000 / 1.600.000 = 15.000 days!
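The slide's estimate can be reproduced directly. Total nucleotides needed (genome size × coverage) are divided by the daily throughput. All input numbers below come from the slide; the conversion to years is added only to give a sense of scale.

```python
# Back-of-the-envelope estimate from the Sanger slide: days of capillary
# sequencing needed for an 8X human genome.
GENOME_SIZE_BP = 3_000_000_000     # human genome, from the slide
COVERAGE = 8                       # "minimum coverage for an accurate analysis"
THROUGHPUT_NT_PER_DAY = 1_600_000  # daily output quoted on the slide

def days_needed(genome_bp: int, coverage: int, nt_per_day: int) -> float:
    """Total nucleotides to sequence divided by daily throughput."""
    total_nt = genome_bp * coverage
    return total_nt / nt_per_day

days = days_needed(GENOME_SIZE_BP, COVERAGE, THROUGHPUT_NT_PER_DAY)
print(f"{days:,.0f} days (~{days / 365:.0f} years)")  # 15,000 days (~41 years)
```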
-
2 months and 2.000.000 USD with 454 Life Sciences
April 2008
apoE unknown
-
454 REVOLUTION
>00405_2045_2005
CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG
GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCCATC
GS20 (2005): 100 bp x 200.000 reads = 20 Mbp
>00343_3489_2007
CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG
GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCC
ATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG
CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT
CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTATGCATC
GS FLX (2007): 200 bp x 400.000 reads = 100 Mbp
x5
>01384_3992_2008
CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG
GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCCATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG
CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT
CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTAT
GCATCCTCGGTATTCGGCGTACGATCGCTGACTGCTCGATTTTCGATCG
TACGTACCGTCAGCTAGCTAAAAAAAAAGCTGCGCGGTCAGCATCGTTG
CATGCGCTACTGCTAGTACGTAGTAGTGCTACGTTTTATATCGTCGCCAAATCTCTTCGGCTTTCG
GS FLX Titanium (2008): 400 bp x 1 M reads = 500 Mbp
x5
-
GS Junior
100.000 reads (400-500 nt)
One run costs 1000
GS FLX
1.100.000 reads (400-500 nt)
One run costs 6000
-
As Illumina Cuts Price of Personal Sequencing Service, CEO Says Market Growth Hinges on Analysis
June 08, 2010
For individuals, the new price will be $19,500, while groups of five or more customers using the same
ordering physician will pay $14,500 per person. In addition, individuals with serious medical conditions
for whom whole-genome sequencing could be of clinical value will pay $9,500 to have their genome
sequenced.
Price per 30X human genome
Version | Date | Throughput | Price/Genome* (30X coverage) | Accuracy
SOLiD 3Plus System | Oct 2009 | 60 Gb | 27,500 | 99.94%
SOLiD 4 System | April 2010** | 100 Gb | 5,000 | 99.94%
SOLiD 4hq System | Q4 2010** | 300 Gb | 2,500 | 99.99%
*Approximate costs for sequencing reagents at optimal running efficiency
**Expected release date
-
Genome Analyzer II / IIx
End 2007: 30 million reads, 33 nucleotides each
End 2008: 50 million reads, 50 nucleotides each
2010: 300 million single reads, 150 nucleotides each
600 million paired reads, 150 nucleotides each
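Per-run yield is the number of reads multiplied by the read length. The sketch below applies that to the figures listed above. It assumes that "600 million paired reads" counts individual reads (i.e. 300 million pairs); the slide does not say which.

```python
# Per-run yield = number of reads x read length, using the slide's figures.
# Assumption: "600 million paired reads" counts individual reads, not pairs.
runs = [
    ("End 2007", 30_000_000, 33),
    ("End 2008", 50_000_000, 50),
    ("2010, single reads", 300_000_000, 150),
    ("2010, paired reads", 600_000_000, 150),
]

def yield_gb(n_reads: int, read_len: int) -> float:
    """Gigabases produced by n_reads reads of read_len nucleotides."""
    return n_reads * read_len / 1e9

for label, n_reads, read_len in runs:
    print(f"{label:20s} {yield_gb(n_reads, read_len):6.2f} Gb")
```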
-
HiSeq 2000
HIGHEST OUTPUT
Initially capable of up to 200 Gb per run
2nd quarter 2011: 600 Gb per run!
FASTEST DATA RATE
~25 Gb/day
7-8 days for 2 x 100 bp
HIGHEST NUMBER OF READS
One billion single-end reads*
Two billion paired-end reads*
*Based on one billion clusters passing filter
-
5500xl SOLiD System
-
[Figure: Reads (M) and Bases (Gb), Fall 2008, Spring 2009, Fall 2009, Spring 2010]
-
Copyright 2008 Pacific Biosciences
Just launched: Pacific Biosciences
The company said the commercial PacBio RS instrument will generate average read
lengths of 1,000 bases, with a small fraction of reads longer than 10 kilobases, and
have a typical run time of 30 minutes. The commercial instrument will also run chips
with 75,000 wells, or zero-mode waveguides, whereas the development instrument
uses chips with 45,000 wells. Chief technology officer Steve Turner added this week
that the expected commercial raw read accuracy will be between 85 and 90 percent.
-
CARLSBAD, Calif., December 14, 2010: Life Technologies Corporation
(NASDAQ: LIFE), a provider of innovative life science solutions, today
announced that it has launched its Ion Personal Genome Machine (PGM)
sequencer.
A run takes approximately two hours, and several runs can be performed in a
day.
The first version of the PGM will sell for $49,500, plus a $16,500 server to
analyze the data.
Initially, the machine will produce about 10 megabases of data per run, or
about 100,000 reads of 100 base pairs each, using the so-called 314 chip,
which has about 1.5 million wells and will cost $250. Reagent kits for template
preparation, library preparation, and sequencing will cost another $250,
bringing the total consumables cost per run to approximately $500.
In the first half of 2011, Ion Torrent plans to launch the 316 chip, with about 6
million wells, which will increase the output per run to 100 megabases and
which will cost about twice as much as the 314. Additional chip upgrades will
follow, with details to be revealed next year.
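The announced prices translate into a consumables cost per megabase. This is a minimal sketch that assumes the reagent kits stay at $250 for the 316 chip and reads "about twice as much as the 314" as a $500 chip. Neither assumption is stated in the announcement.

```python
# Consumables cost per megabase for the Ion PGM chips described above.
# Assumptions (not in the announcement): reagent kits cost $250 for both
# chips, and the 316 chip costs exactly twice the 314 chip.
REAGENTS_USD = 250

chips = {
    # chip name: (chip price in USD, output per run in Mb)
    "314": (250, 10),
    "316": (2 * 250, 100),
}

def cost_per_mb(chip_usd: float, reagents_usd: float, output_mb: float) -> float:
    """Total consumables for one run divided by that run's output."""
    return (chip_usd + reagents_usd) / output_mb

for name, (chip_usd, output_mb) in chips.items():
    print(f"Ion {name}: ${cost_per_mb(chip_usd, REAGENTS_USD, output_mb):.2f} per Mb")
```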
-
Illumina's Low-Cost MiSeq Promises to
Speed up Next-Gen Sequencing
January 18, 2011
Illumina last week announced a new low-cost sequencing platform called MiSeq,
which promises to go from purified DNA to analyzed data in as few as eight hours,
and to generate more than a gigabase of sequence in slightly more than a day of
total experimental time.
The instrument has a list price of under $125,000 and runs Illumina's existing
TruSeq sequencing-by-synthesis chemistry.
MiSeq will perform both single and paired-end sequencing with read lengths of up to 2 x 150 base pairs, which might increase over time. It generates more
than 3.4 million single reads, or more than 6.8 million paired-end reads per
run. The maximum output per run is about 1.5 gigabases.
Cluster generation and sequencing takes about 4.5 hours, and data analysis and "demultiplexing" about two hours.
-
CARLSBAD, Calif., Feb. 23, 2011: Life Technologies Corporation (NASDAQ:
LIFE) today announced that the Ion 318 semiconductor sequencing chip will be
available for early access in September of this year, complete with RNA-Seq kits and
analysis software, providing up to 1 Gb of data output, 100 times more than the original Ion 314 chip introduced two months ago with the Ion Personal Genome
Machine (PGM) sequencer.
Ion Torrent has increased the number of accessible sensors on the Ion 318 chip to 11
million, from 1.2 million on the Ion 314 chip and 6.1 million on the Ion 316 chip, said
Gregg Fergus, President of Ion Torrent. The Ion 318 is ideal for applications like
transcriptome sequencing, miRNA sequencing and ChIP-Seq. The Ion 318 requires no
changes or upgrades to the Ion PGM sequencer: with Ion Torrent, "The Chip is the
Machine," so sequencing is easier and more economical than ever before.
Internally, Ion Torrent has already achieved read lengths in excess of 300 bp, and
improvements in chemistry and loading are expected to increase chip utilization five-fold.
By combining additional reads with longer read lengths, the company will achieve
exceptional scalability on its Ion PGM sequencing platform and expects to reach read
lengths of up to 400 bp in 2012.
-
Genome (re)sequencing
-
Classical sequencing: reads of 700-800 bases
Short-read sequencing: reads of 35-100 bases
reference genome
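Resequencing depends on placing each short read onto the reference genome. The toy mapper below shows the idea with a k-mer seed index plus ungapped extension that tolerates a few mismatches, such as SNPs. It is a hypothetical teaching sketch, not how the tools named later work internally. Bowtie and BWA, for example, use a Burrows-Wheeler index and also handle reverse complements and base qualities.

```python
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read: str, reference: str, index: dict, k: int,
             max_mismatches: int = 2) -> list:
    """Return 0-based reference positions where the read aligns without gaps.

    The first k bases of the read act as a seed; every seed hit is then
    checked over the full read length, tolerating up to max_mismatches
    substitutions (e.g. SNPs between the sample and the reference).
    """
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits

# Toy reference (hypothetical): two example read strings from these slides joined.
reference = ("TGTTGGAATACCTGAAGATTGCTCAGGACCTGGAGA"
             "CAACATTGGATCAAATGGATCTGATAGACCTTTACA")
k = 8
index = build_kmer_index(reference, k)

read = reference[20:55]                      # a 35-base short read
alt = "C" if read[30] != "C" else "G"
read = read[:30] + alt + read[31:]           # one substitution vs. the reference
print(map_read(read, reference, index, k))   # [20]
```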
-
Why (Re)Sequencing
-
Individual affected by Miller syndrome
Pharmacogenomics
Cancer genome
-
Centro di Genomica Funzionale (Functional Genomics Center): Joint Project
PowerEdge™ R900
4 Intel Xeon 7450 6-core
128 GB RAM
Illumina Genome Analyzer II machine time
-
Sequencing of the Corvina genome
TGTTGGAATACCTGAAGATTGCTCAGGACCTGGAGA
CAACATTGGATCAAATGGATCTGATAGACCTTTACA
GAAAAAATGTAGTCAGCCTTTAACTTGGCCTGATAA
CACAGCTGGGGCTGTAGCAACCCTTTCCAACCCCTT
TAGTCGGTTGTTGATGAGATATTTGGAGGTGGGGAT
GTCAAAGGCAAAGGAAAAAATGTTCAATATAGTTAA
Genomic DNA
Fragmentation
-
Structural characterization: duplications and deletions
Duplication
Deletion
PN40024
Deleted regions (average length 46 kb)
Duplicated regions (average length 43 kb)
500 20
The duplicated genes belong to the following functional classes:
Metabolism
Signal transduction
Response to biotic stress
Unknown function
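The slide does not describe how the deleted and duplicated regions were detected. One common approach for resequencing data uses read depth. Windows with far fewer reads than the genome-wide median suggest a deletion, and windows with roughly twice the median suggest a duplication. The thresholds below are hypothetical and uncalibrated. This is a sketch of the general idea, not the method used for Corvina.

```python
from statistics import median

def classify_windows(depths, del_ratio=0.2, dup_ratio=1.8):
    """Label each window by its read depth relative to the genome-wide median.

    depths: mean read depth in consecutive fixed-size windows.
    del_ratio / dup_ratio are illustrative (hypothetical) thresholds.
    """
    baseline = median(depths)
    labels = []
    for d in depths:
        ratio = d / baseline if baseline > 0 else 0.0
        if ratio <= del_ratio:
            labels.append("deletion")
        elif ratio >= dup_ratio:
            labels.append("duplication")
        else:
            labels.append("normal")
    return labels

depths = [30, 31, 29, 2, 1, 30, 62, 60, 28, 30]  # made-up window depths
print(classify_windows(depths))
```

In practice, adjacent windows with the same label would then be merged into regions, which is where lengths such as those reported on the slide come from.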
-
Polymorphisms between Corvina and PN40024
SNPs identified: 392.775
SNPs in coding regions: 156.601
Genes with SNPs in their coding sequence: 13.854 (46%)
Heterozygous SNPs: 288.204 (75%)
[Pie chart: biological process categories]
unknown biological process: 49%
lipid metabolism: 2%
metabolism: 8%
carbohydrate metabolism: 4%
amino acid metabolism: 2%
protein modification: 6%
energy: 1%
other metabolic processes: 15%
reproduction: 1%
cell death: 0%
stress response: 3%
translation: 2%
transcription: 6%
signal transduction: 1%
-
Verona, 9 February 2007
Centro di Genomica Funzionale
PowerEdge™ R900
4 Intel Xeon 7450 6-core
128 GB RAM
Illumina HiSeq 1000
-
The Protein Coding Genome: ~30 megabases (1% of genome)
~ 20.000 genes
~ 180.000 discontinuous sequences
-
Microarray Sequence
-
Microarray Sequence Capture
Genomic DNA
Fragment & Add Linkers
Capture: hybridize to the microarray (probes bind Target DNA; Background DNA stays unbound)
Wash
Elute
Sequencing: 454 Genome Sequencer
Target DNA: Exon1 Exon2 Exon3 Exon4 Exon5 Exon6
-
Exome sequencing
Amplicon sequencing
-
Harismendy et al., Genome Biology 2009, 10:R32. doi:10.1186/gb-2009-10-3-r32
Amplicon sequencing
-
NimbleGen Sequence Capture
-
NimbleGen Sequence Capture
gDNA library preparation: DNA fragmentation by nebulization, adaptor ligation
Hybridization on 2.1M NimbleGen array
Enrichment confirmation by qPCR on internal control genomic loci
(captured DNA vs. genomic DNA)
We obtained a 100-fold enrichment (3 Gb / 30 Mb).
Specific probes can capture up to 30 Mb of regions (thick coloured segments). The
remaining regions are washed away (thin black segments).
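Two numbers matter in a capture experiment. The first is the maximum enrichment, set by the ratio of genome size to target size (3 Gb / 30 Mb = 100). The second is the enrichment actually measured by qPCR on control loci. The qPCR calculation below uses the usual ΔCt relation and assumes equal input DNA and, by default, perfect amplification efficiency. The Ct values in the example are hypothetical.

```python
def theoretical_enrichment(genome_bp: float, target_bp: float) -> float:
    """Maximum fold enrichment if only the target regions were recovered."""
    return genome_bp / target_bp

def qpcr_fold_enrichment(ct_genomic: float, ct_captured: float,
                         efficiency: float = 1.0) -> float:
    """Fold enrichment of a control locus from qPCR threshold cycles.

    Each cycle multiplies the product by (1 + efficiency), so a locus that
    reaches threshold delta_ct cycles earlier in captured DNA is
    (1 + efficiency) ** delta_ct times more abundant (same input amount assumed).
    """
    delta_ct = ct_genomic - ct_captured
    return (1.0 + efficiency) ** delta_ct

print(theoretical_enrichment(3e9, 30e6))   # 100.0
print(qpcr_fold_enrichment(28.0, 21.0))    # 128.0 (hypothetical Ct values)
```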
-
Transcriptome Sequencing (RNA-Seq)
RNA extraction from tissues
Messenger RNAs
mRNA fragmentation (sonication)
cDNA synthesis
TGTTGGAATACCTGAAGATTGC
CAACATTGGATCAAATGGATCT
GAAAAAATGTAGTCAGCCTTTA
CACAGCTGGGGCTGTAGCAACC
TAGTCGGTTGTTGATGAGATAT
GTCAAAGGCAAAGGAAAAAATG
Data analysis: Reference, Gene predictions
-
First step: alignment to the genome
[Figure tracks: Gene predictions, Coverage, Read alignments]
We employ different programs for mapping the reads onto the reference genome:
Bowtie: well suited to RNA-seq, as it keeps track of multiple possible mappings for each read (multi-reads), which are needed for accurate expression estimation.
Tophat: built on Bowtie, adds the detection of splice sites.
Maq, BWA: geared towards DNA-seq analyses; they report fewer hits, choosing the best ones. Not suited to expression analysis; used for SNP detection.
The output of all these programs can be converted into the emerging
standard SAM format.
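In SAM output, a read reported at several locations appears on several lines with the same QNAME, and the extra lines are typically flagged as secondary. The sketch below counts reported alignments per read to separate unique reads from multi-reads. It assumes single-end data, because paired mates share a QNAME and would be double-counted here. It also assumes the aligner was asked to report all hits. Some aligners instead record the number of hits in the optional NH tag.

```python
from collections import Counter

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def alignments_per_read(sam_lines):
    """Count reported alignments for each read name (QNAME) in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            continue  # unmapped records carry no alignment
        counts[qname] += 1
    return counts

sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\t*",
    "r2\t256\tchr1\t700\t0\t4M\t*\t0\t0\tACGT\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]
counts = alignments_per_read(sam)
unique = [r for r, n in counts.items() if n == 1]
multi = [r for r, n in counts.items() if n > 1]
print(unique, multi)  # ['r1'] ['r2']
```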
-
DIFFERENTIAL GENE EXPRESSION
-
NEW REGIONS DETECTION AND POSSIBLE
ALTERNATIVE SPLICING
-
GENES MISSED
-
EXONS MISSED
INTRONS MISSED
-
Biologists want more than that, but... how to get more?
-
Vitis vinifera whole-genome expression lab
[Figure: number of microarray hybridizations (axis 0-2000)]
-
SNP detection in human leukemia
RNA sample of leukemia at diagnosis stage
GeneChip SNP Array 5 (> 500.000 SNPs): SNP detection by hybridization
GAIIx: SNP detection by analysis with:
- MAQ cns2snp
- ERANGE snp module
- SAMTOOLS
-
7/30/2019 1 Delledonnebioinformatica
47/67
Corvina transcriptome sequencing (mRNA-Seq)
59.372.544 single reads, totaling 2,2 Gb of sequence (30 X coverage)
TGTTGGAATACCTGAAGATTGCTCAG
CAACATTGGATCAAATGGATCTGATA
GAAAAAATGTAGTCAGCCTTTAACTT
CACAGCTGGGGCTGTAGCAACCCTTT
TAGTCGGTTGTTGATGAGATATTTGG
GTCAAAGGCAAAGGAAAAAATGTTCA
mRNA
mRNA fragmentation
cDNA synthesis
Post Fruit-Set
Veraison
Ripening
Plant Physiol. 152: 1787-1795 (2010)
-
Detection of new genes
Erange: searches for candidate transcribed regions, i.e. islands of mapped reads far
from annotated genes. Not capable of estimating the presence of different
isoforms.
Cufflinks: de novo transcript reconstruction, bypassing the need for an annotation.
Sophisticated analyses capable of distinguishing even between different isoforms.
Paired-end data strongly suggested for a proper analysis.
New gene: reads in unannotated regions
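The island search described for Erange can be illustrated with a simplified sketch. Nearby mapped reads are merged into islands of coverage, weakly supported islands are dropped, and the islands that do not overlap annotated genes are kept. The gap and minimum-read parameters are hypothetical, and this is not Erange's actual implementation. To require islands to lie "far" from genes, the gene intervals could be padded before the overlap test.

```python
def find_islands(read_starts, read_len, max_gap=50, min_reads=5):
    """Merge mapped reads into islands of coverage on one chromosome.

    read_starts: sorted leftmost positions of the mapped reads.
    Reads starting within max_gap bases of the current island end are merged;
    islands supported by fewer than min_reads reads are discarded.
    Returns (start, end, n_reads) tuples with half-open coordinates.
    """
    islands = []
    start = end = None
    n = 0
    for pos in read_starts:
        if start is None:
            start, end, n = pos, pos + read_len, 1
        elif pos <= end + max_gap:
            end = max(end, pos + read_len)
            n += 1
        else:
            if n >= min_reads:
                islands.append((start, end, n))
            start, end, n = pos, pos + read_len, 1
    if start is not None and n >= min_reads:
        islands.append((start, end, n))
    return islands

def outside_genes(islands, genes):
    """Keep islands that do not overlap any annotated gene (start, end)."""
    return [isl for isl in islands
            if all(isl[1] <= g_start or isl[0] >= g_end for g_start, g_end in genes)]

starts = [100, 110, 130, 150, 160, 5000, 5010, 5020, 5040, 5055, 5070, 9000]
islands = find_islands(starts, read_len=36)
print(islands)                                     # [(100, 196, 5), (5000, 5106, 6)]
print(outside_genes(islands, genes=[(0, 1000)]))   # [(5000, 5106, 6)]
```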
-
[Figure: coverage plots (scale 25-100) of three new genes]
Identified 479 new genes
New gene: reads mapping to unannotated regions
Plant Physiol. 152: 1787-1795 (2010)
-
SNP detection: methods
We have developed a pipeline (SRTK) for detecting SNPs:
The Samtools suite is used to create a summary file for the whole genome (termed pileup).
An in-house script then removes data from reads mapping to multiple locations.
Finally, the SNPs are filtered for:
map quality (minimum 20)
read coverage (minimum 2 reads)
percentage of reads carrying the mutant allele vs. the annotated base (minimum 25%)
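The three filters listed above can be sketched as a function that evaluates one site at a time. The pileup data for one position is represented here by a hypothetical in-memory structure, with one entry per read holding its base, mapping quality and a multi-mapping flag. The sketch does not parse the actual Samtools pileup text format. Applying the MAPQ threshold to each read rather than to the site as a whole is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class ReadBase:
    base: str    # base observed in this read at the site
    mapq: int    # mapping quality of the read
    multi: bool  # True if the read maps to multiple locations

def call_snp(ref_base, observations, min_mapq=20, min_cov=2, min_alt_frac=0.25):
    """Return the alternative allele at one site, or None if filters fail.

    Thresholds follow the slide: MAPQ >= 20, coverage >= 2 reads, and at
    least 25% of retained reads carrying the non-reference base.
    """
    # Drop multi-mapping reads and reads below the mapping-quality cutoff.
    kept = [o for o in observations if not o.multi and o.mapq >= min_mapq]
    if len(kept) < min_cov:
        return None
    alt_counts = {}
    for o in kept:
        if o.base != ref_base:
            alt_counts[o.base] = alt_counts.get(o.base, 0) + 1
    if not alt_counts:
        return None
    alt, n_alt = max(alt_counts.items(), key=lambda kv: kv[1])
    return alt if n_alt / len(kept) >= min_alt_frac else None

site = [ReadBase("A", 37, False), ReadBase("G", 37, False),
        ReadBase("G", 30, False), ReadBase("A", 25, False),
        ReadBase("G", 60, True), ReadBase("G", 5, False)]
print(call_snp("A", site))  # G
```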
-
Identified alternative splicing sites
Unspliced: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, Intron 3, Exon 4
Constitutive splicing: Exon 1 Exon 2 Exon 3 Exon 4 -> Protein
Alternative splicing: Exon 1 Exon 2 Exon 4 -> Alternative protein
N of constitutive splicing sites | N of alternative splicing sites | N of genes undergoing alternative splicing
41,447 | 447 | 385
-
Alternative splicing
Junction database approach: reads are mapped to an exon-junction database constructed
starting from gene annotations.
Detects exon-skipping events only.
Default approach with E-RANGE.
De novo recognition:
Tophat: recognizes canonical splicing sites only.
Integrated in the Tophat/Cufflinks pipeline.
Supersplat: performs a gapped alignment of all reads against the genome.
Also capable of recognizing non-canonical splicing sites.
Not as well integrated with other software, e.g. Cufflinks, Samtools.
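A junction database of the kind described above can be built by joining the end of each annotated exon to the start of every downstream exon of the same gene. Taking read_len - 1 bases on each side means that any read crossing the junction fits inside the junction sequence. Pairs of non-adjacent exons encode exon-skipping events. For that reason the approach only sees combinations of exons that are already annotated. This is a minimal sketch for a plus-strand gene, using a hypothetical toy gene.

```python
def junction_sequences(exons, genome, read_len):
    """Build exon-exon junction sequences for one plus-strand gene.

    exons: (start, end) half-open coordinates sorted by position.
    For every ordered pair of exons i < j, the last (read_len - 1) bases of
    exon i are joined to the first (read_len - 1) bases of exon j.
    Keys are 1-based exon numbers; pairs with j > i + 1 are exon skips.
    """
    flank = read_len - 1
    junctions = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            left_start = max(exons[i][0], exons[i][1] - flank)
            left = genome[left_start:exons[i][1]]
            right = genome[exons[j][0]:min(exons[j][1], exons[j][0] + flank)]
            junctions[(i + 1, j + 1)] = left + right
    return junctions

# Toy gene (hypothetical): exons in upper case, introns in lower case.
genome = "AAAAccGGGGttCCCC"
exons = [(0, 4), (6, 10), (12, 16)]
print(junction_sequences(exons, genome, read_len=4))
# {(1, 2): 'AAAGGG', (1, 3): 'AAACCC', (2, 3): 'GGGCCC'}
```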
-
Expression analysis
E-RANGE: reads Bowtie, Eland or Blat output.
Reads are assigned to gene models.
Multi-mapping reads are assigned to gene models in proportion to the density
of uniquely mapping reads.
Capable of detecting genes in previously unannotated regions.
Cufflinks: reads standard SAM input.
Either looks for annotated genes or tries to reconstruct the original RNA fragments.
Uses the probability that an alignment is correct to assign multi-mapping
reads.
Capable of discriminating between different isoforms.
Currently in test phase, with promising results.
SRTK: in-house collection of scripts.
MapReadsLocation.py performs a raw count of the uniquely mapping reads on
each gene.
Used for RNA-seq statistical analysis.
http://ddlab.sci.univr.it/srtk/
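The proportional rule that E-RANGE applies to multi-mapping reads can be sketched as follows. Two choices here are assumptions rather than E-RANGE behaviour: "density" is read as uniquely mapping reads per base of the gene model, and a multi-read is split evenly when none of its candidate genes has unique reads. This is an illustration, not E-RANGE code.

```python
def assign_multireads(unique_counts, gene_lengths, multireads):
    """Split multi-mapping reads among candidate genes by unique-read density.

    unique_counts: {gene: uniquely mapping reads}
    gene_lengths:  {gene: gene-model length in bases}
    multireads:    one list of candidate genes per multi-mapping read
    Returns {gene: unique reads + fractional share of the multi-reads}.
    """
    totals = {g: float(n) for g, n in unique_counts.items()}
    for candidates in multireads:
        # density = uniquely mapping reads per base of the gene model
        dens = [unique_counts.get(g, 0) / gene_lengths[g] for g in candidates]
        total = sum(dens)
        for g, d in zip(candidates, dens):
            # with no unique evidence at all, fall back to an even split
            share = d / total if total > 0 else 1.0 / len(candidates)
            totals[g] = totals.get(g, 0.0) + share
    return totals

unique = {"geneA": 90, "geneB": 10}
lengths = {"geneA": 2000, "geneB": 1000}
multi = [["geneA", "geneB"]] * 10   # ten reads mapping equally well to both genes
totals = assign_multireads(unique, lengths, multi)
print({g: round(v, 2) for g, v in totals.items()})  # {'geneA': 98.18, 'geneB': 11.82}
```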
MICROARRAY ANALYSIS
-
ID stress_A stress_B stress_C control_A control_B control_C log2 fold change STRESS vs CONTROL
AB007870 976.7419 1106.0225 958.8752 978.6526 724.0019 717.6652 0.342
AB007877 52.4236 48.9145 61.4969 70.6655 83.2429 111.1115 -0.684
AB007878 91.737 92.8117 46.1988 77.2719 101.3615 166.1109 -0.575
AB007895 220.5646 190.4141 260.4116 324.2564 317.1986 273.5752 -0.454
AB007921 179.6105 224.7979 205.405 195.7703 139.9824 176.5451 0.259
AB007923 135.6203 93.2452 133.135 102.7447 125.3144 97.5942 0.141
AB007928 1991.7719 1867.8622 1763.1777 1063.0456 854.2858 835.9119 1.037
AB007937 1188.4386 1042.3558 1347.4832 1245.6073 1343.2552 1412.5417 -0.167
AB007940 167.5594 158.4953 123.5362 262.8919 251.2853 237.4419 -0.752
AB008790 1461.0386 1433.4358 1197.2452 696.7486 950.1929 877.7196 0.703
AB010419 1992.4652 2830.9386 2415.7361 2450.9499 2204.4398 2272.9572 0.05
AB010962 525.973 336.3452 572.9752 1090.8052 1704.4986 1163.3986 -1.472
AB011088 13193.9809 16559.3745 11695.7919 13953.8194 14616.103 18189.4252 -0.179
AB011097 18.6059 15.7499 18.4248 17.8529 18.0056 17.402 -0.017
AB011103 1736.1662 1783.1319 1795.1724 1876.8319 2027.4452 1701.4922 -0.073
AB011123 1692.9638 1439.6005 1639.5755 1084.6886 1042.3516 1144.1105 0.542
AB011154 567.9596 523.3949 631.1821 1177.9344 1258.5535 1287.5093 -1.116
AB011157 1964.3402 1926.9519 2525.3286 1579.0752 1702.1312 1445.1081 0.433
AB011163 363.8747 491.1119 159.7169 583.7652 670.0606 981.6137 -1.25
AB011174 3063.8319 2914.2986 3317.7625 3211.0308 3421.1982 3267.3989 -0.092
AB011180 374.4583 416.0086 346.6452 504.8333 299.0182 417.3016 -0.074
AB011539 539.2045 499.2931 677.8508 714.6286 993.7229 826.6439 -0.562
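The last column appears to be the difference between the mean log2 signals of the stress and control replicates, rather than the log2 of the ratio of the arithmetic means. For AB007870 the former gives 0.342, as in the table, whereas the latter would give about 0.330. A short check using the values from the table:

```python
from math import log2
from statistics import mean

def log2_fold_change(treated, control):
    """Difference between the mean log2 signals of two groups of replicates."""
    return mean(log2(x) for x in treated) - mean(log2(x) for x in control)

stress = [976.7419, 1106.0225, 958.8752]    # AB007870, stress_A..stress_C
control = [978.6526, 724.0019, 717.6652]    # AB007870, control_A..control_C
print(round(log2_fold_change(stress, control), 3))  # 0.342
```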
-
Gene ID | RPKM Post Fruit Set | RPKM Veraison | RPKM Ripening
GSVIVT00027957001 | 83.1 | 21.09 | 12.72
GSVIVT00018861001 | 0.08 | 0.06 | 0.24
GSVIVT00023496001 | 0 | 0.67 | 1.69
GSVIVT00018860001 | 4.92 | 10.71 | 10.84
GSVIVT00023540001 | 30.52 | 23.03 | 0.54
GSVIVT00023541001 | 27.84 | 21.11 | 0.48
GSVIVT00027954001 | 71.23 | 17.45 | 10.63
GSVIVT00006661001 | 0 | 0.07 | 0
GSVIVT00000218001 | 55.89 | 203.62 | 172.55
GSVIVT00024208001 | 47.21 | 30.47 | 1.45
GSVIVT00013910001 | 31.11 | 22.89 | 4.31
GSVIVT00012187001 | 14.45 | 45.82 | 57.02
GSVIVT00009770001 | 14.43 | 38.41 | 44.9
GSVIVT00012190001 | 12.25 | 26.79 | 34.05
GSVIVT00012566001 | 12.1 | 40.6 | 47.42
GSVIVT00006640001 | 33.06 | 126.91 | 200.55
GSVIVT00006638001 | 10.46 | 23.92 | 28.36
GSVIVT00012567001 | 9.99 | 19.5 | 23.04
GSVIVT00006659001 | 9.68 | 18.97 | 21.07
GSVIVT00012193001 | 6.29 | 74.39 | 51.35
GSVIVT00006633001 | 4.87 | 11.72 | 14.54
GSVIVT00025776001 | 0.51 | 1.72 | 1.87
GSVIVT00028957001 | 0.07 | 0.13 | 0.18
RPKM allows transcript levels to be compared both
within and between samples
Plant Physiol. 152: 1787-1795 (2010)
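RPKM (reads per kilobase of exon model per million mapped reads) divides a gene's read count by its length in kilobases and by the library size in millions of reads. The sketch below uses that standard definition with made-up counts. It shows how the two normalizations make values comparable within a sample and between samples.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads.

    Dividing by gene length (kb) makes genes comparable within a sample;
    dividing by library size (millions of reads) makes samples comparable.
    """
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)

print(rpkm(100, 1000, 1_000_000))   # 100.0
print(rpkm(100, 2000, 1_000_000))   # 50.0: same reads on a gene twice as long
print(rpkm(200, 1000, 2_000_000))   # 100.0: twice the reads in a library twice as large
```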
-
Validation of RNA-Seq based Gene Expression: qRT-PCR of selected genes
GSVIVT00034646001 Chitinase
RNA-Seq: post fruit set 0, veraison 24.05, ripening 66.56
Real-time RT-PCR: [chart]
Plant Physiol. 152: 1787-1795 (2010)
-
Validation of RNA-Seq based Gene Expression: qRT-PCR of selected genes
GSVIVT0002441001 polygalacturonase
RNA-Seq: post fruit set 0, veraison 0.19, ripening 0.11
Real-time RT-PCR: [chart]
Plant Physiol. 152: 1787-1795 (2010)
-
OUTPUT: Exhaustive overview of gene expression dynamics
-
Gene ID | RPKM Post fruit-set | RPKM Veraison | RPKM Ripening | p-value Veraison/Post fruit-set | p-value Veraison/Ripening | p-value Ripening/Post fruit-set | Cluster | Gene description | Functional category
GSVIVT00000001001 | 0 | 0,34 | 0 | 1,00E+00 | 5,97E-01 | 1,00E+00 | 7 | No Hit Found | No Hit Found
GSVIVT00000003001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | No Hit Found | No Hit Found
GSVIVT00000004001 | 0,23 | 0,21 | 0,06 | 8,15E-01 | 7,77E-02 | 7,77E-02 | 4 | Sulfotransferase | GO:0008150
GSVIVT00000005001 | 4,16 | 3,57 | 6,15 | 3,61E-01 | 2,62E-03 | 1,63E-03 |  | Glycosyl transferase, family 8 | GO:0005975
GSVIVT00000007001 | 0,56 | 0,52 | 0,15 | 8,15E-01 | 7,77E-02 | 7,77E-02 | 4 | Sulfotransferase | GO:0008150
GSVIVT00000008001 | 4,69 | 13,01 | 8,46 | 0,00E+00 | 0,00E+00 | 3,77E-10 | 7 | Glycosyl transferase, family 8 | GO:0005975
GSVIVT00000009001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | Mitochondrial substrate carrier | GO:0051234
GSVIVT00000010001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | UspA | GO:0050896
GSVIVT00000011001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | Transcription factor, MADS-box | GO:0050789
GSVIVT00000012001 | 0,36 | 0,67 | 0,23 | 4,40E-01 | 6,33E-02 | 7,41E-01 | 7 | Transcription factor, MADS-box | GO:0050789
GSVIVT00000013001 | 6,1 | 5,12 | 6,23 | 8,47E-01 | 1,59E-07 | 3,03E-09 |  | No Hit Found | No Hit Found
GSVIVT00000015001 | 1,3 | 0,76 | 0,79 | 6,56E-02 | 1,00E+00 | 1,82E-01 |  | Alpha-1,4-glucan-protein synthase | GO:0044036
GSVIVT00000017001 | 65,7 | 30,8 | 24,26 | 0,00E+00 | 1,00E-15 | 0,00E+00 | 2 | Glycosyl-phosphatidyl inositol-anchored | GO:0005975
GSVIVT00000018001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | No Hit Found | No Hit Found
GSVIVT00000019001 | 0 | 0,03 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 | 7 | Glycosyl-phosphatidyl inositol-anchored | GO:0005975
GSVIVT00000020001 | 2,49 | 0,03 | 0 | 0,00E+00 | 1,00E+00 | 0,00E+00 | 2 | Glycosyl-phosphatidyl inositol-anchored | GO:0005975
GSVIVT00000021001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | No Hit Found | No Hit Found
GSVIVT00000022001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | WD40 repeat | GO:0050789
GSVIVT00000023001 | 0,68 | 0,58 | 1,4 | 6,21E-01 | 5,37E-03 | 5,37E-03 | 3 | Pentatricopeptide repeat | GO:0008150
GSVIVT00000024001 | 7,38 | 10,5 | 19,98 | 9,85E-02 | 5,33E-04 | 2,23E-09 | 1 | No Hit Found | No Hit Found
GSVIVT00000025001 | 1,48 | 5,63 | 22,44 | 0,00E+00 | 0,00E+00 | 0,00E+00 | 1 | Cytochrome P450 | GO:0015979
GSVIVT00000027001 | 0,09 | 0,1 | 0,16 | 1,00E+00 | 5,64E-01 | 3,61E-01 |  | ABC transporter, transmembrane region | GO:0051234
GSVIVT00000028001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | Mitochondrial transcription termin. factor | GO:0044238
GSVIVT00000029001 | 0 | 0 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 |  | Peptidase S54, rhomboid | GO:0008150
GSVIVT00000032001 | 2,69 | 2,05 | 1,51 | 1,57E-01 | 9,79E-02 | 1,90E-02 |  | No Hit Found | No Hit Found
GSVIVT00000033001 | 3,45 | 5,28 | 2,22 | 1,66E-08 | 0,00E+00 | 4,83E-04 | 7 | No Hit Found | No Hit Found
GSVIVT00000034001 | 0,46 | 0,13 | 2,02 | 3,39E-01 | 1,92E-06 | 9,68E-05 | 8 | No Hit Found | No Hit Found
GSVIVT00000035001 | 0,84 | 0,87 | 0 | 9,00E-01 | 2,76E-11 | 5,26E-09 | 4 | NLI interacting factor | GO:0008150
GSVIVT00000036001 | 3,85 | 10,7 | 2,36 | 0,00E+00 | 0,00E+00 | 2,70E-04 | 7 | No Hit Found | No Hit Found
GSVIVT00000037001 | 5,65 | 0 | 0 | 0,00E+00 | 1,00E+00 | 0,00E+00 | 6 | No Hit Found | No Hit Found
GSVIVT00000038001 | 0 | 0,06 | 0 | 1,00E+00 | 1,00E+00 | 1,00E+00 | 7 | Transcription factor, TCP | GO:0050789
-
Identify Corvina's proprietary set of genes
-
(missing in PN40024)
Reference genome
-
Uncorking Corvina's secrets: identified a set of 187 proprietary genes (154 already present in the VvGI database)
ATP-binding: 3%
cell wall organization: 3%
cytoskeleton organization: 3%
defense response: 15%
flower development: 3%
hydrolase: 6%
transport: 15%
NA: 12%
primary metabolic process: 19%
protein amino acid phosphorylation: 6%
regulation of transcription: 3%
signal transduction: 12%
-
Plant Biology
Mario Pezzotti
Diana Bellin
Sara Zenoni
Medicine
Giovanni Martinelli (UniBO)
Marina Noris (Mario Negri)
Aldo Scarpa
Bioinformatics
Luca Venturini
Luciano Xumerle
Stefano Barbi
Alberto Ferrarini
Statistics
Giovanni Malerba

Paola Tononi, Alberto Ferrarini, Noel Dago, Luca Venturini, Luciano Xumerle, Giovanni Malerba, Genny Bruson
PowerEdge™ R900