1 Delledonnebioinformatica

7/30/2019 1 Delledonnebioinformatica

1/67

Impact of new sequencing technologies

on the structural and functional analysis of genomes

(of agricultural interest)

Massimo Delledonne

Functional Genomics Center, Department of Biotechnologies,

University of Verona - Italy
http://www.scienze.univr.it/fol/main


2/67

13 years and $3 billion required for the Human Genome Project's

reference genome


3/67

96 samples

800 nt / sample

1.600.000 nt / day

Human genome: 3.000.000.000 bp

Minimum coverage for an accurate analysis: 8X = 24.000.000.000 nt

24.000.000.000 nt / 1.600.000 nt per day = 15.000 days!

Sanger sequencing (introduced in 1977)
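The arithmetic on this slide can be checked in a few lines. This is a minimal sketch that uses only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide.
genome_size_bp = 3_000_000_000      # human genome
coverage = 8                        # minimum coverage for an accurate analysis
throughput_nt_per_day = 1_600_000   # daily output of one Sanger instrument

# Total bases to sequence, and the time a single instrument would need.
total_nt = genome_size_bp * coverage
days = total_nt / throughput_nt_per_day
years = days / 365

print(f"{total_nt:,} nt to sequence")
print(f"{days:,.0f} days (about {years:.0f} years) on a single instrument")
```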


4/67


5/67

2 months and 2.000.000 USD with 454 Life Sciences

April 2008

apoE unknown


6/67

454 REVOLUTION

>00405_2045_2005

CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG

GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCCATC

GS20 (2005): 100 bp x 200.000 reads = 20 Mbp

>00343_3489_2007


GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCC

ATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG

CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT

CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTATGCATC

GS FLX (2007): 200 bp x 400.000 reads = 100 Mbp

x5

>01384_3992_2008


GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCCATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG

CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT

CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTAT

GCATCCTCGGTATTCGGCGTACGATCGCTGACTGCTCGATTTTCGATCG

TACGTACCGTCAGCTAGCTAAAAAAAAAGCTGCGCGGTCAGCATCGTTG

CATGCGCTACTGCTAGTACGTAGTAGTGCTACGTTTTATATCGTCGCCAAATCTCTTCGGCTTTCG

GS FLX Titanium (2008): 400 bp x 1 M reads = 500 Mbp

x5


7/67

GS Junior

100.000 reads (400-500 nt)

One run costs 1000

GS FLX

1.100.000 reads (400-500 nt)

One run costs 6000


8/67

For individuals, the new price will be $19,500, while groups of five or more customers using the same

ordering physician will pay $14,500 per person. In addition, individuals with serious medical conditions

for whom whole-genome sequencing could be of clinical value will pay $9,500 to have their genome

sequenced.

As Illumina Cuts Price of Personal Sequencing Service, CEO Says Market Growth Hinges on Analysis

June 08, 2010

Price per 30x human genome

Version               Date           Throughput   Cost/Genome* (30X coverage)   Accuracy
SOLiD 3 Plus System   Oct 2009       60 Gb        27 500                        99.94%
SOLiD 4 System        April 2010**   100 Gb       5000                          99.94%
SOLiD 4hq System      Q4 2010**      300 Gb       2500                          99.99%

*Approximate costs for sequencing reagents at optimal running efficiency

**Expected release date


9/67


10/67


End 2007: 30 million reads, 33 nucleotides each

End 2008: 50 million reads, 50 nucleotides each

2010: 300 million single reads, 150 nucleotides each

600 million paired reads, 150 nucleotides each

Genome Analyzer II (x)
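To put these read counts on a common scale, the nominal yield per run is simply reads times read length. The sketch below assumes every read reaches full length and counts the "paired reads" figure as individual reads; both are simplifying assumptions, not statements from the slide.

```python
# (label, number of reads, read length in nt) as quoted on the slide.
runs = [
    ("End 2007", 30_000_000, 33),
    ("End 2008", 50_000_000, 50),
    ("2010 single reads", 300_000_000, 150),
    ("2010 paired reads", 600_000_000, 150),
]


def yield_gb(n_reads, read_len):
    """Nominal yield in gigabases: reads x read length / 1e9."""
    return n_reads * read_len / 1e9


yields = {label: yield_gb(n, length) for label, n, length in runs}
for label, gb in yields.items():
    print(f"{label}: {gb:.2f} Gb")
```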


11/67

HiSeq 2000

HIGHEST OUTPUT

Initially capable of up to 200 Gb per run

2nd quarter 2011: 600 Gb per run!

FASTEST DATA RATE

~25 Gb/day

7-8 days for 2 x 100 bp

HIGHEST NUMBER OF READS

One billion single-end reads*

Two billion paired-end reads*

*Based on one billion clusters passing filter


12/67

5500xl SOLiD System


13/67

[Figure: reads (M) and bases (GB), Fall 2008 to Spring 2010; axis from 0 to 1600]


14/67


15/67

Copyright 2008 Pacific Biosciences

Just launched: Pacific Biosciences

The company said the commercial PacBio RS instrument will generate average read

lengths of 1,000 bases, with a small fraction of reads longer than 10 kilobases, and

have a typical run time of 30 minutes. The commercial instrument will also run chips

with 75,000 wells, or zero-mode waveguides, whereas the development instrument

uses chips with 45,000 wells. Chief technology officer Steve Turner added this week

that the expected commercial raw read accuracy will be between 85 and 90 percent.


16/67


17/67

CARLSBAD, Calif., December 14, 2010 - Life Technologies Corporation

(NASDAQ: LIFE), a provider of innovative life science solutions, today

announced that it has launched its Ion Personal Genome Machine (PGM)

sequencer.

A run takes approximately two hours, and several runs can be performed in a

day.

The first version of the PGM will sell for $49,500, plus a $16,500 server to

analyze the data.

Initially, the machine will produce about 10 megabases of data per run, or

about 100,000 reads of 100 base pairs each, using the so-called 314 chip,

which has about 1.5 million wells and will cost $250. Reagent kits for template

preparation, library preparation, and sequencing will cost another $250,

bringing the total consumables cost per run to approximately $500.
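The consumables figures above translate directly into a cost per megabase for the 314 chip. This is a back-of-the-envelope sketch using only the numbers in the announcement; instrument and server costs are deliberately left out.

```python
# Figures quoted for the first PGM configuration (314 chip).
chip_cost_usd = 250        # one 314 chip
reagent_cost_usd = 250     # template prep, library prep and sequencing kits
reads_per_run = 100_000
read_length_bp = 100

# Output per run in megabases and consumables cost per megabase.
output_mb = reads_per_run * read_length_bp / 1e6
run_cost_usd = chip_cost_usd + reagent_cost_usd
cost_per_mb_usd = run_cost_usd / output_mb

print(f"{output_mb:.0f} Mb per run, ${run_cost_usd} per run, "
      f"${cost_per_mb_usd:.0f} per Mb")
```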

In the first half of 2011, Ion Torrent plans to launch the 316 chip, with about 6

million wells, which will increase the output per run to 100 megabases and

which will cost about twice as much as the 314. Additional chip upgrades will

follow, with details to be revealed next year.


18/67

Illumina's Low-Cost MiSeq Promises to

Speed up Next-Gen Sequencing

January 18, 2011

Illumina last week announced a new low-cost sequencing platform called MiSeq,

which promises to go from purified DNA to analyzed data in as few as eight hours,

and to generate more than a gigabase of sequence in slightly more than a day of

total experimental time.

The instrument has a list price of under $125,000 and runs Illumina's existing

TruSeq sequencing-by-synthesis chemistry.

MiSeq will perform both single and paired-end sequencing with read lengths of up to 2 x 150 base pairs, which might increase over time. It generates more

than 3.4 million single reads, or more than 6.8 million paired-end reads per

run. The maximum output per run is about 1.5 gigabases.

Cluster generation and sequencing takes about 4.5 hours, and data analysis and "demultiplexing" about two hours.


19/67

CARLSBAD, Calif., Feb. 23, 2011 - Life Technologies Corporation (NASDAQ:

LIFE) today announced that the Ion 318 semiconductor sequencing chip will be

available for early access in September of this year, complete with RNA-Seq kits and

analysis software, providing up to 1 Gb of data output, 100 times more than the original Ion 314 chip introduced two months ago with the Ion Personal Genome

Machine (PGM) sequencer.

Ion Torrent has increased the number of accessible sensors on the Ion 318 chip to 11

million, from 1.2 million on the Ion 314 chip and 6.1 million on the Ion 316 chip, said

Gregg Fergus, President of Ion Torrent. The Ion 318 is ideal for applications like

transcriptome sequencing, miRNA sequencing and ChIP-Seq. The Ion 318 requires no

changes or upgrades to the Ion PGM sequencer: with Ion Torrent, "The Chip is the

Machine", so sequencing is easier and more economical than ever before.

Internally, Ion Torrent has already achieved read lengths in excess of 300 bp, and

improvements in chemistry and loading are expected to increase chip utilization five-fold.

By combining additional reads with longer read lengths, the company will achieve

exceptional scalability on its Ion PGM sequencing platform and expects to reach read

lengths of up to 400 bp in 2012.


20/67

Genome (re)sequencing


21/67

Classical sequencing: 700-800 bases

Short-read sequencing: 35-100 bases

reference genome


22/67

Why (Re)Sequencing



23/67

Individual affected by Miller syndrome

Pharmacogenomics

Cancer genome


24/67

Functional Genomics Center Joint Project

PowerEdgeTM R900

4 Intel Xeon 7450 6-core

128 GB RAM

machine time on the Illumina

Genome Analyzer II


25/67

Sequencing of the Corvina genome

TGTTGGAATACCTGAAGATTGCTCAGGACCTGGAGA

CAACATTGGATCAAATGGATCTGATAGACCTTTACA

GAAAAAATGTAGTCAGCCTTTAACTTGGCCTGATAA

CACAGCTGGGGCTGTAGCAACCCTTTCCAACCCCTT

TAGTCGGTTGTTGATGAGATATTTGGAGGTGGGGAT

GTCAAAGGCAAAGGAAAAAATGTTCAATATAGTTAA

Genomic DNA

fragmentation


26/67

Structural characterization: duplications and deletions

Duplication

Deletion

PN40024

Deleted regions (mean length 46 kb)

Duplicated regions (mean length 43 kb)

500 20

The duplicated genes belong to the following functional classes:

- Metabolism

- Signal transduction

- Response to biotic stress

- Unknown function


27/67

Polymorphisms between Corvina and PN40024

SNPs identified   SNPs in coding regions   Genes with SNPs in their coding sequence   Heterozygous SNPs
392.775           156.601                  13.854 (46%)                               288.204 (75%)

Distribution by biological process:

- unknown biological process: 49%
- other metabolic processes: 15%
- metabolism: 8%
- protein modification: 6%
- transcription: 6%
- carbohydrate metabolism: 4%
- stress response: 3%
- lipid metabolism: 2%
- amino acid metabolism: 2%
- translation: 2%
- energy: 1%
- reproduction: 1%
- signal transduction: 1%
- cell death: 0%


28/67

Verona, 9 February 2007

Functional Genomics Center

PowerEdgeTM R900

4 Intel Xeon 7450 6-core

128 Gb RAM

Illumina HiSeq 1000


29/67


30/67


31/67

The Protein Coding Genome

~ 30 megabases (1% of genome)

~ 20.000 genes

~ 180.000 discontinuous sequences


32/67

Microarray Sequence


33/67

Microarray Sequence Capture

Genomic DNA

Fragment & add linkers

Hybridize to the microarray (probes bind target DNA, not background DNA)

Wash

Elute

Sequencing (454 Genome Sequencer)

Target DNA: Exon1 Exon2 Exon3 Exon4 Exon5 Exon6


34/67

Exome sequencing

Amplicon sequencing


35/67

Harismendy et al. Genome Biology 2009, 10:R32, doi:10.1186/gb-2009-10-3-r32

Amplicon sequencing


36/67

NimbleGen Sequence Capture


37/67

NimbleGen Sequence Capture

gDNA Library Preparation

DNA fragmentation

by Nebulization

Hybridization on

2.1M NimbleGen Array

Adaptor Ligation

Enrichment confirmation

by qPCR on internal control genomic loci

(Captured DNA vs. genomic)

We obtained a 100-fold enrichment (3 Gb / 30 Mb).

Specific probes can capture regions of up to 30 Mb (thick coloured lines). The

remaining regions are washed away (thin black lines).


38/67

Transcriptome Sequencing

(RNA-Seq)

TGTTGGAATACCTGAAGATTGC

CAACATTGGATCAAATGGATCT

GAAAAAATGTAGTCAGCCTTTA

CACAGCTGGGGCTGTAGCAACC

TAGTCGGTTGTTGATGAGATAT

GTCAAAGGCAAAGGAAAAAATG

Messenger

RNAs

cDNA synthesis

mRNA fragmentation (sonication)

RNA extraction

from tissues

Data analysis

Reference

Gene predictions


39/67

First step:

alignment to the genome

Gene predictions

Coverage

Read alignments

We use different programs to map the reads onto the reference genome:

- Bowtie: optimized for RNA-seq; it keeps track of multiple possible mappings for each read (multi-reads), which is needed for accurate expression estimation.

- Tophat: built on Bowtie, it adds the detection of splicing sites.

- Maq, BWA: geared towards DNA-seq analyses; they report fewer hits, choosing the best ones. Unsuitable for expression analysis, used for SNP detection.

The output of all these programs can be converted into the emerging

standard SAM format.
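Because every aligner above can emit SAM, downstream scripts only need to understand one tab-separated layout. The sketch below is a minimal, illustrative reader (the function name `summarize_sam` and the toy records are assumptions made here): it skips `@` header lines, reads the FLAG and RNAME columns, and uses FLAG bit 0x4 to tell unmapped reads from mapped ones.

```python
from collections import Counter


def summarize_sam(lines):
    """Count mapped and unmapped reads, and mapped reads per reference."""
    stats = Counter()
    per_reference = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        rname = fields[2]       # column 3: reference sequence name
        if flag & 0x4:          # bit 0x4 set: read is unmapped
            stats["unmapped"] += 1
        else:
            stats["mapped"] += 1
            per_reference[rname] += 1
    return stats, per_reference


# Toy records with the 11 mandatory SAM columns (made-up data).
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "\t".join(["read1", "0", "chr1", "101", "255", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["read2", "16", "chr1", "250", "255", "4M", "*", "0", "0", "TTGA", "IIII"]),
    "\t".join(["read3", "4", "*", "0", "0", "*", "*", "0", "0", "GGCC", "IIII"]),
]
stats, per_reference = summarize_sam(example_sam)
print(dict(stats), dict(per_reference))
```

For real data a dedicated library or Samtools itself is the safer choice; the point here is only how little structure a script needs to rely on.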


40/67

DIFFERENTIAL GENE EXPRESSION



41/67

NEW REGIONS DETECTION AND POSSIBLE

ALTERNATIVE SPLICING


42/67

GENES MISSED


43/67

EXONS MISSED

INTRONS MISSED


44/67

Biologists want more than that, but...how to get more?


45/67

Vitis vinifera whole-genome expression lab

[Figure: microarray hybridizations, axis from 0 to 2000]


46/67

SNP detection in human leukemia

RNA sample of leukemia at diagnosis stage

GeneChip SNP Array 5 (> 500.000 SNPs): SNP detection by hybridization

GAIIx: SNP detection by analysis with:

- MAQ cns2snp

- ERANGE snp module

- SAMTOOLS

Corvina transcriptome sequencing (mRNA Seq)


47/67

59.372.544 single reads, totaling 2,2 Gb

of sequence (30 X coverage)

TGTTGGAATACCTGAAGATTGCTCAG

CAACATTGGATCAAATGGATCTGATA

GAAAAAATGTAGTCAGCCTTTAACTT

CACAGCTGGGGCTGTAGCAACCCTTT

TAGTCGGTTGTTGATGAGATATTTGG

GTCAAAGGCAAAGGAAAAAATGTTCA

mRNA

cDNA synthesis

Corvina transcriptome sequencing (mRNA-Seq)

mRNA fragmentation

Plant Physiol. 152: 1787-1795 (2010)

Post Fruit-Set

Veraison

Ripening


48/67

Detection of new genes

Erange

- Searches for candidate transcribed regions: islands of mapped reads far from annotated genes

- Not capable of estimating the presence of different isoforms

Cufflinks

- De novo transcript reconstruction, bypassing the need for an annotation

- Sophisticated analyses capable of distinguishing even between different isoforms

- Paired-end data strongly suggested for a proper analysis

New gene: reads in unannotated regions
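The "islands of mapped reads far from annotated genes" idea can be sketched as follows. This is an illustration of the general approach, not ERANGE's actual algorithm; the function name, the thresholds `min_reads` and `min_gene_distance`, and the toy coordinates are all assumptions made for the example.

```python
def find_islands(read_starts, read_len, genes, min_reads=5, min_gene_distance=1000):
    """Return (start, end, n_reads) islands of overlapping reads far from genes.

    read_starts: read start positions on one chromosome.
    genes: list of (start, end) intervals of annotated genes.
    """
    # Merge overlapping reads into islands.
    islands = []
    current = None  # [start, end, n_reads]
    for start in sorted(read_starts):
        end = start + read_len
        if current is not None and start <= current[1]:
            current[1] = max(current[1], end)
            current[2] += 1
        else:
            if current is not None:
                islands.append(tuple(current))
            current = [start, end, 1]
    if current is not None:
        islands.append(tuple(current))

    def far_from_genes(isl_start, isl_end):
        # True when every gene lies more than min_gene_distance away.
        return all(
            isl_end + min_gene_distance < g_start
            or g_end + min_gene_distance < isl_start
            for g_start, g_end in genes
        )

    return [
        isl for isl in islands
        if isl[2] >= min_reads and far_from_genes(isl[0], isl[1])
    ]


genes = [(1000, 2000)]
reads = [2100, 2110, 2120, 2130, 2140,        # dense, but close to the gene
         10000, 10010, 10020, 10030, 10040,   # dense and far from any gene
         20000]                               # a single isolated read
candidates = find_islands(reads, read_len=36, genes=genes)
print(candidates)
```

Only the second cluster survives both filters, which is the behaviour the slide describes: enough reads to suggest transcription, and no annotated gene nearby to explain them.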


49/67

[Figure: three read-coverage tracks, scale 25 to 100]

Identified 479 new genes

New gene: reads mapping to unannotated regions

Plant Physiol. 152: 1787-1795 (2010)


50/67

SNP detection - methods

We have developed a pipeline (SRTK) for detecting SNPs.

- The Samtools suite is used to create a summary file for the whole genome (termed pileup)

- An in-house script is then used to remove data from reads mapping in multiple locations

- Finally, the SNPs are filtered for:

  - map quality (minimum 20)

  - read coverage (minimum 2 reads)

  - minimum percentage of reads mapping onto the mutant allele vs. the annotated base (minimum 25%)
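The three filters listed above can be expressed as a small predicate. This is a sketch, not the SRTK code: the `Candidate` record and its field names are hypothetical, and the allele percentage is interpreted here as reads supporting the mutant allele divided by total read depth at the position.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate SNP position (hypothetical record layout)."""
    chrom: str
    pos: int
    ref_base: str
    alt_base: str
    map_quality: int   # mapping quality reported for the position
    depth: int         # total reads covering the position
    alt_reads: int     # reads supporting the mutant allele


def passes_filters(c, min_mapq=20, min_depth=2, min_alt_fraction=0.25):
    """Apply the thresholds from the slide: quality, coverage, allele fraction."""
    if c.map_quality < min_mapq:
        return False
    if c.depth == 0 or c.depth < min_depth:
        return False
    return c.alt_reads / c.depth >= min_alt_fraction


candidates = [
    Candidate("chr1", 100, "A", "G", map_quality=30, depth=10, alt_reads=5),  # kept
    Candidate("chr1", 200, "C", "T", map_quality=10, depth=10, alt_reads=5),  # low quality
    Candidate("chr1", 300, "G", "A", map_quality=30, depth=1, alt_reads=1),   # low coverage
    Candidate("chr1", 400, "T", "C", map_quality=30, depth=10, alt_reads=2),  # 20% < 25%
]
kept = [c.pos for c in candidates if passes_filters(c)]
print(kept)
```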


51/67

Identified alternative splicing sites

Exons and introns:     Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3 - Intron 3 - Exon 4
Constitutive splicing: Exon 1 - Exon 2 - Exon 3 - Exon 4 (Protein)
Alternative splicing:  Exon 1 - Exon 2 - Exon 4 (Alternative protein)

N of constitutive splicing sites   N of alternative splicing sites   N of genes undergoing alternative splicing
41,447                             447                               385


52/67

Alternative splicing

Junction database approach

- Reads are mapped to an exon junction database constructed from gene annotations.

- Detects exon skipping events only

- Default approach with E-RANGE

De novo recognition

Tophat

- Recognizes canonical splicing sites only

- Integrated in the Tophat/Cufflinks pipeline

Supersplat

- Performs a gapped alignment of all reads against the genome

- Can also recognize non-canonical splicing sites

- Not as well integrated with other software, i.e. Cufflinks, Samtools
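The junction database approach can be illustrated with a toy gene. The sketch below is not E-RANGE code; `build_junction_db`, the `flank` parameter and the 25-base "genome" are assumptions chosen for the example. It joins the last bases of each annotated exon to the first bases of every downstream exon, so junctions between non-adjacent exons represent exon skipping.

```python
from itertools import combinations


def build_junction_db(genome, exons, flank):
    """Build junction sequences for every ordered pair of annotated exons.

    exons: list of (start, end) intervals, 0-based, end exclusive, sorted.
    flank: bases taken from each side of the junction.
    """
    junctions = {}
    numbered = list(enumerate(exons, start=1))
    for (i, (s1, e1)), (j, (s2, e2)) in combinations(numbered, 2):
        left = genome[max(s1, e1 - flank):e1]    # end of the upstream exon
        right = genome[s2:min(e2, s2 + flank)]   # start of the downstream exon
        junctions[f"exon{i}_exon{j}"] = left + right
    return junctions


# Toy gene: exons in upper case, introns in lower case.
genome = "AAAAAgtcagCCCCCgttagGGGGG"
exons = [(0, 5), (10, 15), (20, 25)]
db = build_junction_db(genome, exons, flank=3)
for name, seq in db.items():
    print(name, seq)

# A read spanning exon 1 and exon 3 matches only the exon-skipping junction.
read = "AAGG"
hits = [name for name, seq in db.items() if read in seq]
print(hits)
```

Because the database is derived from the annotation, only combinations of known exons can be found, which is why the slide notes that this approach detects exon skipping events only.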


53/67

Expression analysis

E-RANGE

- Reads Bowtie, Eland or Blat output

- Reads are assigned to gene models.

- Multiple-mapping reads are assigned to gene models proportionally to the density of uniquely mapping reads.

- Capable of detecting genes in previously unannotated regions

Cufflinks

- Reads standard SAM input

- Either looks for annotated genes or tries to reconstruct the original RNA fragments

- Uses the probability that an alignment is correct to assign multiple-mapping reads

- Capable of discriminating between different isoforms

- Currently in test phase, with promising results

SRTK

- In-house collection of scripts

- MapReadsLocation.py performs a raw count of the uniquely mapping reads mapped on each gene

- Used for RNA-seq statistical analysis

- http://ddlab.sci.univr.it/srtk/
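The proportional assignment of multiple-mapping reads described for E-RANGE can be sketched as below. This illustrates the principle only, not the tool's implementation; the function name, the input layout, and the even split used when no candidate gene has unique reads are assumptions made here.

```python
def assign_multireads(unique_counts, gene_lengths, multireads):
    """Share each multiread among its candidate genes by unique-read density.

    unique_counts: gene -> number of uniquely mapping reads.
    gene_lengths: gene -> length in bases.
    multireads: one tuple of candidate genes per multiple-mapping read.
    """
    totals = {gene: float(count) for gene, count in unique_counts.items()}
    for candidates in multireads:
        # Density = unique reads per base of gene model.
        density = {
            g: unique_counts.get(g, 0) / gene_lengths[g] for g in candidates
        }
        total_density = sum(density.values())
        for g in candidates:
            if total_density > 0:
                share = density[g] / total_density
            else:
                share = 1 / len(candidates)  # assumption: no unique evidence, split evenly
            totals[g] = totals.get(g, 0.0) + share
    return totals


unique_counts = {"geneA": 90, "geneB": 10}
gene_lengths = {"geneA": 1000, "geneB": 1000}
multireads = [("geneA", "geneB")] * 10   # ten reads mapping to both genes
totals = assign_multireads(unique_counts, gene_lengths, multireads)
print({gene: round(value, 2) for gene, value in totals.items()})
```

With equal gene lengths, geneA holds 90% of the unique-read density and therefore receives 90% of each shared read.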

MICROARRAY ANALYSIS


54/67

ID stress_A stress_B stress_C control_A control_B control_C log2 fold change STRESS vs CONTROL

AB007870 976.7419 1106.0225 958.8752 978.6526 724.0019 717.6652 0.342

AB007877 52.4236 48.9145 61.4969 70.6655 83.2429 111.1115 -0.684

AB007878 91.737 92.8117 46.1988 77.2719 101.3615 166.1109 -0.575

AB007895 220.5646 190.4141 260.4116 324.2564 317.1986 273.5752 -0.454

AB007921 179.6105 224.7979 205.405 195.7703 139.9824 176.5451 0.259

AB007923 135.6203 93.2452 133.135 102.7447 125.3144 97.5942 0.141

AB007928 1991.7719 1867.8622 1763.1777 1063.0456 854.2858 835.9119 1.037

AB007937 1188.4386 1042.3558 1347.4832 1245.6073 1343.2552 1412.5417 -0.167

AB007940 167.5594 158.4953 123.5362 262.8919 251.2853 237.4419 -0.752

AB008790 1461.0386 1433.4358 1197.2452 696.7486 950.1929 877.7196 0.703

AB010419 1992.4652 2830.9386 2415.7361 2450.9499 2204.4398 2272.9572 0.05

AB010962 525.973 336.3452 572.9752 1090.8052 1704.4986 1163.3986 -1.472

AB011088 13193.9809 16559.3745 11695.7919 13953.8194 14616.103 18189.4252 -0.179

AB011097 18.6059 15.7499 18.4248 17.8529 18.0056 17.402 -0.017

AB011103 1736.1662 1783.1319 1795.1724 1876.8319 2027.4452 1701.4922 -0.073

AB011123 1692.9638 1439.6005 1639.5755 1084.6886 1042.3516 1144.1105 0.542

AB011154 567.9596 523.3949 631.1821 1177.9344 1258.5535 1287.5093 -1.116

AB011157 1964.3402 1926.9519 2525.3286 1579.0752 1702.1312 1445.1081 0.433

AB011163 363.8747 491.1119 159.7169 583.7652 670.0606 981.6137 -1.25

AB011174 3063.8319 2914.2986 3317.7625 3211.0308 3421.1982 3267.3989 -0.092

AB011180 374.4583 416.0086 346.6452 504.8333 299.0182 417.3016 -0.074

AB011539 539.2045 499.2931 677.8508 714.6286 993.7229 826.6439 -0.562
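The table does not say how the last column was derived. One reading that appears to reproduce the values shown for AB007870 and AB007928 is the difference between the mean log2 intensity of the stress replicates and the mean log2 intensity of the control replicates; the sketch below assumes that definition.

```python
import math


def mean_log2(values):
    """Mean of the log2-transformed intensities."""
    return sum(math.log2(v) for v in values) / len(values)


def log2_fold_change(treated, control):
    """Difference of mean log2 intensities (treated minus control)."""
    return mean_log2(treated) - mean_log2(control)


# Two rows copied from the table: (stress replicates, control replicates).
rows = {
    "AB007870": ([976.7419, 1106.0225, 958.8752], [978.6526, 724.0019, 717.6652]),
    "AB007928": ([1991.7719, 1867.8622, 1763.1777], [1063.0456, 854.2858, 835.9119]),
}
fold_changes = {
    gene: log2_fold_change(stress, control)
    for gene, (stress, control) in rows.items()
}
for gene, lfc in fold_changes.items():
    print(gene, round(lfc, 3))
```

A positive value means higher signal under stress; a value of 1 corresponds to a two-fold increase.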

MICROARRAY ANALYSIS



55/67

GENE ID             RPKM
                    Post Fruit Set   Veraison   Ripening
GSVIVT00027957001   83.1             21.09      12.72
GSVIVT00018861001   0.08             0.06       0.24
GSVIVT00023496001   0                0.67       1.69
GSVIVT00018860001   4.92             10.71      10.84
GSVIVT00023540001   30.52            23.03      0.54
GSVIVT00023541001   27.84            21.11      0.48
GSVIVT00027954001   71.23            17.45      10.63
GSVIVT00006661001   0                0.07       0
GSVIVT00000218001   55.89            203.62     172.55
GSVIVT00024208001   47.21            30.47      1.45
GSVIVT00013910001   31.11            22.89      4.31
GSVIVT00012187001   14.45            45.82      57.02
GSVIVT00009770001   14.43            38.41      44.9
GSVIVT00012190001   12.25            26.79      34.05
GSVIVT00012566001   12.1             40.6       47.42
GSVIVT00006640001   33.06            126.91     200.55
GSVIVT00006638001   10.46            23.92      28.36
GSVIVT00012567001   9.99             19.5       23.04
GSVIVT00006659001   9.68             18.97      21.07
GSVIVT00012193001   6.29             74.39      51.35
GSVIVT00006633001   4.87             11.72      14.54
GSVIVT00025776001   0.51             1.72       1.87
GSVIVT00028957001   0.07             0.13       0.18

RPKM allows transcript levels to be compared both

within and between samples
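RPKM (reads per kilobase of transcript per million mapped reads) normalizes a raw read count by both gene length and sequencing depth, which is what makes the comparisons above possible. A minimal sketch of the standard formula follows; the example numbers are made up.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(f"RPKM = {value:.2f}")

# Doubling the gene length or the library size halves the value.
longer_gene = rpkm(500, 4000, 10_000_000)
deeper_library = rpkm(500, 2000, 20_000_000)
print(longer_gene, deeper_library)
```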

Plant Physiol. 152: 1787-1795 (2010)


56/67

GSVIVT00034646001 Chitinase

             post fruit set   veraison   ripening
RNA-Seq      0                24.05      66.56

Real-time RT-PCR

Validation of RNA-Seq based Gene Expression:

qRT-PCR of selected genes

Plant Physiol. 152: 1787-1795 (2010)


57/67

GSVIVT0002441001 polygalacturonase

             post fruit set   veraison   ripening
RNA-Seq      0                0.19       0.11

Real-time RT-PCR

Validation of RNA-Seq based Gene Expression:

qRT-PCR of selected genes

Plant Physiol. 152: 1787-1795 (2010)


58/67

OUTPUT: Exhaustive overview of gene expression dynamics


59/67

Columns: GENE ID; RPKM (Post Fruit-Set, Veraison, Ripening); p-value Veraison/Post fruit-set; p-value Veraison/Ripening; p-value Ripening/Post fruit-set; Cluster; Gene Description; Functional Category

GSVIVT00000001001 0 0,34 0 1,00E+00 5,97E-01 1,00E+00 7 No Hit Found No Hit Found

GSVIVT00000003001 0 0 0 1,00E+00 1,00E+00 1,00E+00 No Hit Found No Hit Found

GSVIVT00000004001 0,23 0,21 0,06 8,15E-01 7,77E-02 7,77E-02 4 Sulfotransferase GO:0008150

GSVIVT00000005001 4,16 3,57 6,15 3,61E-01 2,62E-03 1,63E-03 Glycosyl transferase, family 8 GO:0005975

GSVIVT00000007001 0,56 0,52 0,15 8,15E-01 7,77E-02 7,77E-02 4 Sulfotransferase GO:0008150

GSVIVT00000008001 4,69 13,01 8,46 0,00E+00 0,00E+00 3,77E-10 7 Glycosyl transferase, family 8 GO:0005975

GSVIVT00000009001 0 0 0 1,00E+00 1,00E+00 1,00E+00 Mitochondrial substrate carrier GO:0051234

GSVIVT00000010001 0 0 0 1,00E+00 1,00E+00 1,00E+00 UspA GO:0050896

GSVIVT00000011001 0 0 0 1,00E+00 1,00E+00 1,00E+00 Transcription factor, MADS-box GO:0050789

GSVIVT00000012001 0,36 0,67 0,23 4,40E-01 6,33E-02 7,41E-01 7 Transcription factor, MADS-box GO:0050789

GSVIVT00000013001 6,1 5,12 6,23 8,47E-01 1,59E-07 3,03E-09 No Hit Found No Hit Found

GSVIVT00000015001 1,3 0,76 0,79 6,56E-02 1,00E+00 1,82E-01 Alpha-1,4-glucan-protein synthase GO:0044036

GSVIVT00000017001 65,7 30,8 24,26 0,00E+00 1,00E-15 0,00E+00 2 Glycosyl-phosphatidyl inositol-anchored GO:0005975

GSVIVT00000018001 0 0 0 1,00E+00 1,00E+00 1,00E+00 No Hit Found No Hit Found

GSVIVT00000019001 0 0,03 0 1,00E+00 1,00E+00 1,00E+00 7 Glycosyl-phosphatidyl inositol-anchored GO:0005975

GSVIVT00000020001 2,49 0,03 0 0,00E+00 1,00E+00 0,00E+00 2 Glycosyl-phosphatidyl inositol-anchored GO:0005975

GSVIVT00000021001 0 0 0 1,00E+00 1,00E+00 1,00E+00 No Hit Found No Hit Found

GSVIVT00000022001 0 0 0 1,00E+00 1,00E+00 1,00E+00 WD40 repeat GO:0050789

GSVIVT00000023001 0,68 0,58 1,4 6,21E-01 5,37E-03 5,37E-03 3 Pentatricopeptide repeat GO:0008150

GSVIVT00000024001 7,38 10,5 19,98 9,85E-02 5,33E-04 2,23E-09 1 No Hit Found No Hit Found

GSVIVT00000025001 1,48 5,63 22,44 0,00E+00 0,00E+00 0,00E+00 1 Cytochrome P450 GO:0015979

GSVIVT00000027001 0,09 0,1 0,16 1,00E+00 5,64E-01 3,61E-01 ABC transporter, transmembrane region GO:0051234

GSVIVT00000028001 0 0 0 1,00E+00 1,00E+00 1,00E+00 Mitochondrial transcription termin. factor GO:0044238

GSVIVT00000029001 0 0 0 1,00E+00 1,00E+00 1,00E+00 Peptidase S54, rhomboid GO:0008150

GSVIVT00000032001 2,69 2,05 1,51 1,57E-01 9,79E-02 1,90E-02 No Hit Found No Hit Found

GSVIVT00000033001 3,45 5,28 2,22 1,66E-08 0,00E+00 4,83E-04 7 No Hit Found No Hit Found

GSVIVT00000034001 0,46 0,13 2,02 3,39E-01 1,92E-06 9,68E-05 8 No Hit Found No Hit Found

GSVIVT00000035001 0,84 0,87 0 9,00E-01 2,76E-11 5,26E-09 4 NLI interacting factor GO:0008150

GSVIVT00000036001 3,85 10,7 2,36 0,00E+00 0,00E+00 2,70E-04 7 No Hit Found No Hit Found

GSVIVT00000037001 5,65 0 0 0,00E+00 1,00E+00 0,00E+00 6 No Hit Found No Hit Found

GSVIVT00000038001 0 0,06 0 1,00E+00 1,00E+00 1,00E+00 7 Transcription factor, TCP GO:0050789

OUTPUT: Exhaustive overview of gene expression dynamics

2010


60/67

Identify Corvina's proprietary set of genes


61/67

Identify Corvina's proprietary set of genes

(missing in PN40024)

Reference genome


62/67

Uncorking Corvina's secrets: identified

a set of 187 proprietary genes (154 already present in the VvGI database)

- primary metabolic process: 19%
- defense response: 15%
- transport: 15%
- signal transduction: 12%
- NA: 12%
- hydrolase: 6%
- protein amino acid phosphorylation: 6%
- ATP-binding: 3%
- cell wall organization: 3%
- cytoskeleton organization: 3%
- flower development: 3%
- regulation of transcription: 3%


63/67


64/67


65/67


66/67


67/67

Plant Biology

Mario Pezzotti

Diana Bellin

Sara Zenoni

Medicine

Giovanni Martinelli (UniBO)

Marina Noris (Mario Negri)

Aldo Scarpa

Bioinformatics

Luca Venturini

Luciano Xumerle

Stefano Barbi

Alberto Ferrarini

Statistics

Giovanni Malerba

Paola Tononi

Alberto Ferrarini

Noel Dago

Luca Venturini

Luciano Xumerle

Giovanni Malerba

PowerEdgeTM R900

Genny Bruson
