Bioinformatics: BioPerl. Dr. Giuseppe Pigola, [email protected]



Useful links

http://www.bioperl.org

On Windows, install BioPerl with the Perl Package Manager (PPM) tool: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Other packages (for Java, Python and PHP): http://biojava.org http://biopython.org http://www.biophp.org


BioPerl

BioPerl is a collection of Perl modules that makes it easier to write scripts for bioinformatics applications;

Perl is an excellent language for text manipulation, and biological data (sequences, database records, program output) is largely text, so Perl is very effective in bioinformatics applications;

BioPerl is object-oriented;


BioPerl namespaces

Bio::Seq: sequence object (DNA, RNA, protein);

Bio::SeqIO: reading and writing sequences (in many formats);

Bio::SeqFeature: features (gene, exon, promoter, etc.);

Bio::Annotation: stores links to databases, literature references and comments;

Bio::AlignIO: reading and writing alignments;

Bio::SimpleAlign: alignment object;

Bio::DB: access to sequence databases;

Bio::SearchIO: parsing search reports (e.g. BLAST);

...


Manipulating sequences

Create a sequence object with given attributes:


use strict;
use warnings;
use Bio::Seq;

my $seq = Bio::Seq->new(
    -seq              => 'actgtggcgtcaact',
    -desc             => 'Sample Bio::Seq object',
    -display_id       => 'something',
    -accession_number => 'accnum',
    -alphabet         => 'dna',     # called -moltype in old BioPerl releases
);

$seq->display_id();         # common name
$seq->seq();                # the sequence as a string
$seq->length();
$seq->subseq(5, 10);        # returns a string (1-based, inclusive coordinates)
$seq->accession_number();
$seq->alphabet();           # 'dna', 'rna' or 'protein' (formerly moltype())
$seq->primary_id();         # independent of the IDs used by the various databases
$seq->trunc(5, 10);         # subsequence (new object)
$seq->revcom();             # reverse complement (new object)
$seq->translate();          # translation of the sequence (new object)
$seq->translate(-terminator => '*', -unknown => 'X', -frame => 0);
                            # stop-codon symbol, unknown-amino-acid symbol, reading frame
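To make the coordinate and translation conventions concrete, here is a core-Perl sketch (no BioPerl needed) of what subseq, revcom and translate compute on a plain DNA string. It assumes 1-based, inclusive coordinates (as BioPerl uses) and only the standard genetic code (NCBI table 1); the helper names are hypothetical and this is not BioPerl's implementation.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating Bio::Seq-style operations on plain strings.

# 1-based, inclusive coordinates: subseq(5, 10) covers positions 5..10
sub subseq_1based {
    my ($seq, $start, $end) = @_;
    die "invalid range $start..$end\n"
        if $start < 1 || $end > length($seq) || $start > $end;
    return substr($seq, $start - 1, $end - $start + 1);
}

# Reverse complement of a DNA string (case is preserved)
sub revcom_dna {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order
my @BASES = qw(T C A G);
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODON;
for my $i (0 .. 3) {
    for my $j (0 .. 3) {
        for my $k (0 .. 3) {
            $CODON{ $BASES[$i] . $BASES[$j] . $BASES[$k] } =
                substr($AAS, 16 * $i + 4 * $j + $k, 1);
        }
    }
}

# Translate starting at offset $frame (0, 1 or 2); a trailing partial codon is dropped
sub translate_dna {
    my ($seq, %opt) = @_;
    my $terminator = $opt{terminator} // '*';
    my $unknown    = $opt{unknown}    // 'X';
    my $frame      = $opt{frame}      // 0;
    (my $s = uc $seq) =~ tr/U/T/;     # accept RNA too
    my $protein = '';
    for (my $p = $frame; $p + 3 <= length $s; $p += 3) {
        my $aa = $CODON{ substr($s, $p, 3) } // $unknown;   # e.g. codons with N
        $protein .= $aa eq '*' ? $terminator : $aa;
    }
    return $protein;
}

my $dna = 'actgtggcgtcaact';
print subseq_1based($dna, 5, 10), "\n";   # tggcgt
print revcom_dna($dna), "\n";             # agttgacgccacagt
print translate_dna($dna), "\n";          # TVAST
```

Unlike BioPerl's trunc, revcom and translate, these helpers return plain strings rather than new sequence objects.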

Simple statistics

Statistics on a sequence:


use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::SeqStats;

my $seq = Bio::Seq->new(
    -seq              => 'actgtggcgtcaact',
    -desc             => 'Sample Bio::Seq object',
    -display_id       => 'something',
    -accession_number => 'accnum',
    -alphabet         => 'dna',
);

my $seq_stats   = Bio::Tools::SeqStats->new(-seq => $seq);
my $weight      = $seq_stats->get_mol_wt();      # array ref: [lower bound, upper bound]
my $monomer_ref = $seq_stats->count_monomers();  # hash ref: residue => count
my $codon_ref   = $seq_stats->count_codons();    # hash ref: codon => count (nucleic acids only)
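To show what the two counts mean, here is a core-Perl sketch of residue and codon counting that, like count_monomers and count_codons, returns hash references. The helper names are hypothetical and this is not the SeqStats implementation; details such as ambiguity codes or a trailing partial codon may be handled differently in BioPerl.

```perl
use strict;
use warnings;

# Count each residue (case-insensitive)
sub count_monomers_sketch {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, uc $seq;
    return \%count;
}

# Count non-overlapping codons read from the first base;
# an incomplete trailing codon is ignored
sub count_codons_sketch {
    my ($seq) = @_;
    my $s = uc $seq;
    my %count;
    for (my $p = 0; $p + 3 <= length $s; $p += 3) {
        $count{ substr($s, $p, 3) }++;
    }
    return \%count;
}

my $dna      = 'actgtggcgtcaact';
my $monomers = count_monomers_sketch($dna);
my $codons   = count_codons_sketch($dna);
print join(' ', map { "$_=$monomers->{$_}" } sort keys %$monomers), "\n";
# A=3 C=4 G=4 T=4
print join(' ', map { "$_=$codons->{$_}" } sort keys %$codons), "\n";
# ACT=2 GCG=1 GTG=1 TCA=1
```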

Local BLAST

Search the local "ecoli.nt" database for sequences similar to a query sequence:


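use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;   # Bio::Tools::StandAloneBlast in old releases

my @params  = ('program' => 'blastn', 'database' => 'ecoli.nt');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

my $input = Bio::Seq->new(-id => 'test query', -seq => 'ACTAAGTGGGGG');
my $blast_report = $factory->blastall($input);   # runs the local blastall program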

Smith-Waterman or Blast2Seq

bioperl-ext must be installed (Bio::Tools::pSW depends on it), as well as NCBI BLAST for bl2seq:


use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::pSW;
use Bio::Tools::Run::StandAloneBlast;   # Bio::Tools::StandAloneBlast in old releases
use Bio::AlignIO;

# Note: pSW aligns protein sequences (BLOSUM62 is a protein matrix),
# so in practice $seq1 and $seq2 should be proteins
my $seq1 = Bio::Seq->new(-seq => 'actgtggcgtcaact', -desc => 'Sample Bio::Seq object',
                         -display_id => 'something', -accession_number => 'accnum',
                         -alphabet => 'dna');
my $seq2 = Bio::Seq->new(-seq => 'actgtggcgtcaact', -desc => 'Sample Bio::Seq object',
                         -display_id => 'something', -accession_number => 'accnum',
                         -alphabet => 'dna');

my $factory1 = Bio::Tools::pSW->new(-matrix => 'blosum62.bla', -gap => 12, -ext => 2);
$factory1->align_and_show($seq1, $seq2, \*STDOUT);       # align and print the alignment
my $aln = $factory1->pairwise_alignment($seq1, $seq2);   # align and return an object

my $factory2 = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
my $bl2seq_report = $factory2->bl2seq($seq1, $seq2);

# Use AlignIO.pm to create a SimpleAlign object from the blast2seq report
my $str  = Bio::AlignIO->new(-file => 'bl2seq.out', -format => 'bl2seq');
my $aln2 = $str->next_aln();
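pSW computes a Smith-Waterman local alignment with affine gap penalties (-gap to open a gap, -ext to extend it). The core-Perl sketch below computes only the optimal local alignment score with Gotoh's affine-gap recurrences, using a simple match/mismatch score instead of the BLOSUM62 matrix. The scoring values and the gap-cost convention (a gap of length k costs open + (k - 1) * ext) are assumptions; pSW's own conventions may differ, and this is not its implementation.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Smith-Waterman local alignment score with affine gaps (Gotoh recurrences).
# Assumed gap cost: a gap of length k costs $open + ($k - 1) * $ext.
sub sw_score {
    my ($s1, $s2, %opt) = @_;
    my $match    = $opt{match}    // 5;
    my $mismatch = $opt{mismatch} // -4;
    my $open     = $opt{open}     // 12;
    my $ext      = $opt{ext}      // 2;
    my @a   = split //, uc $s1;
    my @b   = split //, uc $s2;
    my $NEG = -1e9;                      # stands in for minus infinity

    my @H_prev = (0) x (@b + 1);         # best score of any alignment ending at (i-1, j)
    my @F_prev = ($NEG) x (@b + 1);      # ... ending with a gap in $s2 (vertical move)
    my $best   = 0;

    for my $i (1 .. @a) {
        my @H_cur = (0);
        my @F_cur = ($NEG);
        my $E     = $NEG;                # ending with a gap in $s1 (horizontal move)
        for my $j (1 .. @b) {
            $E         = max($E - $ext,          $H_cur[$j - 1] - $open);
            $F_cur[$j] = max($F_prev[$j] - $ext, $H_prev[$j] - $open);
            my $diag = $H_prev[$j - 1]
                     + ($a[$i - 1] eq $b[$j - 1] ? $match : $mismatch);
            # 0 lets a new local alignment start here
            $H_cur[$j] = max(0, $diag, $E, $F_cur[$j]);
            $best = $H_cur[$j] if $H_cur[$j] > $best;
        }
        @H_prev = @H_cur;
        @F_prev = @F_cur;
    }
    return $best;
}

print sw_score('actgtggcgtcaact', 'actgtggcgtcaact'), "\n";            # 75
print sw_score('GGGGGCCCCC', 'GGGGGTCCCCC', mismatch => -10), "\n";   # 38
```

In the second call the best local alignment keeps all ten matches (score 50) and pays 12 for a one-base gap over the extra T. Recovering the alignment itself, as pairwise_alignment does, would additionally require a traceback through the three matrices.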

ClustalW, T-Coffee

The alignment program must be installed (bioperl-ext):


use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::Run::Alignment::Clustalw;

my @params  = ('ktuple' => 2, 'matrix' => 'BLOSUM');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);

my $ktuple = 3;
$factory->ktuple($ktuple);          # change a parameter before running

my @seq_array;                      # fill with the Bio::Seq objects to align
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
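The ktuple parameter sets the word length ClustalW uses in its fast, approximate pairwise comparisons, which score sequence pairs by matching short words of that length. A core-Perl sketch of the idea counts the distinct k-tuples two sequences share. This is a simplification with hypothetical helper names: ClustalW's actual fast-alignment score also takes diagonals and gap penalties into account.

```perl
use strict;
use warnings;

# Distinct words of length $k in a sequence (case-insensitive)
sub ktuples {
    my ($seq, $k) = @_;
    my $s = uc $seq;
    my %words;
    $words{ substr($s, $_, $k) } = 1 for 0 .. length($s) - $k;   # empty if too short
    return \%words;
}

# Number of distinct k-tuples shared by two sequences
sub shared_ktuples {
    my ($s1, $s2, $k) = @_;
    my $w1 = ktuples($s1, $k);
    my $w2 = ktuples($s2, $k);
    return scalar grep { exists $w2->{$_} } keys %$w1;
}

for my $k (2, 3) {
    printf "ktuple=%d: %d shared words\n", $k, shared_ktuples('ACGTACGT', 'ACGTTT', $k);
}
# ktuple=2: 3 shared words
# ktuple=3: 2 shared words
```

Longer words match less often, which is why increasing ktuple speeds up the pairwise stage at the cost of sensitivity.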

GenScan

Genscan must be installed (bioperl-ext); Bio::Tools::Genscan parses its output:


use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::Genscan;

my $genscan = Bio::Tools::Genscan->new(-file => 'result.genscan');

# $gene is an instance of Bio::Tools::Prediction::Gene;
# $gene->exons() returns an array of Bio::Tools::Prediction::Exon objects
while (my $gene = $genscan->next_prediction()) {
    my @exon_arr = $gene->exons();
}
$genscan->close();

Example: converting a sequence between formats

Reads a sequence in FASTA format from a file and writes it to another file in EMBL format:

Supported formats: Fasta, EMBL, GenBank, Swissprot, PIR, GCG, SCF, phd/phred, Ace, or raw (plain sequence);


use strict;
use warnings;
use Bio::SeqIO;

my $in  = Bio::SeqIO->new(-file => 'inputfilename',   -format => 'Fasta');
my $out = Bio::SeqIO->new(-file => '>outputfilename', -format => 'EMBL');   # '>' opens for writing
while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}
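For plain FASTA input, the next_seq loop amounts to grouping lines under each ">" header. A minimal core-Perl sketch of that parsing step follows; read_fasta is a hypothetical helper, not how Bio::SeqIO is implemented, and it does no format validation.

```perl
use strict;
use warnings;

# Parse FASTA records from an open filehandle into hashrefs {id, desc, seq}
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                     # strip the line ending
        next if $line =~ /^\s*$/;                 # skip blank lines
        if ($line =~ /^>(\S+)\s*(.*)$/) {         # header: >id description
            push @records, { id => $1, desc => $2, seq => '' };
        } elsif (@records) {                      # sequence line of the current record
            (my $residues = $line) =~ s/\s+//g;
            $records[-1]{seq} .= $residues;
        }
    }
    return @records;
}

# Example input read from an in-memory filehandle
my $text = ">seq1 first example\nACTGTGGC\nGTCAACT\n>seq2\nTTGA\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    printf "%s (%s): %s\n", $rec->{id}, $rec->{desc}, $rec->{seq};
}
close $fh;
# seq1 (first example): ACTGTGGCGTCAACT
# seq2 (): TTGA
```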

Example: converting an alignment between formats

Reads an alignment in FASTA format from a file and writes it to another file in PFAM format:


use strict;
use warnings;
use Bio::AlignIO;      # alignments are read and written by AlignIO, not SeqIO

my $in  = Bio::AlignIO->new(-file => 'inputfilename',   -format => 'fasta');
my $out = Bio::AlignIO->new(-file => '>outputfilename', -format => 'pfam');
while (my $aln = $in->next_aln()) {
    $out->write_aln($aln);
}

Example: accessing a database (1)

Searches GenBank for the sequence ROA1_HUMAN and prints its accession number, description and sequence (in FASTA format):



#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::Seq;
use Bio::SeqIO;

my $database = Bio::DB::GenBank->new();
my $seq = $database->get_Seq_by_id('ROA1_HUMAN');
print "Seq: ", $seq->accession_number(), " -- ", $seq->desc(), "\n\n";

# Filehandle interface: printing a sequence object to $out writes it as FASTA
my $out = Bio::SeqIO->newFh(-fh => \*STDOUT, -format => 'fasta');
print $out $seq;

Example: accessing a database (2)

Retrieves the sequence ROA1_HUMAN from GenBank with the Bio::Perl shortcut functions and writes it in FASTA format to the file roa1.fasta.txt:


#!/usr/bin/perl
use strict;
use warnings;
use Bio::Perl;

my $seq_object = get_sequence('genbank', 'ROA1_HUMAN');
write_sequence('>roa1.fasta.txt', 'fasta', $seq_object);
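Writing FASTA output is simple enough to sketch in core Perl: a ">" header line followed by the sequence wrapped at a fixed width. The helper names and the 60-residue line width are assumptions for illustration, not what write_sequence does internally.

```perl
use strict;
use warnings;

# Format one record as FASTA text, wrapping the sequence every $width residues
# (60 is an assumed default)
sub format_fasta {
    my ($id, $desc, $seq, $width) = @_;
    $width //= 60;
    my $header = (defined $desc && length $desc) ? ">$id $desc" : ">$id";
    my @lines  = $seq =~ /(.{1,$width})/g;
    return join("\n", $header, @lines) . "\n";
}

# Write records (hashrefs with id, desc, seq) to a file, replacing its contents
sub write_fasta_file {
    my ($path, @records) = @_;
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} format_fasta(@{$_}{qw(id desc seq)}) for @records;
    close $out or die "cannot close $path: $!";
}

# Hypothetical record: 80 residues are printed as a 60-residue line and a 20-residue line
print format_fasta('example', 'hypothetical record', 'ACGT' x 20);
```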

Example: accessing a database (3)

Searches GenPept for the sequence AB077698 and prints it to STDOUT:


#!/usr/bin/perl -w
use strict;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::SeqIO;

my $db  = Bio::DB::GenPept->new();
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
my $acc = 'AB077698';
my $seq = $db->get_Seq_by_acc($acc);
if ($seq) {
    $out->write_seq($seq);
} else {
    print STDERR "cannot find seq for acc $acc\n";
}
$out->close();
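Classes such as Bio::DB::GenPept fetch records from NCBI over the network. As a sketch of that kind of request, assuming NCBI's Entrez E-utilities efetch service, the code below builds an efetch URL with the core HTTP::Tiny module. The helper names are hypothetical, the db value ('protein' for GenPept, 'nuccore' for nucleotide records) must match the kind of identifier requested, and the download function needs network access, so it is defined but not called here.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an efetch URL; an array ref keeps the parameters in this order
sub efetch_url {
    my ($db, $id, $rettype) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode(
        [ db => $db, id => $id, rettype => $rettype, retmode => 'text' ]
    );
    return "$EFETCH?$query";
}

# Download the record as text (requires network access; not called below)
sub fetch_record {
    my ($url) = @_;
    my $response = HTTP::Tiny->new->get($url);
    die "request failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return $response->{content};
}

print efetch_url('protein', 'AB077698', 'fasta'), "\n";
```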

Example: accessing a database (4)

Queries the NCBI Taxonomy database (requires XML::Twig to be installed):


#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;

my $db    = Bio::DB::Taxonomy->new(-source => 'entrez');
my $node1 = $db->get_Taxonomy_Node(-taxonid => '9606');
my $node2 = $db->get_Taxonomy_Node(-name => 'Homo sapiens');

my $pnode    = $node1->get_Parent_Node();
my $parentid = $node1->parent_id;
my @class    = $node1->classification;
$node1->name;
$node1->scientific_name;
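Methods such as get_Parent_Node, parent_id and classification all follow the parent links of the taxonomy tree. The core-Perl sketch below shows that walk on a small hypothetical table (the ids and names are made up, not NCBI data) and returns the lineage from the root down. BioPerl's own ordering and node objects may differ.

```perl
use strict;
use warnings;

# Hypothetical toy taxonomy (NOT real NCBI data): taxon id => [name, parent id]
my %toy_tax = (
    1 => ['root',      undef],
    2 => ['Group A',   1],
    3 => ['Genus B',   2],
    4 => ['Species C', 3],
);

# Follow parent links from $id up to the root; returns names root-first
sub lineage {
    my ($tax, $id) = @_;
    my (@names, %seen);
    while (defined $id && exists $tax->{$id}) {
        die "cycle detected at taxon $id\n" if $seen{$id}++;
        unshift @names, $tax->{$id}[0];
        $id = $tax->{$id}[1];        # parent id, undef at the root
    }
    return @names;
}

print join(' > ', lineage(\%toy_tax, 4)), "\n";   # root > Group A > Genus B > Species C
```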
