Lordito algoritmico: alcuni problemi algoritmici che hanno favorito il progresso scientifico....
-
Upload
damiano-marra -
Category
Documents
-
view
215 -
download
1
Transcript of Lordito algoritmico: alcuni problemi algoritmici che hanno favorito il progresso scientifico....
L’ordito algoritmico:alcuni problemi algoritmici che hanno favorito il
progresso scientifico.
Alberto PolicritiDipartimento di Matematica e Informatica,
Universita’ di Udine.
www.dimi.uniud.it/~policrit
Di cosa parleremo
• Classi di problemi (i problemi specifici richiedono un trattamento tecnico)
• Problemi “significativi” (che legano l’algoritmica ad altre discipline)
• Complessita’ (perche’, alla fine, e’ il vero problema dell’algoritmica)
Quali problemi
• Il problema della decisione (Entscheidungsproblem)
• Problemi algoritmici in biologia computazionale
• Una riflessione sulla nozione di complessita’
Passato Presente Futuro
Le fonti principali
• M. Davis “The Universal Computer: the road from Leibniz to
Turing”• S. Feferman “On the light of Logic”• E. Green “Strategies for the systematic sequencing of complex
genomes”• D. Knuth “Papers on the foundation of Computer Science”
Il problema della decisione
Trovare un algoritmo per decidere le formule
se una formula della logica al prim’ordine e’
soddisfacibile.
In a sense it [il problema della decisione] is the most general probem of mathematics.
J. Herbrand
La logica del prim’ordine
x‚y‚uxyx‚u vy ‚vv‚uSe x e y sono donne e x e’ felice con u, allora esiste v tale che y e’ felice con ve u e v sono amici
Se x e y sono punti e x e’ sulla retta u, allora esiste v tale che y e’ sulla retta v e u e v sono parallele
Esempio:
Esempio:
Un algoritmo per risolvere il problema della decisione ci potrebbe dire se l’ipotesi di Riemann (ottavo problema di Hilbert) e’ vera o falsa!
David Hilbert
D. Hilbert nel 1937
Born: 23 Jan 1862 in Königsberg, Prussia (now Kaliningrad, Russia)Died: 14 Feb 1943 in Göttingen, Germany
The Entscheidungsproblem is solved when we know a procedure that allows for any given logical expression to decide by finitely many operations its validity or satisfiability. [...] The Entscheidungsproblem must be considered the main problem of mathematical logic.
Principles of Mathematical LogicD. Hilbert and W. Ackermann 1928
Hilbert sapeva porre problemi!
The mathematicians present at an international conference in Paris in August 1900 inevitably wondered what the new century would bring to their subject. [...] he presented, as a challenge to the mathematicians of the twentieth century, 23 problems that seemed utterly inaccessible by the methods available at the time. The Universal
ComputerM. Davis
In his work, Hilbert demonstrated an unusual combination of direct intuition and concern for absolute rigor. With exceptional technical power at his command, he would tackle outstanding problems, usually with a great originality of approach.The title of Hilbert’s lecture in Paris was simply, “Mathematical problems”.
Deciding the undecidable: Wrestling with Hilbert’s problemsS. Feferman
The great importance of definite problems for the progress of mathematical science in general ... is undeniable. ... [for] as long as a branch of knowledge supplies a surplus of such problems, it maintains its vitality. ... every mathematician certainly shares ..the conviction that every mathematical problem is necessarily capable of strict resolution ... we hear within ourselves the constant cry: There is the problem, seek the solution. You can find it through pure thought...
D. Hilbert
1. L’ipotesi del continuo2. La consistenza dell’aritmetica10. L’esistenza di un algoritmo per risolvere le equazioni
diofantee
The solution of three of Hilbert’s problems were to involve mathematical logic and the foundation of mathematics in an essential way; they are the ones numbered 1,2, and 10 in his list
Non parleremo di 1. e 2. e’ il legame con il problema della decisione
Deciding the undecidable: Wrestling with Hilbert’s problemsS. Feferman
Esempio:
E’ possibile scrivere una equazione diofantea che ammette soluzioni intere se e solo se l’ipotesi di Riemann e’ falsa.
Equazioni diofantee: P(x1, ... , xk) = 0 con P polinomio a coefficienti interi
Contrary to Hilbert’s expectations, Problem 10 was eventually solved in the negative. This was accomplished in 1970 by a young russian mathematician, Yuri Matiyasevich, who built on earlier work in 1950’s and 1960’s by the American logicians Martin Davis, Hilary Putnam, and Julia Robinson. [...]
Il decimo problema di Hilbert
Gia’ nel 1920 si sospettava che problemi come il precedente fossero indecidibili. Ma come dimostrare che non esiste un
algoritmo??
Deciding the undecidable: Wrestling with Hilbert’s problemsS. Feferman
La soluzione del secondo problema: il simposio di
Könisberg del 1930During the days immediately preceding Hilbert’s address, a symposium on the foundations of mathematics took place in Königsberg. [...] At the round table discussion that concluded the event, a shy young man named Kurt Gödel [...] made a quiet announcement that, to those who grasped its import, signalled a new era in foundational studies. Von Neumann got the point at once, and concluded that the jig was up, that Hilbert’s program could not succeed.
The Universal Computer
M. Davis
Il programma di Hilbert
1. La consistenzaconsistenza dell’aritmetica (secondo problema di Hilbert)
2. La completezzacompletezza della logica e dell’aritmetica (Gödel 1928)
3. Il problema della decisione (Entscheidungsproblem)
Kurt Gödel
Born: 28 April 1906 in Brünn, Austria-Hungary (now Brno, Czech Republic)Died: 14 Jan 1978 in Princeton, New Jersey, USA
The crucial step in Gödel’s proof was his demonstration that the propertyof a natural number of being the code of a proposition provable in PM isitself expressible in PM.[...]- U says that some particular proposition is not provable in PM.- That particular proposition is none other than U itself.- Therefore, U says: “U is not provable in PM.”
Gödel aveva scritto il primo compilatore e ... decretato la fine del programma di Hilbert!
The Universal Computer
M. Davis
Cosa rimane del programma di Hilbert?
Hilbert had also sought explicit calculational procedures bymeans of which it would always be possible to determine, given some premises and a proposed conclusion, written in the notation of what has come to be called “first-order logic”, whether Frege’s rules would enable that conclusion to be derived from those premises. The task of finding such procedures came to be known as Hilbert’s Entscheidungsproblem (literally: decision problem),
The Universal Computer
M. Davis
C’erano risultati parziali e i granndi giovani matematici erano tutti attivi: F. P. Ramsey, W. Ackermann, P. Bernays , M. Shönfinkel e lo
stesso Gödel
Apparently intrigued by these developments, Newman gave a lecture course in the spring term of 1935 on the foundations of mathematics featuring Gödel’s incompleteness theorem as its climax. Attending this course, Turing learned about Hilbert’s Entscheidungsproblem. Quite apart from the incredulity of such as Hardy, after Gödel’s work it was hard to believe that there could be an algorithm such as Hilbert had wanted. Alan Turing began to think about how it could be possible to prove that no such algorithm exists.
The Universal Computer
M. Davis
Now, if someone comes along with a proposed algorithm to settle a given decision problem in a positive way, one can check to see that it does the required work (or at least try to do so), without inquiring into the general nature of what constitutes an algorithm. But if it is to be shown that the problem is undecidable, one has to have a precise explanation of what algorithms can compute in general.
Deciding the undecidable: Wrestling with Hilbert’s problemsS. Feferman
Alan Turinghttp://www.turing.org.uk/turing/
His high pitched voice already stood out above the general murmur of well-behaved junior executives grooming themselves for promotion within the Bell corporation. Then he was suddenly heard to say: "No, I'm not interested in developing a powerful brain. All I'm after is just a mediocre brain, something like the President of the American Telephone and Telegraph Company."
Quoted in A Hodges, Alan Turing the Enigma of Intelligence, (London 1983) 251.
[...] on the basis of Turing’s analysis of the notion of computation, it is possible to conclude that anything computable by any algorithmic process can be computed by a Turing machine. So if we can prove that some particular task can not be accomplished by a Turing machine, we can conclude that no algorithmic process can accomplish that task. That is how Turing proved that there is no algorithm for the Entscheidungsproblem. In addition, Turing showed how to produce one individual Turing machine that, all by itself, can do anything that could be done by any Turing machine whatever – a mathematical model of an all-purpose computer.
The Universal Computer
M. Davis
Il metodo diagonale nel lavoro di Turing
Now, if we think of the halting set of a Turing machine as constituting a “package” and of the code number of that machine as labeling that package, then we have exactly the typical setup for applying the diagonal method: labeled packages in which the labels are exactly the kind of thing in the packages – in this case, natural numbers.
The Universal Computer
M. Davis
La macchina universale di Turing
The universal machine also provides a model of a “stored program” computer [...] in which the machine makes no fundamental distinction between “program” and “data.” Finally, the universal machine shows how “hardware” [...] thought of as a description of the functioning of a mechanism, canbe replaced by equivalent “software” [...] “stored” on the tape of a universal machine.
On computable numbers with an application to the `Entscheidungsproblem’A. Turing Proc. of the London Mathematical Society 1937
The Universal Computer
M. Davis
Turing’s universal computer was a marvelous conceptual device that all by-itself could execute any algorithmic task. But could one actually build such a thing? And aside from what such a machine could accomplish “in principle,” could it be designed and constructed so as to be able to solve real world problems in an acceptable time frame, and using reasonable available resources?
By the end of 1945, Turing had produced his remarkable ACE (Automatic Computing Engine) Report. One detailed comparison of the ACE Report with von Neumann's EDVAC Report, notes that whereas the latter ``is a draft and is unfinished … more important … is incomplete …'' the ACE Report ``is a complete description of a computer, right down to the logical circuit diagrams'' and even including ``a cost estimate of £11,200.''
The Universal Computer
M. Davis
ACE: la risposta (inglese) di Turing ad Edvac
[It] is … very contrary to the line of development here, and much more in the American tradition of solving one's difficulties by means of much equipment rather than by thought.… Furthermore certain operations which we regard as more fundamental than addition and multiplication have been omitted.---------------------------------------------Alan Turing
Problemi algoritmici in biologia computazionale
Astronomy began when the Babylonians mapped the heavens. Our descendants will certainly not say that biology began with today’s genome projects, but they may well recognize that a great acceleration in the accumulation of biological knowledge began in our era. To make sense of this knowledge is a challenge, and will require increased understanding of the biology of cells and organisms. But part of the challenge is simply to organise, classify and parse the immense richness of sequence data.
Biological sequence analysisR. Durbin, S. Eddy, A. Krogh and G. Mitchinson
Un po’ di storia
• 1953: F. Crick e J. Watson scoprono la struttura a doppia elica del DNA
• anni ’70: si sviluppano le tecniche per il sequenziamento di spezzoni di DNA (F. Sanger)
• anni ’80: viene lanciato il progetto genoma e partono le prime sperimentazioni pilota (insieme alle prime compagnie per lo sfruttamento commerciale di queste ricerche)
• anni ’90: vengono sequenziati i primi organismi (qualche M di paia di basi)
• 1990: viene pubblicato BLAST
• 1998: C. Venter annuncia la costituzione della compagnia privata Celera e sfida il consorzio pubblico per il sequenziaemnto del genoma umano: Celera otterra’ il risultato in 3 anni (e 300 M di $)
http://www.accessexcellence.org/AB/
Human Genome Working Draft Sequencepublished February 15 & 16, 2001
Science and Nature
Clone-by-clone shotgun sequencing
Dietro la sfida:Two main shotgun-sequencing strategies.
Whole-genome shotgun sequencing
Programmi e algoritmi in bioinformatica
[...] Yet other programs provide user-friendly viewers for inspection and editing of the resulting sequence assemblies. A particularly popular suite of programs for these various steps is Phred, Phrap and Consed,which are designed for base calling, sequence assembly and the viewing of sequence assemblies, respectively. [...]
(21 occorrenze della parola “programs” 2 della parola “algorithms”)
Strategies for the systematic sequencing of complex genomesEric D. Green
Programmi e algoritmi nella sfida
Finally, perhaps the most essential element of any whole-genome shotgun-sequencing strategy is the availability of a robust assembly program that can accommodate the inevitably large collection of sequence reads. [...] include algorithms that account for the anticipated spatial relationship of read pairs emanating from individual subclones, which help to avoid misassemblies due to repetitive sequences.
Strategies for the systematic sequencing of complex genomesEric D. Green
Com’e’ finita la sfida?
Among the most useful computer-based tools in modern biology are those that involve sequence alignments of proteins, since these alignements oftem provide insights into gene and protein function. There are several types of alignments: global alignments of pairs of proteins, multiple alignments of members of protein families, and alignments made diring data base searches to detect homologies.
S. Henikoff and J.G.Henikoff PNAS 1992
L’allineamento di sequenze
GTTGAT_TAGCTTATCCCAAAGCAAGGCACTGAAAATG_CTAGATGT_GATGTAGCTTAACCCAA_GCAAGGCACTAAAAATGCCTAGAT
Input:
GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGATGTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT
Output:
Cos’e’ un allineamento?
Algoritmi
• Needelman-Wunsh 1970• Smith –Waterman 1981• Landau-Vishkin 1986• Wu-Manber 1992• Myers 1994• Chang-Lawler 1994• ...
G T T G A T T A G C T T A
G 0 1 2 3 4 5 6 7 8 9 10 11 12
T 1 0 1 2 3
G 2 1 1 1 2
A 3 2 2 2 1
T 4 3 2
G 5 4 3
T 6 5 4
A 7 6 5
GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGATGTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT
GTTGAT_TAGCTTATCCCAAAGCAAGGCACTGAAAATG_CTAGATGT_GATGTAGCTTAACCCAA_GCAAGGCACTAAAAATGCCTAGAT
Altri problemi algoritmici correlati
• exact-matching (un problema piu’ “vecchio” e forse meno “applicativo”, gli algoritmi per la cui soluzione si sono rivelati fondamentali)
• strutture dati (non conviene rappresentare in memoria sequenze come stringhe ma come sistemi di indici per tutti i possibili suffissi della sequenza)
• protein folding (un bel problema NP-completo che ci hanno regalato i biologi)
• ...
Riflessioni conclusive
• Il problema della decisione poteva essere difficile ma era enunciato in modo chiaro e preciso. Matematicamente “pulito”.
• I problemi algoritmici in biologia computazionale non sono sempre altrettanto “puliti” (forse, piu’ sono interessanti e piu’ sono “sporchi”).
• In cosa consiste veramente la complessita’ di un problema algoritmico?
Complessita’: le risorse che abbiamo sono finite
Advances in our ability to compute are bringing us substantially closer to ultimate limitations
D. Knuth
Mathematics and Computer Science: Coping with Finiteness
Che risorse (computazionali) abbiamo?
40 miliardi di anni luce
10-13 cm
Universo
protone
10125
(maggiore o uguale al) numero di protoni nell’universo
Se assumiamo una unita’ di tempo pari al tempo necessario alla luce a viaggiare per 10-13 cm e assumiamo che l’universo sia nato 10 milioni di anni fa, il numero di
unita’ di tempo trascorse e’ minore o uguale a
1042
Che “speranze” abbiamo
• snail 0.0006 miles/h
• man 4 miles/h
• US auto 55 miles/h
• Jet 600 miles/h
• Supersonic jet 1200 miles/h
• man (pencil) 0.2/sec
• man (abacus) 1/sec
• calculator 4/sec
• computer 200.000/sec
• fast computer 2M/sec
start
finish
Grid problem: calcolare il numero di cammini da start a finish
Il problema e’ difficile
• non ci sono metodi noti per calcolare il numero di cammini (in a reasonable amount of time)
• possiamo comunque generare dei cammini random e usare un teorema di statistica che ci dice che la stima migliore e’ data dalla media dei reciproci delle probabilita’ osservate
• otteniamo una stima enorme: (1.6 ± 0.3) 1024
il problema di stabilire una (qualunque) proprieta’ dei cammini sulla griglia e’ algoritmicamente trattabile?
non possiamo contare nemmeno su una procedura esaustiva per enumerare i cammini!
Forse abbiamo bisogno di una teoria della complessita’ algoritmica che ci permetta di
classificare questo come un problema difficile
Un problema semplice (da enunciare) e “pulito”, ma ...
Conclusioni
I problemi algoritmici costituiscono l’ossatura dell’informatica e le loro soluzioni richiedono uno sforzo (matematico) genuino e particolare
I problemi algoritmici si sono rivelati essere “dietro la scena” in momenti cruciali dell’avanzamento scientifico
La complessita’ ed una teoria adeguata per il suo studio e’ probabilmente la piu’ interessante delle attuali sfide algoritmiche
My favorite way to describe computer science is to say that it is the study of algorithms.
D.Knuth