HUMANITIES COMPUTING D: LEXICOGRAPHY & COMPUTERS. Electronic dictionaries. WordNet.

Post on 01-May-2015

HUMANITIES COMPUTING D: LEXICOGRAPHY & COMPUTERS

Electronic dictionaries

WordNet

Electronic dictionaries

Computer tools are no longer used only to produce print dictionaries, but also to develop new kinds of dictionaries that support new forms of search

ENGLISH DICTIONARIES IN ELECTRONIC FORM

Oxford English Dictionary, second edition
Oxford Talking Dictionary
Concise Oxford Dictionary
Learner dictionaries:
Longman Dictionary of Contemporary English (LDOCE)
Collins COBUILD English Dictionary

CONCISE OXFORD DICTIONARY

SEARCH:
Headword search (with * as a wildcard)
Hypertext search
Full-text search (also of phrases / groups)

FILTERS: etymology, phrasal verbs, suffixes
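A headword search with * can be pictured as simple pattern matching over the list of headwords. The sketch below is only an illustration: the headword list and the function name `headword_search` are hypothetical, and it says nothing about how the Concise Oxford Dictionary software is actually implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical headword list; a real dictionary would supply its own index.
HEADWORDS = ["stock", "stockade", "stockbroker", "restock", "livestock", "store"]


def headword_search(pattern, headwords=HEADWORDS):
    """Return the headwords matching a pattern in which * stands for any run of characters."""
    # fnmatchcase also gives ? and [...] a special meaning; only * is used here.
    return [hw for hw in headwords if fnmatchcase(hw, pattern)]


print(headword_search("stock*"))  # headwords beginning with "stock"
print(headword_search("*stock"))  # headwords ending in "stock"
```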

COLLINS: COBUILD

Available from: http://www.biblio.unitn.it/BancheDati/BancheDati.asp

ELECTRONIC DICTIONARIES FOR ITALIAN

Il VELI
Zanichelli: CD-ROM Multilingue, Scaffale Elettronico
Devoto-Oli
Garzanti: IPA, `talks'

DEVOTO-OLI

EXAMPLE: DEVOTO-OLI

Normal search
Citation forms (incremental)
Hyperlinks
Definition / inflection
Synonyms / antonyms
Advanced search
No: pronunciation; quotations?
Limited: historical information

DEVOTO-OLI: SYNONYMS AND ANTONYMS

EXAMPLE: ZINGARELLI INTERATTIVO

MRDs (MACHINE-READABLE DICTIONARIES)

Important distinction:
Dictionaries that can be consulted electronically
MACHINE READABLE dictionaries
MACHINE TRACTABLE dictionaries

Particularly useful: dictionaries created for EFL:
LDOCE
COBUILD

A more ambitious project: ODE in XML

EXAMPLE: ODE on CD-ROM (in XML)

An example of a lexicographic database in XML (= extremely machine tractable)

ODE IN XML: OVERVIEW

ODE IN XML: ENTRY FORMAT

<se>
  <cn>815750</cn>
  <hg>
    <hw>stock</hw>
  </hg>
  <s1>
    <ps>noun</ps>
    <s2 num="1">
      <df>the goods or merchandise kept on the premises of a shop or warehouse and available for sale or distribution:</df>
      <ex>the store has a very low turnover of stock</ex>
    </s2>
    <s2 num="2"> …… </s2>
  </s1>
  <s1>
    <ps>adjective</ps>
    …..
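To see why such a format counts as machine tractable, here is a minimal sketch that reads an entry of this shape with Python's standard `xml.etree.ElementTree`. The sample string contains only the first sense shown on the slide and closes the tags that the slide leaves truncated; that closing, and the function name `read_entry`, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# First sense from the slide; the closing tags are added here (assumption)
# because the slide excerpt is truncated.
ENTRY = """
<se>
  <cn>815750</cn>
  <hg><hw>stock</hw></hg>
  <s1>
    <ps>noun</ps>
    <s2 num="1">
      <df>the goods or merchandise kept on the premises of a shop or warehouse and available for sale or distribution:</df>
      <ex>the store has a very low turnover of stock</ex>
    </s2>
  </s1>
</se>
"""


def read_entry(xml_text):
    """Return the headword and a list of senses (part of speech, number, definition, example)."""
    root = ET.fromstring(xml_text)
    headword = root.findtext("hg/hw")
    senses = []
    for s1 in root.findall("s1"):          # one <s1> block per part of speech
        pos = s1.findtext("ps")
        for s2 in s1.findall("s2"):        # one <s2> block per numbered sense
            senses.append({
                "pos": pos,
                "num": s2.get("num"),
                "definition": s2.findtext("df"),
                "example": s2.findtext("ex"),
            })
    return headword, senses


headword, senses = read_entry(ENTRY)
print(headword, senses[0]["pos"], senses[0]["num"])
```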

ODE IN XML: NLP INFORMATION

<nlp>
  <sup>merchandise</sup>
  <ss>Commerce</ss>
  <morph id="01">
    <mu sy="NN">
      <inf>stock</inf>
      <ph>stQk</ph>
    </mu>
    <mu sy="NNS">
      <ph>stQks</ph>
    </mu>
  </morph>
</nlp>
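The NLP block can be read in the same way. The sketch below collects one tuple per `<mu>` unit; it assumes, from the values on the slide only, that `sy` holds a part-of-speech tag, `inf` an inflected form and `ph` a phonetic transcription. The function name `morph_units` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <nlp> block as shown on the slide.
NLP = """
<nlp>
  <sup>merchandise</sup>
  <ss>Commerce</ss>
  <morph id="01">
    <mu sy="NN"><inf>stock</inf><ph>stQk</ph></mu>
    <mu sy="NNS"><ph>stQks</ph></mu>
  </morph>
</nlp>
"""


def morph_units(xml_text):
    """Return (sy, inf, ph) for every <mu> element; missing children come back as None."""
    root = ET.fromstring(xml_text)
    return [(mu.get("sy"), mu.findtext("inf"), mu.findtext("ph")) for mu in root.iter("mu")]


for unit in morph_units(NLP):
    print(unit)
```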

ELDIT

(Elektronisches Lern(er)wörterbuch Deutsch-Italienisch: electronic learner's dictionary for Italian and German)

An example of a dictionary that is:
For learners
Born in electronic form

Lecture on ELDIT: 14 May

WordNet

SEMANTICS & LEXICON: A SUMMARY

[Diagram with three columns: WORD-FORMS, LEXEMES, SENSES]
WORD-FORMS: “eat”, “eats”, “ate”, “eaten”
LEXEMES: EAT-LEX-1
SENSES: eat0600, eat0700
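The three levels can be sketched as two lookup tables, one from word-forms to lexemes and one from lexemes to senses. The identifiers are the ones on the slide; the assumption that both senses belong to EAT-LEX-1, and the function name `senses_of`, are made here for illustration.

```python
# Word-form -> lexemes (every inflected form of "eat" points to the same lexeme).
FORM_TO_LEXEMES = {
    "eat": ["EAT-LEX-1"],
    "eats": ["EAT-LEX-1"],
    "ate": ["EAT-LEX-1"],
    "eaten": ["EAT-LEX-1"],
}

# Lexeme -> senses (assumption: both senses on the slide belong to EAT-LEX-1).
LEXEME_TO_SENSES = {
    "EAT-LEX-1": ["eat0600", "eat0700"],
}


def senses_of(form):
    """Return every sense reachable from a word-form by way of its lexemes."""
    senses = []
    for lexeme in FORM_TO_LEXEMES.get(form, []):
        senses.extend(LEXEME_TO_SENSES.get(lexeme, []))
    return senses


print(senses_of("ate"))
```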

THE ORGANIZATION OF THE LEXICON

[Diagram with three columns: WORD-FORMS, LEXEMES, SENSES]
WORD-FORMS: “stock”
LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3
SENSES: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000

SYNONYMY

[Diagram with three columns: WORD-FORMS, LEXEMES, SENSES]
WORD-FORMS: “cheap”, “inexpensive”
LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3
SENSES: cheap0100, …., cheapXXXX, inexp0900, inexpYYYY

WORDNET

A lexical database created at Princeton
Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/
Information about a variety of SEMANTIC RELATIONS
Three sub-databases (a division supported by psychological research as early as Fillenbaum and Jones, 1965):
NOUNS
VERBS
ADJECTIVES and ADVERBS
Each database is organized around SYNSETS

SYNSETS

Senses (or `lexicalized concepts') are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET

E.g., {chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug} (gloss: person who is gullible and easy to take advantage of)
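As a data structure, a synset is just a set of words plus a gloss. The following is a minimal sketch, not WordNet's actual storage format; the class `Synset` and the function `synonyms` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A lexicalized concept: the words that can express it, plus a gloss."""
    words: frozenset
    gloss: str


# The example synset from the slide.
GULLIBLE_PERSON = Synset(
    words=frozenset({
        "chump", "fish", "fool", "gull", "mark", "patsy",
        "fall guy", "sucker", "shlemiel", "soft touch", "mug",
    }),
    gloss="person who is gullible and easy to take advantage of",
)


def synonyms(word, synsets):
    """Collect every other word that shares at least one synset with `word`."""
    found = set()
    for synset in synsets:
        if word in synset.words:
            found |= synset.words - {word}
    return found


print(sorted(synonyms("fool", [GULLIBLE_PERSON])))
```

Because a word such as "fish" or "mark" has other senses too, a full lexicon would list it in several synsets, and `synonyms` would then return the union over all of them.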

THE NOUN DATABASE

About 90,000 forms, 116,000 senses
Relations:

hypernym      breakfast -> meal
hyponym       meal -> lunch
has-member    faculty -> professor
member-of     copilot -> crew
has-part      table -> leg
part-of       course -> meal
antonym       leader -> follower
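The relations above come in inverse pairs (hypernym/hyponym, has-member/member-of, has-part/part-of), and antonymy is symmetric. A minimal sketch of how the listed examples could be stored as triples and closed under inversion follows; it operates on words rather than senses for brevity, and the names `FACTS`, `with_inverses` and `related` are hypothetical.

```python
# Each relation mapped to its inverse; antonymy is its own inverse.
INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",
    "has-member": "member-of", "member-of": "has-member",
    "has-part": "part-of", "part-of": "has-part",
    "antonym": "antonym",
}

# The example pairs from the slide, as (relation, source, target) triples.
FACTS = [
    ("hypernym", "breakfast", "meal"),
    ("hyponym", "meal", "lunch"),
    ("has-member", "faculty", "professor"),
    ("member-of", "copilot", "crew"),
    ("has-part", "table", "leg"),
    ("part-of", "course", "meal"),
    ("antonym", "leader", "follower"),
]


def with_inverses(facts):
    """Return the facts together with the inverse of each one."""
    closed = set(facts)
    for relation, source, target in facts:
        closed.add((INVERSE[relation], target, source))
    return closed


def related(word, relation, facts):
    """List the targets linked to `word` by `relation`, in alphabetical order."""
    return sorted(t for r, s, t in facts if r == relation and s == word)


ALL_FACTS = with_inverses(FACTS)
print(related("meal", "hyponym", ALL_FACTS))
print(related("meal", "has-part", ALL_FACTS))
```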

HYPERNYMY

2 senses of robin

Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
    => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
        => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
            => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
                => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
                    => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
                        => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
                            => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                                => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                                    => living thing, animate thing -- (a living (or once living) entity)
                                        => object, physical object --
                                            => entity, physical thing --
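The hypernym listing is a chain of IS-A links that can be followed upward until the root is reached. The sketch below hard-codes the chain for sense 1 of robin, using the first word of each synset as its label; it assumes a single hypernym per node, which is a simplification, and the names `HYPERNYM`, `hypernym_chain` and `is_a` are hypothetical.

```python
# Child -> parent links copied from the listing for sense 1 of "robin".
HYPERNYM = {
    "robin": "thrush",
    "thrush": "oscine",
    "oscine": "passerine",
    "passerine": "bird",
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "entity",
}


def hypernym_chain(word):
    """Follow hypernym links from `word` up to the root and return the whole path."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(word, ancestor):
    """True if `ancestor` appears somewhere above `word` in the chain."""
    return ancestor in hypernym_chain(word)[1:]


print(" => ".join(hypernym_chain("robin")))
print(is_a("robin", "bird"), is_a("bird", "robin"))
```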

MERONYMY

wn beak -holon

Holonyms of noun beak

1 of 3 senses of beak

Sense 2

beak, bill, neb, nib

PART OF: bird

VERBS

About 10,000 forms, 20,000 senses
Relations between verb meanings:

hypernym    fly -> travel
troponym    walk -> stroll
entails     snore -> sleep
antonym     increase -> decrease

RELATIONS BETWEEN VERB MEANINGS

V1 ENTAILS V2 when "Someone V1" (logically) entails "Someone V2", e.g., snore entails sleep

TROPONYMY holds when "to do V1" is "to do V2 in some manner", e.g., limp is a troponym of walk

ADJECTIVES & ADVERBS

About 20,000 adjective forms, 30,000 senses
4,000 adverbs, 5,600 senses
Relations:

antonym (adjective)    heavy <-> light
antonym (adverb)       quickly <-> slowly

HOW TO USE IT

Online: http://cogsci.princeton.edu/cgi-bin/webwn

Download it, then from the command line (the search word comes before the option):
Get synonyms: wn bank -synsn
Get hypernyms: wn robin -hypen
Also for adjectives and verbs, e.g. get antonyms: wn right -antsa

THE LIMITS OF WORDNET

Coverage: words not in WordNet
Crocidolite, spinoff (spin-off)
Missing information: MERONYMY
Context-dependent senses: slump, crash, bust are all synonyms in the WSJ corpus
The structure of WordNet: some information is encoded in complex ways (room, wall, floor)
But: MOVING TARGET!!

MERONYMY IN WORDNET: AN EXPERIMENT

100 bridging descriptions (BDs) in a mereological (part-whole) relation

Ran a script that tried to find a direct link in WordNet (1.7) between one of the senses of the BD and one of the senses of any of the previous NPs

Results: in only 6 cases does WordNet contain a direct lexical relation between a BD and one of the CFs

John looked at the HOUSE. The WALL was crumbling.

[Diagram: fragment of the WordNet noun hierarchy with the nodes ARTIFACT, HOUSING, BUILDING, HOUSE, HOME, ROOM, WALL and FLOOR, connected by IS-A and PART-OF links]
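The kind of check described in the experiment can be sketched as follows. This is not the original script: it runs on a toy table with hypothetical sense identifiers, built from the part-of examples on the noun database slide, and the names `SENSES`, `PART_OF` and `directly_linked` are assumptions made for illustration.

```python
# Hypothetical sense inventory: word -> sense identifiers (toy data, not WordNet).
SENSES = {
    "leg": ["leg#1", "leg#2"],
    "table": ["table#1", "table#2"],
    "course": ["course#1"],
    "meal": ["meal#1"],
    "wall": ["wall#1"],
    "house": ["house#1"],
}

# Toy part-of facts between senses, written as (part, whole).
PART_OF = {("leg#1", "table#1"), ("course#1", "meal#1")}


def directly_linked(bd, previous_nps):
    """True if some sense of the bridging description has a direct part-of link,
    in either direction, to some sense of any of the previous noun phrases."""
    for np in previous_nps:
        for bd_sense in SENSES.get(bd, []):
            for np_sense in SENSES.get(np, []):
                if (bd_sense, np_sense) in PART_OF or (np_sense, bd_sense) in PART_OF:
                    return True
    return False


print(directly_linked("leg", ["John", "table"]))   # a direct link exists in the toy table
print(directly_linked("wall", ["John", "house"]))  # no direct link in the toy table
```

The second call fails only because the toy table has no entry relating wall and house; a relation that WordNet encodes through intermediate nodes would be missed by a direct-link test in the same way.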

SOLUTION: LEXICAL ACQUISITION

Partial (add information to WordNet, especially for specialized domains)

Total (build a new lexicon from scratch)

READINGS

Jackson, ch. 6.7
Marello, ch. 5.5
C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998, ch. 1