Unstructured Documents on the Web and Semantics

Dott. Matteo Palmonari

palmonari@disco.unimib.it

Adapted from Atzeni et al., Basi di Dati, McGraw-Hill

Semantics in relational databases

• The semantics of a database is defined in accordance with the relational structure (relational algebra) and is determined, with respect to its constituent elements (values, tuples, relations), essentially by:

• Schema level:
– logical schema: defines the macro-organization of the representation of a domain
– integrity constraints: define detailed relational constraints between specific represented objects and facts

• Instance level:
– set of instances: constitutes the set of objects and facts actually represented as true in the database

Semantics, Schemas and Instances

• The schema of a database defines the general rules that every set of instances must comply with (in order to be considered valid); it is in these terms that the schema constitutes a fundamental part of the semantics of a database

• Such rules (for example, integrity constraints)
– support the querying of databases (checking whether certain facts hold or do not hold in the database)
– make it possible to check the validity of the schema
– do not make it possible to deduce new knowledge
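A minimal sketch of how a schema with integrity constraints accepts or rejects instances, using Python's standard `sqlite3` module. The tables `department` and `employee`, their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE department (
            name TEXT PRIMARY KEY
        );
        CREATE TABLE employee (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            salary INTEGER CHECK (salary >= 0),
            dept   TEXT NOT NULL REFERENCES department(name)
        );
    """)
    return conn

def try_insert(conn, row):
    """Return True if the row satisfies the schema, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (id, name, salary, dept) VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_db()
conn.execute("INSERT INTO department VALUES ('Research')")

valid = try_insert(conn, (1, "Rossi", 30000, "Research"))     # satisfies every constraint
bad_salary = try_insert(conn, (2, "Bianchi", -5, "Research"))  # violates the CHECK constraint
bad_dept = try_insert(conn, (3, "Verdi", 20000, "Sales"))      # violates the foreign key
print(valid, bad_salary, bad_dept)  # True False False
```

The constraints only decide whether a set of instances is valid; nothing in the database derives a fact that was not inserted.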

Queries in the relational model

• The most widespread query language for databases is SQL (Structured Query Language)

• Rather weak reasoning

SQL Query Example

• SQL (basic/select)

• Principle: satisfaction/correctness

• Mechanism/semantics: relational algebra
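A small sketch of a basic SELECT and of the same query read as a selection followed by a projection, again with Python's `sqlite3`. The `employee` table and its rows are hypothetical.

```python
import sqlite3

rows = [
    ("Rossi", "Research", 30000),
    ("Bianchi", "Sales", 25000),
    ("Verdi", "Research", 40000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# The answer contains exactly the tuples that satisfy the condition.
sql_result = conn.execute(
    "SELECT name FROM employee WHERE dept = 'Research' AND salary > 35000"
).fetchall()

# Same query as relational algebra: projection on name of a selection on employee.
algebra_result = [
    (name,)
    for (name, dept, salary) in rows
    if dept == "Research" and salary > 35000
]

print(sql_result)  # [('Verdi',)]
```

Whether a tuple belongs to the answer can be verified formally: it either satisfies the condition or it does not.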

Examples of other data models

The Object-Oriented Data Model

Objects/id

Attributes

Methods

Classes

Class Hierarchies

At the basis of Java, C++, etc.

Object-Oriented Schema (Example)
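A minimal sketch of an object-oriented schema written as Python classes. The classes `Person` and `Student`, their attributes, and the sample values are hypothetical; they only illustrate objects with identity, attributes, methods, classes, and a class hierarchy.

```python
class Person:
    """A class: its objects have an identity and attributes."""

    def __init__(self, name, birth_year):
        self.name = name
        self.birth_year = birth_year

    def age_in(self, year):
        """A method: behaviour attached to the class."""
        return year - self.birth_year


class Student(Person):
    """A subclass: inherits attributes and methods from Person."""

    def __init__(self, name, birth_year, university):
        super().__init__(name, birth_year)
        self.university = university


alice = Student("Alice", 1990, "Example University")
same_values = Student("Alice", 1990, "Example University")

print(alice.age_in(2010))        # 20
print(isinstance(alice, Person))  # True
print(alice is same_values)       # False: equal attribute values, distinct object identities
```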

What happens on the Web?

• Are there still schemas and instances?

• What kinds of queries can be made?

• What kinds of reasoning can be performed?

Web page (Web 1.0)

Information Retrieval

The information retrieval system has to deal with the following tasks…

Micro-Introduction to Information Retrieval & Search Engines

Slides and material from

Karl Aberer

EPFL-IC, Laboratoire de systèmes d'informations répartis

Information Retrieval – Document Model

• Generating structured representations of information items: this process is called feature extraction and can include simple tasks, such as extracting words from a text as well as complex methods, e.g. for image or video analysis.
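A minimal sketch of the simplest kind of feature extraction mentioned above, extracting words from a text. The tokenization rule (lowercasing and keeping runs of letters and digits) is an assumption made for illustration; real systems typically add steps such as stop-word removal or stemming.

```python
import re
from collections import Counter

def extract_terms(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_frequencies(text):
    """Structured representation of a document: term -> number of occurrences."""
    return Counter(extract_terms(text))

features = term_frequencies("Web search engines index Web documents.")
print(features["web"])   # 2
print(sorted(features))  # ['documents', 'engines', 'index', 'search', 'web']
```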

Information Retrieval – Query Model

• Generating structured representations of information needs: often this task is solved by providing users with a query language and leaving the formulation of structured queries to them. This is the case, for example, for simple keyword-based query languages, as used in Web search engines. Some information retrieval systems also support the user in the query formulation, e.g. through visual interfaces.

Information Retrieval – Matching Model

• Matching of information needs with information items: this is the algorithmic task of computing the similarity between information items and the information need, and it constitutes the heart of the information retrieval model. Similarity of the structured representations is used to model relevance of information for users. As a result, a selection of relevant information items or a ranked result can be presented to the user.

Information Retrieval – Efficiency

Since information retrieval systems usually deal with large information collections and/or large user communities, the efficiency of an information retrieval system is crucial. This imposes fundamental constraints on the retrieval model. Retrieval models that would capture relevance very well but are prohibitively expensive to compute are not suitable for an information retrieval system.

Text Retrieval (search engines)

The currently most popular information retrieval systems are Web search engines. To a large degree, they are text retrieval systems, since they exploit only the textual content of Web documents for retrieval. However, more recently Web search engines have also started to exploit link information (e.g. Google’s PageRank) and even image information. The three tasks of a Web search engine for retrieval are:

1. extracting the textual features, which are the words or terms that occur in the documents. We assume that the web search engine has already collected the documents from the Web using a Web crawler.

2. supporting the formulation of textual queries. This is usually done by allowing the entry of keywords through Web forms.

3. computing the similarity of documents with the query and producing from that a ranked result. Here Web search engines use standard text retrieval methods, such as Boolean retrieval and vector space retrieval.
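The three tasks above can be sketched for Boolean retrieval with an inverted index. This is a minimal illustration, assuming a tiny hypothetical collection and a conjunctive (AND) interpretation of keyword queries; Boolean retrieval returns an unranked set of matching documents.

```python
import re
from collections import defaultdict

# Hypothetical documents, assumed to be already collected by a crawler.
docs = {
    1: "web search engines rank web pages",
    2: "databases answer structured queries",
    3: "search engines crawl the web",
}

def tokenize(text):
    """Task 1: extract the terms that occur in a text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def boolean_and(index, query):
    """Tasks 2 and 3: take a keyword query and return documents containing every keyword."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

index = build_index(docs)
hits = boolean_and(index, "web search")
print(sorted(hits))  # [1, 3]
```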

The Retrieval Model

• Determines
– the structure of the document representation
– the structure of the query representation
– the similarity matching function

• Relevance
– determined by the similarity matching function
– should reflect right topic, user needs, authority, recency
– no objective measure

• Quality of a retrieval model depends on how well it matches user needs!

• Comparison to database querying
– correct evaluation of a class of query language expressions
– can be used to implement a retrieval model

The Retrieval Model

• The heart of an information retrieval system is its retrieval model. The model is used to capture the meaning of documents and queries, and to determine from that the relevance of documents with respect to queries. Although there exist a number of intuitive notions of what determines relevance, one must keep clearly in mind that it is not an objective measure. The quality of a retrieval system can in principle only be determined through the degree of satisfaction of its users. This is fundamentally different from database querying, where there exists a formally verifiable criterion for the task to be performed: whether a result set retrieved from a database matches the conditions specified in a query.

The Vector Space Model

Example

The document model

• the structure of the document representation
– Term-document matrix
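A minimal sketch of a term-document matrix, assuming raw term counts as weights and whitespace tokenization. The three documents are hypothetical.

```python
docs = ["web search engines", "search the web of data", "relational data"]

def term_document_matrix(documents):
    """Return (vocabulary, matrix): one row per term, one column per document."""
    tokenized = [d.lower().split() for d in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = [[tokens.count(term) for tokens in tokenized] for term in vocabulary]
    return vocabulary, matrix

vocabulary, matrix = term_document_matrix(docs)
for term, row in zip(vocabulary, matrix):
    print(f"{term:12}{row}")
```

Each column is the vector that represents one document; each row shows in which documents a term occurs.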

Example Vector-Space Retrieval
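A minimal sketch of vector-space retrieval: documents and the query are represented as term vectors and ranked by cosine similarity. Using raw term frequencies as weights is an assumption made for brevity (weighting schemes such as tf-idf are common in practice), and the documents are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a text as a sparse vector: term -> frequency."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Return (score, document) pairs sorted by decreasing similarity to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = ["web search engines", "search the web of data", "relational data"]
ranking = rank("web search", docs)
for score, doc in ranking:
    print(f"{score:.3f}  {doc}")
```

Unlike Boolean retrieval, every document receives a score, so the result is a ranked list rather than a set.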

Emanuele Della Valle - http://applied-semantic-web.org

Introduction

What does Google “understand”?

Understanding that
• [page1] links [page2] implies page2 is interesting

Google is able to rank results!
• “The heart of our software is PageRank™, a system for ranking web pages […] (that) relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value.”

http://www.google.com/technology/
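A minimal sketch of the idea of using link structure as an indicator of a page's value, following the simplified textbook formulation of PageRank computed by power iteration. It is not Google's production system; the damping factor 0.85, the number of iterations, and the three-page link graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical Web graph: page1 and page3 both link to page2.
links = {"page1": ["page2"], "page3": ["page2"], "page2": ["page1"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph page2 ends up with the highest score because two pages link to it, while page3, which no page links to, gets the lowest.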

Web pages: query model

• Principle: “relevance” (relevant documents)

• Mechanism/semantics:
– Content: indexing (feature extraction): keywords + text
– Provenance: PageRank
– …

• Syntactic mechanisms and mechanisms based on trust criteria

Directories

File system

Directories

Blogs by topics

Directories: query model

• Principle: “membership in the directory” (correctness)

• Mechanism/semantics:
– Content: documents belonging to the categories / containment relation between categories

• Mechanisms for the hierarchical organization of information
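A minimal sketch of the directory query model: a document answers a query for a directory if its category is that directory or is contained in it. The category tree and the document assignments are hypothetical.

```python
# Hypothetical category tree: category -> parent category (None for the root).
parent = {
    "Computer Science": None,
    "Semantic Web": "Computer Science",
    "Databases": "Computer Science",
}

# Hypothetical assignment of documents to categories.
category_of = {
    "doc_rdf": "Semantic Web",
    "doc_sql": "Databases",
    "doc_intro": "Computer Science",
}

def is_contained_in(category, ancestor):
    """True if category equals ancestor or lies below it in the hierarchy."""
    while category is not None:
        if category == ancestor:
            return True
        category = parent.get(category)
    return False

def documents_in(directory):
    """All documents that belong to the directory or to one of its sub-directories."""
    return sorted(
        doc for doc, cat in category_of.items() if is_contained_in(cat, directory)
    )

print(documents_in("Computer Science"))  # ['doc_intro', 'doc_rdf', 'doc_sql']
print(documents_in("Databases"))         # ['doc_sql']
```

As with database queries, the answer is either correct or not: membership is decided by the hierarchy, with no notion of relevance or ranking.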

Metadata / Tag / Folksonomies

• Metadata: data that describe other data/documents
– E.g. creator, author, last modified, etc.
– E.g. content, characteristics, etc.

• Attribute-value metadata systems
– E.g. creator: Matteo Palmonari
– Often systematized in standards or de facto standards

• E.g. Dublin Core (generic metadata)
– http://dublincore.org/

• E.g. MPEG-7 (audio/video)
– http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm

• Tag systems
– E.g. Travel, Malaysia, Sea

Metadata / Attribute-Value

File system

Metadata / Attribute-Value

File system / Images

Metadata / Attribute-Value

Web / Images

Metadata & Search

Web / Images

Metadata & Search

Web / Images / Search: ‘where=malaysia’
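A minimal sketch of search over attribute-value metadata, answering a query of the form `where=malaysia`. The image records, the file names, and the case-insensitive exact-match rule are assumptions made for illustration.

```python
# Hypothetical image collection, each item described by attribute-value metadata.
images = [
    {"file": "beach.jpg", "creator": "Matteo Palmonari", "where": "Malaysia"},
    {"file": "duomo.jpg", "creator": "Matteo Palmonari", "where": "Milan"},
    {"file": "untitled.jpg"},
]

def parse_query(query):
    """Parse 'attribute=value' into an (attribute, value) pair."""
    attribute, _, value = query.partition("=")
    return attribute.strip(), value.strip()

def search(items, query):
    """Return the files whose metadata has the given value for the given attribute."""
    attribute, value = parse_query(query)
    return [
        item["file"]
        for item in items
        if attribute in item and str(item[attribute]).lower() == value.lower()
    ]

result = search(images, "where=malaysia")
print(result)  # ['beach.jpg']
```

Items without the queried attribute are never returned, however relevant their content might be.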

Folksonomies

• Tagging systems
– Tags (sort of concepts) associated with pieces of information

• E.g. blog posts, videos, pictures

Folksonomies & Search
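A minimal sketch of search in a tagging system: users freely attach tags to resources, and a search returns the resources that carry all the requested tags. The users, resources, and tag assignments are hypothetical.

```python
from collections import defaultdict

# Hypothetical tag assignments: (user, resource, tag).
assignments = [
    ("anna", "post1", "travel"),
    ("anna", "post1", "malaysia"),
    ("luca", "post1", "sea"),
    ("luca", "photo7", "malaysia"),
    ("luca", "photo7", "sea"),
    ("anna", "video3", "travel"),
]

def build_tag_index(triples):
    """Index: tag -> set of resources carrying that tag, whoever assigned it."""
    index = defaultdict(set)
    for _user, resource, tag in triples:
        index[tag.lower()].add(resource)
    return index

def search_by_tags(index, tags):
    """Resources that carry every requested tag."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*sets) if sets else set()

tag_index = build_tag_index(assignments)
print(sorted(search_by_tags(tag_index, ["malaysia", "sea"])))  # ['photo7', 'post1']
```

There is no predefined schema or hierarchy here: the vocabulary is whatever tags the users happened to choose.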
