
PROGETTAZIONE E PRODUZIONE MULTIMEDIALE

Prof. Alberto Del Bimbo, Dip. Sistemi e Informatica, Univ. degli Studi di Firenze

Course programme, Academic Year 2007-2008

Part I Media and formats (Del Bimbo)

Part II Standards for images, video, audio: JPEG-JPEG2000, MPEG 1-2-4, MP3 (Del Bimbo, Bertini, D'Amico)

Part III Media processing languages: Processing, ActionScript (Del Bimbo, Nunziati, D'Amico)

Part IV Interaction design: Web interface design (Del Bimbo)

Part V Interchange and presentation languages: XML, XHTML, CSS (Del Bimbo, Martorana)

Part VI Laboratory / homework

Coursework

Specifications for projects are to be negotiated with the lecturers. Presentations can be stand-alone, networked or web-based, according to requirements. Projects must be agreed with the lecturers before implementation commences.

Lecture timetable

Monday 9.00 – 11.00, Room 103; Tuesday 10.00 – 13.00, Room 015

Office hours

Prof. A. Del Bimbo, Wednesday 10.00 – 11.00, Dipartimento Sistemi e Informatica, Via S. Marta 3, Firenze, Tel. 055-4796262, E-mail [email protected]

Teaching assistants: Gianpaolo D'Amico [email protected], Bertini [email protected], Martorana [email protected], Torpei [email protected]

PART I INTRODUCTORY ISSUES

Progress in communication of human experiences

Invention (date): application and impact

Languages: communicate symbolic experiences
Written languages (III millennium BC): record symbolic experiences (time)
Paper (II millennium BC): make symbolic experiences portable (space)
Print (1452): mass distribution (time and space)
Telegraph (1837): remote narrow communication (space)
Telephone (1849): remote analog communication (space)
Radio (1895): analog broadcasting of sound (space)
Television (1924): combining two senses, media (space)
Recording media: photos, audio, video (time)
Digital processing: machine enhancement and processing
Internet: interactive multimedia communication (time and space)

Progress in communication on a temporal scale

(D. de Kerckhove, Univ. of Toronto)

Information technology evolution


PART I Media and formats

What is multimedia?

Generic definition of Multimedia: multi and media

Multi = many. Media = an intervening substance through which something is transmitted or carried on; a means of mass communication such as a newspaper, magazine, or television (American Heritage Electronic Dictionary, 1991)

In any case, for computer processing a medium is a means of distribution and presentation of information

Media classification wrt computer processing

Considering the full chain of information input and output, storage, transmission, management and processing, media can be classified according to: Presentation, Storage, Transmission, Perception, Representation

Presentation medium

Presentation media refer to the tools and devices for the input and output of information. Paper, the screen, etc. are used by the computer to deliver information; keyboard, mouse, camera, microphone, dataglove, etc. are the input media.

Storage medium

Storage media refer to data carriers that enable the storage of information. CDs, DVDs, etc. are examples of storage media.

Transmission medium

Transmission media characterize the different information carriers that enable continuous data transmission. Information is transmitted over networks that use wire and cable (coaxial, fiber) as well as free air space transmission (for wireless traffic).

Perception medium

Perception media help humans to sense their environment; perception mostly occurs through seeing or hearing the information. For perception through seeing, visual media such as text, image and video are used. For perception through hearing, auditory media such as music, noise and speech are relevant.

Representation medium

Representation media are characterized by internal computer representations of information. Various formats are used to represent media information in a computer:
A text character is coded in ASCII or EBCDIC
Graphics are coded according to the VRML standard or to the GKS graphics standard
An audio stream can be represented using PCM
An image can be coded in JPEG, JPEG 2000, TIFF, etc.
A combined audio/video sequence can be coded in different TV standards (PAL, SECAM, NTSC) and stored in the computer using the MPEG format

What is Multimedia?

The term Multimedia is ubiquitous: it appears in numerous contexts, each with its own nuances. One point on which all those involved in multimedia agree is the essential role played by multimedia data, which can be argued to be the unifying thread, especially for digital multimedia.

For the purpose of the PPM course we will therefore focus on Perception media (text, image and video), their Representations and the Operations available on them.

In order to define multimedia data we need first to consider what we mean by media and media data.

McLuhan describes media as "extensions of man", which encompasses two more specific views: we relate the term media to the way in which information is conveyed and distributed, hence we have print and broadcast media; we also use the term media when describing the materials and forms of artistic expression.

We use the term digital media as opposed to natural media: natural media rely on physical elements, such as paper, paint, instruments, the stage, while digital media rely on the computer.

If we describe the objects produced in a particular medium as artifacts, then we can define media data as machine-readable representations of artifacts:
prints, paintings, musical performances, recordings, films and video clips are all artifacts
digital images, digital video and digital audio are media data corresponding to these artifacts

We can define multimedia artifacts as compositions of artifacts from various differing media. Two broad categories of composition are identified:

spatial composition - such as an image being positioned relative to a body of text describing it.

temporal composition - such as combining an audio commentary with a slide-show of images

We can then define multimedia data in terms of multimedia artifacts: multimedia data is the machine-readable representation of multimedia artifacts.

What is Hypermedia?

Hypermedia is a way of organising multimedia information by linking media elements.

Hypermedia has grown out of a fusion between hypertext and multimedia.

Hypertext was developed to provide a different structure for basic text in computer systems:
text is essentially sequential in nature, even though its structure is hierarchical (chapters, sections, subsections, paragraphs)
hypertext was developed to permit more random access between components of text documents, or between documents, to allow a greater degree of flexibility and cross-referencing than a purely linear or sequential model would allow

[Figure: a sequential text (Chap.1 through Chap.6) versus a linked, self-referencing text]

The structure of a hypermedia organisation is called a hypermedia web, which consists of a number of multimedia elements or nodes with links between them.

Links represent semantic relationships, thus when a link exists between two nodes they must be related in some fashion: a digital image linked to a textual description of it, a slide-show linked to an audio commentary.

The most widely used hypermedia tools are hypermedia browsers, which let users view nodes and traverse links between them, and markup languages, such as HTML, which allow users to create hypermedia webs as structured documents.
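As a minimal illustration of the node-and-link structure described above, the sketch below (Python, with hypothetical node names) represents a small hypermedia web as a dictionary of nodes and traverses its links:

```python
# Minimal sketch: a hypermedia web as nodes plus links (hypothetical example data).
web = {
    "painting.jpg":    {"type": "image", "links": ["description.txt"]},
    "description.txt": {"type": "text",  "links": ["commentary.mp3"]},
    "commentary.mp3":  {"type": "audio", "links": []},
}

def traverse(web, start):
    """Visit every node reachable from 'start' by following links (depth-first)."""
    visited, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        print(node, "->", web[node]["links"])
        stack.extend(web[node]["links"])

traverse(web, "painting.jpg")
```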

[Figure: a simple hypermedia web linking a Text Node, an Image Node and an Audio Node]

Media classification wrt time

Media data can be classified wrt time. We can broadly divide media types into two groups:

Non-temporal: static, non time-based, discrete; includes text, images and graphics

Temporal: dynamic, time-based, continuous; includes audio, video, music and animation

This classification has nothing to do with the internal representation but rather relates to the impression of the viewer or the listener

Non-temporal (time-independent / discrete): images, text and graphics are time-independent. Information in these media consists of a sequence of individual elements/media or of a continuum without a time component (e.g. text, color blobs, texture patches, shapes, graphics).

Processing of discrete media should happen as fast as possible, but this processing is not time-critical because the validity and correctness of the data does not depend on any time condition.

Non-Temporal Media Types

Text

Media type: Text

Representations: ASCII, ISO character sets, Hypertext, Structured text, Marked-up text

Operations: Character operations, String operations, Editing, Formatting, Pattern-matching & searching, Sorting, Compression, Encryption, Language-specific operations

Non-Temporal Media Types : Text representations

ASCII: 7-bit code, 128 values in the ASCII character set; use of the 8th bit in text editors/word processors creates incompatibility

ISO character sets: extended ASCII to support non-English text; ISO Latin provides support for accented characters (à, ö, ø, etc.); ISO sets include Chinese, Japanese, Korean & Arabic

UNICODE: originally a 16-bit format, allowing up to 65,536 different symbols (modern Unicode extends beyond 16 bits)

Structured Text: the structure of the text is represented in a data structure, usually tree-based; in ODA the structure is embedded in the byte stream together with the content

Hypertext: non-linear graph structure of nodes and links

Marked-up text: LaTeX, SGML, HTML, XML, ...
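To make the representation differences concrete, a short Python sketch (illustrative only) shows how the same characters occupy different numbers of bytes under ASCII, ISO Latin-1 and a 16-bit Unicode encoding:

```python
# Illustrative sketch: byte lengths of the same text under different character encodings.
samples = ["media", "già"]  # plain English vs. accented text (ISO Latin / Unicode territory)

for text in samples:
    try:
        ascii_len = len(text.encode("ascii"))
    except UnicodeEncodeError:
        ascii_len = None                        # character not representable in 7-bit ASCII
    latin1_len = len(text.encode("latin-1"))    # ISO 8859-1, one byte per character
    utf16_len = len(text.encode("utf-16-be"))   # 16-bit Unicode code units
    print(text, ascii_len, latin1_len, utf16_len)
```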

Non-Temporal Media Types : Text Operations

Character operations: basic data type with assigned value; permits direct character comparison (a<b)

String operations: comparison, concatenation, substring extraction and manipulation

Editing: cut/copy/paste; strings versus blocks, dependent on document structure

Formatting
interactive or non-interactive (WYSIWYG vs. LaTeX)
formatted output: bitmap or page description language (Postscript, PDF)
font management: typeface, point size (1 point = 1/72 of an inch); TrueType fonts: geometric description + kerning

Pattern-matching and Searching: search and replace; for large bodies of text, or text databases, use of inverted indices, hashing techniques and clustering.
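A minimal illustration of the inverted-index idea mentioned above (Python, toy documents invented for the example):

```python
# Toy inverted index: map each word to the set of documents that contain it.
from collections import defaultdict

docs = {
    1: "multimedia data and media formats",
    2: "hypertext links media nodes",
    3: "image and video data compression",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# Query: documents containing both 'media' and 'data'
print(index["media"] & index["data"])
```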

Compression

ASCII uses 7 bits per character, most word-processors actually use the 8th bit to use up a byte per character

Information theory estimates 1-2 bits per character to be sufficient for natural language text. This redundancy can be removed by encoding:
Huffman: varies the number of bits used to represent characters, with the shortest codes assigned to the highest-frequency characters
Lempel-Ziv: identifies repeating strings and replaces them by pointers to a table
Both techniques compress English text at a ratio of between 2:1 and 3:1

Compression techniques have their roots in text handling, but have become especially important for transmission.
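As a quick, hedged check of those ratios, the Python standard library's zlib (DEFLATE, which combines LZ77-style string matching with Huffman coding) can measure the compression ratio of a body of English text:

```python
# Sketch: measure the compression ratio of English text with DEFLATE (LZ77 + Huffman).
import zlib

text = ("Multimedia data is the machine-readable representation of multimedia "
        "artifacts, combining text, images, audio and video. " * 50).encode("ascii")

compressed = zlib.compress(text, level=9)
# Typical English prose compresses around 2:1-3:1; the deliberately repetitive
# sample above compresses much further.
print(f"ratio {len(text) / len(compressed):.1f}:1")
```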

Encryption: text encryption is widely used in electronic mail and networked information systems

Language-specific operations: spell-checking, parsing and grammar checking

Media type: Image

Representations: Colour model, Alpha channels, Number of channels, Channel depth, Interlacing, Indexing, Pixel aspect ratio, Compression

Operations: Editing, Point operations, Filtering, Compositing, Geometric transformations, Conversion

Images


Colour model: colour production on the output device; theory of human colour perception

CIE colour space: international standard used to calibrate other colour models; developed in 1931, as CIE XYZ, based on the tristimulus theory of colour specification

Non-Temporal Media Types : Image representations

RGB: numeric triple specifying red, green and blue intensities; convenient for video display drivers since the numbers can be easily mapped to voltages for the RGB guns in colour CRTs

HSB: Hue, the dominant colour of the sample, an angular value varying from red to green to blue at 120° intervals; Saturation, the intensity of the colour; Brightness, the amount of gray in the colour
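A small sketch, using Python's standard colorsys module, converts an RGB triple to the hue/saturation/brightness representation described above (colorsys calls brightness "value"):

```python
# Sketch: converting an RGB triple to hue/saturation/brightness (HSV) with the standard library.
import colorsys

r, g, b = 0.8, 0.2, 0.2          # a reddish colour, components in [0, 1]
h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(f"hue {h * 360:.0f} deg, saturation {s:.2f}, brightness {v:.2f}")
```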


CMYK: displays emit light, so they produce colours by adding red, green and blue intensities; paper reflects light, so to produce a colour on paper one uses inks that subtract all colours other than the one desired. Printers use inks corresponding to the subtractive primaries, cyan, magenta and yellow (complements of RGB); additionally, since inks are not pure, a black ink is used to give better blacks and grays.

YUV: colour model used in the television industry (also YIQ, YCbCr, and YPbPr). Y represents luminance, effectively the black-and-white portion of a video signal; U and V are colour difference signals, form the colour portion of a video signal, and are called chrominance or chroma. YUV makes efficient use of bandwidth, as the human eye has greater sensitivity to changes in luminance than in chrominance, so bandwidth can be better utilised by allocating more to luminance and less to chrominance.
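For reference, the commonly quoted ITU-R BT.601 luminance weighting (not stated explicitly in the slide) and the colour-difference components have the form:

\[
Y = 0.299\,R + 0.587\,G + 0.114\,B, \qquad U \propto (B - Y), \qquad V \propto (R - Y)
\]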


Alpha channels: images may have one or more alpha channels defining regions of full or partial transparency; they can be used to store selections and to create masks and blends

Number of channels: the number of pieces of information associated with each pixel; usually the dimensionality of the colour model plus the number of alpha channels

Channel depth: the number of bits per pixel used to encode the channel values; commonly 1, 2, 4 or 8 bits, less commonly 5, 6, 12 or 16 bits; in a multiple channel image, different channels can have different depths

Interlacing: the storage layout of a multiple channel image could separate channel values (all R values, followed by all G, followed by all B) or could use interlacing (all RGB for pixel 1, all RGB for pixel 2, ...)
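The difference between the two layouts can be sketched with NumPy (illustrative array shapes only):

```python
# Sketch: planar (channel-separated) vs. interlaced (pixel-interleaved) storage of an RGB image.
import numpy as np

h, w = 2, 3
interleaved = np.arange(h * w * 3, dtype=np.uint8).reshape(h, w, 3)  # ...RGBRGBRGB... per row
planar = np.transpose(interleaved, (2, 0, 1))                        # all R values, then all G, then all B

print(interleaved.reshape(-1)[:9])  # first three pixels, channel-interleaved
print(planar[0].reshape(-1))        # the whole R plane
```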


Indexing: pixel colours can be represented by an index into a colour map or a colour lookup table

Pixel aspect ratio: the ratio of pixel width to height; square pixels are simple to process, but some displays and scanners work with rectangular pixels; if the pixel aspect ratios of an image and a display differ, the image will appear stretched or squeezed

Compression: a page-sized 24-bit colour image produced by a scanner at 300 dpi takes up about 20 Mbytes; many image formats compress pixel data, using run-length coding, LZW, predictive coding and transform coding. Among the many image formats, JPEG, GIF, TIFF and BMP are the most widely used.
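As a rough check of that figure (assuming roughly an 8 x 10 inch scanned area, an assumption not stated in the slide):

\[
(8 \times 300) \times (10 \times 300) \times 3\ \text{bytes} = 2400 \times 3000 \times 3 \approx 21.6\ \text{Mbytes}
\]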

Non-Temporal Media Types: Image Operations

Image Operations can operate directly on pixel data or on higher-level features such as edges, surfaces and volumes. Operations on higher-level features fall into the domain of image analysis and understanding

Editing: changing individual pixels for image touch-up forms the basis of airbrushing and texturing; cutting, copying and pasting are supported for groups of pixels, from simple shape manipulation through to more complex foreground and background masking and blending

Point operations consist of applying a function to every pixel in an image. Only the pixel's current value is used; neighbouring pixels cannot be used:
Thresholding: a pixel is set to 1 or 0 depending on whether it is above or below a threshold value; this creates binary images, which are often used as masks when compositing
Colour correction: modifying the image to increase or reduce contrast, brightness or gamma effects, or to strengthen or weaken particular colours
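A thresholding point operation written with NumPy (toy pixel values chosen for illustration):

```python
# Sketch: thresholding as a point operation, producing a binary mask from a grayscale image.
import numpy as np

gray = np.array([[ 12, 200,  90],
                 [180,  30, 250]], dtype=np.uint8)   # toy grayscale image

threshold = 128
mask = (gray > threshold).astype(np.uint8)           # 1 where above threshold, 0 elsewhere
print(mask)
```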


Filtering: used to blur, sharpen or distort images, producing a variety of special effects; filters operate on every pixel in an image, but use the values of neighbouring pixels as well
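A minimal neighbourhood filter (a 3x3 box blur) in NumPy, ignoring border handling for brevity:

```python
# Sketch: a 3x3 mean (box blur) filter, a neighbourhood operation unlike the point operations above.
import numpy as np

img = np.random.rand(6, 6)
blurred = np.zeros_like(img)

for y in range(1, img.shape[0] - 1):
    for x in range(1, img.shape[1] - 1):
        blurred[y, x] = img[y - 1:y + 2, x - 1:x + 2].mean()  # average of the 3x3 neighbourhood

print(blurred[1:-1, 1:-1].round(2))
```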

Compositing: the combining of two or more images to produce a new image, generally done by specifying mathematical relationships between the images

Geometric transformations: basic transformations involve displacing, rotating, mirroring or scaling an image; more advanced transformations involve skewing and warping images

Conversions: conversions between image formats, for which a number of tools exist; other forms of conversion include compression and decompression, changing colour models, and changing image depth and resolution

Media type: Graphic

Representations: Geometric models, Solid models, Physically-based models, Empirical models, Drawing models, External formats for models

Operations: Primitive editing, Structural editing, Shading, Lighting, Mapping, Viewing, Rendering

The central notion of graphics, as opposed to image data, is in the rendering of graphical data to produce an image. A graphics type or model is therefore the combination of a data type plus a rendering operation

Graphics

Geometric models consist of 2D and/or 3D geometric primitives: 2D primitives include lines, rectangles and ellipses plus more general polygons and curves; 3D primitives include the above plus surfaces of various forms. Curves and curved surfaces are described by parameterised polynomials.

Primitives can be used to build structural hierarchies, allowing each structure thus created to be broken down into lower-level structures and primitives

Several standard device-independent graphics libraries are based on geometric modelling: GKS (Graphical Kernel System, ISO), PHIGS (Programmer's Hierarchical Interactive Graphics System, ISO; see also PHIGS+ and PEX), OpenGL (portable version of the Silicon Graphics library).

Non-Temporal Media Types : Graphics representations

Solid models:
Constructive Solid Geometry (CSG): solid objects are combined using the set operators union, intersection and difference
Surfaces of revolution: a solid is formed by rotating a 2D curve about an axis in 3D space
Extrusion: a 2D outline is extended in 3D space along an arbitrary path

Using the above techniques will produce models much faster than building them up from geometric primitives, but rendering them will be expensive.

Physically-based models: realistic images can be produced by modelling the forces, stresses and strains on objects; when one deformable object hits another, the resulting shape change can be numerically determined from their physical properties


Empirical Models

complex natural phenomena (clouds, waves, fire, etc.) are difficult to describe realistically using geometric or solid modelling. Physically based models are possible, but they may be computationally expensive or intractable

the alternative is to develop models based on observation rather than physical laws; such models do not embody the underlying physical processes that cause these phenomena, but they do produce realistic images: fractals, probabilistic graph grammars (used for branching plant structures) and particle systems (used for fires and explosions) are examples of empirical models

Drawing models: describing an object in terms of drawing or painting actions; the description can be seen as a sequence of commands to an imaginary drawing device (e.g. Postscript)

External formats for models: needed for export/import between graphics packages; CGM and CAD formats are suitable, while Postscript and RIB are render-only

Shape primitive editing: specifying and modifying the parameters associated with the model primitives, e.g. specifying the type of a primitive, the vertex coordinates and the surface normals

Shape structural editing: creating and modifying collections of primitives; establishing spatial relationships between members of collections

Shading operations provide the means to describe the interaction of light with the object, in terms of how the object reflects light and whether it transmits light. Most methods describe the surface of the object using meshes of small, polygonal surface patches:
flat shading: each patch is given a constant colour
Gouraud shading: colour information is interpolated across a patch
Phong shading: surface normal information is interpolated across a patch
Ray tracing & radiosity: physical models of light behaviour are used to calculate colour information for each patch, giving highly realistic results

Non-Temporal Media Types : Graphics operations


Texture mapping: an image, the texture map, is applied to a surface. This requires a mapping from 3D surface coordinates to 2D image coordinates, so that, given a point on the surface, the image is sampled and the resulting value is used to colour the surface at that point.

Viewing: to produce an image of a 3D model we require a transformation which projects 3D world coordinates onto 2D image coordinates. View specification consists of selecting the projection transformation, usually a parallel or perspective projection (although camera attributes can be specified in some renderers), and the view volume.

Rendering: converts a model, including shading, lighting and viewing information, into an image. Output resolution: the width and height of the output image in pixels, and the pixel depth.

Temporal media (time-dependent / continuous):

In full motion video, sound, continuous signals from different sensors, etc., the values change over time. Information is expressed not only in its individual value but also by its time of occurrence. The semantics depends on the level of relative change of the discrete values or of the continuum. These media consist of a time-dependent sequence of individual information units called Logical Data Units (shots, clips within the sequence, moving objects within the clips).

Processing these media is time-critical because the validity and correctness of the data depend on a time condition.

Media type: Digital Video

Representations: Sampling rate, Sample size and quantisation, Data rate, Frame rate, Compression

Operations: Storage, Retrieval, Mixing, Conversion, Synchronisation, Editing

Digital video

Digital video is a sequence of frames, the frames being digital images, possibly compressed in some manner. Each frame in a digital video sequence has its own timing associated with it.


Temporal Media Types : Digital video representations

Sampling rate: the value of the sampling rate determines the storage requirement and the data transfer rate. The lower limit for the frequency at which to sample in order to faithfully reproduce the signal, the Nyquist rate, is twice the highest frequency within the signal.
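In symbols (a standard restatement of the Nyquist condition, not from the original slide), with f_max the highest frequency present in the signal:

\[
f_s \ge 2 f_{max}
\]

For example, a signal whose spectrum extends to 5 MHz must be sampled at 10 MHz or more to be reconstructed faithfully.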

Sample size and quantisation: sample size is the number of bits used to represent sample values; quantisation refers to the mapping from the continuous range of the analog signal to discrete sample values. The choice of sample size is based on:
the signal-to-noise ratio of the sampled signal
the sensitivity of the medium used to display frames
the sensitivity of the human eye


Digital video commonly uses linear quantisation, where quantisation levels are evenly distributed over the analog range (as opposed to logarithmic quantisation). We can divide digital video representations into two broad categories:

high data rate formats: primarily used in professional video production and post-production; little or no compression, picture quality and ease of processing are prime considerations. Examples are: Digital Component Video (CCIR 601), Digital Composite Video, Common Intermediate Format (CIF) and Quarter-CIF (QCIF), approved by CCITT for video conferencing

lower data rate formats: derived from the high-rate formats by compression and by reducing resolution and frame rate (see Data rate below)


Data rate

high data rate formats can be reduced to lower data rates by a combination of: compression, reducing horizontal and vertical resolution, reducing the frame rate

for example: start with broadcast quality digital video at 10 Mbytes/s; divide the horizontal and vertical resolutions by 2, giving VHS quality resolution; divide the frame rate by 2; compress at a rate of 10:1; the data rate becomes 1 Mbit/s, suitable for use on LANs and on optical storage devices (i.e. CD-ROM)
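Working through the arithmetic of that example (halving horizontal resolution, vertical resolution and frame rate each divide the rate by 2, and the 10:1 compression divides it by 10):

\[
\frac{10\ \text{Mbytes/s}}{2 \times 2 \times 2 \times 10} = 0.125\ \text{Mbytes/s} = 1\ \text{Mbit/s}
\]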


Frame rate

25 or 30 fps equates to the analog frame rate, i.e. full-motion video; at 10-15 fps motion is less accurately depicted and the image flickers, but the data rate is much reduced

Compression: in digital video we can compare compression methods by three factors:
lossy vs. lossless
real-time compression: trade-off between symmetric models and asymmetric models with real-time decompression
interframe (relative) vs. intraframe (absolute) compression (i.e. MPEG-1 vs. Motion JPEG)


Video formats:
MPEG-1: 1 Mbit/s
MPEG-2: broadcast quality video at rates between 2-15 Mbit/s
MPEG-4: low data rate video
MPEG-7: metadata standard for video representation
Motion JPEG
px64 (CCITT H.261): intended for video applications using ISDN (Integrated Services Digital Network); known as px64 since it produces rates that are multiples of ISDN's 64 Kbit/s B channel rate; uses similar techniques to MPEG but, since compression and decompression must be real-time, quality tends to be poorer
H.263: based on H.261, but offers 2.5 times greater compression; uses MPEG-1 and MPEG-2 techniques


Temporal Media Types : Digital Video Operations

Storage: to record or play back digital video in real time, the storage system must be capable of sustaining data transfer at the video data rate. The problem is the size of storage: even using MPEG-1, 13 minutes of video will fill a 100 Mbyte disk.
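That figure is consistent with the MPEG-1 rate of about 1 Mbit/s quoted earlier:

\[
\frac{100\ \text{Mbytes} \times 8\ \text{bit/byte}}{1\ \text{Mbit/s}} = 800\ \text{s} \approx 13\ \text{minutes}
\]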

Retrieval: uses frame addressing, but there are some problems. Interframe compression techniques, i.e. MPEG, only code key frames independently; the other frames are derived from these key frames. Random access therefore requires first finding the nearest key frame and then using it to decode the desired frame, again using the frame index but enhancing it with key frame locations.
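A sketch of that lookup, assuming a hypothetical sorted list of key-frame indices:

```python
# Sketch: find the nearest preceding key frame for a requested frame (hypothetical index data).
import bisect

key_frames = [0, 12, 24, 36, 48, 60]   # frame numbers coded independently (I-frames)

def nearest_key_frame(frame):
    """Return the latest key frame at or before 'frame'; decoding starts there."""
    i = bisect.bisect_right(key_frames, frame) - 1
    return key_frames[max(i, 0)]

print(nearest_key_frame(30))   # -> 24: decode from frame 24, then derive frames 25..30
```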

Multimedia systems

Not every combination of media justifies the use of the term multimedia.

A proper use is when both continuous and discrete media are utilized. According to this, the defining characteristic of multimedia systems is the incorporation of continuous media such as voice, video and animation. Independent media should be integrated to accomplish certain functions, using timing, spatial and semantic synchronization.

Multimedia systems are also those systems that incorporate only discrete or continuous media but employ an integrated multiplicity of Presentation, Transmission, Representation, or Perception media

A text processing program with incorporated images is therefore not a multimedia application

Research and development efforts in multimedia fall into two groups:

stand-alone multimedia workstations and associated software systems, such as music composition, computer aided learning and interactive video. Stand-alone multimedia systems are such that multimedia information is created, processed, presented and stored in the workstation.

multimedia computing with distributed systems: multimedia information systems, collaboration and conferencing systems, on-demand multimedia services, distance learning, ...

STANDALONE MULTIMEDIA SYSTEMS

Natural interfaces are easy to use, and allow people to interact with them the way they do with other people. In particular, these are interfaces that make it possible to interact with computerized equipment without the need for special external equipment.

These interfaces are not based on menus, mice, and keyboards but use instead gesture, speech, affect, context, and movement.

Their applications are not word processors and spreadsheets, but smart homes and personal assistants: “instead of making computer-interfaces for people, it is of more fundamental value to make people-interfaces for computers”.

Research examples at MICC: Natural interaction

The most important factor in making these applications possible in recent years has been the novel viability of real-time computer vision and speech understanding.

Systems coupled with natural interfaces will enable tasks historically outside the normal range of human-computer interaction, by connecting computers to phenomena (such as someone walking into a room) that have traditionally been outside the scope of conventional user interfaces. With natural interfaces the user experiences a form of context awareness, exploiting dialog modalities and behaviors that are commonly used in the ordinary activities of his/her real daily life.

The POLIFEMO project

Objectives: support for severe disabilities; interaction based on eye gaze capture; 3D navigation control

Keypoints: eye shape and pupil identification; tracking of eye motion by elastic matching; clicking by persistence

Eye drawing demo (video)

The GOLEM project

Objectives: virtual presence; real-time body motion tracking and replication

Keypoints: input from two webcams; color blob identification (head, hands, feet); stereo matching and computing; inverse kinematics for internal DOFs computation; H-Anim body model and MPEG-4 compliance; virtual body animation

Demos: dilbert, balls, golem (video)

The POINT AT project

Objectives: natural interaction for cultural heritage applications

Keypoints: large projection screen; input from two webcams; hand detection; geometric computation of the pointed position


Image analysis through background subtraction, thresholding, blob tracking
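A highly simplified sketch of that pipeline (NumPy frames, with SciPy's connected-component labelling standing in for blob detection; all data here is synthetic and illustrative only):

```python
# Sketch: background subtraction + thresholding + blob labelling on synthetic frames.
import numpy as np
from scipy import ndimage

background = np.zeros((120, 160))                 # static reference frame
frame = background.copy()
frame[40:60, 70:90] = 1.0                         # a bright "person" entering the scene

diff = np.abs(frame - background)                 # background subtraction
mask = diff > 0.5                                 # thresholding -> binary foreground mask
labels, n_blobs = ndimage.label(mask)             # connected components = candidate blobs

print(n_blobs, ndimage.center_of_mass(mask, labels, range(1, n_blobs + 1)))
```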

Current research at MICC: Natural interaction in “intelligent rooms”

[Diagram: smart room interaction roles: non-active user (viewing, exiting), active user (gesture-based interaction, tangible object interaction), tangible object; active, neighbor and external context]

Institutional cooperation with Provincia di Firenze, Museo Palazzo Medici Riccardi

Main ongoing projects: Regione Toscana POR: Mnemosyne; Provincia di Firenze: Interactive Bookshop; Provincia di Lucca: Monte Alfonso Library; Shawbak castle, Jordan: interactive intro

Smart-Object (Micrel Lab, University of Bologna)

wireless data transmission (via Bluetooth) through miniaturized electronic sensors
triple-axis accelerometer to recognize the upper cube face and user actions
infra-red LED matrix to track the cube position through the computer

Cube as metaphor of a digital data collector

• Face detector based on the Viola-Jones algorithm
• Tracking algorithm based on color histogram and particle filtering
• PTZ camera steering algorithm (controlled by the tracker)
• Face recognition, gaze and expression estimation are applicable
• Running at 9-12 fps while tracking 4-5 targets with fixed and PTZ cameras on an Intel Xeon 4-core PC at 640x480; no background subtraction
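A minimal sketch of Viola-Jones style face detection using OpenCV's pretrained Haar cascade; the input image name is an assumption and this is not the MICC system itself, just an illustration of the detector the slide names:

```python
# Sketch: Viola-Jones face detection with an OpenCV pretrained Haar cascade (assumed file names).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")  # cascade shipped with OpenCV

img = cv2.imread("frame.jpg")                       # hypothetical 640x480 frame from the camera
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                          # one rectangle per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```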


Smart room environment

Interacting with physical and graphical objects

“Frontiers of Interaction” Roma, Italy 2009

Automatic annotation of images, video, audio and 3D objects is motivated by the huge amount of digital data produced daily by industry and individuals, and by the need to store them into repositories for later retrieval and reuse, with descriptions that capture their intrinsic content in some way at the lowest possible cost.

Visual and audio data must be complemented with descriptors that provide a synthetic representation of their content.

Descriptors must be at both the syntactic and the semantic level, where the former captures visual and auditory features (like color, texture, spatial relations – for visual data – peak values, frequency – for audio data) and the latter captures objects, meanings, highlights, events, moods….

Research examples at MICC: Automatic semantic annotation

Anchorperson shot detection (shot life-time, quantity of motion)

Automatic Annotation of NEWS Video

Automatic Detection of images in Sport Advertising video

Trademark detection

Example of human behavior detection

Method                      KTH     Weizmann
Our method                  92.57   95.41
Laptev et al. - HoG ['08]   81.6    -
Laptev et al. - HoF ['08]   89.7    -
Dollár et al. ['05]         81.2    -
Wong and Cipolla ['07]      86.6    -
Scovanner et al. ['07]      -       82.6
Niebles et al. ['08]        83.3    90
Liu et al. ['08]            -       90.4
Kläser et al. ['08]         91.4    84.3
Willems et al. ['08]        84.2    -

Example of event detection

Automatic semantic annotation of video through event/behavior detection

• Soccer highlight detection: kickoff, shot on goal, counterattack, turn over, placed kick (in attack), forward pass, corner kick, penalty kick, free kick (close to penalty)

Highlight video annotation

Automatic semantic annotation of Sports Video

Semantic annotation is a prerequisite to effective retrieval by content of multimedia data. You can use words to retrieve video data: "find all video shots where President Clinton or President are present" ...

The emergence of multimedia, as well as the availability of large image and video archives and the development of information highways, have attracted research efforts in providing tools for effective retrieval of visual data based on their content.

The relevance of visual information retrieval applies to several application fields like art galleries and museum archives, medical and geographic databases…..

Research examples at MICC: Content-based retrieval

Retrieval by global color similarity
Retrieval by color region similarity
Retrieval by shape similarity
Retrieval by visual similarity based on appearing features of spatial entities: shape, color, texture, spatial relationships
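A toy sketch of retrieval by global color similarity, ranking images by histogram intersection (synthetic data, illustrative only):

```python
# Sketch: retrieval by global color similarity using normalised histogram intersection.
import numpy as np

def color_histogram(img, bins=8):
    """Normalised histogram of a single-channel (e.g. hue) image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def similarity(h1, h2):
    return np.minimum(h1, h2).sum()           # histogram intersection in [0, 1]

rng = np.random.default_rng(0)
query = rng.integers(0, 256, (32, 32))        # synthetic "images"
database = {name: rng.integers(0, 256, (32, 32)) for name in ["a", "b", "c"]}

hq = color_histogram(query)
ranked = sorted(database, key=lambda n: similarity(hq, color_histogram(database[n])), reverse=True)
print(ranked)                                 # database images, most similar first
```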

3D faces

Main ongoing projects: EC VIDIVIDEO, EC IM3I

3D face matching

Video event / behavior detection

[Pipeline diagram: video, interest points, bag-of-features against a visual dictionary, SVM classifier, action classes (running, walking, jogging, handwaving, handclapping, boxing)]
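A compact sketch of that bag-of-words pipeline using scikit-learn; random features stand in for real spatio-temporal interest point descriptors, so everything here is illustrative rather than the MICC implementation:

```python
# Sketch: bag-of-words video classification - cluster local descriptors into a visual
# dictionary, represent each clip as a word histogram, then train an SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
clips = [rng.normal(size=(50, 16)) + label for label in (0, 0, 1, 1)]  # fake descriptors per clip
labels = [0, 0, 1, 1]                                                  # e.g. walking vs. boxing

dictionary = KMeans(n_clusters=8, n_init=10, random_state=0).fit(np.vstack(clips))

def bag_of_words(descriptors):
    words = dictionary.predict(descriptors)
    return np.bincount(words, minlength=8) / len(words)   # normalised visual-word histogram

X = np.array([bag_of_words(c) for c in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))
```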

3D surface descriptors

[Figure: the 27 directional weights W_{ijk}, indexed by i, j, k in {-1, 0, +1} along the three coordinate axes]

\[
w_{ijk}(A,B) \;=\; \frac{1}{K_{ijk}} \int_{A} \int_{B} C_i(x_b - x_a)\, C_j(y_b - y_a)\, C_k(z_b - z_a)\; dx_a\, dy_a\, dz_a\, dx_b\, dy_b\, dz_b
\]

Partitioning of 3D faces into iso-geodesic stripes

Derivation, for each stripe, of an integral measure that indicates how much of another stripe is in one of the 27 possible directions in the 3D space

3D face matching

• 3D-to-3D face recognition by modeling the 3D shape and the relative arrangement of iso-geodesic stripes identified on the model surface
• Construction and matching of a graph-based face model

3D face stripe-based matching

[Figure: corresponding iso-geodesic stripes t1...t7 and r1...r7 on two face models]

SHREC 2008 contest

From 2D face images to 3D face matching

Scale-invariant target template matching; reliable target tracking from PTZ video

[Diagram: PTZ camera, 2D face sequences, 2D faces, 3D faces]

Industry joint laboratory with THALES Corp

Identity recognition through 2D and 3D face matching

[Figure: current view, nearest image, and the current view warped onto the nearest image]

DISTRIBUTED MULTIMEDIA SYSTEMS

Communication-capable multimedia systems are such that multimedia information can not only be created, processed, presented and stored, but also distributed beyond the single-computer boundary.

Web-based multimedia systems support multimedia applications over the Internet. Distributed multimedia systems require continuous data transfer over relatively long periods of time, media synchronization, very large storage, and special indexing and retrieval.

WEB BASED MULTIMEDIA SYSTEMS

Web sites and web applications

Course outline

Part I Media and formats (Del Bimbo)

Part II Standards for images, video, audio: JPEG-JPEG2000, MPEG 1-2-4, MP3 (Del Bimbo, Bertini, D'Amico)

Part III Media processing languages: (Processing), ActionScript (Del Bimbo, Nunziati, D'Amico)

Part IV Interaction design: Web interface design (Del Bimbo)

Part V Interchange and presentation languages: XML, XHTML, CSS (Del Bimbo, Martorana)

Part VI Laboratory