PROGETTAZIONE E PRODUZIONE MULTIMEDIALE
Prof. Alberto Del Bimbo, Dip. Sistemi e Informatica, Univ. degli Studi di Firenze
Syllabus, Academic Year 2007-2008
Part I Media and formats (Del Bimbo)
Part II Standards for images, video, audio: JPEG-JPEG2000, MPEG 1-2-4, MP3 (Del Bimbo, Bertini, D'Amico)
Part III Media processing languages: Processing, ActionScript (Del Bimbo, Nunziati, D'Amico)
Part IV Interaction design: Web interface design (Del Bimbo)
Part V Interchange and presentation languages: XML, XHTML, CSS (Del Bimbo, Martorana)
Part VI Lab / home work
Coursework
Specifications for projects are to be negotiated with the lecturers. Presentations can be stand-alone, networked, or web-based, according to requirements. Projects must be agreed with the lecturers before implementation commences.
Office hours
Prof. A. Del Bimbo, Wednesday 10.00-11.00, Dipartimento Sistemi e Informatica, Via S. Marta 3, Firenze. Tel. 055-4796262, e-mail [email protected]
Assistants: Gianpaolo D'Amico [email protected], Bertini [email protected], Martorana [email protected], Torpei [email protected]
Progress in communication of human experiences
Inventions, their applications and impact:
- Languages: communicate symbolic experiences
- Written languages (III millennium BC): record symbolic experiences (time)
- Paper (II millennium BC): make symbolic experiences portable (space)
- Print (1452): mass distribution (time and space)
- Telegraph (1837): remote narrow communication (space)
- Telephone (1849): remote analog communication (space)
- Radio (1895): analog broadcasting of sound (space)
- Television (1924): combining two senses, media (space)
- Recording media: photos, audio, video (time)
- Digital processing: machine enhancement and processing
- Internet: interactive multimedia communication (time and space)
What is multimedia?
Generic definition of multimedia: multi and media.
Multi = many. Media = "an intervening substance through which something is transmitted or carried on; a means of mass communication such as a newspaper, magazine, or television" (American Heritage Electronic Dictionary, 1991).
For computer processing, however, a medium is a means of distribution and presentation of information.
Media classification wrt computer processing
Considering the full chain of information input and output, storage, transmission, management and processing, media can be classified according to: Presentation, Storage, Transmission, Perception, Representation.
Presentation medium
Presentation media refer to the tools and devices for the input and output of information. Paper, the screen, … are used by the computer to deliver information; keyboard, mouse, camera, microphone, dataglove, … are the input media.
Storage medium
Storage media refer to data carriers that enable the storage of information. CD, DVD, … are examples of storage media.
Transmission medium
Transmission media characterize the different information carriers that enable continuous data transmission. Information is transmitted over networks that use wire and cable (coaxial, fiber) as well as free-air transmission (for wireless traffic).
Perception medium
Perception media help humans sense their environment; perception mostly occurs through seeing or hearing information. For perception through seeing, visual media such as text, image and video are used; for perception through hearing, auditory media such as music, noise and speech are relevant.
Representation medium
Representation media are characterized by internal computer representations of information. Various formats are used to represent media information in a computer:
- a text character is coded in ASCII or EBCDIC
- graphics are coded according to the VRML standard or the GKS graphics standard
- an audio stream can be represented using PCM
- an image can be coded in JPEG, JPEG 2000, TIFF, … formats
- a combined audio/video sequence can be coded in different TV standards (PAL, SECAM, NTSC) and stored in the computer using the MPEG format
What is Multimedia?
The term multimedia is ubiquitous: it appears in numerous contexts, each with its own nuances. One point on which all those involved in multimedia agree is the essential role played by multimedia data, which can be argued to be the unifying thread, especially for digital multimedia.
For the purpose of the PPM course we will therefore focus on perception media (text, image and video), their representations, and the operations available on them.
In order to define multimedia data we need first to consider what we mean by media and media data.
McLuhan describes media as "extensions of man", which encompasses two more specific views: we relate the term media to the way in which information is conveyed and distributed (hence print and broadcast media), and we also use the term when describing the materials and forms of artistic expression.
We use the term digital media as opposed to natural media: natural media rely on physical elements, such as paper, paint, instruments, or the stage, while digital media rely on the computer.
If we describe the objects produced in a particular medium as artifacts, then we can define media data as machine-readable representations of artifacts: prints, paintings, musical performances, recordings, films and video clips are all artifacts; digital images, digital video and digital audio are the media data corresponding to these artifacts.
We can define multimedia artifacts as compositions of artifacts from various differing media. Two broad categories of composition are identified:
spatial composition, such as an image being positioned relative to a body of text describing it
temporal composition, such as combining an audio commentary with a slide-show of images
We can then define multimedia data in terms of multimedia artifacts: multimedia data is the machine-readable representation of multimedia artifacts.
What is Hypermedia?
Hypermedia is a way of organising multimedia information by linking media elements.
Hypermedia has grown out of a fusion between hypertext and multimedia.
Hypertext was developed to provide a different structure for basic text in computer systems: text is essentially sequential in nature, even though its structure is hierarchical (chapters, sections, subsections, paragraphs). Hypertext permits more random access between components of text documents, or between documents, allowing a greater degree of flexibility and cross-referencing than a purely linear or sequential model would allow.
[Figure: "A sequential text" shows chapters 1-6 in linear order; "A linked, self-referencing text" shows the same chapters connected by cross-reference links.]
The structure of a hypermedia organisation is called a hypermedia web, which consists of a number of multimedia elements, or nodes, with links between them.
Links represent semantic relationships; when a link exists between two nodes they must be related in some fashion: a digital image linked to a textual description of it, or a slide-show linked to an audio commentary.
The most widely used hypermedia tools are hypermedia browsers, which let users view nodes and traverse links between them, and markup languages, such as HTML, which allow users to create hypermedia webs as structured documents.
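A hypermedia web can be sketched as a directed graph of nodes and links. The following minimal Python sketch (the node names and link labels are invented for illustration) shows how a browser-like tool might traverse the links reachable from a starting node:

```python
# A hypermedia web as a directed graph: nodes are media elements,
# links are semantic relationships (names here are purely illustrative)
web = {
    "image:duomo": [("described-by", "text:duomo-caption")],
    "text:duomo-caption": [("narrated-by", "audio:commentary")],
    "audio:commentary": [],
}

def traverse(web, start):
    """Depth-first traversal following links, as a browser's link map might."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(target for _, target in web.get(node, ()))
    return seen

print(traverse(web, "image:duomo"))
```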
Media classification wrt time
Media data can be classified wrt time. We can broadly divide media types into two groups:
- Non-temporal (static, non time-based, discrete): includes text, images and graphics
- Temporal (dynamic, time-based, continuous): includes audio, video, music and animation
This classification has nothing to do with the internal representation but rather relates to the impression of the viewer or the listener
Non-temporal (time-independent / discrete): images, text and graphics are time-independent. Information in these media consists of a sequence of individual elements or of a continuum without a time component (e.g. text, colour blobs, texture patches, shapes, graphics).
Processing of discrete media should happen as fast as possible, but it is not time-critical, because the validity and correctness of the data do not depend on any time condition.
Non-Temporal Media Types
Text
Media type: <Text>
Representations: ASCII, ISO character sets, Structured text, Hypertext, Marked-up text
Operations: Character operations (sorting), String operations, Editing, Formatting, Pattern-matching & searching, Encryption, Compression, Language-specific operations
Non-Temporal Media Types : Text representations
- ASCII: 7-bit code, 128 values in the ASCII character set; use of the 8th bit in text editors/word processors creates incompatibility
- ISO character sets: extended ASCII to support non-English text; ISO Latin provides support for accented characters (à, ö, ø, etc.); ISO sets include Chinese, Japanese, Korean and Arabic
- UNICODE: 16-bit format, 65,536 different symbols
- Structured text: the structure of the text is represented in a data structure, usually tree-based; in ODA the structure is embedded in the byte stream with the content
- Hypertext: non-linear graph structure of nodes and links
- Marked-up text: LaTeX, SGML, HTML, XML, …
Non-Temporal Media Types : Text Operations
Character operations: basic data type with an assigned value; permits direct character comparison (a < b).
String operations: comparison, concatenation, substring extraction and manipulation.
Editing: cut/copy/paste; strings versus blocks, dependent on document structure.
Formatting: interactive or non-interactive (WYSIWYG vs. LaTeX); formatted output as a bitmap or in a page description language (PostScript, PDF). Font management: typeface; point size (1 point = 1/72 of an inch); TrueType fonts: geometric description + kerning.
Pattern-matching and searching: search and replace; for large bodies of text, or text databases, use of inverted indices, hashing techniques and clustering.
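The inverted-index idea mentioned above can be sketched in a few lines of Python (the toy documents and function names are illustrative, not any particular library's API):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    """Documents containing all query terms (intersection of posting lists)."""
    postings = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["the quick brown fox", "the lazy dog", "quick dog tricks"]
idx = build_inverted_index(docs)
print(search(idx, "quick"))         # documents 0 and 2
print(search(idx, "quick", "dog"))  # document 2 only
```

Queries touch only the posting lists of their terms, which is why inverted indices scale to large text databases.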
Compression
ASCII uses 7 bits per character; most word processors actually use the 8th bit, taking up a byte per character.
Information theory estimates 1-2 bits per character to be sufficient for natural language text. This redundancy can be removed by encoding: Huffman coding varies the number of bits used to represent characters, giving the shortest codes to the highest-frequency characters; Lempel-Ziv identifies repeating strings and replaces them with pointers to a table. Both techniques compress English text at a ratio between 2:1 and 3:1.
Compression techniques have their roots in text handling, but have become of importance especially for transmission.
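As a sketch of the Huffman technique described above, the following Python builds a prefix code from character frequencies and compares its size with 8-bit-per-character storage (a minimal illustration, not a production codec):

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix code: frequent characters get the shortest bit strings."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tiebreaker, {char: code-so-far})
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol input
        return {ch: "0" for ch in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

text = "this is an example of huffman coding on english text"
codes = huffman_code(text)
bits = sum(len(codes[ch]) for ch in text)
print(f"8-bit ASCII: {8 * len(text)} bits, Huffman: {bits} bits")
```

On ordinary English text the ratio against 8-bit storage comes out close to the 2:1 figure quoted above.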
Encryption: text encryption is widely used in electronic mail and networked information systems.
Language-specific operations: spell-checking, parsing and grammar checking.
Images
Media type: <Image>
Representations: Colour model, Alpha channels, Number of channels, Channel depth, Interlacing, Indexing, Pixel aspect ratio, Compression
Operations: Editing, Point operations, Filtering, Compositing, Geometric transformations, Conversion
© 2005 Dr. Christian Jones, Multimedia Design: Media types
Colour model: governs colour production on an output device; grounded in the theory of human colour perception.
CIE colour space: international standard used to calibrate other colour models; developed in 1931 as CIE XYZ, based on the tristimulus theory of colour specification.
Non-Temporal Media Types : Image representations
RGB: numeric triple specifying red, green and blue intensities; convenient for video display drivers, since the numbers can be easily mapped to voltages for the RGB guns in colour CRTs.
HSB: Hue is the dominant colour of the sample, an angular value varying from red to green to blue at 120° intervals; Saturation is the intensity of the colour; Brightness is the amount of gray in the colour.
CMYK: displays emit light, so they produce colours by adding red, green and blue intensities; paper reflects light, so to produce a colour on paper one uses inks that subtract all colours other than the one desired. Printers use inks corresponding to the subtractive primaries cyan, magenta and yellow (the complements of RGB); additionally, since inks are not pure, a black ink is used to give better blacks and grays.
YUV: colour model used in the television industry (also YIQ, YCbCr, and YPbPr). Y represents luminance, effectively the black-and-white portion of a video signal; U and V are colour-difference signals that form the colour portion, called chrominance or chroma. YUV makes efficient use of bandwidth: the human eye has greater sensitivity to changes in luminance than in chrominance, so bandwidth can be better utilised by allocating more to luminance and less to chrominance.
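The luminance/chrominance split can be illustrated with the classic BT.601 conversion (the 0.492 and 0.877 scale factors are the analog YUV ones; digital variants such as YCbCr use different scaling):

```python
def rgb_to_yuv(r, g, b):
    """BT.601 luma plus U/V colour-difference signals (inputs in 0..1)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance (black-and-white signal)
    u = 0.492 * (b - y)                    # blue colour-difference
    v = 0.877 * (r - y)                    # red colour-difference
    return y, u, v

# White has full luminance and zero chrominance:
print(rgb_to_yuv(1.0, 1.0, 1.0))
```

Because any gray input gives U = V = 0, chrominance can be subsampled aggressively without touching the luminance the eye is most sensitive to.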
Alpha channels: images may have one or more alpha channels defining regions of full or partial transparency; they can be used to store selections and to create masks and blends.
Number of channels: the number of pieces of information associated with each pixel; usually the dimensionality of the colour model plus the number of alpha channels.
Channel depth: the number of bits per pixel used to encode the channel values; commonly 1, 2, 4 or 8 bits, less commonly 5, 6, 12 or 16 bits. In a multiple-channel image, different channels can have different depths.
Interlacing: the storage layout of a multiple-channel image could separate the channel values (all R values, followed by all G, followed by all B) or could use interlacing (all RGB for pixel 1, all RGB for pixel 2, …).
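The two storage layouts can be shown with plain Python lists over a toy two-pixel image:

```python
# Pixel data for a 2-pixel RGB image
pixels = [(10, 20, 30), (40, 50, 60)]  # (R, G, B) per pixel

# Interlaced (interleaved) layout: R G B for pixel 1, then pixel 2, ...
interleaved = [v for px in pixels for v in px]

# Planar layout: all R values, then all G values, then all B values
planar = [px[c] for c in range(3) for px in pixels]

print(interleaved)  # [10, 20, 30, 40, 50, 60]
print(planar)       # [10, 40, 20, 50, 30, 60]
```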
Indexing: pixel colours can be represented by an index into a colour map or colour lookup table.
Pixel aspect ratio: the ratio of pixel width to height. Square pixels are simple to process, but some displays and scanners work with rectangular pixels; if the pixel aspect ratios of an image and a display differ, the image will appear stretched or squeezed.
Compression: a page-sized 24-bit colour image produced by a scanner at 300 dpi takes up about 20 Mbytes. Many image formats compress pixel data, using run-length coding, LZW, predictive coding and transform coding. Among the many image formats, JPEG, GIF, TIFF and BMP are the most widely used.
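The page-size figure can be checked with a back-of-the-envelope calculation (assuming an A4 page here; the exact result depends on the page size assumed):

```python
# Uncompressed size of a scanned page: A4 at 300 dpi, 24-bit colour
width_in, height_in = 8.27, 11.69   # A4 dimensions in inches (assumption)
dpi = 300
bytes_per_pixel = 3                 # 24-bit RGB
pixels = round(width_in * dpi) * round(height_in * dpi)
size_mb = pixels * bytes_per_pixel / 2**20
print(f"{size_mb:.1f} MB")  # about 25 MB, the same order as the ~20 MB quoted
```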
Non-Temporal Media Types: Image Operations
Image operations can act directly on pixel data or on higher-level features such as edges, surfaces and volumes. Operations on higher-level features fall into the domain of image analysis and understanding.
Editing: changing individual pixels for image touch-up forms the basis of airbrushing and texturing; cutting, copying and pasting are supported for groups of pixels, from simple shape manipulation through to more complex foreground and background masking and blending.
Point operations: consist of applying a function to every pixel in an image. Only the pixel's current value is used; neighbouring pixels cannot be used.
Thresholding: a pixel is set to 1 or 0 depending on whether it is above or below a threshold value; this creates binary images, which are often used as masks when compositing.
Colour correction: modifying the image to increase or reduce contrast, brightness or gamma effects, or to strengthen or weaken particular colours.
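A thresholding point operation is essentially a one-liner; this sketch uses a nested list as a toy grayscale image:

```python
def threshold(image, t):
    """Point operation: each output pixel depends only on its own value."""
    return [[1 if px >= t else 0 for px in row] for row in image]

gray = [[ 10, 200],
        [130,  90]]
print(threshold(gray, 128))  # [[0, 1], [1, 0]] - a binary mask
```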
Filtering: used to blur, sharpen or distort images, producing a variety of special effects; filters operate on every pixel in an image, but use the values of neighbouring pixels as well.
Compositing: the combining of two or more images to produce a new image, generally done by specifying mathematical relationships between the images.
Geometric transformations: basic transformations involve displacing, rotating, mirroring or scaling an image; more advanced transformations involve skewing and warping.
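A basic geometric transformation, rotation about the origin, can be sketched on a list of 2D points:

```python
import math

def rotate(points, angle_deg):
    """Rotate 2D points about the origin (a basic geometric transformation)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Rounding hides floating-point noise such as cos(90 deg) != exactly 0
    return [(round(c * x - s * y, 6), round(s * x + c * y, 6)) for x, y in points]

print(rotate([(1, 0), (0, 1)], 90))  # [(0.0, 1.0), (-1.0, 0.0)]
```

The same matrix form extends to scaling, mirroring and skewing by changing the coefficients.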
Conversions: a number of tools support conversion between image formats; other forms of conversion include compression and decompression, changing colour models, and changing image depth and resolution.
Media type: <Graphic>
Representations: Geometric models, Solid models, Physically-based models, Empirical models, Drawing models, External formats for models
Operations: Primitive editing, Structural editing, Shading, Lighting, Mapping, Viewing, Rendering
The central notion of graphics, as opposed to image data, is the rendering of graphical data to produce an image. A graphics type or model is therefore the combination of a data type plus a rendering operation.
Graphics
Geometric models consist of 2D and/or 3D geometric primitives. 2D primitives include lines, rectangles and ellipses, plus more general polygons and curves; 3D primitives include the above plus surfaces of various forms, with curves and curved surfaces described by parameterised polynomials.
Primitives can be used to build structural hierarchies, allowing each structure thus created to be broken down into lower-level structures and primitives.
Several standard device-independent graphics libraries are based on geometric modelling: GKS (Graphical Kernel System, ISO); PHIGS (Programmer's Hierarchical Interactive Graphics System, ISO), see also PHIGS+ and PEX; OpenGL, a portable version of the Silicon Graphics library.
Non-Temporal Media Types : Graphics representations
Solid models: Constructive Solid Geometry (CSG), where solid objects are combined using the set operators union, intersection and difference; surfaces of revolution, where a solid is formed by rotating a 2D curve about an axis in 3D space; extrusion, where a 2D outline is extended in 3D space along an arbitrary path.
These techniques produce models much faster than building them up from geometric primitives, but rendering them is expensive.
Physically-based models: realistic images can be produced by modelling the forces, stresses and strains on objects; when one deformable object hits another, the resulting shape change can be numerically determined from their physical properties.
Empirical Models
Complex natural phenomena (clouds, waves, fire, etc.) are difficult to describe realistically using geometric or solid modelling. Physically-based models are possible, but they may be computationally expensive or intractable.
The alternative is to develop models based on observation rather than physical laws. Such models do not embody the underlying physical processes that cause these phenomena, but they do produce realistic images: fractals, probabilistic graph grammars (used for branching plant structures) and particle systems (used for fires and explosions) are examples of empirical models.
Drawing models: describe an object in terms of drawing or painting actions; the description can be seen as a sequence of commands to an imaginary drawing device (e.g. PostScript).
External formats for models: export/import formats are needed between graphics packages; CGM and CAD formats are suitable, while PostScript and RIB are render-only.
Shape primitive editing: specifying and modifying the parameters associated with the model primitives, e.g. the type of a primitive, its vertex coordinates and surface normals.
Shape structural editing: creating and modifying collections of primitives, and establishing spatial relationships between members of collections.
Shading operations describe the interaction of light with the object, in terms of how the object reflects light and whether it transmits light. Most methods describe the surface of the object using meshes of small polygonal surface patches: flat shading gives each patch a constant colour; Gouraud shading interpolates colour information across a patch; Phong shading interpolates surface-normal information across a patch; ray tracing and radiosity use physical models of light behaviour to calculate colour information for each patch, giving highly realistic results.
Non-Temporal Media Types : Graphics operations
Texture mapping: an image, the texture map, is applied to a surface. This requires a mapping from 3D surface coordinates to 2D image coordinates: given a point on the surface, the image is sampled and the resulting value is used to colour the surface at that point.
Viewing: to produce an image of a 3D model we require a transformation that projects 3D world coordinates onto 2D image coordinates. View specification consists of selecting the projection transformation (usually a parallel or perspective projection, although camera attributes can be specified in some renderers) and the view volume.
Rendering: converts a model, including shading, lighting and viewing information, into an image. The output resolution is the width and height of the output image in pixels, together with the pixel depth.
Temporal media
Temporal (time-dependent / continuous): in full-motion video, sound, and continuous signals from sensors, the values change over time. Information is expressed not only in the individual values but also by their time of occurrence; the semantics depend on the rate of change of the discrete values or of the continuum. These media consist of a time-dependent sequence of individual information units called Logical Data Units (e.g. shots and clips within a sequence, moving objects within the clips).
Processing these media is time-critical, because the validity and correctness of the data depend on a time condition.
Media type: <Digital Video>
Representations: Sampling rate, Sample size and quantisation, Frame rate, Data rate, Compression
Operations: Storage, Retrieval, Mixing, Conversion, Synchronisation, Editing
Digital video
Digital video is a sequence of frames, the frames being digital images, possibly compressed in some manner. Each frame in a digital video sequence has its own timing associated with it.
Temporal Media Types : Digital video representations
Sampling rate: the value of the sampling rate determines the storage requirement and the data transfer rate. The lower limit on the frequency at which to sample in order to faithfully reproduce the signal, the Nyquist rate, is twice the highest frequency within the signal.
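The Nyquist criterion in code form (the 20 kHz figure is the usual audio illustration, since audible sound extends to roughly that frequency):

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate that avoids aliasing: twice the highest frequency."""
    return 2 * max_frequency_hz

# Audible sound extends to about 20 kHz; CD audio's 44.1 kHz sampling rate
# sits just above this minimum, leaving a guard band for filtering
print(nyquist_rate(20_000))  # 40000 Hz
```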
Sample size and quantisation: sample size is the number of bits used to represent sample values; quantisation refers to the mapping from the continuous range of the analog signal to discrete sample values. The choice of sample size is based on: the signal-to-noise ratio of the sampled signal; the sensitivity of the medium used to display frames; the sensitivity of the human eye.
Digital video commonly uses linear quantisation, where quantisation levels are evenly distributed over the analog range (as opposed to logarithmic quantisation).
We can divide digital video representations into two broad categories. High data rate formats are primarily used in professional video production and post-production; they use little or no compression, since picture quality and ease of processing are the prime considerations. Examples are: Digital Component Video (CCIR 601); Digital Composite Video; the Common Intermediate Format (CIF) and Quarter-CIF (QCIF), approved by the CCITT for video conferencing.
Data rate
High data rate formats can be reduced to lower data rates by a combination of: compression; reducing the horizontal and vertical resolution; reducing the frame rate.
For example: start with broadcast-quality digital video at 10 Mbytes/s; divide the horizontal and vertical resolutions by 2, giving VHS-quality resolution; divide the frame rate by 2; compress at a ratio of 10:1. The data rate becomes 1 Mbit/s, suitable for use on LANs and on optical storage devices (e.g. CD-ROM).
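The arithmetic of that example can be checked step by step:

```python
rate = 10 * 8  # broadcast-quality digital video: 10 Mbytes/s = 80 Mbit/s
rate /= 2 * 2  # halve horizontal and vertical resolution -> 20 Mbit/s
rate /= 2      # halve the frame rate                     -> 10 Mbit/s
rate /= 10     # compress at 10:1                         ->  1 Mbit/s
print(f"{rate:.0f} Mbit/s")  # 1 Mbit/s, fine for LANs and CD-ROM
```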
Frame rate
25 or 30 fps equates to the analog frame rate, i.e. full-motion video. At 10-15 fps motion is less accurately depicted and the image flickers, but the data rate is much reduced.
Compression: in digital video we can compare compression methods by three factors: lossy vs. lossless; real-time compression, with a trade-off between symmetric models and asymmetric models with real-time decompression; interframe (relative) vs. intraframe (absolute) compression (e.g. MPEG-1 vs. Motion JPEG).
Video formats:
- MPEG-1: 1 Mbit/s
- MPEG-2: broadcast-quality video at rates between 2 and 15 Mbit/s
- MPEG-4: low data rate video
- MPEG-7: metadata standard for video description
- Motion JPEG
- px64 (CCITT H.261): intended for video applications over ISDN (Integrated Services Digital Network); known as px64 since it produces rates that are multiples of ISDN's 64 Kbit/s B-channel rate. It uses techniques similar to MPEG but, since compression and decompression must be real-time, quality tends to be poorer.
- H.263: based on H.261, but offers 2.5 times greater compression; uses MPEG-1 and MPEG-2 techniques.
Temporal Media Types : Digital Video Operations
Storage: to record or play back digital video in real time, the storage system must be capable of sustaining data transfer at the video data rate. A further problem is the size of storage: even using MPEG-1, 13 minutes of video will fill a 100 Mbyte disk.
Retrieval: uses frame addressing, but there are some problems. Interframe compression techniques such as MPEG code only key frames independently; the other frames are derived from these key frames. Random access therefore requires first finding the nearest key frame, then using it to decode the desired frame, again via the index, enhanced with key frame locations.
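Key-frame-based random access can be sketched with a sorted key-frame index and a binary search (the 12-frame spacing is an illustrative assumption, in the range of typical MPEG-1 GOP sizes):

```python
import bisect

def nearest_key_frame(key_frames, target):
    """Latest key (I) frame at or before the target frame number.

    Random access into interframe-compressed video starts decoding here
    and rolls forward to the desired frame."""
    i = bisect.bisect_right(key_frames, target) - 1
    return key_frames[max(i, 0)]

# Hypothetical index: a key frame every 12 frames
key_frames = list(range(0, 300, 12))
print(nearest_key_frame(key_frames, 100))  # 96: decode from here to frame 100
```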
Multimedia systems
Not every combination of media justifies the use of the term multimedia.
A proper use is when both continuous and discrete media are utilized. According to this view, the defining characteristic of multimedia systems is the incorporation of continuous media such as voice, video and animation; independent media should be integrated to accomplish certain functions, using timing, spatial and semantic synchronization.
Multimedia systems are also those that incorporate only discrete or only continuous media, but employ an integrated multiplicity of presentation, transmission, representation or perception media.
A text-processing program with incorporated images is therefore not a multimedia application.
Research and development efforts in multimedia fall into two groups:
- stand-alone multimedia workstations and associated software systems, such as music composition, computer-aided learning and interactive video. In stand-alone multimedia systems, multimedia information is created, processed, presented and stored on the workstation.
- multimedia computing with distributed systems: multimedia information systems, collaboration and conferencing systems, on-demand multimedia services, distance learning, …
Natural interfaces are easy to use, and allow people to interact with them the way they do with other people. In particular, interfaces that make it possible to interact with computerized equipment without need for special external equipment.
These interfaces are not based on menus, mice, and keyboards but use instead gesture, speech, affect, context, and movement.
Their applications are not word processors and spreadsheets, but smart homes and personal assistants: “instead of making computer-interfaces for people, it is of more fundamental value to make people-interfaces for computers”.
Research examples at MICC: Natural interaction
The most important factor in making these applications possible in recent years has been the viability of real-time computer vision and speech understanding.
Systems coupled with natural interfaces will enable tasks historically outside the normal range of human-computer interaction, by connecting computers to phenomena (such as someone walking into a room) that have traditionally been outside the scope of user interfaces. With natural interfaces the user experiences a form of context awareness, exploiting dialog modalities and behaviors commonly used in the ordinary activities of everyday life.
The POLIFEMO project
Objectives: support for severe disabilities; interaction based on eye-gaze capture; 3D navigation control.
Key points: eye shape and pupil identification; tracking of eye motion by elastic matching; clicking by persistence.
The GOLEM project
Objectives: virtual presence; real-time body motion tracking and replication.
Key points: input from two webcams; colour blob identification (head, hands, feet); stereo matching and computation; inverse kinematics for internal DOF computation; H-Anim body model and MPEG-4 compliance; virtual body animation.
[Demo video clips: dilbert, balls, golem]
The POINT AT project
Objectives: natural interaction for cultural heritage applications.
Key points: large projection screen; input from two webcams; hand detection; geometric computation of the pointed position.
[Demo video clip: POINT AT]
Current research at MICC: Natural interaction in “intelligent rooms”
[Diagram: smart-room interaction states: non-active user viewing, non-active user exiting, active user gesture-based interaction, active user tangible-object interaction, tangible object, active context, neighbor context, external context.]
Institutional cooperation with Provincia di Firenze, Museo Palazzo Medici Riccardi.
Main ongoing projects: Regione Toscana POR: Mnemosyne; Provincia di Firenze: Interactive Bookshop; Provincia di Lucca: Monte Alfonso Library; Shawbak castle, Jordan: interactive intro.
Smart-Object (Micrel Lab, University of Bologna)
- wireless data transmission (via Bluetooth) through miniaturized electronic sensors
- triple-axis accelerometer to recognize the upper cube face and user actions
- infra-red LED matrix to track the cube position through the computer
The cube acts as a metaphor for a digital data collector.
• Face detector based on the Viola-Jones algorithm
• Tracking algorithm based on colour histogram and particle filtering
• PTZ camera steering algorithm (controlled by the tracker)
• Face recognition, gaze and expression estimation are applicable
• Runs at 9-12 fps while tracking 4-5 targets with fixed and PTZ cameras on a 4-core Intel Xeon PC at 640x480; no background subtraction.
Smart room environment
Interacting with physical and graphical objects
“Frontiers of Interaction” Roma, Italy 2009
Automatic annotation of images, video, audio and 3D objects is motivated by the huge amount of digital data produced daily by industry and individuals, and by the need to store them in repositories for later retrieval and reuse, with descriptions that capture their intrinsic content at the lowest possible cost.
Visual and audio data must be complemented with descriptors that provide a synthetic representation of their content.
Descriptors are needed at both the syntactic and the semantic level: the former captures visual and auditory features (such as colour, texture and spatial relations for visual data; peak values and frequency for audio data), while the latter captures objects, meanings, highlights, events, moods, …
Research examples at MICC: Automatic semantic annotation
Example of human behavior detection
Method                      KTH     Weizmann
Our method                  92.57   95.41
Laptev et al. (HoG) ['08]   81.6    -
Laptev et al. (HoF) ['08]   89.7    -
Dollár et al. ['05]         81.2    -
Wong & Cipolla ['07]        86.6    -
Scovanner et al. ['07]      -       82.6
Niebles et al. ['08]        83.3    90
Liu et al. ['08]            -       90.4
Kläser et al. ['08]         91.4    84.3
Willems et al. ['08]        84.2    -
Example of event detection
Automatic semantic annotation of video through event/behavior detection
Soccer highlight detection: kickoff; shot on goal; counterattack; turn over; placed kick (in attack); forward pass; corner kick; penalty kick; free kick (close to penalty).
Highlight video annotation
Automatic semantic annotation of Sports Video
Semantic annotation is a prerequisite for effective retrieval by content of multimedia data. It makes it possible to use words to retrieve video data: "find all video shots where President Clinton is present", …
The emergence of multimedia, the availability of large image and video archives, and the development of information highways have attracted research efforts into providing tools for effective retrieval of visual data based on their content.
The relevance of visual information retrieval extends to several application fields, such as art galleries and museum archives, medical and geographic databases, …
Research examples at MICC: Content-based retrieval
Retrieval by global colour similarity; retrieval by colour-region similarity; retrieval by shape similarity; retrieval by visual similarity based on the appearing features of spatial entities: shape, colour, texture, spatial relationships.
3D faces
Main ongoing projects: EC VIDIVIDEO, EC IM3I.
3D face matching
Video event / behavior detection
[Pipeline diagram: video → interest points → bag-of-features → visual dictionary → bag-of-words → SVM classifier → action classes (running, walking, jogging, handwaving, handclapping, boxing).]
3D surface descriptors
[Figure: the 27 directional weights W_ijk, one for each combination of i, j, k in {-1, 0, +1}, covering the possible displacement directions in 3D space.]
The weight between two stripes A and B along direction (i, j, k) is the integral measure
w_ijk(A, B) = (1 / K_ijk) \int_A \int_B C_i(x_a - x_b) C_j(y_a - y_b) C_k(z_a - z_b) dx_a dx_b dy_a dy_b dz_a dz_b
where the functions C_i, C_j, C_k select the displacements along each axis.
Partitioning of 3D faces into iso-geodesic stripes.
Derivation, for each stripe, of an integral measure that indicates how much of another stripe lies in each of the 27 possible directions in 3D space.
3D face matching
• 3D-to-3D face recognition by modelling the 3D shape and the relative arrangement of iso-geodesic stripes identified on the model surface.
• Construction and matching of a graph-based face model.
From 2D face images to 3D face matching
Scale-invariant target template matching; reliable target tracking from PTZ video.
2D face sequences
3D faces
2D faces
PTZ camera
Industry joint laboratory with THALES Corp
Identity recognition through 2D and 3D face matching
DISTRIBUTED MULTIMEDIA SYSTEMS
Communication-capable multimedia systems are those in which multimedia information can not only be created, processed, presented and stored, but also distributed beyond the single-computer boundary.
Web-based multimedia systems support multimedia applications over the Internet. Distributed multimedia systems require continuous data transfer over relatively long periods of time, media synchronization, very large storage, and special indexing and retrieval.
Course outline
Part I Media and formats (Del Bimbo)
Part II Standards for images, video, audio: JPEG-JPEG2000, MPEG 1-2-4, MP3 (Del Bimbo, Bertini, D'Amico)
Part III Media processing languages: (Processing), ActionScript (Del Bimbo, Nunziati, D'Amico)
Part IV Interaction design: Web interface design (Del Bimbo)
Part V Interchange and presentation languages: XML, XHTML, CSS (Del Bimbo, Martorana)
Part VI Lab