Post on 19-Jan-2015
Carlo Patrini, Information Architect
carlo.patrini@it.ibm.com, +393357248561
© 2013 IBM Corporation
Overview of the IBM offering
22 March 2013
We need to gain more knowledge
The need to gain more knowledge (insights) is ever more pressing and urgent
How could we make the best use of historical data to anticipate our buyers' actions?
Which products are selling best in Italy today?
91% of dissatisfied customers will turn to other suppliers
How can we improve our customer retention?
How effective was campaign C123?
What are people saying about our new product?
• Integrate the business with technology
• Use historical and summary data, both structured and unstructured
• Get the most value from analysing the information extracted from all available sources
I would like to discover new customer segments...
What do people tell our call center?
Answering questions that are always new, always urgent and always... strategic
Reporting, Predictive Analytics, Analysis
Cubes, Master Data Management
ETL, Data Integration, Data Quality, Data Delivery
Data Warehouse
...always pushed by a market that demands:
• Higher volumes
• Higher data quality
• Greater control over the process
• and above all greater SIMPLICITY, AUTONOMY and PERFORMANCE
• ...
The Data Warehouse and Business Analytics are an excellent answer
Data sources from operational (business management) systems
Mumble mumble...
Leaner, faster and more responsive DWHs... the DWH appliance is the solution
The DWH is essential, but at times it is slow, too rigid and does not evolve at the pace of the business... the solution is IBM Netezza
And the business is interested in acquiring information that goes beyond the transaction
The DWH generally tracks only the final, concluding transaction.
To "read" the purchase process better, we also need to know everything that came before it
[Diagram: decision tree of the purchase process, running from "Start of purchase process" through many "Fail" branches to a single "Yes!" at "End of purchase process"]
Big Data: the new ocean of data
Volume: 12+ terabytes of Tweets per day
Variety: 100's of different data types
Veracity: only 1 in 3 business users believes they have reliable information
Velocity: 30 billion sensors, RFID tags and other devices generating streaming data
The data beneath the surface is still unexplored
Knowledge is also contained in unconventional sources... why ignore them?
• The business needs to manage and make massive use of an ever-growing quantity of unconventional information, generally created outside the organization
• Most of this unconventional information is semi-structured or completely unstructured
• Organizations suffer if they cannot acquire the knowledge contained in business information
– Traditional systems analyse only structured data
– The missing 80% consists of unstructured or semi-structured information (Gartner).
200k tweets per minute, 290 million tweets per year
12 TB Twitter/day
25 TB Facebook/day
Big Data
When people talk about a "data explosion"
6,000,000 users on Twitter pushing out 300,000 tweets per day
500,000,000 users on Twitter pushing out 400,000,000 tweets per day
83x (users)
1333x (tweets per day)
Traditional approach and Big Data approach
BIG DATA: state of the art
IBM and the Saïd Business School (on a global scale) and SDA Bocconi University (on a local scale) partnered to benchmark global big data activities
www.ibm.com/2012bigdatastudy
>1100 Business Managers, >200 CIOs
What is the state of the art?
IBM, the Saïd Business School (University of Oxford, global research) and SDA Bocconi University (Italy) collaborated on a benchmark of Big Data initiatives
Big Data: the state of the art
1. Customer analytics are driving big data initiatives
2. Big data is dependent upon a scalable and extensible information foundation
3. Initial big data efforts are focused on gaining insights from existing and new sources of internal data
4. Big data requires strong analytics capabilities
5. The emerging pattern of big data adoption is focused upon delivering measurable business value
Key Findings: Big Data Activities
Have Not Begun Big Data Activities
>1000 Business Managers
>200 CIOs
Pilot & Implementation of Big Data Activities
Planning Big Data Activities
24% 47% 28%
18% 25% 57%
IBM Big Data Platform & Ecosystem
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
1. INFOSPHERE BIGINSIGHTS: analyse large structured and unstructured data sets
2. PURE DATA for Analytics (NETEZZA): optimized very large data warehousing
3. INFOSPHERE STREAMS: analyse large structured and unstructured data sets in streaming
4. INFOSPHERE DATA EXPLORER (VIVISIMO): search (and federate data) in a big data context
5. IBM CONTENT ANALYTICS: out-of-the-box text analytics, open environment with Enterprise Search
6. IBM SOCIAL MEDIA ANALYTICS: out-of-the-box social analytics environment
The Data Warehouse and Business Analytics... integrate well with the BIG DATA platform
Cognos, Spreadsheets, Applications
Cubes, Master Data Management
ETL, Data Integration, Data Quality, Data Delivery
Data Warehouse
External source systems: structured, semi-structured and unstructured data, sensors
1. IBM InfoSphere BigInsights
2. Netezza
3. IBM InfoSphere Streams
4. IBM Vivisimo
The Big Data ecosystem: interoperability is the key
Streaming Data: InfoSphere Streams, Analytics on Data In-Motion
Internet-Scale Data Sets: InfoSphere BigInsights, Analytics on Data at Rest
Traditional Warehouse: Data Warehouse, Analytics on Structured Data
Data sources: Traditional / Relational Data Sources and Non-Traditional / Non-Relational Data Sources
The IBM Big Data platform: the new frontier of analysis
[Diagram labels: Data Ingest, Real-Time Analysis, Adaptive Analytical Model, Enrich]
Traditional analysis extended to Big Data
1. Pre-Processing Hub
2. Query-able Archive
3. Exploratory Analysis
Combining structured with unstructured data
Find and view: Data Explorer
[Diagram components: Information Server, Streams, BigInsights, Data Warehouse, Data Explorer]
Big Data applications
• Analysing what is being said about a topic on social media
• Analysing call center messages
• Log analysis
• Fraud detection
• Searching data through a federated engine
• Analysing data coming from sensors
• ...
A Big Data solution is used, for example, when:
- ALL potentially available data must be analysed, because processing a sample of it would not be meaningful or deliver useful results.
- the available data needs to be EXPLORED, possibly interactively, in cases where business measures and indicators are not defined in advance.
- a large, CONTINUOUS STREAM of data must be analysed to make decisions in real time.
The Big Data phenomenon is not tied to a particular industry sector; it builds on the growth in data volume and on further dimensions such as the Velocity and Variety of the available data.
Vestas optimizes capital investments based on 2.5 Petabytes of information.
• Model the weather to optimize placement of turbines, maximizing power generation and longevity.
• Reduce time required to identify placement of turbine from weeks to hours.
• Incorporate 2.5 PB of structured and semi-structured information flows. Data volume expected to grow to 6 PB.
BigInsights to process petabytes of data very quickly
Cisco turns to IBM big data for intelligent infrastructure management.
• Optimize building energy consumption with centralized monitoring and control of building monitoring system.
• Automates preventive and corrective maintenance of building systems.
• Uses Streams, InfoSphere BigInsights and Cognos
• Log Analytics
• Energy Bill Forecasting
• Energy consumption optimization
• Detection of anomalous usage
• Presence-aware energy mgt.
• Policy enforcement
InfoSphere Streams and BigInsights for managing building environments
IBM Data Baby (youtube.com)
Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to predict infection in ICU 24 hours in advance
InfoSphere Streams in the medical field
Dublin City Centre Increases Bus Transportation Performance
• Public transportation awareness solution improves on-time performance and provides real-time bus arrival info to riders
• Continuously analyzes bus location data to infer traffic conditions and predict arrivals
• Collects, processes, and visualizes location data of all bus vehicles
• Automatically generates transportation routes and stop locations
Results:
• Monitoring 600 buses across 150 routes
• Analyzing 50 bus locations per second
• Anticipated to increase bus ridership
Capabilities Utilized: Stream Computing
InfoSphere Streams for traffic optimization
CUSTOMER Analytics – GRUPO BBVA seamlessly monitors and improves its online reputation
“What is great about this solution is that it helps us to focus our actions on the most important topics of online discussions and immediately plan the correct and most suitable reaction.” – Online Communication Department, BBVA
- Enables BBVA to consistently respond to and gain insight into customer needs and feedback.
- Gives BBVA the ability to measure the success of its outputs and approaches to engaging stakeholders and customers.
- Shows whether positive or negative sentiments have increased or not, looks for the source and reason of comments and helps make decisions and plans.
Behavioral Data
CUSTOMER Analytics – MEDIASET
Social analytics to collect customers' longitudinal points of view from Web 2.0 and correlate them with internal data
• Better understand its marketing campaigns and consumer preferences
• Looking for ways to analyze and differentiate consumer experiences
• Helped the client to assess the company’s corporate brands, with respect to one of its main pay-TV competitors
“Big Data is a great opportunity for TV innovation in the next years. TV viewing is transforming into a multiplatform and participative experience: the better we know and understand our viewers, the better we can serve them." – Valerio Motti, Head of Marketing Innovation, Mediaset S.p.A.
Behavioral Data
Transactional Data
VIVISIMO: references
CASE History
SUCCESS STORIES: among the various sources... here are two
LINK
PDF File
Remember: retrieve the link that contains this doc
USEFUL LINKS
BIG Data: some useful links
Big Data HUB & Success Stories: http://www.ibmbigdatahub.com/
Big Data University: http://bigdatauniversity.com/
FREE ebook – Harness the Power of Big Data: http://www.ibmbigdatahub.com/blog/research-director-reflects-new-big-data-book
BigInsights technical enablement wiki: https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en_US#/wiki/BigInsights
I'll stop here...
thank you for your patience
HADOOP & BIGINSIGHTS
BigInsights is based on Hadoop... why?
CPU instructions per second: significant improvements
– 1990: 44 MIPS at 40 MHz
– 2000: 3,562 MIPS at 1.2 GHz
– 2010: 147,600 MIPS at 3.3 GHz
RAM: significant improvements
– 1990: 640 KB
– 2000: 64 MB
– 2010: 8-32 GB
Disk capacity: significant improvements
– 1990: 20 MB
– 2000: 10 GB
– 2010: 1 TB
Disk latency (the speed of reading from and writing to disk): little improvement
– In the last 7-10 years there have been no major improvements; the current speed is about 70-80 MB/sec
How long does it take to scan 1 TB?
• 1 TB (at 80 MB/sec)
– 1 disk: 3.4 hours
– 10 disks: 20 min
– 100 disks: 2 min
– 1000 disks: 12 sec
• To get around disk latency, the answer is... parallel processing
• Hadoop: a new way to store and process data
– Written in Java
– Designed to run on commodity hardware
– Runs on Linux
– Scalable, flexible, robust
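The scan times on this slide can be reproduced with a short calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly even split of the data across disks with no coordination overhead. The class and method names are hypothetical and do not come from the slides.

```java
// Hypothetical helper: idealized time to scan a data set in parallel.
final class ScanTime {
    static final double BYTES_PER_TB = 1e12; // assumption: decimal terabytes
    static final double BYTES_PER_MB = 1e6;  // assumption: decimal megabytes

    // Seconds needed when the data is split evenly over `disks` disks,
    // each read sequentially at `mbPerSecPerDisk`.
    static double seconds(double terabytes, double mbPerSecPerDisk, int disks) {
        double totalBytes = terabytes * BYTES_PER_TB;
        double bytesPerSecond = mbPerSecPerDisk * BYTES_PER_MB * disks;
        return totalBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        for (int disks : new int[] {1, 10, 100, 1000}) {
            double s = seconds(1.0, 80.0, disks);
            System.out.printf("%4d disk(s): %.1f s (%.2f h)%n", disks, s, s / 3600.0);
        }
    }
}
```

With one disk the result is 12,500 seconds, roughly the 3.4 hours quoted on the slide; every tenfold increase in disks divides the time by ten, which is the argument for parallel processing.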
What is Hadoop?
• Apache Hadoop = free, open source framework for data-intensive applications
– Inspired by Google technologies (MapReduce, GFS)
– Yahoo has been the largest contributor to the project (Doug Cutting)
– Well-suited to batch-oriented, read-intensive applications
– Originally built to address scalability problems of Nutch, an open source Web search technology
• Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost effective manner
– CPU + disks of commodity box = Hadoop “node”
– Boxes can be combined into clusters
– New nodes can be added as needed without changing
  • Data formats
  • How data is loaded
  • How jobs are written
Two Key Aspects of Hadoop
• MapReduce framework
– MapReduce is a software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
– How Hadoop understands and assigns work to the nodes (machines)
• Hadoop Distributed File System = HDFS
– Where Hadoop stores data
– A file system that spans all the nodes in a Hadoop cluster
– It links together the file systems on many local nodes to make them into one big file system
MapReduce Application
1. Map Phase (breaks the job into small parts)
2. Shuffle (reorders the partial results for the final processing)
3. Reduce Phase (processes everything again to obtain a single result)
Return a single result set
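import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // Map: emit (word, 1) for every token of the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text val, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(val.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> val, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : val) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
  // . . .
}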
Hadoop and the MapReduce paradigm
Distribute map tasks to cluster
Hadoop Data Nodes
• Data is stored on a distributed system of servers
• The processing functions are sent to where the data is
• Each server processes the data it is responsible for and shares the results
• The system can scale to thousands of nodes and petabytes of data
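The three phases can also be followed without a cluster. The sketch below is a plain-Java, single-process imitation of map, shuffle and reduce for a word count. It uses no Hadoop classes, all names are hypothetical, and it only illustrates the data flow: it does not show how Hadoop distributes the work across nodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the MapReduce data flow.
final class LocalWordCount {
    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle: group the values of all pairs by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single count.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the three phases in sequence over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }
}
```

In a real cluster each call to `map` would run on the node that holds that piece of the input, and the shuffle would move the grouped pairs over the network to the reducers.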
BigInsights extends the capabilities of open source Hadoop by adding new functionality...
[Diagram: the IBM Big Data Platform, with the Hadoop System component expanded into InfoSphere BigInsights]
InfoSphere BigInsights
Enterprise capabilities: Administration & Security, Workload Optimization, Connectors, Advanced Engines, Indexing, Development Tools
Open source based components: IBM tested & supported open source components
With BigInsights, companies can process enormous quantities of data that were never exploited before and derive new knowledge in an efficient, optimized and scalable way. The infrastructure uses Hadoop's MapReduce framework to handle the parallel processing of large data sets distributed across many nodes.
InfoSphere BigInsights: two editions
Enterprise Edition
• GPFS-SNC Native Support*
• Spreadsheet-style data exploration
• Job and Workflow Management
• Productivity and Efficiency Improvements
• Integration with InfoSphere Warehouse
• Integration with Netezza
• Integration with DB2
• Large Scale Indexing
• Text Analytics
• Machine Learning*
• Tiered Terabyte Pricing
* = coming soon
Basic Edition
• Free Download, Easy Installation
• 24x7 Web Support, 10TB Limit
• Paid Support Option
BigInsights on Cloud
IBM BigInsights on Cloud: Hadoop for everyone
InfoSphere Streams
InfoSphere Streams
InfoSphere Streams provides an agile, scalable software infrastructure for the real-time analysis of enormous streams of data in motion, of any kind and coming from countless sources.
This type of processing increases the accuracy and speed of decision making in many fields, such as healthcare, astronomy, manufacturing, finance and many more.
Categories of Problems Solved by Streams
• Applications that require on-the-fly processing, filtering and analysis of streaming data
– Sensors: environmental, industrial, surveillance video, GPS, …
– “Data exhaust”: network/system/web server/app server log files
– High-rate transaction data: financial transactions, call detail records
• Criteria: two or more of the following
– Messages are processed in isolation or in limited data windows
– Sources include non-traditional data (spatial, imagery, text, …)
– Sources vary in connection methods, data rates, and processing requirements, presenting integration challenges
– Data rates/volumes require the resources of multiple processing nodes
– Analysis and response are needed with sub-millisecond latency
– Data rates and volumes are too great for store-and-mine approaches
Real-time processing with InfoSphere Streams
→ continuous ingestion
→ continuous analysis
achieve scale
– by partitioning applications into components
– by distributing across stream-connected hardware nodes
infrastructure provides services for
– scheduling analytics across h/w nodes
– establishing streaming connectivity
– …
Operators: Transform, Filter, Classify, Correlate, Annotate
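To make "processed in isolation or in limited data windows" concrete, the sketch below handles each reading as it arrives, filters out invalid values and emits one average per fixed-size (tumbling) window. It is plain Java, not the InfoSphere Streams API, and the class name, threshold and window size are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stream operator: filter, then average over tumbling windows.
final class TumblingWindowAverage {
    private final int windowSize;    // readings per window (assumed parameter)
    private final double minValid;   // readings below this are dropped (assumed)
    private final List<Double> averages = new ArrayList<>();
    private double sum = 0.0;
    private int count = 0;

    TumblingWindowAverage(int windowSize, double minValid) {
        this.windowSize = windowSize;
        this.minValid = minValid;
    }

    // Called once per incoming reading; nothing is stored beyond the open window.
    void onReading(double value) {
        if (value < minValid) {
            return; // Filter step: discard invalid readings
        }
        sum += value;
        count++;
        if (count == windowSize) {
            averages.add(sum / count); // Window is full: emit its average
            sum = 0.0;
            count = 0;
        }
    }

    // Averages emitted so far, one per completed window.
    List<Double> averages() {
        return averages;
    }
}
```

Only the running sum and count of the open window are kept in memory, which is what separates this style from the store-and-mine approach mentioned on the previous slide.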
InfoSphere Data Explorer (formerly VIVISIMO)
Vivisimo and its mission
It helps organizations discover, organize, analyse and navigate large quantities of heterogeneous, dynamic data, both structured and unstructured, regardless of where it is managed or stored, to increase efficiency and value in business processes.
Vivisimo in the enterprise
• Provide access to numerous applications and data stores
• Discover and navigate across the whole enterprise
• Merge structured and unstructured information to guide the enterprise towards:
– Better decisions
– More efficient operations
– Better understanding of customers
– Innovation
• Social tools for collaboration and reuse
[Diagram: sources (File Systems, Relational Data, Content Management, CRM, Supply Chain, ERP, RSS Feeds, External Sources, Cloud, Custom Sources) feed the Velocity Platform, which serves Applications/Users; Social Tools: Commenting, Rating, Shared Folders, Tagging]
Vivisimo: federated search
Vivisimo architecture
Sources: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems
[Architecture diagram labels: Web Results, Feeds, Subscriptions, Thesauri, Clustering, Ontology Support, Semantic Processing, Entity Extraction, Relevancy, Text Analytics, Federated Sources, Application SDK, Authentication/Authorization, Query Transformation, Personalization, Display, User Profiles, Search Engine, Faceting, BI, Tagging, Taxonomy, Collaboration, Meta-Data, Connector Framework]
CUSTOMER Analytics: examples...
CUSTOMER Analytics: some examples...
Deeper Customer Analytics Examples and Best Practice and leverage Big Data: Ready for Business
Single view: Business Data, Social Data, Interactive data
Enterprise Systems
Delight customers with targeted... social and transactional propositions
Real time interaction across channels
Interact!
Connect with Clients & prospects, with Brands
...analyse strong and weak signals in discussion
You
Interaction Data
Behavioral Data
Transaction Data
CUSTOMER Analytics – MOBY Lines
Digital & Multichannel Marketing / individual digital analytics, real time monitoring, I/O ERP data, dynamic segments, mkt. automation
Intuitive
Social collection
Single view: Business Data, Social Data, Interactive data
Enterprise Systems
Digital marketing optimization: lifetime individual tracking, microsegmentation, channel attribution, proposition automation
CUSTOMER Analytics – GARANTY bank – a video...
Single view: Business Data, Social Data, Interactive data
Garanty: Real time interaction across channels