
1 2009/05/04

Introduction to Grid methodologies and tools

Supplementary course to Software Engineering 2 (Ingegneria del Software 2)

Contacts: [email protected] (353-2789, BioLab, Villa Bonino -1)

Exam: N points out of 30, with N = 0..4

Slides available online: yes, later on

2 2009/05/04

Disclosure agreement

• Grid != High Performance Computing

• Grid == A large ICT infrastructure supporting large research infrastructures (LHC)

• Grid ~= Peer2Peer computing (SETI@home)

• “I have too many servers” → Utility computing

• Utility Computing + marketing ~= Cloud computing

• Grid + business models + virtualization ~= Cloud computing

• Summary:
  – Research + Physics → Grid
  – Business + Marketing + Grid + (1 - Physics) → Cloud

• In this course: Grid = rnd(Cloud, Grid, … , … , ...);

Unit 0: A brief introduction to Grid

4 2009/05/04

Outline

• Introduction
• Service Utilities
• The Grid computing model
• EGEE and other EU Projects
• Cloud computing

5 2009/05/04

Introduction

• Architecture evolution
• Context size: computing, storage, network
• Use cases

6 2009/05/04

IT Architecture evolution

Architecture                  # CPU
Personal Computing            1-2
Client/Server Architecture    2-4
Multi-Tier Architecture       3-10
Server Farm                   5-50
Cluster                       5-100
Multi-Cluster                 100-1000
High Performance Computing    thousands
Grid Computing                >40000
Peer To Peer Computing        millions

7 2009/05/04

IT dimensions

• Computing: flops/specints/mips …
• Storage: bytes
• Network: bit/sec

[Figure: three axes: 1. Computing, 2. Storage, 3. Network]

Part I - Introduction

8 2009/05/04

IT Actors: “Plug & Use” models

• Grid Computing introduces a new usage model for computing
• Storage models are already active (hosting/housing, storage leasing)
• Network models are relevant (Internet Providers, TLC Carriers, Last-Mile Providers)

[Figure: the three axes (1. Computing, 2. Storage, 3. Network) labelled with their providers: Utility / Cloud Computing Providers, Housing/Hosting Providers, Internet Service Providers]

9 2009/05/04

1. Computing

• State of the art: 32 nm, billions of transistors, 3-4 GHz / 64 bit, 4-8 cores, 100 watts or less
• User: software
• Cost: ~200 €/CPU
• Limits
  – Existing software
  – Existing programmers
• Alternatives exist
  – Manycore/GPU
  – Multicore hybrid (Cell PS3/IBM)

Moore's law

10 2009/05/04

2. Storage

• State of the art: 4 terabytes (HDD), ~100 GB (solid state), 200 GB (DVD/Blu-ray)
• Device: computer, TV / audio, photo / cam / embedded
• Cost: 0.2-0.4 €/GB (HDD)

11 2009/05/04

3. Network

• State of the art: 1 Gbit/s (Ethernet), 12 Mbit/s (ADSL), 10 Gbit/s (fiber)
• Device: computer, VoIP phone, video, ...
• Cost: 20 €/month (ADSL)
• Warning: the first limits are evident (IPv4!!)
• Bandwidth shaping technologies
• Network neutrality?

Gilder's law

12 2009/05/04

… exponential growth

• Computing doubles every 18 months (Gordon Moore, 1965)

• Storage doubles every 12 months

• Available network bandwidth doubles every 6 months (George Gilder, 1992)
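
To see how quickly these three rates diverge, here is a minimal Python sketch. It assumes each quantity doubles exactly once per period (18, 12 and 6 months, the figures on the slide); the function and dictionary names are illustrative and not part of the course material.

```python
# Doubling periods in months, as quoted on the slide.
DOUBLING_MONTHS = {"computing": 18, "storage": 12, "network": 6}


def growth_factor(years: float, doubling_months: float) -> float:
    """Growth multiple after `years`, assuming one doubling per period."""
    return 2.0 ** (years * 12.0 / doubling_months)


def compare(years: float) -> dict:
    """Growth multiple of each IT dimension after `years`."""
    return {name: growth_factor(years, months)
            for name, months in DOUBLING_MONTHS.items()}


if __name__ == "__main__":
    for name, factor in compare(3).items():
        print(f"{name:10s} x{factor:g}")
```

Over three years this gives a factor of 4 for computing, 8 for storage and 64 for network bandwidth.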

13 2009/05/04

Growth comparison

Network is driving the market

14 2009/05/04

Some “killer applications” use the Peer to Peer model

• SETI@home: 5 million users, 10^21 flops
• Napster: 40 million users, 10 petabytes of files
• Skype: 200 million users, 7.5 billion minutes of calls

[Figure: the applications placed on the Computing, Storage and Network axes: SETI@home (millions of petaflops), MP3 sharing (petabytes), Skype (billions of minutes)]

15 2009/05/04

Grid Computing – The “utilities” model

• The Model
• The Model in the real world
• The Model applied to the Grid

16 2009/05/04

Utilities model

• Network (infrastructure): mostly public and under regulated control

• Resource (account and access): the object to be transferred

• Device (consume): how resources are consumed

• Plant (production): how resources are made available

17 2009/05/04

Schema

[Figure: providers run production plants (Plant) whose products become resources; the infrastructure carries the resources to the devices through which users consume them. Labels: Production Plants, Products, Resources, Infrastructure, Devices, users, providers]

18 2009/05/04

Examples

• Electricity
• Gasoline
• Railways
• Logistics
• ...
• Information technology!

19 2009/05/04

The Power Grid

• Devices (consume)
• Infrastructure (distribution)
• Resources (transmitted, measured in watts)
• Plant (production)

[Figure labels: Consume, Infrastructure, Production]

20 2009/05/04

Grid Computing

• Devices: software
• Infrastructure = EU Grid (EGEE), middleware (gLite)
• Resources (unit) = #cpu, cpu time, bytes
• Plants = server farms (CERN, INFN, CNR, Universities, ...)

[Figure labels: Consume, Infrastructure, Production]
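
The mapping from the utilities model to the power grid and to Grid Computing can be written down as a small data structure. The following Python sketch is only an illustration: the class name `UtilityModel` and its field names are assumptions, and the field values are copied from the two slides above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtilityModel:
    """One instance of the four-part utilities model (names are illustrative)."""
    name: str
    devices: str         # how the resource is consumed
    infrastructure: str  # how the resource is distributed
    resources: str       # what is transferred and accounted
    plants: str          # where the resource is produced


POWER_GRID = UtilityModel(
    name="Power Grid",
    devices="consume",
    infrastructure="distribution",
    resources="transmitted, measured in watts",
    plants="production",
)

GRID_COMPUTING = UtilityModel(
    name="Grid Computing",
    devices="software",
    infrastructure="EU Grid (EGEE), middleware (gLite)",
    resources="#cpu, cpu time, bytes",
    plants="server farms (CERN, INFN, CNR, Universities, ...)",
)


def describe(model: UtilityModel) -> str:
    """Render a model as an indented text block, one role per line."""
    return "\n".join([
        model.name,
        f"  devices:        {model.devices}",
        f"  infrastructure: {model.infrastructure}",
        f"  resources:      {model.resources}",
        f"  plants:         {model.plants}",
    ])


if __name__ == "__main__":
    print(describe(POWER_GRID))
    print(describe(GRID_COMPUTING))
```

Both instances share the same four roles, which is the point of the analogy: only the values change between utilities.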

21 2009/05/04

Originated by...

• Universities unable to buy supercomputers
• Sharing commodity resources that are already available
• Everyone gets access to the sum of the resources
• At the end of the 1980s, Condor (www.cs.wisc.edu/condor) is developed at the University of Wisconsin
• At the end of the 1990s, Ian Foster creates the Globus Toolkit (www.globus.org) and Grid Computing is born

Ian Foster: ianfoster.typepad.com

22 2009/05/04

Architecture comparison

3 reference models:

• Grid Computing (GC): standard platform with shared resources

• High Performance Computing (HPC): centralized and specialized supercomputers

• Peer To Peer Computing (P2P): vertical applications based on desktop sharing, but with very limited reliability

23 2009/05/04

Currently

• Several definitions for the same thing:
  – Grid Computing
  – High Throughput Computing (HTC)
  – MetaComputing
  – Distributed Supercomputing

• Beware of marketing people...
  – Sun Grid Engine: it is a scheduler...
  – Oracle 10G(rid): it is a distributed scalable RDBMS...

24 2009/05/04

Virtual Organizations (VO)

• Grid is a hybrid model with a bit of P2P
• Hardware is shared inside Virtual Organizations
• VOs share resources among themselves
• VOs go beyond the physical boundaries of nations and real organizations

[Figure: VO 1 and VO 2 overlapping on a shared resource]
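
The picture of two VOs overlapping on a shared resource maps naturally onto sets. The sketch below is a toy model only: the resource names are hypothetical (the slide shows just "VO 1", "VO 2" and one shared resource), and real Grid middleware handles VO membership through its security services rather than in-memory sets.

```python
# Hypothetical resource names; only "VO 1" and "VO 2" come from the slide.
VO_RESOURCES = {
    "VO 1": {"cluster-a", "cluster-b"},
    "VO 2": {"cluster-b", "cluster-c"},
}


def shared_resources(vos: dict, first: str, second: str) -> set:
    """Resources that both VOs can use (set intersection)."""
    return vos[first] & vos[second]


def accessible_to(vos: dict, resource: str) -> list:
    """Names of the VOs that can use a given resource, sorted."""
    return sorted(name for name, owned in vos.items() if resource in owned)


if __name__ == "__main__":
    print(shared_resources(VO_RESOURCES, "VO 1", "VO 2"))
    print(accessible_to(VO_RESOURCES, "cluster-a"))
```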

25 2009/05/04

Open Source role

• gLite is Open Source, as are Globus and Condor
• Linux is the reference operating system
• Applications are mostly Open Source
• The cooperation model is borrowed from Open Source
• The Virtual Organization model is well supported by a cooperative Open Source code development model
• Grid requires plenty of technical services for setup, maintenance, training... so there is room for a traditional Open Source business model

26 2009/05/04

EU investment in Grid: LHC

• The Large Hadron Collider (and black hole generator) was the use case driving the EU investment in Grid

• These demanding requirements generated all the currently available tools

• Pro: problems can be as large as the currently available resources allow

• Con: the requirements come from High Energy Physics

Large Hadron Collider

27 2009/05/04

gLite middleware

• Services and components for a distributed Grid Computing system

[Figure: gLite service diagram, with these groups]

• Data Services: File & Replica Catalog, Metadata Catalog, Storage Element, Data Management
• Security Services: Auditing, Authorization, Authentication
• Accounting
• Site Proxy
• Job Management Services: Job Provenance, Package Manager, Workload Management, Computing Element
• Info. & Monitoring Services: Information & Monitoring, Job Monitoring
• Access: CLI, API

28 2009/05/04

Grid Computing – EGEE

Currently:
• 240 sites
• 45 countries
• 41,000 CPUs
• 5 PetaBytes
• > 10,000 users
• > 150 VOs
• > 100,000 jobs/day

Application disciplines:
• Archaeology
• Astronomy
• Astrophysics
• Civil Protection
• Comp. Chemistry
• Earth Sciences
• Finance
• Fusion
• Geophysics
• High Energy Physics
• Life Sciences
• Multimedia
• Material Sciences
• …

29 2009/05/04

Production infrastructure

Recent level: ~15k CPUs in continuous use

Usage doubled over the last year

30 2009/05/04

Increasing workloads

32%

A factor-of-5 increase is still expected for the LHC experiments over the next year

EGEE'07; 2nd October 2007

31 2009/05/04

Use of the infrastructure

EGEE: ~250 sites, >45,000 CPUs

24% of the resources are contributed by groups external to the project

More than ~20k simultaneous jobs

EGEE'07; 2nd October 2007

32 2009/05/04

Scientific Disciplines

• Disciplines: 10
• Sub-disciplines: 36

Discipline                  6/2006  2/2007
Astronomy & Astrophysics         2       8
Computational Chemistry          6      27
Earth Science                   16      16
Fusion                           2       3
High-Energy Physics              9      11
Life Sciences                   23      39
Others                           4      14
Total                           62     118

Others: Condensed Matter Physics, Comp. Fluid Dynamics, Computer Science/Tools, Civil Protection, Finance

33 2009/05/04

Usage by Scientific Discipline

• Wide (natural) differences in total CPU utilization.
• Evidence of broad adoption of grid technology.

34 2009/05/04

Resources by Discipline

• Utilization depends on having available resources.

• Good coverage of scientific disciplines for computing and storage resources.
  – Sites often have more than one CE or SE defined.
  – Number, not size, of resources!

Discipline  # CEs  # SEs
HEP           292    299
LS            113    123
CC             25     41
AA             57     83
Fusion         19     21
ES             42     65
Others        143    149
Unknown       288    327
Infra.        282    306
Total         366    334

35 2009/05/04

Active Virtual Organizations (VO)

• Number of “active” VOs growing steadily
  – Turnover: distinct VOs in the last 6 / 12 / 24 months = 83 / 92 / 102
  – Total VOs: 104 registered, 258 visible

36 2009/05/04

Summary of Use

• Large, growing overall utilization

• Long-term, habitual use of infrastructure.

• Broad adoption by many diverse communities

37 2009/05/04

Operations progress

Progress/success:
• Production service, Oct ’06 to Sep ’07:
  – Number of sites: ~190 => ~240 (x1.25 increase)
  – Average number of jobs/month for preceding 12 months: 0.97 million => 2.46 million (x2.5 increase)
  – Peak number of jobs in preceding 12 months: 1.45 million (June 06) => 3.11 million (May 07) (x2.14 increase)
  – Number of CPUs: ~32,000 => ~46,000 (x1.44 increase)

EGEE'07; 2nd October 2007
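
The increase factors on this slide can be recomputed from the before/after values. The Python sketch below does that and flags whether each quoted factor agrees with the computed one; the 5% relative tolerance and all names are assumptions chosen for illustration, since the slide quotes approximate ("~") figures.

```python
# (before, after, factor quoted on the slide), Oct '06 to Sep '07.
PROGRESS = {
    "sites": (190, 240, 1.25),
    "average jobs/month (millions)": (0.97, 2.46, 2.5),
    "peak jobs (millions)": (1.45, 3.11, 2.14),
    "CPUs": (32_000, 46_000, 1.44),
}


def increase_factor(before: float, after: float) -> float:
    """Ratio of the later value to the earlier one."""
    return after / before


def check(progress: dict, tolerance: float = 0.05) -> dict:
    """Map each metric to (computed factor rounded to 2 places, agrees with quote)."""
    result = {}
    for metric, (before, after, quoted) in progress.items():
        computed = increase_factor(before, after)
        agrees = abs(computed - quoted) / quoted <= tolerance
        result[metric] = (round(computed, 2), agrees)
    return result


if __name__ == "__main__":
    for metric, (factor, agrees) in check(PROGRESS).items():
        print(f"{metric}: x{factor} ({'ok' if agrees else 'mismatch'})")
```

Under these assumptions the computed ratios are about 1.26, 2.54, 2.14 and 1.44, so the quoted factors are rounded but consistent with the raw numbers.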

38 2009/05/04

Grid Computing – EU Projects

39 2009/05/04

Registered Collaborating Projects

Applications: improved services for academia, industry and the public

Support Actions: key complementary functions

Infrastructures: geographical or thematic coverage

25 projects have registered as of Sept 2007: web page

40 2009/05/04

EGEE working with collaborating infrastructure projects

41 2009/05/04

Collaborating e-Infrastructures

Potential for linking ~80 countries by 2008

42 2009/05/04

Where are we?

Gartner Group

[Figure: Grid on the Computing in High Energy Physics conferences timeline: 2000, 2001, 2003, 2004, 2006, 2007]

Slide courtesy of Les Robertson, LCG Project Leader

43 2009/05/04

Towards a European Grid Infrastructure

• A permanent infrastructure
• Sustainable
• Large international coordination
• Public and private use

44 2008/07/15

Current IT trend

• Grid was intended as the answer to IT needs in terms of distributed infrastructures (processing and data management)… but the term has been overexposed
  – Software as a Service: The Closest Thing to Grid's Killer Application? (http://www.gridtoday.com)
  – Europe's 7th Research Framework Programme (2009-2010): from Grid to “Internet of Services and Things”

Grid, High Performance Computing, Clusters, High Availability, Virtualization, AJAX, Web applications, On demand applications, utility computing, IT outsourcing

Cloud computing

Software as a Service

Marketing operation!

45 2008/07/15

Use cases

• IBM
  – Blue Cloud datacenter
• Amazon
  – Elastic Compute Cloud, Simple Storage Service, SimpleDB
• Oracle
  – Oracle DB and applications aaS, on-demand services
• Google
  – AppEngine, Docs, Bigtable (DB)
• HP
  – Adaptive Infrastructure aaS
• Many more:
  – Salesforce and about 30 other “vendors”

46 2008/07/15

Different problems require different skills

Problem: Scalable hardware infrastructure, managing computing peaks
Solution: Grid middleware, virtualization, cloud & utility computing services. Integration of existing commercial and free tools, help in developing datacenters that provide these services to support SaaS applications

Problem: Provide powerful development tools and offer a user interface to end users (SaaS offering)
Solution: Abstraction layer for developers, hiding the complexity of different middleware and IT resources. Provide a portal, modern Web 2.0 technologies, integration with libraries and toolkits (e.g. Google). Async Web applications (plenty of AJAX), data & metadata management, semantics and ontologies (mash-up)

Problem: Computing-intensive tasks (e.g. financial real-time data analysis, statistics)
Solution: Parallelization and porting to modern architectures, clustering, computing farms. Competences in hardware (parallel architectures, many / multi core solutions) and software (high performance computing, development and porting)

47 2009/05/04

Real world examples

Problem: Scalable hardware infrastructure, managing computing peaks
Case study: Goods and people transport planning and routing (on-demand transport, large supermarket chains)

Problem: Provide powerful development tools and offer a user interface to end users (SaaS offering)
Case study: End-user Web services for non-tech users (health and non-health biotechnology, pharma, agro & food), data integration tools (control panels for large plant/infrastructure monitoring and management)

Problem: Computing-intensive tasks (e.g. financial real-time data analysis, statistics)
Case study: High performance image processing, financial risk calculation, engineering (CAD/CAM, e.g. naval, aeronautical, automotive, F1), hydrometeorology (normal and extreme event forecasting)

48 2009/05/04

Thank you

Ivan Porro

[email protected]