Data Management for the RedisDG Scientific Workflow Engine

Data Management for the RedisDG Scientific Workflow Engine. Per3S (Performance and Scalability of Storage Systems) – DDN. Leila Abidi, Souha Bejaoui, Christophe Cerin, Jonathan Lejeune, Yanik Ngoko, Walid Saad. Laboratoire d’Informatique de Paris Nord. Jan 30, 2017.


Data Management for the RedisDG Scientific Workflow Engine

Per3S (Performance and Scalability of Storage Systems) – DDN

Leila Abidi, Souha Bejaoui, Christophe Cerin, Jonathan Lejeune, Yanik Ngoko, Walid Saad

Laboratoire d’Informatique de Paris Nord

Jan 30, 2017


Table of Contents

1 Elements of (local) context

2 Problem definition

3 Contributions

4 Conclusion - Future work


Digital platform: http://cirrus.uspc.fr

CIRRUS TODAY (SaaS): MAGI, S-CAPAD, CUMULUS

CUMULUS:

1) VM install

2) Software install inside VM

3) Use software

4) Network of VMs (soon)

Steps:

1) Request software install

2) Reserve nodes

3) Send job

4) Get results

Roles: USER, SYS ADMIN


Disruptive industrial projects (French technology)

SlapOS cloud (http://www.slapos.org): open source;

Started in 2010... VM was an option (not mandatory);

Based on Linux LXC;

A conceptual view based on 3 ingredients:

One ERP (to manage the catalog of applications and the customer relations);

A model for the deployment;

Nodes;

Figure: SlapOS architecture


Disruptive industrial projects (French technology)

http://www.qarnot-computing.com → The Q.rad is a smart and connected digital heater, the fusion of an electrical heater and a high-performance computing server;

Q.rads are located in homes... and produce heat by computation;

Figure: Qarnot Computing platform


Innovative industrial projects (French technology)

Fully automated, the Q.ware platform is in charge of:

Optimal computing node selection (Q.rads, data center, IaaS, cloudlets);

Client payload/container boot sequence;

Input data distribution and output results collection;

Processor frequency adjustment;

Job management (progress control, logging, automatic recovery).

Industrial talk for the EuroPar 2016 conference: "Heating as a cloud-service, a position paper", Yanik Ngoko (Qarnot Computing) ⇒ scientific scheduling issues.


Scientific computing before/after the cloud


Scheduling DAGs

MONTAGE: experiments on a large instance ⇒ 9423 input files (including the intermediate files); the workflow generates 2889 files (including the intermediate files).


State of the art

Scheduling:

Theoretical foundation: to think about what we have to do. Sometimes too abstract. Often related to the performance metric only, in a context where "everything is known" in advance;

Heuristics: to solve concrete problems... those with influencing parameters identified by experiments;

Our heuristics are "natural" and aim for simplicity. The novelty: the way we manage them and the interactions between components;

Tools/Middleware:

BOINC, Condor, XtremWeb, BonjourGrid;

OpenAlea for plant phenomics (https://succes2016.sciencesconf.org/data/Infraphenogrid_pradal.pdf);

DIET (http://graal.ens-lyon.fr/diet/?page_id=551);

SWIFT (http://swift-lang.org/);

D. Talia survey (https://www.hindawi.com/journals/isrn/2013/404525/).


Our general thesis

To build systems for heterogeneous and highly dynamic environments, we need:

1. a publish-subscribe layer for the orchestration of the components of the system;

2. a set of opportunistic strategies for allocating work/tasks that are also based on the publish-subscribe layer;

3. a small number of software dependencies for the system and the ability to deploy the system and its applications on demand. This point is of particular interest in this paper: we promote ease of use and systems that can be deployed without a system administrator.


Contributions (a step forward)

WaaS (Workflow as a Service):

A computing infrastructure that is easy to deploy (on demand) in a cloud;

Easy to use for everyone (expert level not required);

Based on the RedisDG framework:

The initial framework has been formally proved;

To experiment with opportunistic scheduling;

To experiment with the Publication/Subscription paradigm for the interactions between the components of the RedisDG system;

Specifying a distributed system according to the Publication/Subscription paradigm is challenging (and not frequent... even with Google Pub-Sub) :-)


Opportunistic scheduling

Many modern computing platforms (clouds, desktop grids, and volunteer-computing projects) exhibit extreme levels of dynamic heterogeneity (availability and relative efficiencies);

When an event occurs ⇒ decide on the 'best thing' to do based on the knowledge available at this time;

Figure: publish-subscribe event service. Publishers send Publish to the Event Service; subscribers send Subscribe and Un-subscribe and receive Notify(); the Event Service handles the storage and management of subscriptions.
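The publish-subscribe interaction pictured on this slide can be sketched in a few lines of Python. This is a minimal in-process illustration, not the RedisDG implementation: the class name `EventService`, the channel name `"tasks"`, and the message layout are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber is any callable taking (channel, message).
Callback = Callable[[str, dict], None]


class EventService:
    """Stores subscriptions and notifies subscribers on publication."""

    def __init__(self) -> None:
        self._subscriptions: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        if callback not in self._subscriptions[channel]:
            self._subscriptions[channel].append(callback)

    def unsubscribe(self, channel: str, callback: Callback) -> None:
        if callback in self._subscriptions[channel]:
            self._subscriptions[channel].remove(callback)

    def publish(self, channel: str, message: dict) -> int:
        """Notify every current subscriber; return how many were notified."""
        subscribers = list(self._subscriptions[channel])
        for notify in subscribers:
            notify(channel, message)
        return len(subscribers)


received = []


def worker_notify(channel: str, message: dict) -> None:
    # Hypothetical worker reaction: just record the announced task.
    received.append((channel, message["task"]))


service = EventService()
service.subscribe("tasks", worker_notify)
first = service.publish("tasks", {"task": "T1"})   # one subscriber notified
service.unsubscribe("tasks", worker_notify)
second = service.publish("tasks", {"task": "T2"})  # nobody is listening anymore
```

Publishers never address a subscriber directly, so workers can come and go without the publisher being aware of it.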


Platform-oblivious vs platform-aware opportunistic scheduling

Theoretical point of view (Arnold L. Rosenberg, EuroPar 2016) ⇒ Opportunistic dag-Execution via Platform-Oblivious Scheduling:

One always benefits computationally with dag-structured workflows by enhancing the likelihood of having as many eligible chores as possible.

Such scheduling enhances the likelihood of having work available as (advantageous) resources become available, hence being able to exploit resources opportunistically.

In practice: network bandwidth, hierarchical architecture or heterogeneous computing nodes (CPU, GPU, FPGA)...
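The notion of "eligible chores" can be made concrete with a small sketch: a task is eligible once all its predecessors are done. The toy DAG and the function name `eligible_tasks` below are assumptions for illustration only; they are not taken from the cited talk.

```python
from typing import Dict, Set


def eligible_tasks(preds: Dict[str, Set[str]], done: Set[str]) -> Set[str]:
    """Tasks not yet executed whose predecessors have all completed."""
    return {t for t, p in preds.items() if t not in done and p <= done}


# Toy DAG: a -> c, a -> d, b -> e (each task maps to its predecessors).
preds = {"a": set(), "b": set(), "c": {"a"}, "d": {"a"}, "e": {"b"}}

start = eligible_tasks(preds, set())    # {"a", "b"}
after_a = eligible_tasks(preds, {"a"})  # {"b", "c", "d"}
after_b = eligible_tasks(preds, {"b"})  # {"a", "e"}
```

Executing `a` first leaves three eligible tasks, executing `b` first leaves only two: the first order keeps more work available for whichever resource shows up next, without using any knowledge of the platform.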


Interest and implications of our proposal

From the "interest" point of view: volunteer-computing architectures are very interesting for HTC (high-throughput computing) problems:

However, HTC workloads often generate a lot of data whereas volunteer-computing architectures are based on low-bandwidth networks ⇒ minimizing transfer costs is very relevant.

From the "implications" point of view:

Do we need to always wait for the response of all workers?

Do we need to transfer data to a central node at the end of the calculation?

What about replication of jobs?

Time to deploy the infrastructure? As a service?


The problem we solve

Problem definition: given a DAG and a queue of requests, find an allocation such that a performance criterion (time, energy, load...) is minimized or maximized;

In this paper: data-aware approaches for opportunistic scheduling in order to minimize the execution time;

Implementation: RedisDG framework


Centralized vs decentralized management

Centralization versus Decentralization


Heuristics for minimizing the transfer of data

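Input Number (IN):

\[ \mathrm{score}(W_j, T_i) = \sum_{p \in \mathrm{Pred}(i,j)} \mathrm{card}\left(I_{T_i} \cap O_p\right) \]

Input Size (IS):

\[ \mathrm{score}(W_j, T_i) = \sum_{p \in \mathrm{Pred}(i,j)} \; \sum_{f \in (I_{T_i} \cap O_p)} \mathrm{size}(f) \]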
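The two scores can be computed as in the sketch below. The reading of the symbols is an assumption, since the slide does not define them: Pred(i,j) is taken to be the predecessors of task T_i that ran on worker W_j, I_{T_i} the input files of T_i, and O_p the output files of predecessor p. File names and sizes are made up for the example.

```python
from typing import Dict, Set


def in_score(task_inputs: Set[str], pred_outputs: Dict[str, Set[str]]) -> int:
    """Input Number: count inputs already produced on this worker."""
    return sum(len(task_inputs & outputs) for outputs in pred_outputs.values())


def is_score(task_inputs: Set[str], pred_outputs: Dict[str, Set[str]],
             sizes: Dict[str, int]) -> int:
    """Input Size: total size of inputs already produced on this worker."""
    return sum(sizes[f]
               for outputs in pred_outputs.values()
               for f in task_inputs & outputs)


# Hypothetical task needing three files, and what each worker's
# predecessor tasks produced locally.
inputs = {"a.fits", "b.fits", "c.fits"}
sizes = {"a.fits": 10, "b.fits": 20, "c.fits": 100, "x.fits": 5}
produced = {
    "W1": {"p1": {"a.fits", "x.fits"}, "p2": {"b.fits"}},
    "W2": {"p3": {"c.fits"}},
}

in_scores = {w: in_score(inputs, outs) for w, outs in produced.items()}
is_scores = {w: is_score(inputs, outs, sizes) for w, outs in produced.items()}
best_in = max(in_scores, key=in_scores.get)  # worker holding the most inputs
best_is = max(is_scores, key=is_scores.get)  # worker holding the most bytes
```

In this example the two heuristics disagree: W1 holds more of the input files, while W2 holds the single largest one.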


Heuristics for controlling a fairness principle

Based on observations, and to reduce the monopolization of tasks by a few workers, an effect accentuated by both previous heuristics:

Fair Root Input Number (FRIN): for tasks on the first level; based on IN;

Fair Root Input Size (FRIS): for tasks on the first level; based on IS;

Fair: generalization of the fair distribution to all the levels of the task graph:

Balance the number of tasks performed by each worker independently of the data ⇒ emphasize that the reduction of the execution time in the FRIN and FRIS heuristics is not only due to equity but to the combination of equity and data.
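One possible reading of the fairness principle is sketched below: pick the worker that has executed the fewest tasks so far, optionally using a data score (such as IN or IS) to break ties. The slides do not specify how FRIN and FRIS combine equity and data, so the tie-breaking rule and the function names are assumptions.

```python
from typing import Dict, Iterable


def pick_fair(workers: Iterable[str], tasks_done: Dict[str, int]) -> str:
    """Data-independent fairness: least-loaded worker, name as tie-breaker."""
    return min(workers, key=lambda w: (tasks_done.get(w, 0), w))


def pick_fair_with_data(workers: Iterable[str], tasks_done: Dict[str, int],
                        data_score: Dict[str, int]) -> str:
    """Equity first, then the highest data score among equally loaded workers."""
    return min(workers,
               key=lambda w: (tasks_done.get(w, 0), -data_score.get(w, 0), w))


workers = ["W1", "W2", "W3"]
tasks_done = {"W1": 3, "W2": 1, "W3": 1}
data_score = {"W1": 5, "W2": 0, "W3": 2}  # e.g. an IN score per worker

fair_choice = pick_fair(workers, tasks_done)
combined_choice = pick_fair_with_data(workers, tasks_done, data_score)
```

Here W1 has the best data score but has already run three tasks, so both rules skip it; the combined rule then prefers W3 over W2 because W3 already holds some of the inputs.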


Experimental results (sample)

Homogeneous physical machines; (1+4/1 – 1+4/3 for the Docker case)

MONTAGE workflow (NASA);

Testbed: https://www.grid5000.fr/ – Large-scale and versatile testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing including Cloud, HPC and Big Data.

              C-FIFO   D-FIFO   IN     IS     FRIN   FRIS   FD

Data (s)      675      474      374    355    383    404    549

Exe (min:s)   8:39     7:40     6:21   6:44   6:01   6:28   7:30
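To compare the heuristics, the execution times in the table can be converted to seconds and expressed as a reduction relative to C-FIFO. The sketch assumes the "Exe" values are in minutes:seconds; the helper name `to_seconds` is introduced for the example.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Execution times copied from the table above.
exe = {
    "C-FIFO": "8:39", "D-FIFO": "7:40", "IN": "6:21", "IS": "6:44",
    "FRIN": "6:01", "FRIS": "6:28", "FD": "7:30",
}

baseline = to_seconds(exe["C-FIFO"])

# Percentage reduction of the execution time relative to C-FIFO.
reduction = {
    name: round(100 * (baseline - to_seconds(t)) / baseline, 1)
    for name, t in exe.items()
}

fastest = min(exe, key=lambda name: to_seconds(exe[name]))
```

Under this reading, FRIN is the fastest configuration, about 30.4% below the C-FIFO execution time, while D-FIFO alone gains about 11.4%.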


Conclusion - Future work

Data-aware scheduling and opportunistic scheduling scheme; implementation in the RedisDG framework; experiments;

Work done (since the submission):

Introduction of Docker containers and of hardware heterogeneity;

Introduction of decision policies based on the load, the energy consumption, or 'fairness';

Current work:

Multicriteria approaches (including the availability of nodes) ⇒ HPC in the cloud;

Predictor(s): load, network bandwidth... using ML techniques;

Modeling of the coupling of different execution models (one node of the DAG may be unfolded dynamically); provenance/data life cycle (ActiveData).
