ANSYS HPC for CFD Applications Release 17.0


Agenda

High-Performance Computing – Motivations

ANSYS HPC solutions

HPC performance improvements for ANSYS CFD R17.0

• For the same model complexity, shorter design times: impact on time to market
• For the same time, the ability to study more complex models: more detailed knowledge of one's own products
• For the same time and model complexity, the ability to study more variants: parametric studies with analysis of input/output correlations

HPC – Motivations

Need to study more accurate and/or more complex (high-fidelity) models:
• moving from component-level studies to system-level studies
• increasingly complicated and detailed geometries
• finer computational grids
• more detailed knowledge of one's own products
• greater scope for product development

HPC – Motivations

Need to apply more advanced numerical models:
• transient
• turbulence
• combustion
• multiphase
• etc.

HPC – Motivations

Need to test different configurations:
• sensitivity analysis
• optimization
• robust design

HPC – Motivations

HPC – Motivations

The survey results indicate a substantial return on investment in HPC:
• $356 on average in revenue per dollar invested in HPC
• $38 on average in profits (or cost savings) per dollar invested in HPC

Financial ROI Results

Source: IDC report “Creating Economic Models Showing the Relationship Between Investments in HPC and the Resulting Financial ROI and Innovation”; October 2013, IDC #243296, Volume: 1.

ANSYS HPC SOLUTIONS

Whatever the simulation requirement, ANSYS HPC provides the parallel computing capacity needed to shorten time to solution and to solve problems with high accuracy (high fidelity).

The ANSYS solvers for structural mechanics, explicit dynamics, fluid dynamics and electromagnetics, including:

• ANSYS Mechanical
• ANSYS Autodyn
• ANSYS Fluent
• ANSYS CFX
• ANSYS Icepak
• ANSYS Polyflow

all use the same ANSYS HPC licenses to run in parallel.

Interdisciplinary: a single solution, multiphysics

ANSYS HPC Solutions at Every Scale

Efficiency on multi-core workstations

HPC cluster appliances

Scalability on supercomputers

Courtesy of FCA Italy

HPC (per process)

HPC Pack

For a single user who wants to run a simulation on their own workstation, a single ANSYS HPC Pack speeds up the computation by up to 8 times. For users with access to large HPC resources, ANSYS HPC Packs can be combined to enable parallel computing on hundreds, or even thousands, of cores.

HPC Workgroup

Provides large volumes of parallel computing capacity to improve user productivity. Enables a maximum total number of compute cores (from 16 to 32768 on the same server) that a team can access.

HPC Parametric Pack

Multiplies the availability of licenses for individual applications, enabling several design points to run simultaneously while consuming only one set of application licenses at a time (via ANSYS Workbench only).

ANSYS HPC solutions

HPC Packs per simulation    1    2    3     4     5      6      7
Enabled cores               8    32   128   512   2048   8192   32768
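The enabled-core values in the chart (8, 32, 128, 512, 2048, 8192, 32768) follow a geometric pattern: each additional HPC Pack multiplies the enabled cores by four. A minimal Python sketch of that pattern, assuming it holds exactly for the 1 to 7 packs shown; the function name `cores_enabled` is illustrative and not an ANSYS API:

```python
def cores_enabled(hpc_packs: int) -> int:
    """Cores enabled per simulation for a given number of HPC Packs.

    Assumption: the progression read off the chart, 8 * 4**(packs - 1),
    which is only known to hold for 1 to 7 packs.
    """
    if not 1 <= hpc_packs <= 7:
        raise ValueError("the chart only covers 1 to 7 HPC Packs")
    return 8 * 4 ** (hpc_packs - 1)


if __name__ == "__main__":
    for packs in range(1, 8):
        print(f"{packs} pack(s): {cores_enabled(packs)} cores")
```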

ANSYS HPC Parametric Pack licenses scale the user's ability to run several parametric analyses at the same time within ANSYS Workbench.

One ANSYS HPC Parametric Pack license allows up to 4 designs to be evaluated simultaneously, with no additional application licenses required (in effect, the “base” licenses are multiplied).

ANSYS HPC Parametric Pack

Figure: time axis comparing sequential execution with simultaneous execution (example: 4 design points, dp1 to dp4); simultaneous execution reduces the computation time.

Number of HPC Parametric Pack Licenses          1    2    3    4    5
Number of Simultaneous Design Points Enabled    4    8    16   32   64
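The chart values suggest that each additional HPC Parametric Pack license doubles the number of simultaneous design points, starting from 4. The sketch below encodes that reading and adds an idealized wall-time estimate for a batch of design points. Both helper names are hypothetical, and the estimate assumes every design point takes the same time and that hardware is not the bottleneck:

```python
import math


def simultaneous_design_points(parametric_packs: int) -> int:
    """Design points that can run at once (pattern read off the chart)."""
    if not 1 <= parametric_packs <= 5:
        raise ValueError("the chart only covers 1 to 5 Parametric Packs")
    return 4 * 2 ** (parametric_packs - 1)


def batch_wall_time(n_design_points: int, time_per_point: float,
                    parametric_packs: int) -> float:
    """Idealized wall time: design points run in waves of equal duration."""
    slots = simultaneous_design_points(parametric_packs)
    waves = math.ceil(n_design_points / slots)
    return waves * time_per_point


if __name__ == "__main__":
    # 4 design points of 1 hour each: 4 hours sequentially, 1 hour with one pack.
    print(batch_wall_time(4, 1.0, parametric_packs=1))
```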

HPC PERFORMANCE IMPROVEMENTS FOR ANSYS CFD R17.0

Improved Parallel Performance & Scaling – CFX 17.0

Case Details:
• Airfoil
• External Aerodynamic Flow
• 100 M hex elements
• Single Domain
• Turbulent Flow

R17 vs. R16: Solution time reduced by up to 39% @ 4096 cores

Scaling to 25K nodes/core

R17 vs. R15: >5X faster solution @ 2048 cores

ANSYS Application Example

Application: General flow

Improved Parallel Performance & Scaling – CFX 17.0

Case Details:
• Automotive IC Engine Application
• 146 M nodes (380M elements: tet/prism/pyramid)
• Single Domain
• Turbulent Flow

R17 vs. R16: 32% faster @ 4096 cores

ANSYS Application Example

Application: Mesh motion

Improved Parallel Performance & Scaling – CFX 17.0

Case Details:
• Full Turbine
• Steady (FR)
• 13 M nodes (hex)
• 256 cores, 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow

R17 vs. R16: Absolute 5-10% faster

Minimal scaling change

ANSYS Application Example

Application: Turbomachinery
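The "50K nodes/core" figure is the mesh size divided by the core count. A minimal sketch of that arithmetic for the turbine case (13 M nodes on 256 cores); the helper name is illustrative:

```python
def nodes_per_core(mesh_nodes: int, cores: int) -> float:
    """Average number of mesh nodes handled by each core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return mesh_nodes / cores


if __name__ == "__main__":
    # Turbine case from the slide: about 50.8K nodes per core.
    print(round(nodes_per_core(13_000_000, 256)))
```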

Improved Parallel Performance & Scaling – CFX 17.0

Case Details:
• Full Turbine
• Unsteady (TRS)
• 13 M nodes (hex)
• 256 cores, 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow

R17 vs. R16: Absolute 10-30% faster

Speed-up @ 16 compute nodes: 5.8X, 7X

ANSYS Application Example

Application: Turbomachinery
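The two speed-up figures at 16 compute nodes can be turned into a parallel efficiency. The sketch below assumes the speed-ups are measured relative to a single compute node, which the slide does not state, and it does not assume which figure belongs to which release:

```python
def parallel_efficiency(speedup: float, n_units: int) -> float:
    """Fraction of ideal linear scaling achieved on n_units compute units."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return speedup / n_units


if __name__ == "__main__":
    for s in (5.8, 7.0):
        print(f"speed-up {s}X on 16 nodes: {parallel_efficiency(s, 16):.1%}")
```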

Improved Parallel Performance & Scaling – CFX 17.0

Background:
• Particular parallel performance issue on large partition counts

Optimized source point performance
• Improved efficiency with large numbers of source points

"GaTurbineBlade" by Tomeasy - Own work by uploader; produced with Adobe illustrator. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:GaTurbineBlade.svg#/media/File:GaTurbineBlade.svg

Test case showing reduction in total CPU time when using large numbers of source points (reduction of additional computational cost of source points by as much as 70%)

ANSYS Features & Capabilities

Application: Turbomachinery

Improved Parallel Performance & Scaling – CFX 17.0

ANSYS Features & Capabilities

Background:
• Problems modeling collimated radiation such as headlights and solar irradiation use the Monte Carlo solver. This solver needs to take full advantage of HPC potential

Enhanced Monte Carlo Radiation model
• Optimized the model so that the total number of rays (histories) remains consistent, independent of the number of core partitions

Headlights, solar irradiation
• 2 spectral bands (multiband) participating media; 5 radiation domains (2 fluid, 3 solid); 3.5 million elements of which 2.2 million radiation elements
• Specified serial histories – 10 million

ANSYS Application Example

Complex headlamp case with 10 million ray histories. Comparison when solving only radiation and energy

Application: Radiation
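The invariant described for the Monte Carlo model, a total history count that does not depend on the number of partitions, can be illustrated with a simple split. This is only a sketch of the idea, not the CFX implementation; `split_histories` is a hypothetical helper:

```python
def split_histories(total_histories: int, partitions: int) -> list:
    """Divide ray histories across partitions so the total is preserved.

    The first `remainder` partitions receive one extra history.
    """
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    base, remainder = divmod(total_histories, partitions)
    return [base + 1 if i < remainder else base for i in range(partitions)]


if __name__ == "__main__":
    for n_parts in (1, 16, 64, 96):
        shares = split_histories(10_000_000, n_parts)
        # The sum is 10 million regardless of the partition count.
        print(n_parts, sum(shares))
```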

Improved Parallel Performance & Scaling – CFX 17.0

ANSYS Features & Capabilities

Background:
• Time to read and write files to HPC for large and complex cases with many regions/face sets could significantly lengthen overall solution time

Optimized HPC I/O speedup
• Optimization of CFX solver to HPC interface resulted in a substantial speed-up
• I/O time now nearly negligible even at 64 cores

Reduction in wall clock seconds for I/O on an example test case with many regions

I/O

HPC performance improvements for ANSYS Fluent

Improved Parallel Performance & Scaling – Fluent 17.0

ANSYS Features & Capabilities

Background:
• Fluent’s priority has been to deliver the best results, not the fastest convergence

Conservative Coarsening Method default for Pressure-based Coupled Solver:
• Especially helpful for native polyhedral meshes and/or highly stretched cells

Algebraic multigrid solver now automatically reorders the linear system
• Ensures proper ordering in multiple cell zones (was limited to within a single cell zone)

No reordering: Not converged >200 iterations

RCM reordering: Converged in 94 iterations

Robustness
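RCM (reverse Cuthill-McKee) reordering renumbers the unknowns so that coupled unknowns end up close together, which reduces the matrix bandwidth. The sketch below is a textbook pure-Python version on a toy graph, meant only to illustrate the idea; it is not Fluent's implementation, and the example graph is invented:

```python
from collections import deque


def rcm_order(adj):
    """Reverse Cuthill-McKee ordering of an undirected graph.

    adj maps each node to a list of its neighbours. Disconnected
    components are handled one after another.
    """
    degree = {n: len(set(nb)) for n, nb in adj.items()}
    visited, order = set(), []
    # Start each component from its lowest-degree unvisited node.
    for start in sorted(adj, key=lambda n: (degree[n], n)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Breadth-first search, visiting low-degree neighbours first.
            for nb in sorted(set(adj[node]), key=lambda n: (degree[n], n)):
                if nb not in visited:
                    visited.add(nb)
                    queue.append(nb)
    order.reverse()
    return order


def bandwidth(adj, order):
    """Largest index distance between two coupled nodes under `order`."""
    pos = {n: i for i, n in enumerate(order)}
    return max((abs(pos[a] - pos[b]) for a in adj for b in adj[a]), default=0)


if __name__ == "__main__":
    # A chain of 6 unknowns with scrambled numbering (toy example).
    chain = {0: [3], 3: [0, 5], 5: [3, 1], 1: [5, 4], 4: [1, 2], 2: [4]}
    print("bandwidth before:", bandwidth(chain, sorted(chain)))
    print("bandwidth after: ", bandwidth(chain, rcm_order(chain)))
```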

Improved Parallel Performance & Scaling – Fluent 17.0

ANSYS Features & Capabilities

Faster METIS partitioning:
• Updated library and optimized algorithms deliver significant partitioning speed-up for many larger cases, particularly those with adapted meshes
• 64-bit indexing in METIS and for partition storage to enable larger models
• Future proofed: Tested up to 2 billion cells!

Combustor:
• 40% faster to partition for 8192 cores
• Less than 3 minutes

Truck:
• 99% faster to partition for 512 cores
• Just 18 seconds (versus 36 minutes!!)

ANSYS Application Examples

Truck 134M Cells: Auto Partition time - Seconds

Cores     256     512     1024    2048    4096
16.0.0    923.1   2175
17.0.0    18.2    15.8    18.5    27.4    51.7

Chart annotation: > 1 hour

Combustor 830M Cells, CRAY XE6: Partition Time - Seconds

Cores     4096    8192
16.0.0    141     295
17.0.0    111     174

Partitioning
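The "40% faster" and "99% faster" partitioning claims can be checked against the quoted times by computing the relative reduction in partition time. A small sketch using only numbers from the slide; the function name is illustrative:

```python
def time_reduction_percent(old_seconds: float, new_seconds: float) -> float:
    """Percentage of the old time that is saved by the new version."""
    return 100.0 * (1.0 - new_seconds / old_seconds)


if __name__ == "__main__":
    # Combustor, 8192 cores: 295 s (16.0.0) vs 174 s (17.0.0).
    print(round(time_reduction_percent(295, 174)))
    # Truck: 36 minutes vs 18 seconds, as quoted in the bullet.
    print(round(time_reduction_percent(36 * 60, 18)))
```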

Improved Parallel Performance & Scaling – Fluent 17.0

ANSYS Features & Capabilities

Background:
• DPM and combustion models pose challenges to parallel performance as users attempt to load-balance flow and physics calculations

New Option: Model-Weighted Partitioning
• Automatically weights multiple physics models across the full set of processors within a specified load imbalance tolerance
• Users can select the factors and relative weightings

Oxy-Fuel Burner:
• Turbulence, combustion, radiation, detailed kinetic mechanism (25 species, 113 reactions)
• 60% faster for 128 cores (Just 82 seconds)

ANSYS Application Example

Oxy-fuel Burner, 1.9M hex cells: Time in Seconds

Cores           32       64       128      256      512     1024
Default         647.26   314.59   203.16   112.15   65.05   37.1
Load Balance    198.08   150.59   82.03    61.76    34.29   22.33

Partitioning
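The idea behind weighting physics models during partitioning can be sketched with a toy load balancer: each cell gets a cost that depends on the physics active in it, and cells are distributed so that the per-partition cost stays within a tolerance. Everything below is an assumption for illustration (the cost weights, the greedy strategy, the helper names); it ignores mesh connectivity entirely and is not how Fluent or METIS partition a mesh:

```python
import heapq


def cell_cost(flow=1.0, particles=0, reacting=False,
              w_particle=0.5, w_chemistry=4.0):
    """Hypothetical per-cell cost; the weights are made-up examples."""
    return flow + w_particle * particles + (w_chemistry if reacting else 0.0)


def partition_by_weight(costs, n_parts):
    """Greedy balancing: heaviest cells first, each to the lightest partition."""
    heap = [(0.0, p) for p in range(n_parts)]
    heapq.heapify(heap)
    owner = [None] * len(costs)
    for cell in sorted(range(len(costs)), key=lambda c: -costs[c]):
        load, part = heapq.heappop(heap)
        owner[cell] = part
        heapq.heappush(heap, (load + costs[cell], part))
    loads = [0.0] * n_parts
    for cell, part in enumerate(owner):
        loads[part] += costs[cell]
    return owner, loads


def imbalance(loads):
    """Relative excess of the most loaded partition over the mean load."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0


if __name__ == "__main__":
    # 100 cells, the first 10 of which carry expensive chemistry.
    costs = [cell_cost(reacting=(i < 10)) for i in range(100)]
    # Equal cell counts per partition: contiguous blocks of 25 cells.
    naive = [sum(costs[i:i + 25]) for i in range(0, 100, 25)]
    _, weighted = partition_by_weight(costs, n_parts=4)
    print("equal-count imbalance:", round(imbalance(naive), 3))
    print("weighted imbalance:   ", round(imbalance(weighted), 3))
```

Splitting by cell count alone leaves the partition holding the reacting cells far more loaded than the others, which is the situation the slide's "Default" timings appear to reflect.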

Improved Parallel Performance & Scaling – Fluent 17.0

ANSYS Features & Capabilities

Background:
• Partitions need to communicate with each other. Lack of optimization can slow performance, especially for moving/dynamic mesh cases where the neighborhood needs to be updated frequently

Neighborhood Creation Optimization:
• Optimized communication algorithms and improved interface identification for better performance and completeness
• Better identification of interfaces improves robustness

Exhaust System:
• Speed-up from 1X to 30X depending on case and number of cores

Exhaust 33M Neighborhood Creation: Time in seconds

Cores     128     256     512     1024    2048    4096    8192
16.0.0    7.828   4.75    6.219   7.882   17.07   52.63   156.4
17.0.0    3.844   2.539   1.866   1.838   2.346   2.793   5.749

ANSYS Application Example

Partitioning

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• External flow over a passenger sedan
• Number of cells: 4 Million
• Cell Type: Mixed
• Models used: Standard K-epsilon turbulence

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

General solver scalability improvements

ANSYS Application Example

Application: General flow

Improved Parallel Performance & Scaling – Fluent 17.0

Optimized Neighborhood Creation

Case Details:
• Vehicle exhaust model
• Number of cells: 33 Million
• Cell Type: Mixed
• Models used: SST K-omega turbulence

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

ANSYS Application Example

Application: General flow

Improved Parallel Performance & Scaling – Fluent 17.0

ANSYS Application Example

Engine Crankcase Lubrication Model: Total Run Time per One Cycle (hrs)

Cores     48      96      192
16.0.0    18      14      10.83
17.0.0    13.85   9.26    5.86

Representative Illustration

Big speed-ups for moving dynamic mesh due to:
• Neighborhood optimization
• Sliding interface optimization
• Parallel solver optimization

Engine Crankcase Lubrication Model:
• 85% faster run time (<6 hours)
• Faster than recent competitive benchmark
• Crankshaft Rotation in a sliding mesh zone, Piston motion through dynamic mesh layering, Oil slosh modeled with VOF, 5M cell Poly Mesh

Application: Mesh motion
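The "85% faster" figure appears to match the 192-core chart values (10.83 h for 16.0.0, 5.86 h for 17.0.0) when read as a gain in speed rather than as a reduction in run time. The sketch below computes both readings so the difference is explicit; this interpretation is an assumption, and the function names are illustrative:

```python
def speed_gain_percent(old_time: float, new_time: float) -> float:
    """How much more work per unit time the new version completes."""
    return 100.0 * (old_time / new_time - 1.0)


def time_saved_percent(old_time: float, new_time: float) -> float:
    """How much of the old run time is saved."""
    return 100.0 * (1.0 - new_time / old_time)


if __name__ == "__main__":
    old_hours, new_hours = 10.83, 5.86  # 192 cores, from the chart
    print("speed gain:", round(speed_gain_percent(old_hours, new_hours)), "%")
    print("time saved:", round(time_saved_percent(old_hours, new_hours)), "%")
```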

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• 4-stroke spray guided Gasoline Direct Injection
• Number of cells: 2 Million
• Cell Type: Mixed
• Models used: Standard K-epsilon turbulence
• Moving mesh, Spray, Combustion

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

Big speed-ups for moving dynamic mesh due to:
• Neighborhood optimization
• Sliding interface optimization
• Parallel solver optimization
• Combustion code refactoring

ANSYS Application Example

Application: Mesh motion, combustion

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• Circulating Fluidized Bed
• Number of cells: 2 Million
• Cell Type: Mixed
• Models used: Laminar

ANSYS Application Example

General solver scalability improvements

Figure labels: Gas inlet, Solid inlet

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

Application: Multiphase

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• Wave loading on Oil Rig
• Number of cells: 7 Million
• Cell Type: Mixed
• Models used: SST K-omega turbulence

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

General solver scalability improvements

ANSYS Application Example

Application: Multiphase

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• Flow through a Combustor
• Number of cells: 12 Million
• Cell Type: Polyhedra
• Models used: Realizable K-epsilon turbulence
• Species transport

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

ANSYS Application Example

Application: Combustion

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• External flow over aircraft landing gear
• Number of cells: 15 Million
• Cell Type: Mixed
• Models used: LES

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

General solver scalability improvements

ANSYS Application Example

Application: Aeroacoustics

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• Single-stage Transonic axial-flow Fan Stator Row
• Number of cells: 3 Million
• Cell Type: Hexahedral
• Models used: SST K-omega turbulence
• Unsteady (sliding interfaces)

Ref: NASA-103800

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

General solver scalability improvements

ANSYS Application Example

Application: Turbomachinery

Improved Parallel Performance & Scaling – Fluent 17.0

Case Details:
• Cavity flow in a centrifugal pump
• Number of cells: 2 Million
• Model used: Realizable K-epsilon turbulence

Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

General solver scalability improvements

ANSYS Application Example

Application: Turbomachinery, multiphase

Optimized for the Latest HPC Architectures – Fluent 17.0

Case Details:
• 1.2 million cell pipe benchmark

Hardware Configuration:
• One node of XL250Gen9s with E5-2690v3, 128 GB of 2133 MHz memory and 2 NVIDIA K80s

ANSYS Application Example

GPU

Optimized for the Latest HPC Architectures – Fluent 17.0

Case Details:
• 9.6 million cell pipe benchmark

Hardware Configuration:
• Cluster of XL250Gen9s with E5-2690v3, 128 GB of 2133 MHz memory and 2 NVIDIA K80s/node

ANSYS Application Example

GPU