ANSYS HPC for CFD Applications Release 17
Agenda
High-Performance Computing – Motivations
ANSYS HPC solutions
HPC performance improvements for ANSYS CFD R17.0
For the same model complexity: shorter design times, with a direct impact on time to market
For the same time budget: the ability to study more complex models, giving deeper knowledge of one's products
For the same time budget and model complexity: the ability to study more variants
parametric studies with input/output correlation analysis
HPC – Motivations
The need to study more accurate and/or more complex (high-fidelity) models:
• moving from component-level to system-level studies
• increasingly complex and detailed geometries
• finer computational meshes
• deeper knowledge of one's products
• greater scope for product development
HPC – Motivations
The need to apply more advanced numerical models:
• transient
• turbulence
• combustion
• multiphase
• etc.
HPC – Motivations
The need to try different configurations:
• sensitivity analysis
• optimization
• robust design
HPC – Motivations
HPC – Motivations
The survey results indicate a substantial return on investment in HPC: • an average of $356 in revenue per dollar invested in HPC • an average of $38 in profits (or cost savings) per dollar invested in HPC
Financial ROI Results
Source: IDC report “Creating Economic Models Showing the Relationship Between Investments in HPC and the Resulting Financial ROI and Innovation”; October 2013, IDC #243296, Volume: 1.
Whatever the simulation requirement, ANSYS HPC provides the parallel computing capability needed to accelerate time to solution and to solve problems with high accuracy (high fidelity).
ANSYS solvers for structural mechanics, explicit dynamics, fluid dynamics and electromagnetics, including:
• ANSYS Mechanical
• ANSYS Autodyn
• ANSYS Fluent
• ANSYS CFX
• ANSYS Icepak
• ANSYS Polyflow
all use the same ANSYS HPC licenses to run in parallel.
Cross-disciplinary: a single, multiphysics solution
ANSYS HPC Solutions at Every Scale
Efficiency on multi-core workstations
HPC cluster appliances
Scalability on supercomputers
[Image: courtesy of FCA Italy]
HPC (per process)
HPC Pack: for a single user running a simulation on their own workstation, a single ANSYS HPC Pack speeds up the computation by up to 8 times. For users with access to large HPC resources, ANSYS HPC Packs can be combined to enable parallel computing on hundreds, or even thousands, of cores.
HPC Workgroup: provides large volumes of parallel computing to improve user productivity. It enables a maximum total number of compute cores (from 16 to 32,768 on the same server) that a team can access.
HPC Parametric Pack: multiplies the licenses available to individual applications, enabling several design points to run simultaneously while consuming only one set of application licenses at a time (via ANSYS Workbench only).
ANSYS HPC solutions
HPC Packs per simulation | 1 | 2 | 3 | 4 | 5 | 6 | 7
Cores enabled | 8 | 32 | 128 | 512 | 2048 | 8192 | 32768
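The cores-enabled table grows geometrically: one pack enables 8 cores and each additional pack multiplies the count by 4, i.e. cores = 2 × 4^n for n packs. A minimal sketch that encodes this pattern as read from the table (the helper names are hypothetical, not an ANSYS licensing API):

```python
def cores_enabled(packs: int) -> int:
    """Cores enabled by combining `packs` HPC Packs (pattern read from the table above)."""
    if packs < 1:
        raise ValueError("at least one HPC Pack is needed")
    return 2 * 4 ** packs  # 1 -> 8, 2 -> 32, ..., 7 -> 32768


def packs_needed(cores: int) -> int:
    """Smallest number of packs whose enabled core count covers `cores`."""
    packs = 1
    while cores_enabled(packs) < cores:
        packs += 1
    return packs


for n in range(1, 8):
    print(n, cores_enabled(n))
```

For example, under this pattern a 1,000-core run needs `packs_needed(1000)`, i.e. 5 packs (2048 cores enabled).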
ANSYS HPC Parametric Pack licenses scale the user's ability to run multiple parametric analyses simultaneously within ANSYS Workbench.
One ANSYS HPC Parametric Pack license allows up to 4 designs to be evaluated simultaneously, with no additional application licenses required (in effect, the "base" licenses are multiplied).
ANSYS HPC Parametric Pack
[Chart: computing-time reduction, sequential vs. simultaneous execution (example: 4 design points, dp1 to dp4)]
Number of HPC Parametric Pack licenses | 1 | 2 | 3 | 4 | 5
Number of simultaneous design points enabled | 4 | 8 | 16 | 32 | 64
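The table doubles with each license: n Parametric Pack licenses enable 2^(n+1) simultaneous design points. The sketch below encodes that pattern and compares idealized wall-clock time for the 4-design-point example, assuming every design point takes the same hypothetical 2 hours and the hardware can run all enabled slots at once; real design points vary in run time.

```python
import math


def simultaneous_design_points(licenses: int) -> int:
    """Design points that can run at once with `licenses` Parametric Pack licenses (from the table)."""
    if licenses < 1:
        raise ValueError("at least one license is needed")
    return 2 ** (licenses + 1)  # 1 -> 4, 2 -> 8, ..., 5 -> 64


def wall_time(n_design_points: int, hours_each: float, slots: int) -> float:
    """Idealized wall-clock time: design points run in waves of `slots` at a time.

    Assumes identical run times and enough hardware for every slot.
    """
    waves = math.ceil(n_design_points / slots)
    return waves * hours_each


sequential = wall_time(4, 2.0, slots=1)  # one design point after another
parallel = wall_time(4, 2.0, slots=simultaneous_design_points(1))  # all 4 at once
print(sequential, parallel)
```

Under these assumptions one license turns 8 hours of sequential runs into a single 2-hour wave.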
Improved Parallel Performance & Scaling – CFX 17.0
Case Details:
• Airfoil • External Aerodynamic Flow • 100 M hex elements • Single Domain • Turbulent Flow
R17 vs. R16: Solution time reduced by up to 39% @ 4096 cores
Scaling to 25K nodes/core
ANSYS Application Example
R17 vs. R15: >5X faster solution @ 2048 cores
Application: General flow
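Two quick checks on the figures above, as a sketch: dividing the 100M-element airfoil mesh over 4096 cores gives the per-core load behind "25K nodes/core" (for a large hex mesh, node and element counts are roughly equal), and a 39% cut in solution time corresponds to a speed-up of 1/(1 - 0.39), about 1.64x. The helper names are illustrative.

```python
def load_per_core(n_mesh_entities: float, n_cores: int) -> float:
    """Average number of mesh entities (nodes or elements) handled by each core."""
    return n_mesh_entities / n_cores


def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional wall-time reduction (0.39 for 39%) into a speed-up factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


print(round(load_per_core(100e6, 4096)))  # airfoil case, 4096 cores
print(round(speedup_from_time_reduction(0.39), 2))
```

The same arithmetic applied to the 13M-node turbine cases on 256 cores gives about 50,800 nodes per core, the "50K nodes/core" quoted there.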
Improved Parallel Performance & Scaling – CFX 17.0
Case Details:
• Automotive IC Engine Application • 146 M nodes (380M elements: tet/prism/pyramid) • Single Domain • Turbulent Flow
R17 vs. R16: 32% faster @ 4096 cores
ANSYS Application Example
Application: Mesh motion
Improved Parallel Performance & Scaling – CFX 17.0
Case Details: • Full Turbine • Steady (Frozen Rotor, FR) • 13 M nodes (hex)
• 256 cores, 50K nodes/core • 4 Domains
• Casing, guide vanes, runner, draft tube • Turbulent Flow
R17 vs. R16: Absolute 5-10% faster
Minimal scaling change
ANSYS Application Example
Application: Turbomachinery
Improved Parallel Performance & Scaling – CFX 17.0
Case Details: • Full Turbine • Unsteady (Transient Rotor-Stator, TRS) • 13 M nodes (hex)
• 256 cores, 50K nodes/core • 4 Domains
• Casing, guide vanes, runner, draft tube • Turbulent Flow
R17 vs. R16: Absolute 10-30% faster
Speed-up @ 16 compute nodes: 5.8X / 7X
ANSYS Application Example
Application: Turbomachinery
Improved Parallel Performance & Scaling – CFX 17.0
Background: • A specific parallel performance issue at large partition counts
Optimized source point performance • Improved efficiency with large numbers of source points
"GaTurbineBlade" by Tomeasy - Own work by uploader; produced with Adobe illustrator. Licensed under CC BY-SA 3.0 via Commons -https://commons.wikimedia.org/wiki/File:GaTurbineBlade.svg#/media/File:GaTurbineBlade.svg
Test case showing the reduction in total CPU time when using large numbers of source points (the additional computational cost of source points is reduced by as much as 70%)
ANSYS Features & Capabilities
Application: Turbomachinery
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Features & Capabilities
Background: • Problems modeling collimated radiation, such as headlights and solar irradiation, use the Monte Carlo solver. This solver needs to take full advantage of the HPC potential
Enhanced Monte Carlo Radiation model • Optimized the model so that the total number of rays (histories) remains consistent, independent of the number of core partitions
Headlights, solar irradiation • 2 spectral bands (multiband), participating media; 5 radiation domains (2 fluid, 3 solid); 3.5 million elements, of which 2.2 million are radiation elements
• Specified serial histories: 10 million
ANSYS Application Example
Complex headlamp case with 10 million ray histories. Comparison when solving only radiation and energy
Application: Radiation
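One simple way to keep the total history count fixed regardless of the partition count is to split the user-specified total evenly and hand out the remainder one history at a time. The slide does not describe how CFX actually distributes histories; `split_histories` is a hypothetical helper that only illustrates the invariant (the sum always equals the specified total).

```python
def split_histories(total: int, n_partitions: int) -> list[int]:
    """Split `total` ray histories across partitions so that they always sum to `total`."""
    base, extra = divmod(total, n_partitions)
    # The first `extra` partitions trace one additional history each.
    return [base + 1 if p < extra else base for p in range(n_partitions)]


for cores in (1, 7, 64, 256):
    shares = split_histories(10_000_000, cores)
    # Total is unchanged; per-partition shares differ by at most one history.
    print(cores, sum(shares), max(shares) - min(shares))
```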
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Features & Capabilities
Background: • For large, complex cases with many regions/face sets, the time to read and write files on HPC systems could significantly lengthen the overall solution time
Optimized HPC I/O speedup • Optimizing the interface between the CFX solver and the HPC system resulted in a substantial speed-up • I/O time is now nearly negligible, even at 64 cores
Reduction in wall clock seconds for I/O on an example test case with many regions
I/O
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Features & Capabilities
Background: • Fluent's priority has been to deliver the best results, not the fastest convergence
Conservative Coarsening Method is now the default for the Pressure-based Coupled Solver: • Especially helpful for native polyhedral meshes and/or highly stretched cells
Algebraic multigrid solver now automatically reorders the linear system • Ensures proper ordering across multiple cell zones (previously limited to within a single cell zone)
No reordering: not converged after >200 iterations
RCM reordering: converged in 94 iterations
Robustness
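Reverse Cuthill-McKee (RCM) is a classic bandwidth-reducing renumbering: it relabels the unknowns so the nonzeros of a sparse matrix cluster near the diagonal. The sketch below is a textbook RCM on a cell-adjacency graph in plain Python; it is not Fluent's AMG implementation, and the slide does not say how Fluent applies the ordering inside its solver. The function names are illustrative.

```python
from collections import deque


def reverse_cuthill_mckee(adj: dict[int, set[int]]) -> list[int]:
    """Return an RCM ordering of the nodes of an undirected graph (adj is symmetric)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visited: set[int] = set()
    order: list[int] = []
    # Handle each connected component, starting from a lowest-degree node.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first: enqueue unvisited neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order


def bandwidth(adj: dict[int, set[int]], order: list[int]) -> int:
    """Largest |i - j| over edges once nodes are renumbered by their position in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)


# A 10-cell chain whose cell numbers are scrambled: 0-7-4-1-8-5-2-9-6-3.
labels = [(7 * i) % 10 for i in range(10)]
adj = {v: set() for v in labels}
for a, b in zip(labels, labels[1:]):
    adj[a].add(b)
    adj[b].add(a)

natural = sorted(adj)
rcm = reverse_cuthill_mckee(adj)
print(bandwidth(adj, natural), bandwidth(adj, rcm))
```

On this chain the original numbering has bandwidth 7, while the RCM ordering follows the chain and has bandwidth 1.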
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Features & Capabilities
Faster METIS partitioning: • Updated library and optimized algorithms deliver a significant partitioning speed-up for many larger cases, particularly those with adapted meshes
• 64-bit indexing in METIS and for partition storage to enable larger models
• Future proofed: tested up to 2 billion cells!
Combustor: • 40% faster to partition for 8192 cores • Less than 3 minutes
Truck: • 99% faster to partition for 512 cores • Just 18 seconds (versus 36 minutes!)
ANSYS Application Examples
[Chart: Truck, 134M cells. Auto partition time (seconds) vs. cores]
Cores | 256 | 512 | 1024 | 2048 | 4096
16.0.0 | 923.1 | 2175 | n/a | n/a | n/a
17.0.0 | 18.2 | 15.8 | 18.5 | 27.4 | 51.7
(16.0.0 partition times at the remaining core counts are labelled > 1 hour)
[Chart: Combustor, 830M cells, CRAY XE6. Partition time (seconds) vs. cores]
Cores | 4096 | 8192
16.0.0 | 141 | 295
17.0.0 | 111 | 174
Partitioning
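The partitioning claims can be checked against the numbers above with two illustrative helpers. The combustor's "40% faster" matches a roughly 41% cut in wall time (295 s to 174 s at 8192 cores), which as a speed-up factor is only about 1.7x; the truck's 36 minutes to 18 seconds is a 99% cut, or 120x. The two metrics diverge strongly for large gains, so it matters which one a "% faster" figure refers to.

```python
def time_reduction(t_old: float, t_new: float) -> float:
    """Fraction of wall time saved when going from t_old to t_new."""
    return (t_old - t_new) / t_old


def speedup(t_old: float, t_new: float) -> float:
    """How many times faster the new run is (t_old / t_new)."""
    return t_old / t_new


# Combustor, 830M cells, 8192 cores: partition time in seconds (16.0.0 vs 17.0.0 chart values).
print(f"Combustor: {time_reduction(295, 174):.0%} less time, {speedup(295, 174):.2f}x")
# Truck, 512 cores: 36 minutes vs. 18 seconds, as quoted on the slide.
print(f"Truck: {time_reduction(36 * 60, 18):.0%} less time, {speedup(36 * 60, 18):.0f}x")
```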
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Features & Capabilities
Background: • DPM and combustion models pose challenges to parallel performance as users attempt to load-balance flow and physics calculations
New Option: Model-Weighted Partitioning • Automatically weights multiple physics models across the full set of processors within a specified load imbalance tolerance
• Users can select the factors and relative weightings
Oxy-Fuel Burner: • Turbulence, combustion, radiation, detailed kinetic mechanism (25 species, 113 reactions) • 60% faster for 128 cores (just 82 seconds)
ANSYS Application Example
[Chart: Oxy-fuel Burner, 1.9M hex cells. Time (seconds) vs. cores]
Cores | 32 | 64 | 128 | 256 | 512 | 1024
Default | 647.26 | 314.59 | 203.16 | 112.15 | 65.05 | 37.1
Load Balance | 198.08 | 150.59 | 82.03 | 61.76 | 34.29 | 22.33
Partitioning
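Model-weighted partitioning balances estimated cost rather than cell count. The sketch below uses a hypothetical 1D mesh in which 200 cells carry an assumed extra physics cost, splits it into 4 contiguous parts by cell count and by weight, and checks each split against an illustrative imbalance tolerance. Fluent's own method and weighting factors are not described on the slide; production partitioners such as METIS partition the cell graph with per-vertex weights rather than contiguous 1D blocks.

```python
def split_equal_count(n_cells: int, n_parts: int) -> list[int]:
    """Contiguous blocks with (nearly) equal numbers of cells, ignoring physics cost."""
    return [i * n_parts // n_cells for i in range(n_cells)]


def split_equal_weight(weights: list[float], n_parts: int) -> list[int]:
    """Contiguous blocks sized so each part carries about total_weight / n_parts."""
    total = sum(weights)
    parts, acc = [], 0.0
    for w in weights:
        # Place the cell by where the midpoint of its weight falls on the cumulative axis.
        parts.append(min(int((acc + w / 2) * n_parts / total), n_parts - 1))
        acc += w
    return parts


def load_imbalance(weights: list[float], parts: list[int], n_parts: int) -> float:
    """Maximum part load divided by mean part load (1.0 means perfectly balanced)."""
    loads = [0.0] * n_parts
    for w, p in zip(weights, parts):
        loads[p] += w
    return max(loads) / (sum(loads) / n_parts)


# Hypothetical mesh: 1000 cells with flow cost 1 each, plus an assumed extra
# cost of 9 in the 200 cells of a reacting zone (chemistry, particles, ...).
weights = [10.0 if i < 200 else 1.0 for i in range(1000)]
tolerance = 0.05  # accept up to 5% imbalance (illustrative value)

for name, parts in [("cell count", split_equal_count(1000, 4)),
                    ("model-weighted", split_equal_weight(weights, 4))]:
    imb = load_imbalance(weights, parts, 4)
    print(f"{name}: imbalance {imb:.2f}, within tolerance: {imb <= 1 + tolerance}")
```

Splitting by cell count puts the whole reacting zone on one part (about 2.9 times the mean load), whereas the weighted split gives every part the same estimated cost.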
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Features & Capabilities
Background: • Partitions need to communicate with each other. Lack of optimization can slow performance, especially for moving/dynamic mesh cases where the neighborhood needs to be updated frequently
Neighborhood Creation Optimization: • Optimized communication algorithms and improved interface identification for better performance and completeness
• Better identification of interfaces improves robustness
Exhaust System: • Speed-up from 1X to 30X depending on case and number of cores
[Chart: Exhaust, 33M cells. Neighborhood creation time (seconds) vs. cores]
Cores | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192
16.0.0 | 7.828 | 4.75 | 6.219 | 7.882 | 17.07 | 52.63 | 156.4
17.0.0 | 3.844 | 2.539 | 1.866 | 1.838 | 2.346 | 2.793 | 5.749
ANSYS Application Example
Partitioning
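At its simplest, neighborhood creation means finding which faces straddle two partitions and, from those, which partitions must exchange data. A sketch in plain Python, assuming a face list of (cell, cell) pairs and a cell-to-partition map; for moving or dynamic meshes this has to be redone whenever connectivity changes, which is why its cost matters at high core counts. This illustrates the idea only and is not Fluent's algorithm; `partition_neighborhood` is a hypothetical name.

```python
from collections import defaultdict


def partition_neighborhood(faces, cell_part):
    """For each partition, find its neighbouring partitions and the faces shared with them.

    faces: sequence of (cell_a, cell_b) pairs, one per interior face.
    cell_part: sequence mapping cell index -> partition id.
    Returns {partition: {neighbour_partition: [face indices]}}.
    """
    nbhd = defaultdict(lambda: defaultdict(list))
    for f, (a, b) in enumerate(faces):
        pa, pb = cell_part[a], cell_part[b]
        if pa != pb:  # the face lies on a partition interface
            nbhd[pa][pb].append(f)
            nbhd[pb][pa].append(f)
    return {p: dict(n) for p, n in nbhd.items()}


# Six cells in a row, split into three partitions of two cells each.
faces = [(i, i + 1) for i in range(5)]
cell_part = [0, 0, 1, 1, 2, 2]
print(partition_neighborhood(faces, cell_part))
```

Here only faces 1 and 3 lie on interfaces, so partition 1 exchanges data with partitions 0 and 2, while 0 and 2 never communicate directly.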
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• External flow over a passenger sedan • Number of cells: 4 Million • Cell Type: Mixed • Models used: Standard K-epsilon turbulence
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
General solver scalability improvements
ANSYS Application Example
Application: General flow
Improved Parallel Performance & Scaling – Fluent 17.0
Optimized Neighborhood Creation
Case Details:
• Vehicle exhaust model • Number of cells: 33 Million • Cell Type: Mixed • Models used: SST K-omega turbulence
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
ANSYS Application Example
Application: General flow
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
[Chart: Engine Crankcase Lubrication Model. Total run time per one cycle (hours) vs. cores]
Cores | 48 | 96 | 192
16.0.0 | 18 | 14 | 10.83
17.0.0 | 13.85 | 9.26 | 5.86
Representative Illustration
Big speed-ups for moving dynamic mesh due to: • Neighborhood optimization • Sliding interface optimization • Parallel solver optimization
Engine Crankcase Lubrication Model: • 85% faster run time (<6 hours) • Faster than recent competitive benchmark • Crankshaft rotation in a sliding mesh zone, piston motion through dynamic mesh layering, oil slosh modeled with VOF, 5M-cell poly mesh
Application: Mesh motion
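The crankcase chart also shows how the "85% faster" claim is measured: at 192 cores R16 takes 10.83 h per cycle and R17 5.86 h, a ratio of about 1.85 (equivalently, about a 46% cut in wall time). The sketch below computes the R17-over-R16 ratio at each core count and R17's strong-scaling efficiency relative to its own 48-core run, using the chart values; `parallel_efficiency` is an illustrative helper using the standard definition (speed-up divided by the increase in core count).

```python
# Total run time per cycle in hours, read from the crankcase chart.
cores = [48, 96, 192]
r16 = [18.0, 14.0, 10.83]
r17 = [13.85, 9.26, 5.86]


def parallel_efficiency(t_base: float, n_base: int, t: float, n: int) -> float:
    """Strong-scaling efficiency of a run on n cores relative to a baseline on n_base cores."""
    return (t_base / t) / (n / n_base)


for n, t_old, t_new in zip(cores, r16, r17):
    eff = parallel_efficiency(r17[0], cores[0], t_new, n)
    print(f"{n} cores: R17 is {t_old / t_new:.2f}x R16 throughput, "
          f"R17 efficiency vs 48 cores: {eff:.0%}")
```

Going from 48 to 192 cores (4 times the cores), R17's run time drops by a factor of about 2.36, i.e. roughly 59% strong-scaling efficiency on this case.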
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• 4-stroke spray-guided Gasoline Direct Injection
• Number of cells: 2 Million • Cell Type: Mixed • Models used: Standard K-epsilon turbulence • Moving mesh, Spray, Combustion
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Big speed-ups for moving dynamic mesh due to: • Neighborhood optimization • Sliding interface optimization • Parallel solver optimization • Combustion code refactoring
ANSYS Application Example
Application: Mesh motion, combustion
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• Circulating Fluidized Bed • Number of cells: 2 Million • Cell Type: Mixed • Models used: Laminar
ANSYS Application Example
General solver scalability improvements
Gas inlet
Solid inlet
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Application: Multiphase
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• Wave loading on Oil Rig • Number of cells: 7 Million • Cell Type: Mixed • Models used: SST K-omega turbulence
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
General solver scalability improvements
ANSYS Application Example
Application: Multiphase
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• Flow through a Combustor • Number of cells: 12 Million • Cell Type: Polyhedra • Models used: Realizable K-epsilon turbulence • Species transport
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
ANSYS Application Example
Application: Combustion
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• External flow over aircraft landing gear • Number of cells: 15 Million • Cell Type: Mixed • Models used: LES
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
General solver scalability improvements
ANSYS Application Example
Application: Aeroacoustics
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• Single-stage transonic axial-flow fan
Stator Row
• Number of cells: 3 Million • Cell Type: Hexahedral • Models used: SST K-omega turbulence • Unsteady (sliding interfaces)
Ref: NASA-103800
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
General solver scalability improvements
ANSYS Application Example
Application: Turbomachinery
Improved Parallel Performance & Scaling – Fluent 17.0
Case Details:
• Cavity flow in a centrifugal pump • Number of cells: 2 Million • Model used: Realizable K-epsilon turbulence
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
General solver scalability improvements
ANSYS Application Example
Application: Turbomachinery, multiphase
Optimized for the Latest HPC Architectures – Fluent 17.0
Case Details: • 1.2 million cell pipe benchmark
Hardware Configuration: • One node of XL250 Gen9 with E5-2690v3, 128 GB of 2133 MHz memory and 2 NVIDIA K80s
ANSYS Application Example
GPU