Post on 30-May-2018
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
1/22
Rend. Sem. Mat. Univ. Poi. TorinoFascicolo Speciale 1991
Numerical Methods
Z. Johan - T.J.R. Hughes - F. Shakib
A MATRIX-FREE IMPLICIT ITERATIVESOLVER FOR COMPRESSIBLE FLOW PROBLEMS
Abstrac t . A procedure for soving noninear time-marching problems is presented. Thenonsymmetric systems of equations arising from Newton-type linearizations are sol-ved using an iterativestrategy based on the Generalized Minimal RESidual (GMRES)algorithm. Matrix-free techniques eading to reduction in Storage are presented. In-corporation of a linesearch algorithm in the Newton-GMRES scheme is discussed. Anautomatic time-incrcment control strategy is developed to increase the stability of thetime-marching process. High-speed flow computations demonstrate the effectiveness
of these agorithms.
1. Introduction
Over the years, several solvers have been introduced for obtaining steadysolutions to the time-dependent Euler and Navier-Stokes equations. Until re-cently, explicit time-marching solvers have dominated in the CFD community
because of their simplicity and their low Storage requirements. Unfortunately,the time incrernent limitation associated with these solvers leads to very costlycomputations. The role of implicit time-marching solvers has become increa-singly important and has gained popularity by the introduction of efficientiterative techniques. Our objective is to develop a robust solver within theframework of a space-time finite element methodology requiring little Storage.
An outline of the paper follows: In Section 2, we present a genericimplicit time-marching solution algorithm. Some matrix-free solution techniques eading to a substantial reduction in Storage are introduced in Section 3.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
2/22
142
Linesearch and automatic time-increment control algorithms which increasethe robustness of the time-marching scheme are described in Sections 4 and5, respectively. Numerical experiments in Section 6 illustrate the proposed
algorithms. Finally, conclusions are drawn in Section 7.
2. Implicit Ti me- Marc hi ng Solution Algori thm
At each discrete timet n , space-time finite element discretization of thecompressible Navier-Stokes equations leads to the following nonlinear problem[8,9]:
Given the solution vector v^n-i)at time t n-u and a time increment At, find the solution vector v ai time t n , which satisfies the nonlinear system of equations
G(v; v(n_1)f A*) = 0 (1)
G is a system of nonlinear functionals of v and of parameters V(ni) and At. This system is solved for v by performing a succession of linearizations,leading to Newton's algorithm. This is done via a truncated Taylor seriesexpansion of G:
G(v
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
3/22
143
In general, the consstent Jacobian in (6) is replaced by a Jacobian-like matrix J (see Section 3.1 for the choice of J). Therefore, the linear systems of equations that need to be solved read
J ^ = -R< 0 (7)
Hence, the time-marching procedure is performed by looping over thediscrete timest n. At each of these times, the time incrementAt is givenby a time-increment control strategy presented in Section 5, followed by thesolution of the nonlinear system of equations (1) as described above. Thecomplete algorithm is summarized in Box 1. In this algorithm,Nmax is themaximum number of time steps; max is the maximum number of Newton
iterations; and V(0) is the initial guess of the solution. A detailed studyof time-marching schemes for space-time finite element methods is given inShakib [8].
Box 1 - Implicit Time-marching Solution Algorithm.
Given N max , t'max and V(0), proceed as follows:
(Loop over time)For n = l,2,...,JVmax
(Time-increment control)Choose the time incrementi(Newton-type algorithm)~(0)
For i = 0,l,...,tmax - 1Solve jWpM = -R{i) for p ( 0Compute the linesearch coeflcient A^ + 0 ^ v(0 +ApW
v() - v('
)
3. Matrix-free Solution Techniques
The system of equations presented (7) is solved using the GeneralizedMinimal RESidual (GMItES) algorithm; see Saad and Schultz [7] and Shakib
et al. [9] for a detailed description and some applications. One advantage of
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
4/22
144
this iterative algorithm is t'hat the matrixJ need not be explicitly formed andstored, but instead a procedure for computing the matrix-vector productJucan be used. The coupling of Newton's algorithm and the GMRES algorithmwill be referred to as the Newton-GMRES algorithm in the remainder of thispaper.
3.1 - Choice of the Left-hand-side Matrix
Use of the consistent Jacobian J as the left-hand-side matrix yieldsin general the fastest convergence to the steady-state solution, provided theinitial guess is within the radius of convergence of Newton's algorithm. Ex-perience, however, has shown that for most fluid dynamics problems the useof this matrix leads to very poor stability in the initial phase of the time-marching convergence. One possible way to circumvent this instability is tochoose J to be the frozen-coefficient Jacobian for the first few iterationsof the time-marching problem. The frozen-coefficient Jacobian is derived byassuming a frozen field, that is the advective and diffusive matrices in theNavier-Stokes equations are assumed Constant in the process of differentia-ting the residuai. We have noticed that the frozen-coefficient Jacobian resultsin a more stable time-marching algorithm, even for an initial guess which isfar from the steady-state solution. Once the solution is sufficiently dose tothe steady-state solution, one can switch to the consistent Jacobian for rapidconvergence.
3.2 - Matrix-vector Products
The classical way of performing the matrix-vector products in an iterative procedure, sudi as in the GMRES algorithm, is to initially compute theentries of the left-hand-side matrixJ and store them in core memory. Then,the matrix-vector products can be performed explicitly. The use of this tech-nique in the GMRES algorithm is often referred to as the"linear" GMRESalgorithm. Unfortunately, the nonlinearities in the Navier-Stokes equationsmake the explicit computation of the consistent Jacobian very difficult and/orCPU-intensive. Only the components of the frozen-coefficient Jacobian canbe evaluated at a reasonable cost. The Storage requirement for large problemsis a drawback of this technique. Nevertheless, the linear GMRES algorithmused with the frozen-coefficient Jacobian has been used successfully in the
past and will serve as a reference for the numerical experiments.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
5/22
145
Since the matrixJ is a Jacobian-like matrix, it is possible to defne aresidual-like vector% such that
J(v)u = hm ** (8)
for agiven vector 5.il can be either the residuai fi, in which caseJ is theconsistent Jacobian, or a modifed residuai, in which case the left-hand-sidematrix is the frozen-coefTcient Jacobian. In the latter case,H is evaluated byusing a frozen field. As for the frozen-coefrcient Jacobian, the advective anddiffusive matrices are always computed at v, even for the vectorH(v-\- eu).
A formula for evaluating the matrix-vector productJ(v)u is derivedfrom (8) by choosing a small and dropping the limit, viz.,
J(v) u * '- ^ - (9)
The choice of e is criticai since sufficient accuracy is needed in evaluatingthe matrix-vector products using (9). A criterion for determininge0pt isto minimize the error introduced when computing the forward difference ap-proximation (8). Gillet al. [5] have derived eQpt in the case of a scalarunivariate function and their work can be easily extended to the case of amult ivar iate vector function. This technique does not require any knowledgeof the components of the left-hand-side matrix and it saves a substantialamount of Storage. Its implementation into GMRES will be referred to as the"matrix-free" GMRES algorithm.
3.3 - Scaling
A scaling transformation is required to nondimensionalize and at thesanie time improve the conditioning of the linear systemJp= -R. Shakibet al. [9] have studied different scaling transformations for the problems under consideration and have shown that a nodal block-diagonal scaling is veryeffcient. Unfortunately, computing the nodal blocks of J using a finitedifference stencil of the form (9) is too expensive. It is, however, possibleto get an analytical expression for the nodal blocks of the frozen-coefficientJacobian. It is also possible to evaluate them at the same time as the residuai R in order to minimize the cost. Although these nodal blocks are only anapproximation to the nodal blocks of the consistent Jacobian, experience hasshown that they perforili almost as well.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
6/22
146
Defme W to be thenodal block-diagonal matrix of thefrozen-coefficientJacobian. SinceW is symmetric positive-definitefor the problems consideredbere, it accommodates a Cholesky factorization,denoted as
r -
w= uT u (io)The scaled system of equations can be written as
Jp=-It (11)
with J = U~T J U"1 , p=p , R = U~T R (12)
Moreover, defme the scaled residual-like vector 7J(v) asft( v) = U~ r K{U~ lv) = U~ T n(v) (13)
wherev=Uv (14)
The finite difTerence approximation (9) reads
j K(y + ep)-K(v) _U- T K(U-\v + ep))-U-T J(U-\v))e e
To take full advantage of the scaling in finite precision arithmetic,J p shouldbe evaluated by the second equality in (15) as it is presented, i.e., with nosimplification. In particular the terniU (v + ep) should not be replacedby v+ep. Numerica! experiments have shown a substantial degradation inconvergence whenJ p was not implemented as in (15).
4. Linesearch Algorithm
In order to increase the robustness of the Newton-GMRES algorithm, acontrol over the step length is required. Several techniques for unconstrainedoptimization have been studied in recent years; see Dennis and Schnabel [2]and Brown and Saad [1] for overviews of these techniques. Foilowing theideas suggested in [2], we consider a linesearch backtracking algorithm. Theidea is to determine the step length by performing the minimization of a
uni variate function representi ng a measure of the residuai. We give here an
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
7/22
147
algorithmic description of the method and we refer to the above references forthe theoretical details.
Defne the functions
/(v)=Jfi(vff"1 jR(v) and
/(A) = / (v + Ap) (17)
where v is the current solution;p is the step given by the Newton-GMRES algorithm; and A is the linesearch parameter. The matrix [0 J =block-diag(A0 ,...,0 ) can be viewed as a metric tensor for computing thenomi of R as in (16). For the problems under consideration, (16) is simi-lar to energy norms used in structural analysis. It is important to note thatcomputation of the Euclidian norm of R leads to dimensionai inconsistencyfor general defnitions of R.
We bave trivially
/(0) =f(v) (18) /( l ) = / (v + p) (19)
Moreover, the first derivative of / atA = 0 is
/'(0) = V/(vfp = R(vf \~0l \ J(v)p (20)
Recali that J(v) is the Jacobian of R(v). The matrix-vector product J(v)pis computed via the centrai finite difference approximation
5 f v ). ^ R(~y+S p)-RCy-sp) (21)
with e = (M)1 / 3 /\\T>\\- The approximation (21) yields higher accuracy for theevaluation of the matrix-vector product than the forward difference stendi of (9). The choice of e is also greatly simplified. Ilowever, this approximationis tvvice as expensive as the forward difference scheme.
The objective of the linesearch method is to find A G]0,1] such thatthe Goldstein-Armijo condtion
/( A)
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
8/22
148
showed that p is always a descent direction when it is the solution of theNewton-GMRES iteration with the consistent Jacobian. However, use of thefrozen-coeflcient Jacobian does not always yield a descent direction. If p isfound not to be a descent direction, a wise strategy is to repeat the time stepwith a smaller time incrementAt.
The inequality (22) can be replaced by the approximate condition
/( l)-*
The minimization of the cubie model using the latest data /(0) , /(A), /(Ap)and /' (0) is repeated unti l (22) is satisfied; here Ap is the parameter obtained
from the previous polynomial ft. The linesearch parameter may be too largeto achieve a reasonable reduction of the step length. It may also be too small,resulting in a slow Newton convergence. Therefore, relative upper and lowerbounds i]u and i]t on A with respect to the previous parameter Ap areimposed. The complete algorithm is presented in Box 2. Rernark. The study presented in [2] and our own experience have shown thatthe values a = 10~4, tjt = 0.1 and i]u = 0.5 are appropriate for the problemstested.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
9/22
149
Box 2 - Backtrackng Linesearch Algorithm.
Given a, rjt, ;u and EM, proceed as follows:
A iju Ap , A
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
10/22
150
5. Automatic Time-increment Control Algorithm
The linesearch technique presented in the previous section increases the
stabil ity of the Newton-GMRES algorithm. However, it does not providethe necessary robustness during the time-marching process. A control overthe time increment overcomes this diffculty. Time-increment selection stra-tegies bave been applied with success to transient heat conduction analysisby Winget and Hughes [10]. Ericksson and Johnson [3, 4] have developedtime-increment control algorithms based oh a theoretical approach for linearand nonlinear parabolic problems. The complexity of the Navier-Stokes equa-tions currently prevents us from developing a complete theory. We presentbere a heuristic automatic time-increment control algorithm based on simple
considerations.Let the correction factor at stepn be defined as
y(t) = - ^ - (25) Atmax
where At is the current time increment and A
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
11/22
151
where 6 is the tolerance vector. We use a stable and ConstantAt forthe first N t'irne steps. At the N th time step, 6 is chosen to be thechange in the solution,v(tjsr-\) V(
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
12/22
152
6. Numerical Results
In this section, we present three numerical experiments to illustrate the
proposed procedures.A locai time-increment strategy was used for ali problems. Only one passwas performed in the Newton loop, i.e., max = 1 in Box 1. The tolerance forthe GMRES algorithm (etoi as defned in [9]) was set to 0.1. Ali computationswere done on a CONVEX-Cl in doubl precision (64 bits per floating pointword), using 3-point Gaussian quadrature for linear triangular elements.
6.1 - NACA-0012 Airfoil
A NACA-0012 airfoil is placed in a Mach 0.85 two-dimensional inviscidflow at 1 degree of angle of attack. The coniputational domain and the boun-dary conditions are depicted in Figure 1, wherep is the density; ui and W2
u n = 0
Meo = 0.85a= 1
P = Poo
it[ = t^oo cosau2 = Uoo s'ma
Fig. 1 - NACA-0012 airfoil. Problem description.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
13/22
153
are the components of the velocity vector u (not to be confused with theu defined in the previous sections); p is the pressure, andn is the vectornormal to the boundary. The finite element mesh consists of 1,393 nodes and
2,602 linear triangles (see Figure 2). The pressure contours near the airfoilare shown in Figure 3. One can note two low pressure regions followed bynormal shocks, as is classically seen in transonic flow calculations.
This problem is converged to steady-state starting from a uniform flow.A Constant CFL number of 15 is used. The maximum dimension of the Krylovspace for the GMIIES algorithm is set to 20. Both linear and matrix-freeGMRES algorithms were used to solve this test case. For the matrix-freeGMRES algorithm, the frozen-coeflcient Jacobian was used for the first 100steps. Then, it was replaced by the consistent Jacobian.
The convergence for both methods as a function of time step is shown inFigure 4. The labels "MF-GMRES" and "Lin-GMRES" refer to the matrix-free and linear GMRES algorithms. The convergence acceleration using thematrix-free GMRES algorithm with the consistent Jacobian is particularlystriking. A thorough study of the convergence has revealed that the non-convergence of the linear GMRES algorithm is due to the nonlinearities of the outflow boundary condition. An important remark is that matrix-freeGMRES is 2 to 3 times more expensive that linear GMRES to march the
sanie number of time steps. This is observed to be a general rule for two-dimensional computations. The explanation can be found by analyzing theCPU-time cost of each technique as a function of the number of GMRESiterations. For the linear GMRES algori thm, there is an initial overheadassociated with the formation of the frozen-coefficient Jacobian matrix. Thecost of computing the matrix-vector products is however low (~ 0.3 timesthe cost of evaluating the right-hand-side vector). On the other hand, theinitial overhead of the matrix-free GMRES algorithm is lower but the cost of a matrix-vector product is equivalent to one evaluation of the right-hand-side
vector. The break-even point for these two techniques is about three GMRESiterations. For both techniques the number of GMRES iterations per timestep for a typical high speed flow calculation is 10 15.
Another improvement generated through the use of the matrix-free GMRES algorithm is the reduction in Storage as can be seen in Table 1. TheStorage requirements for explicit and column height direct solvers are givenfor referenCe, as well as the Storage requested by ali the modules of the program. The ratio of memory usage between linear and matrix-free GMRESalgorithms is about 3 for this two-dimensional computat ion. We have calcu-lated this ratio to be of the order of 10 for three-dimensional problems withlinear tetrahedra.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
14/22
154
Fig. 2 - NACA-0012 airfoil. Finite element mesh (1,393 nodes; 2,602 elements).
Fig. 3 - NACA-0012 airfoil. Pressure contours.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
15/22
155
*n
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
16/22
156
in Figure 7. Note that the CFL number was reduced to 0.5 at the first timestep because of possible instabilities at higher CFL numbers detected by thelinesearch algorithm (see marka on Figure 7). After the first 10 time steps
were completed, the tolerance6 was computed and the auto-A* algorithmwas activated by the program driver (markb). Unfortunately, 6 was toolarge and the CFL had to be reduced twice before an acceptable tolerance wasfound (markse and d). Then, the CFL number progressively increased to5 while the bow shock was forming in the flow feld.
P = Poo
Mi = ^o o
2 = 0
Meo = 3.0
Fig. 5 - Half cylinder. Fig. 6 - Finite elements meshProblem description. (520 nodes ; 930 elements).
Convergence diffkulties that resulted from the use of the frozen-coeffi-cient Jacobian can be seen in Figure 8. In turn, they generate oscillations onthe CFL number in Figure 7 (mark e). These difhxulties would not exist if the consistent Jacobian was used in place of the frozen-coefflcient Jacobian.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
17/22
157
fi
Pn
6
5 -
4 -
9 -
1 -
50 100
Time Step
Fig. 7 - Ilalf cylinder. CFL number.
150 200
^
^
&
50 100
Time Step
Fig. 8 - Ilalf cylinder. Convergeiice.
150 200
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
18/22
158
6.3 - Doubl Ellipse
This two-dimensional hypersonic problem consists of a shuttle-like geo-
metry defined by two ellipses placed in a Mach 25 inviscid flow at 30 degreesof angle of attack. The description of this problem is depicted in Figure 9. Afirst solution was computed on a 624-node mesh (not shown) and was projec-ted onto a 2,350-node mesh (see Figure 10). The temperature contours are
P = Poom Uoo cos30u2 Uoo sin 30
Meo =25.0 Reco/vi = oo
Fig. 9 - Doubl ellipse. Problem description.
presented in Figure 11. Note that the strong bow shock and the canopy shock
could have been better captured with an adaptive mesh procedure. The con-vergence presented in Figure 12 was obtained by using a CFL number of 2for the linear GMRES algorithm. Numerical tests have shown that this is themaximum CFL number that could be used with this algorithm. On the otherhand, the consistent Jacobian was used for the matrix-free GMRES algorithmand a CFL number of 2 was set for the first 50 steps, followed by a CFL numberof 10. This example demonstrates that a better stability, and therefore a fa-ster convergence of the time-marching scheme, can be achieved for high speedflows when the consistent Jacobian is used. This is with the condition that theinitial guess is sufficiently dose to the steady-state solution. A factor of two
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
19/22
159
Fig. 10 - Doubl ellipse. Mesh (2,350nodes; 4,448 elements).
Fig. 11 - Doubl ellipse. Temperaturecontours.
^
. I-Hcocu
-aCUNle
a
o
.1 -
.01 -
.001 -
1E-4
LA ' ' ' ' ' ' ' '
\ \ _
1 *\ \ \ / - Lin-GMRES
\ *N \ \ \ \
V * MF-GMRES
i , i i i . i
100 200 300Time Step
Fig. 12 - Doubl ellipse. Convergence.
400
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
20/22
160
speed-up in CPU-time is achieved with the matrix-free GMRES algorithm ata normalized residuai of IO-3.
7. Conclusions
We have presented a solution algori thm for implicit time-marching sche-mes. Its robustness and elTciency bave been enhanced through the use of a linesearch procedure and an automatic time-increment control algori thm.Matrix-free techniques have eliminated the Storage of the Jacobian matrix.Tvvo-dimensional computations of inviscid and viscous compressible flows havedemoustrated the potential of these strategies. However, three-dimensional industr iai problems will provide a better basis for evaluating efficiency. Futurework will focus on producing a higher level of automation in the choice of theJacobian matrix and the different parameters involved in the strategy.
Acknowledgments
The authors would like to express their appreciation to Michel Mallet,Walter Murray and Youcef Saad for helpful comments. This research wassupported by the NASA Langley Research Center under Grant NASA-NAG-1-361 and Dassault Aviation, St. Cloud, France.
REFERENCES
[1] P.N. Brown and Y. Saad, "Hybrid Krylov Methods for Nonlinear Systems of Equations",SIAM Journal of Scientifc and Statistica! Computing,11 (1990)450-481.
[2] J.E. Dennis and R.B. Schnabel,Numerical Methods for Unconstrained Optimi- zation and Nonlinear Equations,Prentice-Hall, Englewood Cliffs, NJ, 1983.
[3] K. Eriksson and C. Johnson, "Adaptive Finite Element Methods for ParabolicProblems. I: A Linear Model Problem", Preprint no. 1988: 31/ISSN 0347-2809,Department of Mathematics, Chalmers University of Technology, The University
of Goteborg, 1988.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
21/22
161
[4] K. Eriksson and C. Johnson, "Error Estimates and Automatic Time-Step Control for Nonlinear Parabolic Problenis, I",SIAM Journal of Numerical Analysis,24 (1987)12-23.
[5] P.E. Gill, W. Murray and M.If. VVriglit,Practical Optimization, Academic Press,London,1981.
[6] T.J.R. Hughes,The Finite Eewent Method: Linear Static and Dynamic FiniteEenient Analysis,Prentice-Hall, Englewoods ClifFs, NJ, 1987.
[7] Y. Saad and M.H. Schultz, "GMRES: A Generalized Minimal Residuai Algo-rithm for Solving Nonsymmetric Linear Systems",SIAM Journal of Scentifcand Statistical Computing,7 (1986) 856-869.
[8] F. Shakib, "Finite Eenient Analysis of the Compressible Euler and Navier-Stokes Equations",Ph.D. Thesis,Stanford University, 1989.
[9] F. Shakib, T.J .R. Hughes and Z. Johan, "A Multi-Element Group Precondi-tioned GMRES Algoritlnn for Nonsymmetric Systems Arising in Finite ElementAnalysis",Computer Methods in Applied Mechanics and Engineering,75 (1989)415-456.
[10] J.M. Winget and T.J.R. Hughes, "Solution Algorithms for Nonlinear TransientHeat Conduction Analysis Employing Eleinent-by-Element Iterative Strategies",Computer Methods in Applied Mechanics and Engineering,52 (1985) 711-815.
Zdenk JOHAN - Thomas J.R. HUGHESDivision of Applied Mechanics, Stanford University, Stanford, CA 94305, U.S.A.
Farzin SHAKIBCENTRIC Engineering Systems, Inc., 3801 East Bayshore Road, Palo Alto,CA 94303, .S.A.
8/14/2019 Rend. Sem. Mat. Univ. Poi. Torino Fascicolo
22/22