previous day
next day
all days

View: session overviewtalk overview

08:00-10:05 Session 11A: MG for Partially Structured Meshes. Room: Bighorn B.
Asynchronous Stencil Computations in dynamically adaptive FAS for AFAC

ABSTRACT. Classic multigrid methods require all stencils, i.e.~matrix entries, to be computed prior to the solve. Yet, they have to be able to exploit massively parallel computers without a ramp up (assembly) phase of limited arithmetic intensity and concurrency where nothing is actually solved, while their solve implementation has to be aware that the gap between compute power and memory bandwidth is widening. It is thus important to read each unknown as rarely as possible. Additive multigrid variants are well-known to have advantageous scaling properties but tend to be less stable and efficient than their multiplicative counterparts. Asynchronous Fast Adaptive Composite (AFAC) candidates to stabilise additive multigrid as each correction term is damped by an additional multilevel coarse grid solve. We propose a pipelined, single-touch realisation of AFAC where each degree of freedom is loaded only once per multilevel smoothing step. It is memory-access optimal. Furthermore, we combine it with Full Approximation Storage (FAS). The latter allows for a straightforward realisation of arbitrary dynamic mesh refinement (AMR). Finally, we propose to kick off the solve with approximate stencils and then to compute more accurate stencils in parallel to the actual solve. The latter computation can be deployed to background threads. Our paper demonstrates the robustness of both our AFAC variant and the background integration of stencil entries, while the parallelisation potential of the latter is validated.

A non-invasive matrix-based multigrid scheme for partially structured grids
SPEAKER: Matthias Mayr

ABSTRACT. Specialized multigrid algorithms tailored to structured grids allow one to exploit memory access patterns and increase efficiency, especially for large-scale computations on next generation platforms. However, many engineering problems require unstructured meshes to represent complex geometries. Schemes based on Hierarchical Hybrid Grids (HHG) [1] can represent more complex geometries while still providing locally structured meshes to leverage efficiency. Specifically, an HHG mesh can be viewed as a union of logically structured sub-region meshes. Unfortunately, however, utilizing such schemes can be quite burdensome for an application scientist as the solver often wants sub-region information such as stiffness matrices or sub-assembled regional matrices. To avoid this difficulty (and so reuse existing matrix evaluation and assembly capabilities of applications codes), we base our algorithm on fully assembled matrices. Therefore, we first need to decompose the matrix into submatrices that correspond to the partial structure of the mesh and then setup and run multigrid hierarchies based on this decomposed representation. To compute level transfer operators, we target schemes like geometric interpolation, brick aggregation or black box multigrid [2].

In this presentation, we outline strategies to decompose assembled matrices into partial matrices based on the underlying partially structured grid and discuss the numerical trade-offs in working with an artificial decomposition as opposed to sub-assembled matrices. Additionally, we examine various aspects of basic mathematical operations like matrix-vector multiplication or relaxation-based smoothers for such decomposed matrices and conclude with numerical examples to demonstrate the capabilities of the proposed algorithm.

[1] Bergen, B.K. and Hülsemann, F.: Hierarchical hybrid grids: data structures and core algorithms for multigrid, Numer. Linear Algebra Appl., 11:279-291 (2004) [2] Dendy, D.E. and Moulton, J.D.: Black Box Multigrid with coarsening by a factor of three, Numer. Linear Algebra Appl., 17:577-598 (2010)

The Method of Local Corrections for Computing Volume Potentials on Locally-Structured Grids

ABSTRACT. The Method of Local Corrections (MLC) is a method for computing volume potentials, i.e. solutions to Poisson's equation in 3D with infinite-domain boundary conditions, on nested refined grids. The potential is represented as a collection of local solutions to a rectangular-grid discretization of Poisson's equation, each corresponding to a subset of the charge defined on a small rectangular patch of N^3 mesh points and grid spacing h. The global coupling is represented using a coarse grid solution with grid spacing H > h for a charge distribution obtained by applying the coarse-grid operator to the fine-grid local fields, so that the total solution is computed using a non-iterative version of FAS multigrid. Both the local fine-grid fields and the coarse-grid solution can be expressed as discrete convolutions, and computed using FFTs.

It has been shown that the solution error for this method is is of the form O(h^q) + ||f||_max O(H/(b*h*N))^Q, where f is the charge distribution. Here q and Q are the order of convergence of the truncation error of the discrete Laplacian operator, respectively, on general data and on harmonic functions -- for the Mehrstellen discretizations used here, we can have Q > q, depending on the preconditioning applied to the discrete charge distribution. The local potentials are each computed on a patch of size (b N)^3, b > 1. Typically, we take H=4*h and N to be fixed, in which case there are two discretization parameters: h, which controls the local contribution to the error; and b, which controls the error in representation of the smooth global coupling. Also, with this scaling, and by computing the coarse-grid solution using a recursive call to MLC, the method generalizes in a straightforward fashion to any number of levels of nested, patch-based locally-refined grids.

In this talk, we present a systematic study of the design tradeoffs of MLC for current and future HPC systems. The fundamental property of such systems that drives our design approach is that the cost of data movement / data storage is remaining more or less fixed, while the cost of performing floating point operations is dropping. Thus any algorithm that uses more flops in order to reduce data motion has the potential for obtaining a reduced time to solution. MLC has data movement comparable to a single geometric multigrid v-cycle (with a factor of 4 refinement between levels) indicating that it has potentially a much lower communications cost (and therefore higher performance) than traditional geometric multigrid. Since this is being done in the context of a method that has a more complicated dependence of the accuracy on discretization parameters, we map out that dependence empirically, and explain why we expect the observed dependence on the parameter choices to be robust. We then present some of the algorithmic design and software engineering choices for obtaining the best possible performance, including: the impact of an initial choice of Q and N on accuracy and communications costs; approximating part of the local convolutions by the fields induced by low-moment Legendre expansions of the charges on each patch; and the organization of the small multidimensional FFTs that are the compute kernels for the discrete convolution step to reduce their working set size and have more of the computation be performed out of higher levels of the memory hierarchy. We show weak and strong scaling behavior of the resulting implementations on both uniform-grid and locally-refined-grid test problems, and compare the measured performance with that predicted from a roofline model and with published benchmark calculations.

Robust and Scalable Iterative Solvers for Immersed Finite Element Methods

ABSTRACT. Immersed finite element methods are useful tools to preclude expensive meshing operations for problems posed on complex and/or scanned domains. In immersed methods, the problem domain (on which the problem is posed) is immersed in a larger, embedding domain of simple shape, that is simple or trivial to mesh. The problem is then solved on the restriction of this embedding mesh to its intersection with the problem domain. This requires advanced integration techniques to integrate cut elements [1] and a special treatment of boundary conditions that are, since the restriction of the mesh is not interpolary on the boundary of the problem domain, generally weakly imposed by means of Nitsche's method [2]. A common pitfall of immersed finite element methods is ill-conditioning of the linear system [3]. This impedes the convergence of iterative solvers, and often compels researchers to resort to direct solvers [4]. This hinders the efficient and inexpensive computation of solutions of large sparse systems, as the computational cost of direct solvers does not scale well with the size of the linear system, making them unsuitable for the increasingly large problems being solved by immersed methods, e.g. [5]. In [6] we have analyzed the cause of ill-conditioning of immersed finite element methods and developed a preconditioner that brings the condition number down to that of standard, mesh-fitting, finite elements, such that the linear system can be solved iteratively. Like standard finite elements, the conditioning of this preconditioned system still depends on the grid size however. In this contribution we shortly describe the cause of the conditioning problems and the developed preconditioner, and extend this approach to investigate possibilities to apply the preconditioner as a smoother in a multigrid cycle to develop a scalable solver that is robust to both the grid size and how elements are cut.

[1] S. Hubrich, P. Di Stolfo, L. Kudela, S. Kollmannsberger, E. Rank, A. Schröder and A. Düster. Numerical integration of discontinuous functions: moment fitting and smart octree. Comp. Mech. 2017

[2] A. Embar, J. Dolbow and I. Harari. Imposing Dirichlet boundary conditions with Nitsche's method and spline-based finite elements. Int. Jour. Num. Meth. Engng. 2010.

[3] E. Burman, S. Claus, P. Hansbo, M. Larson and A. Massing. CutFEM: Discretizing geometry and partial differential equations. Int. Jour. Num. Meth. Engng. 2014.

[4] M. Ruess, D. Schillinger, A.I. Özcan and E. Rank. Weak coupling for isogeometric analysis of non-matching and trimmed multi-patch geometries. Comp. Meth. App. Mech. Engng. 2014.

[5] F. Xu, D. Schillinger, D. Kamensky, V. Varduhn, C. Wang and M.-C. Hsu. The tetrahedral finite cell method for fluids: Immersogeometric analysis of turbulent flow around complex geometries. Computers & Fluids. 2016.

[6] F. de Prenter, C.V. Verhoosel, G.J. van Zwieten and E.H. van Brummelen. Condition number analysis and preconditioning of the finite cell method. Comp. Meth. App. Mech. Engng. 2017.

Multigrid algorithms in MueLu for structured grid problems

ABSTRACT. MueLu is the multigrid package part of the Trilinos library that aims at providing high performance and scalable numerical algorithms to model physical phenomenon primarily using systems of partial equations (PDEs). MueLu provides a general multigrid framework but has traditionally been focused on algebraic multigrid algorithms. This talk presents newly implemented aggregation and grid transfer algorithms that take advantage of the mesh structure to compute the grid hierarchy efficiently.

First the aggregation algorithm will be detailed with an emphasis on the assumption that it makes on the fine mesh and on the properties it guarantees for the successively coarser meshes. Algorithmic performance of this algorithm will also be discussed. Second two grid transfer algorithms: geometric interpolation and black box, will be discussed and their performance compared on example systems of partial differential equations. Finally a comparison of these algorithm with their unstructured counterpart: unstructured aggregation and algebraic multigrid, will be presented.

08:00-10:05 Session 11B: Data Science. Room: Bighorn C/1.
The tensor t-function: a definition for functions of third-order tensors
SPEAKER: Kathryn Lund

ABSTRACT. A definition for functions of multidimensional arrays is presented. The definition is valid for third-order tensors in the tensor t-product formalism and is therefore referred to as the ``tensor t-function." By making use of its connection to block circulant matrices, the tensor t-function is shown to have similar properties as matrix functions in a number of fundamental scenarios. To demonstrate the definition's potential in applications, the notion of network communicability is generalized and computed for a small-scale example via block Krylov methods for matrix functions.

Stable tensor neural networks
SPEAKER: unknown

ABSTRACT. We propose a tensor neural network framework that offers an exciting new direction in machine learning. Using our design, we can store data in high-dimensional structures and extract multidimensional correlations otherwise latent in traditional algorithms. Our network architecture is based on the t-product (Kilmer and Martin, 2011), an algebraic formulation to multiply tensors via circulant convolution which inherits mimetic matrix properties. In this study, we demonstrate that our tensor neural network architecture is a natural high-dimensional extension of traditional neural networks. Then, we expand on recent work (Haber and Ruthotto, 2017) to analyze the stability of our tensor neural network. Haber and Ruthotto interpret deep neural networks as discretizations of nonlinear differential equations and introduce stable neural networks which promote superior generalization. Motivated by their framework, we examine the stability of tensor differential equations from which we introduce a stable tensor neural network architecture. We illustrate the advantages of stability and demonstrate the potential of tensor neural networks with numerical experiments on the MNIST dataset.

Parallel-in-Time for Non-PDE-Based Evolution Problems: Neural Network Training and Powersystems

ABSTRACT. The need for parallel-in-time is being driven by changes in computer architectures, where future speedups will be available through greater concurrency, but not faster clock speeds, which are stagnant. This leads to a bottleneck for sequential evolution processes, such as traditional time marching, because they lack parallelism over the serial evolution process. In this talk, we examine the optimal-scaling multigrid reduction in time (MGRIT) method, which is able to parallelize over many such evolution processes. In the traditional PDE-based setting, MGRIT applies multigrid reduction to the time dimension by solving the (non)linear system that arises when solving for multiple time steps simultaneously. The result is a parallel-in-time method. However, MGRIT is designed to be a nonintrusive method, capable of wrapping existing codes that simulate fairly general evolution processes, not just PDE-based problems. In this talk, we present the MGRIT framework and then discuss some recent results and developments from applying MGRIT to two such non-PDE-based problems, powergrid simulations and neural network training runs. We will describe the flexibility of the XBraid library (an open source implementation of MGRIT) to handle such test problems, and then demonstrate the feasibility of parallelism over powergrid evolution steps and over neural network training runs.

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52--07NA27344

Approximating trace($L^{\dagger})$ using the Johnson-Lindenstrauss lemma and spectral methods.

ABSTRACT. Computing the trace of the inverse of $L^{\dagger}$ is a problem that occurs in many areas such as Lattice Quantum Chromodynamics, Machine Learning, Uncertainty Quantification, Mathematical Chemistry and Network Analysis. In several of these areas, the matrix $L$ in question is a graph Laplacian, and the trace of the pseudo-inverse of this graph is needed. An example of this would be to obtain the Kirchhoff index $n$ trace($L^{\dagger}$).

While exactly solving for trace($L^{\dagger}$) is infeasible when the matrix is large, there are several statistical methods for estimating trace($L^{\dagger}$). A common one is Hutchinson's method. In this algorithm, several vectors are prepared with their elements being drawn from the Rademacher distribution. The trace estimate is then $z'L^{\dagger}z$, where the matrix-vector multiplication is computed using a linear solver. This method can obtain a relatively low accuracy cheaply, but to improve beyond this takes a significant number of additional vectors. This can be expensive since each matrix-vector operation requires the invocation of a linear solver.

Statistical methods such as Hutchison's are the best that can be done if the matrix lacks structure, meaning that no elements of the inverse are much larger or smaller than the average element. However, if the matrix has such a structure and it can be effectively discovered then probing vectors can be created which are more effective than statistical approaches. This is done by effectively dropping or ignoring the smallest elements of the inverse and coloring the rest. By grouping the remaining nodes of the same colors together a block zero structure is induced along the diagonal and the probing vectors can then be used to recover the trace, albeit with the error introduced by treating certain elements of the inverse as zero when they are not. Probing methods can be quite effective and can still provide statistical bounds on the error of the resultant trace. If the relative size of the elements can be modeled with a linear function, then the variance of a trace estimate found by probing can decrease as 1/n(where n is the number of vectors needed) instead of the 1/sqrt(n) of the statistical methods, a substantial improvement.

A major difficulty with probing based methods is that it can be difficult to discover the structure of $L^{\dagger}$. One approach is to take powers of L and color this as an approximation for the important elements of $L^{\dagger}$. If the Neumann series converges to $L^{\dagger}$ with high accuracy in a limited number of steps then this can work well. However, for many graphs of interest such as those of real-world social networks this method works very poorly and probing does not surpass statistical methods. Alternatively, one can use spectral methods. By finding the smallest eigenvector of $L^{\dagger}$ one can find an approximate red-black coloring that can be used to create probing vectors. By then recursively finding the smallest eigenvector of the red-black sub-blocks, the probing vectors can be improved. Unfortunately finding the eigenvectors of these sub-blocks of $L^{\dagger}$ is very expensive since each matrix-vector multiplication with $L^{\dagger}$ requires a linear solve.

We propose a new method that leverages both statistical and structural approaches. By using the Johnson-Lindenstrauus lemma we obtain an approximation to $L^{\dagger}$, by projecting the elements of $L^{\dagger}$, into a lower dimensional space. This is the same approach as was proposed by Spielman and Srivastava for effectively calculating the effective resistances given a fast solver, and can be done using a relatively low number of linear solves at a low accuracy. While we could easily use this approximation to the elements of $L^{\dagger}$ to compute the trace, the error we would obtain is the same as with Hutchinson's method. However, this gives us a way to cheaply obtain an approximation to the eigenvectors of the sub-blocks of $L^{\dagger}$, since we no longer require a linear solve for each matrix-vector operation. While the eigenvectors will not be computed very accurately, this will not greatly impact the quality of the probing vectors, because with structural methods we care about the relative size of the elements, and not their absolute value. Unlike other structural approaches, this method is relatively easy to analyze because the error introduced by the Johnson-Lindenstrauus lemma is known, and the Cheeger bounds provide grantees about the quality of the given coloring. It also provides a good way to analyze the quality of our probing vectors, because the spectral gap provides an indication of the quality of the partitions our algorithm is producing.

We have tested this method against several social media graphs arising in the real world, as well as several synthetic graphs which are thought to model real networks well and have obtained better accuracy than both pure statistical and structural methods using the same number of linear solves.

Iterative Methods in Sports Ranking
SPEAKER: Erik G. Boman

ABSTRACT. Ranking sports teams from partial and incomplete information (as not every pair of teams play each other) has long been a challenge among sports enthusiasts but has also attracted interest among mathematicians. There are interesting questions both on the modeling and the algorithms side. We discuss two algorithms based on linear algebra: (a) The popular PageRank algorithm, and (b) spectral ordering based on a Laplacian matrix. We show how the latter solves the related seriation problem exactly, given noiseless data. Finally, we present some examples based on popular sports events such as the soccer (football) World Cup and the 2018 Winter Olympic Games.

08:00-10:05 Session 11C: Fluid Mechanics. Room: Bighorn C/2.
Low-Order Finite Element Preconditioner for High-Order Spectral Element Pressure Solver for the Navier-Stokes Equations

ABSTRACT. High-order spectral element methods (SEM), while very accurate for computational fluid dynamics (CFD) simulations, can be prohibitively expensive for meshes with difficult geometries (i.e. high-aspect ratio elements, sharp corners, etc.). Disparate acoustic and convective timescales drive one to choose an incompressible model that avoids the need to compute fast acoustic waves. The price is that one must solve a linear symmetric-positive-definite (SPD) system for pressure at every timestep --- a system whose condition number does not improve as dt is reduced (because the sound speed is infinite in the incompressible model). The pressure solve is invariably the bottleneck for every incompressible Navier-Stokes code. Having a scalable solution strategy, particularly for highly efficient high-order methods, is of premium importance. Preconditioned iterative solvers are usually the de facto alternatives in solving for the linear system, but good preconditioners for complicated geometries remain an open research question.

Preconditioners like the hybrid-Schwarz multigrid preconditioner proposed by Fischer et al. solve local Poisson problems for each element on overlapping domains using fast-diagonalization methods[1]. While very fast implementations for this preconditioner exist, iteration counts remain very large for geometries with high-aspect ratio elements. Other alternatives include point-wise smothers proposed by Ronquist et al. and Maday et al. in which convergence rates are proven to be bounded independent of the polynomial order of the basis functions for 1D geometries but degrade for higher spatial dimensions[2, 3]. The situation is somewhat improved by line-relaxation methods as considered in a series of papers by Heinrichs and later by Shen, but such methods remain constrained to spectral methods alone[4, 5]. The preconditioning of spectral methods in conjunction with low-order approximation schemes was first proposed by Orzag[6]. In his paper, Orzag develops an iterative method based on a low-order finite-difference approximation to the spectral operator on the Chebyshev nodes that bounds the eigenvalues of the matrix from below and above as N approaches infinity. More recently, a work by Canuto and Quarteroni looked at using low-order finite elements collocated on the high-order SEM mesh to construct the preconditioning operators and demonstrated preconditioning properties under different meshing strategies[7]. Their work, however, only looked at spectral methods with a homogeneous boundary conditions on a single unit cube.

The goal of this research is to propose a novel low-order FEM preconditioner for the SEM that can be applied cheaply using an algebraic multigrid (AMG) solver. Our preconditioning strategy relies on choosing low-order elements that relax the constraints imposed by the physical domain used in the simulation. These elements are allowed to overlap with each other or, in some cases, create void spaces in the geometry. Elements are meshed so that only immediate neighbors are connected using triangles in 2D or tetrahedrals in 3D. We do this so that the low-order matrices remain as sparse as possible. Furthermore, we propose several preconditioning operators based on different combinations of the low-order stiffness and mass matrices. Finally, we look at how to setup the AMG solver so that the execution time is minimized and the preconditioner runs faster than highly-optimized options like the hybrid-Schwarz multigrid preconditioner.

Our results show that the proposed meshing strategy reduces the number of iterations by more than 5 times in some cases compared to the current state of the art schemes without increasing the computational cost of the preconditioners. It also overcomes the shortcomings of other well known preconditioners such as the hybrid-Schwarz multigrid preconditioner for complex geometries. In terms of running time, we see a speed up of 3 to 4 times for low-degree basis functions in the SEM when using the setup that produces the fastest AMG solver. We present these results for different geometries used in real CFD simulations with mixed boundary conditions.

[1] Fischer, P.F., Lottes, J.W.: Hybrid Schwarz-Multigrid Methods for the Spectral Element Method: Extensions to Navier-Stokes, Lecture Notes in Computational Science and Engineering, vol. 40, pp. 35–49. Springer Netherlands (2005) [2] Ronquist, E.M., Patera, A.T.: Spectral element multigrid, I:. Formulation and numerical results. Journal of Scientific Computing 2, 389–406 (1987) [3] Maday, Y., Muñoz, R.: Spectral element multigrid: Numerical analysis. Journal of Scientific Computing 3, 323–354 (1988) [4] Heinrichs, W.: Line relaxation for spectral multigrid methods. Journal of Computational Physics 77(1), 166–182 (1988) [5] Shen, J., Wang, F., Xu, J.: A finite element multigrid preconditioner for chebyshevcollocation methods. Applied Numerical Mathematics 33(1), 471–477 (2000) [6] Orszag, S.A.: Spectral methods for problems in complex geometries, Journal of Computational Physics 37(1), 70–92 (1980) [7] Canuto, C., Gervasio, P., Quarteroni, A.: Finite-element preconditioning of g-ni spectral methods. SIAM Journal on Scientific Computing 31(6), 4422–4451 (2010)

Block Preconditioners for Two-Phase Incompressible Navier-Stokes Flow
SPEAKER: unknown

ABSTRACT. Block preconditioners paired with iterative Krylov methods have proven to yield some of the most effective solvers for incompressible fluid flow problems. We present novel preconditioners designed for two-phase Navier-Stokes flow which extend existing approaches for a single fluid based on approximate commutators. Without extension these methods fail to provide good performance for the more challenging variable coefficient Navier-Stokes systems arising in two-phase flow. Numerical results demonstrate that a preconditioner motivated by analysing continuous operators provides convergence in a number of iterations that is essentially independent of the grid size.

Efficient Linear Solvers for Two-Phase Navier-Stokes Equations in Free-Surface Models
SPEAKER: unknown

ABSTRACT. Free-surface models are a powerful tool for simulating dynamic hydraulic processes and are an important application of the two-phase Navier-Stokes equations. In this setting, the two-phase Navier-Stokes equations are often modified with advective stabilization, numerical diffusion, and weakly enforced boundary conditions. Moreover, most simulations of interest have three-dimensional domains that must be solved in parallel on high performance computers. In this paper, we examine the application of several preconditioning strategies applied to a linearized two-phase Navier-Stokes system generated in a free-surface setting. In particular, we present and compare results for a new two-phase pressure convection diffusion operator with other established preconditioning strategies. We also consider the effects of boundary conditions, stabilization, and parallel computing on performance and suggest guidelines for dynamically selecting a preconditioner during simulations.

Preconditioners for non-isothermal flow through porous media
SPEAKER: Thomas Roy

ABSTRACT. In oil and gas reservoir simulation, standard preconditioners involve solving a restricted pressure system with AMG. Initially designed for isothermal models, this approach is often used in the thermal case. However, it does not incorporate heat diffusion or the effects of temperature changes on fluid flow through viscosity and density. We seek to develop preconditioners which consider this cross-coupling between pressure and temperature. In order to study the effects of both pressure and temperature on fluid and heat flow, we first consider a model of non-isothermal single phase flow through porous media. By focusing on single phase flow, we are able to isolate the properties of the pressure-temperature subsystem. We present a numerical comparison of different preconditioning approaches.

Fast algorithms for the dense matrices arising from the Method of Regularized Stokeslets

ABSTRACT. The swimming motion of microorganisms such as sperm and cilia can be modeled by several methods, all of which entail solving equations of fluid-structure interaction. Among them, the Method of Regularized Stokeslets (MRS) and the Rotne-Prager-Yamakawa tensor have the advantage of not requiring a 3D Eulerian grid and using the fundamental solutions to the underlying equations instead. However, the computations required by both methods entail the use of dense matrices, and they tend to be large and very costly to work with for practical models in which the number of micro-swimmers is large.

The ‘data-sparse’ structure of these matrices enables the development of fast algorithms. To compute the matrix-vector products efficiently, we extend the Kernel-Independent Fast Multipole Method (KIFMM) to the kernels associated with the MRS. To solve linear systems with the same matrices efficiently, we consider both a data-sparser preconditioner and a block-diagonal preconditioner; to expedite the application of the preconditioners, we employ a number of techniques such as Krylov subspace recycling. The proposed algorithms are applied to study the dynamics of a large group of sperm as well as the flow field induced by a carpet of cilia.

10:25-12:30 Session 12A: MG for Non-symmetric Problems. Room: Bighorn B.
AMG smoothers for Maxwell's equations
SPEAKER: unknown

ABSTRACT. Algebraic Multigrid (AMG) is used to speed up linear system solves in a wide variety of applications.

This paper is concentrated on expanding AMG's applicability to important new classes of problems through algorithms that automatically construct advanced smoothing techniques when needed. This means in particular, we apply AMG to solve Maxwell's equations.

These AMG smoothing techniques have their roots in smoothing techniques used for geometric multigrid methods. Arnold, Falk and Winther developed an overlapping schwarz smoother for geometric multigrid and Hiptmair developed the distributive relaxation for geometric multigrid. We use this knowledge to construct smoothing techniques for AMG.

We develop adapted overlapping Schwarz smoothers and distributive relaxation for AMG. Therefore, we rely on Nédélec's H(curl,$\Omega$)-conforming finite elements to discretize the problem.

We present first results regarding the smoothing quality of the developed smoother for AMG.

A Residual Smoothing Strategy for Accelerating Newton Method Continuation

ABSTRACT. Newton methods have become popular strategies for solving large-systems of non-linear equations. In the final stages of convergence, when the iterative solution state is close to the exact nonlinear solution, Newton methods converge in a small number of nonlinear steps, and each step can often be solved effectively using preconditioned Krylov methods which are generally robust. However, for most cases, a continuation strategy must be employed to iteratively approach the nonlinear state where fast convergence is obtained. A typical strategy consists of employing a pseudo-transient approach, where a pseudo-time term is added to the diagonal of the Jacobian matrix with a variable pseudo-time step. In the initial phases of convergence, when the pseudo-time step is small, the method approximates a pseudo-time explicit scheme, and in the final stages, when the pseudo-step becomes large, the exact Newton method is recovered.

Although pseudo-transient continuation for Newton methods is widely used for CFD type problems, experience has shown that this approach can be slow to converge and often evolves through nonlinear states that produce convergence stagnation or even unrealizable solutions. A notable mode of failure or inefficiency occurs when isolated residuals in the nonlinear system remain large and retard the entire global solution. This has prompted research into methods such as nonlinear preconditioning or other approaches to break up the problem into smaller more local nonlinear problems, based on the observation that local nonlinear solution methods can often overcome the difficulties encountered by global continuation Newton methods.

Alternatively, the ability of localized residuals to retard the entire global nonlinear problem may be attributed to a non-smooth residual distribution, since for a perfectly smooth residual distribution, it should be possible to advance the global problem uniformly. In this work, we propose a residual smoothing technique that is applied to the right-hand side (i.e. residual vector) of a traditional pseudo-transient continuation Newton-Krylov method. Residual smoothing can be achieved through a variety of approaches, such as using well-known local nonlinear smoothers including block Jacobi and line smoothers. In the limit of small pseudo-time steps, the nonlinear iterations of the smoothed Newton continuation scheme correspond to a nonlinear implicit solver, while in the limit of large pseudo-time steps, the exact Newton scheme is recovered, along with its quadratic convergence properties. Example solutions are given for aerodynamic Reynolds-averaged Navier-Stokes (RANS) CFD problems on unstructured meshes, demonstrating a more robust and efficient solution strategy compared to the traditional pseudo-transient continuation Newton Krylov approach.

Multigrid Methods for flows in porous media

ABSTRACT. We present a high order mass conserving discontinuous Galerkin method for flows in porous media. The governing system of partial differential equations are nonlinear, have discontinuous material parameters, and give rise to diffusion-dominated or advection-dominated behavior. A robust deeply nested multilevel method is proposed, which combines p-multigrid, h-multigrid, and algebraic multigrid. Several simulations and numerical results are shown to assess the multigrid solver.

A Geometric Multigrid Method for Locking-Free, Divergence-Conforming Isogeometric Discretizations of Linear Elasticity

ABSTRACT. Isogeometric analysis has recently arisen as an attractive numerical simulation methodology for structural mechanics. In isogeometric analysis, the geometry and solution fields are represented using the same underlying basis, namely Non-Uniform Rational B-splines (NURBS), eliminating the need for mesh generation. Moreover, isogeometric analysis typically provides solutions with enhanced accuracy and stability as compared with classical finite element analysis.

In this talk, we present a novel geometric multigrid methodology for isogeometric discretizations of linear elasticity. Our methodology relies on the use of divergence-conforming isogeometric discretizations which are free of spurious volumetric locking behavior. Our methodology also relies on the use of Schwarz-style smoothers in conjunction with specially chosen overlapping subdomains. We provide theoretical results demonstrating the robustness of our methodology in the nearly-incompressible limit, and we present numerical results confirming the robustness of our methodology even when applied to heterogeneous structures consisting of interlacing hard, compressible material (such as steel) and soft, nearly-incompressible material (such as rubber). We finish by discussing extension of our multigrid methodology to geometrically nonlinear hyperelasticity.

aSP-AMG: a novel adaptive algebraic multigrid method for ill-conditioned problems.

ABSTRACT. The numerical simulation of modern engineering problems can easily incorporate millions or even billions of degrees of freedom. In several applications, the solution of sparse linear systems is required and algebraic multigrid (AMG) methods are often standard choices as iterative solvers or preconditioners. These comprise a family of techniques built on a hierarchy of levels associated with linear problems of decreasing size where the fundamental idea is to reduce different error components across their hierarchy, i.e., high-frequency components are reduced in the fine level system while the low-frequency counterpart is handled by the coarse levels. In this way, optimality and efficiency are achieved by combining the two complementary processes: relaxation and coarse-grid correction.

Since the creation of classical AMG in the early 1980s, many kinds of research have been conducted in the direction of improving its components as well as in the invention of new variants such as smoothed aggregation, element-based AMG and, more recently, adaptive and bootstrap AMG. In most cases, the common goal of these studies was to improve the efficiency of AMG and to expand their applicability to challenging problems such as those characterized by complex geometries, distorted grids, strong discontinuities in the physical properties and anisotropy. It was then proved that one of the key factors defining a fast AMG method consists of capturing correctly the near-null space of the system matrix and use it in the construction of suitable prolongation operators.

In this work, we propose a novel AMG package designed for SPD problems that we name aSP-AMG where the acronym aSP stands for adaptive Smoothing and Prolongation. The reason for the choice of the word adaptive is manifold. The first motivation is that we follow the perspective of adaptive and bootstrap AMG that is to assume no information about the near-null space of the system matrix. In fact, we construct a space of smooth vectors of limited size, referred as test space, by applying the Lanczos method to an initial random vector, however without making use of the self-improvement concept in the current implementation. Another novel contribution is the introduction of the adaptive pattern factorized sparse approximate inverse (aFSAI) as a smoother. This improves the smoothing capabilities of the resulting method as aFSAI is more effective than Jacobi and usually much sparser than Gauss-Seidel for the same accuracy. Moreover, aFSAI has been shown to be, both theoretically and experimentally, strongly scalable, and this fact fosters the implementation of the package in massively parallel computers, even if it is not our main focus at the moment. The coarsening phase is carried out as in classical AMG by dividing variables into fine and coarse, but the strength of connection is computed by means of the affinity between components of the test space, which we believe is a concept that better adapts to general application problems. Finally, the main contribution that we want to stress is the presentation of three new techniques for building the prolongation operator. The first one is called ABF, recalling the Adaptive Block FSAI preconditioning method, and it approximates the columns of the ideal prolongation operator by running a few iterations of the aFSAI algorithm. The second strategy, LS-ABF, uses the nonzero pattern computed by ABF and updates the prolongation weights with the aid of a least squares minimization process involving the test vectors. Lastly, we present a strategy called DPLS which is based solely on a least-squares process for building the sparsity pattern of the prolongation as well as its coefficients. Here, the prolongation is built independently row by row in such a way to include the test space in its range by using as few as possible nonzero coefficients in the sense of the minimum squares.

We assess the performance of the proposed AMG through the solution of a set of model problems as well as real-world engineering test cases. Initially, we study the impact of the configuration parameters of aSP-AMG on its performance by solving a highly heterogeneous model problem based on the linear elasticity equations. We then derive a set of default parameters providing good convergence results. After that, the weak scalability of the method is evaluated for a FEM discretization of two model problems based on the Poisson and linear elasticity equations with homogeneous materials. Lastly, a set of challenging engineering problems arising from different applications is solved by using the optimal configuration parameters for each test case, which are slightly different than the default set found before. Moreover, comparisons are made with the aFSAI and BoomerAMG preconditioners, showing that our new method proves to be superior to the first strategy and comparable to the second one, if not better as in the elasticity problems.

10:25-12:30 Session 12B: Sparse Linear Solvers. Room: Bighorn C/1.
Convergence analysis of Anderson-type acceleration of Richardson's iteration

ABSTRACT. We consider Anderson extrapolation to accelerate the (stationary) Richardson iterative method for sparse linear systems. Using an Anderson mixing at periodic intervals, we assess how this benefits convergence to a prescribed accuracy. The method, named Alternating Anderson-Richardson, has appealing properties for high performance computing, such as the potential to reduce communication and storage in comparison to more conventional linear solvers. We establish sufficient conditions for convergence and we evaluate the performance of this technique in combination with various preconditioners through numerical examples. Furthermore, we propose an augmented version of this technique.

A block Gauss-Seidel/Jacobi preconditioner for heterogeneous many-core architecture

ABSTRACT. The Jacobi and Gauss-Seidel algorithms are widely used in the field of parallel computing as iterative solvers for linear system. However, it is challenging to make good use of the fine-level parallelism on heterogeneous many-core architectures. The Sunway TaihuLight supercomputer, ranked the first in the latest Top500 list, is based on heterogeneous many-core architecture. In this paper, we focus on a class of unstructured mesh problems and present a domain decomposition method with a block Gauss-Seidel/Jacobi algorithm as the subdomain solver implemented on the Sunway TaihuLight supercomputer. To take the advantage of the small but fast local data memory on each computer processing element of the SW26010 processor, we designed implementation-based optimization strategies such as multiple lines packing copy and computation-communication overlapping. Moreover, a copying reduced version of block Gauss-Seidel/Jacobi algorithm is proposed to alleviate the memory bandwidth bottleneck. Numerical results show that the proposed block Gauss-Seidel/Jacobi algorithm delivers a 4.16x speedup comparing to the sequential version. For the parallel efficiency, the block Gauss-Seidel/Jacobi preconditioner achieves a 61% efficiency as the number of processors increases from 1,040 to 33,280.

Iterative Method suggestion techniques for sparse linear systems
SPEAKER: Kanika Sood

ABSTRACT. Scientific applications from a variety of domains, such as astrophysics, computational fluid dynamics, and thermodynamics among many others, rely on scalable solutions of sparse linear systems. For large, sparse linear systems, iterative methods are the preferred choice of solution method. With numerous software libraries offering a large collection of parallel preconditioned Krylov methods, selecting an optimal solver technique is extremely challenging. In this talk, we describe a technique to suggest well-performing methods based on the characteristics of the linear systems and the amount of communication occurring in the methods. With our approach, we model two aspects of parallel preconditioned Krylov methods: (1) their convergence behavior and (2) their parallel overhead. We also provide preliminary results for a matrix-free approach to selecting iterative Krylov methods. For modeling the convergence behavior of solvers, we create a machine learning (ML) model that uses the problem characteristics, solver name and a binary class label (“good” or “bad”) as input. The problem characteristics are the structural, numerical and spectral properties of the matrix such as norm, trace, and row variance among others. Each solver and preconditioner combination along with its configuration parameters is assigned a unique solver name. Each linear system is solved with multiple solver-preconditioner combinations. The class label is assigned by comparing the computation time of the Krylov methods for the same linear system. All the solvers that solve the system within a specified threshold of the best-observed solution time are labeled as “good”, and the rest are labeled as “bad”. We train our model with thousands of linear systems by computing the features and assigning labels to each data point. A data point is a single instance in the training set. Each instance represents the properties of one linear system and the solver-preconditioner combination used to solve that system. Computing features is an expensive process. Therefore, we perform a feature reduction step before the final solver classification. In this stage, we reduce the number of features that have to be computed for a new incoming system. Specifically, we remove the features that either vary too much (>99% variance) or remain consistent for almost all cases (<1% variance). Next, we apply multiple attribute evaluators with different search methods to select the features that are significant and contribute the most towards the classification process. These attribute evaluators apply different techniques to assess the contribution of each feature. The evaluators assign a weight to each feature and rank the features to get an overall ranking. The weight determines the contribution of the feature, with a higher weight signaling a higher contribution. Feature reduction is performed by collecting the features that are ranked highly by all or most of the attribute evaluators and removing the lowest ranked features. Although the feature reduction phase reduces the number of features to be computed for any new system, this ML model still relies on matrix properties and can be trained only on small-scale inputs due to the runtime cost of collecting the solver timings required for labeling the training data. For modeling the parallel overhead of solving larger systems, we use an analytical communication model to quantify the differences in inter-process communication across Krylov methods. To rank methods based on their scalability alone, we compare communication costs by computing the number of calls made to the vector and matrix operations that perform communication. We focus on the main iteration and do not include initial setup, I/O functions or other functions that are called just once per solution. Because the number of iterations varies for each solver, we normalize the function calls to be per iteration values rather than the total counts of the function calls. We assign a cost variable for all the communication-containing matrix-vector operations. These operations include reduction operations (VecNorm, VecDot), matrix-vector products (MatMult, MatMultTranspose), scatter-gather vector operations (VecScatterBegin) and preconditioner application operations (PCApply, PCApplyTranspose). Next, we associate a value with each cost variable in terms of the number of processors and the amount of data transferred. This model produces a scalability-based ranking for all of the Krylov methods. To capture both convergence and parallel scalability, we combine the communication model with the ML model. First, the problem characteristics along with the solver id and class label are fed as input to the ML model. The ML model then produces a set of “good” solver-preconditioner combinations as suggestions. These solver suggestions are taken as input by the communication model. The communication model generates the subset of solvers suggested by the ML model that are most highly ranked by the communication ranking. The combined model provides a more accurate selection of preconditioned Krylov methods than either model alone. It further can enable solver recommendations at different parallelism scales, which is not possible with the ML model in isolation. We test the combined model on a numerical simulation of driven fluid flow in a two-dimensional cavity. The traditional way of representing sparse matrices involves storing each nonzero element by using data structures, such as compressed sparse row/column, coordinate, diagonal, or hybrid dense/sparse representations. For certain types of computations, such as nonlinear PDE solution via finite-difference Newton-Krylov methods, where the memory requirements of explicitly storing the sparse matrix exceed available capacity, matrix-free approaches can be used. The Krylov solution of the linearized system is computed by using approximations of matrix-vector products based only on the function computing the current discretized solution approximation at each grid point. Because there is no explicit matrix, it is impossible to compute most of the features used in our ML-based solver selection. Hence, a different set of features must be defined and computed for matrix-free approaches. We present initial results using features based on matrix-free eigenvalue approximation, infinity norm, and structural problem features.

High Performance Block Incomplete LU Factorization And Its Applications

ABSTRACT. Many application problems that lead to solving linear systems make use of preconditioned Krylov subspace solvers to compute their solution. Among the most popular preconditioning approaches are incomplete factorization methods either as single-level approaches or within a multilevel framework. We will present a block incomplete triangular factorization that is based on skilfully blocking the system initially and throughout the factorization. This approach allows for the use of cache-optimized dense matrix kernels such level-3 BLAS or LAPACK. We will demonstrate how this block approach may signifcantly outperform the scalar method on modern architectures, paving the way for its prospective use inside various multilevel incomplete factorization approaches or other applications where the core part relies on an incomplete factorization. Furthermore, we demonstrate its use for indefinite systems as well as for methods that heavily rely on the matrix inverse such as selective inversion or approximate inversion inside inverse covariance matrix estimation methods.

Multilevel Incomplete LU-Factorization Preconditioner for Predominantly Symmetric Systems
SPEAKER: Aditi Ghai

ABSTRACT. Preconditioned Krylov subspace methods are widely used for solving large-scale sparse linear systems, such as those arising from discretizations of partial differential equations. For some modern applications, these linear systems are predominantly symmetric, in that the matrix has a large leading principle sub-matrix that is symmetric, while the remainder may be nonsymmetric. These systems can arise due to various reasons, such as the Neumann boundary conditions using polygon domains, strongly imposed Neumann boundary conditions or using some mixed or hybrid finite element methods.

We propose a multilevel incomplete LU-factorization preconditioner for predominantly symmetric systems to take advantage of the predominant symmetry of linear systems. Incomplete LU factorization techniques are among the most successful preconditioners. However, they tend to fail in the presence of many zero diagonal entries without pivoting and tend to scale poorly with super-linear growth in the number of unknowns with partial pivoting. The multilevel incomplete LU factorization, such as that in ILUPACK, can overcome these shortcomings by employing diagonal pivoting and by bounding the estimated condition numbers of the approximate factors. In this work, we extend the multilevel ILU to predominantly symmetric systems. In our variant, whenever possible, we use an incomplete LDL’ factorization on the leading block or incomplete Cholesky factorization if the system is also positive definite, and it automatically reverts to incomplete LU when the system is no longer symmetric. For efficiency, our algorithm employs the Crout version of ILU with diagonal pivoting. Our numerical experimentation shows that our incomplete factorization scales approximately linear in the number of unknowns, and it roughly halves the cost of the factorization compared to using ILU naively for such systems. We will report some numerical comparisons of the multilevel ILU-based GMRES for solving large nearly symmetric systems in their convergence rates and computational costs, as well as the impacts of drop tolerances, reordering and matching techniques, etc.

10:25-12:30 Session 12C: Earth Systems. Room: Bighorn C/2.
A partitioned method for interface problems based on a primal Schur complement
SPEAKER: Pavel Bochev

ABSTRACT. Mathematical models of complex systems such as Global Earth System Models comprise multiple components, describing diverse physical phenomena and employing different types of discretizations, adapted to the particular scales and features of the underlying physics. Partitioned, or loosely coupled algorithms are one of the most effective solution techniques for such problems because they enable the use of separate analysis codes that can be optimized for each physics component.

Most partitioned methods treat interface nodes as boundary nodes and perform data transfers between the subdomains to provide the necessary boundary conditions. These methods can be thought of as performing one or more steps of an iterative procedure such as a fixed point iteration or non-overlapping Schwarz.

In this talk we consider a different ``slide line'' approach which treats interface nodes as interior nodes whose state is determined by discrete equations similar to those in the subdomain interiors. A "slide line" scheme splits each interface node into "halves" associated with the two sides of the interface, and endows each half node with its own, independently assembled discrete equation. The basic design principle for this class of partitioned methods is to ``complete the assembly'' of these ``half'' equations to match the equation at their parent node. We discuss such schemes for explicit partitioned elastodynamics in which case completion of the ``half'' equations boils down to calculation of the mass and force contributions to each half node from the opposite side of the interface. Existing methods estimate these contributions by creating virtual meta cells surrounding the half nodes. Two issues with this approach are its restriction to lumped mass matrices and the lack of an obvious connection to a well-posed formulation of the problem that could be used to asses its accuracy and stability.

We present an alternative approach to the estimation of the missing mass and force terms, which starts from a well-posed mixed formulation of the interface problem. Discretization of the Lagrange multiplier by the trace of the finite element space on one of the subdomains renders the $2\times 2$ block corresponding to the state on this subdomain and the multiplier invertible, and its Schur complement yields the desired mass and force estimates for the opposite subdomain. Repeating this process with the Lagrange multiplier represented by trace of the finite element space on the opposite subdomain in turn provides mass and force estimates on the current subdomain.

We show that the so defined partitioned method passes a patch test, is second order accurate and is equivalent to a monolithic formulation of the problem on matching interface grids. Finally, we show that a well-known slide rule scheme is a particular case of our algorithm.

Mesh-tying via multiple-control PDE-constrained optimization and GMLS function reconstruction
SPEAKER: Paul Kuberry

ABSTRACT. Separate meshing of subdomains in mesh-tying, transmission, and other interface problems, induces independent interface discretizations that can have gaps and overlaps. This presents a challenge for traditional, Lagrange multiplier formulations because the usual coupling conditions are no longer well-defined over non-coincident interfaces. At the same time, these conditions can be used to define a measure of the solution discrepancy between the spatially non-coincident interfaces. Minimizing this measure, subject to the governing equations on the subdomains, provides an alternative, optimization-based approach for coupling over mismatched interfaces.

In this talk we develop this approach for a model scalar elliptic problem with an interface induced by a discontinuous “material” property. We use continuity of the states and the normal flux to obtain discrepancy measures that define the optimization objective, while restrictions of the governing equations to the discretized subdomains provide the optimization constraints. We close each subdomain equation by specifying a Neumann boundary condition on its respective version of the true interface. These boundary conditions represent the virtual controls driving the minimization process.

We consider two alternatives for implementing the discrepancy measure term involving the normal flux term. In the first approach [1], the flux on the interface is computed by using the gradient of the finite element solution on the associated subdomain. Although this allows us to pass a linear patch test, it also limits the accuracy of the method resulting in optimal H1, but sub-optimal L2, convergence rates. The second approach uses a Generalized Moving Least Squares (GMLS) reconstruction [2] to lift the order of the finite element solution before evaluating the normal fluxes, effectively removing the barrier to optimal L2 convergence. This version of the method utilizes the GMLS implementation in the Compadre toolkit.

We will present the formulation of the method, extensions operators used to compare solutions on non-coincident interfaces, and a GMLS reconstruction operator designed to lift the approximation order of the gradient. Numerical studies will be shown demonstrating optimal convergence of the method for a manufactured solution, successful passing of a patch test for recovering a globally linear solution, and the degree to which global flux is conserved.

REFERENCES [1] P. Kuberry, B. Bochev, and K. Peterson, An optimization-based approach for elliptic problems with interfaces, SIAM J. Sci. Comput., Vol. 39, No. 5, pp. S757-S781, 2017. [2] H. Wendland, Scattered data approximation, Vol. 17, Cambridge University Press, 2004.

Adaptive Solution Transfer for Nonsmooth Functions Between Nonmatching Meshes
SPEAKER: Xuebin Wang

ABSTRACT. Multiphysics coupling, such as global climate modeling (GCM), often requires transferring solutions at the interfaces of the coupled systems. Increasingly, higher order methods and higher resolution meshes are used by the physics models, the interface meshes between the physics models maybe nonmatching, and the solutions to be transferred may be nonsmooth, involving large second derivatives or weak discontinuities. These properties require more robust and efficient solution transfer methods, which is decidedly challenging. The most commonly used solution-transfer methods are either pointwise-based (such as interpolation or moving least squares) or integral-based (such as the common-refinement or super-mesh based methods). In this work, we propose an adaptive solution transfer method, which takes advantages of both approaches. Our approach applies piecewise weighted least square fitting to obtain an intermediate approximation of the solution with high-order accuracy. This intermediate solution is in general high-order accurate, but the solution in nonsmooth regions may be oscillatory, especially after many iterations of solution transfer across the interface. Then, we apply a regularization process, which first identifies the nonsmooth regions and then uses a Petrov-Galerkin formulation to enforce local conservation in a weighted sense. The regularization process suppresses oscillations and helps improve long-term stability. Another advantage of our approach is that, unlike most other integral-based methods, it does not require computing the intersections of the cells. We will present some theoretical analysis of our method as well as numerical results of repeated solution transfer between 2-D meshes and spherical geometry of different grid resolution.

Optimization-based, property preserving finite element methods for transport equations
SPEAKER: Mauro Perego

ABSTRACT. We present an optimization-based approach for the accurate, property preserving finite element solution of Partial Differential Equations (PDEs) for transport problems in which the solution is obtained by solving a suitably defined constrained optimization problem. The objective is to minimize the distance to a given finite element target, computed by a formally accurate but not necessarily property preserving scheme, while physical properties such as maximum principle and/or preservation of local solution bounds define the constraints. This divide-and-conquer strategy separates solution accuracy from the preservation of the relevant physical properties and always finds a globally optimal, with respect to the given target, solution that also satisfies these properties. To illustrate the approach we consider the finite element solution of a model scalar transport equation and present numerical studies. We will also present the fast optimization algorithms adopted that make the computational cost of the proposed approach comparable to related methods such as the flux-corrected transport method.

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research. D.~Ridzal also acknowledges funding by the Advanced Simulation & Computing (ASC) Program.

Preconditioned Iterative techniques for geophysical electromagnetic problems

ABSTRACT. One of the ways to map the different conductive layers in Earth's crust is to construct a so-called forward model in terms of Maxwell's equations in the frequency domain. A popular decomposition approach is to consider the vector potential and the solenoidal parts of the electrical field individually in the form of a coupled model. Suitable FEM discretization (Edge Elements coupled with nodal elements) leads to a particular complex-valued block-structured matrix system for these two components; and consequently their equivalent real form is a 4x4 block structured system, half of which has a non-trivial nullspace due to the underlying curl-curl operator. We demonstrated (in the 18th CM conference on Multigrid Methods) a block-diagonal preconditioner based on Auxilliary-space Maxwell Solver (Hiptmair-Xu) AMS, and treated a dimensionless academic model problem successfully. In this meeting, we demonstrate successful AMS based inversion of the full Schur complement preconditioner for FGMRES for two industry-scale problems; and emphasize that the iterative properties of the dimensionless problems fundamentally differ from the industry-scale 3D models.

16:30-18:10 Session 13: Student Competition Winners.
A Golub-Kahan Davidson Method for Accurately Computing Singular Triplets of Large Sparse Matrices
SPEAKER: unknown

ABSTRACT. Obtaining high accuracy singular triplets for large sparse matrices is a significant challenge, especially when searching for the smallest triplets. Due to the difficulty and size of these problems, efficient methods must function iteratively, with preconditioners, and under strict memory constraints. In this research, we present a Golub-Kahan-Davidson method (GKD), which satisfies these requirements and includes features such as soft-locking with orthogonality guarantees, an inner correction equation similar to Jacobi-Davidson, GD+k restarting, and the ability to find real zero singular values in both square and rectangular matrices. Additionally, our method achieves full accuracy while avoiding the augmented matrix, which often converges slowly due to the difficulty of interior eigenvalue problems. We describe our method in detail, including implementation issues that may arise. Our experimental results confirm the efficiency and stability of our method over the current implementation of PHSVDS in the PRIMME software package.

An Adaptive Unsmoothed Aggregation Multigrid Method Based on Path Coverings
SPEAKER: unknown

ABSTRACT. We propose an adaptive algebraic multigrid (αAMG) method for efficient solution of linear systems with matrices derived from discretizations of second order elliptic partial differential equations or weighted Laplacians from graph problems. The overall algorithm is nonlinear and the main idea is to approximate the level sets of the smooth errors by nearly optimal path coverings and matching along the paths. Traditionally, the unsmoothed aggregation AMG (UA-AMG) methods require more sophisticated cycling techniques, such as W-cycle or K-cycle. As the numerical results show, the proposed method based on adaptive path coverings leads to an optimal V-cycle algorithm for model problems. Therefore, it is more efficient and easier to implement in practice which will have advantages in many applications.

Low-Rank Solution Methods for Stochastic Eigenvalue Problems
SPEAKER: unknown

ABSTRACT. We study efficient solution methods for stochastic eigenvalue problems arising from discretization of PDEs with random data. With the stochastic Galerkin approach, the solutions are represented as generalized polynomial chaos expansions. A low-rank variant of the inverse subspace iteration algorithm is presented for computing one or several minimal eigenvalues and corresponding eigenvectors of parameter-dependent symmetric positive-definite matrices. In the algorithm, the iterates are approximated by low-rank matrices, which leads to significant cost savings. The algorithm is tested on two benchmark problems, a stochastic diffusion problem with some poorly separated eigenvalues, and an operator derived from a discrete stochastic Stokes problem whose minimal eigenvalue is related to the inf-sup stability constant. The effectiveness of the algorithm is demonstrated by comparison with Monte Carlo solutions.