CM2020: 16TH COPPER MOUNTAIN CONFERENCE ON ITERATIVE METHODS
PROGRAM FOR WEDNESDAY, MARCH 25TH

08:00-10:05 Session 11A: Optimization Methods. Bighorn B.
08:00
Analysis of the sensitivity of output wave height to input bottom friction and bathymetry in the coastal wave model STWAVE

ABSTRACT. STWAVE is a steady-state spectral wave model for simulating wind wave growth and propagation. The model accepts many inputs, including bathymetry and friction. We use monthly bathymetry survey data to construct a reduced SVD representation of bathymetry. The polynomial chaos expansion provides sensitivity indices for the reduced bathymetry coefficients as well as friction. We analyze the sensitivity indices for insight into the importance of accurate and finely-resolved inputs to the model.
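
As a rough illustration of the reduced-representation step described above, the following Python sketch builds a truncated SVD basis from a matrix of bathymetry snapshots; the snapshot data, truncation rank, and variable names are hypothetical placeholders, not the STWAVE inputs used in the study.

    # Hedged sketch: reduced SVD representation of bathymetry snapshots.
    # "snapshots" (n_points x n_months) and the truncation rank "r" are
    # illustrative placeholders, not the actual survey data.
    import numpy as np

    rng = np.random.default_rng(0)
    snapshots = rng.normal(size=(5000, 24))          # monthly bathymetry surveys, one column each

    mean_bed = snapshots.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(snapshots - mean_bed, full_matrices=False)

    r = 5                                            # keep the r dominant spatial modes
    modes, coeffs = U[:, :r], (np.diag(s[:r]) @ Vt[:r])

    # Any survey is approximated by mean_bed + modes @ c for an r-vector c;
    # a PCE sensitivity analysis can then be carried out over c and the
    # friction coefficient instead of the full bathymetry field.
    recon = mean_bed + modes @ coeffs
    print("relative truncation error:", np.linalg.norm(snapshots - recon) / np.linalg.norm(snapshots))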

08:25
Sparse trigonometric interpolation with applications to computational chemistry

ABSTRACT. We present a method for isotropic and anisotropic sparse interpolation using a trigonometric interpolation basis. The application of this work is the approximation of potential energy surfaces (PESs) in computational chemistry, to be used in molecular dynamics (MD) simulations. Previous interpolation techniques did not adequately preserve the underlying periodicity of the PES, leading to nonphysical MD results. A trigonometric basis preserves the required periodicity and leads to sensible MD results. This interpolation method has been implemented in the freely available Department of Energy tool Tasmanian.
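
As a minimal illustration of the underlying idea (not Tasmanian's sparse-grid construction), the sketch below performs one-dimensional trigonometric interpolation by least-squares fitting of a cosine/sine basis; the target function and number of modes are assumptions made for the example.

    # Hedged sketch: 1-D trigonometric approximation of a periodic function by
    # least-squares fitting of a cosine/sine basis. The target function and the
    # number of modes are illustrative; the sparse-grid construction in higher
    # dimensions is not reproduced here.
    import numpy as np

    def trig_design(x, n_modes):
        cols = [np.ones_like(x)]
        for k in range(1, n_modes + 1):
            cols += [np.cos(k * x), np.sin(k * x)]
        return np.column_stack(cols)

    f = lambda x: np.exp(np.sin(x)) + 0.3 * np.cos(3 * x)   # periodic "potential energy surface"

    nodes = np.linspace(0.0, 2 * np.pi, 33, endpoint=False)
    coeffs, *_ = np.linalg.lstsq(trig_design(nodes, 8), f(nodes), rcond=None)

    x_test = np.linspace(0.0, 2 * np.pi, 400, endpoint=False)
    err = np.max(np.abs(trig_design(x_test, 8) @ coeffs - f(x_test)))
    print("max approximation error:", err)   # periodicity of the basis is preserved by construction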

08:50
Regularized Non-linear Least-Squares Unconstrained Minimization Method for High-order Finite-Element Implicit Shock-Fitting in Space-Time

ABSTRACT. Using high-order finite-element discretizations presents significant numerical challenges for compressible fluid flows with discontinuities (shocks, multimaterial contacts, rarefaction waves), necessitating the introduction of numerical diffusion to regularize the solution in the vicinity of discontinuities, typically in the form of limiters, hyper-viscosity, flux-corrected procedures, etc. We present a new approach (“Moving Discontinuous Galerkin with Interface Condition Enforcement”, MDG-ICE), which is designed to avoid this issue by incorporating mesh motion into the global non-linear implicit iterative procedure. This is achieved by incorporating the grid nodal coordinates into the solution vector. By simultaneously enforcing the conservation laws and the corresponding set of jump conditions using high-order approximations on, in general, curvilinear space-time elements, we “fit” the mesh to align with discontinuities. This method can fully exploit a high-order discontinuous finite-element framework, demonstrating the ability to converge with high order even in the presence of discontinuities, without adding any numerical dissipation. The driving mechanism for moving the mesh within the implicit iterative procedure is the set of interface jump conditions, which in the context of a space-time finite-element representation correspond to the Rankine-Hugoniot conditions enforced implicitly as part of the global weak formulation. Since the number of enforced non-linear equations is generally different from the number of unknowns, we cast the problem as an unconstrained minimization, solved using a Gauss-Newton non-linear least-squares approach. The problem is generally ill-posed and requires regularization, which is currently implemented in the form of the Levenberg-Marquardt method. We demonstrate the method’s performance for a set of challenging shock-dynamics problems in 2D and 3D space-time domains, and discuss the remaining challenges, including the efficient incorporation of mesh topology changes, the choice of the regularization strategy and parameters, parallelization (both CPU and GPU), and scalable preconditioning.
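
Since the abstract casts the discrete problem as a regularized nonlinear least-squares minimization, the following sketch shows the generic Gauss-Newton/Levenberg-Marquardt building block on a toy overdetermined residual using SciPy; the residual function is a hypothetical stand-in, not the MDG-ICE conservation-law and jump-condition residual.

    # Hedged sketch: regularized Gauss-Newton (Levenberg-Marquardt) applied to a
    # generic overdetermined residual via SciPy's least_squares. The residual
    # below is a toy stand-in for the actual discrete residual.
    import numpy as np
    from scipy.optimize import least_squares

    def residual(u):
        # more equations (4) than unknowns (2), mimicking the overdetermined setting
        x, y = u
        return np.array([x**2 + y - 1.0,
                         x - y**2 + 0.5,
                         0.1 * (x - 2.0),
                         0.1 * (y + 1.0)])

    sol = least_squares(residual, x0=np.array([2.0, 2.0]), method="lm")
    print(sol.x, sol.cost)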

09:15
A Projected Newton-Krylov Method for Large-Scale Bound-Constrained Optimization Problems

ABSTRACT. We present a Newton-Krylov scheme for approximately solving large-scale continuous optimization problems with bound constraints. The critical challenge in these problems is the determination of the active set. Our method is geared toward situations in which function and gradient evaluations are expensive, and the Hessian is only available through matrix-vector products. In each step, our method generates a low-rank approximation of the (approximate) Hessian using a Krylov scheme and then solves a quadratic program to update the iterate. Due to the low-rank structure, the cost of solving the quadratic program scales linearly with respect to the number of variables and constraints. We also experiment with schemes that incorporate knowledge of the current estimate of the active set into the Hessian approximation. Using three numerical experiments motivated by parameter estimation, machine learning, and image processing, we show that our method compares favorably to existing Newton-Krylov solvers that use projected line searches based on active/inactive splitting.

09:40
Energy Function Approach to the Time-Varying Optimal Gas Flow

ABSTRACT. The energy function approach is a useful tool for tackling optimization problems on gas flow networks, such as the design and operation of gas transmission networks, among many others. This approach has one notable feature: it allows us to avoid the non-convexity of optimization problems on gas grids. Considering the energy function of a gas flow network in the steady-state case, one ends up with a convex optimization problem. This optimization problem – minimization of the energy dissipated in the network – can be used efficiently to model the transmission laws. It is challenging to extend this approach to the unsteady-state case. In that case, there is an additional law that binds the temporal dependence of the pressure to the spatial dependence of the throughput. One of the key questions on this topic is whether the corresponding optimization problem is convex. This problem is in high demand in industry, since the generalization would allow modeling gas transmission networks more precisely, taking into account changes of the system under various external factors. The current work considers the transient gas flow problem and studies the existence of an energy function for this problem. In particular, a number of convex optimization problems are solved to construct the corresponding energy-function optimization problem.

08:00-10:05 Session 11B: Low-rank methods. Bighorn C/1
08:00
Krylov Methods for Low-Rank Regularization

ABSTRACT. We introduce new solvers for the computation of low-rank approximate solutions to large-scale linear problems, with a particular focus on the regularization of linear inverse problems. Although Krylov methods incorporating explicit projections onto low-rank subspaces are already used for well-posed systems that arise from discretizing stochastic or time-dependent PDEs, we are mainly concerned with algorithms that solve the so-called nuclear norm regularized problem, where a suitable nuclear norm penalization on the solution is imposed alongside a fit-to-data term expressed in the 2-norm: this has the effect of implicitly enforcing low-rank solutions. By adopting an iteratively reweighted norm approach, the nuclear norm regularized problem is reformulated as a sequence of quadratic problems, which can then be efficiently solved using Krylov methods, giving rise to an inner-outer iteration scheme. Our approach differs from the other solvers available in the literature in that: (a) Kronecker product properties are exploited to define the reweighted 2-norm penalization terms; (b) efficient preconditioned Krylov methods replace gradient (projection) methods; (c) the regularization parameter can be efficiently and adaptively set along the iterations. Furthermore, we reformulate within the framework of flexible Krylov methods both the new inner-outer methods for nuclear norm regularization and some of the existing Krylov methods incorporating low-rank projections. This results in an even more computationally efficient (but heuristic) strategy, that does not rely on an inner-outer iteration scheme. Numerical experiments show that our new solvers are competitive with other state-of-the-art solvers for low-rank problems, and deliver reconstructions of increased quality with respect to other classical Krylov methods.
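
For readers unfamiliar with the model problem, the sketch below applies a standard proximal-gradient (singular value thresholding) iteration to a nuclear-norm regularized least-squares problem with a simple masking operator; this illustrates the regularized problem only and is not the iteratively reweighted inner-outer Krylov scheme of the talk. All data and parameters are synthetic.

    # Hedged sketch: proximal gradient (singular value thresholding) for
    # min_X 0.5*||A(X) - b||_2^2 + lam*||X||_*, with A a simple masking operator.
    import numpy as np

    rng = np.random.default_rng(1)
    n, r, lam, step = 60, 3, 0.5, 1.0
    X_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
    mask = rng.random((n, n)) < 0.4                    # sampling operator A = entrywise mask
    b = mask * X_true

    def svt(Z, tau):                                   # prox of tau*||.||_*
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    X = np.zeros((n, n))
    for _ in range(200):
        grad = mask * (mask * X - b)                   # gradient of the data-fit term
        X = svt(X - step * grad, step * lam)

    print("relative error on observed entries:",
          np.linalg.norm(mask * (X - X_true)) / np.linalg.norm(b))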

08:25
A Low-Rank Solver for the Unsteady Navier-Stokes Equations

ABSTRACT. We study a low-rank iterative solver for the unsteady Navier–Stokes equations for incompressible flows with a stochastic viscosity. The equations are discretized using the stochastic Galerkin method, and we consider an all-at-once formulation where the algebraic systems at all the time steps are collected and solved simultaneously. The problem is linearized with Picard’s method. To efficiently solve the linear systems at each step, we use low-rank tensor representations within the Krylov subspace method, which leads to significant reductions in storage requirements and computational costs. Combined with effective mean-based preconditioners and the idea of inexact solves, we show that only a small number of linear iterations are needed at each Picard step. The proposed algorithm is tested with a model of flow in a two-dimensional symmetric step domain with different settings to demonstrate the computational efficiency.

08:50
Low-rank plus sparse matrices: ill-posedness and guaranteed recovery

ABSTRACT. Robust principal component analysis and low-rank matrix completion are extensions of PCA that allow for outliers and missing entries, respectively. Solving these problems requires a low coherence between the low-rank matrix and the canonical basis. However, in both problems the well-posedness issue is even more fundamental; in some cases, both Robust PCA and matrix completion can fail to have any solutions due to the fact that the set of low-rank plus sparse matrices is not closed. Another consequence of this fact is that the lower restricted isometry property (RIP) bound cannot be satisfied for some low-rank plus sparse matrices unless further restrictions are imposed on the constituents. By restricting the energy of one of the components, we close the set and are able to derive the RIP over the set of low rank plus sparse matrices and operators satisfying concentration of measure inequalities. We show that the RIP of an operator implies exact recovery of a low-rank plus sparse matrix is possible with computationally tractable algorithms such as convex relaxations or line-search methods. We propose two efficient iterative methods called Normalized Iterative Hard Thresholding (NIHT) and Normalized Alternative Hard Thresholding (NAHT) that provably recover a low-rank plus sparse matrix from subsampled measurements taken by an operator satisfying the RIP.
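
A hedged sketch of the alternating hard-thresholding idea follows: the two projections (truncated SVD for the rank constraint, keeping the largest entries for the sparsity constraint) are alternated on fully observed synthetic data. The normalized step sizes and subsampled measurement operator of NIHT/NAHT are omitted, so this is an illustration of the projections only.

    # Hedged sketch of alternating hard thresholding for separating a matrix into
    # low-rank plus sparse parts on fully observed synthetic data.
    import numpy as np

    rng = np.random.default_rng(2)
    n, r, s = 80, 4, 200
    L_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
    S_true = np.zeros((n, n))
    idx = rng.choice(n * n, size=s, replace=False)
    S_true.flat[idx] = 10.0 * rng.normal(size=s)
    M = L_true + S_true

    def project_rank(Z, r):                       # best rank-r approximation via truncated SVD
        U, sv, Vt = np.linalg.svd(Z, full_matrices=False)
        return (U[:, :r] * sv[:r]) @ Vt[:r]

    def project_sparse(Z, s):                     # keep the s largest-magnitude entries
        out = np.zeros_like(Z)
        keep = np.argsort(np.abs(Z), axis=None)[-s:]
        out.flat[keep] = Z.flat[keep]
        return out

    L, S = np.zeros_like(M), np.zeros_like(M)
    for _ in range(50):
        L = project_rank(M - S, r)
        S = project_sparse(M - L, s)

    print("low-rank error:", np.linalg.norm(L - L_true) / np.linalg.norm(L_true))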

09:15
A low-rank matrix equation method for solving PDE-constrained optimization problems

ABSTRACT. PDE-constrained optimization problems arise in a broad range of applications such as hyperthermia cancer treatment or blood flow simulation. Discretization of the optimization problem and use of a Lagrangian approach result in a large-scale saddle-point system, which is challenging to solve, and acquiring a full space-time solution is often infeasible. We present a new framework to efficiently compute a low-rank approximation to the solution by reformulating the KKT system into a Sylvester-like matrix equation. This matrix equation is subsequently projected onto a small subspace via an iterative rational Krylov method, and we obtain a reduced problem by imposing a Galerkin condition on its residual. In our work we discuss implementation details and the dependence on the various problem parameters. Numerical experiments illustrate the performance of the new strategy, also when compared to other low-rank approaches.

09:40
Randomized Iterative Methods for Low Rank Matrix Approximation

ABSTRACT. In this work, novel methods are developed to iteratively embed aggregate pieces of information from a data matrix to generate low-rank approximations. The algorithms are built by subsampling the column and row spaces of the matrix simultaneously. The developed subsampled randomized algorithms converge with provable error bounds. A heuristic accelerated scheme is also developed, motivated by the subsampling analysis. We compare our algorithms to current sampling-based algorithms on a selection of matrices from a standard test suite and verify that the theoretical convergence rates are numerically realized.
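
As a familiar point of reference only, the sketch below implements a basic one-shot randomized SVD (random test matrix, range finder, small projected SVD); the algorithms in the abstract subsample rows and columns simultaneously and embed the information iteratively, which this sketch does not attempt to reproduce.

    # Hedged sketch: a basic randomized SVD (randomized range finder), shown only
    # as a familiar reference point for randomized low-rank approximation.
    import numpy as np

    def randomized_svd(A, rank, oversample=10, rng=np.random.default_rng(3)):
        m, n = A.shape
        Omega = rng.normal(size=(n, rank + oversample))   # random test matrix
        Q, _ = np.linalg.qr(A @ Omega)                    # orthonormal basis for a sampled column space
        B = Q.T @ A                                       # small projected matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return Q @ Ub[:, :rank], s[:rank], Vt[:rank]

    A = np.linalg.qr(np.random.default_rng(4).normal(size=(500, 500)))[0] @ np.diag(2.0 ** -np.arange(500))
    U, s, Vt = randomized_svd(A, rank=20)
    print("rank-20 approximation error:", np.linalg.norm(A - (U * s) @ Vt, 2))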

08:00-10:05 Session 11C: Multigrid Algorithms and their Performance on GPU Platforms. Bighorn C/2
08:00
Algebraic Multigrid Methods for Large-Scale Electromagnetic Particle-in-Cell Simulations

ABSTRACT. Specialized algebraic multigrid (AMG) techniques for Maxwell's equations are critical for the success of large-scale calculations in a Sandia electromagnetic particle-in-cell simulation code. In this talk we discuss challenges in applying AMG in such simulations across a variety of architectures, including Power9/Volta. In particular, we discuss architecture-specific algorithm decisions, such as smoother selection and load-balancing, as well as lessons-learned while improving AMG performance. Numerical results will be given for a variety of meshes and architectures.

08:25
MueLu's SA-AMG implementation and performance on GPUs

ABSTRACT. The exascale supercomputers scheduled to come online in the next two to three years are all GPU-based architectures. This requires multigrid libraries to focus on implementing robust and performant GPU algorithms. To that end, a large amount of work has been conducted recently in the MueLu package (Trilinos' multigrid package) to refactor or re-design the kernels needed to perform Smoothed-Aggregation Algebraic Multigrid (SA-AMG). In this talk I will present the current status of that effort and highlight the challenges encountered along the way. I will then show the current performance of the GPU algorithms on ORNL/Summit and finally outline the direction of future work in MueLu.

08:50
One-Reduce FGMRES-AMG with Two-Stage Gauss-Seidel Smoothers on GPU

ABSTRACT. We have recently derived one-reduce variants of the Arnoldi-QR algorithm based on the inverse compact WY modified Gram-Schmidt algorithm and the recursive classical Gram-Schmidt algorithm with delayed re-orthogonalization (DCGS-2). These lead to one-reduce MGS-GMRES, pipelined s-step GMRES and DCGS2-MINRES Lanczos solvers and rely on delayed corrections or a lower triangular matrix inversion equivalent to a Neumann polynomial expansion. In this talk we present a one-reduce variant of the FGMRES algorithm with a multi-MPI, multi-GPU implementation. This solver is combined with one-level momentum and AMG pressure preconditioners (hybrid block-Jacobi), based on a two-stage nested Gauss-Seidel type iteration, Frommer and Szyld (1992). These are an extension of Richardson iteration with row-wise updating or, equivalently, Gauss-Seidel formulated with an inner iteration. Similar to Gauss-Southwell relaxation, these allow for random and asynchronous variants. We present improved solver convergence rates and run times compared to the traditional Gauss-Seidel smoother. These smoothers only require matrix-vector multiplies and thus are ideally suited for the GPU. Performance results comparing Kokkos and CUDA implementations are presented.
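
A hedged sketch of the two-stage idea attributed above to Frommer and Szyld (1992): the exact lower-triangular solve of a Gauss-Seidel sweep is replaced by a few inner Jacobi sweeps, so the smoother needs only matrix-vector products and diagonal scalings. The 1-D Poisson matrix and sweep counts are illustrative assumptions.

    # Hedged sketch of a two-stage Gauss-Seidel relaxation: the triangular solve
    # (D + L) z = r of a standard Gauss-Seidel sweep is approximated by a few
    # inner Jacobi sweeps, so only matrix-vector products and diagonal scalings
    # are required (GPU friendly).
    import numpy as np
    import scipy.sparse as sp

    n = 200
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    D = A.diagonal()
    Lmat = sp.tril(A, k=-1, format="csr")              # strictly lower-triangular part

    def two_stage_gs(x, outer=5, inner=3):
        for _ in range(outer):
            r = b - A @ x
            z = r / D                                  # inner stage: Jacobi sweeps for (D + L) z = r
            for _ in range(inner - 1):
                z = (r - Lmat @ z) / D
            x = x + z
        return x

    x = two_stage_gs(np.zeros(n))
    print("residual norm after smoothing:", np.linalg.norm(b - A @ x))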

09:15
Recent Development of Multigrid Solvers in HYPRE on Modern Heterogeneous Computing Platforms

ABSTRACT. Modern many-core processors such as graphics processing units (GPUs) are becoming an integral part of today's high-performance computing systems. Accelerating multigrid methods on GPUs has drawn a lot of research attention in recent years. For instance, in the recent releases of the HYPRE package, the structured multigrid and BoomerAMG solvers have been ported to CUDA-based GPUs using different approaches. In this talk, we will provide an overview of the available GPU support in HYPRE and present our current work on various components of the AMG setup and solve phases on GPUs. In particular, we focus on several important computational kernels such as sparse matrix-vector products, sparse triangular solves, sparse matrix-matrix multiplications, and maximal independent set algorithms. We will also cover the heterogeneous memory and MPI-communication models designed in HYPRE. Results on the GPU-based systems at LLNL will also be included.

09:40
Robust Structured Multigrid on GPUs: Leveraging legacy kernels in the Cedar Framework

ABSTRACT. Emerging high-performance computing systems provide an opportunity to significantly increase on-node parallelism by using heterogeneous architectures comprising multiple GPUs per node. The Summit and Sierra machines are the latest examples, and these machines motivate the need to target execution on GPUs for performance. These machines offer higher GPU floating-point and memory performance than each CPU. Yet, not all algorithms are able to fully utilize the capabilities of the GPU out of the box. While robust structured solvers have low arithmetic intensity due to non-constant, spatially dependent stencil weights, their direct memory access and structured loop patterns make them a natural candidate for exploiting the significantly higher memory bandwidth of GPUs on these machines.

In this talk, an approach for offloading legacy code with structured regions on current Power9/Volta systems is presented. Benchmarks on the CPU and GPU are used to evaluate technologies used to simplify the porting of legacy code to this architecture and to provide context for performance improvements. Using this approach, the solve phase is offloaded to the GPU resulting in a speedup of \(6.2\) over the CPU with a global problem size of approximately \(85 \times 10^6\) unknowns using \(12\) multigrid levels. Requiring minor changes to the Fortran code, this approach is attractive for offloading legacy code containing structured parts to the GPU. We provide an overview of the underlying software package, Cedar, and highlight areas for additional improvement.

10:25-12:30 Session 12A: Multilevel Methods. Bighorn B
10:25
p-Multigrid with Partial Smoothing: An Efficient Preconditioner for Discontinuous Galerkin Discretizations with Modal Bases

ABSTRACT. p-Multigrid methods for the numerical solution of partial differential equations construct multigrid hierarchies using discretizations of different order instead of discretizations on different computational grids. These methods have been applied successfully to various finite element and discontinuous Galerkin (DG) discretizations on structured and unstructured grids. When solving linear systems from modal DG discretizations with hierarchical bases, p-multigrid methods assume a simple algebraic structure, because the lower-order DG systems can be obtained by selecting only those matrix and vector entries of the linear system that correspond to lower-order basis functions.

For these p-multigrid methods with hierarchical bases, we propose the use of "partial" smoothers which update only the higher-order unknowns on each p-multigrid level. For example, when applied to a piecewise quadratic discretization, partial smoothers only update unknowns that correspond to quadratic basis functions, leaving unknowns that correspond to linear or constant basis functions unchanged. Thus, partial smoothing trades numerical strength of each smoother iteration for lower per-iteration cost. p-Multigrid methods with partial smoothing take the form of a symmetric Gauss-Seidel iteration. In this way, they can be viewed as an extension of hierarchical basis multigrid and hierarchical scale separation methods to the p-multigrid framework.

We discuss the implementation of p-multigrid methods in a hybrid parallel setting with MPI and OpenMP. An algebraic multigrid (AMG) method is used as the "coarse grid solver," i.e., to solve the piecewise constant problem at the lowest p-multigrid level. A combined p-multigrid and AMG cycle is then used as a preconditioner for a conjugate gradient method. Numerical experiments with large-scale simulations of fluid flow in porous media show that the computational cost of the proposed methods is only slightly superlinear with respect to the problem size. Furthermore, we demonstrate that the use of partial smoothing instead of standard smoothing can reduce computational cost by more than 30% in simulations of practical relevance.
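
A hedged sketch of the partial-smoothing idea on a generic hierarchically ordered system follows; the splitting into low-order and high-order unknowns, the damped-Jacobi relaxation, and all parameters are assumptions made for illustration and do not reproduce the modal DG setting of the talk.

    # Hedged sketch of a "partial" smoother on a hierarchically ordered system:
    # unknowns are split into a low-order block (indices in `lo`) and a
    # high-order block (`hi`), and each relaxation sweep updates only the
    # high-order block via damped Jacobi while the low-order unknowns stay fixed.
    import numpy as np

    rng = np.random.default_rng(5)
    n, n_lo = 40, 10
    A = rng.normal(size=(n, n)); A = A @ A.T + n * np.eye(n)    # SPD test matrix
    b = rng.normal(size=n)
    lo, hi = np.arange(n_lo), np.arange(n_lo, n)

    def partial_jacobi(x, sweeps=10, omega=0.8):
        d_hi = np.diag(A)[hi]
        for _ in range(sweeps):
            r = b - A @ x
            x[hi] += omega * r[hi] / d_hi                        # relax high-order unknowns only
        return x

    x = partial_jacobi(np.zeros(n))
    print("residual restricted to high-order block:", np.linalg.norm((b - A @ x)[hi]))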

10:50
Non-overlapping block smoothers for the Stokes equations

ABSTRACT. Multigrid (MG) methods are efficient methods for solving systems of partial differential equations (PDEs). They are used to speed up linear system solves in a wide variety of applications. The efficiency of multigrid methods is due to the combination of suitable smoothers with a coarse grid correction. As one of the two key ingredients, smoothers have a significant impact on the performance of multigrid solvers. Most of the literature on smoothers in multigrid methods is concerned with scalar PDEs only; systems of PDEs are considered less often.

This talk concentrates on developing smoothers within geometric multigrid methods for the solution of the Stokes equations. We compare the commonly used Vanka smoother with a non-overlapping variant of the Vanka smoother. While the latter is computationally cheaper, its convergence depends much more strongly on the implementation than that of the overlapping method. We consider a finite difference discretization on staggered grids.

A local Fourier analysis and a general comparison including the computational cost and the convergence properties of the two different methods will be presented.

11:15
Expressing AMG Interpolation in Terms of Sparse Matrix-Matrix Multiplications

ABSTRACT. Computational science is facing several major challenges with rapidly changing, highly complex, heterogeneous computer architectures. To meet these challenges and yield fast and efficient performance, solvers need to be easily portable. Algebraic multigrid (AMG) methods have great potential to achieve good performance, since they have shown excellent numerical scalability for a variety of problems. However, their implementation can be fairly complex. Traditionally, AMG’s setup phase has consisted of coarsening, interpolation, and coarse-grid operator generation. Breaking these components down into smaller kernels can significantly improve portability. In this presentation, we investigate ways to rewrite interpolation operators in terms of sparse matrix-matrix multiplications.
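
One familiar example of such a rewriting (shown purely as an illustration, not necessarily the interpolation formulas studied in this talk) is smoothed-aggregation interpolation, which is already a product of sparse matrices:

    # Hedged illustration: smoothed-aggregation interpolation written entirely in
    # terms of sparse matrix-matrix products, P = (I - omega * D^{-1} A) * P_tent.
    import numpy as np
    import scipy.sparse as sp

    n = 12
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

    # tentative interpolation: aggregate consecutive pairs of fine points
    rows = np.arange(n)
    cols = rows // 2
    P_tent = sp.csr_matrix((np.ones(n), (rows, cols)), shape=(n, n // 2))

    Dinv = sp.diags(1.0 / A.diagonal())
    omega = 2.0 / 3.0
    P = (sp.identity(n) - omega * (Dinv @ A)) @ P_tent       # two sparse mat-mat products
    A_coarse = P.T @ A @ P                                   # Galerkin triple product, also SpGEMM
    print(P.shape, A_coarse.shape)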

11:40
Parallel-in-time simulation of biofluids

ABSTRACT. Solving for the long-time collective dynamics of micro-structures such as bacteria, sperm and cilia is computationally expensive. In this study, we aim to accelerate the simulation of dynamic micro-structures immersed in a three-dimensional, viscous fluid using Parareal, a parallel-in-time method that decomposes the time domain into subintervals and alternates between a fine and a coarse solver iteratively to converge to the solution. Specifically, we model micro-structures using Kirchhoff rods that can bend, twist and translate in a fluid governed by the Stokes equations. A Parareal algorithm based on the Method of Regularized Stokeslets (MRS) is applied to solve the equations of fluid-structure interaction. The coarse solver employed uses both a reduced spatial and a reduced temporal resolution. Results on the reduction of runtime in simulating the long-time collective swimming of a group of interacting rods are presented to demonstrate the efficiency of the algorithm. We show that when the same number of cores is used, it can be considerably more efficient than the more conventional spatial parallelization. In cases where the number of swimmers is large, the runtime of our simulation can be further reduced if the Kernel-Independent Fast Multipole Method is used to accelerate the MRS.
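
For readers unfamiliar with Parareal, the sketch below shows the generic predictor-corrector skeleton on a scalar ODE with forward-Euler coarse and fine propagators; the Kirchhoff-rod and regularized-Stokeslet solvers of the study are not represented, and all parameters are illustrative.

    # Hedged sketch of the generic Parareal correction on a scalar ODE y' = -y
    # with forward-Euler coarse and fine propagators of different time resolution.
    import numpy as np

    f = lambda y: -y
    T, N = 2.0, 10                                   # time horizon split into N subintervals
    dt = T / N

    def propagate(y, steps):                         # forward Euler with `steps` substeps
        h = dt / steps
        for _ in range(steps):
            y = y + h * f(y)
        return y

    coarse = lambda y: propagate(y, 1)
    fine   = lambda y: propagate(y, 100)

    U = np.zeros(N + 1); U[0] = 1.0
    for n in range(N):                               # initial coarse sweep (serial)
        U[n + 1] = coarse(U[n])

    for k in range(5):                               # Parareal iterations
        F = np.array([fine(U[n]) for n in range(N)]) # fine solves: independent, parallel in practice
        G_old = np.array([coarse(U[n]) for n in range(N)])
        for n in range(N):                           # serial coarse correction sweep
            U[n + 1] = coarse(U[n]) + F[n] - G_old[n]

    print("error vs exact solution:", abs(U[-1] - np.exp(-T)))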

12:05
Optimizing MGRIT and Parareal coarse-grid operators for linear advection

ABSTRACT. We apply the parallel-in-time methods MGRIT and Parareal to the linear advection equation, appealing to existing convergence theory to provide insight into their typically non-scalable or even divergent behaviour for this problem. To overcome these failings, we replace the standard choice of rediscretization on coarse grids with near-optimal coarse-grid operators that are computed by approximately minimizing error estimates from the convergence theory. Our main finding is that, in order to obtain fast convergence, coarse-grid operators should properly track the characteristic curves of the hyperbolic problem. Our approach is tested on discretizations of various orders using explicit Runge-Kutta time integration with upwind-finite-difference spatial discretizations. In all cases, we obtain scalable convergence in just a handful of iterations, dramatically improving upon existing results, with parallel tests also showing significant speed-ups over sequential time-stepping.

10:25-12:30 Session 12B: Data Science and Graph Algorithm Applications. Bighorn C/1
10:25
Iterative Solution Methods for Nonnegative Matrix Factorizations

ABSTRACT. Approximating matrices by a product of nonnegative matrices of low rank has become an important tool for extracting features in data science applications such as image processing or text mining. Several numerical methods rely on the underlying nonnegative least-squares problem and compute the constrained low rank factors iteratively by using alternating least squares. At its core a large number of linear equations or inequalities need to be solved. For the efficient computation of the constrained low-rank factorization we will discuss iterative solution methods for solving sequences of linear systems and inequalities and demonstrate their effectiveness for nonnegative matrix factorizations with dense or sparse nonnegative factors.
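
A hedged sketch of the alternating (nonnegative) least-squares structure mentioned above: each outer sweep fixes one factor and solves a sequence of small nonnegative least-squares problems, here column by column with SciPy's nnls. This shows the nested solves only, not the specific iterative strategies of the talk; the data matrix and rank are synthetic.

    # Hedged sketch: nonnegative matrix factorization X ~ W @ H by alternating
    # nonnegative least squares, each subproblem solved columnwise via nnls.
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(6)
    m, n, r = 60, 40, 5
    X = rng.random((m, r)) @ rng.random((r, n))      # nonnegative data matrix

    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(20):
        H = np.column_stack([nnls(W, X[:, j])[0] for j in range(n)])        # fix W, solve for H >= 0
        W = np.column_stack([nnls(H.T, X[i, :])[0] for i in range(m)]).T    # fix H, solve for W >= 0

    print("relative factorization error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))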

10:50
A Multilevel Subgraph Preconditioner for Linear Equations in Graph Laplacians

ABSTRACT. We propose a Multilevel Subgraph Preconditioner (MSP) to efficiently solve linear equations in graph Laplacians corresponding to general weighted graphs. The MSP preconditioner combines the ideas of expanded multilevel structure from multigrid (MG) methods and spanning subgraph preconditioners (SSP) from Computational Graph Theory. To start, we expand the original graph based on a multilevel structure to obtain an equivalent expanded graph. Although the expanded graph has a low diameter, which is a favorable property for the SSP, it has negatively weighted edges, which is an unfavorable property for the SSP. We design an algorithm to properly eliminate the negatively weighted edges and theoretically show that the resulting subgraph with positively weighted edges is spectrally equivalent to the expanded graph. Then, we adopt algorithms to find SSP, such as augmented low stretch spanning trees, for the positively weighted expanded graph and, therefore, provide an MSP for solving the original graph Laplacian. MSP is practical to find thanks to the multilevel property and has provable theoretical convergence bounds based on the support theory for preconditioning graphs.

11:15
Parallel spectral partitioning on multiple GPUs

ABSTRACT. We present SPHYNX, a parallel spectral graph partitioner that can run on multiple GPUs. SPHYNX computes the lowest eigenvalues and eigenvectors of the graph Laplacian (either the combinatorial or the normalized Laplacian) and embeds the graph into a low-dimensional space. The vertices are then partitioned using a fast geometric method (MultiJagged). Our code is based on Trilinos and uses the LOBPCG eigensolver from Anasazi. A key advantage of our method is that the partitioner can run on multiple CPUs and/or GPUs, allowing us to run on supercomputers and future exascale systems. We present some results for highly irregular graphs, and compare against ParMetis.
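
A hedged sketch of the basic spectral bipartitioning step follows: the Fiedler vector of the combinatorial Laplacian is computed with a sparse eigensolver and the vertices are split at its median. The multi-GPU SPHYNX pipeline (LOBPCG via Anasazi, MultiJagged geometric partitioning) is not reproduced; the grid graph is a toy example.

    # Hedged sketch: spectral bipartition of a small grid graph via the Fiedler vector.
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.csgraph import laplacian
    from scipy.sparse.linalg import eigsh

    n = 20                                            # 20 x 20 grid graph
    path = sp.diags([1.0, 1.0], [-1, 1], shape=(n, n))
    adj = (sp.kron(sp.identity(n), path) + sp.kron(path, sp.identity(n))).tocsr()

    L = laplacian(adj)
    vals, vecs = eigsh(L, k=2, sigma=-1e-2)           # two eigenpairs nearest zero (shift-invert)
    fiedler = vecs[:, np.argmax(vals)]                # discard the constant null-space vector

    part = fiedler > np.median(fiedler)               # median split gives a balanced bipartition
    x = part.astype(float)
    cut_weight = x @ (L @ x)                          # x^T L x counts the edges cut
    print("part sizes:", int(part.sum()), int((~part).sum()), "cut weight:", cut_weight)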

11:40
Mixed-Precision Arithmetic for Graph Clustering and Graph Ranking

ABSTRACT. In addition to the relief low precision naturally provides to bandwidth and storage needs, speed-ups from hardware accelerators and processors like GPUs that support mixed-precision arithmetic further motivate its usage in new applications. Losses in precision, stability, and representable range offset those advantages, but these shortcomings may have little to no impact in some applications; it may even be possible to navigate around these drawbacks with algorithmic design. Specifically, we propose using mixed-precision arithmetic for various iterative eigensolvers aimed at two graph problems: clustering and ranking. We test whether sufficient clustering and ranking performance is possible when using low-precision storage and mixed-precision arithmetic.
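
As a toy experiment in the same spirit (assumptions: a random graph, a PageRank-style power iteration, and a fixed iteration count chosen for illustration), the sketch below runs the same ranking iteration in single and double precision and compares the induced orderings.

    # Hedged toy experiment: a PageRank-style power iteration in float32 vs float64,
    # comparing the two induced rankings on a small random graph.
    import numpy as np

    rng = np.random.default_rng(7)
    n, damping = 500, 0.85
    adj = (rng.random((n, n)) < 0.02).astype(np.float64)
    np.fill_diagonal(adj, 0.0)
    out_deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    P = (adj / out_deg).T                             # (roughly) column-stochastic transition matrix

    def pagerank(dtype, iters=100):
        M = P.astype(dtype)
        x = np.full(n, 1.0 / n, dtype=dtype)
        for _ in range(iters):
            x = damping * (M @ x) + (1.0 - damping) / n
            x /= x.sum()                              # renormalize (dangling nodes not handled)
        return x

    rank64 = np.argsort(pagerank(np.float64))[::-1]
    rank32 = np.argsort(pagerank(np.float32))[::-1]
    print("top-20 agreement between float32 and float64 rankings:",
          len(set(rank64[:20]) & set(rank32[:20])), "/ 20")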

12:05
Computing a posteriori error estimates for algebraic multigrid methods on graphs

ABSTRACT. In this paper we study the functional a posteriori error estimator for solving linear systems of graph Laplacians. Previously, these estimates were computed by solving a perturbed global optimization problem iteratively, which could be computationally expensive. In this work, we propose a novel strategy to compute the a posteriori error estimator. Our technique involves constructing a Helmholtz decomposition on the graph by solving a linear system on a spanning tree of the graph and a least-squares problem on the cycle space of the graph. Such an a posteriori error estimator can then be used to adaptively generate coarser aggregations with good approximation properties, or to generate accurate approximations to the level sets of the error for the path-cover adaptive algebraic multigrid method. Numerical experiments are presented to demonstrate the efficiency of the proposed method.

10:25-12:30 Session 12C: Nonlinear Problems. Bighorn C/2
10:25
A Rational Approximation Method for Large-Scale Nonlinear Eigenvalue Problems

ABSTRACT. Eigenvalue problems in which the coefficient matrices depend nonlinearly on the eigenvalues arise in a variety of applications in science and engineering, e.g., dynamic analysis of structures or computational nanoelectronics, to mention just a few. This talk will discuss how Cauchy integral-based approaches combined with rational approximation offer an attractive framework for developing highly efficient and flexible techniques for solving large-scale nonlinear eigenvalue problems. The main idea is to approximate the functions involved in the eigenvalue problem by rational functions and then apply a linearization. A few different schemes are proposed to solve the resulting linear eigenvalue problem by exploiting the special structure of the underlying linearization.

This is a joint work with M. El-Guide and Y. Saad.

10:50
Nonlinear Solvers for Two-species Neutrino-matter Equations in Core-Collapse Supernovae Simulations

ABSTRACT. Efficient numerical schemes are needed for high-fidelity simulations of neutrino transport in nuclear astrophysics applications, where the dominating computational cost is in modeling neutrino-matter interactions. With implicit/implicit-explicit time discretization, simulating neutrino-matter interactions requires solutions to nonlinear systems that couple the neutrino transport equation and the matter equation. We compare various iterative solvers for these nonlinear systems arising from four-momentum and lepton exchange due to emission and absorption, scattering, and pair processes. The numerical tests are performed on a multi-group two-moment model employing discontinuous Galerkin phase space discretization and implicit-explicit time integration. We analyze the performance of these iterative solvers on CPUs and discuss preliminary results from porting these solvers to GPUs.

11:15
Optimal Asymptotic Convergence Rates for Nesterov and Anderson Acceleration with Application to Tensor Decomposition

ABSTRACT. In this talk, we study optimal achievable asymptotic convergence rates of the Anderson and nonlinear GMRES (NGMRES) methods for nonlinear optimization. We consider the canonical tensor decomposition problem, for which we accelerate steepest descent and alternating least squares (ALS). We present some new theoretical results on the optimal asymptotic convergence rate for acceleration with window size two (which corresponds to Nesterov acceleration in the Anderson case). We also investigate higher window sizes numerically. We present numerical results for canonical tensor decomposition to validate our theoretical results. Our approach allows us to quantify how much convergence improvement can be expected from using ALS as a nonlinear preconditioner for Anderson and NGMRES, as opposed to using standard steepest descent as the nonlinear preconditioner.
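
For reference, a hedged sketch of Anderson acceleration with window size m applied to a generic fixed-point map is given below (Walker-Ni style implementation using residual differences); the toy map stands in for the ALS or steepest-descent preconditioned iterations studied in the talk.

    # Hedged sketch: Anderson acceleration with window size m for a fixed-point map g(x).
    import numpy as np

    def anderson(g, x0, m=2, iters=50, tol=1e-12):
        x = x0.copy()
        gx = g(x)
        G_hist, F_hist = [gx], [gx - x]           # history of map values and residuals
        x = gx                                    # first step is a plain fixed-point step
        for _ in range(iters):
            gx = g(x)
            f = gx - x
            G_hist.append(gx); F_hist.append(f)
            if np.linalg.norm(f) < tol:
                break
            mk = min(m, len(F_hist) - 1)
            dF = np.column_stack([F_hist[-j] - F_hist[-j - 1] for j in range(1, mk + 1)])
            dG = np.column_stack([G_hist[-j] - G_hist[-j - 1] for j in range(1, mk + 1)])
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)   # least-squares mixing coefficients
            x = gx - dG @ gamma                   # Anderson-accelerated update
        return x

    g = lambda x: 0.9 * np.cos(x) + 0.1 * x       # toy contractive fixed-point map
    x_star = anderson(g, np.zeros(3), m=2)
    print("fixed-point residual:", np.linalg.norm(g(x_star) - x_star))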

11:40
A new quasi-Newton method for a black-box system with a physics-based surrogate

ABSTRACT. Many problems can be reduced to finding the root of a black-box system, using a quasi-Newton scheme such as Broyden’s method [1]. Evaluating the black-box can be very expensive, however, so it becomes essential to minimize the required number of quasi-Newton iterations. If (part of) the physics of the black-box is available as a cheap surrogate model, can convergence be accelerated?

The quasi-Newton with least-squares and surrogate (QN-LeSS) method is proposed for this purpose. It combines a physics-based surrogate model for the (inverse of the) Jacobian with the quasi-Newton least-squares method [2,3]. The mathematical properties and scaling of the computational cost are analyzed.

The performance of QN-LeSS is demonstrated by calculating the steady free surface solution of a 2D supercritical flow: a black-box Reynolds-averaged Navier-Stokes solver is used in order to find the free surface shape that corresponds to a zero pressure. The surrogate model stems from a perturbation analysis of a simplified problem [4]. The convergence and scalability of QN-LeSS are compared to several other quasi-Newton methods for this problem, showing that the new method effectively accelerates convergence.
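
For context, a hedged sketch of the baseline scheme cited as [1], Broyden's rank-one quasi-Newton update for a black-box residual F(x) = 0, is given below; the surrogate-enhanced QN-LeSS combination proposed in the abstract is not reproduced, and the residual is a toy stand-in for the black-box solver.

    # Hedged sketch: Broyden's "good" rank-one quasi-Newton method for F(x) = 0.
    import numpy as np

    def broyden(F, x0, iters=50, tol=1e-10):
        x = x0.copy()
        Fx = F(x)
        J = np.eye(len(x))                            # initial Jacobian approximation
        for _ in range(iters):
            dx = np.linalg.solve(J, -Fx)
            x_new = x + dx
            F_new = F(x_new)
            dF = F_new - Fx
            J += np.outer(dF - J @ dx, dx) / (dx @ dx)   # Broyden rank-one update
            x, Fx = x_new, F_new
            if np.linalg.norm(Fx) < tol:
                break
        return x

    F = lambda x: np.array([x[0] + 0.5 * np.cos(x[1]) - 1.2,
                            x[1] + 0.5 * np.sin(x[0]) - 0.6])
    root = broyden(F, np.zeros(2))
    print("||F(root)|| =", np.linalg.norm(F(root)))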

[1] Broyden CG. A class of methods for solving nonlinear simultaneous equations. Mathematics of Computation. 1965;19(92):577-593.
[2] Degroote J, Bathe KJ, Vierendeels J. Performance of a new partitioned procedure versus a monolithic procedure in fluid–structure interaction. Computers & Structures. 2009;87(11-12):793-801.
[3] Haelterman R, Degroote J, Van Heule D, Vierendeels J. The quasi-Newton least squares method: a new and fast secant method analyzed for linear systems. SIAM Journal on Numerical Analysis. 2009;47(3):2347-2368.
[4] Demeester T, Degroote J, Vierendeels J. Stability analysis of a partitioned iterative method for steady free surface flow. Journal of Computational Physics. 2018;354:387-392.