previous day
next day
all days

View: session overviewtalk overview

08:00-10:05 Session 2A: Data Science. Room: Bighorn B.
Scalable Spectral Clustering. Room: Bighorn B. Chair: Geoff Sanders

ABSTRACT. Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) [] has practically demonstrated [] to efficiently solve eigenvalue problems for graph Laplacians that appear in spectral clustering, e.g., determining 100% correct 160 clusters in Stochastic Block Partition Streaming Graph Challenge (SBPSGC) static 2M/41M vertices/edges graph in <2,000 sec and with 50GB memory using LOBPCG Python code with threaded ATLAS. We now test scalability partitioning SBPSGC static graphs using PETSc/SLEPc/LOBPCG code with MPI in shared and distributed memory environments.

A domain decomposition approach for object recognition

ABSTRACT. Principal component analysis is one of the popularly used techniques for dimension reduction, and it has many applications such as model order reduction for PDE related calculations, face recognition in image sciences etc. The method is relatively easy to implement, but because it involves the solution of a dense eigenvalue, or singular value, problem, the cost is high and the parallelization is quite difficult. In this talk, we discuss a multilevel domain decomposition approach and the method has much lower parallel complexity and it has reasonable accuracy for certain class of problems.

Multigrid Method as Convolutional Neural Network
SPEAKER: Sarah Gage

ABSTRACT. We propose a method to represent a multigrid algorithm as a convolutional neural network. Training our network to approximate the inverse corresponds to optimizing parameters in the restriction and prolongation. The significance of optimizing the parameters via a neural network is that prior knowledge is not required for construction and there is the potential for greater accuracy given comparable efficiency to traditional multigrid methods. Our approach seeks to utilize the sparsity pattern of the network, along with the action of the operator, to represent the inverse. Unlike a conventional deep network, our network has an aggressive reduction to small coarse layers that require fewer parameters and less expense for the network to learn the inverse. The coarsening and prolongation steps are done with convolutional layers and specific padding criteria. We are looking in the context of discretized PDEs and comparing to known multigrid techniques.

Applications of Spectral Clustering to Signed Graph Matrix Representations
SPEAKER: Alyson Fox

ABSTRACT. A common graph data-mining task is the clustering or grouping of objects. Spectral clustering is a dimensionality reduction technique that typically uses the eigenvectors associated with the user-defined, unsigned graph representation before employing spatial clustering in the few output dimensions. However, a user may have a signed graph, which captures the notation of friendly and adverse relationships between objects, e.g., a friend-enemy social network. Signed graphs are typically represented by the signed graph Laplacian, where the diagonal entry associated with a vertex is the total number of positive or negative relationships. However, depending on the graph’s sign structure, spectral clustering employing the signed graph Laplacian is often incapable of distinguishing meaningful clusters. Instead, one could use Gremban’s expansion matrix that maps the signed graph Laplacian to a matrix with twice the numbers of unknowns. The spectrum of Gremban’s expansion matrix of a signed graph is the union of the spectrum of the signed and signless graph Laplacian, where all negative edges are turned positive, which provides a more robust representation for spectral clustering. A row-sum persevering graph Laplacian [A. Knyazev, Signed Laplacian for Spectral Clustering Revisited, 2017, arXiv:1701.01394] has also been proposed, which leads to possible negative eigenvalues and it is argued that the negative spectrum could be beneficial for spectral clustering. This talk will investigate all three representations, Gremban’s expansion matrix, signed graph Laplacian, and row-sum preserving signed graph Laplacian, their respected eigenpairs, and their performance for spectral clustering.

An Optimal Control Framework for Efficient Training of Deep Neural Networks
SPEAKER: Lars Ruthotto

ABSTRACT. One of the most promising areas in artificial intelligence is deep learning, a form of machine learning that uses neural networks containing many hidden layers. Recent success has led to breakthroughs in applications such as speech and image recognition. However, more theoretical insight is needed to create a rigorous scientific basis for designing and training deep neural networks, increasing their scalability, and providing insight into their reasoning. The central issue addressed in this talk is the impact of network design on the performance of iterative optimization methods used to train deep networks.

We present a new mathematical framework that simplifies designing, training, and analyzing deep neural networks. It is based on the interpretation of deep learning as a dynamic optimal control problem. The deep learning problem can thus be cast as a continuous problem and can also be interpreted as a dynamic inverse problem (comparable to, e.g., electromagnetic imaging, optical flow) or optimal control problems (similar to, e.g., optimal mass transport or path-planning). We will exemplify how the understanding of the underlying dynamical systems helps design, analyze, and train deep neural networks.

The talk focusses on ways to ensure the stability of the dynamics in both the continuous and discrete setting to obtain a well-posed learning problem that allows effective, iterative solution. We will also present three new classes of Convolutional Neural Networks (CNNs) that are inspired by partial differential equations of parabolic, hyperbolic, and diffusion reaction type, respectively. We show how the type of PDE can be exploited to reduce training time by exploiting their multiscale, multilevel, or reversibility properties. Throughout the talk, we will illustrate the impact of stability and discretization on the performance of both stochastic and deterministic iterative optimization algorithms.

The talk will also give a brief overview of two new open-source software packages written in MATLAB and Julia, respectively, that implement the ideas presented in the talk. The packages are primarily designed as academic and teaching tools and aim at supporting research both on learning algorithms but also high-performance implementations.

08:00-10:05 Session 2B: Optimization. Bighorn C/1.
Robust Nonlinear Optimization
SPEAKER: Sven Leyffer

ABSTRACT. Many optimization problems involve uncertain data, or decision variables that cannot be implemented exactly. Problems of this type can be formulated as robust optimization problems, a class of optimization problems under uncertainty. We present an overview of robust nonlinear optimization, and discuss potential algorithmic approaches motivated by standard nonlinear programming approaches.

Derivative-Free Robust Optimization by Outer Approximations

ABSTRACT. We develop an algorithm for minimax problems that arise in robust optimization in the absence of objective function derivatives. The algorithm utilizes and extension of methods for inexact outer approximation in sampling a potentially infinite-cardinality uncertainty set. Clarke stationarity of the algorithm output is established alongside desirable features of the model-based trust-region subproblems encountered. We demonstrate practical benefits of the algorithm on a new class of test problems.

Inexact bundle methods for nonsmooth nonconvex optimization in Hilbert space

ABSTRACT. Motivated by optimal control problems for elliptic variational inequalities we investigate inexact bundle methods for nonsmooth nonconvex optimization. The proposed class of methods requires only approximate (i.e., inexact) evaluations of the cost function and of elements of the generalized gradient and allows for incorporating curvature information. A global convergence theory in a suitable infinite-dimensional Hilbert space setting is presented. We discuss the application of our framework to optimal control problems with variational inequalities, which is an MPEC in function space, and present preliminary numerical results.

A Globally Convergent Cutting-Plane Method for Simulation-Based Optimization with Integer Constraints

ABSTRACT. We present a method for optimizing computationally expensive functions with integer constraints. We assume that objective gradients are unavailable, the objective cannot be evaluated when certain variables take noninteger values, and the objective function is convex on the lattice of unrelaxable integer variables. Our method constructs hyperplanes that interpolate the objective function at previously evaluated points; in portions of the domain, these hyperplanes are valid underestimators of the objective. The union of these conditional cuts provide a nonconvex underestimator of the objective. We study two approaches for minimizing this underestimator in order to generate iterates for our method and we provide conditions under which our method converges to a global minimizer of the objective.

A locally adapted reduced basis method for solving risk-averse PDE- constrained optimization problems
SPEAKER: Drew Kouri

ABSTRACT. Many science and engineering applications require the control or design of a physical system governed by partial differential equations (PDEs). More often then not, PDE inputs such as coefficients, boundary and initial conditions, and modeling assumptions are unknown, unverifiable or estimated from data. Such problems can be formulated as infinite-dimensional risk-averse optimization problems, the numerical solution of which requires substantial computational effort resulting from the discretization of the underlying PDE in both the physical and stochastic dimensions. To practically solve problems with high-dimensional uncertainties, one must intelligently manage the individual discretization fidelities throughout the optimization iteration. In this talk, we develop an inexact trust-region algorithm that leverages locally adapted reduced basis approximations of the governing PDE. Our algorithm constructs these reduced basis models on subsets of the stochastic parameter space that are tuned to the risk measure defining the objective function and therefore permits higher fidelity approximation in "risky" regions of the parameter space. The algorithm only refines the fidelity of these approximations as required to ensure global convergence of the trust-region algorithm. We demonstrate the effectiveness of our algorithm through numerical examples.

08:00-10:05 Session 2C: Advanced Architectures. Room: Bighorn C/2.
Parallel Algebraic Multigrid with Node-Aware Communication
SPEAKER: unknown

ABSTRACT. Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. However, parallel AMG lacks scalability due to increasingly large costs associated with communication, both in the initial construction of a multigrid hierarchy as well as the iterative solve phase. The parallel implementation of AMG can be altered to reduce the cost of communication, yielding an increase in scalability.

Standard inter-process communication consists of sending data regard- less of the locations of send and receive processes. However, simple models display noteworthy differences in the cost of intra- and inter-node communication. Communication can be restructured to take advantage of the less costly intra-node communication, reducing both the number and size of inter-node messages. This node-aware communication can be applied to various operations in both the setup and solve phase of AMG, yielding an increase in the weak and strong scalability of the entire method.

Optimization of processor allocation for domain decomposed Monte Carlo calculations
SPEAKER: John Ellis

ABSTRACT. Load imbalance plagues domain decomposed Monte Carlo calculations when sources are not uniform. We improve parallel efficiency for domain decomposed Monte Carlo transport calculations through a nonuniform allocation of processors over subdomains. We optimize the allocation with runtime diagnostics collected during a calibration step then complete the full calculation. We compare our diagnostic-based approach to implicit filtering, a noisy optimization algorithm with bound constraints. We consider both forward and hybrid radiation transport calculations to measure performance.

Experience porting Monte Carlo particle transport to GPUs

ABSTRACT. Continuous-energy Monte Carlo particle transport is challenging to execute efficiently on modern high performance computing architectures such as GPUs because it suffers from low arithmetic intensity and random memory access patterns. In this talk, we discuss the process of porting the Shift particle transport code to GPUs. This process includes software design decisions as well as algorithmic considerations. We hope that our experience may provide guidance to other applications codes that suffer from challenging computational and/or memory access patterns. We present numerical results for the simulation of a small modular reactor (SMR) design under investigation through the Exascale Computing Project. Several GPU architectures are considered, including a large-scale parallel scaling study on the Titan supercomputer at Oak Ridge National Laboratory as well as runs on the SummitDev early-access machine preparing for the Summit supercomputer.

GPUDirect Async: GPU synchronous communication techniques applied to multi-grid methods

ABSTRACT. NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or among GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition which allows direct synchronization between GPU and third party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand Connect-IB network adapter, with no involvement of CPU in the critical communication path of GPU applications. During this talk I will describe the motivations and the building blocks of GPUDirect Async showing the potential benefits of using two different asynchronous communication models supported by this new technology in a MPI multi-GPU application like HPGMG-FV, a proxy for real-world geometric multi-grid applications.


ABSTRACT. We investigate the convergence of the fine-grained parallel algorithm for computing an incomplete LU factorization (FGPILU) for non-symmetric and indefinite matrices. Additionally, we consider how the occurrence of soft faults may impact the convergence of the algorithm for these problems. The numerical results suggest that this class of problems presents challenges for the FGPILU (less than 30% of cases considered converged) and that, while the occurrence of a fault can cause significant negative effects the simple algorithmic change advocated here can ameliorate such effects.

10:25-12:30 Session 3A: Rank Structured Solvers. Room: Bighorn B.
A parallel hierarchical low-rank Schur complement preconditioner for indefinite linear systems

ABSTRACT. Nonsymmetric and highly indefinite linear systems can be quite difficult to solve by iterative methods. This talk combines multilevel Schur Low-Rank preconditioner with classic block preconditioning strategies, and describes its implementation in Message Passing Interface environments. The method to be described generates a tree structure T that represents a hierarchical decomposition of the original matrix. This decomposition gives rise to a block structured matrix at each level of T. An approximate inverse of the original matrix based in its block LU factorization is computed at each level via a low rank property that characterizes the difference between the inverses of the Schur complement and another block of the reordered matrix. The low rank correction matrix is computed by performing a few Arnoldi process. We report experiments performed on distributed memory architectures, and illustrate the scalability and robustness of the proposed preconditioner with respect to indefiniteness for a few test problems.

Effective and Robust SIF Preconditioners for General SPD Matrices
SPEAKER: Jianlin Xia

ABSTRACT. For symmetric positive definite (SPD) dense and sparse matrices, we present a framework for designing effective and robust black-box preconditioners via structured incomplete factorization (SIF). In a scaling-and-compression strategy, off-diagonal blocks are first scaled and then compressed into low-rank approximations. ULV-type factorizations are then computed. A resulting prototype preconditioner is always positive definite. Generalizations to practical hierarchical multilevel preconditioners are given. Systematic analysis of the approximation error, robustness, and effectiveness is shown for both the prototype preconditioner and the multilevel generalization. In particular, we show how local scaling and compression control the approximation accuracy and robustness, and how aggressive compression leads to efficient preconditioners that can significantly reduce the condition number and improve the eigenvalue clustering. Analysis is also given for some discretized model problems.

The SIF preconditioners can be constructed via direct or randomized compression. The costs to apply the preconditioners are about O(N), where N is the matrix size. Numerical tests on several ill-conditioned problems show the effectiveness and robustness even if the compression uses very small numerical ranks. In addition, significant robustness and effectiveness benefits can be observed as compared with a standard rank structured preconditioner based on direct off-diagonal compression.

A Distributed-Memory Hierarchical Solver for General Sparse Linear Systems
SPEAKER: Chao Chen

ABSTRACT. We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to- communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results for both structured and unstructured problems to demonstrate the versatility and scalability of the parallel algorithm.

Low-rank approximation of matrices in BEM applications
SPEAKER: Eric Darve

ABSTRACT. We present new methods to calculate low-rank approximations of matrices arising in boundary element formulations. Various techniques are proposed and compared in terms of computational cost of forming the low-rank approximation, the estimated rank (for various tolerances), and the robustness of the method. The new approach leverages several ideas including Chebyshev interpolation, Maximally Dispersed Vertices, Rank-Revealing LU and QR factorizations, and the Skeletonized Interpolation. Theoretical results establish the accuracy and backward stability of some of these methods. Numerical examples and benchmarks of industrial relevance will be presented.

An efficient method for block low-rank approximations for kernel matrix systems

ABSTRACT. In the iterative solution of dense linear systems from boundary integral equations or systems involving kernel matrices, the main challenges are the expensive matrix-vector multiplication and the storage cost which are usually tackled by structured matrix techniques such as $\mathcal{H}$ and $\mathcal{H}^2$ matrix approximations. However, in structured matrix constructions, the low-rank approximations of the sub-blocks of the kernel matrix also lead to a high construction cost. In this paper, an efficient method is proposed to give a low-rank representation of the kernel matrix block $K(X_0, Y_0)$ for a kernel function $K(x,y)$ and two point clusters $X_0, Y_0$. The proposed method combines interpolative decomposition (ID), which is purely algebraic, with the analytic information of the kernel function to reduce the construction cost from $O(|X_0||Y_0|)$, for ID alone, to $O(|X_0|)$ which is not related to $|Y_0|$.

10:25-12:30 Session 3B: Optimization. Room: Bighorn C/1.
Towards a Scalable Parallel-in-Time Full Space Optimization Algorithm
SPEAKER: Eric C. Cyr

ABSTRACT. Transient PDE constrained optimization is challenging due to repeated forward and backward simulation. Using current approaches, when a simulation uses only a parallel spatial decomposition near the strong scaling limit, the time to solution for the optimization problem cannot be decreased by adding more computational resources. Addressing this issue requires algorithmic advancement in optimization algorithms and solution methods. This talk proposes a new parallel in time preconditioner for solving transient KKT systems arising in an inexact SQP algorithm. Our composite-step SQP algorithm utilizes inexact iterative solution of "benign" KKT systems, corresponding to a sequence of strictly convex quadratic programs. Its inexactness-handling mechanisms ensure global and fast local convergence. The preconditioner used to solve the KKT system is critical to the efficiency of this algorithm. Our preconditioner explicitly exposes continuity in time constraints using a time domain decomposition technique. These constraints are relaxed and the coupled forward-adjoint system is solved using a multigrid in time process. This talk will present the preconditoner and show results in prototypical scenarios indicating convergence independent of the number of time steps. The performance of the preconditioner is demonstrated to be independent of the number of time steps both when used to solve a standalone KKT system and when used within a Burgers equation constrained optimization problem.

Sensitivity Analysis for PDE-Constrained Optimization

ABSTRACT. We investigate the sensitivity of uncertain parameters in the context of optimization problems constrained by partial differential equations (PDEs). This approach is motivated by complex dynamics for a range of engineering and science applications where optimal design, control, and inversion solutions are needed, but the forward simulation consisting of multi-physics dynamics is plagued by a range of uncertain parameters and conditions. For instance, laser melting of powder metals in additive manufacturing requires tension driven fluid flow with multi-phase thermal models and solidification models to predict the microstructure. A range of multi-scale parameters are required to achieve accurate predictions in addition to the development of interfaces to couple a range of physics at different spatial and temporal scales. Climate science depends on a conglomeration of physics models including fluid flow, thermal and geo-chemical models. Uncertainties exist with initial conditions and with resolving complex features which typically are represented as coarse parameters. At the heart of magnetohydrodynamics lies turbulent Navier Stokes combined with Maxwells equations. Depending on what the desired design goals are, for instance whole device performance predictions in a fusion-based Tokamak reactor, additional dynamics may need to be coupled which not only introduces new material parameters and undefined interfaces because of particle in cell (non-PDE based models) requirements.

The goal for all these problems is to ultimately solve design, control, and inverse problems. The main challenges consist of solving highly nonlinear and large scale PDE-constrained optimization problems in addition to addressing remaining uncertain parameters and conditions in the underlying dynamics. Addressing the uncertainties in all parameters can be an overwhelming task, and further, some parameters which required extensive efforts may have minimal impact on the final design goal. Consequently, evaluating the influence of the unknown parameter on the design problem would allow an appropriate prioritization for data acquisition, laboratory experimentation, and code development. This motivates a computational procedure to determine the variation of optimal design/control/inversion solutions with respect to a number of uncertain parameters.

Following Brandes et al., we have developed a computational framework to evaluate the sensitivity of the optimal control strategy with respect to the uncertain parameters. At its core, computing these sensitivities requires an eigenvalue computation involving the optimality system. Our goal is to extend Brandes' approach so that large scale systems can be efficiently evaluated. To that end, our software capability was designed to leverage parallelism for the underlying linear algebra. In addition, we have extended Brandes' work to reformulate the eigenvalue problem so that a randomized eigenvalue solver may be used which permit all matrix vector products to be taken in parallel. Our software is based on C++, Trilinos based parallel constructs, and special PDE-constrained solver interfaces that generalizes the solution procedure to a range of PDE-based models. We leverage the Trilinos' Rapid Optimization Library that consist of the state-of-the-art Newton Krylov based optimization methods with both reduced and full space solution methods (trust region and line search globalization).

Our numerical examples consist of a thermal problem, similar to Brandes et al., but extended in overall size to demonstrate parallel scalability and performance. In addition we demonstrate the approach on a multi-physics problem that explores fluid flow and thermal dynamics. The motivation for this physics is based on the goal to achieve fluid flow characteristics for optimal particle deposition on a substrate. The controllers are boundary heat sources to suppress the circulation patterns from fluid flow reflecting off a heated bottom substrate.

A Bundle Trust Region Method for Shape Optimization with Stress Constraints for Contact Problems based on Isogeometric Analysis
SPEAKER: Benjamin Horn

ABSTRACT. We present an application of nonsmooth optimization techniques to a shape optimization problem for mechanical connectors including stress constraints. To this end, we follow the approach of an algorithm based product development, which links the product and manufacturing process requirements to the simulation based iterative optimization methods. To ensure a consistent model representation between CAD and finite element simulation, we choose an isogeometric approach to model the contact problem within the optimization method. This leads to a shape optimization governed by an elastic contact problem with friction. The frictional contact problem itself is formulated in the sense of the mortar approach using dual basis functions and is solved by a semismooth Newton method. Based on the damage parameter of Smith, Watson and Topper, also called PSWT, we include stress constraints to guarantee a predefined level of fatigue strength. To determine the PSWT we calculate approximated elasto-plastic strains and stresses using the von-Neuber hypothesis and the Ramberg-Osgood material law. The resulting shape optimization problem is nonconvex and due to the contact conditions and the chosen objective functions nonsmooth. We solve this optimization problem with a bundle trust region algorithm, which is modified to ensure a feasible shape in each iteration. The design subgradients required by the bundle method are calculated efficiently with the adjoint approach.

Global optimization of mixed-integer ODE constrained network problems with application to stationary gas transport

ABSTRACT. We propose a new approach for finding global solutions of mixed-integer nonlinear optimization problems with ordinary differential equation constraints on networks. Instead of using a first discretize then optimize approach, we combine spatial and variable branching with appropriate discretizations of the differential equations to derive relaxations of the original problem. To construct the relaxations we derive convex under- and concave over-estimators for the ODE solution operators using numerical discretization schemes. Thereby, we make use of the underlying network structure, where the solutions of the ODEs only need to be known at a finite number of points. This property enables us to adaptively refine the discretization and relaxation without introducing new variables. The incorporation into a spatial branch-and-bound process allows to compute global epsilon-optimal solutions or decide infeasibility. We prove that the algorithm terminates finitely under some natural assumptions. We then show how this approach works for the example of stationary gas transport, where the flow in the pipes is modeled by the stationary isothermal Euler equations and provide numerical results.

Shape optimization for unsteady fluid-structure interaction

ABSTRACT. We consider shape optimization for unsteady fluid-structure interaction problems that couple the Navier-Stokes equations with elasticity equations. We focus on the monolithic ALE formulation which is obtained by transforming the fluid equations to a fixed reference domain. Shape optimization with the method of mappings approach requires another transformation to a shape reference domain. This yields an optimal control setting and therefore can be used to drive an optimization algorithm with adjoint based gradient computation. Numerical results for our implementation, which builds on FEniCS, dolfin-adjoint, and IPOPT, are presented. In addition, Fréchet differentiability of the state variables of unsteady fluid-structure interaction with respect to domain variations is shown.

10:25-12:30 Session 3C: Multi-physics EM/Fluid Plasma.Room: Bighorn C/2.
High-Order Hybridized Discontinuous Galerkin (HDG) Method and a Multigrid solver for Magnetohydrodynamic applications
SPEAKER: Tan Bui-Thanh

ABSTRACT. In this talk we will present a high-order hybridized discontinuous Galerkin (HDG) method and an efficient solver for MHD systems. The advantages of high-order HDG methods over DG methods is that they have much less globally coupled degrees of freedom when combined with implicit time integration schemes. The coupled unknowns are only the hybrid variables introduced on the skeleton of the mesh, which for high-order is much less compared to the total volume unknowns. Here we will present a multi-grid approach defined entirely on the skeletal system (hence least number of unknowns) and demonstrate its scalability. Multigrid solvers on the skeletal system represent difficulty because of the non-nested nature of the grids i.e. new edges/faces appear on refinement which does not have parents from the previous level. With examples from incompressible and compressible MHD systems we will show the effectiveness and scalability of the multigrid solver.

Benchmarking Fully-Implicit Discretizations of Multi-fluid Plasma-electromagnetic Models for Pulsed Power Applications
SPEAKER: Kris Beckwith

ABSTRACT. Many problems of interest in plasma modeling are subject to the `tyranny of scales', specifically, problems that encompass physical processes that operate on timescales that are separated by many orders of magnitude. Investigating such problems therefore requires the use of implicit time-integration schemes, which advance problem solutions on the timescale of interest, while incorporating the physics of the fast-timescales. One promising route to develop these implicit solvers is the combination of Jacobian-Free Newton-Krylov methods with physics-based preconditioners and recent work has focussed on the application of these techniques to plasma physics. In particular, the dynamical behavior of plasmas is strongly dependent on frequency. At the lowest frequency the plasma is in the regime of magnetohydrodynamics (MHD) and has been the focus of extensive research in fluid plasma modeling in the past few decades. At somewhat higher frequencies, the electrons and ions can move relative to each other, behaving like two charge separated, interpenetrating fluids. This is the regime of high-frequency, non-neutral two-fluid physics and is relevant to high-density, fast MHD phenomena encountered in pulsed-power devices like dense plasma focus, Z-pinches, plasma thrusters and field-reversed configurations. Although initial work has been done on efficiently solving fast MHD phenomena, several open research problems remain. For example, implicit schemes developed for application in slow MHD can not be applied directly as pulsed-power devices commonly exhibit shocks and other sharp features in the flow. To meet this need, we have developed fully implicit schemes for solving the two fluid equations based on a combination of physics-based preconditioning (using a linearization of the two-fluid equations) and Jacobian-Free Newton Krylov solvers. Here, we present application of this approach to a range of problems, including shock physics, ambipolar expansion and shear flow. Results obtained from our approach will be compared to analytic theory and, where appropriate, magnetohydrodynamic and kinetic simulations.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

Monolithic Multigrid for a Mixed-Method Finite-Element Formulation of Incompressible, Resistive Magnetohydrodynamics
SPEAKER: James Adler

ABSTRACT. Magnetohydrodynamics (MHD) models describe a wide range of plasma physics applications, from thermonuclear fusion in tokamak reactors to astrophysical models. These models are characterized by a nonlinear system of partial differential equations in which the flow of the fluid strongly couples to the evolution of electromagnetic fields. As a result, the discrete linearized systems that arise in the numerical solution of these equations are generally difficult to solve, and require effective preconditioners to be developed. This talk investigates monolithic multigrid preconditioners for a one-fluid, viscoresistive MHD model in two dimensions that utilizes a second Lagrange multiplier added to Faraday's law to enforce the divergence-free constraint on the magnetic field. We consider the extension of a well-known relaxation scheme from the fluid dynamics literature, Vanka relaxation, to this formulation. To isolate the relaxation scheme from the rest of the multigrid method, we utilize structured grids, geometric interpolation operators, Galerkin coarse grid operators, and inf-sup stable elements for both constraints in the system. Parallel numerical results are shown for the Hartmann flow problem, a standard test problem in MHD.

Block Preconditioning Informed by Plasma Timescales

ABSTRACT. This work aims to develop a robust preconditioning capability for multifluid continuum plasma simulations. These simulations involve timescales associated with electromagnetic waves, advection of each fluid species, and plasma and cyclotron frequencies for each species. At a given timestep, each timescale can be regarded as fast or slow with respect to a corresponding CFL number. Linear operators associated with fast timescales are stiff and require representation in the preconditioner for fast convergence. We propose a method for adapting the preconditioner such that Schur complement approximations and sub-solver settings are informed by the various CFLs of the problem, where the fidelity of approximations is determined by the relative resolution of different timescales. The preconditioner should then automatically account for the different complications of a plasma simulation run at the long MHD limit and one that resolves the speed of light. In this talk, we will motivate the preconditioner by analyzing the blocked linear system in terms of physical timescales, and demonstrate the effectiveness of the method in different problem regimes.

Progress on Schwarz-type coupling of core- and edge-region magnetic fusion simulations
SPEAKER: Lee Ricketson

ABSTRACT. The edge and core regions of magnetic fusion devices differ drastically on many fronts – geometry and collisionality, to name only two among many. It is thus natural that different numerical methods are optimally suited to the simulation of each region. However, this creates a challenge for the pursuit of whole device modeling (WDM): How can one self-consistently couple two distinct codes to achieve a uniformly accurate description of the entire tokamak? In support of the ECP goal of coupling the codes GENE (core) and XGC (edge) for whole device modeling, we present such a coupling scheme inspired by classical additive Schwarz methods. While traditional Schwarz schemes require iteration to convergence inside a single time step, this is computationally intractable in the context of 5-D gyrokinetic simulations. We give evidence – both analytic and empirical – that if one is interested only in long-time averages (as scientists and engineers typically are in this field) then this expensive iteration can be avoided while retaining the scheme’s convergence properties. We present numerical results from tests on both a 1-D model problem and 2-D simulations of the Hasegawa-Wakatani equations.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

16:30-18:35 Session 4A: Randomized Algorithms. Room: Bighorn B.
A Damped Row Access Method for Solving Ill-Posed Massive Least Squares Problems
SPEAKER: Tanner Slagel

ABSTRACT. Row action methods, such as the block Kaczmarz method, are commended for their ability to solve massive least squares problems where the size of the forward model matrix exceeds the storage capabilities of computer memory. When the least squares problem is ill-posed, row actions methods can become unstable. This paper introduces a randomized damped row access method that is well suited for the ill-posed linear least squares problem. Expected convergence rates of this method are given to with what is known as an "convergence horizon" . Numerical results show that the damped row access method produces better results than its un-damped sibling.

A Sampling-based Method for Solving Linear Systems
SPEAKER: Erin Carrier

ABSTRACT. Solving systems of linear algebraic equations is crucial for many problems in science and engineering. A variety of techniques are available to solve such linear systems, including direct methods such as Gaussian elimination and iterative methods such as GMRES. We present a remarkably simple, but previously unexplored, approach to solving linear systems based on sampling: to solve a linear system Ax = b, sample the column space of A and then project b onto the subspace spanned by the samples.

This method is potentially competitive with current standard methods if relatively few samples are required to produce a sufficiently accurate solution. In effect, the method seeks a compressed representation of the solution to the linear system. For linear systems resulting from discretizing continuous problems, the method allows for the discretization basis and the solution basis to be decoupled, so the method may be advantageous as the most convenient basis for the discretization may not yield a compressed representation of the solution.

We discuss various sampling strategies and the resulting performance of the method on some major classes of test problems.

Iterative projective approach for highly corrupted linear systems

ABSTRACT. We consider solving large-scale systems of linear equations Ax = b that are inconsis- tent. We propose a hybrid approach utilizing ideas from the randomized Kaczmarz method and those put forth by Agmon, Motzkin et al. that exhibits a faster initial convergence rate without sacrificing solution accuracy. We show that significant improvements in the convergence rate are possible when the residual vector has a high dynamic range, as is the case when the system has been corrupted by a small number of large errors in b. With this as our motivating example, we further develop an approach for this setting that allows detection of the corrupted entries and thus convergence to the “true” solution of the original uncorrupted system. We provide analytical justification for our approaches as well as experimental evidence on real and synthetic systems.

Analysis of Randomized Subspace Iteration: New insights into an old algorithm

ABSTRACT. Subspace iteration is a powerful algorithm proposed in the 1970s for computing few singular pairs of large matrices. As a result of recent work in the last two decades, there was renewed interest in a randomized variant of this algorithm because of its success in developing cheap and accurate low-rank approximations, with a very high probability of success, for a class of matrices with rapidly decaying singular values. The randomized methods exploit modern computational architectures in ways that are more powerful than classical methods and have shown great promise in dealing with massive data sets.

We outline a basic version of the randomized subspace iteration. Suppose $A \in \mathbb{C}^{m\times n}$ and suppose we want a rank-k approximation of $A$, where $k \leq \text{rank}(A)$. We draw a random matrix $\Omega \in \mathbb{C}^{n\times (k + p)}$, where $p \geq 0$ is a small oversampling parameter. We perform $q\geq 0$ steps of the subspace iteration $Y = (AA^*)^q A\Omega$, take the thin QR factorization of $Y = QR$. A rank-$(k+p)$ approximation to $A$ is obtained as $\widehat{A} = QQ^*A$, whereas if a rank$-k$ approximation is desired, it is computed as $\widehat{A}_k = Q[[Q^*A]]_k$. Here $[[Q^*A]]_k$ is the best rank-$k$ approximation to $Q^*A$. Several recent papers have developed bounds for the low-rank approximation $\|A- QQ^*A\|$ and $\| A- Q[[Q^*A]]_k\|$ in the spectral and Frobenius norms. Very little work has been devoted to the accuracy of the singular values and singular vectors.

We provide new insights into this algorithm in the following ways: We derive bounds for (1) the canonical angles between the subspaces spanned by the exact and the approximate singular vectors; (2) the error in the low-rank approximation in any unitarily invariant norm, thereby generalizing the results in the spectral and Frobenius norms; and (3) the error in the approximate singular values. The error bounds we propose are novel and explicitly account for the dependence of the starting guess $\Omega$, and the singular value gap. Our analysis shows that the singular values and singular vectors are very accurate for matrices with rapidly decaying singular values and a large singular value gap. The quality of the error bounds will be demonstrated on several test matrices.

Matrix-free construction of HSS representations using adaptive randomized sampling
SPEAKER: X. Sherry Li

ABSTRACT. An active research area in recent years is the development of fast hierarchical matrix tools for linear and eigen solvers. Although the theoretical foundation for the hierarchical matrices has been solidified, there is a lack of robust algorithms and software that can handle many practical issues. For example, randomized sampling has been shown to be an effective tool to reveal the low rank structure, but choosing the right number of samples is difficult. In this talk we present new results for HSS compression using an adaptive randomized sampling strategy. In particular, we developed robust stopping criteria based on a new stochastic relative error estimation. We present results for situations when the dense matrix is given and in the matrix-free setting. We show some practical dificulties about implementation and present some solutions that get around the problems.

16:30-18:35 Session 4B: Inverse Problems & Regularization. Room: Bighorn C/1.
An Uncertainty-Weighted Asynchronous ADMM Method for Large-Scale PDE Parameter Estimation
SPEAKER: unknown

ABSTRACT. We consider a global variable consensus ADMM algorithm for computing the maximum a posteriori (MAP) estimate of large-scale PDE parameter estimation problems. We obtain an efficient optimization scheme by partitioning the data and solving the resulting subproblems in parallel. The parallelization can be implemented asynchronously, and each subproblem can be associated with different forward models and/or right-hand-sides, leading to ample options for tailoring the method to different applications.

A drawback of consensus ADMM is its slow convergence, especially when there are a large number of subproblems. This is particularly challenging in PDE parameter estimation due to the immense costs per iteration. To accelerate the convergence in the first few iterations, we introduce a novel weighting scheme in the algorithm that accounts for the uncertainty associated with the solutions of each subproblem. We exemplarily show that the weighting scheme reduces the time-to-solution of a multiphysics parameter estimation problem.

A Newton-Based Optimization Method for Quantitative Photo-Acoustic Tomography

ABSTRACT. Quantitative photo-acoustic tomography (QPAT) is an imaging modality that is used in breast and brain imaging. The goal is to recover the spatial distribution of the absorption coefficient from photo-acoustic images; an ill-posed, non-linear inverse problem. The absorption field is treated as a sum of a smooth background field (e.g. tissue of interest) and a perturbation (e.g. tumor or different tissue type).  We use Tikhonov-based regularization and a preconditioned Inexact Newton-CG method to find the smooth background and perturbation. The inversion process is demonstrated on different synthetic data sets.  

Subspace recycling for a sequence of linear problems with general-form Tikhonov regularization

ABSTRACT. In many applications involving inverse problems and optimization, a sequence of linear systems with general-form Tikhonov regularization or Levenberg-Marquardt steps must be solved. This leads to an inner-outer iteration combining two iterative methods with new opportunities for recycling subspaces to both compute initial guesses and enhance convergence. We discuss which subspaces to select for recycling and convergence issues. In this talk, we focus on a sequence of linear problems in which an optimal regularization operator is computed and the inverse problem is solved, but we briefly discuss more general problems.

Multilevel Preconditioning for Ridge Regression

ABSTRACT. Solving linear systems is often the computational bottleneck in real-life problems. Iterative solvers are frequently the only option due to the complexity of direct algorithms or because the system matrix is not explicitly known. Here, we develop a multilevel preconditioner for regularized least squares linear systems (X^TX+βI)w=X^Tb with X a feature or data matrix and β the regularization parameter. Variants of this linear system may appear in machine learning applications, such as ridge regression, logistic regression, support vector machines and matrix factorization with side information.

We applied clustering algorithms to create coarser levels that preserve the principal components of the covariance or gram matrix. These coarser level approximate the eigenvectors with large eigenvalues and are used to build a multilevel preconditioner and accelerate the Conjugate Gradient method. Our preconditioner achieves speed-up for artificial and real-life data.

Singular Value Analysis of Joint Inversion
SPEAKER: Jodi Mead

ABSTRACT. The degree of ill-posedness of the linear operator equation Ax = b can be measured by the decay rate of the singular values of A. If the inverse problem is nonlinear and A is the corresponding Jacobian matrix, the singular values measure the local ill-posedness of the problem. We consider the case where the operator A is compact and maps infinite dimensional Hilbert spaces. The degree of ill-posedness can be measured by μ, where the singular values σ_n(A) ≈ n^(−μ). The problem becomes more difficult to solve numerically with increasing μ.

When the singular values quickly decay to zero, it is common to truncate them or introduce a regularization term. However, if the problem is severely ill-posed, a large amount of regularization may be required to solve the problem, and this can introduce significant bias error in the solution estimate. Regularization can be viewed as adding information to the ill-posed problem, and hence we consider the regularized inverse problem as simultaneous joint inversion. Simultaneous joint inversion has recently become a common method to incorporate multiple types of data and physics in a single inversion.

We extend discrete techniques of stacking matrices in joint inversion, to combining Green’s function solutions of multiple differential equations representing different types of data. The singular values of the joint operators indicate the effectiveness of combining multiple types of physics. This knowledge provides mathematical justification for joint inversion, and can be determined before the complicated machinery of discretizing and solving the problem is implemented. We will give an example of two differential equations with known Green’s function solutions. The decay rate of the singular values of the individual operators are compared to the singular values of the joint operator, and the extent to which the ill-posedness was resolved is quantified.

16:30-18:35 Session 4C: Multi-physics EM/Fluid Plasma. Room: Bighorn C/2.
An implicit, conservative, hybrid particle-kinetic-ion fluid-electron algorithm
SPEAKER: Adam Stanier

ABSTRACT. Bridging the inherent multi-scale nature of many important problems in plasma physics requires a combination of model reduction and algorithmic innovation. The hybrid model with full-orbit kinetic ions and fluid electrons is a promising approach to describe a wide range of space and laboratory plasmas. In particular, it has been recently demonstrated that the hybrid model is the minimum sufficient model to correctly capture the rate and global evolution of a reconnecting system for arbitrary guide magnetic field [1,2,3], with important consequences for space weather modeling.

Here we focus on the quasi-neutral hybrid model using macro-particles to model the kinetic ion species, which avoids the need to solve for the distribution function in 6D (3D-3V) phase space, but is subject to discrete particle noise that scales as sqrt(N) for the number of macro-particles N. The majority of existing algorithms employ explicit time-stepping, and much of the development has focused on the accuracy and stability of the time integration schemes. Key unresolved issues with these approaches involve the numerical instability of cold ion beams moving through the spatial grid due to aliasing errors [4], and the quadratic CFL condition (dt ~ dx^2) associated with fast dispersive Whistler waves. Recent work [5,6] has explored the use of fully implicit methods to step over such timescales in a stable manner, but have not addressed the topic of momentum or energy conservation.

Here we present a novel fully implicit hybrid algorithm that features discrete mass, momentum and energy conservation, as well as discrete preservation of the solenoidal condition. The algorithm features sub-cycling and orbit averaging of the ions, with cell-centred finite differences and implicit midpoint time advance. To reduce numerical noise, the algorithm allows arbitrary-order shape functions and conservative smoothing on the gather and scatter operations. We discuss the implementation of the algorithm into a Jacobian-Free Newton Krylov solver framework, and then verify it for a number of test problems. These demonstrate the correctness of our implementation, the unique conservation properties, and its favorable stability properties.

References: [1] A. Stanier, W. Daughton, Andrei N. Simakov, L. Chacon, A. Le, H. Karimabadi, J. Ng, and A. Bhattacharjee, Phys. Plasmas 24, 022124 (2017). [2] J. Ng, Y.-M. Huang, A. Hakim, A. Bhattacharjee, A. Stanier, W. Daughton, L. Wang, K. Germaschewski, Phys. Plasmas 22, 112104 (2015). [3] A. Stanier, W. Daughton, L. Chacon, H. Karimabadi, J. Ng, Y.-M. Huang, A. Hakim, and A. Bhattacharjee, Phys. Rev. Lett. 115, 175004 (2015). [4] P. W. Rambo, Journal of Computational Physics, 118, 152-158 (1995). [5] B. Sturdevant, S. E. Parker, Y. Chen and B. B. Hause, Journal of Computational Physics, 316, 519 (2016). [6] J. Cheng, S. Parker, Y. Chen, D. Uzdensky, Journal of Computational Physics, 245, 364 (2013).

Multiscale Radiation Transport Method for Eulerian Radiation-Hydrodyamics

ABSTRACT. Accurate modelling of radiation-hydrodynamics problems is important for many high-energy density physics (HEDP) applications. One of main challenges is efficiently solving a kinetic system (e.g., thermal radiative transfer, or TRT system). Recently, we have developed a moment-based, scale-bridging algorithm (a.k.a., High-Order/Low-Order, HOLO) for solving various transport (kinetic) systems. A basic idea behind the scale-bridging algorithm is to implicitly treat stiff (nonlinear) physics via a discretely-consistent LO system. Specifically, our HOLO algorithm for TRT aims to accelerate convergence of the absorption-reemission physics.

Although the previous study has shown a good algorithmic acceleration and results for many TRT problems, it has also revealed an inadequate accuracy for some radiation-hydrodynamics problems. This inaccuracy is due to not preserving a diffusion limit with a finite volume (FV)-based discretization of the LO system. To mitigate this deficiency, we have recently developed a fully lumped linear discontinuous Galerkin (LDG) LO solver that satisfies an asymptotic diffusion limit. The LDG LO solver has lead to significant improvements in the solution accuracy compared to the previous FV-based LO solver. In this talk, we present an extension of the HOLO algorithm to radiation-hydrodynamics problems and to the fully lumped LDG LO solver, and demonstrate applicability of the method through several numerical examples.

High-Order Low-Order Nonlinear Convergence Accelerator for the Rosenbluth-Fokker-Planck Collision Operator

ABSTRACT. In weakly coupled plasmas, the integro-differential Fokker-Planck collision operator describes the dynamical collisional relaxation of the plasma distribution function in the velocity space. The operator is quadratically nonlinear, features exact conservation of mass, momentum, and energy, satisfies the Boltzmann H-theorem, preserves positivity of the distribution function, and features a unique equilibrium solution (i.e., a drifting Maxwellian). Additionally, the collisional velocity space transport coefficients are obtained from integrals of the distribution function (the so-called Rosenbluth potentials). The integral formulation, together with the above-mentioned properties make the numerical solution aspects of this system exceedingly challenging to deal with.

Often, collision times, τcol, are much shorter than dynamical time scales of interest, and the use of implicit methods (i.e. Newton, accelerated Picard [1], etc.) is warranted. Fully implicit methods are advocated here, owing for the need to exactly preserve conservation laws, and to ensure asymptotic convergence to the Maxwellian when appropriate [2]. However, fully implicit methods demand nonlinear iterative solvers, which developing an effective one is challenging in this context owing to the non-locality of the formulation (via the Rosenbluth potentials), leading to dense linear systems.

In order to effectively deal with the numerical challenges arising from non-locality, we explore a multiscale iterative strategy based on high-order/low-order (HOLO) convergence accelerator scheme [3,4] for the full nonlinear Rosenbluth-Fokker-Planck collision operator. HOLO schemes employ an LO (fluid) moment system to accelerate the convergence of the HO (kinetic) system. In turn, the LO system is closed self-consistently with the HO system. The LO system is comprised of equations for plasma number density, drift velocity, and temperature. These quantities, in turn, inform an LTE approximation for the Rosenbluth potentials plus a perturbation term [of O(Δt/τcol)<< 1] computed from the HO solution. This reformulation shifts the non-local contributions through the Rosenbluth potentials from the HO system to the LO one, which lives in a low-dimensional space and where they can be dealt with efficiently. Numerical experiments in challenging applications (shocks, ICF implosions) will demonstrate the enabling capabilities of the HOLO scheme when Δt >> τcol.

[1] Anderson, J. Assoc. Comput. Mach., 12, 547-560 (1965) [2] Taitano et al., J. Comp. Phys., 297, 357-380 (2015) [3] Taitano et al., J. Comp. Phys., 284, 737-757 (2015) [4] Chacón et al., J. Comp. Phys., 330, 21-45 (2017)

A framework for microscopic/macroscopic simulations of magnetized plasmas

ABSTRACT. Many problems in plasma physics require the solution of the Vlasov-Maxwell (VM) or Vlasov-Boltzmann equations. These equations are extremely hard to solve numerically because of their high dimensionality, nonlinearities and the huge disparity of spatial and temporal scales that have to be bridged between microscopic and system scales. While several reduced methods have been developed in certain limits, a comprehensive approach capable of obtaining accurate solutions in all parameters regimes remains elusive.

We will present a spectral method for the VM equations based on a decomposition of the plasma phase-space density in Hermite or Legendre modes. Its most important feature is that, with a suitable spectral basis, the low-order moments are akin to the typical moments (mass, momentum, energy) of a fluid/macroscopic description of the plasma, while the kinetic/microscopic physics can be retained by adding more moments.  With the 'built-in' fluid/kinetic coupling, the method might offer an optimal way to perform accurate simulations of macroscopic phenomena including microscopic physics.

The method features favorable numerical properties, such as spectral convergence and exact conservation laws in the limit of finite time step. A comparison between the Particle-In-Cell (PIC) and the spectral method on standard electrostatic test problems shows that the spectral method can be orders of magnitude faster/more accurate than PIC for some problems. Furthermore, we have recently developed a hybrid simulation approach that couples the spectral method with a PIC technique. The goal is to combine the accuracy typical of spectral methods, with the flexibility of PIC in dealing with complex distribution functions that might otherwise require a large number of moments for convergence. The application of the spectral/PIC method to the problem of the interaction of a weak beam with a background plasma will be discussed, showing the potential of the hybrid method in terms of computational efficiency and accuracy.

A low-dispersion, energy-conserving relativistic particle-in-cell algorithm for laser-plasma interaction applications
SPEAKER: Guangye Chen

ABSTRACT. Leap-frog-based explicit algorithms, either “energy-conserving” or “momentum-conserving”, do not conserve energy discretely. Time-centered fully implicit algorithms can conserve discrete energy exactly [1], which is desirable for long-term simulations, but introduce large dispersion errors in the light-wave modes, regardless of timestep sizes. This can lead to intolerable simulation errors quickly where highly accurate light propagation is needed (e.g. laser-plasma interactions, LPI). In this study, we selectively combine the leap-frog and Crank-Nicolson schemes to produce a low-dispersion, discretely energy-and-charge-conserving PIC algorithm, in order to preserve the simulation accuracy at both small and large time-scales. Specifically, we employ the leap-frog method for Maxwell equations, and the Crank-Nicolson method for particle equations. The algorithm demands nonlinear iteration for Ampere’s law, but equations are not stiff (algorithm is still limited by the CFL timestep stability constraint), and convergence is fast with simple Picard iterations. Such an algorithm admits exact global energy conservation, exact local charge conservation, and preserves the dispersion properties of the leap-frog method for the light wave. As a result, energy conservation error does not accumulate over time, and the scheme enables much longer simulations than purely explicit schemes. The algorithm has been implemented in a code named iVPIC, builted on the extremely efficient, LANL-based VPIC code [2]. We will present numerical results that demonstrate the properties of the scheme with sample test problems (e.g. Weibel instability run for a few tens of million of timesteps, and LPI applications).

[1] Lapenta and Stefano. Phys Plasmas 18.7 (2011): 072101 [2]