View: session overviewtalk overview

08:00 | Multilevel Low-Rank Correction Preconditioning Techniques SPEAKER: Yuanzhe Xi ABSTRACT. This presentation will discuss a class of preconditioning methods for solving general sparse linear systems of equations that are based on exploiting low-rank approximations to certain matrices. These methods have a number of appealing features. Because they are essentially approximate inverse techniques, they handle indefiniteness quite well. Furthermore, they are amenable to SIMD comptuations such those inherent to GPUs. The talk will first describe a few techniques geared toward a Domain Decomposition framework for a parallel computing environment. These techniques are all based on exploiting Schur complements and low-rank corrections and are applicable to both symmetric and non-symmetric problems. We will then describe our ongoing work to develop low-rank correction methods for highly indefinite systems such as those that arise from Helmholtz equations. Finally, we will describe our current efforts to build a software package that will implement these multilevel low-rank preconditioners for high-performance computing environments. |

08:25 | Rank-structured incomplete factorization preconditioning with adaptive random sampling SPEAKER: Pieter Ghysels ABSTRACT. We present a parallel and fully algebraic preconditioner based on an approximate sparse factorization using low-rank matrix compression. The sparse factorization uses a multifrontal algorithm with fill-in occurring in dense frontal matrices. These frontal matrices are approximated as hierarchically semi-separable matrices, which are constructed using a randomized sampling technique. The resulting approximate solver has (close to) optimal complexity in terms of flops and memory usage for many discretized partial differential equations. The randomized compression scheme uses an adaptive rank determination technique for which we developed a new relative stopping criterion. We illustrate the robustness and performance of this new preconditioner for a number of structured and unstructured grid problems. We look at the impact of fill-reducing reorderings and supernode reorderings on the overall fill and compression rate. We apply the incomplete factorization as a preconditioner for GMRES or BiCGStab and compare with a number of other common preconditioners. Our implementation uses MPI and OpenMP and supports real and complex arithmetic. We present a detailed performance analysis, and discuss recent performance improvements made in the code. The code is released as the STRUMPACK library with a BSD license. |

08:50 | A recursive skeletonization factorization based on strong admissibility SPEAKER: Victor Minden ABSTRACT. The strong recursive skeletonization factorization (RS-S) is a new approximate factorization for "inverting the fast multipole method" for discretized linear integral equations associated with elliptic partial differential equations in two or three dimensions. Unlike previous skeletonization-based methods for solving such systems, the RS-S factorization uses the structure of the fast multipole method to hierarchically compress matrix blocks corresponding to far-field interactions while treating near-field interactions as full rank. This leads to an approximate factorization of the inverse operator as the product of many block unit-triangular matrices that may be used as a preconditioner or moderate-accuracy direct solver. Under suitable rank assumptions, the RS-S factorization exhibits linear computational complexity, which I will demonstrate with a number of numerical examples. |

09:15 | Preconditioning for time-dependent problems SPEAKER: Andy Wathen ABSTRACT. Monolithic (or all-at-once) discretizations of evolutionary problems most often give rise to nonsymmetric linear(ised) systems of equations which can be of very large dimension for PDE problems. In this talk we will describe preconditioners for such systems with guaranteed fast convergence via use of MINRES (not LSQR or CGNE). These results apply with regular time-stepping schemes and relate to Toeplitz matrix technology and preconditioning via circulants using the FFT. This is joint work with Elle McDonald (CSIRO, Australia) and Jennifer Pestana (Strathclyde University, UK) |

09:40 | Sampling for Hierarchical Approximation of Kernel Matrices SPEAKER: James Levitt ABSTRACT. A key challenge for rank-structured solvers and preconditioners is the construction of low-rank approximations of certain submatrices. The cost of computing or accessing all entries of the matrix grows quadratically with the problem size, which becomes prohibitively expensive for large problems. Therefore it is necessary to use only a small sample of the matrix entries in constructing the low-rank approximation, which in our case is an interpolative decomposition (ID). We compare a number of methods for sampling matrix entries, drawing inspiration from theoretical results in randomized linear algebra. Additionally, we present methods for estimating the error of a sampled ID approximation and for efficiently updating a sampled ID with additional samples. Together, these methods allow for the number of samples to be chosen adaptively depending on the problem. We present results for real-world datasets. |

08:00 | Convergence in Norm of Nonsymmetric Algebraic Multigrid SPEAKER: unknown ABSTRACT. Fast linear solvers and preconditioners are well developed for symmetric positive definite (SPD) matrices. Linear or near-linear complexity ("fast") algorithms have been developed for many systems of interest and, in many cases, theoretical results have been established on the convergence or accuracy of a given method. The nonsymmetric setting poses a number of unique challenges over SPD matrices, both in theory and in practice. Developing fast and robust nonsymmetric linear solvers is an active area of research and, in particular, theoretical results on fast, nonsymmetric solvers are limited. Algebraic multigrid (AMG) is one of the fastest numerical methods to solve large sparse linear systems, often arising in the study of graph Laplacians, Markov chains, and the discretization of partial differential equations (PDEs). For SPD matrices, convergence of AMG is well motivated in the A-norm, and AMG has proven an effective solver for many applications. Recently, several AMG algorithms have been developed that are effective on highly nonsymmetric linear systems. Although motivation was provided in each case, the convergence of AMG for nonsymmetric linear systems is still not well understood, and algorithms are based largely on heuristics and/or incomplete theory. A handful of works have delved into convergence of nonsymmetric AMG, but there has yet to be a thorough study on conditions for convergence and, in particular, the practical implications for solver development. Here, we present the first such work, discussing why SPD theory breaks down in the nonsymmetric setting, and developing a general framework for convergence of nonsymmetric AMG. The classical multigrid weak and strong approximation properties are generalized to a "fractional approximation property," and conditions are developed for two-grid and multigrid convergence in the (A^*A)^{1/2}-norm. Results are based on the construction of appropriate bases for multigrid interpolation and restriction operators under which to consider convergence. |

08:25 | Approximate Ideal Restriction (AIR): an Algebraic Multigrid Method for Nonsymmetric Linear Systems SPEAKER: Thomas Manteuffel ABSTRACT. A new algebraic multigrid (AMG) algorithm has been developed recently based on an “approximate ideal restriction” (AIR), which has demonstrated potential as a fast, parallel solver for scalar hyperbolic PDEs. In this talk, we discuss the convergence properties of AIR, motivating choices in its development, and explaining the strong convergence attained on difficult scalar problems. We then demonstrate AIR as a tool for solving high-dimensional PDEs, or systems of PDEs, with hyperbolic character. First, we look at coupling AIR with the diffusion synthetic acceleration (DSA) approach to solving the radiative transport equation. Because AMG scales well in parallel (discussed in a separate talk), AIR promises better parallel scaling than a traditional transport sweep associated with the DSA algorithm. This is especially promising in the context of curvilinear meshes, which often yield non-triangular matrices that are difficult for traditional sweeping algorithms. Then, we explore the potential of developing a DSA-like algorithm for implicit discretizations of the full Vlasov-Maxwell description of plasma, based on AIR’s capability to solve the Vlasov equation and standard AMG to resolve the Maxwell equations. Due to the high dimensionality of the Vlasov-Maxwell system and a lack of fast linear solvers for implicit discretizations of Vlasov, implicit time stepping for kinetic plasma discretizations is largely unexplored. AIR provides such a solver, opening new doors for implicit discretizations of Vlasov-Maxwell and other hyperbolic PDEs, with tractable computational cost. |

08:50 | Parallel approximate ideal restriction AMG for nonsymmetric linear systems SPEAKER: Ruipeng Li ABSTRACT. Algebraic multigrid (AMG) can provide optimal (linear cost) solution methods for linear systems arising from many applications and can be highly scalable for massively parallel machines. Most AMG methods and theory assume a symmetric positive definite (SPD) operator. Highly nonsymmetric matrices, such as those that arise in the discretization of hyperbolic PDEs, come up often in large-scale scientific simulations, yet remain a challenge for AMG methods. A new reduction-based AMG method is developed for hyperbolic-type PDEs and other highly nonsymmetric systems. The proposed method is based on ``approximate ideal restriction'' (AIR) combined with interpolation operators of low complexity and F-relaxations, which is now available in parallel in the Hypre library. The differing interpretations of AIR give rise to different potential parallel implementations. In this talk, we will discuss such algorithmic differences and present numerical results for solving systems from discretizations of the advection-diffusion-reaction equation, from purely advective to purely diffusive, showing AIR’s robustness and efficiency in highly parallel, distributed memory environments. |

09:15 | Solvers for hypersonic flow applications SPEAKER: Jonathan Hu ABSTRACT. While multigrid methods have been successfully applied to subsonic and transsonic flow problems, multigrid methods for hypersonic flow problems is still an open area of research. In this talk, I discuss ongoing efforts to apply linear multigrid methods to hypersonic flow regimes. I'll provide some background on the motivating problems, approaches that are currently being considered, and as well as numerical examples from Sandia's simulation code SPARC. |

09:40 | High-fidelity simulation of wind-turbine incompressible flows with classical and aggregation AMG preconditioners SPEAKER: Stephen Thomas ABSTRACT. The incompressible Navier-Stokes equations are employed to simulate turbulent flow around wind turbines, with the capability to simulate rotating blades. A mesh re-mapping algorithm has been developed for a hybrid control volume finite element discretization (CVFEM) combined with a discontinuous Galerkin (DG) finite element method (FEM). Time-stepping is based upon an inexact projection scheme and BDF-2 integrator. Simulation results are presented for large meshes, surrounding a Vestas V27 turbine and tower, with up to 1.8 Billion degrees of freedom. High-fidelity simulations on up to 50K NERSC-Cori Haswell cores are presented where the set-up and solver costs are examined with repeated re-initialization of linear systems. The nonsymmetric GMRES Krylov iteration is employed to solve both the momentum and mass-continuity equations. A Gauss-Seidel relaxation scheme is employed as the momentum preconditioner. Our talk focuses on a comparison of the classical and aggregation variants of the algebraic multigrid (AMG) algorithm employed as a preconditioner for the mass-continuity equation GMRES Krylov solver. Perhaps the most important factor in the comparison of the C-AMG and SA-AMG algebraic multigrid variants is the effect of anisotropic meshes with high aspect-ratios on convergence. AMG preconditioners are compared on the basis of complexity, set-up/solve time and overall time to solution. Smoothed-aggregation SA-AMG is provided by the Trilinos-MueLu solver framework. Classical Ruge-St\"uben C-AMG is implemented by the Hypre boomerAMG library from CASC-LLNL, Henson and Yang (2000). The Trilinos-Belos solver framework implements the iterated classical Gram-Schmidt ICGS-GMRES algorithm and requires two passes of Gram-Schmidt to recover the stability of the modified Gram-Schmidt (MGS) algorithm Bjorck (1994). However, the ICGS-GMRES variant requires less communication than the MGS-GMRES algorithm provided by Hypre. |

08:00 | A Lanczos-Stieltjes method for one-dimensional ridge function approximation and integration SPEAKER: Andrew Glaws ABSTRACT. Cheap surrogate models, such as polynomial expansions, can enable otherwise infeasible computational studies that rely on multiple evaluations of large-scale, complex simulations. However, constructing multivariate polynomial expansions requires a number of function evaluations that grows exponentially with the number of input parameters. To combat this curse of dimensionality, we often turn to tools for dimension reduction, such as active subspaces or sufficient dimension reduction, which reparameterize the function in terms of a few linear combinations of the original parameters. When this reparameterization fully describes all of the function's behavior, we refer to it as a ridge function. Constructing a polynomial expansion for ridge functions is difficult since the reparameterization transforms the input space as well as the associated input measure. In this paper, we introduce a Lanczos-Stieltjes approach for building polynomial surrogates on one-dimensional ridge functions. We use the Lanczos method as a discrete approximation to the Stieljes procedure to obtain orthogonal polynomials and Gauss-Christoffel quadrature rules with respect to arbitrary measure. This enables cheap and accurate construction of one-dimensional polynomial expansions of the ridge function as well as integration. We numerically study the behavior of the proposed algorithm on several test problems. |

08:25 | Active learning for cost-aware model reduction SPEAKER: Jed Brown ABSTRACT. Active learning is a technique in which a model is constructed incrementally by designing the next experiment. The cost is typically not uniform when the experiment is running a PDE solver, especially when using adaptive methods. The experiments may also fail when resource constraints are exceeded, incurring a cost without obtaining results. We propose an active learning technique for building reduced models of adaptive PDE simulations using Gaussian Process Regression to incorporate uncertainty in models of cost, resource usage, and quantities of interest. |

08:50 | Data-Driven Polynomial Ridge Approximation Using Variable Projection SPEAKER: Jeffrey Hokanson ABSTRACT. Inexpensive surrogates are useful for reducing the cost of science and engineering studies involving large-scale, complex computational models with many input parameters. A ridge approximation is one class of surrogate that models a quantity of interest as a nonlinear function of a few linear combinations of the input parameters. When used in parameter studies (e.g., optimization or uncertainty quantification), ridge approximations allow the low-dimensional structure to be exploited, reducing the effective dimension. We introduce a new, fast algorithm for constructing a ridge approximation where the nonlinear function is a polynomial. This polynomial ridge approximation is chosen to minimize least squared mismatch between the surrogate and the quantity of interest on a given set of inputs. Naively, this would require optimizing both the polynomial coefficients and the linear combination of weights; the latter of which define a low-dimensional subspace of the input space. However, given a fixed subspace the optimal polynomial can be found by solving a linear least-squares problem, and hence by using variable projection the polynomial can be implicitly found leaving an optimization problem over the subspace alone. We provide an algorithm that finds this polynomial ridge approximation by minimizing over the Grassmann manifold of low-dimensional subspaces using a Gauss-Newton method. We provide details of this optimization algorithm and demonstrate its performance on several numerical examples. Our Gauss-Newton method has superior theoretical guarantees and faster convergence than the alternating approach for polynomial ridge approximation earlier proposed by Constantine, Eftekhari, Hokanson, and Ward [https://doi.org/10.1016/j.cma.2017.07.038] that alternates between (i) optimizing the polynomial coefficients given the subspace and (ii) optimizing the subspace given the coefficients. |

10:25 | Polynomial Preconditioned GMRES for Communication Reduction SPEAKER: Jennifer Loe ABSTRACT. We study polynomial preconditioned GMRES on parallel architectures. Polynomial preconditioned methods use more sparse matrix-vector products per orthogonalization step, potentially reducing the number of operations requiring synchronous communication. The GMRES minimum residual polynomial can make an effective preconditioner for Krylov solvers and can be obtained without bounds on the eigenvalues of the matrix. We implement the GMRES minimum residual polynomial as a preconditioner in the Trilinos package Belos and study its effectiveness for solving nonsymmetric linear systems with GMRES. Numerical experiments demonstrate reduction in solve time and communication costs. We also discuss a strategy for choosing the degree of the polynomial. |

10:50 | Toward a Community Software Development Kit (SDK) Including Iterative Solver Software SPEAKER: Carol Woodward ABSTRACT. As the scientific computing community leverages unprecedented high-performance computing resources to work toward predictive science, software complexity is increasing due to multiphysics and multiscale modeling, the coupling of simulations and data analytics, and the demand for greater reproducibility in the midst of disruptive architectural changes. Applications increasingly require the combined use of independent software packages, which have diverse sponsors, priorities, and processes for development and release. These challenges create the unique opportunity to fundamentally change how scientific software is designed, developed, and sustained---with explicit work toward scientific software ecosystems. Numerical software libraries provide a large and growing resource for high-quality, reusable software components on which applications can be rapidly constructed, with improved robustness, portability, and sustainability. This presentation will introduce the xSDK, or Extreme-scale Scientific Software Development Kit, where community-defined policies are increasing the quality and interoperability across numerical libraries as needed by the DOE Exascale Computing Project. This talk will present the community policies developed through the xSDK project, their expected benefits, and lessons learned from adoption of these policies into existing packages. Lastly, the talk will include progress and plans for increasing interoperability among member projects, including: hypre, MAGMA, MFEM, PETSc, SUNDIALS, SuperLU and Trilinos. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-744374. |

11:15 | Data Centric Systems: The Architecture Ready for Algorithmic and Implementation Exploitation SPEAKER: Kirk Jordan ABSTRACT. As has been stated many times the volume, variety, velocity and veracity data is pushing how we think about computer systems. IBM Research’s Data Centric Solutions organization has been developing systems that handle large data sets shortening time to solution. This group has created a data centric architecture being delivered to the DoE labs starting at the end of 2017 and being completed in 2018. As various features to improve data handling now exist in these systems, we need to begin to rethink the algorithms and their implementations to exploit these features. In this talk I will describe the architecture and point of hardware and software features ready for exploitation. In addition, cognitive computing using machine learning techniques is all the rage. I will show how we are continuing to evolve and apply data centric cognitive computing to address a variety of problems as some case studies. |

11:40 | A two-level checkpointing formulation for heterogeneous memory systems SPEAKER: Debanjan Datta ABSTRACT. Solving discretized adjoint PDEs form the building block of different applications like derivative-free optimization and uncertainty analysis. Often, the adjoint problems not only pose a computational challenge but are also limited by the speed of data movement. This is particularly true for accelerators like GPUs which have limited memory and data have to be loaded/off-loaded frequently. We present a two-level checkpointing formulation to efficiently solve adjoint problems by exploiting copy-compute concurrency to hide all data-transfer latency. We present a simple optimization model to estimate the optimal checkpointing parameters from system parameters like compute and data transfer rates that minimize the time-to-solution. We also demonstrate the benefit of faster interconnects like NVLink and faster on-node memory like NVMe over traditional devices like PCIe and SSDs respectively. Our findings suggest faster interconnects can not only run faster than legacy interconnects when using the same GPU, but they can also run bigger problems with the same amount of memory. We validate our model using an example from the seismic migration, namely Reverse Time Migration. |

12:05 | A Symbiotic Relationship of Direct and Iterative Methods to Target Sparse Solution on Extreme Scale Computers. SPEAKER: Iain Duff ABSTRACT. We discuss research supported by the EU FET-HPC Project NLAFET that is developing algorithms and software for implementing linear algebra programs on extreme scale computers. In our work at RAL, we are concerned by the solution of the linear system $$Ax=b$$ where the matrix $A$ is large and sparse. Direct methods for solving sparse systems of equations can now solve systems with millions of equations and there are several software packages that can very efficiently exploit parallelism. In fact current codes can achieve a performance level of half the performance of dense matrix-matrix multiply (GEMM) on a range of multi-core architectures. That the solution is to high accuracy and that this is the case over matrices from a wide range of applications is testimony to the power of current sparse direct algorithms. However, the Achilles heel of direct methods is that the factors need more storage than the matrix. In fact, in some cases, they need substantially more storage, sometimes a factor in the hundreds. This then limits the size of the system that can be solved. Iterative methods do not have this problem since they normally only require an interface for forming the product of a vector by the matrix and so can even be matrix-free. However, they will only work on the most simple systems unless the system is preconditioned. There is no universal preconditioner and effective ones often encounter storage problems akin to direct methods. Also it is more difficult to obtain high levels of parallelism from basic iterative methods. The solution lies in combining these two approaches in a hybrid approach. There are many ways to do this and in this talk we discuss a block iterative method where the blocks are factorized using a direct method. The approach that we have studied for some time is the block Cimmino algorithm, a block projection method that has the strong merit of being embarrassingly parallel. In this method, we partition the matrix as: \begin{equation} \label{eqn:partitioned} \left ( {\begin{array}{l} A^1 \\ A^2 \\ \cdot \\ \cdot \\ \cdot \\ A^P \end{array} } \right ) x ~=~ \left ( {\begin{array}{l} b^1 \\ b^2 \\ \cdot \\ \cdot \\ \cdot \\ b^P \end{array} } \right ) \end{equation} and the algorithm computes a solution iteratively from an initial estimate $x^{(0)}$ according to: \begin{eqnarray} \label{eqn:bc} u^{i} & = & A^{i^+} \left ( {b^i - A^i x^{(k)}} \right ) ~~~ i = 1, \ldots , P \\ \label{eqn:bc2} x^{(k+1)} & = & x^{(k)} + \omega \sum_{i=1}^P{u^i}, \end{eqnarray} where $A^{i^+}$ is the Moore-Penrose generalized inverse of $A^i$. We note that the set of $P$ equations are totally independent and can be solved in parallel although all systems need to be solved before the update of $x$ in the second equation can be completed. To get a good balance between stability and preservation of sparsity we choose to solve the equations $$A^i u^i ~=~ r^i,$$ by direct methods using an augmented system of the form \begin{equation} \label{eqn:augmented} \left ( \begin{array}{cc} I & A^{i^T} \\ A^i & 0 \end{array} \right ) \left ( \begin{array}{l} u^i \\ v^i \end{array} \right ) ~=~ \left ( \begin{array}{l} 0 \\ r^i \end{array} \right ). \end{equation} These systems are symmetric indefinite so we need a direct code that can accommodate numerical pivoting. Since we also want to perform each solution in parallel, we use the parallel direct solver MUMPS in the first instance. We are developing our software using as a basis the ABCD code written by Mohammed Zenadi \cite{zena:13}. In addition to the NLAFET researchers working on this we are also working closely with people in Toulouse. All are recognized as co-workers in the acknowledgements. We have already tested the ABCD code on multi-core computers. We show, in Figure~\ref{fig:bc_results} a result when solving a system with the coefficient matrix \textsf{cage12} from the SuiteSparse Test set. The runs were performed on a heterogeneous machine at RAL called SCARF. It has a number of Intel nodes (including some E5-2650, E2660, X5675, X5530, and E5530 nodes, see {\tt http://www.scarf.rl.ac.uk/hardware}). The actual configuration is determined by the batch scheduler at runtime. We see the expected good scaling of the iterations (the number of partitions is fixed at 32 for these runs). \begin{figure}[hbt] \centering \includegraphics[width=.8\linewidth]{processes-cage12} \caption{\label{fig:bc_results}The speedup of block Cimmino for \textsf{cage12} on the SCARF machine at RAL.} \end{figure} Since storage requirements for the sparse factors are superlinear in problem size, the use of the direct solver on subproblems greatly reduces the amount of storage required over using the solver on the whole problem. We have cases where this reduction is a factor of over 100. Additionally, the inherent parallel scalability of the block Cimmino algorithm is easily coupled with the parallel direct solver to enable a hierarchical exploitation of parallelism that we will demonstrate on the parallel heterogeneous computer, Kebnekaise, in Sweden that has multiple KNL nodes, inter alia. \section*{Acknowledgement} This work is co-authored with Philippe Gambron and Florent Lopez at RAL ably supported by Daniel Ruiz, Philippe Leleux, and Sukru Torun in the CERFACS-IRIT Common Laboratory in Toulouse. |

10:25 | A concurrent multiscale method using geometric multigrid and the full-approximation scheme with applications to metal plasticity SPEAKER: Joseph Bishop ABSTRACT. The predictive modeling of plasticity, damage, and fracture phenomena in polycrystalline materials is a grand challenge in solid mechanics. Two scales of primary interest are the macroscale, where engineering quantities of interest are extracted, and the mesoscale where grain-scale physics is operative. Macroscale simulations using conventional homogenized plasticity models are limited in their predictive capability, especially when the assumptions of homogenization theory, for example scale separation and statistical homogeneity, are violated. Mesoscale models of grain-scale physics simulate the plastic response of individual crystals directly and are thus potentially more predictive of the overall mechanical behavior. However, the direct numerical simulation of grain-scale physics within macroscale structures is computationally prohibitive. Here, the use of geometric multigrid techniques and the full approximation scheme is explored as a framework for creating a concurrent multiscale method bridging the mesoscale and macroscale for modeling the large-deformation, nonlinear, response of polycrystalline materials. Initial work and applications to materials and structures created using metal-additive manufacturing will be presented. |

10:50 | Convergence Analysis of the EDIIS Algorithm SPEAKER: Tim Kelley ABSTRACT. We will present global and local convergence results for the EDIIS method (Kudin, Scuseria, Cances 2002). This method is a modification of Anderson acceleration which imposes nonnegativity constraints on the coefficients. We show that the method converges for any initial iterate in the domain of contractivity and that, once near the fixed point, the convergence is no worse than Picard. The local convergence theory is, therefore, the same as Anderson acceleration without constraints. |

11:15 | Adaptive Storage Depth Selection for Anderson Acceleration SPEAKER: Alex Toth ABSTRACT. Anderson acceleration is a method to improve upon convergences rates of Picard iteration for solving fixed-point problems. This method utilizes a history up to some user provided number of previous iterates and function evaluations to compute a new iterate as a linear combination of the stored function evaluations, where the coefficients in this linear combination are determined by solving a constrained least-squares problem over fixed-point residuals. Traditionally, the iteration history is built up until the storage depth is saturated, and after that point, when a new iterate is computed information for the oldest iterate is discarded. The performance of this method can depend greatly on the selection of the maximum storage depth parameter, and a good choice for a given problem is often determined by trial and error. In this presentation, we consider methods for adaptively selecting the storage depth to be used each iteration. This includes a variant in which the storage depth is modified to maintain good conditioning of the least-squares problem. We will show that this modification allows us to prove stronger theoretical convergence results than what has been shown previously for the standard algorithm, and we consider numerical examples illustrating this behavior. |

11:40 | Nonlinearly Preconditioned LBFGS and Nesterov Methods as Acceleration Mechanisms for Alternating Least Squares, with Application to Tensor Decomposition SPEAKER: Hans De Sterck ABSTRACT. We derive nonlinear acceleration methods based on the limited memory BFGS (LBFGS) update formula for accelerating iterative optimization methods of alternating least squares (ALS) type applied to canonical polyadic (CP) and Tucker tensor decompositions. Our approach starts from linear preconditioning ideas that use linear transformations encoded by matrix multiplications, and extends these ideas to the case of genuinely nonlinear preconditioning, where the preconditioning operation involves fully nonlinear transformations. As such, the ALS-type iterations are used as fully nonlinear preconditioners for LBFGS, or, equivalently, LBFGS is used as a nonlinear accelerator for ALS. Numerical results show that, using a modified backtracking linesearch for LBFGS, the resulting methods perform much better than either stand-alone LBFGS or stand-alone ALS, offering substantial improvements in terms of time-to-solution and robustness over state-of-the-art methods for large and noisy tensor problems. The LBFGS acceleration is shown numerically to be more efficient than previously described acceleration methods based on nonlinear conjugate gradients (NCG) and nonlinear GMRES (a.k.a. Anderson acceleration). Our approach provides a general LBFGS-based acceleration mechanism for nonlinear optimization. We also present a version of Nesterov acceleration for general nonconvex problems, using ALS as a nonlinear preconditioner for Nesterov in the context of our applications. Our numerical tests indicate that LBFGS acceleration is more efficient than Nesterov acceleration. |

10:25 | OPTIMAL ITERATIVE SOLVERS FOR LINEAR SYSTEMS WITH (STOCHASTIC) PDE ORIGINS: BALANCED BLACK-BOX STOPPING TESTS SPEAKER: unknown ABSTRACT. This work discusses the design of efficient solution algorithms for symmetric positive-definite, symmetric indefinite, and nonsymmetric linear systems associated with FEM approximation of (stochastic) PDEs. The novel feature of the designed iterative solvers is the incorporation of error control in the `natural norm' in combination with an effective a posteriori estimator for the PDE approximation error. This leads to robust and optimal black-box stopping criteria: the iteration is terminated as soon as the algebraic error is insignificant compared to the approximation error. |

10:50 | Multilevel Monte Carlo Methods with Sample Reuse and Automatic Coarse Grid Generation using Algebraic Multigrid SPEAKER: unknown ABSTRACT. Multilevel Monte Carlo methods are efficient variance reduction techniques that use a coarse to fine hierarchy of approximations to reduce the computational complexity in uncertainty quantification applications. We investigate how the coarse grids can be generated automatically in the partial differential equation setting by using the Algebraic Multigrid method. Furthermore, when using Full Multigrid to solve the underlying deterministic equation, we show how coarse solutions obtained by the solver can be reused as samples in the Multilevel Monte Carlo method. We present numerical results that show significant speedup, even in cases where the underlying coarse grids are hard to obtain. |

11:15 | Multilevel sampling methods for the quantification of uncertainties within the Bayesian approach to inverse problems SPEAKER: Sarah Osborn ABSTRACT. The input parameters in mathematical models for many physical processes are often subject to uncertainty, as the input parameters are typically impossible to determine fully or accurately. If some observational data is available, it is possible to reduce the overall uncertainty and to get a better representation of the input parameters. The Bayesian approach provides a framework for combining prior knowledge of the input parameters with observational data, resulting in the posterior distribution. Then the goal is to compute statistics of quantities of interest under the posterior distribution. We consider standard Monte Carlo methods, where the desired posterior expectation is written as the ratio of two prior expectations and is estimated directly using independent samples, and Markov chain Monte Carlo methods using correlated and approximate samples to directly sample the posterior distribution. In addition, we use a novel, scalable sampling technique to generate the input realizations from a spatially correlated random field and consider multilevel hierarchies using algebraically coarsened spatial discretizations. Numerical results are presented that compare the computational costs of different multilevel sampling techniques to compute posterior expectations of particular quantities of interest for subsurface flow applications and demonstrate the scalability of the methods for large-scale simulations. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. |

11:40 | Multifidelity Optimization Under Uncertainty SPEAKER: Anirban Chaudhuri ABSTRACT. This work presents a multifidelity method for optimization under uncertainty. Accounting for uncertainties during optimization ensures a robust design that is more likely to meet performance requirements. Designing robust systems can be computationally prohibitive due to the numerous evaluations of expensive high-fidelity numerical models required to estimate system level statistics at each optimization iteration. In this work, we focus on the robust optimization problem formulated as a linear combination of the mean and the standard deviation of the quantity of interest. We propose a multifidelity Monte Carlo approach to estimate the mean and the variance of the system outputs using the same set of samples. The method uses control variates to exploit multiple fidelities and optimally allocates resources to different fidelities to minimize the variance in the estimators for a given budget. The multifidelity method maintains the same level of accuracy as a regular Monte Carlo estimate using only high-fidelity solves. However, the use of cheaper low-fidelity models speeds-up the estimation process and leads to significant computational savings for the multifidelity robust optimization method as compared to a regular Monte-Carlo-sampling-based approach. |

16:30 | Three dimensional shape reconstruction from liquid displacement using parametric level set methods SPEAKER: Eran Treister ABSTRACT. We consider the reconstruction of a three dimensional object from measurements of liquid volume displacement, exploiting Archimedes law. These measurements are obtained by draining fluid from a container while the object is placed inside in various angles. The reconstruction is obtained by fitting a simulation to the measurements, which leads to an optimization problem. To reduce the number of the parameters of the problem, and to impose proper regularization, we define the object using a parametric level set approach. We show that using this approach, the reconstruction is obtained more accurately and efficiently than with other common approaches. |

16:55 | IR Tools MATLAB Package for Large-Scale Inverse Problems SPEAKER: James Nagy ABSTRACT. In this talk we describe and demonstrate capabilities of a new MATLAB software package that consists of state-of-the-art iterative methods to solve large scale discretizations of inverse problems. The package allows users to easily experiment with different iterative methods and regularization strategies with very little programming effort. The package includes several test problems and examples to illustrate how the iterative methods can be used on a variety of large-scale inverse problems. |

17:20 | Bathymetry estimation from video imagery via the linear dispersion relations and the assimilation of wavenumber observations. SPEAKER: Daniel Reich ABSTRACT. Our goal is to produce accurate bathymetry estimates with valid uncertainties from video imagery of the nearshore zone. We maintain Kalman filters for wavenumber at selected frequencies, and compute the posterior distribution of depth using the linear dispersion relations and Bayes’ formula. We produce bathymetry estimates with uncertainties over one year at the FRF in Duck, NC and compare the results to monthly ground truth surveys. The results show improved uncertainties over the entire nearshore when compared to cBathy. |

17:45 | Iterated Conditional Mode algorithm for a Variable splitting and sparsity enforcing priors for linear inverse problems SPEAKER: Ali Mohammad Djafari ABSTRACT. Optimization of a regularization criterion or equivalently Joint Maximum A Posteriori (JMAP) or Variational Bayesian Approximation (VBA) methods have become now common tools in many applied linear inverse problems. In these methods, often simple Gaussian or Poisson models for the forward model errors have been considered. In this work, first we propose a forward model which splits the errors and uncertainties into modeling and measurement error. A Student-t model is used for the modeling error and non-stationary Gaussian model for the measurement errors. In a second step, a complete Bayesian inference framework is proposed to obtain the expression of the posterior law of all the unknowns (parameters of interests and the hyper-parameters). One advantage of the Bayesian approach is the possibility to estimate, jointly with the reconstruction, the hyper-parameters such as the regularization parameter, thus the capability of proposing unsupervised or at least semi-supervised methods. Finally an Iterated Conditional Mode (ICM) algorithm is implemented to try to obtain the JMAP solution for large scale inverse problems. Examples of implementation of the proposed methods in biological signal processing, in mass spectrometry and in computed tomography are mentioned and referenced. |

18:10 | A non-linear stochastic inverse problem for faults in geophysics SPEAKER: Darko Volkov ABSTRACT. We introduce in this study an algorithm for the imaging of faults and of slip fields on those faults. The physics of this problem are modeled using the equations of linear elasticity. We define a regularized functional to be minimized for building the image. We prove that the minimum of that functional converges to the unique solution of the related fault inverse problem. Due to inherent uncertainties in measurements, rather than seeking a deterministic solution to the fault inverse problem, we consider a Bayesian approach. In this approach the geometry of the fault is assumed to be planar, it can thus be modeled by a three dimensional random variable whose probability density has to be determined knowing surface measurements. The standard deviation of the slip (or any moment of the slip) is computed in a separate step where the geometry of the fault is fixed (for example, the most likely geometry or the mean geometry). The advantage of such an approach is that we obtain a way of quantifying uncertainties as part of our final answer. On the downside, this Bayesian approach leads to a very large computation since the slip is unknown. To contend with the size of this computation we developed an algorithm for the numerical solution to the stochastic minimization problem which can be easily implemented on a parallel multi-core platform and we discuss techniques aimed at saving on computational time. After showing how this algorithm performs on simulated data, we apply it to measured data. The data was recorded during a slow slip event in Guerrero, Mexico. |

16:30 | A Weak Least-Squares Finite Element Method for Hyperbolic Balance Laws SPEAKER: unknown ABSTRACT. In this paper, a least-squares method for scalar nonlinear hyperbolic balance laws is proposed. The focus is on a formulation that utilizes an appropriate Helmholtz decomposition and is closely related to the notion of a weak solution. As a consequence, an important numerical conservation theorem, similar to the famous Lax-Wendroff theorem, is obtained, which guarantees that, when certain convergence holds, the resulting approximations approach a weak solution to the hyperbolic problem. The convergence properties of the method are also discussed. Numerical results for the inviscid Burgers equation with discontinuous sources are shown. The numerical method utilizes a least-squares functional and a Gauss-Newton iterative technique. |

16:55 | Time-Parallel Simultaneous Optimization with Unsteady PDEs SPEAKER: Stefanie Günther ABSTRACT. A non-intrusive framework for integrating existing time-marching schemes solving unsteady partial differential equations (PDEs) into a time-parallel simultaneous optimization algorithm will be presented. To that end, an iterative multigrid reduction technique is applied to the time domain of existing PDE solvers utilizing the non-intrusive software library XBraid. Its general user-interface is extended with the capability of computing adjoint sensitivities such that gradients of output quantities with respect to design changes can be computed iteratively alongside with the primal PDE solution. The primal and adjoint multigrid iterations are embedded into a simultaneous optimization framework, namely the One-shot method. In that method, design updates towards optimality are employed after each primal and adjoint update based on inexact gradient information such that optimality and feasibility of the design and the PDE solution are reached simultaneously. Assigning the time domain to multiple processors allows for a parallel-in-time One-shot optimization where speedup over time-serial optimization methods can be achieved from greater concurrency. The time-parallel One-shot algorithm is validated on an advection-dominated flow control problem which shows significant speedup over a conventional time-serial optimization algorithm. |

17:20 | Convergence Analysis of the Full Approximation Scheme for Nonlinear Problems SPEAKER: Xiaozhe Hu ABSTRACT. The Full Approximation Scheme (FAS) is a variant of multigrid methods, which has mainly been used for solving nonlinear problems but also has many other applications. Although the FAS achieves good performance in practice, its convergence theory is not well understood. In this talk, we view the FAS as an inexact successive subspace optimization method and show that the FAS converges uniformly under the standard assumptions on the objective function and space decomposition. This is a joint work with Long Chen and Steven Wise. |

17:45 | Subdivision-Based Nonlinear Multiscale Cloth Simulation SPEAKER: Alena Kopanicakova ABSTRACT. Cloth simulation is an important topic in both computer graphics and in the textile industry. It is essential for creating compelling animated characters, and it is also increasingly important for virtual try-on applications in e-commerce. Convincing visual results often require high level of details to capture wrinkles and folding patterns, but these results can be computationally very expensive to achieve. It is therefore desirable to improve the performance and efficiency of cloth simulation systems. The work presented here focuses on cloth modeled as Kirchhoff-Love thin shells which leads to strongly nonlinear partial differential equations. A standard approach for dealing with the nonlinearities is Newton's method. This method is locally quadratically convergent, but its convergence is erratic when a poor initial guess is supplied. Furthermore, each Newton step requires solving a large linear system. Given the fourth-order nature of the Kirchhoff-Love equations, the condition number of this linear system scales as O(h^{-4}), where h represents the mesh size. Consequently, solving large systems that allow for an adequate amount of details can become prohibitive. Globalization strategies, such as line-search and trust region methods, provide a remedy for convergence control, but they do not address the ill-conditioning of the linear systems. In this work, we employ the recursive multilevel trust region (RMTR) strategy. The method internally employs models with multiple levels of the detail and provides global convergence guarantees. The application of the globalization strategy inside of the multilevel framework is essential, since the solution obtained on a coarse level may not be within the basin of attraction for Newton's method on the next finer level. Our simulation framework employs an "isogeometric analysis" approach based on Catmull-Clark subdivision surfaces. This allows us to model cloth objects with arbitrary topology and to generate the multilevel hierarchy in very natural way. The exchange of the information between neighboring levels is ensured by several transfer operators. The prolongation and restriction operators, well-known from multigrid literature, are assembled based on subdivision rules. In addition, we leverage a reverse subdivision operator, which is used to transfer the iterates from the fine level to the coarse. The novel use of this operator provides a computationally efficient alternative to the L_2-projection operator used elsewhere. We demonstrate the overall performance of the RMTR method on several numerical examples. The presented results illustrate the robustness of the employed multilevel solution strategy with respect to the large time-steps. We will also analyze the convergence behavior of the method both as a nonlinear solver and as a preconditioner. At the end, a comparison with commonly used nonlinear solvers will be made and we show a reduction in the number of iterations by several orders of magnitude. |

18:10 | Scalable solution methodologies for computationally challenging flow simulations from the high-order CFD workshop SPEAKER: Behzad R Ahrabi ABSTRACT. In recent years, high-order finite-element discretizations such as continuous Petrov-Galerkin (PG) and discontinuous Galerkin (DG) schemes have received a significant attention in the computational fluid dynamics (CFD) community. The International Workshop on High Order CFD Methods, held by the American Institute of Aeronautics and Astronautics (AIAA), has aimed at providing an impartial forum for evaluating the status of high-order methods in solving a wide range of flow problems, including two-dimensional simple-geometry test cases for assessment of the order of accuracy, as well as computationally challenging three-dimensional cases with transonic turbulent flows for assessment of the nonlinear and linear convergence behavior of these methods. The latter has been motivated by the fact that the finite-element discretized equations that must be solved for steady-state turbulent flows have proven to be very stiff and difficult to converge efficiently and robustly. In this study, two high-order finite-element flow solvers, with SUPG and DG discretizations, have been employed to run turbulent flow simulations on the NASA common research model (CRM) wing-body configurations. To perform these simulations, two nonlinear solution strategies, including an inexact Newton method based on pseudo-transient continuation (PTC), and a nonlinear p-multigrid scheme based on full approximation scheme (FAS) have been combined in various ways to devise robust, scalable, and efficient solution techniques for reaching steady-state solutions. The linear systems have been solved using a preconditioned FGMRES algorithm. For preconditioning, the ILU(k) method (with- and without overlapping) and a new high-order implicit line smoothing method have been used and compared. In the line smoothing method, the lines are generated using a matrix-based approach based, a dual-CFL strategy is used for extra stabilization, and the connectivity pattern of unknowns on the lines are extended for high-order continuous discretizations. The advantages of the line smoothing method in terms of robustness, memory consumption, and computational efficiency is demonstrated on a transonic turbulent flow (at cruise condition) and a subsonic turbulent flow (at high-lift condition). |

16:30 | Local Fourier analysis of BDDC-like algorithms SPEAKER: unknown ABSTRACT. Local Fourier analysis (LFA) is a commonly used tool for the analysis of multigrid and other multilevel algorithms, providing both insight into observed convergence rates and predictive analysis of the performance of many algorithms. In this paper, we adapt LFA to variants of two- and three-level FETI-DP and BDDC algorithms, to better understand the eigenvalue distributions and condition number bounds on these preconditioned operators. The LFA is validated by considering the 2D Laplacian and predicting the condition numbers of the preconditioned operators with different sizes of subdomains. Several variants are analyzed, which can yield good improvements in performance for some, but not all, two-level methods. |

16:55 | A multigrid algorithm with subspace correction smoother for grids with arbitrary hanging node configurations SPEAKER: Sara Calandrini ABSTRACT. In this talk, I will present a multigrid V-cycle algorithm for locally refined finite element meshes with arbitrary-level hanging nodes. Multigrid algorithms differ in the type of smoothing procedure employed but also in the way such a smoothing procedure is carried out on the domain. The smoothing process we adopt is of successive subspace correction (SSC) type. The subspaces involved in the subspace decomposition of the multigrid space are chosen according to a multilevel strategy. This approach does not require changes in the basis functions and naturally solves the problem of hanging nodes. Unlike existing algorithms that perform smoothing only on a subspace of the multigrid space, we adopt a global smoothing strategy at each multigrid level. This guarantees that an arbitrary improvement of the convergence bound can be obtained when increasing the number of smoothing iterations. I will present numerical tests to investigate the convergence properties of the proposed method. The results highlight how the proposed algorithm has better convergence properties than local smoothing strategies that have a comparable computational complexity. |

17:20 | Construction of continuous finite element spaces with arbitrary-level hanging nodes and applications to multigrid algorithms SPEAKER: Giacomo Capodaglio ABSTRACT. I will present an algorithm for the construction of continuous basis functions to be employed in h-refined finite element applications with local refinement and arbitrary-level hanging nodes configurations. In such a framework, the 1-irregularity condition on the mesh is not enforced, meaning that the maximum difference between the refinement level of two adjacent elements can be greater than one. The algorithm is simple to implement and does not require the solution of a linear system to obtain the constraint coefficients necessary to enforce continuity of the basis functions. I will illustrate analytic results to prove the linear independence and continuity of the proposed functions. Finite element spaces are then defined as the spanning sets of such basis functions. These finite element spaces are convenient to use from an implementational point of view, and suitable for multigrid applications. I will conclude with numerical results to study the convergence properties of a multigrid algorithm built on the finite element spaces here defined. |

17:45 | Algebraic Multigrid Methods, Data Structures and Performance SPEAKER: Ulrike Yang ABSTRACT. The hypre software library provides high performance preconditioners and solvers for the solution of large sparse linear systems on massively parallel computers. One of its attractive features is the provision of conceptual interfaces, which include a structured, a semi-structured, and an unstructured interface. Each of these interfaces uses very different matrix data structures, which affect the performance of the provided multigrid methods. Structured multigrid solvers can take advantage of additional information in the structured matrix data structures, but are confined to structured problems, whereas unstructured multigrid solvers can be applied to more general problems. In addition, work is in progress to design a semi-structured multigrid method that can take advantage of the structured parts of a problem, but is capable to solve more general problems. This presentation discusses these methods, their implementation and their performance. |