CM2020: 16TH COPPER MOUNTAIN CONFERENCE ON ITERATIVE METHODS
PROGRAM FOR MONDAY, MARCH 23RD


08:00-10:05 Session 5A: Machine Learning and Iterative Methods. Bighorn B
08:00
Future of Computing – How May Iterative Methods Exploit Future Technology Directions

ABSTRACT. We are at an inflection point in the way we compute today and the way we will compute tomorrow. While computer systems will consist of heterogeneous components, adding to complexity, we also need to make high-end systems much more accessible. This machine complexity is compounded by the fact that ever more compute is required to tackle the increasing complexity of the problems and data driving scientific investigations that use iterative methods. How will we address this growing complexity? IBM Research is looking at the Future of Computing through a new lens consisting of bits, neurons, and qubits. In this talk I will describe how IBM Research views the future of computing, and I will outline how we are increasing the use of Artificial Intelligence (AI) and Machine Learning (ML) to address some of this complexity while enabling us to tackle the interesting science questions, requiring new algorithms, that we and our collaborators are currently investigating, with some discussion of how iterative methods and associated algorithms may come into play in this future computational ecosystem.

08:25
Asynchronous SGD for Training DNNs

ABSTRACT. In the past decade, Deep Neural Networks (DNNs) have gained a lot of interest due to their successes in solving machine learning tasks such as image recognition and machine translation. For a given machine learning application, training a DNN involves optimizing a highly nonlinear and nonconvex problem which is not possible to solve optimally in practice. For this reason, variants of the Stochastic Gradient Descent (SGD) method have been used to find a good approximate solution, making it possible for the model to achieve good accuracy in a great variety of applications. However, due to the large number of parameters to be optimized and the large sizes of the datasets, training DNNs remains computationally costly. One solution for reducing the training time consists of exploiting parallelism in the SGD algorithm.

In this presentation we discuss several strategies for parallelizing the SGD algorithm and focus more particularly on the asynchronous variants. In the context of the MagmaDNN framework, we implemented an asynchronous SGD algorithm capable of running on both multicore CPUs and GPUs, and using standard DNN benchmarks such as MNIST and CIFAR, we show the benefits of our asynchronous algorithm in terms of performance compared to the traditional synchronous approach.
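As background for the parallelization strategies discussed here, the sketch below shows a generic Hogwild-style asynchronous SGD loop on a toy least-squares model. It illustrates the lock-free update semantics only; it is not the MagmaDNN implementation, and in CPython the GIL serializes the Python-level updates, so no true parallel speedup should be expected from this toy:

```python
# Generic Hogwild-style asynchronous SGD on a toy least-squares model.
# Workers update the shared parameter vector w without locks, so each
# gradient may be computed from a slightly stale copy of w.
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 20))
y = X @ rng.standard_normal(20)

w = np.zeros(20)              # shared parameters, updated lock-free
lr, n_steps = 1e-2, 2000

def worker(seed):
    local_rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        i = local_rng.integers(len(y))       # draw one training sample
        grad = (X[i] @ w - y[i]) * X[i]      # stochastic gradient (possibly stale w)
        w[:] = w - lr * grad                 # in-place update, no synchronization

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final loss:", np.mean((X @ w - y) ** 2))
```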

08:50
Towards Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

ABSTRACT. The talk develops a novel initialization and a hybrid least squares/gradient descent approach for training deep neural networks. These techniques are motivated by taking an adaptive basis viewpoint of deep neural networks (see for instance [1, 2, 3, 4]). Taking this perspective naturally leads to the least squares/gradient descent algorithm. We show that the developed initialization procedure is a natural outgrowth of maintaining high-quality basis functions. We provide analysis of these techniques and illustrate via numerical examples increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.

[1] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[2] Juncai He, Lin Li, Jinchao Xu, and Chunyue Zheng. ReLU deep neural networks and linear finite elements. arXiv preprint arXiv:1807.03973, 2018.
[3] Daria Fokina and Ivan Oseledets. Growing axons: greedy learning of neural networks with application to function approximation. arXiv preprint arXiv:1910.12686, 2019.
[4] Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, and Qiang Qiu. Stochastic conditional generative networks with basis decomposition. arXiv preprint arXiv:1909.11286, 2019.

09:15
Multilevel training of deep residual networks

ABSTRACT. Deep residual networks (ResNets) are widely used for solving computer vision tasks, such as image classification. The popularity of ResNets is due to their impressive performance, even for networks with thousands of layers. The ResNet architecture can mathematically be reformulated as a forward Euler discretization of a nonlinear initial value problem (IVP). The training of ResNets then consists of iteratively solving an optimal control problem with a given dynamical system. The predominant strategy for training any deep network is the stochastic gradient descent (SGD) method. Although the SGD method is simple to implement, the convergence rate of the method deteriorates as the problem size increases.

In this talk, we propose to train ResNets by using a nonlinear multilevel minimization method, namely the MG-Opt. Our variant of the MG-Opt method uses two different strategies to build a multilevel hierarchy. In our first approach, we take advantage of the time-dependent nature of the initial value problem and create a coarse level model by discretizing the original IVP with larger time-steps. In our second approach, we build a multilevel hierarchy by coarsening the sample space. This results in a multilevel-solution strategy, which imitates stochastic variance reduction methods. In this talk, we will analyze the convergence behavior of our proposed multilevel minimization method and show its robustness. A significant speedup compared to the standard SGD method will be demonstrated by means of several numerical examples.

09:40
Layer-Parallel Training and Multilevel Initialization for Deep Residual Neural Networks

ABSTRACT. Residual neural networks (ResNets) are a type of deep neural network and they exhibit excellent performance for many learning tasks, e.g., image classification and recognition. A ResNet architecture can be interpreted as a discretization of a time-dependent ordinary differential equation (ODE), with the overall training process being an ODE-constrained optimal control problem. The time-dependent control variables are the network weights and each network layer is associated with a time-step. However, ResNet training often suffers from prohibitively long run-times because of the many sequential sweeps forwards and backwards across layers (i.e., time-steps) to carry out the optimization. This work first investigates one possible remedy (parallel-in-time methods) for the long run-times by demonstrating the multigrid-reduction-in-time method for the efficient and effective training of deep ResNets. The proposed layer-parallel algorithm replaces the classical (sequential) forward and backward propagation through the network layers by a parallel nonlinear multigrid iteration applied to the layer domain. However, the question remains how one initializes networks with hundreds or thousands of layers, which leads to the second part. Here, a multilevel initialization strategy is developed for deep networks, where we apply a refinement strategy across the time domain that is equivalent to refining in the layer dimension. The resulting refinements create deep networks, with good initializations for the network parameters coming from the coarser trained networks. We investigate this multilevel "nested iteration" initialization strategy for faster training times and for regularization benefits, e.g., reduced sensitivity to hyperparameters and randomness in initial network parameters.

08:00-10:05 Session 5B: Helmholtz Solvers. Bighorn C/1
08:00
WaveHoltz: an Iterative Solver for the Helmholtz Equation via the Wave Equation. Part 1: Theoretical Aspects.

ABSTRACT. A new idea for iterative solution of the Helmholtz equation is presented. We show that the iteration which we denote WaveHoltz and which filters the solution to the wave equation with harmonic data evolved over one period, corresponds to a coercive operator or a positive definite matrix in the discretized case. This is the first talk of two and will focus on convergence theory and error estimates.
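Schematically, the iteration can be summarized as follows (a sketch in the notation used in the WaveHoltz literature; the precise formulation and assumptions are given in the talk):

```latex
\text{Given } v^n, \text{ solve }
  w_{tt} = \Delta w - f(x)\cos(\omega t), \qquad
  w(0,x) = v^n(x), \;\; w_t(0,x) = 0,
\text{ over one period } T = \frac{2\pi}{\omega}, \text{ then filter:}
\qquad
v^{n+1} = \Pi v^n := \frac{2}{T}\int_0^T
  \Big(\cos(\omega t) - \frac{1}{4}\Big)\, w(t,x)\, dt .
```

The fixed point of the filter satisfies the Helmholtz equation \(\Delta u + \omega^2 u = f\), and the talk shows that \(\Pi\) corresponds to a coercive operator (a positive definite matrix after discretization).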

08:25
WaveHoltz: an Iterative Solver for the Helmholtz Equation via the Wave Equation. Part 2: Experiments, Implementation and Extensions.

ABSTRACT. A new idea for iterative solution of the Helmholtz equation is presented. We show that the iteration which we denote WaveHoltz and which filters the solution to the wave equation with harmonic data evolved over one period, corresponds to a coercive operator or a positive definite matrix in the discretized case. This is the second talk of two and will focus on implementational aspects and extensions of the method. Numerical results will be presented.

08:50
Scalable convergence using two-level deflation preconditioning for the Helmholtz equation

ABSTRACT. Recent research efforts aimed at iteratively solving the Helmholtz equation have focused on incorporating deflation techniques to accelerate the convergence of Krylov subspace methods. The motivation for these efforts is that the widely used and well-acknowledged Complex Shifted Laplacian Preconditioner (CSLP) shifts the eigenvalues of the preconditioned system towards the origin as the wave number increases. The two-level deflation preconditioner combined with CSLP (DEF) showed encouraging results in moderating the rate at which the eigenvalues approach the origin. However, for large wave numbers the initial problem resurfaces and the near-zero eigenvalues reappear. Our findings reveal that these near-zero eigenvalues reappear if the near-singular eigenmodes of the fine-grid operator and the coarse-grid operator are not properly aligned. This misalignment is caused by approximation errors accumulating during the inter-grid transfer operations. We propose the use of higher-order approximation schemes to construct the deflation vectors. The results from Rigorous Fourier Analysis (RFA) and numerical experiments confirm that our newly proposed scheme outperforms other deflation-based preconditioners for the Helmholtz problem. In particular, the spectrum of the adjusted preconditioned operator stays fixed near one. These results can be generalized to general shifted indefinite systems with random right-hand sides. For the first time, the convergence properties for very large wave numbers (k = 10^6 in one dimension and k = 10^3 in two dimensions) have been studied, and the convergence is close to wave-number independence. Wave-number independence in three dimensions has been obtained for wave numbers up to k = 75. The new scheme additionally shows very promising results for the more challenging Marmousi problem. Despite a strongly varying wave number, we obtain a constant number of iterations and a reduction in computational time, as the results remain robust without the use of the CSLP preconditioner.
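For reference, the standard two-level deflation operator underlying DEF takes the form below (a generic sketch, with Z the matrix whose columns are the deflation vectors, here constructed with higher-order inter-grid transfer):

```latex
P = I - A Z E^{-1} Z^{T}, \qquad E = Z^{T} A Z,
```

so that the Krylov method is applied to the deflated (and, when used, CSLP-preconditioned) operator, whose near-zero eigenvalues have been projected out.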

09:15
A Fast Solver for the Fractional Helmholtz Equation

ABSTRACT. In this talk, we discuss a Helmholtz problem with a spectral fractional Laplacian, instead of the standard Laplacian. Recently, it has been established that such a fractional Helmholtz problem better captures the underlying behavior in Geophysical Electromagnetics. We establish the well-posedness and regularity of this problem and introduce a hybrid spectral-finite element approach to discretize it and show well-posedness of the discrete system. In addition, we derive a priori discretization error estimates. Finally, we introduce an efficient solver that scales as well as the best possible solver for the classical integer-order Helmholtz equation. We conclude with several illustrative examples that confirm our theoretical findings.

09:40
A GenEO coarse space for solving the heterogeneous Helmholtz equation with domain decomposition methods

ABSTRACT. The development of effective solvers for high frequency wave propagation problems, such as those described by the Helmholtz equation, presents significant challenges. One promising class of solvers for such problems are parallel domain decomposition methods; however, an appropriate coarse space is typically required in order to obtain robust behaviour (scalable with respect to the number of domains, weakly dependent on the wave number but also on the heterogeneity of the physical parameters). In this talk we introduce a coarse space based on generalised eigenproblems in the overlap (GenEO) for the Helmholtz equation. Numerical results within FreeFEM demonstrate convergence that is effectively independent of the wave number and of the contrast in the heterogeneous coefficient, as well as good performance for minimal overlap.

08:00-10:05 Session 5C: Algebraic Multigrid. Bighorn C/2
08:00
Finding AMG aggregates in a haystack: from maximum product matching to scalable solvers

ABSTRACT. The race to achieve exascale computing has had a large impact on research on iterative methods and preconditioners, pushing the available methods and software towards extreme scalability. Effective methods for massively parallel computers must address two generally contrasting issues: algorithmic scalability, i.e., the ability to converge in the same number of iterations irrespective of the problem size and the number of computing cores employed; and implementation scalability, i.e., the ability to execute computations at the same rate on each computing element as the size of the problem and the number of computing elements grow. In this talk, we will present recent developments in a parallel software framework implementing aggregation-based AMG preconditioners; in particular, we discuss new methods and algorithms for the multigrid hierarchy setup and application that balance the computational/communication complexity and convergence properties of AMG at extreme scale. We will also discuss investigations into the theoretical assessment of the quality of coarse index spaces obtained by aggregation of dofs based on maximum product matching of the underlying graph. The presentation includes some experimental results obtained on applications in the context of the Energy Oriented Centre of Excellence EoCoE-II project.

This work was partially supported by EU Grant 824158 - EoCoE-II.

08:25
Algebraically generated line relaxation for algebraic multigrid domain decomposition

ABSTRACT. Advanced relaxation techniques such as alternating line or plane relaxation are well-known to produce robust geometric multigrid algorithms, but similar approaches to relaxation are less well-studied for algebraic multigrid (AMG) algorithms. This talk examines a completely algebraic approach to finding tridiagonal structures in a given matrix which may then be solved to provide an algebraic analogue to line relaxation. In contrast to other attempts to automatically generate block smoothers for AMG, our algorithm focuses on extracting tridiagonal matrix splittings so that high-performance tridiagonal solvers may be employed for the block solves. Furthermore, we investigate the performance of this new relaxation method in the context of algebraic multigrid domain decomposition (AMG-DD), a communication-avoiding algorithm built on top of AMG.

08:50
Coarse Grid Selection using Simulated Annealing

ABSTRACT. Multilevel techniques are often the most efficient approaches for solving the very large linear systems that arise from discretized PDEs and other problems. While geometric multigrid requires detailed knowledge about the underlying PDE and its discretization, algebraic multigrid aims to be less intrusive, requiring less knowledge about the origin of the linear system. A key step in AMG is the choice of the coarse/fine partitioning, aiming to balance the convergence of the iteration with its cost. In past work (MacLachlan and Saad, SISC 2007), a constrained combinatorial optimization problem was used to define the "best" coarse grid within the setting of two-level reduction-based AMG and was shown to be NP-complete. Here, we develop a new coarsening algorithm based on simulated annealing to solve this problem, giving better results than the greedy algorithm developed previously. We present numerical results for test problems on both structured and unstructured meshes, demonstrating the ability to exploit knowledge about the underlying grid structure if it is available.
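As a rough illustration of the approach, a generic simulated-annealing loop over C/F partitionings might look like the following; the energy function here is a hypothetical placeholder, whereas the talk's objective follows MacLachlan and Saad (SISC 2007):

```python
# Generic simulated annealing over coarse/fine (C/F) partitionings.
# `energy` is a placeholder for the combinatorial objective being minimized.
import math
import numpy as np

def anneal(n, energy, n_iters=5000, T0=1.0, cooling=0.999, seed=0):
    rng = np.random.default_rng(seed)
    state = rng.integers(0, 2, size=n).astype(bool)   # True = coarse point
    e, T = energy(state), T0
    best, best_e = state.copy(), e
    for _ in range(n_iters):
        i = rng.integers(n)
        state[i] ^= True                  # propose: flip one point between C and F
        e_new = energy(state)
        if e_new < e or rng.random() < math.exp(-(e_new - e) / T):
            e = e_new                     # accept (always downhill, sometimes uphill)
            if e < best_e:
                best, best_e = state.copy(), e
        else:
            state[i] ^= True              # reject: undo the flip
        T *= cooling                      # geometric cooling schedule
    return best, best_e

toy_energy = lambda s: abs(s.mean() - 0.25)   # toy objective: ~25% coarse points
print(anneal(100, toy_energy)[1])
```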

09:15
A Multigrid Reduction Framework for Flow in Porous and Fractured Media

ABSTRACT. Simulation of underground fluid flow involves solving a multi-physics problem in which multiphase flow and transport are tightly coupled. To capture this dynamic interplay, fully implicit methods, also known as monolithic approaches, are usually preferred. The main bottleneck of a monolithic approach is that it requires solution of large linear systems that result from the discretization and linearization of the governing balance equations. Such systems are non-symmetric, indefinite, and highly ill-conditioned, which make them difficult to solve. These problems become even more challenging when the flow is coupled to mechanical processes such as rock and fracture deformation. Thus, designing preconditioners that can handle the couplings between different physics is critical for fast convergence. Recently, most efforts in developing efficient preconditioners have been dominated by physics-based strategies. Current state-of-the-art “black-box” solvers such as algebraic multigrid (AMG) are ineffective because they cannot resolve the strong coupling between the mechanics and the flow sub-problems, as well as the coupling inherent in the multiphase flow and transport process. In this work, we develop an algebraic framework based on multigrid reduction (MGR) that is suited for tightly coupled systems of PDEs. Using this framework, the decoupling between the equations is done algebraically through defining appropriate interpolation and restriction operators. One can then employ existing solvers for each of the decoupled blocks or design a new solver based on knowledge of the physics. We demonstrate the applicability of our framework when used as a “black-box” solver for multiphase poromechanics and flow in fractured media. We show that the framework is flexible to accommodate a wide range of scenarios, as well as efficient and scalable for large problems.
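Schematically, a single two-level multigrid reduction step for a two-by-two block system takes the standard form below (a generic sketch with exact inverses; practical MGR replaces them with cheap approximations):

```latex
A = \begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix},
\qquad
R = \begin{pmatrix} -A_{cf} A_{ff}^{-1} & I \end{pmatrix},
\qquad
P = \begin{pmatrix} -A_{ff}^{-1} A_{fc} \\ I \end{pmatrix},
```

so that the coarse operator \(R A P = A_{cc} - A_{cf} A_{ff}^{-1} A_{fc}\) is the Schur complement, which algebraically decouples the selected coarse variables from the remaining ones.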

09:40
Developing Work-Optimal Multilevel Methods

ABSTRACT. There has been significant progress in developing robust multilevel methods for a wider range of problems in recent years. Algebraic multigrid has benefited from advances in adaptive methods, energy minimization and other interpolation strategies, and improvements to coarsening. In each of these, computational cost is used to guide algorithmic decisions. In this talk we consider the question of work-optimal multilevel methods, that is, methods that achieve the best convergence and computational cost under a particular model. We present the general optimization framework to develop these methods and highlight several examples that point to the optimality of traditional methods as well as the potential of this approach in more challenging settings.

10:25-12:30 Session 6A: Machine Learning and Iterative Methods. Bighorn B
10:25
Improving linear solver performance using machine learning

ABSTRACT. Analysts running large-scale simulations often have to set numerous “non-physics” parameters that have a substantial effect on computational time and robustness. The further away from the physics these parameters get, the more analysts tend to rely on canned parameter sets reused from previous problems, which may have very different algebraic or operator structure. Many times this leads to suboptimal solver choices, or choices that prevent the code from running to completion. Moreover, many codes do not support parameter adaptation associated with these non-physics settings, forcing the analyst to pick one set for all time and hope for the best.

We use machine learning methods to build predictive models for identifying optimal non-physics parameters, specifically multilevel preconditioner settings, as a function of the algebraic and statistical properties of the linear operators and/or mesh information involved in the simulations. We demonstrate this in the context of Trilinos' multilevel preconditioning packages, ML and MueLu.

10:50
Multigrid Methods and Convolutional Neural Networks

ABSTRACT. In this talk, I will present a theoretical framework, known as MgNet, that unifies classic multigrid method for solving partial differential equations and convolutional neural network (CNN) for image classification. Within this framework, a new class of competitive CNN can be obtained by making very minor modification of a multigrid cycle and a general class of multigrid methods can be obtained.

11:15
Machine Learning in adaptive domain decomposition methods: predicting the geometric location of constraints

ABSTRACT. The convergence rate of domain decomposition methods is generally determined by the eigenvalues of the preconditioned system. For second-order elliptic partial differential equations, coefficient discontinuities with a large contrast can lead to a deterioration of the convergence rate. A remedy can be obtained by enhancing the coarse space with elements, often called constraints, computed by solving small eigenvalue problems on portions of the interface of the domain decomposition, i.e., edges in two dimensions or faces and edges in three dimensions. In the present work, without loss of generality, the focus is on two dimensions. In general, it is difficult to predict where these constraints have to be computed, i.e., on which edges. Here, a machine learning strategy using neural networks is suggested to predict the geometric location of these edges in a preprocessing step. This reduces the number of eigenvalue problems that have to be solved during the iteration. Numerical experiments for model problems and realistic microsections, using regular decompositions as well as those from graph partitioners, are provided, showing very promising results.

11:40
Optimizing parameters of iterative methods with stochastic optimization

ABSTRACT. This talk presents a novel approach for optimizing parameters of iterative methods based on given pairs of initial approximations and approximate solutions. The approach is demonstrated on a number of iterative methods: the multigrid method [1], the preconditioned conjugate gradient method [2], and the shift-and-invert method for computing the action of the matrix exponential on a given vector [3]. These numerical methods depend on a number of parameters that significantly affect the convergence speed, so a proper choice of parameters is an important problem. To address this problem we propose to minimize stochastic approximations of the convergence factors that evaluate the quality of the parameter values. For example, the parameters in the multigrid method are the restriction and prolongation operators, and convergence of the multigrid method is estimated by the spectral radius of the iteration matrix. We therefore propose a stochastic approximation of this spectral radius, use automatic differentiation to compute a stochastic gradient of this approximation, and then apply a stochastic optimization method to find locally optimal parameters that ensure faster convergence of the multigrid method. In all considered test problems the presented approach outperforms adaptive methods in terms of the spectral radius. The same methodology is adapted for the preconditioned conjugate gradient method, where the parameter is a scalar used in a relaxed incomplete Cholesky preconditioner. In this case, we use gradient-free optimization techniques to find the value of the parameter that leads to faster convergence. Experimental results demonstrate the good performance of this method compared with other target functions. The last numerical method used to demonstrate the performance of the proposed methodology is the shift-and-invert method for computing the action of the matrix exponential on a given vector. For this method, we use the residual as the target function together with a gradient-free optimization method, and obtain a significant gain in total running time for the considered test problems.

[1] Katrutsa, A., Daulbaev, T., and Oseledets, I. Black-box learning of multigrid parameters. J. Comput. Appl. Math. 368 (2020).
[2] Katrutsa, A., et al. How to optimize preconditioners for the conjugate gradient method: a stochastic approach. arXiv preprint arXiv:1806.06045 (2018).
[3] Katrutsa, A., Botchev, M., and Oseledets, I. Practical shift choice in the shift-and-invert Krylov subspace evaluations of the matrix exponential. arXiv preprint arXiv:1909.13059 (2019).
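To illustrate the spectral-radius surrogate in a self-contained way, the sketch below approximates rho(E) by the stochastic quantity (||E^k x|| / ||x||)^(1/k) for random x, and scans a single damping parameter; the talk instead differentiates such a surrogate with automatic differentiation and optimizes operator-valued parameters, so this is an illustrative simplification:

```python
# Stochastic surrogate for the spectral radius of an iteration matrix E:
# rho(E) ~ (||E^k x|| / ||x||)^(1/k) averaged over random vectors x.
# Toy setting: damped Jacobi for 1-D Poisson, scanning the damping weight.
import numpy as np

def surrogate_rho(apply_E, n, k=20, n_samples=8, seed=0):
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_samples):
        x = rng.standard_normal(n)
        nx0 = np.linalg.norm(x)
        for _ in range(k):
            x = apply_E(x)
        vals.append((np.linalg.norm(x) / nx0) ** (1.0 / k))
    return np.mean(vals)

n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson
d = np.diag(A)

best = min(np.linspace(0.1, 1.3, 25),
           key=lambda w: surrogate_rho(lambda x: x - w * (A @ x) / d, n))
print("selected damping weight:", best)
```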

12:05
Generative adversarial networks and iterative methods

ABSTRACT. Generative adversarial networks (GANs) are one of the hottest topics in machine learning and deep learning nowadays. Given samples from a probability distribution, one can learn the distribution and an efficient sampler from it. In iterative methods, the most promising application is the regularization of solutions of inverse problems: we parametrize the prior knowledge of the solution by a GAN, which then serves as an implicit regularizer. We present several examples from the solution of ill-posed problems in porous media modelling.

10:25-12:30 Session 6B: Computational Electromagnetics. Bighorn C/1
10:25
High Performance Domain Decomposition Method for 3D Electromagnetic simulations

ABSTRACT. We present a non-overlapping Domain Decomposition Method (DDM) for the solution of electromagnetics problems such as the computation of the Radar Cross Section (RCS) of an object. The numerical computation of such problems remains limited by computer resources, due to the large number of unknowns, especially when inhomogeneous materials with high index are present. Since we target objects that are coated with layers of material, we restrict ourselves to a DDM with non-intersecting interfaces so that no subdomain has more than two neighbours. For the sake of high accuracy, an exact radiation condition is used to close the computational domain.

It is well-known that the efficiency of DD methods relies strongly on the transmission conditions used to couple the subdomains. In this context, we investigate various transmission conditions for which Sufficient Conditions for Uniqueness (SCU) can be derived, ensuring the well-posedness of the local problems. An implementation of this algorithm, using finite elements, is presented, and numerical results show the interest of our method. Emphasis will be placed on the HPC framework in which the code has been developed (using MPI and OpenMP), and in particular we will present scalability results for parts of the code obtained on the Tera 1000-2 supercomputer at the CEA facility.

10:50
Analysis of parallel Schwarz solvers for time-harmonic wave propagation problems

ABSTRACT. Time-harmonic wave propagation problems are notoriously difficult to solve, especially in the high-frequency regime. Several reasons are at the origin of this: first of all, the oscillatory nature of the solution, meaning that the number of degrees of freedom after discretisation increases drastically with the wave number (especially for lower-order approximations), giving rise to large complex-valued problems to solve; secondly, the non-normality of the operator, which makes it difficult to control and predict the behaviour of a Krylov-type solver. In this work we propose an analysis of a parallel Schwarz iteration on some simple configurations in one and two dimensions which uses the block Toeplitz structure of the underlying iteration matrices, giving important insight into the behaviour of the algorithm.

11:15
A Finite-Element Framework for a Mimetic Finite-Difference Discretization of Maxwell's Equations

ABSTRACT. Maxwell’s equations are a system of partial differential equations that govern the laws of electromagnetic induction. When solving such problems computationally, it is important to preserve underlying physical properties such as enforcing a divergence-free magnetic field. With this in mind, we consider a structure-preserving mimetic finite-difference (MFD) scheme which satisfies the physical constraints on the discrete level. In order to better understand the MFD scheme, we examine it under the structure-preserving finite-element (FE) framework. More precisely, by scaling and applying mass-lumping to the FE scheme, we show that MFD and FE are equivalent. This connection gives us the simplicity of a finite-difference method while permitting us to theoretically analyze the MFD scheme using the FE framework. We are able to show the well-posedness and a priori error estimates for the MFD scheme based on the mass-lumping accuracy. Furthermore, we can develop efficient and robust linear solvers for the MFD scheme using techniques from the FE method. Finally, numerical tests are presented to demonstrate the theoretical results.

10:25-12:30 Session 6C: Multigrid. Bighorn C/2
10:25
An h-multigrid method for Hybrid High-Order discretizations

ABSTRACT. We consider a second order elliptic PDE discretized by the Hybrid High Order (HHO) method, for which globally coupled unknowns are located at faces. To efficiently solve the resulting linear system, we propose a geometric multigrid algorithm that keeps the degrees of freedom on the faces at every level. The core of the algorithm resides in the design of the prolongation operator that passes information from coarse to fine faces through the reconstruction of an intermediary polynomial of higher degree on the cells. Higher orders are natively handled by preserving the same polynomial degree at every level. The proposed algorithm requires a hierarchy of nested meshes where the faces are also successively coarsened. Numerical tests on homogeneous and heterogeneous diffusion problems in square and cubic domains show fast convergence, scalability in the mesh size and polynomial order, and robustness with respect to heterogeneity of the diffusion coefficient.

10:50
Multigrid in H(div) on Axisymmetric Domains

ABSTRACT. In this talk, we will present a multigrid algorithm that can be applied to weighted H(div) problems on a two-dimensional domain. These problems arise after performing a dimension reduction to a three-dimensional axisymmetric H(div) problem with general data. We will use recently developed Fourier finite element spaces that can be applied to H(div) problems on axisymmetric domains. We present mathematical results that show that if the axisymmetric domain is convex, then the multigrid V-cycle will converge uniformly with respect to the mesh size. We will also show numerical results consistent with the mathematical theory.

11:15
A two level method for isogeometric discretizations

ABSTRACT. Isogeometric Analysis (IGA) is a numerical method that integrates finite element methods (FEM) and computer-aided design (CAD). In the isogeometric setting, the same basis functions are used for both the representation of the computational domain and the numerical approximation of partial differential equations (PDEs). They are spline-type basis functions that can maintain global smoothness and exactly capture a wide set of common geometries. Nowadays, the search for fast solvers for isogeometric discretizations is receiving a lot of attention in the literature. A desired property of such solvers is robustness with respect to both the spline degree p and the mesh size h. For this purpose, in this work we propose a two-level method such that a discretization of order p is considered on the first level whereas the second level consists of a linear or quadratic discretization. On the first level, we suggest applying one smoothing step of a multiplicative Schwarz method. This choice is supported by a local Fourier analysis (LFA). At the second level one can apply any given strategy to solve the problem exactly; a suitable choice is to obtain an approximation of the solution at this level by using an h-multigrid method. Following this alternative, our approach consists of not performing any smoothing steps at the second level but applying them on the coarser levels instead. With this approach, the resulting solver is efficient and robust with respect to the spline degree p. Finally, some numerical experiments are given to demonstrate the good performance of our solver.

11:40
Deflated p-multigrid solvers for Isogeometric Analysis

ABSTRACT. Within Isogeometric Analysis, solving the resulting linear systems becomes a challenging task when higher order B-spline basis functions are adopted. The use of (deflated) p-multigrid methods, however, leads to convergence rates independent of the mesh width h and approximation order p of the B-spline basis functions. In this talk, we investigate the use of block ILUT smoothing and deflation within p-multigrid methods in case of multipatch geometries.

12:05
Nesterov Accelerated Multigrid Method

ABSTRACT. The development of efficient multilevel methods for solving sparse linear systems of equations arising from discretizations of partial differential equations (PDEs) is an important task in scientific computing. The performance and efficiency of multigrid (MG) methods with standard V- or W-cycles may degenerate when the physical and geometric properties of the PDEs become more and more complicated. For symmetric positive definite (SPD) problems, more involved cycles, such as the algebraic multilevel iteration (AMLI)-cycle and its nonlinear version, the K-cycle, have been proposed. However, the AMLI-cycle requires accurate estimation of extreme eigenvalues, which may be difficult in practice, and the K-cycle may suffer from increasing computational and memory cost when more iterations are needed on the coarse levels. In this work, we propose an accelerated MG cycle. The idea is to rewrite the linear systems as optimization problems and apply Nesterov acceleration. The resulting MG cycle, which we call the N-cycle, shares advantages of both the AMLI- and K-cycles. Namely, similar to the K-cycle, the N-cycle does not require the estimation of extreme eigenvalues, while its computational cost is the same as the AMLI-cycle. Theoretical analysis shows that the N-cycle is essentially a special AMLI-cycle and, thus, can be shown to be uniformly convergent under standard assumptions. Additionally, since the N-cycle is derived from the point of view of optimization, it has the potential to be generalized to problems beyond SPD systems. Finally, we present numerical experiments to demonstrate the efficiency of the N-cycle. The results show that the N-cycle is better than the standard AMLI-cycle, the K-cycle, and even the two-grid cycle for some test problems.
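A minimal sketch of the underlying idea is shown below: Nesterov acceleration wrapped around an inner stationary cycle for an SPD system, with a single damped-Jacobi sweep standing in for the multigrid V-cycle. This is a generic illustration of accelerating a stationary iteration, not the N-cycle itself, which applies the acceleration inside the multilevel hierarchy:

```python
# Nesterov acceleration of a stationary iteration for an SPD system Ax = b.
# `cycle` maps a residual to an approximate correction (here: damped Jacobi,
# standing in for a multigrid V-cycle).
import numpy as np

def nesterov_solve(A, b, cycle, n_iters=300):
    x = np.zeros(len(b))
    y = x.copy()
    t = 1.0
    for _ in range(n_iters):
        x_new = y + cycle(b - A @ y)                   # inner cycle step
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # Nesterov extrapolation
        x, t = x_new, t_new
    return x

n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson
b = np.ones(n)
jacobi = lambda r: 0.66 * r / np.diag(A)               # toy stand-in for a V-cycle
x = nesterov_solve(A, b, jacobi)
print("residual norm:", np.linalg.norm(b - A @ x))
```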

16:30-18:35 Session 7A: Machine Learning and Iterative Methods. Bighorn B
16:30
Efficient Training of Neural Network Surrogate Methods with Variable Projection

ABSTRACT. There has been growing interest in training deep neural networks (DNNs) as surrogates for partial differential equations (PDEs) with the goal of accelerating computations. Developing these DNN surrogates in the supervised setting (i.e., from a set of known input-output pairs) can be posed as a separable, nonlinear least-squares problem aimed at tuning the network weights. In these physically-motivated applications, it is essential to solve the optimization problem to a high accuracy to ensure that the DNN surrogate is reliable. However, it is difficult to train DNNs to the necessary level of accuracy with commonly-used stochastic optimization schemes.

In this talk, we will exploit the separability of the least-squares problem and apply the method of variable projection (VarPro) to obtain a reduced optimization problem that can be solved accurately and efficiently through deterministic iterative methods (e.g., quasi-Newton). In addition, we will show that the overhead of training DNNs with VarPro is minimal (e.g., independent of the DNN architecture). Through numerical experiments on two PDE parameter estimation problems, we will demonstrate that our VarPro scheme outperforms popular stochastic schemes (e.g., ADAM) in terms of accuracy with comparable computational complexity, while offering more potential for parallel implementations.
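A minimal VarPro sketch on a toy separable model (a sum of exponentials rather than a DNN surrogate, with a generic BFGS outer solver) illustrates the reduction:

```python
# Variable projection for a separable model y ~ Phi(w) @ c: the linear
# coefficients c are eliminated by an inner least-squares solve, leaving a
# reduced problem in the nonlinear parameters w only.
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 1.0, 100)
y = 2.0 * np.exp(-1.3 * t) - 0.5 * np.exp(-4.0 * t)    # synthetic data

def phi(w):
    return np.exp(-np.outer(t, w))      # basis matrix parameterized by w

def reduced_objective(w):
    Phi = phi(w)
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # inner linear solve for c
    return 0.5 * np.linalg.norm(Phi @ c - y) ** 2      # reduced (projected) objective

res = minimize(reduced_objective, x0=np.array([1.0, 3.0]), method="BFGS")
print(res.x)   # recovers decay rates near (1.3, 4.0), up to ordering
```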

16:55
Multi-tasking deep learning models for predictions of total energy of solid solution alloys

ABSTRACT. In this work we present a deep learning approach to produce highly accurate predictions of macroscopic physical properties of solid crystals. Since the distribution function of the total energy would enable a thorough understanding of the macroscopic properties of alloys and magnetic systems, surrogate deep learning models can replace first-principles calculations to speed up sampling from the total energy distribution function. However, neural networks lose accuracy with respect to first-principles calculations, affecting the reliability of the estimate. In this paper we focus on improving the accuracy of neural network models that are used to predict the total energy distribution function. The accuracy is improved by reducing overfitting with multi-tasking neural networks that perform joint training on multiple physical properties. The strong correlation between the physical quantities is exploited so that they mutually serve as constraints for the training of the deep learning model, leading to more reliable models because the physics is also learned correctly. Results show that multi-tasking neural networks estimate the material properties for a specific state space hundreds of times faster than the first-principles codes used as a reference. Moreover, the joint training exploits the correlations between the quantities to improve the reliability of the prediction with respect to neural networks trained separately on the individual quantities.

17:20
LeanConvNets: Low-cost Yet Effective Convolutional Neural Networks

ABSTRACT. Convolutional Neural Networks (CNNs) have become indispensable for solving machine learning tasks in speech recognition, computer vision, and other areas that involve high-dimensional data. A CNN filters the input features using a network containing spatial convolution operators with compactly supported stencils. In practice, the input data and the hidden features consist of a large number of channels, which in most CNNs are fully coupled by the convolution operators. This coupling leads to immense computational cost in the training and prediction phases. In this paper, we introduce LeanConvNets, which are derived by sparsifying fully-coupled operators in existing CNNs. Our goal is to improve the efficiency of CNNs by reducing the number of weights, floating point operations and latency times, with minimal loss of accuracy. Our lean convolution operators involve tuning parameters that control the trade-off between the network's accuracy and computational costs. These convolutions can be used in a wide range of existing networks, and we exemplify their use in residual networks (ResNets). Using a range of benchmark problems from image classification and semantic segmentation, we demonstrate that the resulting LeanConvNets' accuracy is close to state-of-the-art networks while being computationally less expensive. In our tests, the lean versions of ResNet in most cases outperform comparable reduced architectures such as MobileNets and ShuffleNets.

17:45
GMLS-Nets: a convolutional neural network architecture for unstructured point-cloud data

ABSTRACT. Generalized moving least squares (GMLS) provides a framework for approximating functionals of functions sampled at scattered data points. As a numerical method, it offers a flexible strategy for integrating PDEs compared to mesh based approaches. In contrast, convolutional neural networks (CNNs) produce excellent results for many machine learning tasks, but are restricted to data on a Cartesian mesh, e.g. images. Regression or classification tasks on less structured datasets may benefit from architectures with similar behaviors as CNNs. In this work, we apply principles developed in GMLS to generalize CNNs to point-cloud data.

A convolution layer can be viewed as the composition of a point-wise applied activation function and a local, translation-equivariant linear operator. For unstructured data sampled on a point cloud, GMLS can provide approximations to such operators. Similar to the parameterized stencils in standard convolutional layers, we parameterize the GMLS approximation of the operator and introduce GMLS-Nets. Using this architecture, we reproduce standard CNN results for the MNIST benchmark, perform PDE system identification, and learn regression models for quantities of interest from fluid simulations on a block-structured mesh. Additionally, we compare the training costs of various gradient-based optimization techniques.

18:10
Deep Learning Enriched with Fractional Operators

ABSTRACT. We present a rigorous mathematical framework for two Deep Neural Networks (DNNs). The first DNN is based on two-level optimization and is called the Bilevel Optimization NN (BONNet). We apply BONNet to variational models in imaging science to learn the regularization strength. In addition, we propose to use the fractional Laplacian as a regularizer and learn the fractional exponent via BONNet. We show an application of BONNet with fractional Laplacian regularization to tomographic reconstruction.

Secondly, we develop a fractional residual neural network (fRNN) which allows the network to admit memory across all the subsequent layers. This is accomplished via an optimal control formulation of a deep Residual NN with a fractional time derivative. We establish that keeping track of history in this manner alleviates the vanishing gradient problem, strengthens feature propagation, encourages feature reuse and reduces the number of unknown parameters. We show a successful application of fRNN on several datasets.

16:30-18:35 Session 7B: Mixed Precision. Bighorn C/1
16:30
Newton's Method in Mixed Precision

ABSTRACT. We consider using reduced precision to solve the linear equation for the Newton step. If one neglects the backward error in the linear solve, then the standard convergence theory implies that using single precision in the linear solve has very little negative effect on the nonlinear convergence rate.

However, if one considers the effects of backward error, then the usual textbook estimates are very pessimistic, and even the state-of-the-art estimates using probabilistic rounding analysis do not fully conform to experiments. We show via a specific example how pessimistic these estimates can be.
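A generic sketch of the setting discussed here: the residual and Jacobian are evaluated in double precision, while the Newton step is obtained from a single-precision linear solve (an illustration of the idea, not the speakers' code):

```python
# Newton's method with the linear solve performed in single precision.
import numpy as np

def newton_mixed(F, J, x0, tol=1e-10, max_it=50):
    x = x0.astype(np.float64)
    for _ in range(max_it):
        r = F(x)                                        # residual in double
        if np.linalg.norm(r) < tol:
            break
        J32 = J(x).astype(np.float32)                   # Jacobian demoted to single
        s = np.linalg.solve(J32, r.astype(np.float32))  # single-precision solve
        x = x - s.astype(np.float64)                    # update kept in double
    return x

# Toy system: intersection of the unit circle with the line x0 = x1.
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
print(newton_mixed(F, J, np.array([2.0, 0.5])))   # ~ (0.7071, 0.7071)
```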

16:55
Low-precision orthogonalization in eigensolvers

ABSTRACT. Orthogonalization is a critical component in iterative solvers that build explicitly orthonormal bases, especially in eigensolvers, as its cost eventually dominates as the number of computed eigenvectors increases. When low precision is possible, the overall cost of the eigensolver can be reduced by using low-precision arithmetic, by up to a factor of two for single precision and up to four for half precision on recent GPUs. However, numerical errors reach the requested precision faster, undermining the numerical stability of the method and diminishing the practical speedups.

In this talk, we present the advances we made in the library PRIMME to improve the performance of the eigensolvers at low precision when computing thousands of eigenvectors, and to make PRIMME one of the first eigensolver libraries supporting half-precision. The implemented measures included carefully tracking the conditioning of the basis and its impact on the computed solutions, especially when using block orthogonalization with SVQB. We discuss the numerical behavior and the obtained time speedups.

17:20
Compressibility Constraints for Adaptive Rate ZFP Compression in Iterative Methods

ABSTRACT. The amount of data being generated and gathered in scientific simulations is continuously growing, putting mounting pressure on storage and bandwidth limitations. Data compression has been proposed as a means of reducing such issues. As lossless data compression is largely ineffective when applied to floating-point data, users typically apply a lossy data compressor, which allows for small deviations from the original data. One such algorithm is ZFP, which decomposes the data set into blocks which are then compressed and decompressed independently. If the blocks are smooth, i.e., variation is small, then ZFP can compress the block with a few bits, while highly oscillatory data will require significantly more bits to reach a desired accuracy. Within a time step of a simulation, similar to adaptive mesh refinement, we can dynamically update the number of bits per block depending on the desired accuracy, which we call adaptive rate ZFP, allowing more bits to be allocated to an area of interest. I will present theoretical concepts that will enable the adaptive rate implementation of ZFP to determine a priori the compressibility of the block as well as show how adaptive rate ZFP can be applied seamlessly into scientific simulations.

17:45
Progressive three-precision multigrid solvers

ABSTRACT. At last year's Copper Mountain Conference on Multigrid methods, we showed that multigrid methods are adversely affected by ill-conditioning due to accumulation of roundoff errors. To alleviate this difficulty, we presented a theoretical analysis of a mixed precision scheme based on iterative refinement (IR). The focus of the present talk is an extension of this work to leverage three different precisions. The lowest precision is used for V-cycles, while the medium precision is used for the update step in the IR algorithm, and the highest precision is only used for the IR residual computation. Additionally, we combine this idea with the notion of "progressive precision" whereby each of the three precisions grows with each finer level of the multigrid hierarchy. Based on a fundamental observation that simply rounding an exact result to finite precision causes an error in the energy norm that is proportional to the square root of the associated matrix condition number, we show that the resulting multigrid method achieves optimal accuracy.
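A schematic of the three-precision loop described above, with a toy half-precision smoother standing in for the V-cycle (an illustrative sketch, not the analyzed algorithm or its progressive-precision variant):

```python
# Iterative refinement with three precisions: residual in double (float64),
# solution update in single (float32), inner correction in half (float16).
import numpy as np

n = 16
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson
b = np.ones(n)

def half_precision_correction(r, n_sweeps=40):
    # toy stand-in for a half-precision V-cycle: damped Jacobi in float16
    A16, d16 = A.astype(np.float16), np.diag(A).astype(np.float16)
    e = np.zeros_like(r)
    for _ in range(n_sweeps):
        e = e + np.float16(0.66) * (r - A16 @ e) / d16
    return e

x = np.zeros(n, dtype=np.float32)                        # medium-precision iterate
for _ in range(30):
    r = b - A @ x.astype(np.float64)                     # high-precision residual
    e = half_precision_correction(r.astype(np.float16))  # low-precision correction
    x = x + e.astype(np.float32)                         # medium-precision update
print("residual norm:", np.linalg.norm(b - A @ x.astype(np.float64)))
```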

18:10
Error Analysis of Mixed Precision Algorithms for Computing Matrix Interpolative Decompositions

ABSTRACT. Renewed interest in mixed-precision algorithms has emerged due to the advancement of GPUs, which enable significant speedup for low-precision arithmetic, as well as growing data capacity and bandwidth concerns. In light of this, we propose a novel mixed-precision algorithm for matrix column subset selection (CSS), wherein we select a subset of columns in low precision to generate a double-precision accurate matrix interpolative decomposition (ID). Though low-precision arithmetic suffers from quicker accumulation of round-off error, for many data-rich applications we nevertheless attain viable approximation accuracies, as the error incurred using low-precision arithmetic is dominated by the error inherent to low-rank approximation. To support this claim, we present deterministic error analysis to provide error estimates which help ensure accurate matrix approximations to the double-precision solution. We then conduct several simulated numerical tests to demonstrate the efficacy of the algorithms and the corresponding error estimates.
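A rough sketch of the idea: the column subset is chosen with a pivoted QR factorization in single precision, and the interpolation coefficients are then recomputed in double precision (a generic illustration under these assumptions, not the authors' algorithm or error analysis):

```python
# Mixed-precision column subset selection / interpolative decomposition:
# select k columns in float32, then form the ID coefficients in float64.
import numpy as np
from scipy.linalg import qr

def mixed_precision_id(A, k):
    _, _, piv = qr(A.astype(np.float32), pivoting=True, mode='economic')
    cols = piv[:k]                                 # columns chosen in single precision
    C = A[:, cols].astype(np.float64)              # skeleton columns in double
    T, *_ = np.linalg.lstsq(C, A, rcond=None)      # coefficients in double: A ~ C @ T
    return cols, T

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 100))  # rank-50 matrix
cols, T = mixed_precision_id(A, 50)
print("relative error:", np.linalg.norm(A - A[:, cols] @ T) / np.linalg.norm(A))
```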

16:30-18:35 Session 7C: Domain Decomposition. Bighorn C/2
16:30
Additive Schwarz Preconditioners for a Localized Orthogonal Decomposition Method

ABSTRACT. The solution of elliptic Partial Differential Equations (PDEs) with multiscale diffusion coefficients using regular Finite Element methods (FEM) typically requires a very fine mesh to resolve the small scales, which might be infeasible. The use of generalized finite elements, such as in the method of Localized Orthogonal Decomposition (LOD), requires only a coarser mesh to obtain an approximation of the solution with similar accuracy. We present a solver for multiscale elliptic PDEs based on a variant of the LOD method. The resulting multiscale linear system is solved using a two-level additive Schwarz preconditioner. We provide an analysis of the condition number of the preconditioned system as well as numerical results that validate our theoretical findings.

16:55
FROSch - A framework for parallel Schwarz preconditioners in Trilinos

ABSTRACT. FROSch (Fast and Robust Overlapping Schwarz) is a framework for parallel Schwarz domain decomposition preconditioners, which is part of the Trilinos package ShyLU. Although it is a general framework for the construction and combination of Schwarz operators, FROSch currently focuses on preconditioners that are algebraic in the sense that they can be constructed from a fully assembled, parallel distributed matrix.

Typically, the computation of the first level only requires the construction of overlapping subdomains, which can be carried out based on the sparsity pattern of the matrix. Many coarse spaces, however, rely on geometric information, such as an additional coarse triangulation or Neumann subdomain matrices, which cannot be constructed or recovered from the fully assembled matrix. However, it is possible to construct robust and scalable coarse spaces from a partition of unity on the domain decomposition interface and energy-minimizing extensions, without the need for additional information. GDSW (Generalized Dryja-Smith-Widlund) type coarse spaces belong to this class of coarse spaces, and FROSch contains parallel implementations of several variants of GDSW type coarse spaces.

Furthermore, FROSch supports the Kokkos programming model through the use of the Tpetra stack in Trilinos. The FROSch framework can therefore benefit from the efforts of the Kokkos package to obtain performance portability by template meta-programming, on modern hybrid architectures with accelerators.

In this talk, the construction of algebraic and scalable Schwarz preconditioners will be described, and numerical results for the application of FROSch preconditioners to several model problems, such as scalar elliptic, elasticity, and fluid flow problems, will be presented. In the computations, different variants of GDSW coarse spaces are used; for instance, a variant for saddle-point problems is used to precondition the fluid flow problems, and a multi-level variant is used in scalability results for more than 100,000 MPI ranks for a scalar elliptic model problem.

Acknowledgements: Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

17:20
On appropriate coarse spaces for asynchronous Optimized Schwarz Method

ABSTRACT. Asynchronous iterative methods are currently undergoing a resurgence in popularity, due in part to the dramatic increase of parallelism in modern computers. The good performance of asynchronous methods stems from the absence of synchronization between computational tasks, and hence the minimization of idle time, i.e., time during which some processors are inactive but could be doing useful work. The effect of data exchange in the asynchronous method is then less pronounced. We explore such asynchronous iterations for the Optimized Schwarz Method (OSM), a class of Schwarz methods that uses optimized transmission conditions on the overlap to update the current data and is, in general, computationally very effective for many engineering problems. Moreover, to speed up the convergence of OSM, we add a coarse space correction. We explain in this talk how to design appropriate coarse spaces for OSM in the synchronous and asynchronous frameworks, and we test these coarse spaces numerically on different problems.

17:45
A new implicit Eulerian solver for semiconductor models

ABSTRACT. I will present a new nonlinear iterative strategy for solving semiconductor models in the Eulerian framework. We use a kinetic equation model, which can have nonlocal operators, high dimensionality, and multiple scales in the particle collision frequency. Stiffness of the problem may necessitate the use of implicit timestepping methods, and it is important to develop efficient nonlinear solvers to address this.

The new solver poses the problem as a fixed point problem on a lower dimensional space using a domain decomposition approach. We take advantage of the reduced dimensionality to store more solution vectors, and are thus able to apply Anderson Acceleration to speed up convergence. We also accelerate convergence by preconditioning the system with the drift-diffusion limiting equations when collisions dominate.