View: session overviewtalk overview

08:00 | On inner iterations of Jacobi-Davidson type methods for large SVD computations SPEAKER: Zhongxiao Jia ABSTRACT. We make a convergence analysis of the standard, harmonic, refined and refined harmonic extraction versions of Jacobi-Davidson SVD (JDSVD) type methods for computing an interior singular triplet of a large matrix $A$. At every iteration of these methods, a correction equation, i.e., inner linear system, is solved approximately by using iterative methods, which leads to four inexact JDSVD type methods, as opposed to the exact methods where inner linear systems are solved exactly. Accuracy of inner iterations critically affects the convergence and overall efficiency of the inexact JDSVD methods. A central problem is how accurately the correction equations should be solved so as to ensure that each of the inexact JDSVD methods can mimic its exact counterpart well, that is, they use almost the same outer iterations to achieve the convergence. In this paper, we prove that each inexact JDSVD method behaves like its exact counterpart if all the inner linear systems are solved with low or modest accuracy during outer iterations. Based on the theory, we propose practical stopping criteria for inner iterations. Numerical experiments confirm our theory. |

08:25 | Polynomial Preconditioned Arnoldi for Eigenvalues SPEAKER: Ron Morgan ABSTRACT. Polynomial preconditioning of large eigenvalue problems has been discussed for many years, but has not become standard. Here we give a simple choice for the polynomial to use with the Arnoldi method. It is shown that this approach can significantly improve the efficiency for difficult problems. Also, there is an especially large reduction in dot products which has potential for communication-avoiding methods. We discuss ways to keep the method stable even for fairly high-degree polynomials and ways to make the method black-box. Finally, double-secret polynomial preconditioning can give even more dot product reduction. |

08:50 | Combining Refined and Standard/Harmonic Rayleigh-Ritz for Interior Hermitian Eigenvalue Problems SPEAKER: Eloy Romero Alcalde ABSTRACT. Large-size interior eigenvalue problems are difficult to solve partly because Rayleigh-Ritz, the standard approach to compute the approximate eigenpairs in the search subspace, may perform poorly. This results in slower convergence of restarted methods, and especially of Davidson type methods, where the extraction drives the expansion of the search subspace. Refined and harmonic Rayleigh-Ritz are the usual alternatives. Both methods are efficient in parallel, although each has issues when eigenvalues lie symmetrically around the target or are very close to the target, respectively. To fix these issues, methods like Jia's refined-harmonic propose a harmonic procedure followed by a refined post-processing step that uses a harmonic Ritz value as an updated shift. This is an effective procedure if applied to Krylov methods, but for Davidson type methods the update of the shift causes an orthonormalization of a basis as large as the search subspace at every iteration and for each different target Ritz vector (if a block method is used). This is expensive both in flops but also in increased parallel synchronization. In the context of Hermitian problems, we propose to use a fixed shift refined extraction as a pre-processing step instead. The Rayleigh-Ritz or the harmonic procedure is applied only on the subset of the search subspace where refined extraction may have issues. We also discuss why these subsets are less problematic for Rayleigh-Ritz or harmonic procedure. The resulting method has almost no impact on the parallel performance, and we show experimentally that it is competitive to other extraction methods. |

08:00 | Interconnected hierarchical rank-structured methods for directly solving and preconditioning the Helmholtz equation SPEAKER: Xiao Liu ABSTRACT. We study direct and iterative solution methods for the Helmholtz equation. For the low-frequency case, we propose a rank-structured direct method that has two hierarchical layers: the hierarchical domain partitioning for reducing a large problem into smaller sub-problems on the boundaries, and the hierarchically semiseparable (HSS) representation for approximating the sub-problems with nested low-rank representations. By exploiting the interconnection between the two hierarchical structures, the time to generate low-rank compression is minimized. In contract, previous structured direct solvers usually require expensive compression operations which contribute to the dominant cost. For the high-frequency case, we propose to use the rank-structured approach in the framework of shifted Laplacian preconditioners. We give estimates on how the magnitude of damping affects the HSS compressibility and the condition number after preconditioning. |

08:25 | Computing the Eigenvalues of the Dirichlet-to-Neumann Map for Indefinite Helmholtz Equation SPEAKER: unknown ABSTRACT. We introduce an efficient method for computing the eigenvalues of the Dirichlet-to-Neumann (DtN) map associated with the Helmholtz equation. Potential applications are found in the computation of refraction indices of obstacles in inverse scattering. We use periodic extensions and restrictions to subspaces to solve the constant coefficient Helmholtz equation efficiently. The proposed Fourier method, combined with an eigensolver, results in an efficient and clear approach for computing the eigenvalues of DtN map. We further propose an auxiliary space method which extends this technique to unstructured grids. |

08:50 | Pollution and convergence of numerical solutions of the Helmholtz equation SPEAKER: Cornelis Vuik ABSTRACT. In the 20 years of research on the Helmholtz problem, the focus has either been on the accuracy of the numerical solution (pollution) or the acceleration of the convergence of a Krylov-based solver (scalability). While it is widely recognized that the convergence properties can be investigated by studying the eigenvalues, information from the eigenvalues is not used in studying the pollution error. Our aim is to bring the topics of accuracy and scalability together; instead of approaching the pollution error in the conventional sense of being the result of a discrepancy between the exact and numerical wave number, we show that the pollution error can also be decomposed in terms of the eigenvalues. This approach reveals that the error is mostly driven by differences in the location and value of the smallest analytical and numerical eigenvalue in terms of magnitude. It also connects the notion of the smallest eigenvalue determining the convergence rate to the study of the accuracy of the numerical solution. Using these novel insights, we construct sharper lower and upper bounds for the pollution error independent of the grid resolution. While the pollution error can be minimized in one-dimension by introducing a dispersion correction, the latter is not possible in higher dimensions. We show that using a correction to the coefficient matrix based on the smallest eigenvalue, we are able to mitigate the pollution error, both in one- and two-dimensions while still using the second-order finite difference scheme. The results indicate that using our proposed correction leads to more accurate results compared to implementing the explicit dispersion correction between the exact and numerical wave number. |

09:15 | Iterative methods using deflation for the Helmholtz equation SPEAKER: Vandana Dwarka ABSTRACT. Recent research efforts aimed at iteratively solving the Helmholtz equation has focused on incorporating deflation techniques for GMRES-convergence accelerating purposes. The requisite for these efforts lies in the fact that the widely used and well acknowledged Complex Shifted Laplacian Preconditioner (CSLP) shifts the eigenvalues of the preconditioned system towards the origin as the wave number increases. The two-level-deflation preconditioner combined with CSLP (ADEF) showed encouraging results in moderating the rate at which the eigenvalues approach the origin. However, for large wave numbers the initial problem resurfaces and the near-zero eigenvalues reappear. Our findings reveal that the reappearance of these near-zero eigenvalues occurs if the near-singular eigenmodes of the fine-grid operator and the coarse-grid operator are not properly aligned. This misalignment is caused by accumulating approximation errors during the inter-grid transfer operations. We propose the use of higher-order approximation schemes to construct the deflation vectors. The results from Rigorous Fourier Analysis (RFA) and numerical experiments confirm that our newly proposed scheme outperforms any deflation-based preconditioner for the Helmholtz problem. In particular, the spectrum of the adjusted preconditioned operator stays fixed near one. For the first time, the convergence properties for very large wavenumbers (k = 10^6 in one-dimension and k = 1500 in two-dimensions) have been studied, and the convergence is close to wave number independence. The new scheme additionally shows very promising results for the more challenging Marmousi problem. |

08:00 | Spectral Computed Tomography with Linearization and Preconditioning SPEAKER: unknown ABSTRACT. In the area of image sciences, the emergence of spectral computed tomography (CT) detectors highlights the concept of quantitative imaging, in which not only the reconstructed images are oﬀered, but also the weights of diﬀerent materials that compose the object are provided. In this paper, we present a linearization technique to transform the nonlinear matrix equation that models spectral computed tomography into an optimization problem that is based on a weighted least squares term and nonnegative bound constraints. To solve this optimization problem, we propose a new preconditioner that can reduce the condition number significantly, and with this preconditioner, we implement a highly eﬃcient ﬁrst order method, Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), to achieve remarkable improvements on convergence speed and image quality. |

08:25 | Tomographic reconstruction with simultaneous error correction SPEAKER: Zichao Di ABSTRACT. The advanced imaging technique, which leads to unquestionable improved clarity of sample structure, however, is highly vulnerable to systematic and random errors. These errors are fundamental obstacles to a full realization of the next-generation photon science since they can lead to reduced spatial resolution and even misinterpretation of the underlying sample structures. In this work, we target on the most common experimental error which is the center of rotation shift, we formulate new physical models to capture its experimental setups, then we devise new mathematical optimization formulations for robust inversion of complex sample. The numerical results are demonstrated on both synthetic and real data. |

08:50 | Randomization for Efficient Reduced Order Models in Diffuse Optical Tomography SPEAKER: unknown ABSTRACT. Nonlinear parametric inverse problems appear in many applications. In this work, we focus on one such application, namely diffuse optical tomography (DOT) in medical image reconstruction. In DOT, we aim to recover an unknown image of interest, such as cancerous tissue in a given medium, using a mathematical (forward) model. The forward model in DOT is a diffusion model for the photon flux. The main computational bottleneck in such inverse problems are the repeated evaluations of a large-scale forward model. For DOT, this corresponds to solving large linear systems for each source and frequency at each optimization step. In addition, as Newton methods are very effective for these problems, we need to solve linear systems with the adjoint for each detector and frequency at each optimization step to efficiently compute derivative information. As rapid advances in technology allow for large numbers of sources and detectors, these problems become computationally prohibitively expensive. In the past, the use of reduced order models (ROM) has been proposed to drastically reduce the size of the linear systems solved in each optimization step, while still solving the inverse problem accurately. However, as the number of sources and detectors increases, even the construction of the ROM bases still incurs a substantial cost in the offline stage. Interpolatory model reduction requires the solution of large linear systems for all sources and frequencies as well as for all detectors and frequencies for each chosen interpolation point, followed by an expensive rank-revealing factorization to reduce the dimension. Using randomization, we propose to drastically reduce the number of large linear solves for constructing the global ROM basis for DOT. The ideas presented in this paper are relevant to many other large scale applications as well. |

09:15 | Block-Newton Iterative Solvers for Joint Inverse Tumor Growth and Image Registration SPEAKER: Klaudius Scheufele ABSTRACT. We present two block-newton-type iterative solvers in SIBIA (Scalable Integrated Biophysics-based Image Analysis), a framework for the iterative coupling of image registration and biophysical brain tumor growth inversion. Our objective is to identify tumor growth parameters and initial (time t=0) tumor concentrations (point sources/ tumor seeds) from given segmented MR images of a patient brain showing a pathology. We consider gliomas, primary brain tumors, in particular glioblastoma, one of the most common malignant brain tumor types. The tumor growth simulation requires knowledge of the healthy brain geometry determining the local growth conditions. For clinical cases, however, MR images of the healthy patient are usually not available. Therefore, the inverse tumor solver is combined with medical image registration mapping from a statistical standard brain to the patient brain. The combination of these two components can be formulated as a constrained non-linear optimization problem. Our contribution is the derivation of two different respective optimization formulations along with the derivation and implementation of two Picard iteration schemes re-using existing software components for forward and inverse tumor growth simulation, and image registration and transformation (block-newton-type iterative solvers). The biophysical tumor model is a relatively simple diffusion-reaction model formulated as a partial differential equation (PDE). For image registration, we use large-scale diffeomorphic image registration parameterized by an Eulerian velocity field: the forward image transformation is an advection equation with a given (constant in time) transport velocity. Both components, their internal solver approach, the used algorithms, and their highly scalable parallel distributed memory implementation have been presented in [1]. We present two approaches tightly integrating these two components: The first approach is based on the following global optimization problem: Given the segmentation of a normal brain MRI (atlas) and the segmentation of a cancer patient MRI, we wish to determine tumor growth parameters and an image advection velocity such that if we grow a tumor (using our tumor model) in the atlas and then transform the resulting atlas brain with tumor to the patient segmented image with the advection velocity, then the mismatch between the patient image and the tranformed atlas with tumor is minimal. The iterative coupling of image registration and tumor inversion is achieved in a Picard iteration executing the following steps: 1) grow a tumor in the atlas with a given initial tumor concentration and given tumor growth parameters; 2) register the resulting atlas image with tumor to the patient image; 3) advect the patient image with the computed registration velocity; 4) run the inverse tumor solver using the advected patient pathology as input data. In [2], we introduce the underlying PDE-constrained optimization formulation of the coupled optimization problem, derive the optimality conditions, and the iterative Picard scheme together with several tests demonstrating the performance of our method on synthetic and clinical datasets. The complete coupled solver can be executed in only a few minutes using 11 dual-x86 nodes for an image resolution of 256x256x256. We also demonstrate that we can successfully register normal MRI to tumor-bearing MRI (normal-to-abnormal registration) while obtaining Dice coefficients that match those achieved when registering of normal-to-normal MRI. The drawback of this approach, however ist two fold: 1) careful calibration of solver-internal parameters over the Picard iterations is required to keep the image registration from matching an arbitrarily wrong tumor in the atlas to the observed patient tumor, i.e., doing the job of the tumor solver; 2) if we can identify tumor growth parameters in spite of 1), they are valid only in the atlas while we are interested in the patient-specific values to be able to predict future tumor growth. Therefore, we introduce a second formulation in this presentation that throughout the optimization process gradually translates the healthy atlas brain used for tumor growth simulation towards the patient brain, i.e., in every Picard iteration, we improve a deformation map that registers the healthy atlas brain to the patient brain, resulting in a gradually improving approximation of the unknown healthy patient brain. We call this the moving atlas formulation in contrast to the moving patient formulation mentioned above. The moving patient formulation that only changes the input data to the tumor inversion (patient image advected towards the atlas) in every iteration. The new moving atlas formulation keeps the target data (patient image) constant over all Picard iterations, but changes the spatial distribution of the tumor growth coefficient values. It resolves the two drawbacks we identified for the moving patient scheme: the registration does no longer work explicitly on the pathology but only on the brain tissue labels (meaning the tumor inversion alone controls the tumor misfit); the final model parameters we invert for are valid in the patient space and can directly be used for prediction. Since the healthy brain geometry used for brain tumor growth, and, thus the diffusion and reaction coefficients change gradually, the new formulation and solution scheme get more involved, having to deal with derivatives of these coefficients. As a consequence, both tumor solver and image registration need to be opened up, modified and tightly integrated within the new Picard iteration. However, also the moving patient scheme requires some modifications of the combined components to ensure that the fixed-point is the solution oft the global optimization problem. In the presentation, we show these modifications for both schemes along with extensive numerical studies showing the applicability, robustness and accuracy of the Picard schemes for various synthetic and clinical test cases. In addition, we present strategies to adjust parameters such as regularization coefficients of the two coupled problems throughout the Picard iterations in order to achieve optimal convergence speed and accuracy and prevent the optimization from getting stuck in local minima. [1] Gholami, Amir; Mang, Andreas; Scheufele, Klaudius; Davatzikos, Christos; Mehl, Miriam; Biros, George: A Framework for Scalable Biophysics-based Image Analysis. In: IEEE/ACM Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC17, 2017. [2] Scheufele, Klaudius; Mang, Andreas; Gholami, Amir; Davatzikos, Christos; Biros, George; Mehl, Miriam: Coupling Brain-Tumor Biophysical Models and Diffeomorphic Image Registration. In: Medical Image Analysis Journal, Elsevier, submitted. |

09:40 | Tensor Dictionaries for Image Compression and Deblurring SPEAKER: Misha Kilmer ABSTRACT. In recent work (Soltani, Kilmer, Hansen, BIT 2016), an algorithm for non-negative tensor patch dictionary learning in the context of Xray CT imaging, based on a tensor-tensor product called the t-product (Kilmer and Martin, 2011), was presented. In that paper, the dictionary learned from a single high-resolution X-ray CT image was used to constrain the image reconstruction for X-ray CT problems. In this talk, we show that non-negative tensor patch dictionaries can also be successfully trained on other data, such as facial image data, and utilized in other applications such as image compression and image deblurring. We employ a variant of the Modified Residual Norm Steepest Descent (MRNSD) algorithm to find the non-negative tensor coefficients that we need in both applications. The corresponding algorithm for deblurring images has the power to recover edge information. It also has a computational advantage over typical edge-preserving regularized deblurring schemes in that no additional regularization or regularization parameter selection choice is required. We demonstrate the efficacy and efficiency of our new approach on several examples. In particular, we illustrate that the dictionary training and test data need not be of the same class for our approach to work well. |

10:25 | Direct, Stochastic Analogues to Deterministic Optimization Methods using Statistical Filters SPEAKER: Vivak Patel ABSTRACT. Stochastic optimization---those problems that involve random variables---is a fundamental challenge in many disciplines. Unfortunately, current solvers for stochastic optimization restrictively require finiteness by either replacing the original problem with a sample average surrogate, or by having complete knowledge of a finite population. To help alleviate this restriction, we state a general, novel framework that generates practical, robust numerical methods to solve the actual stochastic optimization problem iteratively. Our key insight is to treat the objective and its gradient as a sequential estimation problem that is solved by integrating statistical filters and deterministic optimizers. We demonstrate the framework by generating a Kalman Filtering-based gradient descent method with line search or trust region to solve a challenging stochastic optimization problem in statistics. |

10:50 | EFFICIENT POD-BASED DEFLATION METHODS FOR THE SOLUTION OF LARGE AND ILL-CONDITIONED LINEAR SYSTEMS SPEAKER: unknown ABSTRACT. We develop and explore the possibilities of combining POD and deflation techniques for the acceleration of the solution of large and ill-conditioned linear systems. We propose obtaining information of the system with a POD procedure and a latter utilization of this information for a defation procedure. The convergence properties of the resulting POD-based deflation method are studied for an incompressible single and two-phase flow problem in heterogeneous porous media, presenting a contrast in permeability coefficients of up to O(10^7). We compare the number of iterations required to solve the above-mentioned problem using the Conjugate Gradient method preconditioned with Incomplete Cholesky (ICCG), against the deflated version of the same method (DICCG). The efficiency of the method is illustrated with the SPE 10 benchmark, our largest test case, containing O(10^6) cells. For this problem, the DICCG method requires only two iterations to achieve convergence in the single-phase case and around 20% of the number of ICCG iterations for the two-phase case. |

11:15 | Collocation Methods for Exploring Perturbations in Linear Stability SPEAKER: David Silvester ABSTRACT. Eigenvalue analysis is a well-established tool for stability analysis of dynamical systems. However, there are situations where eigenvalues miss some important features of physical models. For example, in models of incompressible fluid dynamics, there are examples where linear stability analysis predicts stability but transient simulations exhibit significant growth of infinitesimal perturbations. This behavior can be predicted by pseudo-spectral analysis. In this study, we show that an approach similar to pseudo-spectral analysis can be performed inexpensively using stochastic collocation methods and the results can be used to provide quantitative information about instability. In addition, we demonstrate that the results of the perturbation analysis provide insight into the behavior of unsteady flow simulations. |

11:40 | Projected Nonlinear Least Squares for H2 Model Reduction SPEAKER: Caleb Magruder ABSTRACT. Optimal H2 model reduction seeks to build a low-dimensional reduced order model of a linear, time invariant (LTI) system that minimizes the mismatch of the transfer function of these two systems along the imaginary axis in the L2-norm. Here we propose a new approach to the H2 model reduction problem that only requires access to samples of the transfer function. Our approach projects the infinite dimensional optimization problem onto a finite dimensional subspace via the sampling operator that results in a weighted rational approximation problem. Rational approximation in a conventional pole-residue parameterization often possesses many spurious local minima; therefore, we introduce a new approach for applying Gauss-Newton that frequently avoids these spurious minima. Finally, we provide an iteration that improves this subspace using the previous, suboptimal solution to the weighted rational approximation problem. As the number of transfer function samples increases, the suboptimal solutions converge to the optimal H2 solution. Currently, the main technique for constructing an optimal reduced order model in the H2-norm is the Iterative Rational Krylov Algorithm (IRKA) which requires access to the full order system in state-space form. However, not all LTI systems possess a state-state form; two alternatives that avoid this limitation are transfer function IRKA (TF-IRKA) and a quadrature-based vector fitting approach (QuadVF). Our approach differs from these two methods because it does not require Hermite information, unlike TF-IRKA, and makes sparing use of transfer function evaluations, unlike QuadVF. Moreover, since this projection-based approach is based on numerical optimization techniques, our approach has the potential to be extended to new classes of transfer function realizations. |

12:05 | Multiscale methods to simulate time-domain geophysical electromagnetic responses SPEAKER: Luz Angelica Caudillo Mata ABSTRACT. Efficient and accurate simulation of transient electromagnetic responses of geologically-rich geophysical settings is crucial in a variety of scenarios, including mineral and petroleum exploration, water resource utilizations and geothermal power extractions. In this talk, we develop a multiscale method to compute such type of responses in a very effective manner. Geophysical time-varying electromagnetic simulations of highly heterogeneous media are computationally expensive. One major reason for this is the fact that very fine meshes are often required to accurately discretize the physical properties of the media, which may vary over a wide range of spatial scales and several orders of magnitude. Using very fine meshes for the discrete models lead to solve large systems of equations that are often difficult to deal with at each time step. To reduce the computational cost of the electromagnetic simulation, we develop a multiscale method for the quasi-static Maxwell’s equations in the time domain. Our method begins by locally computing multiscale basis functions at each time step, which incorporate the small-scale information contained in the physical properties of the media. Using a Galerkin proper orthogonal decomposition approach, the local basis functions are used to represent the solution on a coarse mesh. The governing equations are numerically integrated using an implicit time marching scheme. Our approach leads to a significant reduction in the size of the final system of equations to be solved and in the amount of computational time of the simulation, while accurately approximating the behavior of the fine-mesh solutions. We demonstrate the performance of our method in the context of a geophysical electromagnetic application. |

10:25 | An overview of the theory of convergence of asynchronous iterations SPEAKER: Daniel Szyld ABSTRACT. We present different results from the literature on parallel asynchronous iterations. These include the classical theorems by Chazan and Miranker, Betserkas and Tsisiklis, and El Tarazi. We relate them to one another, and to more current efforts, including our own work. |

10:50 | Asynchronous Domain Decomposition Solvers SPEAKER: Christian Glusa ABSTRACT. Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. Increasingly large problem sizes on more heterogeneous systems make load balancing and network layout very challenging tasks. In particular, global communication patterns such as inner products become increasingly limiting at scale. In this talk, we explore the use of asynchronous domain decomposition solvers based on one-sided MPI primitives. We will discuss practical issues encountered in the development of a scalable solver and show experimental results obtained on state-of-the-art supercomputer systems. Joint work with Erik Boman, Edmond Chow, Jose Garay, Mireille Haddad, Kathryn Lund, Sivasankaran Rajamanickam, Paritosh Ramanan, Daniel Szyld, Jordi Wolfson-Pou and Ichitaro Yamazaki. |

11:15 | An Asynchronous Multigrid Method SPEAKER: Jordi Wolfson-Pou ABSTRACT. We propose an asynchronous Gauss-Southwell smoother and an asynchronous implementation of an additive multigrid method. The smoother is designed to reduce communication compared to commonly used smoothers. Additive multigrid methods apply the smoother at all grid levels simultaneously. We consider additive methods that can work without CG/GMRES acceleration, so that the method can be entirely asynchronous. |

11:40 | Asynchronous Optimized Schwarz methods for Poisson's equation in Rectangular domains SPEAKER: unknown ABSTRACT. Asynchronous iterative algorithms are parallel iterative algorithms in which communications and iterations are not synchronized among processors. Thus, as soon as a processing unit finishes its own calculations, it starts the next cycle with the latest data received during a previous cycle, without waiting for any other processing unit. These algorithms increase the number of updates in some processors (with respect to the synchronous case) but suppress all the idle times. This usually results in a reduction of the (execution) time to achieve convergence [1]. Optimized Schwarz methods are Domain Decomposition methods in which the transmission conditions between subdomains are chosen in such a way that the convergence properties of the method are enhanced with respect to classical Schwarz Methods. In Optimized Schwarz, the choice of transmission conditions depends on the PDE to solve. These transmission conditions are optimized approximations of the optimal transmission conditions, which are obtained by approximating the global Steklov-Poincaré operator by local differential operators. There is more than one family of transmission conditions that can be used for a given PDE (e.g., OO0 and OO2 for the Poisson's equation), each of these families consisting of a particular approximation of the optimal transmission conditions [2]. Optimized Schwarz methods are fast methods in terms of iteration count and can be implemented asynchronously [3]. In this work, we analyze the convergence properties of an Asynchronous Optimized Schwarz method applied as a solver for Poisson's Equation in a bounded rectangular domain with Dirichlet (physical) boundary conditions. We use transmission conditions on the artificial boundaries of the family OO0, which are obtained by taking the zero-th order approximation of the Steklov-Poincaré operator, i.e., we approximate this operator by a constant. This leads to having Robin boundary conditions on the artificial boundaries, and such conditions contain a parameter alpha whose value is chosen to optimize the convergence properties of the method. In order to prove convergence for this asynchronous implementation of Optimized Schwarz, using functional analysis tools and PDEs theory, we write the local errors in terms of generalized Fourier series and recast the equations that result after applying Optimized Schwarz to the given problem into a (synchronous) fixed point iteration problem for the coefficients of the error series. The information of the PDE, physical boundary conditions and transmission conditions are encoded in the new (infinite-dimensional) iteration operator. Then, we prove that the iteration operator is contracting in a weighted max norm and that each local error series converges uniformly, which, as it is shown, give sufficient conditions for local errors of the asynchronous implementation to converge to zero as the number of updates goes to infinity. The weighted max norm of the iteration operator depends on the parameter alpha, the amount of overlap and the number of subdomains. We provide insights about how the convergence speed of the method is affected when we vary the values of these parameters. We also determine the value of alpha that minimizes such weighted max norm and consequently minimizes convergence bounds for both synchronous and asynchronous cases. To our knowledge, this is the first time that a convergence proof of Optimized Schwarz for a problem defined in a bounded domain with an arbitrary number of subdomains forming a 2D array and arbitrary overlap among subdomains is presented. The technique we developed here can also be used for other PDEs and in higher dimensions. References [1] Andreas Frommer, Daniel B. Szyld. On Asynchronous Iterations. Journal of Computational and Applied Mathematics, Volume 123, Pages 201-216, 2000. [2] Martin J. Gander. Optimized Schwarz methods. SIAM Journal on Numerical Analysis, Volume 44, Pages 699-731, 2006. [3] Frédéric Magoulès, Daniel B. Szyld, and Cèdric Venet. Asynchronous Optimized Schwarz Methods with and without Overlap. Numerische Mathematik, Volume 137, Pages 199-227, 2017. |

12:05 | Parallel asynchronous preconditioners for compressible flow problems SPEAKER: Aditya Kashi ABSTRACT. We present the application of asynchronous methods to the thread-parallel preconditioning of large sparse linear systems of equations arising in computational fluid dynamics (CFD). Asynchronous symmetric Gauss-Seidel (SGS) and the asynchronous incomplete LU (ILU) factorization preconditioners are considered. New asynchronous block variants of these schemes are also studied and tested. The most computationally intensive task in CFD solvers which use implicit temporal discretization is usually the solution of large systems of linear algebraic equations. With the introduction of specialized computing devices dedicated to 'massively parallel' computing, it has been found that many of the established techniques for implicit solution of CFD equations are not capable of exploiting the increased level of available parallelism. Such devices include graphics processing units (GPUs) and many-core devices. It is important for CFD software to take advantage of these highly parallel platforms because of their high throughput and energy efficiency in numerical computing. An important class of iterative solvers for linear systems of equations in CFD are Krylov subspace methods, such as the generalized minimum residual (GMRES) and stabilized biconjugate gradient (BiCGSTAB) methods. Preconditioning is crucial to the success of Krylov subspace methods. Stationary linear iterations such as symmetric Gauss-Seidel (SGS) and incomplete LU (ILU) factorization are popular as preconditioners. Many of the operations in Krylov solvers are inherently parallel, a notable exception being the preconditioning operation which tends to depend on traditional sequential methods. Efforts to parallelize preconditioners like SGS and ILU include the techniques of level-scheduling and multi-colour ordering. These methods usually face drawbacks of either exposing limited parallelism or increasing the number of solver iterations required for convergence. A class of parallel preconditioners called sparse approximate inverses have been studied but they have not found widespread application in CFD. Thus, there is a need to investigate linear solvers suited to massively parallel hardware, particularly for CFD problems where we encounter non-symmetric and indefinite matrices. We emphasize that we are interested in fine-grain parallelism within each node of a cluster, over and above existing parallelism across the nodes of the cluster. Asynchronous iterations were studied as early as 1969 under the name `chaotic relaxation', but were revisited recently for use in preconditioners. We investigate the suitability of these methods in solving compressible flow problems of interest in aerodynamics, which in our knowledge, has not been done yet. Further, we specialize the very general concept of asynchronous iteration to make it better suited to spatially discontinuous discretizations of the flow equations (such as finite volume and discontinuous Galerkin finite element methods). This is done by deriving asynchronous block preconditioners where the blocks are small and dense. The specific methods we target are asynchronous SGS sweeps as well as asynchronous ILU(0) factorization recently proposed by Chow and Patel. We also explore the block version of asynchronous SGS and propose a block version of the Chow-Patel ILU(0) scheme. These are `point-block' methods where the blocks correspond to coupling between the conserved flow variables at a given mesh entity (cell, in our case). Chow and Patel give a proof for the convergence of the asynchronous sweeps to the ILU factorization (in exact arithmetic). We extend this proof to the case of asynchronous block-ILU factorization. We apply these methods to some cases of external aerodynamics, including inviscid and viscous compressible flow. A second-order upwind finite volume discretization on unstructured meshes is used. The nonlinear system of equations is solved by implicit pseudo-time stepping and preconditioned Krylov subspace linear solvers. Asynchronous preconditioners require solvers which are flexible with regard to the preconditioner and we use restarted FGMRES. We demonstrate appreciable parallel speed-up from using multiple threads. It is also evident from the results that the block algorithms lead to much better performance for our application. |

10:25 | Iterative Multifrequency Corrected Implicit Monte Carlo SPEAKER: Mathew Cleveland ABSTRACT. In this work we explore an iterative, multifrequency, non-negative, non-linear, correction that eliminates artificial overheating that occurs in the original “Implicit Monte Carlo” (IMC) method presented by Fleck and Cummings. This work focuses on a better multifrequency extension of the original “Corrected Implicit Monte Carlo” (CIMC) method, recently published by Cleveland and Wollaber. The previous implementation of CIMC strictly adhered to maximum principle for frequency independent problems, but numerical evidence indicated that it could still cause artificial overheating in frequency dependent problems. In this work we present a new iterative multifrequency corrected IMC (IMC2) implementation that is non-negative and strictly adheres to the maximum principle. This research was performed by Los Alamos National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. |

10:50 | Analysis of the Application of Multiple Preconditioners for Discrete Ordinates Transport Iterations SPEAKER: Jim Warsa ABSTRACT. The topic being considered is the synthetic acceleration of the iterative solution of multigroup discrete ordinates, or SN, equations with upscatter. Synthetic acceleration is an additive correction to the current iterate, computed by solving a low-order operator equation with a source term that is the residual from the outer iteration. A certain acceleration method might be available that is very effective, but there may be a benefit in applying several other, less effective acceleration methods during a given iteration instead of the more effective one. This could be the case, for example, if the more effective method by itself is complex or costly to solve and the less effective ones can be solved efficiently. In that case, it might be that they can be applied together during a given iteration in such a way that it leads to better computational efficiency overall. Therefore, we look at applying a series of synthetic acceleration methods in an as-yet-to-be-determined sequence during fixed-point (source) iteration. It is neither immediately clear in what order the sequence of acceleration steps should be applied to be most effective, nor is it clear that there is even any benefit at all in applying multiple acceleration methods during an iteration. Furthermore, there is no obvious best way to select the residual source terms in each acceleration step. These questions will be answered through Fourier analysis in one-dimensional slab geometry. Three synthetic acceleration methods are considered, within-group diffusion synthetic acceleration (DSA), multigroup DSA (FULL-DSA), and linear multifrequency grey diffusion acceleration (LMFG). Results of the analysis are confirmed by numerical measurements of the spectral radius. The application of the multi-stage acceleration methods will then formulated for use with Krylov subspace iterative methods by collapsing the sequence of operations into the form of a (left) preconditioner. The effectiveness of the preconditioners in improving the convergence rate of Krylov iterative methods is demonstrated through numerical calculations in 2D Cartesian coordinates. Only the effectiveness of iterative acceleration is investigated here. The issue of computational efficiency is left for future work. Note that the low-order linear systems associated with the acceleration methods are typically solved iteratively. The impact of that on overall computational expense is also left for future work. |

11:15 | A Locally Scaled Least Squares Finite Element Method for the Transport Equation SPEAKER: Steffen Muenzenmaier ABSTRACT. This talk examines the first-order system least squares method for the transport equation. A general Boltzmann transport equation, which describes the transport of neutral particles through a material media, can be efficiently solved by the the Discrete Ordinates Method (DOM) and Diffusion Synthetic Acceleration (DSA). The dominant computational expense in such a method is performing "transport sweeps", which consist of solving an advection-reaction equation for many different angles. These transport sweeps can be solved for example by (lumped) corner balance, SUPG or (upwinded) discontinuous Galerkin methods. Another approach undertaken in the transport community is the so called self-adjoint form, which corresponds to a least-squares finite element method and will be examined in this talk. The focus of this talk is on problems where the total cross-sections may differ by several orders of magnitude across the spatial domain. When using least squares finite element methods, a naive approach can lead to unphysical behavior, such as unphysical damping, flux-dips, and cross-stream diffusion, requiring excessively fine grids to get a reasonable solution. The first part of this talk examines how these problems can be overcome by applying a proper local scaling to the problem. This can greatly increase the accuracy of the discrete solution without introducing any significant additional computational cost. In the second part of the talk, an efficient AMG-solver for the resulting linear equation systems is presented. The dependence on the scalings introduced earlier with respect to accuracy per cost is closely examined. |

11:40 | Improved Compton Scattering Methods for Implicit Monte Carlo (IMC) Radiative Transfer Simulations SPEAKER: Kendra Keady ABSTRACT. Compton scattering is a dominant source of energy exchange between radiation and matter in many problems of importance to the astrophysics and high-energy-density physics (HEDP) communities. Thus, it is important to accurately and efficiently model Compton scattering physics in the radiative transfer codes used in these fields. However, because the Compton Scattering Kernel (CSK) depends strongly on electron temperature, photon frequency, and scattering angle, and because Compton scattering increases coupling between radiation and matter, robust numerical schemes for Compton scattering are not straightforward to develop. In this work, we discuss ongoing research to characterize and improve the Compton implementation in the Jayenne IMC software package at Los Alamos National Laboratory. These efforts can be generally divided into three categories: (i) development of an application to generate accurate multi-group scattering matrices for a given frequency group structure, (ii) investigation of variance reduction methods for Monte Carlo sampling of the Compton process, and (iii) development of efficient and stable iterative schemes for IMC with Compton scattering. We have nearly completed work on the data library and variance reduction efforts, and have subsequently re-focused our research efforts on the iterative scheme. The existing iteration scheme in Jayenne treats the Compton scattering term explicitly (i.e. using the beginning-of-timestep temperature to evaluate the scattering kernel), with a continuous-in-frequency kinematic model for the scattering physics. There are a number of issues with this approach. First of all, the existing Compton scattering process involves sampling an electron from a relativistic Maxwellian distribution, and thus introduces a significant amount of statistical noise into the material energy deposition. Second, time-step restrictions for the explicit Compton treatment become onerous at the high densities and temperatures of interest to the HEDP community. Third, existing Compton time-step controllers are based on the Compton-Fokker-Planck approximation, which is invalid in the HEDP regime. In general, any in-depth investigation of an iterative method should include a convergence analysis of the system of equations for a problem in which the exact solution is known. Inclusion of Compton scattering in the IMC equations severely complicates the coupled radiation-matter system; consequently, this precludes a full Fourier stability analysis. Work is currently underway to create a surrogate model with similar iterative characteristics that is more amenable to this analysis. To do this, we plan to simplify the electron temperature dependence of the CSK using an analytic function. If this simplification is done carefully, we should be able to investigate the convergence properties of the coupled radiation-matter system without the additional complication of the full CSK expression. We then plan to modify this surrogate model to test alternate iteration schemes for IMC with Compton scattering. |

16:30 | An Order N log N Parallel Time-Spectral Solver for Periodic and Quasi-Periodic Problems SPEAKER: unknown ABSTRACT. The time-spectral method is a fast and efficient scheme for computing the solution to temporal periodic problems. Compared to traditional backward difference implicit time-stepping methods, time-spectral methods incur significant computational savings by using a temporal Fourier representation of the time discretization and solving the periodic problem directly. In the time-spectral discretization, all the time instances are fully coupled to each other, resulting in a dense temporal discretization matrix, the evaluation of which scales as O(N^{2}), where N denotes the number of time instances. However, by implementing the time-spectral method based on the fast Fourier transform (FFT) the computational cost can be reduced to O(NlogN). Furthermore, in parallel implementations, where each time instance is assigned to an individual processor,the wall-clock time necessary to converge time-spectral solutions is reduced to O(logN) using the FFT-based approach, as opposed to the O(N) weak scaling incurred by previous discrete Fourier transform (DFT) or time-domain matrix-based parallel time-spectral solver implementations. However, as the number of time instances or the reduced frequency of the problem becomes larger, the non-linear system associated with the time-spectral method becomes larger and/or stiffer to solve. Achieving superior efficiency with the time-spectral method based on the FFT requires a robust solver strategy that solves the large non-linear space-time system rapidly. For this goal, Approximate Factorization(AF) in space and time is used as a preconditioner for a Newton-Krylov method applied directly to the complete non-linear space-time time-spectral residual. The use of the Generalized Minimal Residual Method (GMRES) Krylov method enables additional coupling between the various time instances running in parallel on different processors, resulting in faster overall convergence. While the basic time-spectral scheme is only applicable to problems that are periodic in time, quasi-periodic problems containing a slow transient in addition to a strong periodic component can also be solved with a modified version of the time-spectral discretization known as BDFTS (BDF-Time-Spectral). Similar to previous work on this topic, the algorithm is based on a collocation method which makes use of a combination of spectral and polynomial basis functions in time. The BDFTS time-discretization can be implemented as a rank-one modification of the original time-spectral matrix and can thus be solved efficiently using the Sherman-Morrison formula. Time-parallel solutions of periodic and quasi-periodic computational fluid dynamics problems are solved using the proposed approach, showing large efficiency gains over traditional implementations of the time-spectral method. |

16:55 | Parallel-in-time multigrid with adaptive spatial coarsening for the linear advection and inviscid Burgers equations SPEAKER: Robert Falgout ABSTRACT. The focus of this talk is on the application of the multigrid reduction in time (MGRIT) method to 1D hyperbolic partial differential equations. In contrast to the parabolic case, the development of robust and effective parallel-in-time solution strategies for hyperbolic problems has proven to be difficult. We consider an MGRIT approach that coarsens in both time and space. In the case of explicit time-stepping, spatial coarsening is needed to ensure stability on all levels, but it is also useful for implicit time-stepping by producing cheaper multigrid cycles. Unfortunately, uniform spatial coarsening results in extremely slow convergence when the wave speed is near zero, even if only locally. We present an adaptive spatial coarsening strategy that addresses this issue for the variable coefficient linear advection equation and the inviscid Burgers equation using first-order explicit or implicit time-stepping methods. Numerical results show that this offers significant improvements over classical coarsening, and parallel scaling tests indicate that speedups over serial time-stepping strategies are possible. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. |

17:20 | Acceleration of Time Parareal Method for Simulation of Dynamics of Power Systems SPEAKER: Srdjan Simunovic ABSTRACT. Cascading outages of electrical power systems involve complex dynamic processes and long sequences of disturbances or equipment failures towards catastrophic power blackouts if not timely prevented or mitigated. Electric utilities often apply steady state, power-flow based tools to identify potential cascading outages in planning studies but those tools are not accurate for assessing angular, frequency and voltage stability of the system, which, however, is critical to the propagation of outages. Knowing the dynamics of the grid is important for real-time situational awareness and remedial control actions and can only be achieved by the explicit dynamic power system simulation. The computational model of dynamics power system is described by a set of differential algebraic equations (DAE). The power system dynamic simulations solve DAE systems and predict transient trajectories of power system following an initial disturbance and the events that may follow as a result of it. We have developed several computational algorithms based on time parallelization (Time Parareal method) and investigated their performance for realistic electrical grid problems. The performance of Time Parareal algorithms is determined by the number of sequential, coarse step iterations. A common tradeoff in designing an efficient Time Parareal algorithm is between accuracy of the coarse solver and the number of iterations. In this work, we describe two methods for acceleration of coarse method system simulations. One method is based on Krylov subspace enhanced Time Parareal algorithm, and the other uses adaptive model reduction of the power systems network. The proposed methods were demonstrated on electrical grid models of various sizes from a single-machine-infinite-bus system up to the Eastern Interconnect. Noticeable decrease of number of iterations and improvement in method performance was observed. Notice: This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government. |

17:45 | Convergence of the multigrid reduction in time algorithm for the linear elasticity equations SPEAKER: Andreas Hessenthaler ABSTRACT. In this talk, we will discuss the application of the multigrid reduction in time (MGRIT) algorithm [1] for an important application area, linear elasticity. Previously, efforts at parallel-in-time for elasticity have experienced difficulties, for example, the beating phenomenon. We explore some solutions made possible by MGRIT (e.g., slow temporal coarsening and FCF-relaxation) and more importantly, introduce a different formulation of the problem that is more amenable to parallel-in-time methods. Using a recently developed convergence theory for MGRIT and Parareal [2], we show that the changed formulation of the problem avoids the instability issues and allows reduction of the error using two temporal grids. We then extend our approach to the multilevel case (V-cycles and FMG-cycles), where we demonstrate how slow temporal coarsening improves convergence. We present supporting numerical results showing a practical algorithm enjoying speedup benefits over the sequential algorithm. REFERENCES [1] Falgout RD, Friedhoff S, Kolev TZ, MacLachlan SP, Schroder JB. Parallel time integration with multigrid. SIAM J Sci Comput 2014; 36(6):C635–C661. [2] Dobrev VA, Kolev TZ, Petersson NA, Schroder JB. Two-level convergence theory for multigrid reduction in time (MGRIT). SIAM J Sci Comput 2017; 39(5):S501–S527. [3] Hessenthaler A, Nordsletten D, Röhrle O, Schroder JB, Falgout RD. Convergence of the multigrid reduction in time algorithm for the linear elasticity equations, Numer Linear Algebra Appl (accepted). |

18:10 | Multigrid Reduction in Time (MGRIT) for Eddy Current Problems SPEAKER: Stephanie Friedhoff ABSTRACT. Maxwell's equations are an essential tool in the numerical simulation of problems in electrical engineering. A standard approach for the simulation of electrical machines is to neglect the displacement current in Maxwell's equations, yielding the so-called magnetoquasistatic approximation or, synonymously, the eddy current problem. Typically, solution algorithms for the time-dependent eddy current problem are based on a time-marching approach, solving sequentially for one time step after the other. The computational complexity of such approaches is high, particularly if long time periods have to be considered as, for example, in the case of simulating the start-up of an electrical machine. One approach for reducing the simulation time is with parallel-in-time integration techniques as shown in [1] for the Parareal algorithm [2]. In this talk, we consider Multigrid Reduction in Time [3] for the time-parallel solution of the eddy current problem. In particular, we present numerical results for a 2D model problem of a conducting wire surrounded by a pipe. References [1] S. Schoeps, I. Niyonzima, and M. Clemens, Parallel-in-Time Simulation of Eddy Current Problems using Parareal, 21st Conference on Computation of Electromagnetic Fields (COMPUMAG 2017), Daejeon, Korea, June 2017. [2] J.-L. Lions, Y. Maday, and G. Turinici, A "parareal" in time dis- cretization of PDEs, C. R. Acad. Sci. 332 (2001), pp. 661-668. [3] R. D. Falgout, S. Friedhoff, Tz. V. Kolev, S. P. MacLachlan, and J. B. Schroder, Parallel time integration with multigrid, SIAM J. Sci. Comput. 36 (6), pp. C635-C661. |

16:30 | The EVSL package for symmetric eigenvalue problems SPEAKER: Yousef Saad ABSTRACT. A number of applications require the computation of tens of thousands of eigenvalues of very large matrices. For these problems, it is imperative to take advantage of spectrum slicing strategies whereby different `slices' of the spectrum are extracted independently. The presentation will begin by describing a general approach for spectrum slicing based on polynomial filtering. This approach can be quite efficient in the situation where the matrix-vector product operation is inexpensive and when a large number of eigenvalues is sought. Polynomial filtering can be combined with the Lanczos algorithm with and without restarts, as well as with subspace iteration. An alternative to polynomial filtering that is generating a growing interest is a class of methods that exploit filtering by rational functions. Good representatives of this general approach are the FEAST eigensolver and the Sakurai-Sugiura algorithm. Here we will argue that the standard Cauchy integral--based approach can be substantially improved upon -- especially when iterative solvers are involved. These two classes of techniques have recently been implemented in a code named EVSL (for eigenvalues slicing library) and the presentation will end with the latest updates to the code and our progress with its (forthcoming) parallel version. |

16:55 | Floating Point Analysis for Half-Precision Eigensolvers SPEAKER: Geoffrey Sanders ABSTRACT. Cutting edge GPU hardware (e.g. NVIDIA Pascal architecture) is able to do exceptionally fast half-precision floating-point arithmetic: up to 4x the speed of double precision. We explore how these technologies can be leveraged for applications involving eigensolvers. Several applications of eigensolvers do not necessarily require double-precision accuracy in the eigenresidual or eigenvector orthogonality; these include spectral graph embedding for data analysis or graph partitioning, adaptive setup for algebraic multigrid, graph visualization, and warm starting challenging discrete optimization algorithms. Some aspects of this have been previously demonstrated with respect to single precision [1]. However, using low precision arithmetic within iterative eigensolvers (such as IRAM [2], LOBPCG [3], Block-Krylov Schur [4], etc) has stability issues. The out-of-box basic linear algebra kernels these algorithms employ, sparse matrix-vector multiplication and vector orthogonalization routines, have limited stability at low precision, and the problem sizes are fundamentally limited. We discuss the numerical analysis of these routines in the context of half precision, and propose changes: (i) optimizing summation ordering in matrix-vector multiplication and (ii) enforcing numerical linear dependence without orthogonality. We validate the stability of these alternative kernels and compare with the respective standards. Then we form simple eigensolver-like iterative methods that employ these kernels to generate vectors that approximately span the sought eigenspaces in half precision arithmetic. We demonstrate the performance of these methods as standalone techniques for data analysis tasks (e.g. spectral graph clustering) and as techniques for developing warm starts for single and double precision eigensolvers. We extrapolate our results to estimate the possible speedup of these approaches on modern GPUs employing half-precision floating point arithmetic. [1] David Zhuzhunashvili and Andrew Knyazev, (2017) "Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge". arXiv:1708.07481 [2] R. B. Lehoucq & D. C. Sorensen (1996). "Deflation Techniques for an Implicitly Restarted Arnoldi Iteration". SIAM. [3] Knyazev, Andrew V. (2001). "Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method". SIAM Journal on Scientific Computing. [4] Zhou, Yunkai & Saad, Yousef (2008). "Block Krylov--Schur method for large symmetric eigenvalue problems". Numerical Algorithms. |

17:20 | Robust computation of the right-most eigenvalues SPEAKER: Fei Xue ABSTRACT. We propose a reliable numerical algorithm for computing the right-most eigenvalues of large sparse matrices, primarily motivated by the need for linear stability analysis. The key technique is to use the matrix exponential transformation $A \rightarrow exp(hA)$ with $h>0$ to transform the right-most eigenvalues of $A$ to the dominant eigenvalues of $exp(hA)$. The transformed eigenvalues can be computed in a robust manner as long as we can approximate the exponential matrix-vector multiplications $v \rightarrow exp(hA)v$ relatively accurately, no matter if $A$ is real or stable. We consider a new single-pole rational method for approximating $exp(hA)v$, and compare it with several other polynomial and rational methods. Comprehensive experiments show that the proposed strategy is more reliable and less sensitive to parameters than existing methods, and with a competitive method for approximating $exp(hA)v$, it is comparable in efficiency with the Lyapunov inverse iteration for stable matrices. |

17:45 | Unconstrained Functionals for Improved Parallel Scaling of Conjugate Gradient Based Eigensolvers SPEAKER: Andrew Canning ABSTRACT. A large number of scientific applications require the solution of large scale eigenvalue problems. For applications where some small percentage of the eigenpairs are required, rather than the full spectrum, iterative rather than direct eigensolvers are typically used. Often, the scaling of these types of iterative eigensolvers on modern massively parallel multi-core computers is limited by the reorthogonalization step, which typically uses direct diagonalization of the subspace matrix, Cholesky or QR decomposition. Unconstrained functionals are more complex than the standard functionals but avoid the reorthogonalization step so that the trial vectors are only orthogonal at the minimum. In this work, we compare the convergence and parallel scaling of the unconstrained functional formalism with that of existing approaches on realistic problems from materials science and chemistry. We present parallel scaling results on the Cray XC40 computer at NERSC, LBNL on large core counts. |

18:10 | Does machine learning need the power of iterative methods for the SVD? SPEAKER: Andreas Stathopoulos ABSTRACT. Machine learning has emerged as one of the primary clients for large scale singular value calculations. The matrices are typically very large, although nonzero sparsity may not be present always. In some cases, one or two largest or smallest singular triplets are needed, while in other cases, a very low rank approximation to the matrix is needed. What is special in these applications is that low accuracy (typically 1-2 relative digits) is sufficient. For such type of problems randomized SVD methods have risen drastically in popularity. Recently, deterministic but streaming methods, where the matrix is accessed in its entirety but only once, have also attracted attention. On the other hand, our recent work on Davidson methods for the SVD has pushed the envelope of what traditional iterative methods can do. In this talk we address the question of what problems are best suited for what type of methods, and where hybrid methods may be beneficial. |

16:30 | Augmented Lagrangian-based preconditioners for steady buoyancy driven flow SPEAKER: Guoyi Ke ABSTRACT. In this work, we apply the augmented Lagrangian (AL) approach to steady buoyancy driven flow problems. The governing equations are discretized by the stabilized finite element method using equal first order elements, and linearized by Picard iteration. Two AL preconditioners are developed based on the variable's order, specifically whether the leading variable is the velocity or the temperature variable. Correspondingly, two non-augmented Lagrangian (NAL) preconditioners are also considered for comparison. An eigenvalue analysis for these two pairs of preconditioners is conducted to predict the rate of convergence for the GMRES solver. The performance of all preconditioners is investigated on two-dimensional numerical examples. The numerical results show that the AL preconditioner pair is insensitive with respect to the mesh size, Rayleigh number, and Prandtl number in terms of GMRES iterations, resulting in a significantly more robust preconditioner pair compared to the NAL pair. Accordingly, the AL pair performs much better than the NAL pair in terms of computational time. For the AL pair, the preconditioner with velocity as the leading variable gives slightly better efficiency than the one with temperature as the leading variable. |

16:55 | Semi-smooth Newton methods for nonlinear complementarity formulation of compositional multiphase flow in porous media SPEAKER: unknown ABSTRACT. Simulating compositional multiphase flow in porous media is a challenging task, especially when phase transition is taken into account. The main problem with phase transition stems from the inconsistency of the primary variables such as phase pressure and phase saturation, i.e. they become ill-defined when a phase appears or disappears. Recently, a new approach for handling phase transition is developed by formulating the system as a nonlinear complementarity problem (NCP). Unlike the widely used primary variable switching (PVS) method which requires a drastic reduction of the time step size when a phase appears or disappears, this approach is more robust and allows for larger time steps. One way to solve the NCP system is to reformulate the inequality constraints as a non-smooth equation using a complementary function (C-function). Because of the non-smoothness of the constraint equations, a semi-smooth Newton method needs to be developed. In this work, we evaluate the performance of a semi-smooth Newton method for two C-functions: the minimum and the Fischer-Burmeister functions. We also propose a new semi-smooth Newton method which employs a smooth version of the Fischer-Burmeister function as the C-function. We show that the new method is robust and efficient for standard benchmark problems as well as realistic examples with highly heterogeneous media such as the SPE10 case. |

17:20 | Parallel Simulations of Impinging Jets in Crossflow Using PETSc SPEAKER: Nicholas Stegmeier ABSTRACT. Impinging jets in crossflow have a variety of industrial applications including gas turbines, heat dissipation, vertical take-off and landing aircraft, and many others. Direct Numerical Simulation (DNS) of turbulent impinging jets is difficult and requires high performance computing to obtain accurate solutions. We present DNS results of impinging jets from a parallel, implicit implementation of the Navier Stokes equations using the Portable, Extensible Toolkit for Scientific Computation (PETSc). Parallel decomposition is accomplished using PETSc’s Data Management for Distributed Arrays (DMDA) object. The parallel and numerical performance of the implementation is evaluated with strong scaling tests and convergence studies. |

17:45 | High Fidelity Simulations of Biofilm Flow on the Modified Cahn-Hilliard Equation. SPEAKER: Nathan McClanahan ABSTRACT. Biofilms are attached microbial communities made of many different components. Biofilms have both positive and negative effects. They can be used for bioremediation but also are the cause of the majority of chronic infections. It is for these reasons that we study them. Due to their composition being mostly water we chose to model them as a fluid. We used a visco-elastic phase separation model based on the Modified Cahn-Hilliard equation and the Flory-Huggins energy density. We will discuss the current implementation of efficient parallel simulation procedures based on parallel numerical algorithms and toolkits including PETSc (Portable, Extensible Toolkit for Scientific Computing). |