 
 
 
 
 
   
The time-dependent Schrödinger equation (TDSE) describes the quantum dynamical nature of molecular processes. Simulations are, however, computationally very demanding due to the curse of dimensionality. With our MPI and OpenMP parallelized code, HAParaNDA (cf.[1]), we are able to accurately solve the full Schrödinger equation, currently in up to five dimensions using a medium-size cluster.
The  -dimensional TDSE reads
-dimensional TDSE reads
|  | 
 denotes the wave packet and the potential
 denotes the wave packet and the potential  describes interactions within the system. Currently, we only consider
localized molecules on sufficiently large
describes interactions within the system. Currently, we only consider
localized molecules on sufficiently large  -orthotopes and close the
system by periodic boundary conditions. We introduce a spatial grid and
compute the derivatives with an
-orthotopes and close the
system by periodic boundary conditions. We introduce a spatial grid and
compute the derivatives with an  th order finite difference (FD) method
(8th order in the reported experiments). For a time-independent
potential, the solution of the semi-discrete system is given by
th order finite difference (FD) method
(8th order in the reported experiments). For a time-independent
potential, the solution of the semi-discrete system is given by
|  | 
 is the spatially discretized differential operator and
 is the spatially discretized differential operator and  is
the discrete wave packet. For time-dependent potentials, the exponential
form can still be exploited on small time intervals but the Hamiltonian
has to be time-averaged using e.g. the Magnus expansion (see [3] for
details). The matrix exponential can be computed efficiently in serial
using the Lanczos algorithm. In a parallel environment, the fact that two
inner products have to be computed in each iteration hampers scalability.
Kim and Chronopoulos [2] discuss two modifications of the Lanczos
algorithm with reduced communication. Rearranging the computations makes
it possible to compute both inner products at the same point, reducing
the number of synchronizations to one per iteration. In the
 is
the discrete wave packet. For time-dependent potentials, the exponential
form can still be exploited on small time intervals but the Hamiltonian
has to be time-averaged using e.g. the Magnus expansion (see [3] for
details). The matrix exponential can be computed efficiently in serial
using the Lanczos algorithm. In a parallel environment, the fact that two
inner products have to be computed in each iteration hampers scalability.
Kim and Chronopoulos [2] discuss two modifications of the Lanczos
algorithm with reduced communication. Rearranging the computations makes
it possible to compute both inner products at the same point, reducing
the number of synchronizations to one per iteration. In the  -step
Lanczos method, blocks of
-step
Lanczos method, blocks of  consecutive steps are executed with only
one synchronization step required for each block. We have transferred
these ideas to the Lanczos propagator method. A second hassle with using
the Lanczos procedure is that the number of iterations has to be chosen
with care. Numerical round-off errors cause instabilities. We therefore
propose to choose the number of steps adaptively and to make sure that
the iterations are stopped once the residual is small enough. For a
discussion of error estimation in the propagation of the TDSE, we refer
to [4].
 consecutive steps are executed with only
one synchronization step required for each block. We have transferred
these ideas to the Lanczos propagator method. A second hassle with using
the Lanczos procedure is that the number of iterations has to be chosen
with care. Numerical round-off errors cause instabilities. We therefore
propose to choose the number of steps adaptively and to make sure that
the iterations are stopped once the residual is small enough. For a
discussion of error estimation in the propagation of the TDSE, we refer
to [4].
In order to demonstrate the performance of a massively parallel
simulation of the TDSE based on a FD-Lanczos discretization, we have
conducted several simulations on a medium-size cluster, consisting of 316
nodes. Each node is equipped with dual quad-core Intel Nehalem CPU and 24
GB of DRAM. The nodes are interconnected by an InfiniBand fabric. For the
experiments, we have considered the harmonic oscillator and the
Henon-Heiles potential. The analytical solution is known for the
harmonic oscillator, so we are able to verify the correctness and the
accuracy of our numerical results. Table ![[*]](file:/usr/share/latex2html/icons/crossref.png) shows the
scalability of the results for the 4D harmonic oscillator. Comparing the
variants of the Lanczos algorithm, we see that the performance can indeed
be improved by reducing the communication. We are currently working on an
implementation of the
 shows the
scalability of the results for the 4D harmonic oscillator. Comparing the
variants of the Lanczos algorithm, we see that the performance can indeed
be improved by reducing the communication. We are currently working on an
implementation of the  -step method and hope to further improve the
scalability in that way. The Henon-Heiles potential is a common test
case for high-dimensional simulations. We have simulated this problem in
five dimensions on a grid with
-step method and hope to further improve the
scalability in that way. The Henon-Heiles potential is a common test
case for high-dimensional simulations. We have simulated this problem in
five dimensions on a grid with  points over
 points over  time steps. We
used the standard Lanczos algorithm and have chosen the size of the
Krylov space adaptively. The simulation was performed on 1024 cores in 19
hours.
 time steps. We
used the standard Lanczos algorithm and have chosen the size of the
Krylov space adaptively. The simulation was performed on 1024 cores in 19
hours.
| # Cores | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 
| L1 | 3.92 | 4.31 | 4.37 | 4.55 | 4.72 | 4.87 | 5.16 | 
| L2 | 3.79 | 4.06 | 4.26 | 4.33 | 4.43 | 4.72 | 4.83 | 
| [1] | M. Gustafsson and S. Holmgren. An implementation framework for solving high-dimensional PDEs on massively parallel computers, to appear in: Proceedings of ENUMATH 2009, Uppsala, Sweden | 
| [2] | S.K. Kim and A.T. Chronopoulos. A class of Lanczos-like algorithms implemented on parallel computers, Parallel Comput. 17 (1991) | 
| [3] | K. Kormann, S. Holmgren, and H.O. Karlsson. Accurate time propagation for the Schrödinger equation with an explicitly time-dependent Hamiltonian, J. Chem. Phys. 128 (2008) | 
| [4] | K. Kormann and A. Nissen. Error Control for Simulations of a Dissociative Quantum System, to appear in: Proceedings of ENUMATH 2009, Uppsala, Sweden | 
 
 
 
 
