Traditional algorithms for First-Principles Molecular Dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their complexity. To address this issue, we are developing a truly scalable FPMD algorithm, based on density functional theory (DFT), with complexity and controllable accuracy. The computational model uses a general non-orthogonal orbital formulation for the energy functional in the DFT equations, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic wave functions are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to atoms on processors, with a wall-clock time of minute per molecular dynamics time step.