===affil2: ===firstname: Veronica ===firstname4: ===firstname3: ===lastname2: ===lastname: Mejia Bustamante ===firstname5: ===affil6: ===lastname3: ===email: vmejia@emory.edu ===lastname6: ===affil5: ===otherauths: ===lastname4: ===affil4: ===lastname7: ===affil7: ===firstname7: ===postal: Dept. of Math & CS Emory University 400 Dowman Dr., W401 Atlanta, GA 30322 ===firstname6: ===ABSTRACT: Tomosynthesis imaging provides a viable alternative to computed tomography (CT) and has attracted significant interest from the medical community as a tool for diagnostic radiology and radiation therapy. In digital tomosynthesis imaging, multiple projections of an object are acquired over a small range of incident angles in order to reconstruct a 3D representation of the object. In this paper, we discuss the implementation details of the polyenergetic digital breast tomosynthesis reconstruction algorithm on a GPU using OpenCL. We describe three implementations of the algorithm: a serial implementation, a GPU implementation threaded by functionality of the model, and a GPU fused-kernel implementation that is threaded to increase performance, throughput, and GPU utilization. We show that explicit kernel fusion achieves a significant speed-up in the reconstruction of a clinical-size patient data set, running over 100X faster than the version threaded by functionality and 200X faster than the serial approach. ===affil3: ===lastname5: ===affilother: ===title: Implementation of Iterative Solvers for the Digital Tomosynthesis Problem in GPUs ===firstname2: