We discuss the use of segmental refinement (SR) to optimize the performance of multigrid on modern architectures. SR requires minimal communication and allows loop fusion and asynchronous processing, while adding minimal work relative to textbook-efficient multigrid methods. We continue the work of Brandt and Diskin (1994), investigate the robustness and quantitative work costs of SR, and consider extensions to multi-scale analysis methods.