Over the last two decades, processor speeds have improved much faster than memory speeds. As a result, memory access delay is a major performance bottleneck in today's systems. Because compilers often fail to automatically choreograph data and computation to avoid memory access delay, we have developed a source-to-source transformation tool for this purpose. To use our tool, developers annotate their code with directives that specify how the tool should apply loop transformations to improve performance. In this paper, we describe a set of storage reduction optimizations that our tool applies automatically. These optimizations improve code performance by reducing the memory hierarchy footprint of temporary arrays. Our experiments with a numerical kernel and two weather codes show that our storage reduction optimizations amplify the benefits of loop transformations and double the performance achievable with loop transformations alone.