John Mellor-Crummey's Publications by Year
[Note: If you are interested in a copy of a paper or publication that
is not available for download below, email me and I'll be happy to
provide it.]
[ 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 ]
2019
Keren Zhou and John Mellor-Crummey. 2019. A tool for performance analysis of GPU-accelerated applications. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2019). IEEE Press, Piscataway, NJ, USA, 282-282. [Poster Abstract PDF, Poster PDF]
2018
-
Lai Wei and John Mellor-Crummey. 2018. Automated Analysis of Time Series Data to Understand Parallel Program Behaviors. In Proceedings of the 2018 International Conference on Supercomputing (ICS '18). ACM, New York, NY, USA, 240-251. DOI: https://doi.org/10.1145/3205289.3205308 [PDF]
-
Yizi Gu and John Mellor-Crummey. 2018. Dynamic data race detection for OpenMP programs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). IEEE Press, Piscataway, NJ, USA, Article 61, 12 pages. [PDF]
2017
-
Scott Parker, John Mellor-Crummey, Dong H Ahn, Heike Jagode, Holger Brunst, Sameer Shende, Allen D Malony, David Lecomber, John V DelSignore Jr, Ronny Tschüter, Ralph Castain, Kevin Harms, Philip Carns, Ray Loy, Kalyan Kumaran. Performance Analysis and Debugging Tools at Scale. Chapter 2. In Exascale Scientific Applications: Scalability and Performance Portability. Editors: Tjerk P. Straatsma, Katerina B. Antypas, Timothy J. Williams.
2016
-
Karthik Murthy, Sri Raj Paul, Kuldeep S. Meel, Tiago Cogumbreiro, and John Mellor-Crummey.
Design and Verification of Distributed Phasers.
In Proceedings of the 22nd International European Conference
on Parallel and Distributed Computing (Euro-Par 2016).
Grenoble, France, August 22-26, 2016.
-
Sri Raj Paul, John Mellor-Crummey, Mauricio Araya-Polo, Detlef Hohl.
Performance Analysis and Optimization of a Hybrid Distributed Reverse Time Migration Application.
In Proceedings of the International Conference on Computational Science (ICCS 2016).
San Diego, CA, USA, June 6-8, 2016, Pages 8-18.
[DOI]
-
Chaoran Yang, John Mellor-Crummey. A Practical Solution to the Cactus Stack Problem.
In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '16).
Asilomar State Beach, CA, USA, July 11-13, 2016.
[PDF]
-
Chaoran Yang and John Mellor-Crummey.
A Wait-Free Queue as Fast as Fetch-and-Add.
In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). March, 2016.
[DOI]
-
Milind Chabbi and John Mellor-Crummey.
Contention-Conscious, Locality-Preserving Locks.
In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). March, 2016.
[DOI]
2015
-
Sri Raj Paul, John Mellor-Crummey, Mauricio Araya, Detlef Hohl.
Performance Analysis and Optimization of a Hybrid Distributed
Reverse Time Migration Application. International Conference for High Performance Computing,
Networking, Storage, and Analysis (SC 15).
Austin, TX. November, 2015. (Poster)
[PDF]
-
Sri Raj Paul, Karthik Murthy, Kuldeep S. Meel and John Mellor-Crummey. Distributed Phasers. In Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques. October 18-21, San Francisco, CA, USA. (Poster)
[PDF]
-
Karthik Murthy and John Mellor-Crummey.
Communication Avoiding Algorithms: Analysis and Code Generation for Parallel
Systems. In Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques. October 18-21, San Francisco, CA, USA.
[DOI]
-
Milind Chabbi, Mike Fagan, John Mellor-Crummey. High Performance Locks for Multi-level NUMA Systems.
In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15). February, 2015.
[DOI]
-
Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John Mellor-Crummey, Costin Iancu.
Barrier Elision for Production Parallel Programs.
In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15). February, 2015.
[DOI]
2014
-
Xu Liu, Kamal Sharma, and John Mellor-Crummey.
ArrayTool: a lightweight profiler to guide array regrouping. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT '14). August, 2014. ACM, New York, NY, USA, 405-416. [DOI]
-
Seema Hiranandani, Ken Kennedy, John M. Mellor-Crummey, Ajay Sethi.
Compilation Techniques for Block-cyclic Distributions.
International Conference on Supercomputing (ICS) 25th Anniversary Volume, June 2014. ACM, New York, NY, USA.
[DOI]
-
John Mellor-Crummey, Seema Hiranandani, Ajay Sethi.
Author's Retrospective: Compilation Techniques for Block-cyclic Distributions.
International Conference on Supercomputing (ICS) 25th Anniversary Volume, June 2014. ACM, New York, NY, USA.
[DOI]
-
Rishi Surendran, Raghavan Raman, Swarat Chaudhuri, John Mellor-Crummey, and Vivek Sarkar. 2014.
Test Driven Repair of Data Races in Structured Parallel Programs.
In Proceedings of the ACM SIGPLAN Symposium on Programming Language Design and Implementation (PLDI '14). ACM, New York, NY, USA.
[DOI]
-
Milind Chabbi, Xu Liu, and John Mellor-Crummey. 2014. Call Paths for Pin Tools. In Proceedings of the Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14). ACM, New York, NY, USA, Page 76, 11 pages. [DOI]
-
Xu Liu and John Mellor-Crummey. 2014. A tool to analyze the performance of multithreaded programs on NUMA architectures. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '14). ACM, New York, NY, USA, 259-272. [DOI]
-
Chaoran Yang, Wesley Bland, John Mellor-Crummey, and Pavan Balaji. 2014. Portable, MPI-interoperable Coarray Fortran. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '14). ACM, New York, NY, USA, 81-92. [DOI]
2013
-
Xu Liu and John Mellor-Crummey. A Data-centric Profiler for Parallel Programs.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), November 17-22, 2013, Denver, Colorado, USA. [DOI]
-
Milind Chabbi, Karthik Murthy, Michael Fagan, and John Mellor-Crummey. Effective Sampling-Driven Performance Tools for GPU-Accelerated Supercomputers.
Proceedings of the
International Conference for High Performance Computing, Networking, Storage and Analysis (SC13). Denver, CO.
[DOI]
-
Alexandre Eichenberger, John Mellor-Crummey, Martin Schulz, Michael Wong, Nawal Copty, Robert Dietrich, Xu Liu, Eugene Loh and Daniel Lorenz. OMPT: OpenMP Tools Application Programming Interfaces for Performance Analysis. 2013 International Workshop on OpenMP (IWOMP'13),
Canberra, Australia, September 16-18, 2013.
[DOI]
-
Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Milind Chabbi, Karthik Murthy, Pavan Balaji, Keith R. Bisset, James Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, and Rajeev Thakur. 2013. On the efficacy of GPU-integrated MPI for scientific applications. In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing (HPDC '13). ACM, New York, NY, USA, 191-202.
[DOI]
-
Xu Liu, John Mellor-Crummey, and Michael Fagan. 2013. A new approach for performance analysis of OpenMP programs. In Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13). ACM, New York, NY, USA, 69-80. [DOI]
-
Xu Liu and John Mellor-Crummey. Pinpointing Data Locality Bottlenecks with Low Overhead. 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'13), April 21-23, Austin, Texas, USA.
[DOI]
2012
-
Milind Chabbi and John Mellor-Crummey. DeadSpy: A Tool to Pinpoint Program Inefficiencies.
In CGO '12: Proc. of the 2012 International Symposium on Code Generation and Optimization, 2012.
[DOI]
2011
-
Nathan R. Tallent and John Mellor-Crummey. Using Sampling to Understand Parallel Program Performance. In Proc. of the 5th Parallel Tools Workshop. September 26-27, 2011, Dresden, Germany.
[DOI]
-
Nathan R. Tallent, John M. Mellor-Crummey, Michael Franco, Reed Landrum, and Laksono Adhianto. Scalable fine-grained call path tracing. In ICS '11: Proc. of the 25th International Conference on Supercomputing, 63-74, ACM, New York, NY, USA, June 2011.
[DOI]
[PDF]
-
Xu Liu and John Mellor-Crummey. Pinpointing data locality problems using data-centric analysis.
In CGO '11: Proc. of the 2011 International Symposium on Code Generation and Optimization, 2011. [DOI]
2010
-
Nathan R. Tallent, Laksono Adhianto, and John M. Mellor-Crummey. Scalable identification of load imbalance in parallel executions using call path profiles.
In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10).
IEEE Computer Society, Washington, DC, USA, 1-11, 2010.
[DOI]
[PDF]
-
Laksono Adhianto, John Mellor-Crummey, and Nathan R. Tallent. Effectively presenting call path profiles of application performance.
In PSTI 2010: Workshop on Parallel Software Tools and Tool Infrastructures, in conjunction with the 2010 International Conference on Parallel Processing, 2010.
[PDF]
-
Laksono Adhianto, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R. Tallent.
HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685-701, 2010.
[DOI]
[PDF]
-
Nathan R. Tallent, John M. Mellor-Crummey, and Allan Porterfield. Analyzing lock contention in multithreaded applications. In PPoPP '10: Proc. of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 269-280, New York, NY, USA, 2010. ACM.
[DOI]
[PDF]
2009
-
Nathan R. Tallent and John M. Mellor-Crummey. Identifying performance bottlenecks in work-stealing computations. Computer, 42(12):44-50, 2009.
[DOI]
-
John Mellor-Crummey, Laksono Adhianto, and William Scherer III.
A New Vision for Coarray Fortran. The Third Conference on Partitioned Global
Address Space Programming Models. Ashburn, VA, October 5-8, 2009.
[PDF]
-
Nathan R. Tallent, John M. Mellor-Crummey, Laksono Adhianto, Michael W. Fagan, and Mark Krentel. Diagnosing performance bottlenecks in emerging petascale applications. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (Portland, Oregon, November 14 - 20, 2009). SC '09. ACM, New York, NY, 1-11. [DOI]
[PDF]
-
Robert Fowler, Laksono Adhianto, Bronis de Supinski, Michael Fagan, Todd Gamblin, Mark Krentel, John Mellor-Crummey, Martin Schulz, and Nathan Tallent. Frontiers of performance analysis on leadership-class systems. Journal of Physics: Conference Series, 180:012041 (6pp), 2009. [DOI] [PDF]
-
Nathan Tallent, John Mellor-Crummey, and Michael Fagan.
Binary analysis for measurement and attribution
of program performance.
In Proceedings of the ACM SIGPLAN Symposium on Programming Language
Design and Implementation (Dublin, Ireland, June 15 - 21, 2009). PLDI '09. ACM, New York, NY, 441-452.
Distinguished paper award.
[DOI]
[PDF]
-
Nathan Tallent and John Mellor-Crummey.
Effective performance measurement and analysis of multithreaded
applications.
In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming (PPOPP), Raleigh, North Carolina, USA, February 2009.
[DOI]
[PDF]
-
J. H. Chen, A. Choudhary, B. de Supinski, M. DeVries, E. R. Hawkes,
S. Klasky, W. K. Liao, K. L. Ma, J. Mellor-Crummey, N. Podhorszki,
R. Sankaran, S. Shende, and C. S. Yoo. Terascale direct numerical
simulations of turbulent combustion using S3D.
Computational Science & Discovery, 2 015001, 2009.
[PDF]
2008
-
Gabriel Marin, Guohua Jin, and John Mellor-Crummey.
Managing locality in grand challenge applications:
a case study of the gyrokinetic toroidal code.
In Proceedings of SciDAC 2008, Journal of Physics:
Conference Series 125. IOP Publishing, August 2008.
[PDF]
-
Nathan Tallent, John Mellor-Crummey, Laksono Adhianto, Michael Fagan,
and Mark Krentel. HPCToolkit: performance tools for scientific computing.
In Proceedings of SciDAC 2008, Journal of Physics: Conference Series 125.
IOP Publishing, August 2008.
[PDF]
-
Gabriel Marin and John Mellor-Crummey.
Pinpointing and exploiting opportunities for enhancing data reuse.
In Proceedings of 2008 IEEE International Symposium on Performance Analysis
of Systems and Software, pages 115-126, Austin, Texas, April 2008.
[PDF]
-
Laksono Adhianto, Michael Fagan, Mark Krentel, Gabriel Marin,
John Mellor-Crummey, Nathan Tallent.
HPCToolkit: Performance Measurement and Analysis for Supercomputers with
Node-level Parallelism. Supercomputing 2008 Workshop on
Node Level Parallelism for Large Scale Supercomputers, November 2008.
Austin, TX.
[PDF]
-
Nathan Tallent and John Mellor-Crummey. A methodology for accurate,
effective and scalable performance analysis of application programs.
In TIMERS 2008, Austin, Texas, April 2008.
[PDF]
2007
-
John Mellor-Crummey, Peter Beckman, Jack Dongarra, Ken Kennedy,
Barton Miller, Katherine Yelick.
Software for leadership-class computing. SciDAC Review.
Fall 2007, pages 36-45. [PDF]
-
John Mellor-Crummey. Harnessing the power of emerging petascale
platforms. SciDAC 2007. Journal of Physics: Conference Series 78 (2007) 012048
[DOI]
[abstract]
[PDF]
-
Cristian Coarfa, John Mellor-Crummey, Nathan Froyd, and Yuri Dotsenko.
Scalability analysis of SPMD codes using expectations. In Proceedings
of the 21st Annual International Conference on Supercomputing (Seattle,
Washington, June 17 - 21, 2007). ICS '07. ACM Press, New York, NY,
13-22.
[DOI] [PDF]
-
John Mellor-Crummey, Nathan Tallent, Michael Fagan, Jan E. Odegard.
Application performance profiling on the Cray XD1 using HPCToolkit.
49th Cray User Group Conference, Seattle, WA, May 7-10, 2007.
[PDF]
-
Gabriel Marin and John Mellor-Crummey. Application Insight Through
Performance Modeling. In Proceedings of the 26th IEEE International
Performance Computing and Communications Conference (IPCCC'07), New
Orleans, LA, April 2007.
[PDF]
2006
-
Nathan Froyd, Nathan Tallent, John Mellor-Crummey, and Robert Fowler. "Call path profiling for unmodified, optimized binaries." Proceedings of the GCC Developers' Summit, June 2006, pages 21-35.
[Proceedings PDF]
-
Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey. Experiences with Sweep3D implementations in Co-array Fortran.
The Journal of Supercomputing, 36(2):101-121, May 2006. (Special Issue on Computer Science Research
Supporting High-Performance Applications, Rod Oldehoeft, guest editor.) Springer, Netherlands.
[DOI]
-
Apan Qasem, Ken Kennedy and John Mellor-Crummey. Automatic tuning of whole applications using direct search and a
performance-based transformation system. The Journal of Supercomputing, 36(2):183-196, May 2006.
(Special Issue on Computer Science Research Supporting High-Performance Applications, Rod Oldehoeft, guest editor.) Springer, Netherlands.
[DOI]
-
Yuri Dotsenko, Cristian Coarfa, Luay Nakhleh, John Mellor-Crummey, and Usman Roshan. "PRec-I-DCM3: A parallel framework for fast and accurate large scale phylogeny reconstruction." International Journal on Bioinformatics Research and Applications, 2(4) (2006): 407-419.
[abstract]
[PDF]
2005
-
L. Nakhleh, G. Jin, F. Zhao, and J. Mellor-Crummey, "Reconstructing Phylogenetic Networks Using Maximum Parsimony." Submitted.
-
C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, L. Nakhleh, and U. Roshan, "PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction." In Proceedings of the 1st IEEE Workshop on High Performance Computing in Medicine and Biology (HiPCoMB 2005),
Fukuoka, Japan, July 20-22, 2005. Best paper award.
-
Nathan Froyd, John Mellor-Crummey and Robert Fowler.
Low-overhead Call Path Profiling of Unmodified, Optimized Code.
In Proceedings of the 19th ACM International Conference on Supercomputing.
Cambridge, MA, June 20-22, 2005.
[PDF]
-
Daniel Chavarria-Miranda and John Mellor-Crummey.
Effective Communication Coalescing for Data Parallel Applications.
In Proceedings of the ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming (PPOPP).
Chicago, IL, June 15-17, 2005.
-
Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey, Francois Cantonnet, Tarek El-Ghazawi, Ashrujit Mohanty, Yiyi Yao and Daniel Chavarria-Miranda.
An Evaluation of Global Address Space Languages: CoArray Fortran and Unified Parallel C.
In Proceedings of the ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming (PPOPP).
Chicago, IL, June 15-17, 2005.
-
Daniel Chavarria-Miranda, Guohua Jin, and John Mellor-Crummey.
COTS Clusters vs. the Earth Simulator: An Application Study Using IMPACT-3D.
In Proceedings of the 2005 International Parallel and Distributed Processing Symposium.
Denver, CO, April 2005.
-
Guohua Jin and John Mellor-Crummey.
SFCGen: A Framework for Efficient Generation of Multi-dimensional Space-filling Curves by Recursion. ACM Transactions on Mathematical Software 31(1):120-148, March, 2005.
[abstract]
-
Ken Kennedy, Bradley Broom, Arun Chauhan, Robert Fowler, John Garvin,
Charles Koelbel, Cheryl McCosh, and John Mellor-Crummey.
Telescoping languages: A System for Automatic Generation of Domain Languages.
Proceedings of the IEEE, 93(2):387-408, February 2005.
[abstract]
2004
-
Apan Qasem, Ken Kennedy, John Mellor-Crummey.
Automatic Tuning of Whole Applications Using Direct Search and
a Performance-based Transformation System.
In Proceedings of the Los Alamos Computer Science Institute 5th Annual Symposium (LACSI 2004), Santa Fe, New Mexico, October 2004.
[abstract]
[PDF]
-
Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey. Experiences with Sweep3D Implementations in Co-array Fortran. In Proceedings of the Los Alamos Computer Science Institute 5th Annual Symposium (LACSI 2004), Santa Fe, New Mexico, October 2004.
[abstract]
[PDF]
-
Yuri Dotsenko, Cristian Coarfa, John Mellor-Crummey, Daniel Chavarria-Miranda. Experiences with Co-Array Fortran on Hardware Shared Memory Platforms. In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2004), West Lafayette, Indiana, September 2004.
[abstract]
[PDF]
-
Yuri Dotsenko, Cristian Coarfa, and John Mellor-Crummey.
A Multi-platform Co-Array Fortran Compiler.
In Proceedings of the 13th International Conference on
Parallel Architecture and Compilation Techniques (PACT 2004),
Antibes Juan-les-Pins, France, September, 2004.
[abstract]
[PDF]
-
Gabriel Marin and John Mellor-Crummey.
Cross Architecture Performance Predictions for Scientific Applications
Using Parameterized Models. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, New York, NY, June 2004.
[abstract]
[PDF]
-
John Mellor-Crummey and John Garvin.
Optimizing Sparse Matrix Vector Multiply using Unroll-and-jam.
International Journal of High Performance Computing Applications,
18(2), Summer 2004.
[PDF]
2003
-
Guohua Jin and John Mellor-Crummey.
On Reducing Storage Requirement of Scientific Applications.
Proceedings of the Los Alamos Computer Science Institute 4th Annual Symposium,
October, 2003,
Santa Fe, NM. Published on CD-ROM.
[abstract]
[PDF]
-
Alain Darte, John Mellor-Crummey, Robert Fowler and Daniel
Chavarria-Miranda.
Generalized multipartitioning of multi-dimensional arrays for parallelizing
line-sweep computations.
Journal of Parallel and Distributed Computing
63(9), September 2003, Pages 887-911.
[abstract]
[PDF]
-
Cristian Coarfa, Yuri Dotsenko, Jason Eckhardt and John Mellor-Crummey.
Co-Array Fortran Performance and Potential: An NPB Experimental Study.
Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing,
College Station, TX, Oct 2003.
[abstract]
[PDF]
-
Apan Qasem, Guohua Jin and John Mellor-Crummey.
Improving Performance with Integrated Program Transformations.
Technical Report TR03-419, Dept. of Computer Science, Rice University,
October, 2003.
[abstract]
[PDF]
-
Daniel Chavarria-Miranda and John Mellor-Crummey.
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications.
The Journal of Instruction-Level Parallelism, vol. 5, February 2003
(http://www.jilp.org/vol5).
Special issue with selected papers from:
The Eleventh International Conference on Parallel Architectures and Compilation Techniques, September 2002.
Guest Editors: Erik Altman and Sally McKee. [PDF].
2002
-
John Mellor-Crummey and John Garvin.
Optimizing Sparse Matrix Vector Multiply using Unroll-and-jam.
Proceedings of the Los Alamos Computer Science Institute 3rd Annual Symposium,
October, 2002, Santa Fe, NM. Published on CD-ROM.
[abstract]
[PDF]
-
Cristian Coarfa, Yuri Dotsenko, Daniel Chavarria-Miranda and John Mellor-Crummey.
An Emerging Co-Array Fortran Compiler.
Extended poster abstract.
Proceedings of the Los Alamos Computer Science Institute 3rd Annual Symposium,
October, 2002, Santa Fe, NM. Published on CD-ROM.
-
Robert Fowler, John Mellor-Crummey, Guohua Jin and Apan Qasem.
A Source-to-source Loop Transformation Tool.
Extended poster abstract.
Proceedings of the Los Alamos Computer Science Institute 3rd Annual Symposium,
October, 2002, Santa Fe, NM. Published on CD-ROM.
-
Gabriel Marin and John Mellor-Crummey.
Building parameterized performance models for black-box applications.
Extended poster abstract.
Proceedings of the Los Alamos Computer Science Institute 3rd Annual Symposium,
October, 2002, Santa Fe, NM. Published on CD-ROM.
[PDF]
-
Daniel Chavarria-Miranda and John Mellor-Crummey.
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications.
Proceedings of PACT'02: Eleventh International Conference on Parallel
Architectures and Compilation Techniques, September 2002,
Charlottesville, VA.
Best Student Paper
[PDF]
-
Guohua Jin and John Mellor-Crummey.
Experiences Tuning SMG98: a Semicoarsening Multigrid Benchmark based
on the hypre Library.
Proceedings of the International Conference on Supercomputing,
June 22-26, 2002, New York, New York, USA.
[abstract]
[PDF]
-
John Mellor-Crummey, Vikram Adve, Bradley Broom, Daniel
Chavarria-Miranda, Robert Fowler, Guohua Jin, Ken Kennedy and Qing Yi. Advanced
optimization strategies in the Rice dHPF compiler.
Concurrency and Computation: Practice
and Experience. 14:741-767, 2002.
[abstract],
[ps],
[PDF]
-
John Mellor-Crummey, Robert Fowler, Gabriel Marin and Nathan Tallent.
HPCView: A tool for top-down analysis of node performance.
The Journal of Supercomputing, 23:81-104, 2002.
Special Issue with
selected papers from the 2001 Los Alamos Computer Science Institute
Symposium.
[PDF]
-
Ken Kennedy, Mark Mazina, John Mellor-Crummey, Keith Cooper, Linda
Torczon, Fran Berman, Andrew Chien, Holly Dail, Otto Sievert,
Dave Angulo, Ian Foster, Dennis Gannon, Lennart Johnsson, Carl
Kesselman, Ruth Aydt, Daniel Reed, Jack Dongarra, Sathish Vadhiyar,
and Rich Wolski.
Toward a Framework for Preparing and
Executing Adaptive Grid Programs.
Proceedings of NSF Next Generation Systems Program Workshop (International Parallel and Distributed Processing
Symposium 2002), Fort Lauderdale, FL, April 2002.
[abstract]
[PDF]
-
Daniel Chavarria-Miranda, Alain Darte,
Robert Fowler, and John
Mellor-Crummey. Generalized Multipartitioning
for Multi-dimensional Arrays. In
Proceedings of International Parallel and Distributed Processing
Symposium, Fort Lauderdale, FL, April 2002. Selected as a Best
Paper.
[abstract],
[ps],
[PDF]
2001
-
Guohua Jin, John Mellor-Crummey and Robert Fowler. Increasing temporal locality with skewing and
recursive blocking. In Proceedings of SC2001, Denver, CO, Nov
2001. Distributed on CD-ROM.
[abstract],
[ps],
[PDF]
-
Daniel Chavarria-Miranda, Alain Darte, Robert Fowler and John Mellor-Crummey.
On efficient parallelization of line-sweep computations. Research Report
2001-45, Laboratoire de l'Informatique du Parallélisme, École
Normale Supérieure de Lyon, November 2001.
[abstract],
[ps],
[PDF]
-
Francine Berman, Andrew Chien, Keith Cooper, Jack Dongarra, Ian Foster,
Dennis Gannon, Lennart Johnsson, Ken Kennedy, Carl Kesselman, John Mellor-Crummey,
Dan Reed, Linda Torczon and Rich Wolski. The GrADS project: Software support
for high-level grid application development.
International Journal of
High Performance Computing Applications, 15(4), Winter 2001.
[abstract],
[PDF]
-
Ken Kennedy, Bradley Broom, Keith Cooper, Jack Dongarra, Rob Fowler, Dennis
Gannon, Lennart Johnsson, John Mellor-Crummey and Linda Torczon. Telescoping
languages: A strategy for automatic generation of scientific problem-solving
systems from annotated libraries.
Journal of Parallel and Distributed
Computing, 61(12):1803-1826, Dec 2001.
[abstract],
[PDF]
-
John Mellor-Crummey, Robert Fowler and Gabriel Marin. HPCView: A tool
for top-down analysis of node performance. In Proceedings of the Los
Alamos Computer Science Institute 2nd Annual Symposium, Santa Fe,
NM, October 2001. Distributed on CD-ROM.
[abstract],
[ps],
[PDF]
-
Alain Darte, John Mellor-Crummey, Robert Fowler and Daniel Chavarria-Miranda. Generalized multipartitioning. In Proceedings of the Los
Alamos Computer Science Institute 2nd Annual Symposium, Santa Fe,
NM, October 2001. Distributed on CD-ROM.
[abstract],
[ps],
[PDF]
-
Daniel Chavarria-Miranda, John Mellor-Crummey and Trushar Sarang.
Data-parallel compiler support for multipartitioning. In European Conference
on Parallel Computing (Euro-Par), Manchester, United Kingdom, August
2001.
[abstract],
[ps],
[PDF]
-
Alain Darte, Daniel Chavarria-Miranda,
Robert Fowler and John Mellor-Crummey.
On efficient parallelization of line-sweep computations. In
9th Workshop on Compilers for Parallel Computers, Edinburgh, Scotland,
June 2001.
[abstract],
[ps],
[PDF]
-
John Mellor-Crummey, Robert Fowler and David Whalley. Tools for application-oriented
performance tuning. In Proceedings of the 15th ACM International Conference
on Supercomputing, Sorrento, Italy, June 2001.
[abstract],
[PDF]
-
John Mellor-Crummey, David Whalley and Ken Kennedy. Improving memory hierarchy
performance for irregular applications using data and computation reorderings.
International
Journal of Parallel Programming, 29(3), June 2001.
[abstract],
[PDF]
-
John Mellor-Crummey, Robert Fowler and David Whalley. On providing useful
information for analyzing and tuning applications. In Joint International
Conference on Measurement & Modeling of Computer Systems, Cambridge,
MA, June 2001. (Poster abstract.)
[abstract],
[ps]
2000
-
Bradley Broom, Daniel Chavarria-Miranda, Guohua Jin, Rob Fowler,
Ken Kennedy and John Mellor-Crummey. Overpartitioning with the Rice dHPF
compiler. In Proceedings of the 4th Annual HPF User Group meeting,
Tokyo, Japan, October 2000.
-
Kai Zhang, John Mellor-Crummey and Robert Fowler. Compilation and runtime
optimizations for software distributed shared memory. In Proceedings
of the Fifth Workshop on Languages, Compilers and Runtime Systems for
Scalable Computers, Lecture Notes in Computer Science 1915, pages 182-191,
Rochester, NY, May 2000. Springer-Verlag.
-
Daniel Chavarria-Miranda and John Mellor-Crummey. Towards compiler
support for scalable parallelism. In Proceedings of the Fifth Workshop
on Languages, Compilers and Runtime Systems for Scalable Computers,
Lecture Notes in Computer Science 1915, pages 272-284, Rochester, NY, May
2000. Springer-Verlag.
1999
-
John Mellor-Crummey, David Whalley and Ken Kennedy. Improving memory hierarchy
performance for irregular applications. In Proceedings of the 13th ACM
International Conference on Supercomputing, pages 425-433, Rhodes,
Greece, June 1999.
-
Collin McCurdy and John Mellor-Crummey. An evaluation of computing paradigms
for n-body simulations on distributed memory architectures. In Proceedings
of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming, May 1999.
1998
-
John Mellor-Crummey and Vikram Adve. Simplifying control flow in compiler-generated
parallel code.
International Journal of Parallel Programming, 26(5),
1998.
-
Vikram Adve, Guohua Jin, John Mellor-Crummey and Qing Yi. High Performance
Fortran Compilation Techniques for Parallelizing Scientific Codes. In Proceedings
of SC98: High Performance Computing and Networking, Orlando, FL, Nov
1998.
-
Vikram Adve and John Mellor-Crummey. Using Integer Sets for Data-Parallel
Program Analysis and Optimization. In Proceedings of the SIGPLAN '98
Conference on Programming Language Design and Implementation, Montreal,
Canada, June 1998.
-
Bo Lu and John Mellor-Crummey. Compiler optimization of implicit reductions
for distributed memory multiprocessors. In Proceedings of the 12th International
Parallel Processing Symposium, Orlando, FL, March 1998.
1997
-
John Mellor-Crummey and Vikram Adve. Simplifying control flow in compiler-generated
parallel code (extended abstract). In Proceedings of the Tenth International
Workshop on Languages and Compilers for Parallel Computing, Lecture
Notes in Computer Science 1366, Minneapolis, MN, August 1997. Springer-Verlag.
A full version of this paper was selected for publication in a special
issue of the International Journal of Parallel Programming.
-
Vikram Adve and John Mellor-Crummey. "Advanced Code Generation for High
Performance Fortran."
Languages, Compilation Techniques and Run Time
Systems for Scalable Parallel Systems (Recent Advances and Future Perspectives)
(D. P. Agrawal and S. Pande, editors), Lecture Notes in Computer Science.
Springer-Verlag, Berlin, 1997.
-
G. Roth, J. Mellor-Crummey, K. Kennedy and R. G. Brickner. Compiling stencils
in High Performance Fortran. In Proceedings of SC'97: High Performance
Networking and Computing, San Jose, CA, November 1997.
1995
-
Vikram Adve, Jhy-Chun Wang, John Mellor-Crummey, Dan Reed, Mark Anderson,
and Ken Kennedy. An integrated compilation and performance analysis environment
for data parallel programs. In Proceedings of Supercomputing '95,
San Diego, CA, November 1995.
-
K. Kennedy, J. Mellor-Crummey and G. Roth. Optimizing Fortran 90 shift
operations on distributed-memory multicomputers. In Languages and Compilers
for Parallel Computing, Eighth International Workshop, Columbus, OH,
August 1995. Springer-Verlag.
-
Vikram Adve, Jhy-Chun Wang, John Mellor-Crummey, Dan Reed, Mark Anderson,
and Ken Kennedy. Integrating compilation and performance analysis for data
parallel programs. In Proceedings of the Workshop on Debugging and Performance
Tuning of Parallel Computing Systems, Chatham, MA, October 1995.
1994
-
Vikram Adve, Alan Carle, Elana Granston, Seema Hiranandani, Ken Kennedy,
Charles Koelbel, Ulrich Kremer, John Mellor-Crummey and Scott Warren.
Requirements for data parallel programming environments.
IEEE Transactions
on Parallel and Distributed Technology, 2(3):48-58, Fall 1994.
-
John Mellor-Crummey, Vikram Adve and Charles Koelbel. The compiler's role
in analysis and tuning of data parallel programs. In Proceedings of
the Workshop on Environments and Tools for Parallel and Scientific Computing,
pages 211-220, Townsend, TN, May 1994.
-
Seema Hiranandani, Ken Kennedy, John Mellor-Crummey and Ajay Sethi. Compilation
techniques for block-cyclic distributions. In Proc. of the 1994 International
Conference on Supercomputing, Manchester, England, July 1994.
-
Michael L. Scott and John M. Mellor-Crummey. Fast, contention-free combining
tree barriers for shared-memory multiprocessors.
International Journal
of Parallel Programming, 22(4), 1994.
1993
-
Mary Hall, John Mellor-Crummey, Alan Carle and Rene Rodriguez. FIAT: a
framework for interprocedural analysis and transformation. In Proc.
Workshop on Compilers for Parallel Processing, Portland, OR, August
1993.
-
Keith D. Cooper, Mary W. Hall, Robert T. Hood, Ken Kennedy, Kathryn S.
McKinley, John M. Mellor-Crummey, Linda Torczon and Scott K. Warren. The
ParaScope parallel programming environment. In Proceedings of the IEEE,
volume 81, February 1993. Submitted by invitation.
-
John M. Mellor-Crummey. Compile-time support for efficient data race detection
in shared-memory parallel programs. In Proc. ACM/ONR Workshop on Parallel
and Distributed Debugging, pages 129-139, San Diego, CA, May 1993.
Available as SIGPLAN Notices, 28(12), December 1993.
-
Uli Kremer, John Mellor-Crummey, Ken Kennedy and Alan Carle. Automatic
data layout for distributed-memory machines in the D programming environment.
In Proc. of AP'93 International Workshop on Automatic Distributed Memory Parallelization,
Automatic Data Distribution and Automatic Parallel Performance Prediction,
pages 108-123, Saarbrücken, Germany, March 1993.
-
Seema Hiranandani, Ken Kennedy, John Mellor-Crummey and Ajay Sethi. Advanced
compilation techniques for FORTRAN D. Technical Report CRPC-TR93338, Center
for Research on Parallel Computation, Rice University, October 1993.
1992
-
S. L. Lin, J. Mellor-Crummey, B. M. Pettitt and G. N. Phillips, Jr. Molecular
dynamics on a distributed-memory multiprocessor.
Journal of Computational Chemistry, 13(8):1022-1035, 1992.
-
John M. Mellor-Crummey. Compile-time support for efficient data race detection
in shared-memory parallel programs. In Proc. Supercomputer Debugging
Workshop '92, Dallas, TX, October 1992.
-
Ervan Darnell, John Mellor-Crummey and Ken Kennedy. Automatic software
cache coherence through vectorization. In Proceedings of 1992 International
Conference on Supercomputing, July 1992.
1991
-
John M. Mellor-Crummey. On-the-fly detection of data races for programs
with nested fork-join parallelism. In Proc. of Supercomputing '91,
pages 24-33, Albuquerque, NM, November 1991.
[abstract]
[PDF]
-
John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization
on shared-memory multiprocessors.
ACM Transactions on Computer Systems, 9(1):21-65, February 1991.
[PDF]
-
John M. Mellor-Crummey and Michael L. Scott. Synchronization without contention.
In Proc. of the 4th International Conference on Architectural Support
for Programming Languages and Operating Systems, pages 269-278, Palo
Alto, CA, April 1991.
-
John M. Mellor-Crummey and Michael L. Scott. Scalable reader-writer synchronization
for shared-memory multiprocessors. In Proc. of the 3rd ACM Symposium
on Principles and Practice of Parallel Programming, pages 106-113,
Williamsburg, VA, April 1991.
1990
-
Robert Hood, Ken Kennedy and John Mellor-Crummey. Parallel program debugging
with on-the-fly anomaly detection. In Supercomputing 1990, pages
74-81, November 1990.
-
Thomas J. LeBlanc, John M. Mellor-Crummey and Robert J. Fowler. Analyzing
parallel program executions using multiple views.
Journal of Parallel and Distributed Computing, 9:203-217, June 1990.
1989
-
Thomas J. LeBlanc, John M. Mellor-Crummey, Neal M. Gafter, Lawrence A.
Crowl and Peter C. Dibble. The Elmwood multiprocessor operating system.
Software--Practice and Experience, 19(11):1029-1056, November 1989.
-
John M. Mellor-Crummey and Thomas J. LeBlanc. A software instruction counter.
In Proc. of the 3rd International Conference on Architectural Support
for Programming Languages and Operating Systems, pages 78-86, Boston,
MA, April 1989.
-
R. J. Fowler, T. J. LeBlanc and J. M. Mellor-Crummey. An integrated approach
to parallel program debugging and performance analysis on large-scale multiprocessors.
In Proc. of the SIGPLAN/SIGOPS Workshop on Parallel and Distributed
Debugging, pages 163-173, Madison, WI, May 1988. Special issue of SIGPLAN
Notices, 24(1), Jan. 1989.
-
John M. Mellor-Crummey. Debugging and Analysis of Large-Scale Parallel
Programs. PhD thesis, Department of Computer Science, University of
Rochester, September 1989. Available as Technical Report URCS-TR-312.
1988
-
John M. Mellor-Crummey. Experiences with the BBN Butterfly. In Proc.
of the 1988 COMPCON, pages 101-104, San Francisco, CA, February 1988.
IEEE. Invited paper.
1987
-
Thomas J. LeBlanc and John M. Mellor-Crummey. Debugging parallel programs
with Instant Replay.
IEEE Transactions on Computers, C-36(4):471-482, April 1987.
-
John M. Mellor-Crummey and Thomas J. LeBlanc. Instrumentation for distributed
systems. In Proc. of the Workshop on Instrumentation for Distributed
Computing Systems, pages 16-18, Sanibel Island, FL, January 1987.
-
John M. Mellor-Crummey. Concurrent queues: Practical fetch-and-phi algorithms.
Technical Report 229, Department of Computer Science, University of Rochester,
November 1987.
[abstract]
[PDF]