COMP 522 Multicore Computing: Tentative Paper Schedule

COMP 522

Multicore Computing

Spring 2019

Tentative Paper Schedule

Note: you can download any of the papers found in the ACM Digital Library or IEEE Xplore from the Rice campus or using the Rice VPN.

January 8: Introduction - John Mellor-Crummey

Software and the Concurrency Revolution, Herb Sutter and James Larus in ACM Queue Special Issue on Multiprocessors, 3(7), September, 2005.

January 10: Simultaneous Multithreading and the Case for Chip Multiprocessing - No class meeting

Simultaneous multithreading: maximizing on-chip parallelism. Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy. In Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95). ACM, New York, NY, USA, 392-403. 1995. DOI=http://dx.doi.org/10.1145/223982.224449
Chapter 5: Simultaneous Multithreading. Multithreading Architecture, Mario Nemirovsky, Dean M. Tullsen. Synthesis Lectures on Computer Architecture. Morgan Claypool. 2013. DOI=https://doi.org/10.2200/S00458ED1V01Y201212CAC021
A single-chip multiprocessor, Lance Hammond, Basem Nayfeh, Kunle Olukotun. Computer 30(9):79-85, September 1997. DOI=http://dx.doi.org/10.1109/2.612253

January 15: Fine-grain Multithreading - John Mellor-Crummey

Niagara: A 32-Way Multithreaded SPARC Processor, Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun, IEEE Micro, pp. 21-29, March-April 2005. https://ieeexplore.ieee.org/document/1453485.
Chapter 2.2: Case Studies of Throughput-oriented CMPs. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency. Kunle Olukotun, Lance Hammond, James Laudon. Synthesis Lectures on Computer Architecture. Morgan Claypool. 2007.
ELDORADO. John Feo, David Harper, Simon Kahan, Petr Konecny. In Proceedings of the 2nd Conference on Computing Frontiers (Ischia, Italy, May 04 - 06, 2005). CF '05. ACM, New York, NY, 28-34.
Evaluating the Potential of Multithreaded Platforms for Irregular Scientific Computations, Jarek Nieplocha, Andres Marquez, John Feo, Daniel Chavarria-Miranda, George Chin, Chad Scherrer, Nathaniel Beagley. In Proceedings of the 4th Intl. Conference on Computing Frontiers, Ischia, Italy, 2007, pages 47 - 58.

January 17: Future Microprocessors - John Mellor-Crummey

The Future of Microprocessors. Shekhar Borkar and Andrew A. Chien. Communications of the ACM, Vol. 54 No. 5, Pages 67-77. DOI=10.1145/1941487.1941507
Looking back and looking forward: power, performance, and upheaval. Hadi Esmaeilzadeh, Ting Cao, Xi Yang, Stephen M. Blackburn, and Kathryn S. McKinley. Communications of the ACM 55, 7 (July 2012), 105-114. DOI=10.1145/2209249.2209272

January 22: Cache Coherence Protocols - I - Avery Whitaker

Chapter 6: Coherence Protocols; Chapter 7: Snooping Coherence Protocols; Chapter 8: Directory Coherence Protocols. A Primer on Memory Consistency and Cache Coherence. Daniel J. Sorin, Mark D. Hill, David A. Wood. Synthesis Lectures on Computer Architecture. Morgan Claypool. 2011.
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors, M. Zhang and K. Asanovic. In Proceedings 32nd International Symposium on Computer Architecture, Madison, WI, June 2005.

January 24: Cache Coherence Protocols - II

Elastic Cooperative Caching: An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors. Enric Herrero, Jose Gonzalez, Ramon Canal. International Symposium on Computer Architecture, Saint-Malo, France, June 2010.
Tardis: Time Traveling Coherence Algorithm for Distributed Shared Memory. Xiangyao Yu and Srinivas Devadas. 2015. In Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT '15). IEEE Computer Society, Washington, DC, USA, 227-240. DOI=http://dx.doi.org/10.1109/PACT.2015.12

January 29: IBM Power7

IBM POWER7 multicore server processor. Sinharoy, B.; Kalla, R.; Starke, W. J.; Le, H. Q.; Cargnoni, R.; Van Norstrand, J. A.; Ronchetti, B. J.; Stuecheli, J.; Leenstra, J.; Guthrie, G. L.; Nguyen, D. Q.; Blaner, B.; Marino, C. F.; Retter, E.; Williams, P. IBM Journal of Research and Development 55(3), May-June 2011, 1:1-1:29. http://dx.doi.org/10.1147/JRD.2011.2127330
IBM POWER7 performance modeling, verification, and evaluation. Srinivas, M.; Sinharoy, B.; Eickemeyer, R. J.; Raghavan, R.; Kunkel, S.; Chen, T.; Maron, W.; Flemming, D.; Blanchard, A.; Seshadri, P.; Kellington, J. W.; Mericas, A.; Petruski, A. E.; Indukuru, V. R.; Reyes, S. IBM Journal of Research and Development 55(3), May-June 2011, 4:1-4:19. DOI=https://doi.org/10.1147/JRD.2011.2147170.

January 31: Memory Consistency Models

Chapter 3: Memory Consistency Motivation. A Primer on Memory Consistency and Cache Coherence. Daniel J. Sorin, Mark D. Hill, David A. Wood. Synthesis Lectures on Computer Architecture. Morgan Claypool. 2011.
Shared Memory Consistency Models: A Tutorial, Sarita V. Adve and Kourosh Gharachorloo. Technical Report 95-7, Digital Western Research Laboratory, Palo Alto, CA.
Memory Models: A Case for Rethinking Parallel Languages and Hardware, Sarita V. Adve and Hans-J. Boehm. Communications of the ACM, Vol. 53 No. 8, Pages 90-101, August, 2010. DOI=10.1145/1787234.1787255

February 5: Hardware Memory Models

x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors. Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. Communications of the ACM 53, 7 (July 2010), 89-97. DOI=10.1145/1785414.1785443 http://doi.acm.org/10.1145/1785414.1785443
Chapter 4: Total Store Order and the x86 Memory Model; Chapter 5: Relaxed Memory Consistency. A Primer on Memory Consistency and Cache Coherence. Daniel J. Sorin, Mark D. Hill, David A. Wood. Synthesis Lectures on Computer Architecture. Morgan Claypool. 2011.

February 12: Java Memory Model - Keren Zhou

The Java Memory Model, J. Manson, W. Pugh, and S. V. Adve. In Proceedings of the Symposium on Principles of Programming Languages (PoPL), January 2005.

February 14: C++ Concurrency Memory Model - Brett Zibilich

Foundations of the C++ Concurrency Memory Model, H. Boehm and S. V. Adve. In Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation (Tucson, AZ, USA, June 07 - 13, 2008). PLDI '08. ACM, New York, NY, 68-78. DOI= http://doi.acm.org/10.1145/1375581.1375591

February 19: Programming Models: Cilk and Cilk++ - Advait Balaji

The Implementation of the Cilk-5 Multithreaded Language by Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, June 1998.
Reducers and other Cilk++ hyperobjects. M. Frigo, P. Halpern, C.E. Leiserson, and S. Lewin-Berlin. In Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures (Calgary, AB, Canada, August 11 - 13, 2009). SPAA '09. ACM, New York, NY, 79-90.

February 21: Programming Models: Threading Building Blocks and OpenMP

The Foundations for Scalable Multi-core Software in Intel Threading Building Blocks, Alexey Kukanov, Michael J. Voss. Intel Technology Journal, Volume 11, Issue 4, 2007. Note: The link takes you to the whole journal issue. You only need to read the article about Threading Building Blocks.
Experimenting with low-overhead OpenMP runtime on IBM Blue Gene/Q. A. E. Eichenberger and K. O'Brien. IBM J. Res. Dev. 57, 1 (January 2013), 91-98. DOI=10.1147/JRD.2012.2228769

February 26: Performance Analysis of Multithreaded Programs

The Cilkprof Scalability Profiler. Tao B. Schardl, Bradley C. Kuszmaul, I-Ting Angelina Lee, William M. Leiserson, and Charles E. Leiserson. 2015. In Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures (SPAA '15). ACM, New York, NY, USA, 89-100.
A new approach for performance analysis of OpenMP programs. Xu Liu, John Mellor-Crummey, and Michael Fagan. In Proceedings of the 27th ACM International conference on supercomputing (ICS '13). ACM, New York, NY, USA, 69-80.

February 28: Data Race Detection I: Locksets and Happens-before

Eraser: A Dynamic Data Race Detector for Multithreaded Programs. Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., and Anderson, T. 1997. ACM Trans. Comput. Syst. 15, 4 (Nov. 1997), 391-411. DOI= http://doi.acm.org/10.1145/265924.265927
Efficient detection of determinacy races in Cilk programs. M. Feng and C. Leiserson. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (Newport, Rhode Island, United States, June 23 - 25, 1997). SPAA '97. ACM, New York, NY, 1-11. DOI= http://doi.acm.org/10.1145/258492.258493

March 5: Data Race Detection II

Detecting data races in Cilk programs that use locks, G. Cheng, M. Feng, C.E. Leiserson, K. Randall, and A.F. Stark. In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (Puerto Vallarta, Mexico, June 28 - July 02, 1998). SPAA '98. ACM, New York, NY, 298-309. DOI=http://doi.acm.org/10.1145/277651.277696
FastTrack: efficient and precise dynamic race detection. Cormac Flanagan and Stephen N. Freund. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09). ACM, New York, NY, USA, 121-133. 2009.
ThreadSanitizer: data race detection in practice. Konstantin Serebryany and Timur Iskhodzhanov. In Proceedings of the Workshop on Binary Instrumentation and Applications (WBIA '09). ACM, New York, NY, USA, 62-71. http://doi.acm.org/10.1145/1791194.1791203

March 7: Scheduling - I (Work Stealing) - Vu Phan

Scheduling multithreaded computations by work stealing, Blumofe, Robert D. and Leiserson, Charles E. Journal of the ACM 46, 5 (Sep. 1999), 720-748. DOI=http://doi.acm.org/10.1145/324133.324234

March 19: Scheduling - II (Parallel Depth-First Scheduling)

Provably efficient scheduling for languages with fine-grained parallelism. Blelloch, G. E., Gibbons, P. B., and Matias, Y. 1995. In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (Santa Barbara, California, United States, June 24 - 26, 1995). SPAA '95. ACM Press, New York, NY, 1-12.
Space-efficient implementation of nested parallelism. Girija J. Narlikar and Guy E. Blelloch. In Proceedings of the 6th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPOPP '97). ACM, New York, NY, USA, 25-36.

March 21: Wait-free Synchronization

Wait-free synchronization, Maurice Herlihy, ACM Trans. Program. Lang. Syst. 13, 1 (Jan. 1991), 124-149.

March 26: Synchronization Primitives: Locks and Barriers - Srdan Milakovic

Algorithms for scalable synchronization on shared-memory multiprocessors, John Mellor-Crummey and Michael L. Scott, ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.

March 28: Synchronization on Multicore Processors - Siyu Zhu

Everything you always wanted to know about synchronization but were afraid to ask. Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 33-48. DOI=10.1145/2517349.2522714 http://doi.acm.org/10.1145/2517349.2522714
Lock cohorting: a general technique for designing NUMA locks. David Dice, Virendra J. Marathe, and Nir Shavit. 2012. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '12). ACM, New York, NY, USA, 247-256. DOI=10.1145/2145816.2145848 http://doi.acm.org/10.1145/2145816.2145848

April 2: Concurrent Data Structures - I - Jonathon Anderson

Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. Maged M. Michael and Michael L. Scott. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing (PODC), May 1996.
Nonblocking Concurrent Objects with Condition Synchronization, William N. Scherer III and Michael L. Scott. In Proceedings of the 18th International Symposium on Distributed Computing, Amsterdam, The Netherlands, Oct, 2004.

April 4: Software Transactional Memory - John Mellor-Crummey

Software transactional memory for dynamic-sized data structures, Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer, III. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (Boston, Massachusetts, July 13 - 16, 2003). PODC '03. ACM Press, New York, NY, 92-101.
Understanding Tradeoffs in Software Transactional Memory, Dice, D. and Shavit, N. 2007. In Proceedings of the international Symposium on Code Generation and Optimization (March 11 - 14, 2007). Code Generation and Optimization. IEEE Computer Society, Washington, DC, 21-33.

April 9: Transactional Memory - Ramla Ijaz

Transactional memory. J. Larus and C. Kozyrakis. Communications of the ACM 51, 7 (Jul. 2008), 80-88.
Transactional Memory: Architectural Support for Lock-free Data Structures, Maurice Herlihy and J. Eliot B. Moss. In Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, California, 1993, ACM Press, New York, NY, USA, 289-300. ISCA most influential paper award, 2008.

April 11: Concurrent Data Structures - II - Qiao He

Transactional data structure libraries. Alexander Spiegelman, Guy Golan-Gueta, and Idit Keidar. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16). ACM, New York, NY, USA, 682-696, 2016.
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms. Vincent Gramoli. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2015). ACM, New York, NY, USA, 1-10. 2015.

April 16: Speculative Execution and Transactional Memory on Blue Gene/Q

IBM Blue Gene/Q memory subsystem with speculative execution and transactional memory. M. Ohmacht, A. Wang, T. Gooding, B. Nathanson, I. Nair, G. Janssen, M. Schaal, B. Steinmacher-Burow. IBM Journal of Research and Development, 57(1/2), pp. 7:1-7:12, Jan.-March 2013.
Evaluation of Blue Gene/Q hardware support for transactional memories. Amy Wang, Matthew Gaudet, Peng Wu, Jose Nelson Amaral, Martin Ohmacht, Christopher Barton, Raul Silvera, and Maged Michael. In Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12). ACM, New York, NY, USA, 127-136, 2012.

April 18: Lock Elision, Transactional Memory, and Performance - Hanzhang Song

Speculative lock elision: enabling highly concurrent multithreaded execution. Ravi Rajwar and James R. Goodman. In Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture (MICRO 34). IEEE Computer Society, Washington, DC, USA, 294-305. 2001.
Intel architecture instruction set extensions programming reference. Chapter 8: Intel transactional synchronization extensions. Intel Corporation. Reference Number 319433-012A. February 2012.
Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. Takuya Nakaike, Rei Odaira, Matthew Gaudet, Maged M. Michael, and Hisanobu Tomari. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA '15). ACM, New York, NY, USA, 144-157.

Modification History

8 January 2019 Initial version