January 10:
Simultaneous Multithreading and the Case for Chip Multiprocessing - No
class meeting
Simultaneous multithreading: maximizing on-chip
parallelism.
Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy.
In Proceedings of the 22nd annual international symposium
on Computer architecture (ISCA '95). ACM, New York, NY, USA,
392-403.
1995.
DOI=http://dx.doi.org/10.1145/223982.224449
Chapter 5: Simultaneous Multithreading.
Multithreading Architecture,
Mario Nemirovsky, Dean M. Tullsen.
Synthesis Lectures on Computer Architecture. Morgan & Claypool.
2013. DOI=https://doi.org/10.2200/S00458ED1V01Y201212CAC021
A single-chip multiprocessor,
Lance Hammond, Basem Nayfeh, Kunle Olukotun.
Computer 30(9):79-85, September
1997. DOI=http://dx.doi.org/10.1109/2.612253
January 15: Fine-grain Multithreading - John Mellor-Crummey
Niagara: A 32-Way Multithreaded SPARC Processor,
Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun,
IEEE Micro, pp. 21-29, March-April 2005.
https://ieeexplore.ieee.org/document/1453485.
ELDORADO.
John Feo, David Harper, Simon Kahan, Petr Konecny.
In Proceedings of the 2nd Conference on Computing Frontiers (Ischia,
Italy, May 04 - 06, 2005). CF '05. ACM, New York, NY, 28-34.
January 22: Cache Coherence Protocols - I - Avery Whitaker
Chapter 6: Coherence Protocols; Chapter 7: Snooping Coherence Protocols;
Chapter 8: Directory Coherence Protocols.
A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, David A. Wood
Synthesis Lectures on Computer Architecture. Morgan & Claypool.
2011.
Tardis: Time Traveling Coherence Algorithm for Distributed Shared Memory.
Xiangyao Yu and Srinivas Devadas. 2015. In Proceedings of
the 2015 International Conference on Parallel Architecture and
Compilation (PACT '15). IEEE Computer Society, Washington, DC,
USA, 227-240. DOI=http://dx.doi.org/10.1109/PACT.2015.12
January 29: IBM Power7
IBM POWER7 multicore server processor. Sinharoy, B.; Kalla, R.; Starke, W. J.; Le, H. Q.; Cargnoni, R.; Van Norstrand, J. A.; Ronchetti, B. J.; Stuecheli, J.; Leenstra, J.; Guthrie, G. L.; Nguyen, D. Q.; Blaner, B.; Marino, C. F.; Retter, E.; Williams, P. IBM Journal of Research and Development 55(3), May-June 2011, 1:1-1:29. http://dx.doi.org/10.1147/JRD.2011.2127330
IBM POWER7 performance modeling, verification, and evaluation
Srinivas, M.; Sinharoy, B.; Eickemeyer, R. J.; Raghavan, R.; Kunkel, S.; Chen, T.; Maron, W.; Flemming, D.; Blanchard, A.; Seshadri, P.; Kellington, J. W.; Mericas, A.; Petruski, A. E.; Indukuru, V. R.; Reyes, S.
IBM Journal of Research and Development 55(3), May-June 2011, 4:1-4:19.
DOI=https://doi.org/10.1147/JRD.2011.2147170.
Chapter 4: Total Store Order and the x86 Memory Model; Chapter 5:
Relaxed Memory Consistency.
A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, David A. Wood
Synthesis Lectures on Computer Architecture. Morgan & Claypool.
2011.
February 12: Java Memory Model - Keren Zhou
The Java Memory Model, J. Manson, W. Pugh, and S. V. Adve.
In Proceedings of the Symposium on Principles of Programming Languages (POPL), January 2005.
February 14: C++ Concurrency Memory Model - Brett Zibilich
Foundations of the C++ Concurrency Memory Model,
H. Boehm and S. V. Adve.
In Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation (Tucson, AZ, USA, June 07 - 13, 2008). PLDI '08. ACM, New York, NY, 68-78. DOI=http://doi.acm.org/10.1145/1375581.1375591
February 19: Programming Models: Cilk and Cilk++ - Advait Balaji
The Implementation of the Cilk-5 Multithreaded Language
by Matteo Frigo, Charles E. Leiserson, and Keith H. Randall.
1998 ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI), Montreal, Canada, June 1998.
Reducers and other Cilk++ hyperobjects.
M. Frigo, P. Halpern, C.E. Leiserson, and S. Lewin-Berlin. In Proceedings
of the Twenty-First Annual Symposium on Parallelism in Algorithms and
Architectures (Calgary, AB, Canada, August 11 - 13, 2009). SPAA
'09. ACM, New York, NY, 79-90.
February 21: Programming Models: Thread Building Blocks and OpenMP
February 26: Performance Analysis of Multithreaded Programs
The Cilkprof Scalability Profiler.
Tao B. Schardl, Bradley C. Kuszmaul, I-Ting Angelina Lee, William
M. Leiserson, and Charles E. Leiserson. 2015.
In Proceedings of the 27th ACM Symposium on Parallelism
in Algorithms and Architectures (SPAA '15). ACM, New York, NY, USA,
89-100.
Efficient detection of determinacy races in Cilk programs.
M. Feng and C. Leiserson.
In Proceedings of the Ninth Annual ACM
Symposium on Parallel Algorithms and Architectures (Newport, Rhode
Island, United States, June 23 - 25, 1997). SPAA '97. ACM, New York,
NY, 1-11. DOI=http://doi.acm.org/10.1145/258492.258493
March 5: Data Race Detection II
Detecting data races in Cilk programs that use locks,
G. Cheng, M. Feng, C.E. Leiserson, K. Randall, and A.F. Stark.
In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms
and Architectures (Puerto Vallarta, Mexico, June 28 - July 02,
1998). SPAA '98. ACM, New York, NY,
298-309. DOI=http://doi.acm.org/10.1145/277651.277696
FastTrack: efficient and precise dynamic race detection.
Cormac Flanagan and Stephen N. Freund.
In Proceedings of the 30th ACM SIGPLAN
Conference on Programming Language Design and Implementation (PLDI
'09). ACM, New York, NY, USA, 121-133. 2009.
ThreadSanitizer: data race detection in practice.
Konstantin Serebryany and Timur Iskhodzhanov.
In Proceedings of the Workshop on
Binary Instrumentation and Applications (WBIA '09). ACM, New York, NY,
USA, 62-71. http://doi.acm.org/10.1145/1791194.1791203
March 19: Scheduling - II (Parallel Depth-First Scheduling)
Provably efficient scheduling for languages with fine-grained parallelism.
Blelloch, G. E., Gibbons, P. B., and Matias, Y. 1995.
In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms
and Architectures (Santa Barbara, California, United States, June 24 -
26, 1995). SPAA '95. ACM Press, New York, NY, 1-12.
Space-efficient implementation of nested parallelism.
Girija J. Narlikar and Guy E. Blelloch.
In Proceedings of the 6th ACM
SIGPLAN symposium on Principles and Practice of Parallel Programming
(PPOPP '97). ACM, New York, NY, USA, 25-36.
Lock cohorting: a general technique for designing NUMA
locks. David Dice, Virendra J. Marathe, and Nir Shavit. 2012. In
Proceedings of the 17th ACM SIGPLAN symposium on Principles and
Practice of Parallel Programming (PPoPP '12). ACM, New York, NY, USA,
247-256. DOI=http://doi.acm.org/10.1145/2145816.2145848
April 2: Concurrent Data Structures - I - Jonathon Anderson
April 4: Software Transactional Memory - John Mellor-Crummey
Software transactional memory for dynamic-sized data
structures,
Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer,
III. In Proceedings of the Twenty-Second Annual Symposium on
Principles of Distributed Computing (Boston, Massachusetts, July 13 -
16, 2003). PODC '03. ACM Press, New York, NY, 92-101.
Understanding Tradeoffs in Software Transactional Memory, Dice, D. and Shavit, N. 2007.
In Proceedings of the International Symposium on
Code Generation and Optimization (CGO '07), March 11 - 14, 2007.
IEEE Computer Society, Washington, DC,
21-33.
April 9: Transactional Memory - Ramla Ijaz
Transactional memory. J. Larus and C. Kozyrakis, Communications of
the ACM 51, 7 (Jul. 2008), 80-88.
Transactional
Memory: Architectural Support for Lock-free Data Structures,
Maurice Herlihy and J. Eliot B. Moss. In Proceedings of the 20th Annual
International Symposium on Computer Architecture, San Diego,
California, 1993, ACM Press, New York, NY, USA, 289-300.
ISCA most influential paper award, 2008.
April 11: Concurrent Data Structures - II - Qiao He
Transactional
data structure libraries. Alexander Spiegelman, Guy Golan-Gueta,
and Idit Keidar. In Proceedings of the 37th ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI '16). ACM, New
York, NY, USA, 682-696, 2016.
Evaluation of Blue Gene/Q hardware support for transactional
memories.
Amy Wang, Matthew Gaudet, Peng Wu, Jose Nelson Amaral, Martin Ohmacht,
Christopher Barton, Raul Silvera, and Maged Michael.
In Proceedings of the 21st international conference on Parallel
architectures and compilation techniques (PACT '12). ACM, New York,
NY, USA, 127-136, 2012.
April 18: Lock Elision, Transactional Memory, and Performance -
Hanzhang Song