Project Proposal for CS740: Computer Architecture
Fast Block Operation in DRAM
Group Members: Ningning Hu (hnn@cs.cmu.edu)
Jichuan Chang (cjc@cs.cmu.edu)
Project Home Page: http://www.cs.cmu.edu/~hnn/cs740-project.html
Project Description: In a traditional DRAM chip, an entire row of bits is
read into a latch upon a RAS signal (subsequent CAS signals are used to access
individual bits within this row). The latched row values must be written back to
the DRAM row after each access, since reading the row is a destructive
operation. We plan to modify the DRAM chip so that we can specify that the
current contents of the latch be written back to an arbitrary row in the DRAM
cell array. This would be a variation on a RAS signal that causes a write rather
than a read of a row. If we can make all of the DRAM chips in the system do this
simultaneously, we can potentially copy a whole block of bits in just two DRAM
cycles, thus improving system performance greatly. We will also consider other
kinds of operations executed directly in DRAM and try to improve their
performance. Fast block operations in DRAM can move large blocks of data quickly
from one region of memory to another, as well as quickly clear a large block of
memory (e.g., when allocating a new page). We will study how to implement these
functions in DRAM and evaluate the resulting performance improvement.
Plan of Attack: We will first add a special instruction to SimpleScalar to
implement the fast memory copy operation. Our idea is to use this instruction to
replace the inner copy loops of the most frequently used memory routines in the
C library (glibc), such as memcpy() and memset(), so as to take full
advantage of the special block operations in DRAM. When copying consecutive
memory blocks, we will not read the data into registers or the cache, but
instead write it directly back to the destination address using the DRAM
read-write mechanism. We hope this will improve the performance of typical
memory operations and thereby the performance of the whole system (the operating
system also performs memory copy operations frequently).
Schedule:
- Week 1 (Oct. 20 - Oct. 26): Read papers. Install SimpleScalar. Design the
hardware block diagram to implement the fast block DRAM operation.
- Week 2 (Oct. 27 - Nov. 2): Use a typical benchmark to measure the
percentage of block copy operations among all memory accesses.
- Week 3 (Nov. 3 - Nov. 9): Add a special instruction (for fast block copy)
into SimpleScalar's ISA.
- Week 4 (Nov. 10 - Nov. 16): Modify the assembly code of the memcpy
function in SimpleScalar's glibc.
- Week 5 (Nov. 17 - Nov. 23): Evaluate the performance of new memory
system using the same benchmark we used before.
- Week 6 (Nov. 24 - Dec. 4): Write the final project report.
The above tasks will be carried out jointly by Jichuan Chang and Ningning Hu.
Milestone: By Nov. 20, we should have finished implementing the new memory
instruction in the simulator and begun the evaluation.
Literature Search:
- David Patterson, Thomas Anderson, et al. A Case for Intelligent RAM: IRAM.
IEEE Micro, April 1997.
- Rosenblum, M., et al. The Impact of Architectural Trends on Operating
System Performance. In Proceedings of the 15th ACM Symposium on Operating
Systems Principles, Dec. 1995.
- Tulika Mitra. Dynamic Random Access Memory: A Survey. Research Proficiency
Examination Report. SUNY Stony Brook, March 1999.
- J. Carter, W. Hsieh, L. Stoller, et al. Impulse: Building a smarter memory
controller. In Proceedings of the 5th IEEE International Symposium on High
Performance Computer Architecture, Jan. 1999.
- IBM Corp. Synchronous DRAMs: The DRAM of the Future.
- Ars Technica. RAM Guide.
Resources Needed:
- SimpleScalar on Linux.
- Machines: Office PC (PIII 700 MHz)
- Benchmarks: Spec'95
Getting Started: We have already read the related papers on advanced
memory systems and have partly finished installing SimpleScalar (we ran into
some trouble installing SimpleScalar's gcc and glibc on Linux).