What We Have Accomplished So Far:
1. We have designed the DRAM hardware structure needed to support
the block operation. We consider the following three modes of block copy:
(i) Aligned row copy: copy one row to another row,
assuming that the source and destination addresses have the same row
offset;
(ii) Unaligned row copy: this mode relaxes
the strict requirement of (i) and enables systems to do a block copy
so long as the block size is big enough;
(iii) Subrow copy: when the row size is too large, we
find there are only a few block operations, so we use this mode to reduce
the operation unit so that block copy can be used more frequently. Supporting
this mode requires more complicated hardware, which makes us question whether
it will improve performance greatly; our evaluation will test this.
2. We have defined and implemented the instructions in SimpleScalar
to support the above three modes of block operation.
3. For each of the above three modes, we have modified the library calls whose
performance could be improved, mainly memcpy() and bcopy(), and
updated the SimpleScalar library accordingly.
4. Building benchmarks. Although there are few existing benchmarks
we could use to analyze our work, we did find two (even these two are not
very useful, but they are better than nothing) and successfully built them on
SimpleScalar.
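To make the three copy modes in item 1 and the memcpy() change in item 3 concrete, here is a minimal sketch in C. It is not the project's actual implementation: the row and subrow sizes, the names copy_mode, full_units, classify_copy and block_memcpy, and the reading of "big enough" as "the source range contains at least one complete row" are all assumptions made for illustration. The sketch only decides which mode a copy request would qualify for and then falls back to the standard memcpy(), since the new instructions exist only on the modified simulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry, for illustration only; the report gives no sizes. */
#define ROW_SIZE    2048u   /* bytes per DRAM row (hypothetical) */
#define SUBROW_SIZE  256u   /* bytes per subrow (hypothetical) */

typedef enum {
    COPY_BYTEWISE,       /* too small for any block mode */
    COPY_SUBROW,         /* mode (iii) */
    COPY_UNALIGNED_ROW,  /* mode (ii) */
    COPY_ALIGNED_ROW     /* mode (i) */
} copy_mode;

/* Number of complete, unit-aligned chunks of size `unit`
 * that lie inside the range [addr, addr + n). */
static size_t full_units(uintptr_t addr, size_t n, size_t unit)
{
    assert(unit != 0);
    /* Bytes between addr and the next unit boundary. */
    size_t head = (size_t)((unit - addr % unit) % unit);
    if (n < head)
        return 0;
    return (n - head) / unit;
}

/* Pick the most capable mode a request qualifies for.
 * Assumption: "big enough" means the source range holds a full row. */
copy_mode classify_copy(uintptr_t dst, uintptr_t src, size_t n)
{
    if (full_units(src, n, ROW_SIZE) > 0) {
        if (src % ROW_SIZE == dst % ROW_SIZE)
            return COPY_ALIGNED_ROW;    /* same row offset */
        return COPY_UNALIGNED_ROW;      /* offsets differ */
    }
    if (full_units(src, n, SUBROW_SIZE) > 0)
        return COPY_SUBROW;
    return COPY_BYTEWISE;
}

/* Hypothetical memcpy() replacement. On the modified simulator the
 * block modes would issue the new instruction for the row or subrow
 * part of the copy; this sketch always falls back to memcpy() so that
 * it runs on any machine, and just reports the chosen mode. */
copy_mode block_memcpy(void *dst, const void *src, size_t n)
{
    copy_mode mode = classify_copy((uintptr_t)dst, (uintptr_t)src, n);
    memcpy(dst, src, n);
    return mode;
}
```

With these assumed sizes, a 4096-byte copy between two row-aligned addresses qualifies for the aligned row mode, the same copy with a destination shifted by 16 bytes qualifies only for the unaligned row mode, and a 512-byte copy can use only the subrow mode.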
Meeting Our Milestone: Our milestone was that, by now, we should have finished implementing the new memory instruction on the simulator and be in the process of evaluation. Since we have finished modifying SimpleScalar and are currently working on our benchmarks, we have met our milestone.
Surprise:
1. Because documentation on SimpleScalar is limited, we
mistakenly spent too much time hacking the source code of the SimpleScalar
Glib. A different method that later succeeded proved this totally unnecessary.
2. Most of the popular benchmarks proved to be unsuitable for our testing.
The reasons include:
(i) There are too few block copy operations in the
benchmarks; even where some exist, the block sizes are generally too small
to use our block operation instructions, especially for the two row copy
modes. Most of the benchmarks we could get from SPEC95, as well as the networking
benchmarks, belong to this category.
(ii) The benchmarks use some library calls, for example
from math libraries and the X Window library, which are not supported by SimpleScalar.
SPLASH and some of SPEC95 belong to this category.
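One way to check reason (i) for a candidate benchmark is to record the size of every copy it performs and see how many copies are large enough for each mode. The sketch below is only an illustration of that idea, not the method we used: the thresholds and the names copy_stats, record_copy and eligible_fraction are hypothetical, and it classifies by size alone, ignoring alignment.

```c
#include <stddef.h>

/* Assumed thresholds, for illustration only. */
#define ROW_SIZE    2048u   /* bytes per DRAM row (hypothetical) */
#define SUBROW_SIZE  256u   /* bytes per subrow (hypothetical) */

/* Counters for the copies observed in one benchmark run. */
typedef struct {
    unsigned long calls;        /* total copy calls seen */
    unsigned long small;        /* below SUBROW_SIZE: no block mode */
    unsigned long subrow;       /* at least a subrow, less than a row */
    unsigned long row;          /* at least one full row */
    unsigned long long bytes;   /* total bytes copied */
} copy_stats;

/* Record one copy of n bytes; call this from a memcpy()/bcopy() wrapper. */
void record_copy(copy_stats *s, size_t n)
{
    s->calls++;
    s->bytes += n;
    if (n >= ROW_SIZE)
        s->row++;
    else if (n >= SUBROW_SIZE)
        s->subrow++;
    else
        s->small++;
}

/* Fraction of calls large enough for some block mode (0 if no calls). */
double eligible_fraction(const copy_stats *s)
{
    if (s->calls == 0)
        return 0.0;
    return (double)(s->row + s->subrow) / (double)s->calls;
}
```

A benchmark whose eligible fraction is close to zero under such a count would fall into category (i).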
Revised Schedule: We should be able to achieve our modified goal following the schedule listed in our project proposal.
Resource Needed: Currently, what we need are some good benchmarks
that SimpleScalar can support. Since time is limited and all
our efforts to find them have failed, we are currently using our own benchmarks for
the analysis.