The purpose of Lab 1 was to help you understand the relationship between high-level C code and the machine code that actually runs. Most students worked very hard, and although a few did exceptional work, in general I was disappointed with the overall quality of the labs. Here are some general comments to keep in mind for future labs. There is also a summary of the grades at the end.
This was a difficult lab, but few students came to me (or the other teaching staff) for help. This was probably because many students started too late. The lesson is: start your labs early, and don't be afraid to ask for help. If the office hours don't fit your schedule, demand an appointment!

Number of nops | Time | Note |
---|---|---|
0 | 10 | Mips anomaly discussed in the newsgroup |
1 | 9 | |
2 | 10 | here is where linearity starts |
3 | 11 | |
4 | 12 | |

(a) Original C code

    int r, ci;
    for (r = 0; r < M->nrow; r++) {
        z[r] = 0.0;
        for (ci = M->rstart[r]; ci < M->rstart[r+1]; ci++) {
            z[r] += M->val[ci] * x[M->cindex[ci]];
        }
    }

Inner-loop machine code for (a):

    0x4c: lw v1,16(t0)
    0x50: sll v0,a3,2
    0x54: addu v0,v0,v1
    0x58: lw v1,12(t0)
    0x5c: lw v0,0(v0)
    0x60: addu a0,a0,v1
    0x64: sll v0,v0,3
    0x68: addu v0,v0,a1
    0x6c: lwc1 $f2,0(a0)
    0x70: lwc1 $f3,4(a0)
    0x74: lwc1 $f0,0(v0)
    0x78: lwc1 $f1,4(v0)
    0x7c: nop
    0x80: mul.d $f2,$f2,$f0
    0x84: lwc1 $f0,0(t1)
    0x88: lwc1 $f1,4(t1)
    0x8c: nop
    0x90: add.d $f0,$f0,$f2
    0x94: swc1 $f0,0(t1)
    0x98: swc1 $f1,4(t1)
    0x9c: lw v0,20(t0)
    0xa0: nop
    0xa4: addu v0,t2,v0
    0xa8: lw v0,4(v0)
    0xac: addiu a3,a3,1
    0xb0: slt v0,a3,v0
    0xb4: bne v0,zero,0x4c
    0xb8: sll a0,a3,3

(b) Promoting z[r] to a register

    for (r = 0; r < M->nrow; r++) {
        ftype_t temp = 0.0;
        for (ci = M->rstart[r]; ci < M->rstart[r+1]; ci++) {
            temp += M->val[ci] * x[M->cindex[ci]];
        }
        z[r] = temp;
    }

Inner-loop machine code for (b):

    0x5c: lw v0,0(v1)
    0x60: lwc1 $f2,0(t0)
    0x64: lwc1 $f3,4(t0)
    0x68: sll v0,v0,3
    0x6c: addu v0,v0,a1
    0x70: lwc1 $f0,0(v0)
    0x74: lwc1 $f1,4(v0)
    0x78: nop
    0x7c: mul.d $f2,$f2,$f0
    0x80: addiu t0,t0,8
    0x84: addiu v1,v1,4
    0x88: addiu a3,a3,1
    0x8c: slt v0,a3,t2
    0x90: bne v0,zero,0x5c
    0x94: add.d $f4,$f4,$f2

(c) Eliminating an index update

    int r;
    ftype_t *val = M->val;
    int *cindex_start = M->cindex;
    int *cindex = M->cindex;
    int *rnstart = M->rstart+1;
    for (r = 0; r < M->nrow; r++) {
        ftype_t temp = 0.0;
        int *cindex_end = cindex_start + *(rnstart++);
        while (cindex < cindex_end) {
            temp += *(val++) * x[*cindex++];
        }
        z[r] = temp;
    }

Inner-loop machine code for (c):

    0x48: lw v0,0(v1)
    0x4c: lwc1 $f2,0(t0)
    0x50: lwc1 $f3,4(t0)
    0x54: sll v0,v0,3
    0x58: addu v0,v0,a1
    0x5c: lwc1 $f0,0(v0)
    0x60: lwc1 $f1,4(v0)
    0x64: nop
    0x68: mul.d $f2,$f2,$f0
    0x6c: addiu v1,v1,4
    0x70: addiu t0,t0,8
    0x74: sltu v0,v1,a3
    0x78: bne v0,zero,0x48
    0x7c: add.d $f4,$f4,$f2

Figure 1: Optimizing C code
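
If you want to check for yourself that the three versions in Figure 1 compute the same result, the following is a minimal sketch that wraps each one in a function. The `sparse_t` struct and the `smvp_a`, `smvp_b`, and `smvp_c` names are hypothetical and chosen only for illustration; only the field names (`nrow`, `val`, `cindex`, `rstart`) come from the lab code. It assumes `ftype_t` is `double`, which the `mul.d` and `add.d` instructions in the listings suggest. Note that version (c) walks `val` and `cindex` from the start of the arrays, so it relies on `rstart[0]` being 0; the sketch makes that assumption explicit with an `assert`.

```c
#include <assert.h>

/* Assumption: ftype_t is double (the listings use mul.d and add.d). */
typedef double ftype_t;

/* Hypothetical compressed-row layout; only the field names are from the lab. */
typedef struct {
    int nrow;       /* number of rows */
    ftype_t *val;   /* nonzero values, stored row by row */
    int *cindex;    /* column index of each nonzero */
    int *rstart;    /* rstart[r] = index of row r's first nonzero (nrow+1 entries) */
} sparse_t;

/* (a) Original: z[r] is read and written in memory on every inner iteration. */
void smvp_a(const sparse_t *M, const ftype_t *x, ftype_t *z)
{
    int r, ci;
    for (r = 0; r < M->nrow; r++) {
        z[r] = 0.0;
        for (ci = M->rstart[r]; ci < M->rstart[r+1]; ci++) {
            z[r] += M->val[ci] * x[M->cindex[ci]];
        }
    }
}

/* (b) Accumulate in a local, which the compiler can keep in a register. */
void smvp_b(const sparse_t *M, const ftype_t *x, ftype_t *z)
{
    int r, ci;
    for (r = 0; r < M->nrow; r++) {
        ftype_t temp = 0.0;
        for (ci = M->rstart[r]; ci < M->rstart[r+1]; ci++) {
            temp += M->val[ci] * x[M->cindex[ci]];
        }
        z[r] = temp;
    }
}

/* (c) Walk val and cindex with pointers; the loop test compares pointers,
 * so no separate index counter has to be updated. */
void smvp_c(const sparse_t *M, const ftype_t *x, ftype_t *z)
{
    int r;
    ftype_t *val = M->val;
    int *cindex_start = M->cindex;
    int *cindex = M->cindex;
    int *rnstart = M->rstart + 1;

    assert(M->rstart[0] == 0);  /* the pointer walk starts at element 0 */
    for (r = 0; r < M->nrow; r++) {
        ftype_t temp = 0.0;
        int *cindex_end = cindex_start + *(rnstart++);
        while (cindex < cindex_end) {
            temp += *(val++) * x[*cindex++];
        }
        z[r] = temp;
    }
}
```

Calling all three on the same small matrix and comparing the output vectors element by element is a quick sanity check before you start timing anything. The inner-loop listings in Figure 1 run to 28, 15, and 14 instructions for (a), (b), and (c) respectively, so most of the gain comes from step (b).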
Figure 2: Lab 1 grades