Performance Evaluation of Meld
Developing distributed and parallel programs requires efficient simulation tools that can be used at every step of the development cycle. Simulation is particularly useful when the execution platform does not exist yet, or when it cannot scale up to the required number of processing units, as is the case in the Claytronics project.
Performance bottlenecks are still difficult to detect. On the one hand, tracing an execution on the Blinky Blocks hardware faces several problems: scarce memory prevents performance data from being logged on the blocks, and sending traces to external storage through a central point is likely to perturb the execution. In addition, a dedicated API would need to be developed, as standard performance tuning and tracing frameworks cannot be used on the Blinky Blocks hardware.
On the other hand, the Blinky Blocks simulator and DPRSim actually execute the Meld programs but simulate communications and the physical environment. As a result, the time a Meld program takes in simulation does not match its real execution time, since embedded processors are much slower than general-purpose CPUs.
Therefore, performance metrics cannot be derived from the simulation. Moreover, the control flow of a simulated application can diverge from that of a real execution: for example, during a simulation a message can be read before it would have been received in a real execution.
This part of the Claytronics project aims to add performance metrics to the Blinky Blocks simulator as well as DPRSim, to ease performance tuning of the compiler and to improve the virtual machine as well as Meld programs. We are using a two-step approach: first, every Meld operation is benchmarked on the Blinky Blocks hardware platform; second, these performance data are fed back into the simulators. There, they will be used both to adjust the control flow of the simulated program so that it matches a real execution and to report performance metrics of the Meld application.
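
The two-step approach can be illustrated with a minimal Python sketch. It is not the code of the Blinky Blocks simulator or DPRSim; the Block class, the operation names, the cost values and the constant link latency are all hypothetical and chosen only for illustration. Each block keeps a simulated clock that advances by the benchmarked cost of every operation it executes, and a message becomes readable only once the receiver's clock has reached the message's arrival time.

```python
import heapq
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical per-operation costs in microseconds. Real values would come
# from benchmarking each Meld operation on the Blinky Blocks hardware.
OP_COST_US = {"match": 12.0, "derive": 30.0, "send": 45.0}
LINK_LATENCY_US = 200.0  # assumed constant neighbor-to-neighbor latency


@dataclass
class Block:
    name: str
    clock_us: float = 0.0                       # simulated local time
    inbox: list = field(default_factory=list)   # heap of (arrival_us, payload)
    op_counts: Counter = field(default_factory=Counter)

    def execute(self, op: str) -> None:
        """Charge the benchmarked cost of one operation to the local clock."""
        self.clock_us += OP_COST_US[op]
        self.op_counts[op] += 1

    def send(self, other: "Block", payload: str) -> None:
        """Send a message, stamped with its simulated arrival time."""
        self.execute("send")
        arrival_us = self.clock_us + LINK_LATENCY_US
        heapq.heappush(other.inbox, (arrival_us, payload))

    def receive(self):
        """Return the earliest message that has already arrived, else None."""
        if self.inbox and self.inbox[0][0] <= self.clock_us:
            return heapq.heappop(self.inbox)[1]
        return None

    def report(self) -> dict:
        """Simulated time spent per operation type, in microseconds."""
        return {op: n * OP_COST_US[op] for op, n in self.op_counts.items()}


a, b = Block("a"), Block("b")
a.send(b, "hello")
early = b.receive()          # b's clock has not reached the arrival time yet
for _ in range(9):
    b.execute("derive")      # local work advances b's simulated clock
late = b.receive()           # the message is now visible to b
print(early, late, b.report())
```

In this sketch the first call to receive returns nothing because, on the simulated timeline, the message has not yet arrived, which is the control-flow correction described above. The per-operation totals returned by report stand in for the performance metrics the simulators would expose.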