Due: Fri, 27 Jan 2017 23:59:00 -0500
First, work through the Z3Py guide.
Then for each question below, you will create one Python file named “q#.py” (e.g., for Question 1 below, your answer should be in the file “q1.py”). Please comment your code sufficiently, so that I understand the reasoning behind your solution. If you choose to do any of the challenge questions, please put your solutions in “challenge#.py”. These challenge questions may also serve as inspiration for final projects.
Please submit all of your files via Blackboard as a single archive by running “tar cvzf smt.tgz *.py”.
The Jitk paper from OSDI’14
(Figure 11; Section 3.2.2) describes an incorrect optimization in the Linux kernel.
Consider unsigned integer division n / k
,
where n
is a 32-bit unsigned integer and k
is a 32-bit unsigned constant.
The incorrect optimization tries to avoid such a division on the critical path
by using two steps:
R
using compute_R
; andreciprocal_divide(n, R)
, which only uses multiplication and shift.In particular,
R
is precomputed as:
and reciprocal_divide
is implemented as extracting the high 32
bits of the 64-bit product of n
and R
:
Take 42 / 3
as an example:
precomputing R
gives ((1ULL << 32) + (3 - 1)) / 3 = 1431655766
and
reciprocal_divide(42, 1431655766)
gives 14
.
Unfortunately, this optimization is buggy on some n
and k
,
as described by commit aee636c4.
Is this optimization correct for k = 3
?
If so, prove it. If not, find a counter example.
Figure 10-45 of Hacker’s Delight
describes the following optimization that computes n / k
for the k = 3
case:
Is this optimization correct? If so, prove it. If not, find a counter example.
The paper
Division by Invariant Integers Using Multiplication
from PLDI’94 describes a correct optimization that computes n / k
for a general k
.
This is implemented by Linux kernel’s
commit 809fa972
to fix the earlier bug.
In this commit correct? Verify its correctness or find a counter example.
As a reference, see similar optimizations and proofs in CompCert.
OpenSSL is (in)famous for using extensive amounts of assembly code (some generated by hand, some by scripts) to squeeze out every bit of performance they can. Take the code for computing the SHA-256 hash function on ARM.
The SHA specification
defines a Ch
function as:
i.e., as x
AND y
exclusive-or with not x
AND z
.
If you look carefully at the OpenSSL code, you’ll see that they compute it as:
Prove that OpenSSL’s optimization is correct for all 32-bit values x, y, z
.
In the comments of your code, please explain why OpenSSL computes it this way,
rather than according to the original specification.
Another example is the SHA function Maj
which computes a bitwise majority function:
which OpenSSL computes as:
Prove that OpenSSL’s optimization is correct for all 32-bit values x, y, z
.
Finally, ARM supports an “inline barrel shifter”, meaning that the arguments to many instructions can be rotated or shifted essentially for free. As a result, when the SHA specification says to compute:
OpenSSL computes it lazily as:
Prove that OpenSSL’s optimization is correct (i.e., prove that xor commutes with rotates).
Each of these optimizations is quite clever, but someone had to come up with them. Design and implement a tool to discover optimizations like these automatically or prove that no further optimization is possible. That is, given a specification of a bitwise function to compute, and a set of instructions available on a target architecture, find the minimum number of instructions necessary to compute the function. You may want to search the Internet for existing work on “super optimizers”.
Design and implement a network diagnosis tool that resembles Anteater from SIGCOMM’11 using SMT. Your tool should work for a network as shown in Figure 3 of the paper, for example, to find loops in a network. You may design an input file format or hard-code the input in code.
Use an SMT solver to generate MD4 hash collisions. You may find the following references useful.