Assignment: SMT

Due: Fri, 27 Jan 2017 23:59:00 -0500

First, work through the Z3Py guide.

Then for each question below, you will create one Python file named “q#.py” (e.g., for Question 1 below, your answer should be in the file “q1.py”). Please comment your code sufficiently, so that I understand the reasoning behind your solution. If you choose to do any of the challenge questions, please put your solutions in “challenge#.py”. These challenge questions may also serve as inspiration for final projects.

Please submit all of your files via Blackboard as a single archive by running “tar cvzf smt.tgz *.py”.

Division by constants

The Jitk paper from OSDI’14 (Figure 11; Section 3.2.2) describes an incorrect optimization in the Linux kernel. Consider unsigned integer division n / k, where n is a 32-bit unsigned integer and k is a 32-bit unsigned constant. The incorrect optimization tries to avoid such a division on the critical path by using two steps:

precompute a 32-bit unsigned integer R using compute_R; and
compute the result using reciprocal_divide(n, R), which only uses multiplication and shift.

In particular, R is precomputed as:

uint32_t compute_R(uint32_t k) 
{
	uint64_t k64 = (uint64_t)k;
	return (uint32_t)(((UINT64_C(1) << 32) + (k64 - 1)) / k64);
}

and reciprocal_divide is implemented as extracting the high 32 bits of the 64-bit product of n and R:

uint32_t reciprocal_divide(uint32_t n, uint32_t R)
{
	return (uint32_t)(((uint64_t)n * (uint64_t)R) >> 32);
}

Take 42 / 3 as an example: precomputing R gives ((1ULL << 32) + (3 - 1)) / 3 = 1431655766 and reciprocal_divide(42, 1431655766) gives 14. Unfortunately, this optimization is buggy on some n and k, as described by commit aee636c4.

Question

Is this optimization correct for k = 3? If so, prove it. If not, find a counter example.

Figure 10-45 of Hacker’s Delight describes the following optimization that computes n / k for the k = 3 case:

uint32_t divu3(uint32_t n)
{
	uint32_t r = (0x55555555 * n + (n >> 1) - (n >> 3)) >> 30;
	return (n - r) * 0xaaaaaaab;
}

Question

Is this optimization correct? If so, prove it. If not, find a counter example.

Challenge

The paper Division by Invariant Integers Using Multiplication from PLDI’94 describes a correct optimization that computes n / k for a general k. This is implemented by Linux kernel’s commit 809fa972 to fix the earlier bug. In this commit correct? Verify its correctness or find a counter example.

As a reference, see similar optimizations and proofs in CompCert.

Bitwise operations

OpenSSL is (in)famous for using extensive amounts of assembly code (some generated by hand, some by scripts) to squeeze out every bit of performance they can. Take the code for computing the SHA-256 hash function on ARM.

The SHA specification defines a Ch function as:

Ch(x, y, z) = (x & y) ^ (~x & z)

i.e., as x AND y exclusive-or with not x AND z. If you look carefully at the OpenSSL code, you’ll see that they compute it as:

Ch(x, y, z) = ((y ^ z) & x) ^ z

Question

Prove that OpenSSL’s optimization is correct for all 32-bit values x, y, z. In the comments of your code, please explain why OpenSSL computes it this way, rather than according to the original specification.

Another example is the SHA function Maj which computes a bitwise majority function:

Maj(x, y, z) = (x & y) ^ (x & z) ^ (y & z)

which OpenSSL computes as:

Maj(x, y, z) = ((y ^ z) & (x ^ y)) ^ y

Question

Prove that OpenSSL’s optimization is correct for all 32-bit values x, y, z.

Finally, ARM supports an “inline barrel shifter”, meaning that the arguments to many instructions can be rotated or shifted essentially for free. As a result, when the SHA specification says to compute:

Sigma1(x) = RotateRight(x, 6) ^ RotateRight(x, 11) ^ RotateRight(x, 25)

OpenSSL computes it lazily as:

Sigma1(x) = RotateRight((x ^ RotateRight(x, 5)) ^ RotateRight(x, 19), 6)

Question

Prove that OpenSSL’s optimization is correct (i.e., prove that xor commutes with rotates).

Challenge

Each of these optimizations is quite clever, but someone had to come up with them. Design and implement a tool to discover optimizations like these automatically or prove that no further optimization is possible. That is, given a specification of a bitwise function to compute, and a set of instructions available on a target architecture, find the minimum number of instructions necessary to compute the function. You may want to search the Internet for existing work on “super optimizers”.

Network diagnosis

Challenge

Design and implement a network diagnosis tool that resembles Anteater from SIGCOMM’11 using SMT. Your tool should work for a network as shown in Figure 3 of the paper, for example, to find loops in a network. You may design an input file format or hard-code the input in code.

Hash collision generation

Challenge

Use an SMT solver to generate MD4 hash collisions. You may find the following references useful.

Applications of SAT Solvers to Cryptanalysis of Hash Functions, from SAT’06.
MD5 and MD4 Collision Generators.