This description is excerpted from a public message Rocke Verser sent to the des-coding mailing list on March 3, 1997 (here's the original).
[Darrell Kindred's bitslice DES info]
Here is a *reasonably complete* description of the algorithm currently used by DESCHALL. It omits some fine detail. The purpose is illustration! [I am not responsible for typographic errors.]
DESCHALL clients work like this:
1. DESCHALL currently tests keys in blocks of 256 pairs at a time. [512 keys are tested in the algorithm that follows.]
2. Bits 43, 46, 50, and 52, which are not used during Round 1 are 4 of the bits allowed to vary while processing each block.
3. Bits 51, 54, 58, and 60, which are not used during Round 16 are the other 4 bits allowed to vary while processing each block.
4. Holding the other 48 key bits constant, we vary the 8 bits noted in steps 2 and 3.
5. Noting that 4 of the 8 bits are not used in Round 1, we precompute and store a table of all possible Round 1 outputs. (Including the Round 1 outputs of the complements, the table contains *32* 32-bit values [...])
6. Noting that the same 4 bits are not used by S-boxes 6-7-8 in Round 2, we precompute and store a table of Round 2 partial outputs. (Once again, the table contains *32* 32-bit values [...])
7. To be perfectly clear, the stored values from steps 5 and 6 depend solely on bits 51, 54, 58, and 60. One set is computed and stored using the known plaintext one set is computed and stored using the complement of the known plaintext.
8. Now there will be two nested loops. The outer loop will vary bits 43, 46, 50, and 52. The inner loop will vary bits 51, 54, 58, 60, and C. [C is not really a bit, but denotes whether we are currently processing the assigned key or the complement of the assigned key.]
9. Before we enter the inner loop, and noting that bits 51, 54, 58, and 60 are not used in Round 16, we precompute and store the output of this round applied to the known ciphertext (and the output for the complementary ciphertext). (The table contains *2* 32-bit words.)
10. Noting that the same 4 bits are not used by S-boxes 1-4-5-6-8 in Round 15, we precompute and store the Round 15 partial output (and the partial output for the complementary ciphertext). (The table contains *2* 32-bit words.)
11. Now we enter the inner loop.
12. Fetch the stored value from the Round 16 decipher output.
13. Fetch the stored value from the Round 15 partial decipher output.
14. Complete processing of round 15. (This involves applying S-boxes 2-3-7.)
15. Perform Rounds 14 through 5.
16. Perform a partial Round 4, using *only* S-boxes 6-7-8.
17. Compare the 12-bit output of Round 4 with the precomputed and stored 12-bit partial output of Round 2. If no match, this cannot be the correct key. GOTO 24
[We get past this point, on average, once every 4096 keys. Performance is not critical.]
18. Complete Round 4, by applying S-boxes 1-2-3-4-5.
19. Perform Round 3
20. Compare the 32-bit output of Round 3 with the precomputed and stored 32-bit output of Round 1. If no match, this cannot be the correct key. GOTO 24
[We get to this point, on average, once every 2^44 keys. Performance is moot.]
21. Log the "partial-match".
22. Perform Round 2
23. Compare the 32-bit output of Round 2 with the 32-bit known plaintext. If no match, GOTO 24
[We get to this point, on average, once every 2^64 times or if we find the correct key. Log the probable key and terminate.]
24. If C=0 (meaning we just tested the assigned key), set C=1. Set to use the "complementary" table entries stored in steps 5, 6, 9, and 10. GOTO 12.
25. Set C=0. [We just checked a complementary key. Now we'll adjust the key schedule and check another assigned key.]
26. Perform key schedule update. This involves toggling exactly 1 bit of the DES key, and applying this toggle to selected DES Key Schedules entries and S-boxes. [The full key schedule was generated somewhat prior to step #1.]
27. If we toggled one of the bits 51, 54, 58, 60, we iterate the inner loop. GOTO 12.
28. If we toggled one of the bits 43, 46, 50, 52, we iterate the outer loop. GOTO 9.
29. Otherwise, we have toggled 8 bits, and tested the assigned key and the complements of the assigned keys. [We have tested 512 keys in this block.]
30. Getting started on the next block is left as an exercise for the reader. Not much more is required that toggling exactly one bit in the key schedule, and proceeding to step 1.
Perfomance tally:
Round 1 - computed 32 times for each 512 keys
Round 2 - 3/8 of a round is computed 32 times for each 512 keys
Round 16 - computed 2 times for each 32 keys
Round 15 - 5/8 of a round is computed 2 times for each 32 keys
Round 15 - 3/8 of a round is computed for each key
Round 14 - computed once per key
Round 13 - computed once per key
Round 12 - computed once per key
Round 11 - computed once per key
Round 10 - computed once per key
Round 9 - computed once per key
Round 8 - computed once per key
Round 7 - computed once per key
Round 6 - computed once per key
Round 5 - computed once per key
Round 4 - 5/8 of a round is computed for each key
Round 4 - 3/8 of a round is computed about 1 in 4096 keys
Round 3 - computed about 1 in 4096 keys
Round 2 - computed about 1 in 2^44 keys
Total number of rounds performed per key, on average: 11.2
Note that much time can be saved in key schedule maintenance, which I glossed over for the sake of clarity. Here are the rules that DESCHALL uses for when and how much key schedule processing to perform.
U1. The key schedule for Round 1 and for Round 2 S6-7-8 is updated as the table of stored values are built in steps 5 and 6, above. (One set of updates per 32 keys, on average.)
U2. The key schedule for Round 2, S6-7-8 is also updated just prior to step 22. (Once per 2^44 keys, on average.)
U3. Note that it is not necessary to maintain the key schedules for Rounds 1 and 2 in the inner or outer loops.
U4. The key schedule for Round 3 and for Round 4 S6-7-8 is only updated immediately prior to step 18. (Once per 4096 keys, on average.)
U5. Note that it is not necessary to maintain the key schedule for Round 3 in the inner or outer loops.
U6. Note that it is not necessary to maintain the key schedule for Round 4 S6-7-8 in the inner or outer loops.
U7. Note that when changing from the assigned key to the complement of the assigned key, no key schedule maintenance is required.
U8. Note that the way we structured our loops, no key schedule maintenance for Round 16 or for Round 15 S-1-4-5-6-8 is performed in the inner loop.
U9. The inner loop toggles DES keys bits 51, 54, 58, and 60. After applying the rules U1-U7, note which bit requires the fewest changes to the key schedules. Among these 4 bits, this bit should be varied most rapidly. The bit that requires the most updates to the key schedules should be varied least rapidly. [With this particular incarnation of DESCHALL, bit 58 requires 8 bits be toggled in various key schedule entries. Bits 51 and 54 require 10 toggles each in the various key schedule entries. Bit 60 requires 11 toggles to the key schedule entries.]