| | |
Tycon Mismatch was our team for the ICFP 2001 Programming Contest. Most of us are (current or former) developers for the TILT project at Carnegie Mellon University.
I believe we all put in at least one all-nighter, but many of us
were not working on the contest the whole weekend.
The contest task
was to write an optimizer for an HTML-like markup language. We had to
take a well-formed input and produce a smaller (hopefully) document
with the same meaning. This task is quite nontrival, particularly due
to tags like <PL>, which cancels out other style-setting tags.
There seem to be two reasonable strategies for this task. One is to
do local optimizations to the document that we're given, in the style
of the examples given in the task description. The other is to compute
the meaning of the document and then try to reconstruct that as
efficiently as possible.
Our original intention was to do a hybrid of these two methods. We
planned to attempt local optimizations on the original document and a
reencoded one, and after a few rounds of optimizations abandon one to
focus our efforts on whichever seemed more promising. The reencoding
optimization was risky, though, so we all wanted to work on local
optimization passes for the first few days. An hour before the contest
deadline, we decided to submit two separate entries for the two
approaches. Both are written in SML, and were compiled for speed
with MLton.
(\x.x)
This was our premiere entry. It consisted of the following
optimization passes:
WhiteSpace - removes redundant whitespace
Space pushing - rearranges spaces in a "desirable" way
Tag flattening - optimizes the internal ADT (flattens lists)
Scoped Elim - removes redundant tags (as in task description)
Useless - removes tags that set attributes that are never used
Overlap inversion - see task description
Color Nesting - see task description
Hoist - factors out commuting tags which are repeated in sequence (ie <6><B>x</B></6><5><B>y</B></5> to <B><6>x</6><5>y</5></B>)
Joe's Redund - another pass at redundant tag removal
Attrthenpl - Special case of Useless
Fusetags - removes <B></B> and </B><B>
<PL> Shortcut - see task description
Color/Size Redund - special case of Scoped Elim
Some of these passes are well suited to a tree representation
(especially Useless, Hoist, and Scoped Elim); others are much easier
in a linear representation. Consequently, we have two intermediate
languages which the optimization driver automatically converts
between.
Our entry last year was disqualified for being incorrect, so we
really wanted to have a functioning program this year. Owing to our
work in certifying compilers, we knew that mechanically checking our
work would effectively prevent mistakes. To do so, we run a semantic
validator against the intermediate expressions after each optimization
pass. Rogue, malfunctioning optimizations are dynamically disabled
(with a nod to Camls 'R Us
'99), and non-optimizations (those passes that make the tree
larger!) are ignored. Our program spends most of its time verifying
its own work. Since we have two intermediate languages, our program
actually has two separate semantic checkers, giving even more
redundancy!
The judges have begun posting preliminary results, and this entry
seems to be reasonably competitive. It was not eliminated.
It ties for first place on the trivial inputs, takes first place on
input #5 ("random") by a pretty significant margin, and lands around
8th place for the other tests.
Grab our submission for more
details. (Also included are some fun tools.)
| |
reencode
This entry attempts to build the optimal document to represent a
particular meaning, using dynamic programming. We were really
concerned about running out of time when doing this (and rightfully
so; even though the algorithm is polynomial time and space, the
constants are large), so reencode is only a loose approximation
of the optimal solution. It would have been best suited to be used as
described in our "initial plan". From our readme:
This program uses a randomized dynamic programming algorithm to recode
the input document from scratch. It is a modified version of (what is
believed to be) an O(n3) optimal solution (whose constant
is too large). It instead runs in O(n log n) time, and produces only
approximate results.
Bundled with the reencode are a handful of our other optimizations
(Whitespace, Redundant, Useless, and Hoist) to further tweak the
approximate result.
Unfortunately, this optimization ran out of time on the giant
"exhaustive" test case, and was disqualified. I'm pretty sure this was
running fine on our 5mb test files (in 3 minutes), so there must be
something special about "exhaustive" which hurt us. It didn't perform
as well as (\x.x) on the other tests, anyway.
Grab our submission for more
details.
| |
We had fun, and we'll see you next year!
|
|