Creating Proceedings from PostScript Files

As Publications Chair for SOCS-3, I had to assemble the proceedings. As raw input I had a series of PostScript files with unnumbered pages. I had to produce a single document containing all the files, with numbered pages and a table of contents. I was too lazy to add an author index, although it wouldn't have been that hard.

This task may be easy if you have the right tools, but I tried to do it using only freely available software (not only free, but also available at no cost). It took a few attempts to get a decent result, so this page shares my experience.

The basic trick was to create a LaTeX document that includes the pages of all the papers as figures. For this, I had to slice each document into pages and transform each page into encapsulated PostScript (EPS). Here are the steps I went through; the solution may be suboptimal, but it works.

First, make sure that all documents are conforming PostScript. Some of mine crashed Ghostscript; others crashed other PostScript utilities. For a few very stubborn documents generated with Microsoft tools, I had to ask the authors for a better version, since those documents even crashed the printer. Apparently the PostScript produced under Windows 95 is better than that produced under Windows NT.
Make sure you have the latest versions of the psutils tools, Ghostscript, and dvips. Older versions did not give usable results.
If you have Adobe tools and non-conforming PostScript, try distilling the files into .pdf and converting them back to PostScript with the distill and acroexch tools (available at CMU on Solaris).
Otherwise, there are the ps2pdf and pdf2ps tools, but they are wrappers around Ghostscript, so they inherit its weaknesses.
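A quick way to triage a pile of submissions is to look at the first bytes of each file. DSC-conforming PostScript starts with a `%!PS-Adobe-` version line. One common problem with driver-generated files (an assumption here; the page does not say what was actually wrong with its files) is a PJL job header, introduced by the `ESC%-12345X` escape, placed before the PostScript proper. The function names below are hypothetical; this is a sketch, not a validator.

```python
"""Rough triage of PostScript submissions (a sketch, not a full DSC checker)."""
import sys

DSC_MAGIC = b"%!PS-Adobe-"  # first line of a DSC-conforming file
UEL = b"\x1b%-12345X"       # PJL "Universal Exit Language" escape some drivers prepend


def classify(data: bytes) -> str:
    """Return 'dsc', 'pjl-wrapped', 'ps' or 'unknown' for the start of a file."""
    if data.startswith(DSC_MAGIC):
        return "dsc"
    if data.startswith(UEL):
        return "pjl-wrapped"
    if data.startswith(b"%!"):
        return "ps"  # PostScript, but without a DSC version line
    return "unknown"


def strip_pjl(data: bytes) -> bytes:
    """Drop everything before the first '%!' (e.g. a leading PJL job header)."""
    start = data.find(b"%!")
    return data[start:] if start >= 0 else data


if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            print(name, classify(f.read(4096)))
```

Files reported as anything other than `dsc` are good candidates for the repair steps described next, or for a polite request to the authors.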
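The PDF round trip with the Ghostscript scripts can be sketched as below. `ps2pdf` and `pdf2ps` both take an input and an output file name; the `-clean.ps` output name is a hypothetical convention. Because both scripts run Ghostscript, this will not rescue a file that Ghostscript itself cannot parse, but it can normalize files that only trip up other utilities.

```python
import subprocess
from pathlib import Path


def roundtrip_commands(ps_file: str) -> list[list[str]]:
    """Commands that convert PS -> PDF -> PS with Ghostscript's wrapper scripts."""
    src = Path(ps_file)
    pdf = src.with_suffix(".pdf")
    out = src.with_name(src.stem + "-clean.ps")  # hypothetical output name
    return [
        ["ps2pdf", str(src), str(pdf)],
        ["pdf2ps", str(pdf), str(out)],
    ]


def roundtrip(ps_file: str, dry_run: bool = True) -> None:
    """Print the commands; run them only when dry_run is False."""
    for cmd in roundtrip_commands(ps_file):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `roundtrip("paper1.ps")` prints the two commands without running anything.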
Next, I created the following LaTeX document (I gave it a .txt suffix so that you can view it in Netscape). It uses the macro \pageimage to include page images. Notice the \epsfxsize command, which constrains each image to the proper size.
Then I created a text file called order (the name is also hardwired in the script) listing all the papers in the order in which they should appear in the proceedings. Each paper is given by its file name only. I added comments introduced by # and empty lines; there is also a keyword 'stop', which I used to debug my script.
The main job is done by the following Perl script. For each paper in the 'order' file, it performs these operations:
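A main file of this kind might look roughly like the skeleton below, written out from Python. Only `\usepackage{epsf}`, `\epsfxsize` and `\epsffile` come from the real epsf package; the body of `\pageimage`, the file name `proceedings.tex`, and the layout are assumptions, not the document linked above.

```python
from pathlib import Path

# Hypothetical skeleton of the main file. epsf.sty provides \epsfxsize and
# \epsffile; the \pageimage definition itself is an assumption.
MAIN_TEX = r"""\documentclass[twoside]{article}
\usepackage{epsf}
\pagestyle{plain}  % LaTeX prints the page number on every page

% One cropped page image per output page, scaled to the text width.
\newcommand{\pageimage}[1]{%
  \noindent\epsfxsize=\textwidth\epsffile{#1}\clearpage}

\begin{document}
\input{pagenumbers}  % start-page macros written by the script
% ... title page and table of contents go here ...
\input{allpapers}    % one \pageimage line per page
\end{document}
"""


def write_main(path: str = "proceedings.tex") -> None:
    """Write the skeleton to disk (hypothetical default file name)."""
    Path(path).write_text(MAIN_TEX)
```

Setting `\epsfxsize` before each `\epsffile` makes every page image exactly as wide as the text block, with the height scaled proportionally.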
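Reading the order file takes only a few lines. This sketch assumes one file name per line, that `#` starts a comment anywhere on a line, and that a line containing just `stop` ends the list early; `read_order` is a hypothetical name, since the original script is in Perl.

```python
def read_order(text: str) -> list[str]:
    """Return paper file names from the contents of the 'order' file."""
    papers = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the comment and surrounding blanks
        if not line:
            continue  # empty or comment-only line
        if line == "stop":
            break  # debugging aid: process only the papers listed above
        papers.append(line)
    return papers
```

With `stop` placed after the first couple of entries, a trial run processes just those papers.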
- It cleans the PostScript using the ps2ps program. This also crops the pages to the bounding box.
- It uses psselect to cut each document into individual pages.
- It transforms each page into an .epsi file. This is a little wasteful, because EPSI is not just encapsulated PostScript: it also carries a low-resolution bitmap 'preview' of the page, stored as hex data in PostScript comments. But the other methods I tried for generating .eps files failed, apparently because they rasterized the image and lost resolution.
- For each page, it writes a LaTeX line that includes the page as a figure.
- At the end of each document, it generates a LaTeX macro holding the page number on which that document begins.
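The per-paper tool chain in the list above can be sketched as a list of commands. `ps2ps in out`, `psselect -pN in out` (classic psutils syntax; check the version you have) and `ps2epsi in out` are the real tools; the temporary file names and the use of the DSC `%%Pages:` comment to count pages are assumptions, not necessarily what the Perl script does.

```python
import re
import subprocess
from pathlib import Path


def count_pages(ps_text: str) -> int:
    """Read the page count from the DSC '%%Pages:' comment.

    With '%%Pages: (atend)' the real value is repeated in the trailer,
    so the last numeric occurrence wins.
    """
    counts = re.findall(r"^%%Pages:\s*(\d+)", ps_text, flags=re.MULTILINE)
    if not counts:
        raise ValueError("no %%Pages comment; is the file DSC-conforming?")
    return int(counts[-1])


def paper_commands(paper: str, pages: int) -> list[list[str]]:
    """Commands for one paper: clean it, split it into pages, convert each to EPSI."""
    stem = Path(paper).stem
    clean = f"{stem}-clean.ps"  # hypothetical naming scheme for temporary files
    cmds = [["ps2ps", paper, clean]]
    for n in range(1, pages + 1):
        page_ps = f"{stem}-p{n}.ps"
        cmds.append(["psselect", f"-p{n}", clean, page_ps])  # extract page n
        cmds.append(["ps2epsi", page_ps, f"{stem}-p{n}.epsi"])
    return cmds


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

In a real run, ps2ps would be executed first and the page count read from the cleaned file, since that is the file psselect splits.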
The script generates two auxiliary LaTeX files, 'pagenumbers.tex' and 'allpapers.tex', which are included by the main LaTeX file.
Finally, run the script once and LaTeX once to produce the final document.
Running dvips on the result gives you the PostScript booklet.
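Generating the two auxiliary files amounts to keeping a running page counter. In this sketch the macro names (`\startpapera`, `\startpaperb`, ...), the `stem-pN.epsi` page file names, and starting the numbering at page 1 are all hypothetical choices. Letters are used instead of digits because TeX control-word names may contain only letters.

```python
from pathlib import Path


def letters(n: int) -> str:
    """1 -> 'a', 26 -> 'z', 27 -> 'aa' (TeX macro names cannot contain digits)."""
    s = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        s = chr(ord("a") + r) + s
    return s


def aux_files(papers: list[tuple[str, int]], first_page: int = 1) -> tuple[str, str]:
    """Return the contents of (pagenumbers.tex, allpapers.tex).

    papers: (file stem, page count) pairs in proceedings order.
    """
    numbers, pages = [], []
    page = first_page
    for i, (stem, count) in enumerate(papers, start=1):
        # one macro per paper holding its first page number
        numbers.append(rf"\newcommand{{\startpaper{letters(i)}}}{{{page}}}")
        for n in range(1, count + 1):
            pages.append(rf"\pageimage{{{stem}-p{n}.epsi}}")
        page += count
    return "\n".join(numbers) + "\n", "\n".join(pages) + "\n"


def write_aux(papers: list[tuple[str, int]], directory: str = ".") -> None:
    """Write both files next to the main LaTeX document."""
    numbers, pages = aux_files(papers)
    Path(directory, "pagenumbers.tex").write_text(numbers)
    Path(directory, "allpapers.tex").write_text(pages)
```

For two papers of 2 and 3 pages, the second paper's macro expands to 3, which a hand-written table of contents can then print.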
Caveat: if some documents contain pages that are narrower than usual (e.g. a single column), cropping them and then including them with \epsfxsize enlarges them considerably. For such pages I had to manually create a new \halfpageimage macro that includes somewhat narrower figures.
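Spotting such pages could be automated by reading each EPSI file's `%%BoundingBox: llx lly urx ury` comment (a standard DSC comment, in points). The 0.7 threshold relative to the widest page of the same paper is a hypothetical heuristic, and the author picked \halfpageimage pages by hand; this only suggests candidates.

```python
import re


def bbox_width(eps_text: str) -> float:
    """Width in points from the '%%BoundingBox: llx lly urx ury' comment."""
    m = re.search(r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
                  eps_text, flags=re.MULTILINE)
    if m is None:
        raise ValueError("no numeric %%BoundingBox comment")
    llx, _, urx, _ = (int(v) for v in m.groups())
    return float(urx - llx)


def pick_macro(widths: list[float], ratio: float = 0.7) -> list[str]:
    """Flag pages much narrower than the widest page of the paper."""
    widest = max(widths)
    return [r"\halfpageimage" if w < ratio * widest else r"\pageimage"
            for w in widths]
```

A narrow last page (for instance, a short reference list in one column) would then be written out with \halfpageimage instead of being blown up to full width.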