HTML Parsing Utilities Page

HTML Parsers

Most of the parsers used by the WebWatcher are variations of the same lex source file. Below, I list the parsers we already have or plan to contribute soon; you can also roll your own from the lex source file.

All compiled programs have the following usage:

   % parser <URL> < <HTML source>
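To call one of these parsers from another program, the usage above maps directly onto a subprocess call: the URL is the single argument, the HTML goes to standard input, and the results come back on standard output. Below is a minimal Python sketch of that idea. `run_parser` is a hypothetical helper, and the path you pass it is a placeholder for whatever your compiled parser is named.

```python
import subprocess


def run_parser(parser_path, url, html):
    """Run a parser as `parser <URL> < <HTML source>` and return its output lines.

    Hypothetical helper: parser_path is a placeholder for a compiled parser.
    """
    result = subprocess.run(
        [parser_path, url],   # the URL is passed as the only argument
        input=html,           # the HTML source is fed on standard input
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # exchange str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # The parsers write their results line by line.
    return result.stdout.splitlines()
```

For example, `run_parser("./parser", "http://www.example.com/", html)` would return the parser's output as a list of lines. Using `check=True` makes a failing parser raise an exception rather than silently returning partial output.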

Parsers based on Dayne's parser written in lex

A parser (for Sun4) that returns the URLs in a page, one per line; source also available.
A parser that returns the title of an HTML page; source also available.
A parser that returns the text of the links on a page; source also available.
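The three outputs listed above (link URLs, page title, link text) can be approximated in one pass with Python's standard `html.parser` module. This is a sketch, not a port of Dayne's lex scanner. `PageScanner` and `scan` are hypothetical names, and resolving relative links against the `<URL>` argument is an assumption about what that argument is for.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects link URLs, the page title, and link text in a single pass.

    Assumption: the <URL> argument is used as the base for relative links.
    """

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.urls = []         # absolute URLs from <a href="...">
        self.link_texts = []   # whitespace-normalized text of each link
        self.title = ""
        self._in_title = False
        self._link_buf = None  # text pieces of the currently open link, if any

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a name="..."> anchors, which are not links
                self.urls.append(urljoin(self.base_url, href))
                self._link_buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.title = " ".join(self.title.split())
        elif tag == "a" and self._link_buf is not None:
            text = " ".join("".join(self._link_buf).split())
            if text:
                self.link_texts.append(text)
            self._link_buf = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate rather than assign.
        if self._in_title:
            self.title += data
        if self._link_buf is not None:
            self._link_buf.append(data)


def scan(base_url, html):
    """Feed a whole page to a PageScanner and return it."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    return scanner
```

Printing `"\n".join(scan(url, html).urls)` gives output in the same one-URL-per-line shape as the Sun4 URL parser. Keeping all three results on one object trades the small, single-purpose lex programs for a single pass over the page.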

Last modified by: Dayne, 24-5-95