HTML Parsers
Most of the parsers used by the WebWatcher are variations of the same
lex source file. Below, I list the parsers we either already have or plan
to contribute soon, but you can also roll your own using the lex source file.
All compiled programs have the same usage: the URL of the page is given
as an argument, and the HTML source is read from standard input:
% parser <URL> < <HTML source>
Parsers based on Dayne's parser written in lex
- A parser (for Sun4) that returns the URLs in a page, one per line, and source.
- A parser that returns the title of an HTML page, and source.
- A parser that returns the text of links on a page, and source.
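For readers who want to see what the three parsers above extract without building the lex source, here is a rough sketch in Python. It is not the lex code and makes no claim to match its output exactly. The names PageParser and parse_page are hypothetical, and resolving relative links against the URL argument is my assumption about why the URL is passed in; the original parsers may use it differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects link URLs, link text, and the title of one HTML page.

    Hypothetical helper for illustration; not part of the lex parsers.
    """

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.urls = []          # one entry per <a href=...>
        self.link_texts = []    # text found between <a href=...> and </a>
        self.title = ""
        self._in_title = False
        self._link_buf = None   # text pieces of the link currently open

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Assumption: relative links are resolved against the page URL.
                self.urls.append(urljoin(self.base_url, href))
                self._link_buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._link_buf is not None:
            # Collapse runs of whitespace so each link text fits on one line.
            self.link_texts.append(" ".join("".join(self._link_buf).split()))
            self._link_buf = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._link_buf is not None:
            self._link_buf.append(data)


def parse_page(base_url, html):
    """Return the URLs, title, and link texts found in an HTML string."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "urls": parser.urls,
        "title": " ".join(parser.title.split()),
        "link_texts": parser.link_texts,
    }
```

To mimic the command-line usage shown earlier, a small wrapper could call parse_page(sys.argv[1], sys.stdin.read()) and print one of the three result lists, one item per line.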
Last modified by: Dayne, 24-5-95