org.htmlparser.tests.utilTests
Class CharacterTranslationTest.Generate

java.lang.Object
  extended by org.htmlparser.tests.utilTests.CharacterTranslationTest.Generate
Enclosing class:
CharacterTranslationTest

public class CharacterTranslationTest.Generate
extends java.lang.Object

Create a character reference translation class source file. Usage:

     java -classpath .:lib/htmlparser.jar Generate > Translate.java
 
Derived from HTMLStringFilter.java, an example provided with the htmlparser.jar file (available at htmlparser.sourceforge.net) and written by Somik Raha (somik@industriallogic.com, http://industriallogic.com).


Field Summary
protected  Parser mParser
          The working parser.
protected  java.lang.String nl
           
 
Constructor Summary
CharacterTranslationTest.Generate()
          Create a Generate object.
 
Method Summary
 void extract(java.lang.String string, java.io.PrintWriter out)
          Parse the SGML declaration for the character entity reference name, the equivalent numeric character reference, and a comment.
 void gather(Node node, java.lang.StringBuffer buffer)
           
 int indexOfWhitespace(java.lang.String string, int index)
          Find the lowest index of whitespace (space or newline).
 java.lang.String pack(java.lang.String string)
          Rewrite the comment string.
 java.lang.String pad(java.lang.String string, char character, int length)
          Pad a string on the left with the given character to the length specified.
 void parse(java.io.PrintWriter out)
          Pull out text elements from the HTML.
 java.lang.String pretty(java.lang.String string)
          Pretty up a comment string.
 void sgml(java.lang.String string, java.io.PrintWriter out)
          Extract special characters.
 java.lang.String translate(java.lang.String string)
          Translate character references.
 java.lang.String unicode(java.lang.String string)
          Convert the textual representation of the numeric character reference to a character.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mParser

protected Parser mParser
The working parser.


nl

protected java.lang.String nl

Constructor Detail

CharacterTranslationTest.Generate

public CharacterTranslationTest.Generate()
                                  throws ParserException
Create a Generate object. Sets up the generation by creating a new Parser pointed at http://www.w3.org/TR/REC-html40/sgml/entities.html with the standard scanners registered.

Method Detail

translate

public java.lang.String translate(java.lang.String string)
Translate character references. The generated Translate class could do this job, but using it here would create a bootstrap problem, so this method converts only a very small subset of references (enough to understand the w3.org page).

Parameters:
string - The raw string.
Returns:
The string with character references fixed.
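
As an illustration only, a bootstrap translation of this kind could be sketched as a fixed replacement table. The class name TranslateSketch and the four references chosen here are assumptions; the source does not say which references the real method handles.

```java
final class TranslateSketch {
    // Hypothetical subset: the original documentation does not list
    // which character references its translate method understands.
    private static final String[][] REFERENCES = {
        {"&lt;", "<"},
        {"&gt;", ">"},
        {"&quot;", "\""},
        {"&amp;", "&"}, // handled last so "&amp;lt;" becomes "&lt;", not "<"
    };

    /** Replace each known character reference with its literal character. */
    static String translate(String string) {
        String result = string;
        for (String[] pair : REFERENCES) {
            result = result.replace(pair[0], pair[1]);
        }
        return result;
    }
}
```

Replacing the ampersand reference last avoids decoding a reference twice.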

gather

public void gather(Node node,
                   java.lang.StringBuffer buffer)

indexOfWhitespace

public int indexOfWhitespace(java.lang.String string,
                             int index)
Find the lowest index of whitespace (space or newline).

Parameters:
string - The string to look in.
index - Where to start looking.
Returns:
-1 if there is no whitespace, the minimum index otherwise.
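
A minimal sketch of the contract described above, assuming "whitespace" means exactly the space and newline characters named in the summary. The class name WhitespaceSketch is hypothetical, and this is not necessarily how the original method is written.

```java
final class WhitespaceSketch {
    /**
     * Return the lowest index at or after 'index' of a space or newline,
     * or -1 if neither occurs.
     */
    static int indexOfWhitespace(String string, int index) {
        int space = string.indexOf(' ', index);
        int newline = string.indexOf('\n', index);
        if (space == -1) {
            return newline; // may itself be -1
        }
        if (newline == -1) {
            return space;
        }
        return Math.min(space, newline);
    }
}
```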

pack

public java.lang.String pack(java.lang.String string)
Rewrite the comment string. In the SGML table, the comments are of the form:
 -- latin capital letter I with diaeresis,
             U+00CF ISOlat1
 
so we just want to make a one-liner without the spaces and newlines.

Parameters:
string - The raw comment.
Returns:
The single line comment.
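
One way to produce such a one-liner is to collapse every run of whitespace into a single space. This is a sketch under that assumption; the class name PackSketch is hypothetical, and the original may treat the leading "--" or the spacing differently.

```java
final class PackSketch {
    /** Collapse runs of spaces and newlines into single spaces and trim the ends. */
    static String pack(String string) {
        return string.trim().replaceAll("\\s+", " ");
    }
}
```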

pretty

public java.lang.String pretty(java.lang.String string)
Pretty up a comment string.

Parameters:
string - The comment to operate on.
Returns:
The beautiful comment string.

pad

public java.lang.String pad(java.lang.String string,
                            char character,
                            int length)
Pad a string on the left with the given character to the length specified.

Parameters:
string - The string to pad.
character - The character to pad with.
length - The size to pad to.
Returns:
The padded string.
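
Left padding can be sketched as follows. The class name PadSketch is hypothetical, and returning strings that are already long enough unchanged is an assumption; the documentation does not say what the original does in that case.

```java
final class PadSketch {
    /** Prepend 'character' until the result is at least 'length' characters long. */
    static String pad(String string, char character, int length) {
        StringBuilder buffer = new StringBuilder(Math.max(length, string.length()));
        for (int i = string.length(); i < length; i++) {
            buffer.append(character);
        }
        buffer.append(string);
        return buffer.toString(); // unchanged if already long enough (assumed)
    }
}
```

For example, padding the hexadecimal string "cf" with '0' to length 4 gives "00cf".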

unicode

public java.lang.String unicode(java.lang.String string)
Convert the textual representation of the numeric character reference to a character.

Parameters:
string - The numeric character reference (in quotes).
Returns:
The character represented by the numeric character reference.
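
The conversion can be sketched as below, assuming the input looks like "&#160;" wrapped in double quotes and uses a decimal code, as in the w3.org entity table. The class name UnicodeSketch and the error handling are assumptions, not the original implementation.

```java
final class UnicodeSketch {
    /** Turn a quoted decimal reference such as "&#160;" into the character it names. */
    static String unicode(String string) {
        int start = string.indexOf("&#");
        int end = string.indexOf(';', Math.max(start, 0));
        if (start == -1 || end == -1) {
            throw new IllegalArgumentException("not a numeric character reference: " + string);
        }
        // Decimal digits only; hexadecimal references are not handled in this sketch.
        int codePoint = Integer.parseInt(string.substring(start + 2, end));
        return new String(Character.toChars(codePoint));
    }
}
```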

extract

public void extract(java.lang.String string,
                    java.io.PrintWriter out)
Parse the SGML declaration for the character entity reference name, the equivalent numeric character reference, and a comment. Emit a Java hash table 'put' with the name as the key and the numeric character as the value, and annotate the insertion with the comment.

Parameters:
string - The contents of the SGML declaration.
out - The sink for output.
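
The following sketch shows one way to split a declaration into its three parts and emit a 'put' line. The class name ExtractSketch, the variable name table, and the exact layout of the emitted line are all hypothetical; the generated Translate.java may look quite different.

```java
import java.io.PrintWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractSketch {
    // name, CDATA keyword, quoted decimal reference, then a "--" delimited comment
    private static final Pattern DECLARATION = Pattern.compile(
        "\\s*(\\w+)\\s+CDATA\\s+\"&#(\\d+);\"\\s*--\\s*(.*?)\\s*(?:--)?\\s*",
        Pattern.DOTALL);

    /** Emit one put statement for a character entity declaration, or nothing if it does not match. */
    static void extract(String string, PrintWriter out) {
        Matcher matcher = DECLARATION.matcher(string);
        if (!matcher.matches()) {
            return; // not a character entity declaration
        }
        String name = matcher.group(1);
        int code = Integer.parseInt(matcher.group(2));
        String comment = matcher.group(3).replaceAll("\\s+", " ");
        // "table" is a placeholder name for the generated hash table.
        out.println("table.put(\"" + name + "\", Character.valueOf((char) " + code + ")); // " + comment);
    }
}
```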

sgml

public void sgml(java.lang.String string,
                 java.io.PrintWriter out)
Extract special characters. Scan the string looking for substrings of the form:
 <!ENTITY nbsp   CDATA "&#160;" -- no-break space = non-breaking space, U+00A0 ISOnum -->
 
and emit a Java definition for each.

Parameters:
string - The raw string from w3.org.
out - The sink for output.
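
The scan itself can be sketched with plain index searches. This version only collects the text between each "<!ENTITY" and the following "-->" rather than emitting definitions; the class and method names are hypothetical and the original may delimit declarations differently.

```java
import java.util.ArrayList;
import java.util.List;

final class SgmlScanSketch {
    private static final String OPEN = "<!ENTITY";
    private static final String CLOSE = "-->";

    /** Return the contents of every "<!ENTITY ... -->" span, in document order. */
    static List<String> declarations(String string) {
        List<String> found = new ArrayList<>();
        int begin = string.indexOf(OPEN);
        while (begin != -1) {
            int end = string.indexOf(CLOSE, begin);
            if (end == -1) {
                break; // unterminated declaration; stop scanning
            }
            found.add(string.substring(begin + OPEN.length(), end).trim());
            begin = string.indexOf(OPEN, end + CLOSE.length());
        }
        return found;
    }
}
```

Each collected string could then be handed to a routine such as extract to produce one definition.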

parse

public void parse(java.io.PrintWriter out)
           throws ParserException
Pull out text elements from the HTML.

Parameters:
out - The sink for output.
Throws:
ParserException