org.htmlparser.nodeDecorators
Class AbstractNodeDecorator

java.lang.Object
  extended by org.htmlparser.nodeDecorators.AbstractNodeDecorator
All Implemented Interfaces:
java.lang.Cloneable, Node, Text
Direct Known Subclasses:
DecodingNode, EscapeCharacterRemovingNode, NonBreakingSpaceConvertingNode

Deprecated. Use direct subclasses or dynamic proxies instead.

Either use direct subclasses of the appropriate node and set them on the PrototypicalNodeFactory, or use a dynamic proxy implementing the required node type interface. Direct subclassing avoids the wrapping and delegation altogether, while a dynamic proxy performs the wrapping and delegation without needing this class.
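The first alternative can be illustrated with a minimal, self-contained sketch. MiniText, MiniTextNode, NbspConvertingNode and MiniFactory are hypothetical stand-ins for Text, TextNode, a custom subclass and PrototypicalNodeFactory (the real types have many more methods), and the &nbsp; replacement is only an example customisation. The point is that the factory clones the registered prototype for every new text node, so the subclass's overridden behaviour reaches all parsed text without any wrapper.

```java
import java.util.ArrayList;
import java.util.List;

public class SubclassSketch {

    // Hypothetical, much-reduced stand-in for org.htmlparser.Text.
    interface MiniText extends Cloneable {
        String getText();
        void setText(String text);
        Object clone() throws CloneNotSupportedException;
    }

    // Hypothetical stand-in for org.htmlparser.nodes.TextNode.
    static class MiniTextNode implements MiniText {
        private String text = "";
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
        public Object clone() throws CloneNotSupportedException { return super.clone(); }
    }

    // The customisation lives in a subclass: no wrapper, no delegation.
    static class NbspConvertingNode extends MiniTextNode {
        @Override
        public String getText() {
            return super.getText().replace("&nbsp;", " ");
        }
    }

    // Hypothetical factory: every text node is a clone of the prototype,
    // so it has the prototype's runtime class and overridden methods.
    static class MiniFactory {
        private MiniText prototype = new MiniTextNode();

        void setTextPrototype(MiniText prototype) { this.prototype = prototype; }

        MiniText createTextNode(String content) {
            try {
                MiniText node = (MiniText) prototype.clone();
                node.setText(content);
                return node;
            } catch (CloneNotSupportedException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static List<String> demo(String... fragments) {
        MiniFactory factory = new MiniFactory();
        factory.setTextPrototype(new NbspConvertingNode());
        List<String> out = new ArrayList<>();
        for (String fragment : fragments)
            out.add(factory.createTextNode(fragment).getText());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo("a&nbsp;b", "plain")); // prints "[a b, plain]"
    }
}
```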

Here is an example of how to use dynamic proxies to accomplish the same effect as using decorators to wrap Text nodes:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.Text;
import org.htmlparser.nodes.TextNode;
import org.htmlparser.util.ParserException;

public class TextProxy
    implements
        InvocationHandler
{
    protected Object mObject;

    public static Object newInstance (Object object)
    {
        Class cls;

        cls = object.getClass ();
        return (Proxy.newProxyInstance (
            cls.getClassLoader (),
            cls.getInterfaces (),
            new TextProxy (object)));
    }

    private TextProxy (Object object)
    {
        mObject = object;
    }

    public Object invoke (Object proxy, Method m, Object[] args)
        throws Throwable
    {
        Object result;
        String name;

        try
        {
            // forward the call to the wrapped TextNode
            result = m.invoke (mObject, args);
            name = m.getName ();
            if (name.equals ("clone"))
                result = newInstance (result); // wrap the clone so copies of the prototype are proxied too
            else if (name.equals ("doSemanticAction")) // or other methods
                System.out.println (mObject); // custom processing of the wrapped TextNode goes here
        }
        catch (InvocationTargetException e)
        {
            // rethrow whatever the wrapped node itself threw
            throw e.getTargetException ();
        }
        catch (Exception e)
        {
            // keep the original exception as the cause
            throw new RuntimeException ("unexpected invocation exception: " +
                                       e.getMessage (), e);
        }

        return (result);
    }

    public static void main (String[] args)
        throws
            ParserException
    {
        if (0 == args.length)
        {
            System.out.println ("Usage: java TextProxy <url-or-file>");
            return;
        }
        // create the wrapped text node and set it as the prototype
        Text text = (Text) TextProxy.newInstance (new TextNode (null, 0, 0));
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
        factory.setTextPrototype (text);
        // perform the parse
        Parser parser = new Parser (args[0]);
        parser.setNodeFactory (factory);
        parser.parse (null);
    }
}
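The handler logic in TextProxy can also be exercised without the htmlparser library. The sketch below is self-contained: MiniText and MiniTextNode are hypothetical stand-ins for Text and TextNode, and SEEN is an illustrative record of intercepted calls. It shows why clone() results are re-wrapped: a copy taken from a proxied prototype is itself a proxy, so the custom behaviour survives the factory's cloning.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxySketch {

    // Hypothetical, much-reduced stand-in for org.htmlparser.Text.
    public interface MiniText extends Cloneable {
        String getText();
        void doSemanticAction();
        Object clone() throws CloneNotSupportedException;
    }

    // Hypothetical stand-in for org.htmlparser.nodes.TextNode.
    public static class MiniTextNode implements MiniText {
        private final String text;
        public MiniTextNode(String text) { this.text = text; }
        public String getText() { return text; }
        public void doSemanticAction() { }
        public Object clone() throws CloneNotSupportedException { return super.clone(); }
    }

    // Text of every node whose doSemanticAction() was intercepted.
    static final List<String> SEEN = new ArrayList<>();

    static class Handler implements InvocationHandler {
        private final Object target;

        Handler(Object target) { this.target = target; }

        public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
            Object result;
            try {
                result = m.invoke(target, args); // delegate to the real node
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the node itself threw
            }
            if (m.getName().equals("clone"))
                result = wrap(result); // copies of the prototype stay wrapped
            else if (m.getName().equals("doSemanticAction"))
                SEEN.add(((MiniText) target).getText()); // the custom behaviour
            return result;
        }
    }

    public static MiniText wrap(Object node) {
        return (MiniText) Proxy.newProxyInstance(
            MiniText.class.getClassLoader(),
            new Class<?>[] { MiniText.class },
            new Handler(node));
    }

    // Clone a wrapped prototype, trigger the action, and report the outcome.
    public static String demo() {
        SEEN.clear();
        try {
            MiniText prototype = wrap(new MiniTextNode("hello"));
            MiniText copy = (MiniText) prototype.clone();
            copy.doSemanticAction();
            return Proxy.isProxyClass(copy.getClass()) + " " + SEEN;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true [hello]"
    }
}
```

One design note: the sketch names the interface explicitly, whereas TextProxy uses getClass().getInterfaces(). That call returns only the interfaces declared directly on the wrapped object's class, not those inherited from its superclasses, so it relies on the wrapped class declaring the node interface itself.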
 

public abstract class AbstractNodeDecorator
extends java.lang.Object
implements Text

Node wrapping base class.
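The wrapping idea can be shown with a minimal, self-contained sketch of the decorator pattern. All names here are hypothetical, and only three methods are forwarded; the real class implements every Node and Text method, presumably by forwarding each one to its delegate field. A subclass then overrides only the methods whose behaviour it changes.

```java
public class DecoratorSketch {

    // Hypothetical, much-reduced stand-in for org.htmlparser.Text.
    public interface MiniText {
        String getText();
        void setText(String text);
        String toHtml();
    }

    public static class MiniTextNode implements MiniText {
        private String text;
        public MiniTextNode(String text) { this.text = text; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
        public String toHtml() { return text; }
    }

    // Base class in the spirit of AbstractNodeDecorator: every call is
    // forwarded unchanged to the wrapped delegate.
    public abstract static class AbstractDecorator implements MiniText {
        protected final MiniText delegate;
        protected AbstractDecorator(MiniText delegate) { this.delegate = delegate; }
        public String getText() { return delegate.getText(); }
        public void setText(String text) { delegate.setText(text); }
        public String toHtml() { return delegate.toHtml(); }
    }

    // Hypothetical decorator that rewrites &nbsp; in getText() only;
    // toHtml() still returns the original characters via the base class.
    public static class NbspDecorator extends AbstractDecorator {
        public NbspDecorator(MiniText delegate) { super(delegate); }

        @Override
        public String getText() {
            return delegate.getText().replace("&nbsp;", " ");
        }
    }

    public static void main(String[] args) {
        MiniText node = new NbspDecorator(new MiniTextNode("a&nbsp;b"));
        System.out.println(node.getText() + " | " + node.toHtml()); // prints "a b | a&nbsp;b"
    }
}
```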


Field Summary
protected  Text delegate
          Deprecated.  
 
Constructor Summary
protected AbstractNodeDecorator(Text delegate)
          Deprecated.  
 
Method Summary
 void accept(NodeVisitor visitor)
          Deprecated. Apply the visitor to this node.
 java.lang.Object clone()
          Deprecated. Clone this object.
 void collectInto(NodeList list, NodeFilter filter)
          Deprecated. Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria.
 void doSemanticAction()
          Deprecated. Perform the semantic action of this tag.
 boolean equals(java.lang.Object arg0)
          Deprecated.  
 NodeList getChildren()
          Deprecated. Get the children of this node.
 int getEndPosition()
          Deprecated. Gets the ending position of the node.
 Node getFirstChild()
          Deprecated. Get the first child of this node.
 Node getLastChild()
          Deprecated. Get the last child of this node.
 Node getNextSibling()
          Deprecated. Get the next sibling to this node.
 Page getPage()
          Deprecated. Get the page this node came from.
 Node getParent()
          Deprecated. Get the parent of this node.
 Node getPreviousSibling()
          Deprecated. Get the previous sibling to this node.
 int getStartPosition()
          Deprecated. Gets the starting position of the node.
 java.lang.String getText()
          Deprecated. Accesses the textual contents of the node.
 void setChildren(NodeList children)
          Deprecated. Set the children of this node.
 void setEndPosition(int position)
          Deprecated. Sets the ending position of the node.
 void setPage(Page page)
          Deprecated. Set the page this node came from.
 void setParent(Node node)
          Deprecated. Sets the parent of this node.
 void setStartPosition(int position)
          Deprecated. Sets the starting position of the node.
 void setText(java.lang.String text)
          Deprecated. Sets the contents of the node.
 java.lang.String toHtml()
          Deprecated. Return the HTML for this node.
 java.lang.String toPlainTextString()
          Deprecated. A string representation of the node.
 java.lang.String toString()
          Deprecated. Return the string representation of the node.
 
Methods inherited from class java.lang.Object
finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

delegate

protected Text delegate
Deprecated. 
Constructor Detail

AbstractNodeDecorator

protected AbstractNodeDecorator(Text delegate)
Deprecated. 
Method Detail

clone

public java.lang.Object clone()
                       throws java.lang.CloneNotSupportedException
Deprecated. 
Clone this object. Exposes java.lang.Object clone as a public method.

Specified by:
clone in interface Node
Returns:
A clone of this object.
Throws:
java.lang.CloneNotSupportedException - This shouldn't be thrown since the Node interface extends Cloneable.
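A minimal sketch of how clone() can be exposed as a public method, using hypothetical MiniNode and Leaf types. Because the interface extends Cloneable, Object.clone() succeeds, so callers that want a non-throwing copy can turn the checked CloneNotSupportedException into an AssertionError.

```java
public class CloneSketch {

    // Hypothetical interface mirroring the documented contract: it extends
    // Cloneable and re-declares clone() so it can be called publicly.
    public interface MiniNode extends Cloneable {
        Object clone() throws CloneNotSupportedException;
    }

    public static class Leaf implements MiniNode {
        public final int start;
        public final int end;

        public Leaf(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Widens Object.clone() from protected to public. Since Leaf is
        // Cloneable, super.clone() makes a shallow field-by-field copy.
        @Override
        public Object clone() throws CloneNotSupportedException {
            return super.clone();
        }
    }

    // Convenience wrapper: the checked exception cannot occur for a Cloneable class.
    public static Leaf copyOf(Leaf leaf) {
        try {
            return (Leaf) leaf.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError("Leaf implements Cloneable", e);
        }
    }

    public static void main(String[] args) {
        Leaf original = new Leaf(3, 7);
        Leaf copy = copyOf(original);
        System.out.println((copy != original) + " " + copy.start + " " + copy.end); // prints "true 3 7"
    }
}
```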

accept

public void accept(NodeVisitor visitor)
Deprecated. 
Description copied from interface: Node
Apply the visitor to this node.

Specified by:
accept in interface Node
Parameters:
visitor - The visitor to this node.

collectInto

public void collectInto(NodeList list,
                        NodeFilter filter)
Deprecated. 
Description copied from interface: Node
Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria.

This mechanism allows powerful filtering code to be written very easily, without having to collect embedded tags separately. For example, to get all the links on a page it is not enough to examine only the top-level nodes, because many tags (such as form tags) can contain links embedded in them. The links could be found by checking whether the current node is a CompositeTag and walking through its children; this method provides a convenient way to do that.

Using collectInto(), programs become much shorter. The code to extract all links from a page looks like this:

 NodeList list = new NodeList ();
 NodeFilter filter = new TagNameFilter ("A");
 for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
      e.nextNode ().collectInto (list, filter);
 
Thus, list will hold all the link nodes, irrespective of how deep the links are embedded.

Another way to accomplish the same objective is:

 NodeList list = new NodeList ();
 NodeFilter filter = new TagClassFilter (LinkTag.class);
 for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
      e.nextNode ().collectInto (list, filter);
 
This is slightly less specific, because the LinkTag class may be registered for more than one tag name, e.g. for <LINK> tags too.

Specified by:
collectInto in interface Node
Parameters:
list - The list to collect nodes into.
filter - The criteria to use when deciding if a node should be added to the list.
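The recursion behind collectInto can be sketched without the library. MiniNode is hypothetical, the filter is a java.util.function.Predicate rather than NodeFilter, and details such as end tags that real composite tags may also visit are ignored. It mirrors the link example above: only one of the three links sits at the top level, yet all three are collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CollectSketch {

    // Hypothetical node: a tag name and its children.
    public static class MiniNode {
        final String name;
        final List<MiniNode> children = new ArrayList<>();

        public MiniNode(String name, MiniNode... kids) {
            this.name = name;
            for (MiniNode kid : kids)
                children.add(kid);
        }

        // Add this node if it satisfies the filter, then recurse into every
        // child, so matches are found however deeply they are nested.
        public void collectInto(List<MiniNode> list, Predicate<MiniNode> filter) {
            if (filter.test(this))
                list.add(this);
            for (MiniNode child : children)
                child.collectInto(list, filter);
        }
    }

    public static int countLinks() {
        // Top-level nodes, roughly: <a> <form><input><div><a></div></form> <p><a></p>
        List<MiniNode> topLevel = new ArrayList<>();
        topLevel.add(new MiniNode("A"));
        topLevel.add(new MiniNode("FORM",
            new MiniNode("INPUT"),
            new MiniNode("DIV", new MiniNode("A"))));
        topLevel.add(new MiniNode("P", new MiniNode("A")));

        List<MiniNode> links = new ArrayList<>();
        for (MiniNode node : topLevel)
            node.collectInto(links, n -> n.name.equalsIgnoreCase("A"));
        return links.size();
    }

    public static void main(String[] args) {
        System.out.println(countLinks()); // prints 3
    }
}
```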


getStartPosition

public int getStartPosition()
Deprecated. 
Gets the starting position of the node.

Specified by:
getStartPosition in interface Node
Returns:
The start position.
See Also:
Node.setStartPosition(int)

setStartPosition

public void setStartPosition(int position)
Deprecated. 
Sets the starting position of the node.

Specified by:
setStartPosition in interface Node
Parameters:
position - The new start position.
See Also:
Node.getStartPosition()

getEndPosition

public int getEndPosition()
Deprecated. 
Gets the ending position of the node.

Specified by:
getEndPosition in interface Node
Returns:
The end position.
See Also:
Node.setEndPosition(int)

setEndPosition

public void setEndPosition(int position)
Deprecated. 
Sets the ending position of the node.

Specified by:
setEndPosition in interface Node
Parameters:
position - The new end position.
See Also:
Node.getEndPosition()

getPage

public Page getPage()
Deprecated. 
Get the page this node came from.

Specified by:
getPage in interface Node
Returns:
The page that supplied this node.
See Also:
Node.setPage(org.htmlparser.lexer.Page)

setPage

public void setPage(Page page)
Deprecated. 
Set the page this node came from.

Specified by:
setPage in interface Node
Parameters:
page - The page that supplied this node.
See Also:
Node.getPage()

equals

public boolean equals(java.lang.Object arg0)
Deprecated. 

getParent

public Node getParent()
Deprecated. 
Description copied from interface: Node
Get the parent of this node. This will always return null when parsing with the Lexer. Currently, the object returned from this method can be safely cast to a CompositeTag, but this behaviour should not be expected in the future.

Specified by:
getParent in interface Node
Returns:
The parent of this node, if it's been set, null otherwise.
See Also:
Node.setParent(org.htmlparser.Node)

getText

public java.lang.String getText()
Deprecated. 
Description copied from interface: Text
Accesses the textual contents of the node.

Specified by:
getText in interface Text
Returns:
The text of the node.
See Also:
Text.setText(java.lang.String)

setParent

public void setParent(Node node)
Deprecated. 
Description copied from interface: Node
Sets the parent of this node.

Specified by:
setParent in interface Node
Parameters:
node - The node that contains this node.
See Also:
Node.getParent()

getChildren

public NodeList getChildren()
Deprecated. 
Get the children of this node.

Specified by:
getChildren in interface Node
Returns:
The list of children contained by this node, if it's been set, null otherwise.
See Also:
Node.setChildren(org.htmlparser.util.NodeList)

setChildren

public void setChildren(NodeList children)
Deprecated. 
Set the children of this node.

Specified by:
setChildren in interface Node
Parameters:
children - The new list of children this node contains.
See Also:
Node.getChildren()

getFirstChild

public Node getFirstChild()
Deprecated. 
Description copied from interface: Node
Get the first child of this node.

Specified by:
getFirstChild in interface Node
Returns:
The first child in the list of children contained by this node, or null if there are no children.

getLastChild

public Node getLastChild()
Deprecated. 
Description copied from interface: Node
Get the last child of this node.

Specified by:
getLastChild in interface Node
Returns:
The last child in the list of children contained by this node, or null if there are no children.

getPreviousSibling

public Node getPreviousSibling()
Deprecated. 
Description copied from interface: Node
Get the previous sibling to this node.

Specified by:
getPreviousSibling in interface Node
Returns:
The previous sibling to this node if one exists, null otherwise.

getNextSibling

public Node getNextSibling()
Deprecated. 
Description copied from interface: Node
Get the next sibling to this node.

Specified by:
getNextSibling in interface Node
Returns:
The next sibling to this node if one exists, null otherwise.

setText

public void setText(java.lang.String text)
Deprecated. 
Description copied from interface: Text
Sets the contents of the node.

Specified by:
setText in interface Text
Parameters:
text - The new text for the node.
See Also:
Text.getText()

toHtml

public java.lang.String toHtml()
Deprecated. 
Description copied from interface: Node
Return the HTML for this node. This should be the exact sequence of characters that the parser encountered when it created this node. This breaks down when broken nodes (tags and remarks) have been encountered and fixed. Applications reproducing HTML can use this method on nodes that are to be used or transferred exactly as they were received or created.

Specified by:
toHtml in interface Node
Returns:
The (exact) sequence of characters that would cause this node to be returned by the parser or lexer.
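A minimal sketch of why toHtml() can reproduce the input exactly, assuming each node keeps a reference to the page text and its start and end positions as a half-open [start, end) range of character offsets. SpanNode and roundTrip are hypothetical names. For well-formed input, concatenating toHtml() over consecutive nodes rebuilds the original page; nodes that the parser repaired would not round-trip this way.

```java
import java.util.ArrayList;
import java.util.List;

public class ToHtmlSketch {

    // Hypothetical node storing only the page text and its offsets
    // (end assumed exclusive).
    public static class SpanNode {
        private final String page;
        private final int start;
        private final int end;

        public SpanNode(String page, int start, int end) {
            this.page = page;
            this.start = start;
            this.end = end;
        }

        public int getStartPosition() { return start; }
        public int getEndPosition() { return end; }

        // The exact characters that were seen for this node.
        public String toHtml() { return page.substring(start, end); }
    }

    // Split the page at the given boundaries, then rebuild it from toHtml().
    public static String roundTrip(String page, int... boundaries) {
        List<SpanNode> nodes = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.length; i++)
            nodes.add(new SpanNode(page, boundaries[i], boundaries[i + 1]));
        StringBuilder rebuilt = new StringBuilder();
        for (SpanNode node : nodes)
            rebuilt.append(node.toHtml());
        return rebuilt.toString();
    }

    public static void main(String[] args) {
        // "<b>", "bold" and "</b>" occupy [0,3), [3,7) and [7,11)
        System.out.println(roundTrip("<b>bold</b>", 0, 3, 7, 11)); // prints "<b>bold</b>"
    }
}
```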

toPlainTextString

public java.lang.String toPlainTextString()
Deprecated. 
Description copied from interface: Node
A string representation of the node. This is an important method: it allows a simple string transformation of a web page, regardless of node type. For a Text node this is simply the textual contents. For a Remark node this is the remark contents (sic). For tags this is the text contents of its children (if any). Because multiple nodes are combined when presenting a page in a browser, this will not reflect what a user would see. See HTML specification section 9.1 White space http://www.w3.org/TR/html4/struct/text.html#h-9.1.
Typical application code (for extracting only the text from a web page) would be:
 for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
     // or do whatever processing you wish with the plain text string
     System.out.println (e.nextNode ().toPlainTextString ());
 

Specified by:
toPlainTextString in interface Node
Returns:
The text of this node, including its children.
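A minimal sketch of the recursive definition above, with hypothetical MiniNode, TextLeaf and Tag types: a text node returns its contents and a tag concatenates its children's plain text. Whitespace is passed through unchanged, which is one reason the result can differ from what a browser displays.

```java
import java.util.Arrays;
import java.util.List;

public class PlainTextSketch {

    // Hypothetical minimal node interface.
    public interface MiniNode {
        String toPlainTextString();
    }

    // A text node's plain text is simply its contents.
    public static class TextLeaf implements MiniNode {
        private final String text;
        public TextLeaf(String text) { this.text = text; }
        public String toPlainTextString() { return text; }
    }

    // A tag has no text of its own: it concatenates its children's plain
    // text recursively, without collapsing whitespace.
    public static class Tag implements MiniNode {
        private final List<MiniNode> children;

        public Tag(MiniNode... children) { this.children = Arrays.asList(children); }

        public String toPlainTextString() {
            StringBuilder sb = new StringBuilder();
            for (MiniNode child : children)
                sb.append(child.toPlainTextString());
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        // roughly <p>Hello <b>world</b>!</p>
        MiniNode p = new Tag(new TextLeaf("Hello "), new Tag(new TextLeaf("world")), new TextLeaf("!"));
        System.out.println(p.toPlainTextString()); // prints "Hello world!"
    }
}
```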

toString

public java.lang.String toString()
Deprecated. 
Description copied from interface: Node
Return the string representation of the node. The return value may not be the entire contents of the node, and non-printable characters may be translated in order to make them visible. This is typically used in the manner of
 System.out.println (node);
 
or within a debugging environment.

Specified by:
toString in interface Node

doSemanticAction

public void doSemanticAction()
                      throws ParserException
Deprecated. 
Description copied from interface: Node
Perform the semantic action of this tag. This is defined by the tag; for example, the bold tag <B> may switch bold text on and off. Only a few tags have semantic meaning to the parser: those that set the character set to use (<META>) and the base URL to use (<BASE>). Beyond that, the semantic meaning is up to the application and its custom nodes.
The semantic action is performed when the node has been parsed. For composite nodes (those that contain other nodes), the children will have already been parsed and will be available via Node.getChildren().

Specified by:
doSemanticAction in interface Node
Throws:
ParserException - If a problem is encountered performing the semantic action.
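The ordering guarantee for doSemanticAction() (children complete before the parent's action runs) amounts to a post-order traversal. This sketch models only that ordering with a hypothetical MiniNode tree built up front; the documented behaviour is that each action runs once its node has been parsed, and this simply replays that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SemanticActionSketch {

    // Order in which semantic actions ran.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical composite node.
    public static class MiniNode {
        final String name;
        final List<MiniNode> children;

        public MiniNode(String name, MiniNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        // Records how many children were available when the action ran.
        void doSemanticAction() {
            LOG.add(name + " (" + children.size() + " children)");
        }
    }

    // A node is finished only after all of its children, so its action
    // can rely on a complete child list.
    static void finish(MiniNode node) {
        for (MiniNode child : node.children)
            finish(child);
        node.doSemanticAction();
    }

    public static List<String> run() {
        LOG.clear();
        MiniNode html = new MiniNode("HTML",
            new MiniNode("HEAD", new MiniNode("BASE")),
            new MiniNode("BODY"));
        finish(html);
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```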