AbstractNodeDecorator

org.htmlparser.nodeDecorators
Class AbstractNodeDecorator

java.lang.Object
  org.htmlparser.nodeDecorators.AbstractNodeDecorator

All Implemented Interfaces:: java.lang.Cloneable, Node, Text

Direct Known Subclasses:: DecodingNode, EscapeCharacterRemovingNode, NonBreakingSpaceConvertingNode

Deprecated. Use direct subclasses or dynamic proxies instead.
Use either direct subclasses of the appropriate node and set them on the PrototypicalNodeFactory, or use a dynamic proxy implementing the required node type interface. In the former case this avoids the wrapping and delegation, while the latter case handles the wrapping and delegation without this class.

Here is an example of how to use dynamic proxies to accomplish the same effect as using decorators to wrap Text nodes:
 import java.lang.reflect.InvocationHandler;
 import java.lang.reflect.InvocationTargetException;
 import java.lang.reflect.Method;
 import java.lang.reflect.Proxy;

 import org.htmlparser.Parser;
 import org.htmlparser.PrototypicalNodeFactory;
 import org.htmlparser.Text;
 import org.htmlparser.nodes.TextNode;
 import org.htmlparser.util.ParserException;

 public class TextProxy
     implements
         InvocationHandler
 {
     protected Object mObject;

     public static Object newInstance (Object object)
     {
         Class cls;

         cls = object.getClass ();
         return (Proxy.newProxyInstance (
             cls.getClassLoader (),
             cls.getInterfaces (),
             new TextProxy (object)));
     }

     private TextProxy (Object object)
     {
         mObject = object;
     }

     public Object invoke (Object proxy, Method m, Object[] args)
         throws Throwable
     {
         Object result;
         String name;

         try
         {
             result = m.invoke (mObject, args);
             name = m.getName ();
             if (name.equals ("clone"))
                 result = newInstance (result); // wrap the cloned object
             else if (name.equals ("doSemanticAction")) // or other methods
                 System.out.println (mObject); // do the needful on the TextNode
         }
         catch (InvocationTargetException e)
         {
             throw e.getTargetException ();
         }
         catch (Exception e)
         {
             throw new RuntimeException ("unexpected invocation exception: " + e.getMessage());
         }
         finally
         {
         }

         return (result);
     }

     public static void main (String[] args)
         throws
             ParserException
     {
         // create the wrapped text node and set it as the prototype
         Text text = (Text) TextProxy.newInstance (new TextNode (null, 0, 0));
         PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
         factory.setTextPrototype (text);
         // perform the parse
         Parser parser = new Parser (args[0]);
         parser.setNodeFactory (factory);
         parser.parse (null);
     }
 }

public abstract class AbstractNodeDecorator
extends java.lang.Object
implements Text

Node wrapping base class.
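The wrapping and delegation that this base class provides can be illustrated with a minimal, self-contained sketch. It does not use the library: SimpleText, PlainText, SimpleTextDecorator and NbspToSpaceText are hypothetical stand-ins (the real Text interface has many more methods), and the non-breaking-space conversion is only an assumed example of what a concrete decorator might do.

```java
import java.util.Objects;

// Hypothetical stand-in for org.htmlparser.Text, reduced to two methods.
interface SimpleText {
    String getText();
    void setText(String text);
}

// Plain implementation that actually stores the text.
class PlainText implements SimpleText {
    private String text;
    PlainText(String text) { this.text = text; }
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
}

// Base decorator: holds a delegate and forwards every call to it unchanged.
abstract class SimpleTextDecorator implements SimpleText {
    protected final SimpleText delegate;

    protected SimpleTextDecorator(SimpleText delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    public String getText() { return delegate.getText(); }
    public void setText(String text) { delegate.setText(text); }
}

// Concrete decorator: overrides only the method whose behaviour it changes.
class NbspToSpaceText extends SimpleTextDecorator {
    NbspToSpaceText(SimpleText delegate) { super(delegate); }

    @Override
    public String getText() {
        // Replace non-breaking spaces (U+00A0) with ordinary spaces.
        return delegate.getText().replace('\u00a0', ' ');
    }
}

class DecoratorSketch {
    public static void main(String[] args) {
        SimpleText text = new NbspToSpaceText(new PlainText("a\u00a0b"));
        System.out.println(text.getText()); // prints "a b"
    }
}
```

Every method not overridden by the concrete decorator falls through to the delegate, which is the cost the deprecation note refers to: each call pays for an extra level of indirection that a direct subclass would avoid.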

Field Summary
`protected Text`	`delegate` Deprecated.

Constructor Summary
`protected`	`AbstractNodeDecorator(Text delegate)` Deprecated.

Method Summary
`void`	`accept(NodeVisitor visitor)` Deprecated. Apply the visitor to this node.
`java.lang.Object`	`clone()` Deprecated. Clone this object.
`void`	`collectInto(NodeList list, NodeFilter filter)` Deprecated. Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria.
`void`	`doSemanticAction()` Deprecated. Perform the meaning of this tag.
`boolean`	`equals(java.lang.Object arg0)` Deprecated.
`NodeList`	`getChildren()` Deprecated. Get the children of this node.
`int`	`getEndPosition()` Deprecated. Gets the ending position of the node.
`Node`	`getFirstChild()` Deprecated. Get the first child of this node.
`Node`	`getLastChild()` Deprecated. Get the last child of this node.
`Node`	`getNextSibling()` Deprecated. Get the next sibling to this node.
`Page`	`getPage()` Deprecated. Get the page this node came from.
`Node`	`getParent()` Deprecated. Get the parent of this node.
`Node`	`getPreviousSibling()` Deprecated. Get the previous sibling to this node.
`int`	`getStartPosition()` Deprecated. Gets the starting position of the node.
`java.lang.String`	`getText()` Deprecated. Accesses the textual contents of the node.
`void`	`setChildren(NodeList children)` Deprecated. Set the children of this node.
`void`	`setEndPosition(int position)` Deprecated. Sets the ending position of the node.
`void`	`setPage(Page page)` Deprecated. Set the page this node came from.
`void`	`setParent(Node node)` Deprecated. Sets the parent of this node.
`void`	`setStartPosition(int position)` Deprecated. Sets the starting position of the node.
`void`	`setText(java.lang.String text)` Deprecated. Sets the contents of the node.
`java.lang.String`	`toHtml()` Deprecated. Return the HTML for this node.
`java.lang.String`	`toPlainTextString()` Deprecated. A string representation of the node.
`java.lang.String`	`toString()` Deprecated. Return the string representation of the node.

Methods inherited from class java.lang.Object

finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail

delegate

protected Text delegate

Deprecated.

Constructor Detail

AbstractNodeDecorator

protected AbstractNodeDecorator(Text delegate)

Deprecated.

Method Detail

clone

public java.lang.Object clone()
                       throws java.lang.CloneNotSupportedException

Deprecated.

Clone this object. Exposes java.lang.Object clone as a public method.

Specified by:: clone in interface Node

Returns:: A clone of this object.
Throws:: java.lang.CloneNotSupportedException - This shouldn't be thrown since the Node interface extends Cloneable.
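
As a rough illustration of what exposing clone publicly allows, the sketch below copies a prototype object to create new instances, which appears to be how a prototype-based node factory would use it. CloneableNode and CloneSketch are hypothetical names and are not part of the library.

```java
// Hypothetical stand-in for a node type; not part of org.htmlparser.
class CloneableNode implements Cloneable {
    private String text;

    CloneableNode(String text) { this.text = text; }

    String getText() { return text; }
    void setText(String text) { this.text = text; }

    // Object.clone() is protected; redeclaring it as public lets any caller copy the node.
    @Override
    public Object clone() throws CloneNotSupportedException {
        return super.clone(); // shallow copy with the same runtime class as the original
    }
}

class CloneSketch {
    // Creates a new node by cloning the prototype and then filling in its text.
    static CloneableNode create(CloneableNode prototype, String text) {
        try {
            CloneableNode node = (CloneableNode) prototype.clone();
            node.setText(text);
            return node;
        } catch (CloneNotSupportedException e) {
            // Not expected: CloneableNode implements Cloneable.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CloneableNode prototype = new CloneableNode("");
        CloneableNode copy = create(prototype, "hello");
        System.out.println(copy.getText());                // prints "hello"
        System.out.println(prototype.getText().isEmpty()); // prints "true"
    }
}
```

Because super.clone() preserves the runtime class, a subclass used as the prototype yields instances of that subclass without the factory knowing its type.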

accept

public void accept(NodeVisitor visitor)

Deprecated.

Description copied from interface: Node

Apply the visitor to this node.

Specified by:: accept in interface Node

Parameters:: visitor - The visitor to this node.

collectInto

public void collectInto(NodeList list,
                        NodeFilter filter)

Deprecated.

Description copied from interface: Node

Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria.

This mechanism makes it easy to write powerful filtering code without collecting embedded tags separately. For example, the links on a page cannot all be found at the top level, because many tags (such as form tags) can contain embedded links. One could extract them by checking whether the current node is a CompositeTag and walking its children; this method performs that traversal for you.

Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:

 NodeList list = new NodeList ();
 NodeFilter filter = new TagNameFilter ("A");
 for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
      e.nextNode ().collectInto (list, filter);

Thus, list will hold all the link nodes, irrespective of how deep the links are embedded.

Another way to accomplish the same objective is:

 NodeList list = new NodeList ();
 NodeFilter filter = new TagClassFilter (LinkTag.class);
 for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
      e.nextNode ().collectInto (list, filter);

This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
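
The recursive traversal behind this behaviour can be sketched without the library. In the sketch below, TreeNode and CollectSketch are hypothetical stand-ins, and a java.util.function.Predicate takes the place of NodeFilter; it shows the general idea, not the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a parsed node; not the org.htmlparser.Node API.
class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode... kids) {
        this.name = name;
        for (TreeNode kid : kids) {
            children.add(kid);
        }
    }

    // Adds this node if it passes the filter, then recurses into its children.
    void collectInto(List<TreeNode> list, Predicate<TreeNode> filter) {
        if (filter.test(this)) {
            list.add(this);
        }
        for (TreeNode child : children) {
            child.collectInto(list, filter);
        }
    }
}

class CollectSketch {
    // Collects every node named "A", however deeply it is nested.
    static List<TreeNode> links(TreeNode root) {
        List<TreeNode> found = new ArrayList<>();
        root.collectInto(found, node -> node.name.equals("A"));
        return found;
    }

    public static void main(String[] args) {
        TreeNode page = new TreeNode("HTML",
            new TreeNode("A"),
            new TreeNode("FORM", new TreeNode("TABLE", new TreeNode("A"))));
        System.out.println(links(page).size()); // prints 2
    }
}
```

The link nested inside the form's table is found without the caller inspecting any intermediate node, which is why the calling code stays short.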

Specified by:: collectInto in interface Node

Parameters:: list - The list to collect nodes into; filter - The criteria to use when deciding if a node should be added to the list.

getStartPosition

public int getStartPosition()

Deprecated.

Gets the starting position of the node.

Specified by:: getStartPosition in interface Node

Returns:: The start position.
See Also:: Node.setStartPosition(int)

setStartPosition

public void setStartPosition(int position)

Deprecated.

Sets the starting position of the node.

Specified by:: setStartPosition in interface Node

Parameters:: position - The new start position.
See Also:: Node.getStartPosition()

getEndPosition

public int getEndPosition()

Deprecated.

Gets the ending position of the node.

Specified by:: getEndPosition in interface Node

Returns:: The end position.
See Also:: Node.setEndPosition(int)

setEndPosition

public void setEndPosition(int position)

Deprecated.

Sets the ending position of the node.

Specified by:: setEndPosition in interface Node

Parameters:: position - The new end position.
See Also:: Node.getEndPosition()

getPage

public Page getPage()

Deprecated.

Get the page this node came from.

Specified by:: getPage in interface Node

Returns:: The page that supplied this node.
See Also:: Node.setPage(org.htmlparser.lexer.Page)

setPage

public void setPage(Page page)

Deprecated.

Set the page this node came from.

Specified by:: setPage in interface Node

Parameters:: page - The page that supplied this node.
See Also:: Node.getPage()

equals

public boolean equals(java.lang.Object arg0)

Deprecated.

getParent

public Node getParent()

Deprecated.

Description copied from interface: Node

Get the parent of this node. This will always return null when parsing with the Lexer. Currently, the object returned from this method can be safely cast to a CompositeTag, but this behaviour should not be expected in the future.

Specified by:: getParent in interface Node

Returns:: The parent of this node, if it's been set, null otherwise.
See Also:: Node.setParent(org.htmlparser.Node)

getText

public java.lang.String getText()

Deprecated.

Description copied from interface: Text

Accesses the textual contents of the node.

Specified by:: getText in interface Text

Returns:: The text of the node.
See Also:: Text.setText(java.lang.String)

setParent

public void setParent(Node node)

Deprecated.

Description copied from interface: Node

Sets the parent of this node.

Specified by:: setParent in interface Node

Parameters:: node - The node that contains this node.
See Also:: Node.getParent()

getChildren

public NodeList getChildren()

Deprecated.

Get the children of this node.

Specified by:: getChildren in interface Node

Returns:: The list of children contained by this node, if it's been set, null otherwise.
See Also:: Node.setChildren(org.htmlparser.util.NodeList)

setChildren

public void setChildren(NodeList children)

Deprecated.

Set the children of this node.

Specified by:: setChildren in interface Node

Parameters:: children - The new list of children this node contains.
See Also:: Node.getChildren()

getFirstChild

public Node getFirstChild()

Deprecated.

Description copied from interface: Node

Get the first child of this node.

Specified by:: getFirstChild in interface Node

Returns:: The first child in the list of children contained by this node, null otherwise.

getLastChild

public Node getLastChild()

Deprecated.

Description copied from interface: Node

Get the last child of this node.

Specified by:: getLastChild in interface Node

Returns:: The last child in the list of children contained by this node, null otherwise.

getPreviousSibling

public Node getPreviousSibling()

Deprecated.

Description copied from interface: Node

Get the previous sibling to this node.

Specified by:: getPreviousSibling in interface Node

Returns:: The previous sibling to this node if one exists, null otherwise.

getNextSibling

public Node getNextSibling()

Deprecated.

Description copied from interface: Node

Get the next sibling to this node.

Specified by:: getNextSibling in interface Node

Returns:: The next sibling to this node if one exists, null otherwise.

setText

public void setText(java.lang.String text)

Deprecated.

Description copied from interface: Text

Sets the contents of the node.

Specified by:: setText in interface Text

Parameters:: text - The new text for the node.
See Also:: Text.getText()

toHtml

public java.lang.String toHtml()

Deprecated.

Description copied from interface: Node

Return the HTML for this node. This should be the exact sequence of characters encountered by the parser that caused this node to be created. The exception is broken nodes (tags and remarks) that were encountered and fixed, for which the returned text may differ from the input. Applications reproducing HTML can use this method on nodes that are to be used or transferred as they were received or created.

Specified by:: toHtml in interface Node

Returns:: The (exact) sequence of characters that would cause this node to be returned by the parser or lexer.

toPlainTextString

public java.lang.String toPlainTextString()

Deprecated.

Description copied from interface: Node

A string representation of the node. This is an important method: it allows a simple string transformation of a web page, regardless of node type. For a Text node this is the textual contents itself. For a Remark node this is the remark contents. For tags this is the text contents of its children (if any). Because multiple nodes are combined when presenting a page in a browser, this will not reflect what a user would see. See the HTML specification, section 9.1 White space: http://www.w3.org/TR/html4/struct/text.html#h-9.1.
Typical application code (for extracting only the text from a web page) would be:

 for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
     // or do whatever processing you wish with the plain text string
     System.out.println (e.nextNode ().toPlainTextString ());

Specified by:: toPlainTextString in interface Node

Returns:: The text of this node including its children.
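
The rule that a tag's plain text is the text of its children can be sketched as a simple recursion. PlainNode, PlainTextLeaf and PlainTag below are hypothetical stand-ins, not library classes, and the sketch ignores remarks and whitespace handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node; not the org.htmlparser.Node API.
abstract class PlainNode {
    abstract String toPlainTextString();
}

// A text node: its plain text is simply its contents.
class PlainTextLeaf extends PlainNode {
    private final String text;

    PlainTextLeaf(String text) { this.text = text; }

    @Override
    String toPlainTextString() { return text; }
}

// A composite tag: its plain text is the concatenated plain text of its children.
class PlainTag extends PlainNode {
    private final List<PlainNode> children = new ArrayList<>();

    PlainTag(PlainNode... kids) {
        for (PlainNode kid : kids) {
            children.add(kid);
        }
    }

    @Override
    String toPlainTextString() {
        StringBuilder buffer = new StringBuilder();
        for (PlainNode child : children) {
            buffer.append(child.toPlainTextString());
        }
        return buffer.toString();
    }
}

class PlainTextSketch {
    public static void main(String[] args) {
        PlainNode paragraph = new PlainTag(
            new PlainTextLeaf("Hello, "),
            new PlainTag(new PlainTextLeaf("world")));
        System.out.println(paragraph.toPlainTextString()); // prints "Hello, world"
    }
}
```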

toString

public java.lang.String toString()

Deprecated.

Description copied from interface: Node

Return the string representation of the node. The return value may not be the entire contents of the node, and non-printable characters may be translated in order to make them visible. This is typically to be used in the manner

 System.out.println (node);

or within a debugging environment.

Specified by:: toString in interface Node

doSemanticAction

public void doSemanticAction()
                      throws ParserException

Deprecated.

Description copied from interface: Node

Perform the meaning of this tag. This is defined by the tag; for example, the bold tag <B> may switch bold text on and off. Only a few tags have semantic meaning to the parser. These have to do with the character set to use (<META>) and the base URL to use (<BASE>). Other than that, the semantic meaning is up to the application and its custom nodes.
The semantic action is performed when the node has been parsed. For composite nodes (those that contain other nodes), the children will have already been parsed and will be available via Node.getChildren().

Specified by:: doSemanticAction in interface Node

Throws:: ParserException - If a problem is encountered performing the semantic action.
