java.lang.Object org.htmlparser.nodes.AbstractNode
The abstract base class for all types of nodes (tags, text, remarks).
This class provides basic functionality to hold the Page, the starting and ending positions in the page, the parent, and the list of children.
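To make the state listed above concrete, the following is a minimal sketch of a node that holds its page, its start and end offsets, its parent and its children. SketchNode and every member on it are hypothetical stand-ins written for illustration, not the real AbstractNode; the page is modelled as a plain String, and the end offset is assumed to be exclusive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parsed node; NOT the real org.htmlparser.nodes.AbstractNode. */
class SketchNode {
    final String page;   // stand-in for Page: the full source text
    final int start;     // starting offset within the page
    final int end;       // ending offset within the page (assumed exclusive)
    SketchNode parent;   // null for a top-level node
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String page, int start, int end) {
        this.page = page;
        this.start = start;
        this.end = end;
    }

    /** Append a child and record this node as its parent. */
    void addChild(SketchNode child) {
        child.parent = this;
        children.add(child);
    }

    /** Next sibling, looked up through the parent's child list, or null if there is none. */
    SketchNode nextSibling() {
        if (parent == null) return null;
        int i = parent.children.indexOf(this);
        return (i >= 0 && i + 1 < parent.children.size()) ? parent.children.get(i + 1) : null;
    }

    /** The slice of the page that this node covers. */
    String source() {
        return page.substring(start, end);
    }
}

class SketchNodeDemo {
    public static void main(String[] args) {
        String page = "<p>Hi<b>you</b></p>";
        SketchNode root = new SketchNode(page, 0, 19);
        SketchNode text = new SketchNode(page, 3, 5);
        SketchNode bold = new SketchNode(page, 5, 15);
        root.addChild(text);
        root.addChild(bold);
        System.out.println(text.source());               // the text node's slice of the page
        System.out.println(text.nextSibling() == bold);  // sibling found via the parent
    }
}
```

Keeping only offsets into a shared page, rather than copying text into each node, is one plausible reason for the page/start/end trio; the sketch does not claim that the library resolves siblings in exactly this way.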
Field Summary | |
protected NodeList |
children
The children of this node. |
protected Page |
mPage
The page this node came from. |
protected int |
nodeBegin
The beginning position of the node in the page |
protected int |
nodeEnd
The ending position of the node in the page |
protected Node |
parent
The parent of this node. |
Constructor Summary | |
AbstractNode(Page page,
int start,
int end)
Create an abstract node with the page positions given. |
Method Summary | |
abstract void |
accept(NodeVisitor visitor)
Visit this node. |
java.lang.Object |
clone()
Clone this object. |
void |
collectInto(NodeList list,
NodeFilter filter)
Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria. |
void |
doSemanticAction()
Perform the meaning of this tag. |
NodeList |
getChildren()
Get the children of this node. |
int |
getEndPosition()
Gets the ending position of the node. |
Node |
getFirstChild()
Get the first child of this node. |
Node |
getLastChild()
Get the last child of this node. |
Node |
getNextSibling()
Get the next sibling to this node. |
Page |
getPage()
Get the page this node came from. |
Node |
getParent()
Get the parent of this node. |
Node |
getPreviousSibling()
Get the previous sibling to this node. |
int |
getStartPosition()
Gets the starting position of the node. |
java.lang.String |
getText()
Returns the text of the node. |
void |
setChildren(NodeList children)
Set the children of this node. |
void |
setEndPosition(int position)
Sets the ending position of the node. |
void |
setPage(Page page)
Set the page this node came from. |
void |
setParent(Node node)
Sets the parent of this node. |
void |
setStartPosition(int position)
Sets the starting position of the node. |
void |
setText(java.lang.String text)
Sets the string contents of the node. |
abstract java.lang.String |
toHtml()
Return the HTML that generated this node. |
abstract java.lang.String |
toPlainTextString()
Returns a string representation of the node. |
abstract java.lang.String |
toString()
Return a string representation of the node. |
Methods inherited from class java.lang.Object |
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
protected Page mPage
protected int nodeBegin
protected int nodeEnd
protected Node parent
protected NodeList children
Constructor Detail |
public AbstractNode(Page page, int start, int end)
page - The page this tag was read from.
start - The starting offset of this node within the page.
end - The ending offset of this node within the page.
Method Detail |
public java.lang.Object clone() throws java.lang.CloneNotSupportedException
clone
in interface Node
java.lang.CloneNotSupportedException - This shouldn't be thrown since the Node interface extends Cloneable.
public abstract java.lang.String toPlainTextString()
    Node node;
    for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
    {
        node = e.nextNode ();
        System.out.println (node.toPlainTextString ());
        // or do whatever processing you wish with the plain text string
    }
toPlainTextString
in interface Node
public abstract java.lang.String toHtml()
toHtml
in interface Node
public abstract java.lang.String toString()
System.out.println(node)
toString
in interface Node
public void collectInto(NodeList list, NodeFilter filter)
This mechanism allows powerful filtering code to be written very easily,
without having to collect embedded tags separately.
For example, when we try to get all the links on a page, they cannot all be
found at the top level, because many tags (such as form tags) can contain
links embedded in them. We could get the links out by checking whether the
current node is a CompositeTag and going through its children.
This method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. The code to extract all links from a page would look like:
    NodeList collectionList = new NodeList();
    NodeFilter filter = new TagNameFilter ("A");
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
        e.nextNode().collectInto(collectionList, filter);
Thus, collectionList will hold all the link nodes, irrespective of how deeply the links are embedded.
Another way to accomplish the same objective is:
    NodeList collectionList = new NodeList();
    NodeFilter filter = new TagClassFilter (LinkTag.class);
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
        e.nextNode().collectInto(collectionList, filter);
This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
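The recursive idea behind collectInto() can be sketched without the library. In the sketch below, TagSketch, CollectIntoDemo and collectLinks are hypothetical names invented for illustration; java.util.List and java.util.function.Predicate stand in for NodeList and NodeFilter. It shows one plausible shape of the technique under those assumptions, not the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in node; the real library uses Node, NodeList and NodeFilter. */
class TagSketch {
    final String name;
    final List<TagSketch> children = new ArrayList<>();

    TagSketch(String name, TagSketch... kids) {
        this.name = name;
        for (TagSketch k : kids) children.add(k);
    }

    /** Add this node if the filter accepts it, then recurse into every child. */
    void collectInto(List<TagSketch> list, Predicate<TagSketch> filter) {
        if (filter.test(this)) list.add(this);
        for (TagSketch child : children) child.collectInto(list, filter);
    }
}

class CollectIntoDemo {
    /** Builds FORM > [A, DIV > [A]] plus a top-level A, then collects every A. */
    static List<TagSketch> collectLinks() {
        TagSketch form = new TagSketch("FORM",
                new TagSketch("A"),
                new TagSketch("DIV", new TagSketch("A")));
        TagSketch top = new TagSketch("A");

        List<TagSketch> found = new ArrayList<>();
        Predicate<TagSketch> isLink = n -> n.name.equals("A");
        // The caller only loops over top-level nodes; the recursion reaches nested ones.
        form.collectInto(found, isLink);
        top.collectInto(found, isLink);
        return found;
    }

    public static void main(String[] args) {
        System.out.println(collectLinks().size()); // prints 3
    }
}
```

The caller never inspects children itself, which is why the loops shown above stay two lines long even when links are nested inside forms or other composite tags.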
collectInto
in interface Node
list - The node list to collect acceptable nodes into.
filter - The filter to determine which nodes are retained.
public Page getPage()
getPage
in interface Node
Node.setPage(org.htmlparser.lexer.Page)
public void setPage(Page page)
setPage
in interface Node
page - The page that supplied this node.
Node.getPage()
public int getStartPosition()
getStartPosition
in interface Node
Node.setStartPosition(int)
public void setStartPosition(int position)
setStartPosition
in interface Node
position - The new start position.
Node.getStartPosition()
public int getEndPosition()
getEndPosition
in interface Node
Node.setEndPosition(int)
public void setEndPosition(int position)
setEndPosition
in interface Node
position - The new end position.
Node.getEndPosition()
public abstract void accept(NodeVisitor visitor)
accept
in interface Node
visitor - The visitor that is visiting this node.
public Node getParent()
The parent, if set, is a CompositeTag.
getParent
in interface Node
null otherwise.
Node.setParent(org.htmlparser.Node)
public void setParent(Node node)
setParent
in interface Node
node - The node that contains this node. Must be a CompositeTag.
Node.getParent()
public NodeList getChildren()
getChildren
in interface Node
null otherwise.
Node.setChildren(org.htmlparser.util.NodeList)
public void setChildren(NodeList children)
setChildren
in interface Node
children - The new list of children this node contains.
Node.getChildren()
public Node getFirstChild()
getFirstChild
in interface Node
null otherwise.
public Node getLastChild()
getLastChild
in interface Node
null otherwise.
public Node getPreviousSibling()
getPreviousSibling
in interface Node
null otherwise.
public Node getNextSibling()
getNextSibling
in interface Node
null otherwise.
public java.lang.String getText()
getText
in interface Node
null.
Node.setText(java.lang.String)
public void setText(java.lang.String text)
setText
in interface Node
text - The new text for the node.
Node.getText()
public void doSemanticAction() throws ParserException
doSemanticAction
in interface Node
ParserException - Not used. Provides for subclasses that may want to indicate an exceptional condition.