Filtering in document.createTreeWalker()

The essence of the Tree Walker object is to easily filter nodes within a document. On the previous page we looked at the various NodeFilter constants (e.g. NodeFilter.SHOW_ELEMENT) that provide basic top-level filtering by node type. But that's hardly enough in real-world cases. That's where the third parameter of document.createTreeWalker() comes in: it lets you pass in a reference to a custom filtering function that picks up where the second parameter left off:

document.createTreeWalker(root, nodesToShow, filter, entityExpandBol)

"filter" is a reference to your filtering function:

var myfilter=function(node){
  if (node.tagName=="DIV" || node.tagName=="IMG") //accept only DIV and IMG elements
    return NodeFilter.FILTER_ACCEPT
  else
    return NodeFilter.FILTER_SKIP //pass over this node, but still visit its descendants
}

var walker=document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT, myfilter, false)

while (walker.nextNode())
  walker.currentNode.style.display="none" //hide all DIV and IMG elements on the page

In the above, I define a custom function "myfilter()" that picks out all DIVs and IMGs in the document and passes over everything else. Such a function accepts one parameter, the node currently being pointed at as Tree Walker traverses the document, and returns one of 3 constants to accept, reject, or skip that node:
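The DOM Level 2 Traversal specification describes the filter as a NodeFilter object with an acceptNode() method, and browsers generally accept either that object form or a plain function like the one above. As a minimal sketch of the object form, the helper makeTagFilter() below (a hypothetical name, not part of the DOM API) builds such a filter for any list of tag names. The NF fallback, which copies the numeric values from the DOM specification, is an assumption added only so the snippet can also run outside a browser.

```javascript
// NodeFilter exists only in browsers; fall back to the numeric values
// defined by the DOM specification so the sketch also runs elsewhere.
var NF = (typeof NodeFilter !== "undefined") ? NodeFilter :
  {FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3, SHOW_ELEMENT: 1};

// makeTagFilter() is a hypothetical helper, not part of the DOM API.
// It returns a filter object whose acceptNode() accepts the listed tags
// and skips everything else (descendants of skipped nodes are still visited).
function makeTagFilter(tagNames){
  return {
    acceptNode: function(node){
      for (var i=0; i<tagNames.length; i++){
        if (node.tagName==tagNames[i])
          return NF.FILTER_ACCEPT;
      }
      return NF.FILTER_SKIP;
    }
  };
}

var divimgfilter=makeTagFilter(["DIV", "IMG"]);

// Browser-only usage: walk the page with the filter object.
if (typeof document !== "undefined"){
  var walker=document.createTreeWalker(document.body, NF.SHOW_ELEMENT, divimgfilter, false);
  while (walker.nextNode())
    walker.currentNode.style.border="1px solid red"; // outline matching DIVs and IMGs
}
```

Building the filter from a list keeps the tag names in one place, so the same helper can serve several walkers.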

NodeFilter filter function constants:
NodeFilter.FILTER_ACCEPT
NodeFilter.FILTER_REJECT
NodeFilter.FILTER_SKIP

FILTER_ACCEPT is self explanatory, and when returned informs TreeWalker to accept this node. However, FILTER_REJECT and FILTER_SKIP differ in a subtle way that is important to understand. With FILTER_REJECT, TreeWalker will reject the node in question plus any descendants of the node, while with FILTER_SKIP, TreeWalker will skip the node in question but not its descendants. In other words, if you wish to filter out nodes independent of their relationship with a parent node, use NodeFilter.FILTER_SKIP instead of NodeFilter.FILTER_REJECT. Consider the same filter function above, but slightly modified to use "REJECT" instead of "SKIP" to oust unwanted nodes:

var myfilter=function(node){
  if (node.tagName=="DIV" || node.tagName=="IMG") //accept only DIV and IMG elements
    return NodeFilter.FILTER_ACCEPT
  else
    return NodeFilter.FILTER_REJECT //discard this node and all of its descendants
}

In this case, not all DIV and IMG elements in the document may be extracted! This is because an image may be contained inside a rejected element such as <P>, causing TreeWalker to pass over the image, along with every other descendant of the P element, as soon as it encounters the unwanted P.
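To make the difference concrete, here is a small simulation that runs without a browser. It is only a sketch: the tree is made of plain objects, and el() and collect() are hypothetical helpers that imitate how a walker treats the three return values, not the browser's actual implementation.

```javascript
var ACCEPT=1, REJECT=2, SKIP=3; // same numeric values as NodeFilter.FILTER_*

// el() builds a plain-object stand-in for an element (hypothetical helper).
function el(tagName, children){
  return {tagName: tagName, children: children || []};
}

// collect() visits the descendants of "node" in document order and records
// the tag names the filter accepts. Like a TreeWalker, it descends into
// accepted and skipped nodes, but never into rejected ones.
function collect(node, filter, found){
  for (var i=0; i<node.children.length; i++){
    var child=node.children[i];
    var verdict=filter(child);
    if (verdict==ACCEPT)
      found.push(child.tagName);
    if (verdict!=REJECT)
      collect(child, filter, found);
  }
  return found;
}

function isWanted(node){
  return node.tagName=="DIV" || node.tagName=="IMG";
}
function skipfilter(node){ return isWanted(node) ? ACCEPT : SKIP; }
function rejectfilter(node){ return isWanted(node) ? ACCEPT : REJECT; }

// BODY contains a DIV (holding an IMG) and a P (holding an IMG and a SPAN with a DIV).
var body=el("BODY", [
  el("DIV", [el("IMG")]),
  el("P", [el("IMG"), el("SPAN", [el("DIV")])])
]);

var withSkip=collect(body, skipfilter, []);     // ["DIV", "IMG", "IMG", "DIV"]
var withReject=collect(body, rejectfilter, []); // ["DIV", "IMG"]
```

With "SKIP" all four matching elements are found, while with "REJECT" the two that sit inside the P element are lost.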

- Example: Manipulate elements by class attribute

In this demonstration, I'll use the TreeWalker object to easily pick out all elements on the page with class="blue", and change their color to red.

var getelementbyclass=function(node){
  if (node.className=="blue") //accept elements with this class attribute
    return NodeFilter.FILTER_ACCEPT
  else
    return NodeFilter.FILTER_SKIP
}

var rootnode=document.body
var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, getelementbyclass, false)

while (walker.nextNode())
  walker.currentNode.style.color="red"

walker.currentNode=document.body //reset Tree Walker position to root node

Nothing new here, though note the last line. After I'm done traversing my Tree Walker instance, I reset its currentNode property back to the root node, so subsequent calls to it will begin at the beginning of the collection of filtered nodes again.
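One caveat with the filter above: comparing node.className to "blue" matches only when the class attribute is exactly "blue", so an element with class="blue large" would be skipped. Below is a minimal sketch of a more tolerant filter that checks each space-separated class name. hasClass() and makeClassFilter() are hypothetical helper names, and the NF fallback (numeric values taken from the DOM specification) is an assumption added so the snippet can also run outside a browser.

```javascript
// NodeFilter exists only in browsers; fall back to the spec's numeric values.
var NF = (typeof NodeFilter !== "undefined") ? NodeFilter :
  {FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3, SHOW_ELEMENT: 1};

// hasClass() (hypothetical helper) tests whether "name" is one of the
// whitespace-separated class names on the node.
function hasClass(node, name){
  var tokens=String(node.className || "").split(/\s+/);
  for (var i=0; i<tokens.length; i++){
    if (tokens[i]==name)
      return true;
  }
  return false;
}

// makeClassFilter() (hypothetical helper) returns a filter function for one class name.
function makeClassFilter(name){
  return function(node){
    return hasClass(node, name) ? NF.FILTER_ACCEPT : NF.FILTER_SKIP;
  };
}

var bluefilter=makeClassFilter("blue");

// Browser-only usage, including the reset of currentNode described above.
if (typeof document !== "undefined"){
  var walker=document.createTreeWalker(document.body, NF.SHOW_ELEMENT, bluefilter, false);
  while (walker.nextNode())
    walker.currentNode.style.color="red";
  walker.currentNode=walker.root; // back to the start for the next traversal
}
```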

Mixing NodeFilter constants

On the previous page you saw the 15 NodeFilter constants that let you select nodes of a certain type, such as NodeFilter.SHOW_ELEMENT, NodeFilter.SHOW_TEXT etc. These constants are bit flags, so they can be combined and mixed to create more inclusive or restrictive top level filters. For example:

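  • OR operator: NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT (show element and text nodes)

  • Addition: NodeFilter.SHOW_TEXT + NodeFilter.SHOW_COMMENT (show text and comment nodes; because each constant is a distinct bit flag, adding two different constants gives the same result as the OR operator)

  • NOT operator: ~NodeFilter.SHOW_COMMENT (get everything that's not a comment)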

//show only element and text nodes
document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT, null, entityExpandBol)
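The arithmetic behind these combinations can be checked without a document. In the sketch below, shows() is a hypothetical helper that relies on the specification's rule that node type n is controlled by bit n-1 of the mask. The NF fallback with the spec's numeric values is an assumption added so the snippet also runs outside a browser. Note that adding two different flags produces the same mask as OR-ing them.

```javascript
// NodeFilter exists only in browsers; fall back to the spec's numeric values.
var NF = (typeof NodeFilter !== "undefined") ? NodeFilter :
  {SHOW_ALL: 0xFFFFFFFF, SHOW_ELEMENT: 0x1, SHOW_TEXT: 0x4, SHOW_COMMENT: 0x80};

// Node type numbers as defined by the DOM (Node.ELEMENT_NODE etc.).
var ELEMENT_NODE=1, TEXT_NODE=3, COMMENT_NODE=8;

// shows() (hypothetical helper) reports whether a whatToShow mask lets
// a given node type through: bit (nodeType-1) must be set.
function shows(mask, nodeType){
  return ((mask >>> 0) & (1 << (nodeType-1))) !== 0;
}

var elementsAndText=NF.SHOW_ELEMENT | NF.SHOW_TEXT;
var textAndComments=NF.SHOW_TEXT + NF.SHOW_COMMENT; // same mask as SHOW_TEXT | SHOW_COMMENT
var allButComments=NF.SHOW_ALL & ~NF.SHOW_COMMENT;  // explicit form of "everything except comments"

var sameMask=(textAndComments === (NF.SHOW_TEXT | NF.SHOW_COMMENT)); // true
```

Any of these masks can be passed as the second parameter of document.createTreeWalker().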

And that's it for the TreeWalker object of DOM2! Remember, at the time of writing this object is supported only in Firefox and Opera 8+, and not in IE (as of IE7 beta 3).