Getting a head(ing) with XPath
June 15th, 2005
I was looking around the Mozilla XPath Documentation because I wanted to write a bookmarklet to list headings in pages (I’m lazy like that). And lo and behold, the first example I found did (almost) just that. Apparently you do something like:
var headings = document.evaluate("//h2", document, null, XPathResult.ANY_TYPE,null);
But that doesn’t really answer my question about how to use XPath to get all the headings into that headings variable.
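For what it's worth, the call above hands back an XPathResult rather than an array, so the matches still have to be read out of it. With ANY_TYPE and a node-set expression the result is an iterator that you walk with iterateNext(); asking for a snapshot instead gives you something you can index. Here is a minimal sketch of the snapshot route. The helper name collectNodes is made up for illustration, and the constant 7 is the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, written out so the function does not depend on the XPathResult global.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// collectNodes is a hypothetical helper, not part of any library.
// `doc` is assumed to behave like a browser `document`.
function collectNodes(doc, xpath) {
  const result = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  // A snapshot result exposes its matches by index.
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Only runs where a DOM is available (browser console, bookmarklet).
if (typeof document !== "undefined" && typeof document.evaluate === "function") {
  const headings = collectNodes(document, "//h2");
  console.log(headings.map((h) => h.textContent.trim()));
}
```

Any other XPath expression can be passed in place of "//h2"; the helper only cares that the expression selects nodes.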
Digging around on the same site, I discovered a Firefox Extension which does everything I had in mind and then some: Document Map. It produces nifty outlines like this.
But that still doesn’t scratch the itch, of course: how do I get the XPath to do what I want? So I asked my homie Jonas (er, can you have a homie in another country?), and he found this (in Java documentation, of all places):
Which has some nice examples. Anyway, here’s the answer, apparently:
XPath 1.0 does not support regular expressions to match element names.
However, it is possible to perform some very simple matches on element names.
// Get all elements whose name starts with el
xpath = "//*[starts-with(name(), 'el')]"; // 2 3 5 7 8 9
// Get all elements whose name contains with lem1
xpath = "//*[contains(name(), 'lem1')]"; // 2 8
So I guess this is the answer:
xpath = "//*[starts-with(name(), 'h')]";
Assuming that there aren’t any other tags that start with h. Which is a dumb assumption. Er… are there any? XPath syntax is a little nutty-looking, if you ask me, but I guess it just takes some getting used to.
But whatever, for now problem solved.
Claus Wahlers suggested two better alternatives:
//h1 | //h2 | //h3 | //h4 | //h5 | //h6
Which simply “or’s” together possible heading tags, and the rather more wizardly:
//*[contains('h1h2h3h4h5h6',name())]
What that says is “Return any element (*) which returns true for the condition that the tag’s name can be found within the string h1h2h3h4h5h6.”
That rules out the silly errors my first statement had, like including html tags or hr or head tags. Duh.
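To see why the substring trick is stricter than the prefix test, the two predicates can be mimicked in plain JavaScript. This is only a sketch: startsWith and contains below are stand-ins for the XPath 1.0 functions of the same names, the list of element names is a hand-picked sample rather than the full HTML vocabulary, and it assumes the names come back in lowercase.

```javascript
// Plain-JavaScript stand-ins for the XPath 1.0 string functions.
// These are illustrative helpers, not a real XPath API.
const startsWith = (str, prefix) => str.startsWith(prefix);
const contains = (haystack, needle) => haystack.includes(needle);

// A hand-picked sample of element names (assumption: lowercase).
const names = ["html", "head", "hr", "h1", "h2", "h3", "h4", "h5", "h6", "p", "div"];

// Mimics //*[starts-with(name(), 'h')]
const byPrefix = names.filter((n) => startsWith(n, "h"));

// Mimics //*[contains('h1h2h3h4h5h6', name())]
// Note the argument order: the long string is searched for the element name.
const bySubstring = names.filter((n) => contains("h1h2h3h4h5h6", n));

console.log(byPrefix);    // html, head, hr, and h1 through h6
console.log(bySubstring); // h1 through h6 only
```

One caveat: the substring test accepts any name that happens to appear inside h1h2h3h4h5h6, so a hypothetical element called h would match too. HTML defines no such element, so in practice only the six headings get through.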
But wait, there’s more:
To really learn XPath:
Mark Pilgrim’s Dive into Greasemonkey also has a list of further reading links, one of which points to this XPath Tutorial by Example. It’s already been translated into seven languages, so I guess it must not suck. = )
“Assuming that there aren’t any other tags that start with h. Which is a dumb assumption. Er… are there any?”
uhm.. html, head, hr.
how about
//*[contains('h1h2h3h4h5h6',name())]?
- Claus Wahlers @ 16 June 2005

or how about
//h1 | //h2 | //h3 | //h4 | //h5 | //h6?
- Claus Wahlers @ 16 June 2005

I should hit myself on the head with an hr.
About the regexen, the second one looks quite intuitive, thanks for that. Jonas also suggested something similar to the first one, but I confess to having been a little confused by the syntax at first; I finally figured out what that was about here:
XPath, XQuery, and XSLT Function Reference
fn:contains(string1,string2) Returns true if string1 contains string2, otherwise it returns false
Example: contains('XML','XM')
Result: true
Thanks for the tip!
- pat @ 18 June 2005