infundibulum

Getting a head(ing) with XPath

June 15th, 2005

I was looking around the Mozilla XPath documentation because I wanted to write a bookmarklet to list the headings in a page (I’m lazy like that). And lo and behold, the first example I found did (almost) just that. Apparently you do something like:


var headings = document.evaluate("//h2", document, null, XPathResult.ANY_TYPE, null);
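Note that document.evaluate doesn’t hand back an array. When you ask for XPathResult.ANY_TYPE and the expression (like //h2) produces a node-set, you get a node iterator, which you walk with iterateNext() until it returns null. Here’s a minimal sketch of pulling the heading text out, assuming a browser DOM where document.evaluate and XPathResult exist; collectText is a made-up helper name, not part of any API.

```javascript
// Sketch: read the nodes out of the iterator that document.evaluate()
// returns when asked for XPathResult.ANY_TYPE with a node-set expression.
// Assumes a browser DOM (document.evaluate, XPathResult); collectText is
// a hypothetical helper name.
function collectText(doc, expression) {
  const result = doc.evaluate(
    expression,            // e.g. "//h2"
    doc,                   // context node: search the whole document
    null,                  // no namespace resolver needed for plain HTML
    XPathResult.ANY_TYPE,  // a node-set comes back as a node iterator
    null                   // no existing result object to reuse
  );
  const texts = [];
  // iterateNext() returns null once the node-set is exhausted.
  // Don't modify the document while iterating: that invalidates the iterator.
  let node;
  while ((node = result.iterateNext()) !== null) {
    texts.push(node.textContent.trim());
  }
  return texts;
}

// In a browser (e.g. in a bookmarklet), something like:
// alert(collectText(document, "//h2").join("\n"));
```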

But that only grabs the h2 elements, so it doesn’t really answer my question: how do I use XPath to get all the headings into that headings variable?

Digging around on the same site, I discovered a Firefox extension that does everything I had in mind and then some: Document Map. It produces nifty outlines.

But that still doesn’t scratch the itch, of course: how do I get the XPath to do what I want? So I asked my homie Jonas (er, can you have a homie in another country?), and he found this (in Java documentation, of all places):

Finding Elements by Absolute Location in a DOM Document Using XPath (Java Developers Almanac Example)

It has some nice examples. Anyway, here’s the answer, apparently:

XPath 1.0 does not support regular expressions to match element names.
However, it is possible to perform some very simple matches on element names.

    // Get all elements whose name starts with el
    xpath = "//*[starts-with(name(), 'el')]";

    // Get all elements whose name contains lem1
    xpath = "//*[contains(name(), 'lem1')]";

So I guess this is the answer:

xpath = "//*[starts-with(name(), 'h')]";

Assuming that there aren’t any other tags that start with h. Which is a dumb assumption. Er… are there any? XPath syntax is a little nutty-looking, if you ask me, but I guess it just takes some getting used to.
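To answer that question concretely, here’s a small simulation in plain JavaScript (no XPath engine involved): String.prototype.startsWith stands in for XPath’s starts-with(), applied to a hypothetical, hand-picked list of tag names like those in a typical page.

```javascript
// Simulates //*[starts-with(name(), 'h')] on a list of tag names.
// startsWith() stands in for XPath's starts-with(); the tag list is a
// made-up sample, not taken from a real page.
const tagNames = ["html", "head", "title", "body", "h1", "p", "h2", "hr", "h3", "ul"];

const startsWithH = tagNames.filter(name => name.startsWith("h"));
console.log(startsWithH); // → ["html", "head", "h1", "h2", "hr", "h3"]
```

Half of what comes back isn’t a heading at all: html, head, and hr all start with h too.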

But whatever: for now, problem solved.

UPDATE
Claus Wahlers suggested two better alternatives:

//h1 | //h2 | //h3 | //h4 | //h5 | //h6

Which simply “or’s” together the possible heading tags (| is XPath’s union operator), and the rather more wizardly:

//*[contains('h1h2h3h4h5h6',name())]

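What that says is “Return any element (*) whose tag name, name(), can be found within the string h1h2h3h4h5h6.” Note the argument order: contains(a, b) asks whether string a contains string b, so here the literal is the haystack and the tag name is the needle.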

That rules out the silly errors my first expression had, like matching html, hr, or head tags. Duh.
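Back to the original goal: a bookmarklet that lists a page’s headings. Here’s a minimal sketch, assuming a browser DOM. It evaluates Claus’s union expression with XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, so the headings come back in document order as a fixed snapshot you can index, and it indents each one by its level. I picked the union form because it uses plain name tests rather than comparing name() strings; the contains() version should slot in the same way. The name headingOutline and the two-spaces-per-level indent are my own choices, not part of any API.

```javascript
// Sketch: build a plain-text outline of a page's headings with XPath.
// Assumes a browser DOM (document.evaluate, XPathResult); headingOutline
// is a hypothetical helper name.
function headingOutline(doc) {
  const snapshot = doc.evaluate(
    "//h1 | //h2 | //h3 | //h4 | //h5 | //h6",
    doc,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, // document order, safe to index
    null
  );
  const lines = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    const heading = snapshot.snapshotItem(i);
    // "H3" or "h3" -> 3; indent two spaces per level below h1
    const level = Number(heading.nodeName.charAt(1));
    lines.push("  ".repeat(level - 1) + heading.textContent.trim());
  }
  return lines.join("\n");
}

// As a bookmarklet body: alert(headingOutline(document));
```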

But wait, there’s more:

To really learn XPath:

Mark Pilgrim’s Dive into Greasemonkey also has a list of further reading links, one of which points to this XPath Tutorial by Example. It’s already been translated into seven languages, so I guess it must not suck. = )

UPDATE
Also good (even though it’s rather infested with ads): XPath Tutorial

Comments

  1.

    “Assuming that there aren’t any other tags that start with h. Which is a dumb assumption. Er… are there any?”

    Uhm... html, head, hr.

    how about //*[contains('h1h2h3h4h5h6',name())] ?

    - Claus Wahlers
  2.

    or how about //h1 | //h2 | //h3| //h4 | //h5 | //h6 ?

    - Claus Wahlers
  3.

    I should hit myself on the head with an hr.

    About the expressions: the second one looks quite intuitive, thanks for that. Jonas also suggested something similar to the first one, but I confess to having been a little confused by the syntax at first.

    I finally figured out what that was about here:

    XPath, XQuery, and XSLT Function Reference


    fn:contains(string1,string2) Returns true if string1 contains string2, otherwise it returns false

    Example: contains('XML','XM')
    Result: true

    Thanks for the tip!

    - pat
