infundibulum

Forecastfox

June 15th, 2005

Speaking of Firefox plugins, is it just me, or does the “thundercloud” in Forecastfox look like one of those wizard hats?

Screenshot of Forecastfox

Hmm, apparently my ever-so-witty commentary is moot in any case, since Forecastfox has been updated.

Getting a head(ing) with XPath

June 15th, 2005

I was looking around the Mozilla XPath Documentation because I wanted to write a bookmarklet to list headings in pages (I’m lazy like that). And lo and behold, the first example I found did (almost) just that. Apparently you do something like:


var headings = document.evaluate("//h2", document, null, XPathResult.ANY_TYPE, null);
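A side note: what comes back from evaluate() isn't an array, so you have to walk it yourself. Here's a rough sketch of one way to do that, asking for a snapshot result instead of ANY_TYPE. The helper name xpathToArray and its doc parameter are made up for illustration; snapshotLength and snapshotItem() are the standard DOM XPath calls.

```javascript
// Sketch: copy the nodes matched by an XPath expression into a plain array.
// `doc` is assumed to be any object with a DOM-style evaluate() method,
// e.g. the browser's `document`. The function name is hypothetical.
function xpathToArray(doc, expression) {
  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, spelled out
  // so the function does not depend on a global XPathResult object.
  var ORDERED_NODE_SNAPSHOT_TYPE = 7;
  var result = doc.evaluate(expression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);

  // A snapshot result is indexed, so a plain for loop is enough.
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser, something like xpathToArray(document, "//h2") should hand back the h2 elements in document order.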

But "//h2" only matches h2 elements, so it doesn’t really answer my question about how to use XPath to get all the headings into that headings variable.

Digging around on the same site, I discovered a Firefox Extension which does everything I had in mind and then some: Document Map. It produces nifty outlines like this.

But that still doesn’t scratch the itch, of course: how do I get XPath to do what I want? So I asked my homie Jonas (er, can you have a homie in another country?), and he found this (in Java documentation, of all places):

Finding Elements by Absolute Location in a DOM Document Using XPath (Java Developers Almanac Example)

Which has some nice examples. Anyway, here’s the answer, apparently:

XPath 1.0 does not support regular expressions to match element names.
However, it is possible to perform some very simple matches on element names.

    // Get all elements whose name starts with el
    xpath = "//*[starts-with(name(), 'el')]";  // 2 3 5 7 8 9

    // Get all elements whose name contains lem1
    xpath = "//*[contains(name(), 'lem1')]";   // 2 8

So I guess this is the answer:

xpath = "//*[starts-with(name(), 'h')]";

Assuming that there aren’t any other tags that start with h. Which is a dumb assumption. Er… are there any? XPath syntax is a little nutty-looking, if you ask me, but I guess it just takes some getting used to.

But whatever, for now problem solved.

UPDATE
Claus Wahlers suggested two better alternatives:

//h1 | //h2 | //h3 | //h4 | //h5 | //h6

Which simply “or’s” together the possible heading tags (the | is XPath’s union operator), and the rather more wizardly:

//*[contains('h1h2h3h4h5h6',name())]

What that says is “Return any element (*) which returns true for the condition that the tag’s name can be found within the string h1h2h3h4h5h6.”

That rules out the silly errors my first statement had, like including html tags or hr or head tags. Duh.
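To see the difference without firing up a browser, here's a small sketch that mimics the two predicates in plain JavaScript. The tag list is just an illustrative sample, the function names are made up, and it assumes name() hands back lowercase tag names.

```javascript
// Illustrative sample of tag names (an assumption, not taken from a real page).
var tagNames = ["html", "head", "body", "h1", "h2", "h3", "hr", "p", "div"];

// Mimics //*[starts-with(name(), 'h')]
function startsWithH(name) {
  return name.indexOf("h") === 0;
}

// Mimics //*[contains('h1h2h3h4h5h6', name())]
function inHeadingString(name) {
  return "h1h2h3h4h5h6".indexOf(name) !== -1;
}

var firstTry = tagNames.filter(startsWithH);
var wizardly = tagNames.filter(inHeadingString);

console.log(firstTry.join(" ")); // html head h1 h2 h3 hr
console.log(wizardly.join(" ")); // h1 h2 h3
```

One quirk of the substring trick: it would also accept a tag named plain h, if such a thing existed. HTML has no such element, so it doesn't matter here.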

But wait, there’s more:

To really learn XPath:

Mark Pilgrim’s Dive into Greasemonkey also has a list of further reading links, one of which points to this XPath Tutorial by Example. It’s already been translated into seven languages, so I guess it must not suck. = )

UPDATE
Also good (even though it’s rather infested with ads): XPath Tutorial