infundibulum

Fuggeddaboutit

October 23rd, 2006

Oh, awesome:

“Full text search in SQLite.”

Oh, the sound of a bazillion angels crying:

“The module currently uses the following generic tokenization mechanism. A token is a contiguous sequence of alphanumeric ASCII characters (A-Z, a-z and 0-9). All non-ASCII characters are ignored. Each token is converted to lowercase before it is stored in the index, so all full-text searches are case-insensitive. The module does not perform stemming of any sort.”
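Just to make the damage concrete, here's a minimal Python sketch of the rule as quoted. It assumes that "ignored" means a non-ASCII character simply ends the current token. The quote doesn't say whether such characters are split on or silently dropped, so that part is my guess. `tokenize` is a hypothetical name for illustration, not SQLite's actual code.

```python
import re

# Hypothetical re-creation of the quoted rule: a token is a run of ASCII
# letters and digits. Every other character (whitespace, punctuation and,
# by assumption, anything outside ASCII) ends the current token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def tokenize(text: str) -> list[str]:
    # Lowercase each token, as the quote says the index does.
    return [m.lower() for m in TOKEN_RE.findall(text)]

print(tokenize("Café naïve München 東京 SQLite-3"))
# ['caf', 'na', 've', 'm', 'nchen', 'sqlite', '3']
```

So "Café" is indexed as "caf", "München" turns into "m" and "nchen", and "東京" disappears from the index entirely.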

My forehead is really starting to hurt from banging it on the desk.

Comments

  1.

    That’s completely and utterly fucking useless. And very, very strange given the fact that SQLite does a great job with UTF-8 in general.

    But there is some hope:

    “Soon, we hope to allow applications to define their own tokenizers (we in fact already have a generic tokenizer mechanism in our code; we just have yet to expose it to the outside world).”

    Maybe we should start a fund for buying and sending infrastructure developers the Unicode 5 book? Hm…

    - Thijs van der Vossen
  2.

    I’m a little afraid to know what their definition of “generic” is… O.o

    - pat
  3.

    generic as in your international language has to be written with letters, a to z, and their squiggly variants [àñÿôè etc…]?

    - dda
  4.

    Hey, who authorized your ÿ???

    - pat
  5.

    Are they completely, totally off their rocker?

    I just caught up with Mark Liberman’s post and was still in the “oh, awesome” state... Grmph.

    (But then, there’s a tool I use at work that has similar ideas about what text should be. I’ve taken to communicating with its maintainers while carefully replacing all occurrences of the letters D and O with “?”. Verrrry slowly, some people seem to be getting the point.)

    - chris
  6.

    You know, it just occurred to me that the last sentence in that paragraph is funny:

    The module does not perform stemming of any sort.

    I know what they mean but… haha.

    - pat
