infundibulum

My Favorite Techie/Language Books

February 8th, 2005

Here’s a list of books that I like that are more or less related to the intersection of language and computing. I make no attempt to justify the grouping — it’s just that I refer to them enough that somebody else out there might be interested.

The Elements of Typographic Style, Robert Bringhurst
Although this is very much a book about print, I still think it’s a great introduction to the nature of typography. There’s an appendix which is especially useful for looking up the names of funny characters like Ą and Đ and so forth. You may think that’s something you can do by just searching for the character in the Unicode tables, but LATIN CAPITAL LETTER A WITH OGONEK only tells you so much. Bringhurst gives you much, much more. Besides, the book itself is one of the most beautiful pieces of typography I’ve ever seen.
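For comparison, here is roughly what "just searching the Unicode tables" amounts to in code. This is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration. It returns the official name and nothing else, which is exactly the limitation described above.

```python
import unicodedata

def describe(char):
    """Return the code point and official Unicode name of a single character."""
    return "U+%04X %s" % (ord(char), unicodedata.name(char))

# Escapes are used so the example does not depend on the file's encoding.
for ch in "\u0104\u0110":  # the characters Ą and Đ
    print(describe(ch))
```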
Unicode Demystified: A Practical Programmer’s Guide to the Encoding Standard, Richard Gillam
If you really want to dig into Unicode (doesn’t everyone?), this is the book. If you’re a geeky-leaning language nerd, and are wondering if getting into internationalization and localization and programming and stuff like that is for you, then this is probably also the book to start with. Even reference tomes like Daniels (see below) are now out of date in the sense that they don’t convey how various writing systems are represented electronically. This book does that capably and readably, as opposed to the dry-as-dust Unicode specification itself. Even I haven’t read that. People don’t seem to realize what an amazing, amazing thing Unicode is. Just browsing this book conveys that.
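To give a small taste of what "represented electronically" means, the sketch below shows one word at two levels: as a sequence of abstract Unicode code points, and as the bytes that UTF-8 stores for them. The function names `code_points` and `utf8_hex` are hypothetical helpers, not anything taken from Gillam's book.

```python
def code_points(text):
    """List the Unicode code points in a string as U+XXXX labels."""
    return ["U+%04X" % ord(ch) for ch in text]

def utf8_hex(text):
    """Show the UTF-8 encoding of a string as space-separated hex bytes."""
    return " ".join("%02x" % byte for byte in text.encode("utf-8"))

sample = "Guaran\u00ed"  # "Guaraní", written with a precomposed i-acute
print(code_points(sample))
print(utf8_hex(sample))
```

The accented letter is a single code point but takes two bytes in UTF-8, while the plain ASCII letters take one byte each.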
Jurafsky & Martin and Manning & Schuetze
These two are NLP (Natural Language Processing) textbooks. They’re more on the mathematical side and contain no code to speak of, apart from pseudocode for describing algorithms. They’re often mentioned together because they’re sort of complementary: J&M leans toward symbolic approaches (it’s heavy on parsing), whereas M&S leans more toward the statistical approach (which I personally find more interesting). Both require significant dedication to understand. (I’ve only made dents.)
Text Processing in Python, David Mertz. (also free online)
Some pretty sound advice on handling text in Python. I don’t particularly like the approach he takes to Unicode, however.
The World’s Major Languages, Bernard Comrie, ed.
This is linguistics stuff. It’s probably the best single book for syntheses of grammar, phonetics/phonology, and writing systems of a broad variety of “important” languages. Of course, in this context “important” can be interpreted to mean “Let’s argue!” In my humble opinion, it’s absurd that Mayan or Quechua or Guaraní or at least one American language wasn’t included. But whatever, it’s still a useful book: if you need to know just a little about the structure of a language, and if it’s in here, it’s an excellent place to start.
The World’s Writing Systems, Daniels & Bright.
This is definitely a library-only kind of book. (But if you have a spare $170 lying around, my birthday is coming up next January.) As theory-independent as possible (and much better than Geoffrey Sampson’s Writing Systems in that respect), Daniels & Bright groans under the sheer amount of information it contains. It also groans under its own weight: 919 pages. I’ve xeroxed a few zillion chapters out of here in my day. Endless bemusement.
Longman Dictionary of Contemporary English (searchable online)
This is a bit of an odd choice for this list, but my respect for this dictionary has grown and grown since I first started using it back when I was teaching English as a Second Language. I picked it up because I thought it would be good for learners — and it was. Many of my students ended up buying a copy for themselves. Oddly enough, I found myself using it on a regular basis, just because it’s so clear. I believe its utility is firmly based on one feature: it was built with corpora of actual usage. Not just frequencies of words, but frequencies of phrases. So it gives examples, for instance, of how the word “careful” is actually used: be careful is the most common, followed by careful person/work etc (that is, as an adjective), careful to do sth, and so on. It’s all about exemplification, and nothing about useless grammatical terminology. For a learner, that kind of information is solid gold, and it could only be obtained with statistical approaches to studying language.
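The idea of counting phrases rather than single words can be sketched in a few lines. This is only an illustration under loud assumptions: the toy corpus is invented, the function name `phrases_with` is hypothetical, and real dictionary corpora are vastly larger and processed far more carefully. It is not a description of how Longman actually built its dictionary.

```python
import re
from collections import Counter

def phrases_with(corpus, target):
    """Count two-word phrases containing `target`, without crossing sentence ends."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", corpus.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        for left, right in zip(tokens, tokens[1:]):
            if target in (left, right):
                counts[left + " " + right] += 1
    return counts

# Invented toy corpus, purely for illustration.
toy_corpus = (
    "Be careful. She is a careful person. Be careful to lock the door. "
    "He was careful to check it. Be careful with that."
)

for phrase, n in phrases_with(toy_corpus, "careful").most_common(3):
    print(n, phrase)
```

Even on a handful of sentences, "be careful" floats to the top, which is the kind of ranking a corpus-based dictionary can present to a learner.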