Paste in some text in an unknown language and this tool will try to identify it.

This is a statistical language identification tool. It can currently identify several hundred languages. Paste in a few paragraphs in any language and it will try to identify which language it is. A few words are not enough for identification. To find some interesting languages to try, follow one of the links from:
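To give a feel for what "statistical" identification can mean, here is a minimal sketch of one common approach: comparing character trigram counts against per-language reference profiles. This is not necessarily the algorithm this tool uses. The function names, the cosine-similarity scoring, and the tiny reference samples are all assumptions made up for illustration; a real identifier would train on far more text per language.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, space-normalized text."""
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "  # pad so word boundaries show up in trigrams
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, deliberately tiny reference samples (illustration only).
REFERENCE_SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog runs away from the fox",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego el perro huye del zorro",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann läuft der hund vor dem fuchs weg",
}

PROFILES = {name: trigram_profile(sample) for name, sample in REFERENCE_SAMPLES.items()}


def identify(text: str) -> str:
    """Return the name of the reference profile most similar to the text."""
    query = trigram_profile(text)
    return max(PROFILES, key=lambda name: cosine_similarity(query, PROFILES[name]))


print(identify("the dog and the fox run over the hill"))
```

A scheme like this also suggests why a few words are not enough: a short input yields only a handful of trigrams, so the comparison rests on very little evidence.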

Or just pick a paragraph from this mysterious-looking page: sampletexts.txt

(Note that the submit button will only be enabled once you have pasted in enough text. We're currently working on making it work with less text...)

Here's an experimental bookmarklet, with props to Maciej Ceglowski (actually, this whole project, aside from the identification algorithm, is modeled on his Language Guesser):


This language identifier currently has trouble identifying languages which use Chinese characters (mainly Chinese and Japanese). Working on it.
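Part of the difficulty is that Chinese and Japanese share a large set of Han characters. One commonly used heuristic, sketched here, is to look for kana, which appears in Japanese text but not in Chinese. This is an assumption about how the problem could be approached, not a description of what this tool does or will do; the function names and return labels are hypothetical.

```python
def script_counts(text: str) -> dict:
    """Count Han characters and Japanese kana using Unicode block ranges."""
    counts = {"han": 0, "kana": 0}
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs block
            counts["han"] += 1
        elif 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana blocks
            counts["kana"] += 1
    return counts


def guess_cjk_language(text: str):
    """Return 'japanese' if any kana is present, 'chinese' if only Han, else None."""
    counts = script_counts(text)
    if counts["kana"] > 0:
        return "japanese"
    if counts["han"] > 0:
        return "chinese"
    return None


print(guess_cjk_language("これは日本語の文章です"))
print(guess_cjk_language("这是一个中文句子"))
```

The heuristic is crude: a short Japanese passage written only in kanji would be labeled Chinese, and it says nothing about other languages that use Han characters.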

It also tends to over-identify some constructed languages like Interlingua. I'm probably just going to remove those.

NOTE: The text you submit may be read during further development; do not test with sensitive or personal content.

(Don't paste in the same phrase over and over. You need to paste in actual text with enough different words for the tool to find patterns.)
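As a rough illustration of what "enough variety" could mean, the sketch below checks both the amount of text and the share of distinct words. The function name and both thresholds are hypothetical, chosen only for the example; they are not the values this tool uses.

```python
def looks_varied(text: str, min_words: int = 50, min_distinct_ratio: float = 0.3) -> bool:
    """Return True if the text is long enough and not dominated by repeated words.

    Both thresholds are illustrative assumptions.
    """
    words = text.lower().split()
    if len(words) < min_words:
        return False  # too little text to work with
    return len(set(words)) / len(words) >= min_distinct_ratio


print(looks_varied("hello world " * 100))  # same phrase repeated: fails the ratio check
```

Splitting on whitespace only works for languages that separate words with spaces, so a check like this would need a different unit (characters, for instance) for text in scripts that do not.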