CharlockHolmes is a useful library for detecting the character encoding of strings of unknown provenance. It's both accurate and fast, thanks to its use of icu4c, a popular C library for Unicode operations. Unfortunately, the native dependency makes installation on Heroku less than elementary.
The good news is that there's a nicely written blog post that takes you through exactly how to get CharlockHolmes to build. The bad news is that a few things have changed since the post was written, so its instructions no longer work out of the box; worse, it wasn't terribly obvious (at least to me) how to fix the problems.
But after spending an afternoon on the problem, I finally got ol' Charlock to build on our cedar-14 app. Since most of the post is still entirely relevant, rather than rehashing it here, I've annotated the original post with updates. Enjoy!