Identifying Text Ranges in an Uncertain World

I spoke at FirstMark Capital's Code Driven NYC on December 16, 2015. Check out the video below to learn how Genius uses fuzzy matching to correctly annotate webpages that change over time.…

Seven Habits of Highly Effective Gems

These days, writing a Ruby gem is incredibly easy, but writing a good one isn’t. I recently flew down to San Antonio to give a talk at RubyConf on specific ways that gem authors can make their users and contributors happy. Happy users mean more traction for your library,…

PSA: Internet Explorer requires all four arguments to document.createTreeWalker

If you, like me, typically use the Mozilla Developer Network documentation as the source of truth for the browser JavaScript interface, you may be forgiven for assuming that only the first argument to document.createTreeWalker is required [https://developer.mozilla.org/en-US/docs/Web/API/Document/createTreeWalker]. As it turns…

Installing CharlockHolmes on Heroku cedar-14

CharlockHolmes [https://github.com/brianmario/charlock_holmes] is a useful library for detecting the character encoding of strings of unknown provenance. It's both accurate and fast, thanks to its use of icu4c [http://icu-project.org/apiref/icu4c/], a popular C library for Unicode operations. Unfortunately, the native dependency…

XPath Is Actually Pretty Useful Once It Stops Being Confusing

I first met XPath in 2007, but we didn't become friends until just recently. For the most part I had avoided it; when forced to use it, I made do with trial and error. XPath just didn't really make sense to me. But then I came…