Unicode and Elixir, part 1: Introduction
For the next few posts, at least, I’m going to write about Unicode and how it’s implemented and used in Elixir. I intend to keep these posts shorter than the last series so I can keep things more lightweight and regular. However, I’m a pretty ambitious guy, and I like getting carried away with things. The scope of this also extends in two directions, with a lot of ground to cover, and at the same time there’s a lot of “life happens” stuff going on personally, so … we’ll see.
First, though, let’s get oriented. Here are a few resources we’re going to get familiar with as we go along.
From the Elixir Getting Started book, Chapter 6: Binaries, strings and char lists. Here you can read about how strings in Elixir work. There are two kinds, but I expect to focus on strings as binaries rather than as char lists.
From Elixir’s docs, the String module documentation will be our reference for the different operations on Elixir’s strings. Other module documentation of note: Enum and possibly StringIO.
From Elixir’s source, we’ll find most of what we’re interested in inside the lib/elixir/unicode folder.
From http://unicode.org we’ll want to get up to speed on the Unicode specification. Elixir is currently on Unicode 8.0.0, but Unicode 9.0.0 is coming out in June 2016, with a bunch of new characters and other important changes.
And that last bit gets to the heart of why I’m writing this. How does a project like Elixir implement and keep up with a large and complicated standard like Unicode? Unicode, as a standard of character representation, and Elixir, as a language for humans and computers to share, occupy an intersection where the crosswinds of mathematical numbers and strings stir and mix with thousands of years of human traditions around reading and writing. I find this negotiated territory between computation and communication, with all of its conflicts and compromises, very interesting. If you do too, please follow along!