Review: here’s a list of modules defined in elixir/lib/elixir/unicode/unicode.ex:

  • String.Unicode
  • String.Casing
  • String.Break
  • String.Normalizer

Searching the rest of the Elixir source for uses of these modules, we find that they’re chiefly used in elixir/lib/elixir/lib/string.ex.

Most of those uses are simply places where String calls the Kernel.defdelegate/2 macro to delegate functions to another module, in effect re-exporting them as part of String’s public API.

  • 263: defdelegate split(binary), to: String.Break
  • 515: defdelegate normalize(string, form), to: String.Normalizer
  • 533: defdelegate upcase(binary), to: String.Casing
  • 551: defdelegate downcase(binary), to: String.Casing
  • 582: defdelegate rstrip(binary), to: String.Break, as: :trim_trailing
  • 738: defdelegate lstrip(binary), to: String.Break, as: :trim_leading
  • 769: defdelegate trim_leading(string), to: String.Break
  • 799: defdelegate trim_trailing(string), to: String.Break
  • 1137: defdelegate codepoints(string), to: String.Unicode
  • 1159: defdelegate next_codepoint(string), to: String.Unicode
  • 1286: defdelegate graphemes(string), to: String.Unicode
  • 1325: defdelegate next_grapheme_size(string), to: String.Unicode
  • 1385: defdelegate length(string), to: String.Unicode

If one is just getting started with Elixir, this is a good look at another way to share code between modules, an alternative to import/2 or perhaps use/2. A couple of the examples, rstrip/1 and lstrip/1, are delegated under a different name using the :as option.

It’s also interesting to note that String.split/1 is delegated to String.Break, but String.split/2 and String.split/3 are defined in String itself. If one is getting into Elixir macros, it’s informative to look at the source of Kernel.defdelegate/2. You’ll see that it expands into a pass-through function that calls the target function with the same arguments.
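To make the pass-through idea concrete, here is a small sketch. Demo.Break, Demo.MyString and Demo.HandWritten are hypothetical modules standing in for String.Break and String; the hand-written module shows roughly what the :as delegation amounts to, not the literal macro output.

```elixir
defmodule Demo.Break do
  # Hypothetical stand-in for an internal module like String.Break.
  def trim_leading(string), do: String.trim_leading(string)
end

defmodule Demo.MyString do
  # Delegates under the same name: Demo.MyString.trim_leading/1.
  defdelegate trim_leading(string), to: Demo.Break

  # Delegates under a different name via :as, the way String.lstrip/1
  # pointed at String.Break.trim_leading/1.
  defdelegate lstrip(string), to: Demo.Break, as: :trim_leading
end

defmodule Demo.HandWritten do
  # Roughly equivalent to the :as delegation above: a plain pass-through.
  def lstrip(string), do: Demo.Break.trim_leading(string)
end
```

Calling Demo.MyString.lstrip("  abc  ") returns "abc  ", exactly as the hand-written version does.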

There are a handful of places where String.Unicode.split_at/2 is called directly. Here’s a good example to start with:

defp do_split_at(string, position) do
  {byte_size, rest} = String.Unicode.split_at(string, position)
  {binary_part(string, 0, byte_size), rest || ""}
end

This private function supports String.split_at/2. String.Unicode.split_at/2 returns a tuple of the byte size of everything before the position (counted in graphemes) and the rest of the string starting at that position, or nil in place of the rest if the string runs out first. From these, do_split_at/2 builds a tuple of the part of the string up to the position and the rest of the string, falling back to an empty string when the rest is nil. A tuple of a result plus the remaining input is a convenient idiom for recursive functions that consume a string piece by piece, such as String.Unicode.split_at/2 itself; if you’re new to recursion, check out its source code. It’s good to note that there’s no information loss in this function: the original string can be reconstructed by concatenating the two parts of the output. That’s an important property, although not one that’s directly used in many applications.
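The byte-size-plus-rest contract can be sketched with the public String.next_grapheme_size/1. SplitSketch is a hypothetical module that mimics the behavior described above, assuming the nil-rest convention the String code relies on; it is not the actual String.Unicode implementation.

```elixir
defmodule SplitSketch do
  # Returns {bytes_consumed, rest}, or {bytes_consumed, nil} when the
  # string runs out before `position` graphemes have been consumed.
  def split_at(string, position) when is_integer(position) and position >= 0 do
    do_split(string, position, 0)
  end

  # Reached the requested position: report bytes so far and the remainder.
  defp do_split(rest, 0, acc), do: {acc, rest}

  defp do_split(string, position, acc) do
    case String.next_grapheme_size(string) do
      {size, rest} -> do_split(rest, position - 1, acc + size)
      nil -> {acc, nil}
    end
  end

  # Mirrors String.do_split_at/2: turn the byte count back into a binary.
  def split_at_binary(string, position) do
    {byte_size, rest} = split_at(string, position)
    {binary_part(string, 0, byte_size), rest || ""}
  end
end
```

Because "é" takes two bytes, SplitSketch.split_at("héllo", 2) reports 3 bytes, and concatenating the two parts of split_at_binary/2 always gives back the original string.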

The private function String.do_at/2 supports String.at/2. It uses String.Unicode.split_at/2 in a fall-through case to determine whether it has run past the end of the string, returning nil if so, or otherwise the first grapheme of the rest of the string.

defp do_at(string, position) do
  case String.Unicode.split_at(string, position) do
    {_, nil}  -> nil
    {_, rest} -> first(rest)
  end
end
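The same nil-or-first shape can be sketched with the public String.next_grapheme/1. AtSketch is a hypothetical module for illustration; it skips position graphemes and then takes the next one, returning nil if the string ends first.

```elixir
defmodule AtSketch do
  # Walk past `position` graphemes, then return the next one (or nil).
  def at(string, position) when is_integer(position) and position >= 0 do
    case drop(string, position) do
      nil -> nil
      rest -> first(rest)
    end
  end

  # Drop n graphemes; nil means the string ended before n were dropped.
  defp drop(string, 0), do: string

  defp drop(string, n) do
    case String.next_grapheme(string) do
      {_, rest} -> drop(rest, n - 1)
      nil -> nil
    end
  end

  # First grapheme of the string, or nil for the empty string.
  defp first(string) do
    case String.next_grapheme(string) do
      {grapheme, _} -> grapheme
      nil -> nil
    end
  end
end
```

Working in graphemes rather than bytes or codepoints means an "e" followed by a combining acute accent comes back as a single element.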

In String.slice/3 we see String.Unicode.split_at/2 used for detecting the end of a string, but also, in a second split, to find how many bytes the next len graphemes of the remaining string occupy.

def slice(string, start, len) when start >= 0 and len >= 0 do
  case String.Unicode.split_at(string, start) do
    {_, nil} -> ""
    {start_bytes, rest} ->
      {len_bytes, _} = String.Unicode.split_at(rest, len)
      binary_part(string, start_bytes, len_bytes)
  end
end

In String.slice/2, for a range that runs to the end of the string, the use is very similar, but no second split is required: everything after start_bytes is taken.

def slice(string, first..-1) when first >= 0 do
  case String.Unicode.split_at(string, first) do
    {_, nil} -> ""
    {start_bytes, _} ->
      binary_part(string, start_bytes, byte_size(string) - start_bytes)
  end
end
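Both slice clauses can be sketched without the internal module. SliceSketch and its bytes_for/2 helper are hypothetical: bytes_for/2 counts the bytes in the first n graphemes using String.next_grapheme_size/1, following the same {bytes, rest-or-nil} convention, and slice_from/2 stands in for the first..-1 clause without using a range pattern.

```elixir
defmodule SliceSketch do
  # Bytes in the first n graphemes: {bytes, rest}, rest nil if the string ran out.
  def bytes_for(string, n) when is_integer(n) and n >= 0, do: bytes_for(string, n, 0)

  defp bytes_for(rest, 0, acc), do: {acc, rest}

  defp bytes_for(string, n, acc) do
    case String.next_grapheme_size(string) do
      {size, rest} -> bytes_for(rest, n - 1, acc + size)
      nil -> {acc, nil}
    end
  end

  # Same shape as String.slice/3: two splits, then one binary_part/3.
  def slice(string, start, len) when start >= 0 and len >= 0 do
    case bytes_for(string, start) do
      {_, nil} -> ""
      {start_bytes, rest} ->
        {len_bytes, _} = bytes_for(rest, len)
        binary_part(string, start_bytes, len_bytes)
    end
  end

  # Same shape as the first..-1 clause of String.slice/2: one split only.
  def slice_from(string, first) when first >= 0 do
    case bytes_for(string, first) do
      {_, nil} -> ""
      {start_bytes, _} -> binary_part(string, start_bytes, byte_size(string) - start_bytes)
    end
  end
end
```

The design keeps all grapheme counting in one helper and does the final cut with binary_part/3, which works on byte offsets and so avoids building intermediate strings.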

Lastly, here’s where String.capitalize/1 calls upon String.Casing.titlecase_once/1:

def capitalize(string) when is_binary(string) do
  {char, rest} = String.Casing.titlecase_once(string)
  char <> downcase(rest)
end
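Before looking at String.Casing, a rough approximation using only public functions may help. CapitalizeSketch is hypothetical, and it assumes String.upcase/1 as a stand-in for titlecasing the first grapheme. That is not exactly what titlecase_once/1 does: Unicode defines a separate titlecase form for a few characters, such as the digraph ǆ, whose titlecase is ǅ rather than the uppercase Ǆ.

```elixir
defmodule CapitalizeSketch do
  # Approximation: uppercase (not titlecase) the first grapheme,
  # then lowercase everything after it.
  def capitalize(string) when is_binary(string) do
    case String.next_grapheme(string) do
      nil -> ""
      {first, rest} -> String.upcase(first) <> String.downcase(rest)
    end
  end
end
```

For ordinary Latin text such as "hello WORLD", this gives the same "Hello world" that String.capitalize/1 does.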

This looks simple enough that in the next post I’ll unpack how it’s implemented in String.Casing, along with its tests. It’ll be a good, simple introduction to a Unicode feature in Elixir.

Postscript on bodiless functions

While researching this entry, I noticed that several String functions had bodiless function heads preceding the clauses that have bodies, for example:

def jaro_distance(string1, string2)

I knew that a bodiless head is required for functions that have multiple clauses and default arguments, but some of these, like String.slice/2, String.printable?/1, and String.jaro_distance/2, didn’t have any default arguments. I couldn’t figure it out for quite a while, but while asking about it in the Elixir Slack group I noticed that the commit message for String.printable?/1 explained it:

8f679e9a Explicitely[sic] declare argument names for functions with unnamed arguments.

So the reason is documentary: documentation generators can print the function with sensible argument names where none may exist in the clauses that follow, for instance because those clauses pattern-match on their arguments instead of binding them to a single name.
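Both uses of a bodiless head can be seen in a small sketch. Shapes and its functions are hypothetical examples: area/1 uses a head purely to name its argument, since every clause pattern-matches a tuple, while greet/2 needs its head because it combines a default argument with multiple clauses.

```elixir
defmodule Shapes do
  @doc "Returns the area of shape."
  # Documentary head: no clause below binds a plain variable named shape,
  # so this head supplies the name for generated docs.
  def area(shape)

  def area({:square, side}), do: side * side
  def area({:rect, width, height}), do: width * height

  # Required head: with several clauses, the default must be declared here.
  def greet(name, greeting \\ "Hello")

  def greet(nil, greeting), do: greeting <> "!"
  def greet(name, greeting), do: greeting <> ", " <> name <> "!"
end
```

Shapes.greet("José") returns "Hello, José!", using the default declared in the head.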