Improving static data performance with metaprogramming

In my previous post Generating static data with Elixir comprehensions I showed how you can use comprehensions to generate static data in your Elixir application. That static data was stored as a Map in a module attribute and accessible with a function that returns the Map itself.

When building your application, you probably know how the static data will be accessed. Elixir has various ways of accessing data in a List, Keyword, or Map, and there are big differences in performance based on the size of the data set. We’ll use the benchee library to measure the differences between the built-in access functions and compare them to approaches based on metaprogramming.

Where we left off

I’ll simplify the comprehension here to generate the base set of data. Let’s assume our source file is just tab-separated keys and values. This time around we’ll use ISO country codes instead of languages, since there are more two-character country codes than language codes, and bigger lists show bigger performance differences. We’ll store these as a Keyword list in the module attribute @iso_codes.
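For reference, the source file might look something like this (the codes are real ISO 3166 codes, but the exact file layout shown here is an assumption):

```text
ad	Andorra
ae	United Arab Emirates
au	Australia
se	Sweden
```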

defmodule MyApp.Countries do
  @iso_codes for line <- File.stream!("priv/data/iso-3166.tab"),
                 [code, name] = line |> String.trim() |> String.split("\t"),
                 code = String.to_atom(code),
                 do: {code, name}

  @spec iso_codes() :: keyword()
  def iso_codes, do: @iso_codes
end

Now we have a function iso_codes/0 that returns the Keyword list, like [au: "Australia", se: "Sweden", ...].

How does the data get used?

Before we benchmark data access, let’s think about how the data is likely to be queried in our application. It’s safe to assume two main operations:

  • Looking up names of countries by their codes e.g. :au => "Australia"
  • Looking up country codes by their names e.g. "Australia" => :au

We should add public functions for these operations, but before we do, let’s benchmark different ways of achieving these goals.

Benchmarking

If you can’t measure it, you can’t improve it. - Peter Drucker

We’ll be using the benchee library for benchmarking Elixir code. You create a script with code that benchee will run repeatedly for a fixed amount of time, reporting the average number of iterations per second. The more iterations per second, the better.

Let’s create country_code_bench.exs to benchmark various approaches for getting the country name by the code.

First we’ll read the Keyword into a local variable and create a second variable with the data as a Map so we can compare accessing the two.

Then we just need to write different functions for each data access strategy.

# read the Keyword into a local value to reduce function calls
# create a Map from the same data source
codes_kw = Countries.iso_codes()
codes_map = Map.new(codes_kw)

# create a list of all 676 unique 2-char combos. :aa, :ab, :ac...
all_2char = for a <- ?a..?z, b <- ?a..?z, do: String.to_atom(<<a::utf8, b::utf8>>)

# each test iterates all 676 letter combos and tries to fetch the
# country name from the Keyword or Map in various ways.
Benchee.run(%{
  "Keyword.Access" => fn -> Enum.each(all_2char, &codes_kw[&1]) end,
  "Keyword.fetch"  => fn -> Enum.each(all_2char, &Keyword.fetch(codes_kw, &1)) end,
  "Keyword.get"    => fn -> Enum.each(all_2char, &Keyword.get(codes_kw, &1)) end,

  "Map.Access"     => fn -> Enum.each(all_2char, &codes_map[&1]) end,
  "Map.fetch"      => fn -> Enum.each(all_2char, &Map.fetch(codes_map, &1)) end,
  "Map.get"        => fn -> Enum.each(all_2char, &Map.get(codes_map, &1)) end
})

Now we can run this with:

mix run country_code_bench.exs

The result summary on my machine:

Name                     ips        average  deviation         median         99th %
Map.fetch            24.83 K       40.28 μs    ±26.29%          38 μs          85 μs
Map.get              20.41 K       49.00 μs    ±22.69%          46 μs         101 μs
Map.Access           17.76 K       56.32 μs    ±20.75%          53 μs         114 μs
Keyword.Access        3.38 K      295.66 μs    ±15.80%         282 μs         490 μs
Keyword.fetch         3.38 K      295.83 μs    ±13.89%         286 μs      474.68 μs
Keyword.get           3.37 K      296.61 μs    ±14.67%         282 μs         483 μs

Comparison:
Map.fetch            24.83 K
Map.get              20.41 K - 1.22x slower +8.72 μs
Map.Access           17.76 K - 1.40x slower +16.04 μs
Keyword.Access        3.38 K - 7.34x slower +255.38 μs
Keyword.fetch         3.38 K - 7.34x slower +255.55 μs
Keyword.get           3.37 K - 7.36x slower +256.33 μs

We see that accessing a key in Keyword is significantly slower than accessing a Map. And when using a Map, getting values using Access is 40% slower than fetch/2.
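If you want a rough sanity check of this difference without pulling in benchee, the standard library’s :timer.tc works too. This is a minimal sketch with generated stand-in data; absolute timings will vary by machine and the numbers won’t match benchee’s output:

```elixir
# Build a keyword list and an equivalent map with 676 two-letter entries.
pairs = for a <- ?a..?z, b <- ?a..?z, do: {String.to_atom(<<a, b>>), <<a, b>>}
kw = pairs
map = Map.new(pairs)
keys = Keyword.keys(kw)

# Time 100 full sweeps over every key with the given lookup function.
time = fn fun ->
  {micros, :ok} =
    :timer.tc(fn ->
      for _ <- 1..100, k <- keys, do: fun.(k)
      :ok
    end)

  micros
end

kw_time = time.(&Keyword.get(kw, &1))
map_time = time.(&Map.get(map, &1))
IO.puts("Keyword.get: #{kw_time}µs, Map.get: #{map_time}µs")
```

Keyword lookups scan the list linearly, so their cost grows with the number of entries, while map lookups stay roughly constant; that gap is exactly what the benchee numbers above show.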

What is metaprogramming?

The definition and scope of metaprogramming varies based on the language you’re using. As Elixir is a compiled language, metaprogramming generally refers to, among other things, using code to generate code at compile time. That’s what we’re going to do here.

Generating functions with metaprogramming

Let’s add a function get_name/1 to our module that takes a country code atom and returns its name. This will be functionally equivalent to Map.get/2, where a successful lookup returns the name and an unsuccessful lookup returns nil.

This code is added in the module body:

@doc "Get a country name by its ISO code as an atom"
@spec get_name(atom()) :: nil | binary()

for {code, name} <- @iso_codes do
  def get_name(unquote(code)), do: unquote(name)
end

def get_name(_), do: nil

The for comprehension is evaluated at compile time. Each iteration defines a clause of get_name/1 that pattern matches the literal code atom in the head and returns the name. You can think of unquote here like string interpolation. We want to inject the values of code and name rather than defining variables named code and name in the function.
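The same pattern works for any compile-time list. As a standalone illustration (the module and data here are made up for this example, not part of the Countries module):

```elixir
defmodule RomanNumerals do
  # One function clause is generated per pair at compile time,
  # so each lookup compiles down to a pattern match on a literal atom.
  @numerals [i: 1, v: 5, x: 10, l: 50, c: 100]

  for {symbol, value} <- @numerals do
    def value(unquote(symbol)), do: unquote(value)
  end

  # Catch-all clause for unknown symbols.
  def value(_), do: nil
end
```

RomanNumerals.value(:x) returns 10, and anything not in the list falls through to the catch-all clause and returns nil.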

We’re able to use the special @doc and @spec attributes to define documentation and a typespec as usual.

The compiled result of this is equivalent to manually defining functions:

@doc "Get a country name by its ISO code as an atom"
@spec get_name(atom()) :: nil | binary()
def get_name(:ad), do: "Andorra"
def get_name(:ae), do: "United Arab Emirates"
def get_name(:af), do: "Afghanistan"
...
def get_name(:zw), do: "Zimbabwe"
def get_name(_), do: nil

Benchmarking again

Let’s add this get_name/1 function to our benchmark script and run it again to see if we’ve made any improvements.

# add to country_code_bench.exs
"fn_pattern_match" => fn -> Enum.each(all_2char, &Countries.get_name/1) end,

The updated result summary:

Name                       ips        average  deviation         median         99th %
fn_pattern_match       30.30 K       33.00 μs    ±22.89%          32 μs          70 μs
Map.fetch              25.28 K       39.56 μs    ±27.95%          37 μs          85 μs
Map.get                21.25 K       47.07 μs    ±20.19%          45 μs          98 μs
Map.Access             18.17 K       55.03 μs    ±20.29%          53 μs         113 μs
Keyword.get             3.44 K      290.60 μs    ±13.58%         281 μs      475.55 μs
Keyword.fetch           3.43 K      291.93 μs    ±13.80%         280 μs      474.35 μs
Keyword.Access          3.33 K      300.34 μs    ±12.97%         289 μs         466 μs

Comparison:
fn_pattern_match       30.30 K
Map.fetch              25.28 K - 1.20x slower +6.56 μs
Map.get                21.25 K - 1.43x slower +14.07 μs
Map.Access             18.17 K - 1.67x slower +22.03 μs
Keyword.get             3.44 K - 8.81x slower +257.60 μs
Keyword.fetch           3.43 K - 8.85x slower +258.93 μs
Keyword.Access          3.33 K - 9.10x slower +267.34 μs

Look at that! Our generated pattern-matching functions are 20% faster than our previous performance king, Map.fetch.

Reverse lookups

Now we need to look up country codes by their names. We’ll take a similar approach to generate functions where we pattern match the country name in the head. To support case-insensitive matching, we’ll define the generated functions as private and create a public function that downcases the input and uses the private functions.

@doc "Lookup a country's ISO code by name"
@spec get_code(binary()) :: nil | atom()
def get_code(name) do
  name
  |> String.downcase()
  |> get_code_by_lcase_name()
end

for {code, name} <- @iso_codes, name = String.downcase(name) do
  defp get_code_by_lcase_name(unquote(name)), do: unquote(code)
end
defp get_code_by_lcase_name(_), do: nil

Now we have a nice public API for querying country codes by name:

iex> Countries.get_code("australia")
:au

iex> Countries.get_code("GERMANY")
:de
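Putting both lookups together, here is a self-contained sketch of the finished module, with the file read replaced by a small inline list so it can run anywhere (only a handful of countries, for illustration):

```elixir
defmodule CountriesSketch do
  # Inline stand-in for the data normally read from the tab-separated file.
  @iso_codes [au: "Australia", de: "Germany", se: "Sweden"]

  @doc "Get a country name by its ISO code as an atom"
  @spec get_name(atom()) :: nil | binary()
  for {code, name} <- @iso_codes do
    def get_name(unquote(code)), do: unquote(name)
  end

  def get_name(_), do: nil

  @doc "Look up a country's ISO code by name (case-insensitive)"
  @spec get_code(binary()) :: nil | atom()
  def get_code(name) do
    name
    |> String.downcase()
    |> get_code_by_lcase_name()
  end

  # Private clauses match the downcased name literal in the head.
  for {code, name} <- @iso_codes, name = String.downcase(name) do
    defp get_code_by_lcase_name(unquote(name)), do: unquote(code)
  end

  defp get_code_by_lcase_name(_), do: nil
end
```

Swapping the inline list back for the File.stream! comprehension gives the full module described in this post.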

Tradeoffs and final thoughts

Metaprogramming can be a powerful tool to help achieve better performance for accessing static data. Of course there’s always a tradeoff and in this case the tradeoff is complexity of the code. It’s always worth considering how code golf or fancy (i.e. less conventional) solutions will be understood by other developers.

When writing code that will be executed frequently, this tradeoff may be well worth making. Given this example, imagine you’re processing a data source with tens of thousands of records and looking up country names from ISO codes. Our 20% performance boost will be nice to have.

Benchmarking code is a great way to prove theories on performance. When benchmarks lead to coding somewhat unconventional solutions, be kind to the next developer and leave a comment there explaining what you’re doing and why you chose this approach.

There are entire books published about this, such as Metaprogramming Elixir written by Chris McCord, author of Phoenix.