Having come from an almost exclusively Object-Oriented and Procedural Programming background, one of the things I initially struggled with when learning the functional programming paradigm of Elixir, but which I later grew to love, was thinking about solving problems in terms of data processing pipelines of functions, rather than hierarchies of objects interacting with one another.
I found myself spending a significant amount of time thinking about how to elegantly express the solution to a data processing requirement as a sequence of data transforms. Although there are always numerous different ways to achieve an aim in Elixir, the language is both more rewarding and more punishing than non-functional languages, like C#, Java or Python, when it comes to the quality of your implementation. If you think it through and get it right, you are rewarded with a beautifully elegant and concise sequence of data transformations; but if you approach the problem from the wrong direction, or try to hack it, things rapidly become messy.
In reality, putting the extra effort in to properly think through what you are trying to achieve rapidly becomes second nature; and there have been several times over the last few months where it has given me great pleasure to arrive at an elegant solution after a bit of head-scratching and careful thinking.
Elixir’s Language Features for Chaining Processing Steps
With the aim of creating a clean and elegant sequence of processing steps, I want to look at the relative merits of the different approaches to chaining the output of one processing step to the input of the next.
The main options (from most basic upwards) are:
– A simple ‘if…else’ statement
– The ‘cond’ and ‘case’ statements
– Expression chaining using the |> operator and pattern-matching
– The ‘with’ statement
Each of these has its merits and drawbacks in different situations. My (by no means definitive) take on when to use them is as follows:
if…else (official docs)
My general thought would be that unlike procedural/OO languages, where several nested ‘if…else’ statements can be a fairly standard and not too terrible occurrence, if you find yourself using if…else as more than a single top-level selector in Elixir, there are almost certainly better ways of expressing your logic. I get the feeling that ‘if’ statements in Elixir are tolerated, rather than recommended. Whilst they are useful where there is a simple binary decision that needs to be made, anything with multiple interrelated decisions is likely to get messy. Whilst if…else statements might seem like a familiar friend to procedural developers, they don’t work as well in the functional paradigm. In C# or Python it might make sense to have a function with multiple ‘if’s updating some internal object state based on input data; however, data immutability means that this kind of scenario doesn’t really apply in Elixir. For example, it may make more sense to have different versions of an Elixir function, matching different patterns of input data, rather than using ‘if’ statements inside the function to switch logic path.
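For example, here is a minimal sketch of that multi-clause style (the function name and data shapes are hypothetical, not from a real project):

# Rather than one describe/1 function that branches internally with if…else,
# each input shape gets its own function clause:
def describe(%{admin: true, name: name}), do: "#{name} (admin)"
def describe(%{name: name}), do: name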
One other shortcoming is that ‘if’ doesn’t easily handle the case where the value being evaluated may be one of a number of types. This situation occurs with one of the standard Elixir patterns, whereby functions return some data on success, or {:error, message} on failure. Testing such a return value with ‘if’ usually means destructuring it first, which will cause a MatchError when the failure shape comes back, and so is awkward or inappropriate in this situation; whereas the scenario can easily be handled using the pattern matching available in the options discussed below.
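To make that concrete, a small assumed illustration (fetch_data/0 and process/1 are hypothetical): to test the result with ‘if’ you typically have to destructure it first, and that is where the match blows up.

{:ok, data} = fetch_data()   # raises MatchError if fetch_data/0 returns {:error, reason}
if data, do: process(data)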
Case and Cond (official docs)
Elixir’s ‘cond’ and ‘case’ statements are both useful where an expression can evaluate to a number of different values (case), or where several paths may be taken based on one of a number of different statements evaluating to ‘true’ (cond).
Case can also be more useful than ‘if’ for some binary decisions, as it copes with the statement being evaluated resulting in different types. Coming back to the return-value pattern mentioned above in the ‘if’ section, case can be used to determine the next processing step when a function potentially has more than one return type, e.g. here, where success is just a single atom, ‘:ok’, but failure is a tuple:
case uploadAsset(localName, targetFilename) do
  :ok ->
    IO.puts("Uploaded #{targetFilename}")
    {:ok, "#{@mediaRoot}/#{targetFilename}"}

  {:error, reason} ->
    IO.puts("Upload of #{targetFilename} failed: #{reason}")
    {:error, "Unable to upload track to filestore: #{reason}"}
end
Case is obviously useful wherever there are multiple possible patterns to match; and a big benefit it brings, compared with the equivalent in most procedural languages, is that a) the match patterns within the case can be complex data types, and b) matched values can then be used in the subsequent action statement. For example, take this block of code that handles the result of a transactional database update:
case Repo.transaction(multi) do
  # if success, then return the profile
  {:ok, result} -> result.profile

  {:error, :profile, changeset, %{}} ->
    EventLogger.logError("PREFERENCE_PROFILE_CREATE_ERROR", "#{changeset |> inspect}", __MODULE__)
    {:error, "Unable to create Preference Profile for device"}

  {:error, :device, changeset, %{}} ->
    EventLogger.logError("DEVICE_CREATE_ERROR", "#{changeset |> inspect}", __MODULE__)
    {:error, "Unable to create consumption device record"}
end
The transaction update can fail in multiple ways, and the case statement allows all of these patterns to be matched and then acted upon.
One disadvantage is that nested case statements can rapidly become messy and difficult to understand; and so I’ve found they work best if each clause routes to just a few lines of processing – ideally to a single expression or function call.
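As a sketch of what I mean (reworking the transaction example above, with a hypothetical handleFailedStep/2 helper), each clause stays at a single expression and the detail lives in named functions:

case Repo.transaction(multi) do
  {:ok, result} -> result.profile
  {:error, step, changeset, _changes} -> handleFailedStep(step, changeset)
end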
Case statements can be made more flexible still by using a ‘when’ clause to constrain matches further; this becomes most useful when there is a drop-through default that mops up any situation where there isn’t a match.
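For instance, a small sketch (the sensor function and data shapes here are hypothetical) combining a ‘when’ guard with a drop-through default:

case get_temperature(sensor) do
  {:ok, celsius} when celsius > 40 -> {:alert, celsius}
  {:ok, celsius} -> {:normal, celsius}
  _other -> {:error, "No reading available"}
end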
Cond provides a similar multi-path selector to case, but is useful when each path is gated on multiple unrelated (or only semi-related) conditions. For example, take the following authentication plug logic within a Phoenix app: it first needs to match the case where a user isn’t logged in, then the case where they are logged in and have access rights, and finally needs to drop through to the case where the user has insufficient rights for the resource:
def call(conn, opts) do
  cond do
    conn.assigns.current_user == nil ->
      conn
      |> put_flash(:error, "Please log in or sign up to access that page")
      |> redirect(to: page_path(conn, :index))
      |> halt()

    matchRequirements(conn.assigns.current_user, opts) == true ->
      conn

    true ->
      conn
      |> put_flash(:error, "You have insufficient rights to access that page")
      |> redirect(to: page_path(conn, :index))
      |> halt()
  end
end
In general I have found ‘cond’ to be less frequently useful than ‘case’ or the other approaches mentioned here; but it is still a very useful part of the Elixir toolkit when the need emerges, and more elegant than the multiple nested ‘if’ statements that may otherwise be necessary.
The |> Operator (official docs)
If there is one language feature that really makes Elixir pop, it has to be the |> pipeline operator, and the programming paradigm that goes along with it. You may begin to guess that I love this feature, and I’ll try to explain why in the next few paragraphs:
So what is the |> operator? Quite simply, it is an operator that directs the output of the statement on its left-hand side into the first parameter of the function on its right-hand side. On the surface, this doesn’t seem like that big a deal; however, it facilitates a programming pattern which, in conjunction with the way many of the standard enumerable and stream-handling modules are written, allows complex sequences to be expressed concisely and elegantly.
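As a trivial illustration of the mechanics, these two expressions are equivalent; the pipe simply feeds the left-hand value in as the first argument of the function on the right:

String.split(String.upcase("hello elixir world"))

"hello elixir world"
|> String.upcase()
|> String.split()
# => ["HELLO", "ELIXIR", "WORLD"]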
I think the primary benefit that use of the |> operator brings is that it really gets you to think in terms of a data processing pipeline. If you design your functional steps properly then you can have one function pass its output to the input of the next, and so on; and what you end up with is a brilliantly concise and clear representation of the data processing steps your code is making. I had a eureka moment early on in my Elixir programming days where I had written some processing logic that was spread across several functions, case statements and if branches. It wasn’t terrible code, but it didn’t exactly jump out as elegant. It was then that I began to realise the power of Elixir’s ‘Enum’ module, and the way it allows sequences of data to be operated on using a single statement.
With this in mind, I started playing around with my code, which needed to:
– iterate over a dataset, creating a ‘match score’ for each data item
– sort the results so the best matches were first
– take the top results
– separate the data items from their match scores
– load a foreign-key associated data item from the database for each data item
– extract the associated data items out into their own results list.
Using Enum and the pipe operator, I was able to refactor the code as follows:
results =
  getTrackMetadata
  |> Enum.map(fn x -> {measureDistance(seed.v1, x.v1), x} end)
  |> List.keysort(0)
  |> Enum.slice(0, @numResultsNeeded)
  |> Enum.map(fn {_score, metadata} -> metadata end)
  |> Repo.preload(:track)
  |> Enum.map(fn x -> x.track end)
This resulted in a block of code that clearly illustrates the sequence of steps taking place.
So, the first two conclusions I came to were that
1) The |> operator allows a pipeline to be clearly defined as an elegant sequence of data processing steps.
2) When processing sets of data, the Enum module provides a powerful set of functions for applying map, reduce and other sequence processing operations as part of a pipeline.
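As a trivial example of that second point, a short (made-up) pipeline that squares a list of numbers, keeps the even results, and sums them:

[1, 2, 3, 4, 5]
|> Enum.map(fn x -> x * x end)
|> Enum.filter(fn x -> rem(x, 2) == 0 end)
|> Enum.reduce(0, fn x, acc -> x + acc end)
# => 20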
However, there was a problem. Take the following sequence, which is a block of code for doing an authenticated upload to Google Cloud Storage, and then logging the result:
def upload(sourcePath, targetFilename, type) do
  Goth.Token.for_scope("https://www.googleapis.com/auth/cloud-platform")
  |> createConnection
  |> uploadFile(sourcePath, targetFilename, type)
  |> EventLogger.logResult("FILE_OP", "Upload of #{targetFilename}")
end
This looks clean and concise, but what happens when any of the intermediate functions suffers a failure? Suddenly there is a match error in the pipeline and Elixir throws an exception, bringing the pipeline to a crashing halt. This may be ok in some situations, but in most cases I would expect that it is desirable to at least register an error and handle it properly.
In order to allow the pipeline to run end-to-end in the successful case, but to propagate errors in the case of failure, I’ve found that an effective pattern is to implement two versions of each pipeline function: an error propagating one, and the standard one. In other words, create different versions of the function to handle each of the possible outputs from the previous step; and in the case of that output being an error, just propagate that error unchanged.
For example, that ‘createConnection’ function above can be implemented as:
defp createConnection({:error, message}), do: {:error, message}

defp createConnection({:ok, token}) do
  GoogleApi.Storage.V1.Connection.new(token.token)
end
Now, if Goth.Token.for_scope returns an error, createConnection passes the error onwards; but if it returns {:ok, token}, the second version of the function is invoked and a storage connection is created.
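The same pattern can then be repeated for the later stages of the pipe, so an error from any step flows straight through to the end. As a hedged sketch, the extra clause that ‘uploadFile’ might gain would look something like this (the existing success clause is unchanged and elided here):

# Error-propagating clause: if the previous step failed, just pass the error on unchanged.
defp uploadFile({:error, message}, _sourcePath, _targetFilename, _type), do: {:error, message}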
With (official docs)
So the |> operator allows a concise sequence of steps to be defined; but what if you don’t want to go to the bother of creating duplicate versions of every function in the pipe for cases where the ‘happy path’ isn’t followed? Luckily, Elixir has an answer for this in the form of the ‘with’ macro.
Simply put, ‘with’ allows you to define a sequence of processing steps, in a similar way to the |> operator; but it allows you to define the match condition for the result of each step, and an error handler for the case where this match doesn’t occur. In other words, it allows you to define the happy path, and also what to do when this path can’t be taken.
For example, the following code takes a base64-encoded token, decodes it, extracts its version header, validates the header and then validates the token, before returning the valid token. If there is a failure, it returns the error as a standard {:error, message} tuple:
def unwrap(wrappedToken, expectedTokenType) do
  with {:ok, decodedToken} <- base64Decode(wrappedToken, expectedTokenType),
       {header, encryptedToken} <- extractHeader(decodedToken),
       {:ok, actualTokenType} <- getValidatedTokenType(header, expectedTokenType),
       {:ok, validToken} <- getValidatedToken(encryptedToken, actualTokenType) do
    {:ok, validToken}
  else
    {:error, message} -> {:error, "Invalid token data (#{message |> inspect})"}
    errorMessage -> {:error, "Invalid token data (#{errorMessage |> inspect})"}
  end
end
I’ll start by saying that I’m still on the fence about ‘with’. I like that it can express some powerful processing pipelines without the additional plumbing the |> operator needs to handle error conditions elegantly; but I find the syntax significantly less readable. That said, it also allows a data processing pipeline to be defined when the function outputs can’t be fed directly into the next stage, and so offers a little extra flexibility compared to |>.
I’m still playing with, and exploring the merits of, both approaches; but I think the concise elegance enabled by the |> operator is often worth the modest overhead.