Elixir: Basics of pattern matching
Pattern matching is one of the core features of Elixir, and most of the language's other features and syntax are built on top of it. It does two things: it asserts that two terms are equal, and it binds the corresponding values on one side to the variables on the other side. Pattern matching is done explicitly with the match operator = and implicitly in places such as function clauses and case statements. This article covers the basics of pattern matching with the match operator and how it works for the different data types in Elixir.
Pattern matching with the match operator involves two terms, one on each side of the operator:
left_side = right_side
The right side can contain any valid Elixir term, expression or function call. The right side is evaluated first and reduced to an Elixir term; only then does the pattern matching proceed.
The left side can contain only Elixir terms and variables. The variables on the left side need not be bound to a value. Whenever the match operator encounters a variable on the left side, bound or unbound, it binds (or rebinds) that variable to the corresponding term on the right side, making both sides equal. Every term on the left side other than a variable is checked for equality with its counterpart on the right side. If the match operator cannot make the two sides equal, a MatchError is raised.
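As a minimal sketch using only the standard library, the snippet below shows the right side (here a function call) being evaluated before the match, and a failed match raising a MatchError. The try/rescue is there only to make the failure observable; the variable names are illustrative.

```elixir
# The right side is a function call: it is evaluated first,
# and the resulting term (5) is then bound to x.
x = String.length("hello")

# x + 1 reduces to 6, which cannot be made equal to 5, so a MatchError
# is raised; it is rescued here only to show that the match failed.
result =
  try do
    5 = x + 1
    :matched
  rescue
    MatchError -> :no_match
  end

IO.inspect(result) # prints :no_match
```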
Examples
5 = 5
The right side is already in a reduced form as the term 5. The left side contains term 5 which is equal to the right side and hence the match succeeds.
5 = 6
** (MatchError) no match of right hand side value: 6
The right side is already in a reduced form as the term 6. The left side contains term 5 which is not equal to the right side and hence the match fails.
5 = x
error: undefined variable "x"
The right side contains a variable, which has to be reduced to the value bound to it. But the variable x is unbound, so the right side cannot be reduced to a term and the expression is rejected with an undefined variable error before any matching takes place.
x = 5
The right side is already in a reduced form as the term 5. The left side contains a single variable x. A variable on the left side matches any term, so 5 is bound to x, which makes both sides equal.
2 + 3 = 5
error: cannot invoke remote function :erlang.+/2 inside a match
The right side is already in a reduced form as the term 5. The left side contains the expression 2 + 3, but the left side may only contain Elixir terms and variables, not expressions that need to be evaluated. Hence an error is raised.
x = 3
5 = 2 + x
The first match binds the variable x to the term 3. In the second match, the right side contains the expression 2 + x, which is evaluated and reduced to the term 5. The left side contains the term 5, which is equal to the right side, and hence the match succeeds.
Pin operator
Now that we have a basic understanding of how pattern matching works, let us look at the pin operator ^. From the above examples, it is clear that whenever there is a variable on the left side, it is rebound with a new value from the right side during the match. The pin operator stops a variable on the left side from being rebound and instead treats it as a value. The match then behaves as if the variable had been replaced directly with its bound value: only the equality check with its right side counterpart is done, and the variable keeps its previously bound value after the match.
x = 5 # 5 will be bound to x after pattern matching
x = 6 # 6 will be bound to x after pattern matching
^x = 7
** (MatchError) no match of right hand side value: 7
In the above examples, the first two matches have x as a plain variable on the left side. In the third match, the pin operator is applied to x, which behaves as if the value bound to x, 6, were substituted in place of x, mimicking 6 = 7. The match fails because the two sides cannot be made equal, and x retains its value 6 afterwards. Note that to use the pin operator on a variable, the variable must already be bound to a value; otherwise an error is raised.
^x = 5
error: undefined variable ^x. No variable "x" has been defined before the current pattern
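The pin operator is also useful inside case statements, where it compares against an existing value instead of binding a new one. The sketch below assumes a hypothetical `check` function that reports whether its argument equals a previously bound value.

```elixir
expected = :ok

# ^expected compares against the current value of `expected` (:ok)
# instead of rebinding it; `expected` is captured by the closure.
check = fn value ->
  case value do
    ^expected -> :same
    _ -> :different
  end
end

IO.inspect(check.(:ok))    # prints :same
IO.inspect(check.(:error)) # prints :different
```

Without the pin, the first clause would bind any value to a new `expected` and always return :same.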
Pattern matching collection types
Pattern matching simple terms such as integers, floats, atoms and booleans works similarly to the examples shown above. Let us now look at how pattern matching works for collection types such as tuples, lists, maps and binaries.
Tuples
Tuples are array-based structures that store ordered heterogeneous elements and are widely used as the return values of function calls. To pattern match a tuple, you need to know the number of elements in the tuple on the right side so that the left side can deconstruct it into its elements.
x = {1, 2}
The right side is already in a reduced form as the term {1, 2}. The left side contains a single variable x. Hence the term on the right, {1, 2} will be bound to x, which will make both sides equal.
{1, 2} = {1, 2}
The right side is already in a reduced form as the term {1, 2}. The left side also contains a tuple. The tuples on both sides are equal with the same size and same elements. Hence the pattern matching succeeds.
{a, b} = {1, 2}
a # 1
b # 2
The right side is already in a reduced form as the term {1, 2}. The left side also contains a tuple, and the tuples on both sides are of the same size. The elements of the left side tuple are two variables, so each element of the right side tuple is bound to the corresponding variable in the left side tuple, making the two sides equal.
{a, 2} = {1, 2}
a # 1
The right side is already in a reduced form as the term {1, 2}. The left side also contains a tuple. The tuples on both sides are of the same size. The left side tuple’s elements are a variable and the term 2. The second element of left side tuple, 2, is the same as the second element in the right side tuple. The first element of the left side tuple is a variable which will be bound to the first element, 1, of the right side tuple to make both sides equal.
{a, b, c} = {1, 2}
** (MatchError) no match of right hand side value: {1, 2}
The right side is already in a reduced form as the term {1, 2}. The left side also contains a tuple. The tuples on both sides are not of the same size and hence the match fails.
{a, 3} = {1, 2}
** (MatchError) no match of right hand side value: {1, 2}
a = 2
{^a, b} = {1, 2}
** (MatchError) no match of right hand side value: {1, 2}
In the above examples, the match fails because corresponding elements in the two tuples are not the same. In the last one, the pinned a stands for 2, which is not equal to 1.
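Since tuples are a common return type, a typical use is matching on a tagged result. The sketch below uses the standard library's Map.fetch/2, which returns {:ok, value} when the key exists and :error otherwise; the `inventory` map is only illustrative.

```elixir
inventory = %{apples: 3}

# The key exists, so the {:ok, count} pattern matches and binds count.
{:ok, count} = Map.fetch(inventory, :apples)

# When the shape of the result is not known in advance, case tries each
# pattern in turn instead of raising a MatchError.
pears =
  case Map.fetch(inventory, :pears) do
    {:ok, n} -> n
    :error -> 0
  end

IO.inspect({count, pears}) # prints {3, 0}
```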
{a, a, 3} = {1, 2, 3}
** (MatchError) no match of right hand side value: {1, 2, 3}
{a, a, a, 3} = {1, 1, 2, 3}
** (MatchError) no match of right hand side value: {1, 1, 2, 3}
{a, a, 3} = {1, 1, 3}
a # 1
In the above examples, the same variable a is used more than once on the left side. At its first occurrence, the variable is bound to the corresponding right side value as usual. At each later occurrence, the variable behaves as if the pin operator were applied, using the value from its first binding.
Conceptually, {a, a, a, 4} = {1, 2, 3, 4} is equivalent to {a, ^a, ^a, 4} = {1, 2, 3, 4}, where the pinned a stands for 1, the value bound at its first occurrence. Since 2 and 3 are not equal to 1, the match fails.
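The same rule applies in function clauses, where a repeated variable acts as an equality check. The module name `Pairs` below is hypothetical and used only for illustration.

```elixir
defmodule Pairs do
  # The repeated variable x means this clause only matches
  # two-element tuples whose elements are equal.
  def same?({x, x}), do: true

  # Any other two-element tuple falls through to this clause.
  def same?({_, _}), do: false
end

IO.inspect(Pairs.same?({1, 1})) # prints true
IO.inspect(Pairs.same?({1, 2})) # prints false
```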
Lists
Lists in Elixir are linked lists that store ordered heterogeneous elements. For small lists where you know the exact number of elements, matching works just as it does for tuples.
x = [1, 2]
[1, 2] = [1, 2]
[a, b] = [1, 2]
a # 1
b # 2
[a, 2] = [1, 2]
a # 1
[a, b, c] = [1, 2]
** (MatchError) no match of right hand side value: [1, 2]
[a, 3] = [1, 2]
** (MatchError) no match of right hand side value: [1, 2]
a = 2
[^a, b] = [1, 2]
** (MatchError) no match of right hand side value: [1, 2]
[a, a, 3] = [1, 2, 3]
** (MatchError) no match of right hand side value: [1, 2, 3]
[a, a, a, 3] = [1, 1, 2, 3]
** (MatchError) no match of right hand side value: [1, 1, 2, 3]
[a, a, 3] = [1, 1, 3]
a # 1
For larger lists, where you don't know the exact number of elements, the head and tail representation can be used to match the elements. Every list can be represented as a head and tail pair: the head is the first element, and the tail is the list containing the rest of the elements. This pattern continues recursively until the end of the list, where the last tail is implicitly an empty list. The head and tail syntax uses the cons operator | to separate the head from the tail. The list [1, 2, 3, 4, 5] can be written in any of the following equivalent ways:
[1,2,3,4,5]
[1|[2,3,4,5]]
[1|[2|[3,4,5]]]
[1|[2|[3|[4,5]]]]
[1|[2|[3|[4|[5]]]]]
[1|[2|[3|[4|[5|[]]]]]]
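A quick way to confirm that these spellings describe the same list is to compare them with ==. The snippet below is a small sketch; it also includes the form with two elements before the |, which the matching examples further down use as well.

```elixir
# Different spellings of the same list, built with the cons operator |.
forms = [
  [1, 2, 3, 4, 5],
  [1 | [2, 3, 4, 5]],
  [1, 2 | [3, 4, 5]],
  [1 | [2 | [3 | [4 | [5 | []]]]]]
]

# == compares lists element by element, so every form equals [1, 2, 3, 4, 5].
all_equal = Enum.all?(forms, fn form -> form == [1, 2, 3, 4, 5] end)
IO.inspect(all_equal) # prints true
```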
Using this representation, we can pattern match a list into its head and tail. This kind of pattern matching is widely used in Elixir to traverse lists with recursion.
[1 | tl] = [1, 2, 3, 4]
tl # [2, 3, 4]
[hd | tl] = [1, 2, 3, 4]
hd # 1
tl # [2, 3, 4]
[a, b | tl] = [1, 2, 3, 4]
a # 1
b # 2
tl # [3, 4]
[a | [b | tl]] = [1, 2, 3, 4]
a # 1
b # 2
tl # [3, 4]
[hd | tl] = [1]
hd # 1
tl # []
[hd | tl] = []
** (MatchError) no match of right hand side value: []
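The recursive use mentioned above can be sketched with two function clauses: one for the empty list, which [head | tail] cannot match, and one that splits off the head. The module name `MyList` is hypothetical.

```elixir
defmodule MyList do
  # Base case: an empty list has no head, so it needs its own clause.
  def sum([]), do: 0

  # Recursive case: take the head and add it to the sum of the tail.
  def sum([head | tail]), do: head + sum(tail)
end

IO.inspect(MyList.sum([1, 2, 3, 4])) # prints 10
```

This version is not tail-recursive, which keeps it close to the head and tail description; in practice, Enum.sum/1 does the same job.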
Maps
Maps in Elixir are associative data structures that store key-value pairs. Pattern matching a map uses keys to deconstruct it, but you do not need to know all the keys present in the map. On the left side, each key must be a literal Elixir term or a pinned variable; an unpinned variable cannot be used as a key. Multiple keys can be matched on the left side, provided all of them are present in the map on the right side; if any key used on the left side does not exist on the right side, the match fails. Map pattern matching is mainly used to extract the values of known keys and to assert that the map on the right side contains certain keys and values. Note that an empty map on the left side matches any map on the right side, irrespective of its size.
map = %{:one => 1, :two => 2, :three => 3, :four => 4, :five => 5}
%{val => 1} = map
error: cannot use variable val as map key inside a pattern.
Map keys in patterns can only be literals (such as atoms, strings, tuples,
and the like) or an existing variable matched with the pin operator
(such as ^some_var)
%{:one => val} = map
val # 1
%{:no_key => val} = map
** (MatchError) no match of right hand side value: %{one: 1, two: 2, three: 3, four: 4, five: 5}
x = :one
%{^x => val1, :two => val2, :three => 3} = map
val1 # 1
val2 # 2
%{:one => "one", :two => val2, :three => 3} = map
** (MatchError) no match of right hand side value: %{one: 1, two: 2, three: 3, four: 4, five: 5}
%{} = %{} # match succeeds
%{} = %{0 => :zero, 1 => :one} # match succeeds
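Because a map pattern only requires the listed keys, it works well in function clauses that accept maps with extra keys. In the sketch below, the module `Greeter` is hypothetical, and %{name: name} is shorthand for %{:name => name}.

```elixir
defmodule Greeter do
  # Matches any map that has a :name key, whatever other keys it holds.
  def greet(%{name: name}), do: "Hello, " <> name

  # %{} matches every map, so this clause catches maps without :name.
  def greet(%{}), do: "Hello, stranger"
end

IO.inspect(Greeter.greet(%{name: "Ada", role: :admin})) # prints "Hello, Ada"
IO.inspect(Greeter.greet(%{role: :guest}))              # prints "Hello, stranger"
```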
Binaries
Binaries in Elixir are sequences of bytes. Binary pattern matching involves reading bytes and bits from a binary as data of a particular type, such as an integer, float, utf8, bits or bytes. It is more complex than pattern matching the other data types and requires a deeper understanding of the subject; a separate article explains the binary and bitstring data types and the different ways of binary pattern matching in detail.
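As a small taste of the syntax, the sketch below matches individual bytes, a 16-bit integer and a string prefix. The data is made up for illustration; the segment types shown (default 8-bit integers, ::16, ::binary) are standard.

```elixir
# Each unsized segment defaults to an 8-bit unsigned integer;
# rest::binary captures all the remaining bytes.
<<first, second, rest::binary>> = <<1, 2, 3, 4>>

# A 16-bit big-endian integer followed by the remaining bytes.
<<len::16, payload::binary>> = <<0, 3, "abc">>

# String prefix matching with <>; the prefix on the left must be a literal.
"user:" <> id = "user:42"

IO.inspect({first, second, rest}) # prints {1, 2, <<3, 4>>}
IO.inspect({len, payload, id})    # prints {3, "abc", "42"}
```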
Underscore wildcard
The underscore character _ can be used as a wildcard on the left side of a pattern match, in place of values that are not of interest. It is widely used in function clauses and case statements. Consider a tuple of three elements from which you need only the second and third elements. You can use the wildcard in place of the first element: the left side tuple still has a size of 3, so the match succeeds, while the first element is ignored.
{_, b, c} = {1, 2, 3}
b # 2
c # 3
You could use a variable instead of the wildcard and simply never use it, but that would bind values to variables that are never used, and inside a function the compiler would warn about each unused variable.
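Unlike a repeated variable, the wildcard can appear several times in one pattern without forcing those positions to be equal. The sketch below also shows an underscore-prefixed name, which documents what is ignored without triggering an unused-variable warning; the tuple values are illustrative.

```elixir
# `_` matches anything and binds nothing, so the first and third
# positions may hold different values (1 and 3 here).
{_, second, _} = {1, 2, 3}

# A name prefixed with an underscore says what is being ignored;
# the compiler does not warn when such a variable goes unused.
{_status, body} = {200, "OK"}

IO.inspect({second, body}) # prints {2, "OK"}
```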
match?
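Elixir also offers the match?/2 macro, which takes a pattern and an expression and returns true if the expression matches the pattern and false otherwise. Unlike the match operator, it never raises a MatchError, and variables bound inside its pattern are not available after the call.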
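A short sketch of match?/2 follows, including a common idiom of filtering a list down to the elements that fit a pattern. The `results` list is made up for illustration.

```elixir
# match?/2 returns a boolean instead of raising a MatchError.
ok_result = match?({:ok, _}, {:ok, 42})
error_result = match?({:ok, _}, :error)

# Keep only the elements that fit the {:ok, _} pattern.
results = [{:ok, 1}, :error, {:ok, 2}]
oks = Enum.filter(results, &match?({:ok, _}, &1))

IO.inspect({ok_result, error_result}) # prints {true, false}
IO.inspect(oks)                       # prints [ok: 1, ok: 2]
```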