Elixir/OTP: Basics of Partition Supervisors

Arunmuthuram M
Feb 25, 2024

Partition supervisors are supervisors that start multiple instances (partitions) of the same child from a single child spec. They are mainly used to distribute load across several independent, isolated children, which avoids the bottleneck that a single child process can become.

Starting a partition supervisor

A partition supervisor is started with the PartitionSupervisor.start_link/1 function, which takes a keyword list of options as its argument. The supported options are :name, :child_spec, :partitions, :strategy, :max_restarts, :max_seconds and :with_arguments. The :name and :child_spec options are mandatory; the others are optional and have default values. The function returns {:ok, partition_supervisor_pid} on success. Calling it links the caller process to the partition supervisor process, so an abnormal exit of one terminates the other unless exits are trapped. A normal exit of the caller process also shuts down the partition supervisor process along with its children.

  • The :name option takes an atom under which the supervisor process is registered. The :child_spec option takes a child spec map, a module, or a 2-element tuple of the form {module, child_spec_arg}, and describes the child process that will be started for each partition.
  • The :partitions option takes a positive integer giving the total number of partitions (children) to start under the supervisor. Its default value is the number of schedulers online in the BEAM VM. By default, the BEAM VM runs one scheduler per core, and this number can be obtained with System.schedulers_online/0.

Each partition is assigned a 0-indexed, incrementing integer as its id. So if 8 partitions are created, the ids range from 0 to 7. If the child spec already has an :id key, it is ignored.

defmodule ChildAgent do
  use Agent
  def start_link(_), do: Agent.start_link(fn -> [] end)
end
---------------------------------------------------------------------------
PartitionSupervisor.start_link(name: PartSup, child_spec: ChildAgent)
{:ok, #PID<0.116.0>}

PartitionSupervisor.which_children(PartSup)
[
  {7, #PID<0.124.0>, :worker, [ChildAgent]},
  {6, #PID<0.123.0>, :worker, [ChildAgent]},
  {5, #PID<0.122.0>, :worker, [ChildAgent]},
  {4, #PID<0.121.0>, :worker, [ChildAgent]},
  {3, #PID<0.120.0>, :worker, [ChildAgent]},
  {2, #PID<0.119.0>, :worker, [ChildAgent]},
  {1, #PID<0.118.0>, :worker, [ChildAgent]},
  {0, #PID<0.117.0>, :worker, [ChildAgent]}
]
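
The output above depends on the machine, because the default partition count follows the number of schedulers. As a minimal sketch of the :partitions option, the snippet below fixes the count at 3 and checks that the ids run from 0 to 2. The CounterAgent module and the SmallPartSup name are made up for this illustration.

```elixir
defmodule CounterAgent do
  # Hypothetical child module, used only for this illustration.
  use Agent
  def start_link(_arg), do: Agent.start_link(fn -> 0 end)
end

# SmallPartSup is an assumed name; :partitions overrides the scheduler-based default.
{:ok, _sup} =
  PartitionSupervisor.start_link(
    name: SmallPartSup,
    child_spec: CounterAgent,
    partitions: 3
  )

# Collect the partition ids reported by the supervisor and sort them.
ids =
  SmallPartSup
  |> PartitionSupervisor.which_children()
  |> Enum.map(fn {id, _pid, _type, _modules} -> id end)
  |> Enum.sort()

IO.inspect(ids)
# prints [0, 1, 2]
```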

Each partition id is mapped to its partition process's PID, and this mapping is stored internally under the :name value in an ETS table. The PID of a partition can be looked up with a :via tuple, which in turn consults that table. Just as a registered :name atom identifies a process, a :via tuple can be passed to functions that accept a process name, where it resolves to the PID. The :via tuple that identifies a partition has the form {:via, PartitionSupervisor, {name_of_the_supervisor, routing_key}}. The routing key does not have to be a partition id: any term can be used, and it is internally converted into a partition id within the available range. A given key always maps to the same partition id as long as the total number of partitions stays the same. Integer keys are mapped with modulo partitioning, and all other terms with :erlang.phash2/2.

total_partitions = 8

get_partition_id = fn
  routing_key when is_integer(routing_key) -> rem(abs(routing_key), total_partitions)
  routing_key -> :erlang.phash2(routing_key, total_partitions)
end
----------------------------------------------------------------------------
get_partition_id.(4)
4

get_partition_id.(9)
1

get_partition_id.(9)
1

get_partition_id.("key1")
6

get_partition_id.("key1")
6

defmodule ChildAgent do
  use Agent
  def start_link(_), do: Agent.start_link(fn -> [] end)
end
---------------------------------------------------------------------------
PartitionSupervisor.start_link(name: PartSup, child_spec: ChildAgent)

Agent.get({:via, PartitionSupervisor, {PartSup, 0}}, &(&1))
[]

Agent.update({:via, PartitionSupervisor, {PartSup, 0}}, &([0 | &1]))
:ok

Agent.update({:via, PartitionSupervisor, {PartSup, 6}}, &([6 | &1]))
:ok

Agent.get({:via, PartitionSupervisor, {PartSup, 0}}, &(&1))
[0]

Agent.get({:via, PartitionSupervisor, {PartSup, 6}}, &(&1))
[6]
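
The calls above use integer routing keys that happen to equal the partition ids. As a rough sketch of routing with an arbitrary term, the snippet below resolves a string key through the :via tuple and compares the result with the partition id computed by hand using :erlang.phash2/2, as described earlier. The ListAgent module, the KeyedPartSup name and the "user:42" key are assumptions made for this example.

```elixir
defmodule ListAgent do
  # Hypothetical child module, used only for this illustration.
  use Agent
  def start_link(_arg), do: Agent.start_link(fn -> [] end)
end

partitions = 4

# KeyedPartSup is an assumed supervisor name.
{:ok, _sup} =
  PartitionSupervisor.start_link(
    name: KeyedPartSup,
    child_spec: ListAgent,
    partitions: partitions
  )

routing_key = "user:42"
via = {:via, PartitionSupervisor, {KeyedPartSup, routing_key}}

# Resolve the routing key to a PID through the :via tuple.
pid_from_key = GenServer.whereis(via)

# Compute the partition id by hand, the way non-integer keys are mapped.
expected_id = :erlang.phash2(routing_key, partitions)

# Find the PID that the supervisor reports for that partition id.
pid_from_id =
  KeyedPartSup
  |> PartitionSupervisor.which_children()
  |> Enum.find_value(fn {id, pid, _type, _modules} ->
    if id == expected_id, do: pid
  end)

IO.inspect(pid_from_key == pid_from_id)
# prints true

# Writing through the string key and reading through the integer id
# reaches the same partition.
:ok = Agent.update(via, &[:visited | &1])
state = Agent.get({:via, PartitionSupervisor, {KeyedPartSup, expected_id}}, & &1)
```

Routing by a domain key such as a user id keeps all work for that key on one partition, which matters when the partition holds per-key state.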

Note that if a partition terminates and is not restarted, any routing key that maps to it will cause an error when used in a function call, since the PID stored for that partition id is no longer alive. When a terminated child is restarted, the new PID is automatically registered for its partition id.
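
The restart behaviour described above can be observed with a small experiment. This is only a sketch: the RestartDemoAgent module, the RestartPartSup name and the polling helper are invented for the illustration, and the polling interval is arbitrary.

```elixir
defmodule RestartDemoAgent do
  # Hypothetical child module, used only for this illustration.
  use Agent
  def start_link(_arg), do: Agent.start_link(fn -> [] end)
end

# RestartPartSup is an assumed supervisor name.
{:ok, _sup} =
  PartitionSupervisor.start_link(
    name: RestartPartSup,
    child_spec: RestartDemoAgent,
    partitions: 2
  )

via = {:via, PartitionSupervisor, {RestartPartSup, 0}}
old_pid = GenServer.whereis(via)

# Kill partition 0 and wait until its exit has been observed.
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# Hypothetical helper: poll until the routing key resolves to a different PID.
wait_for_restart = fn retry, attempts_left ->
  case GenServer.whereis(via) do
    pid when is_pid(pid) and pid != old_pid ->
      pid

    _ when attempts_left > 0 ->
      Process.sleep(10)
      retry.(retry, attempts_left - 1)

    _ ->
      raise "partition 0 was not restarted in time"
  end
end

new_pid = wait_for_restart.(wait_for_restart, 100)

# The same :via tuple now reaches the restarted process, which starts with fresh state.
state_after_restart = Agent.get(via, & &1)
```

Callers that hold on to the :via tuple rather than a raw PID pick up the restarted process without any extra work.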

  • The :strategy, :max_restarts and :max_seconds options behave the same way as when they are used in a normal supervisor.
  • The :with_arguments option takes a two-argument anonymous function that lets you modify the arguments passed to the :start MFA of the child spec. The function receives the list of arguments already present in the :start MFA as its first argument and the partition id of the child being started as its second. It returns the list of arguments that will actually be used when the :start MFA is called, so it can change the existing arguments and inject the partition id. This option is mostly used when the child process needs to know its own partition id.

defmodule ChildAgent do
  use Agent
  def start_link(opts) do
    agent_name = :"Agent#{opts[:partition_id]}"
    Agent.start_link(fn -> [] end, name: agent_name)
  end
end
---------------------------------------------------------------------------
anon_fn = fn [opts], partition_id -> [Keyword.put(opts, :partition_id, partition_id)] end

sup_opts = [name: PartSup, child_spec: ChildAgent, with_arguments: anon_fn]
PartitionSupervisor.start_link(sup_opts)

Process.whereis(:Agent1) == GenServer.whereis({:via, PartitionSupervisor, {PartSup, 1}})
true

Agent.get(:Agent1, &(&1))
[]

Agent.update(:Agent2, &[2 | &1])
:ok

Agent.get(:Agent2, &(&1))
[2]

A partition supervisor is typically started under a higher-level supervisor. Since the PartitionSupervisor module defines the child_spec/1 and init/1 functions, it can be passed directly as a child to other supervisors.

defmodule ChildAgent do
  use Agent
  def start_link(_), do: Agent.start_link(fn -> [] end)
end
---------------------------------------------------------------------------
partition_child_spec = {PartitionSupervisor, name: PartSup, child_spec: ChildAgent}
{:ok, sup_pid} = Supervisor.start_link([partition_child_spec], strategy: :one_for_one)

Supervisor.which_children(sup_pid)
[{PartSup, #PID<0.150.0>, :supervisor, [PartitionSupervisor]}]

PartitionSupervisor.which_children(PartSup)
[
  {7, #PID<0.158.0>, :worker, [ChildAgent]},
  {6, #PID<0.157.0>, :worker, [ChildAgent]},
  {5, #PID<0.156.0>, :worker, [ChildAgent]},
  {4, #PID<0.155.0>, :worker, [ChildAgent]},
  {3, #PID<0.154.0>, :worker, [ChildAgent]},
  {2, #PID<0.153.0>, :worker, [ChildAgent]},
  {1, #PID<0.152.0>, :worker, [ChildAgent]},
  {0, #PID<0.151.0>, :worker, [ChildAgent]}
]
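
One common way to put this to use is to partition a DynamicSupervisor, so that starting children is not funnelled through a single supervisor process. The sketch below assumes a hypothetical DemoDynamicSupervisors name and uses the caller's PID as the routing key; the key and the partition count are illustrative choices, not requirements.

```elixir
children = [
  # DemoDynamicSupervisors is an assumed name chosen for this example.
  {PartitionSupervisor,
   child_spec: DynamicSupervisor, name: DemoDynamicSupervisors, partitions: 4}
]

{:ok, _top_sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Use the calling process's PID as the routing key, so the same caller
# always lands on the same DynamicSupervisor partition.
via = {:via, PartitionSupervisor, {DemoDynamicSupervisors, self()}}
{:ok, agent_pid} = DynamicSupervisor.start_child(via, {Agent, fn -> %{} end})

# Count the active children across all DynamicSupervisor partitions.
total_children =
  DemoDynamicSupervisors
  |> PartitionSupervisor.which_children()
  |> Enum.map(fn {_id, pid, _type, _modules} ->
    DynamicSupervisor.count_children(pid).active
  end)
  |> Enum.sum()

IO.inspect(total_children)
# prints 1
```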

Module functions

Besides start_link/1, the PartitionSupervisor module provides count_children/1, which_children/1 and stop/3, which behave like their Supervisor module counterparts. The partitions/1 function returns the total number of partitions under a partition supervisor.

defmodule ChildAgent do
  use Agent
  def start_link(_), do: Agent.start_link(fn -> [] end)
end
---------------------------------------------------------------------------
PartitionSupervisor.start_link(name: PartSup, child_spec: ChildAgent)

PartitionSupervisor.partitions(PartSup)
8
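
The remaining functions can be exercised in the same way. A minimal sketch, assuming a hypothetical StatsAgent module and StatsPartSup name, and a fixed count of 3 partitions so the numbers do not depend on the machine:

```elixir
defmodule StatsAgent do
  # Hypothetical child module, used only for this illustration.
  use Agent
  def start_link(_arg), do: Agent.start_link(fn -> [] end)
end

# StatsPartSup is an assumed supervisor name.
{:ok, sup_pid} =
  PartitionSupervisor.start_link(
    name: StatsPartSup,
    child_spec: StatsAgent,
    partitions: 3
  )

# count_children/1 returns a map of counters, as Supervisor.count_children/1 does.
counts = PartitionSupervisor.count_children(StatsPartSup)
IO.inspect(counts.active)
# prints 3

partition_count = PartitionSupervisor.partitions(StatsPartSup)

# stop/3 defaults to reason :normal and an infinite timeout,
# and returns once the supervisor has terminated.
:ok = PartitionSupervisor.stop(StatsPartSup)
alive_after_stop = Process.alive?(sup_pid)
```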
