Elixir: Basics of Sigils

Arunmuthuram M
8 min read · Feb 3, 2024

Sigils in Elixir are syntactic sugar used mainly to customise and simplify how text and other data types are written in source code. Each sigil is replaced internally by a call to a function or macro, which returns or expands into a literal or struct. Elixir offers many built-in sigils for defining charlists, strings, word lists, regular expressions, and date and time structs.

Syntax

A sigil starts with the tilde ~ symbol, followed by either a single lowercase letter or a group of uppercase letters that identifies the internal function/macro to call (multi-letter uppercase names require Elixir 1.15 or later). Next comes a pair of delimiters containing the input data. Elixir accepts eight delimiter pairs in sigils: //, ||, "", '', (), [], {} and <>. The data within the delimiters is converted into a binary and passed as the first argument of the internal function/macro call. A sigil may also carry a series of characters after the closing delimiter, which act as flags. These are converted into a charlist and passed as the second argument of the internal function/macro call.

~W[word1 word2 word3] # creates a list of strings
["word1", "word2", "word3"]

~W(atom1 atom2 atom3)a # creates a list of atoms
[:atom1, :atom2, :atom3]
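As a small illustration of the delimiter rules above, the sketch below writes the same string with several delimiter pairs. The variable names are only for illustration; the point is that the delimiter does not change the result, and that choosing one that does not occur in the data avoids escaping.

```elixir
# The same string written with three different delimiter pairs
a = ~s(hello "world")
b = ~s[hello "world"]
c = ~s/hello "world"/

IO.inspect(a == b and b == c)
# true

# With "" as the delimiter, the inner quotes have to be escaped
d = ~s"hello \"world\""

IO.inspect(a == d)
# true
```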

Built-in sigils

  • ~c and ~C sigils create a charlist from the provided binary. ~c unescapes characters and performs interpolation before generating the charlist, while ~C does neither and builds the charlist literally from the provided data.
list = ~c[test\n]
~c"test\n"

Enum.each(list, &IO.inspect/1)
116 # t
101 # e
115 # s
116 # t
10 # newline - unescapes \n into a single newline character
:ok

~c[123#{1 + 3}]
~c"1234" # performs interpolation
list = ~C[test\n]
~c"test\\n"

Enum.each(list, &IO.inspect/1)
116 # t
101 # e
115 # s
116 # t
92 # \ is not unescaped; the charlist keeps a literal \ followed by n
110 # n
:ok

~C[123#{1 + 3}]
~c"123\#{1 + 3}" # does not perform interpolation
  • ~s and ~S sigils create a string from the provided data and also support the heredoc multiline syntax """. ~s performs unescaping and interpolation while ~S does neither. The ~S sigil is widely used to write readable strings without having to escape characters explicitly.
str = ~s[test\ttest]
"test\ttest"

IO.puts(str)
test	test # \t is unescaped into a tab character
:ok

~s[123#{2 ** 2}]
"1234" # performs interpolation

# heredoc syntax
allowed_escapes = ~s"""
"\\" – single backslash
"\\a" – bell/alert
"\\b" – backspace
"\\d" - delete
"\\e" - escape
"\\f" - form feed
"\\n" – newline
"\\r" – carriage return
"\\s" – space
"\\t" – tab
"\\v" – vertical tab
"\\0" - null byte
"\\x61" - "\x61" - hexadecimal
"\\u{1F600}" - "\u{1f600}" - unicode
"""

IO.puts(allowed_escapes)
"\" – single backslash
"\a" – bell/alert
"\b" – backspace
"\d" - delete
"\e" - escape
"\f" - form feed
"\n" – newline
"\r" – carriage return
"\s" – space
"\t" – tab
"\v" – vertical tab
"\0" - null byte
"\x61" - "a" - hexadecimal
"\u{1F600}" - "😀" - unicode

:ok
str = ~S[test\ttest]
"test\\ttest"

IO.puts(str)
test\ttest # \t is not unescaped into a tab character
:ok

~S[123#{2 ** 2}]
"123\#{2 ** 2}" # does not perform interpolation

# heredoc syntax
allowed_escapes = ~S"""
"\" – backslash
"\a" – bell
"\b" – backspace
"\d" - delete
"\e" - escape
"\f" - form feed
"\n" – newline
"\r" – carriage return
"\s" – space
"\t" – tab
"\v" – vertical tab
"\0" - null byte
"\x61" - "a" - hexadecimal
"\u{1F600}" - "😀" - unicode
"""

IO.puts(allowed_escapes)
"\" – backslash
"\a" – bell
"\b" – backspace
"\d" - delete
"\e" - escape
"\f" - form feed
"\n" – newline
"\r" – carriage return
"\s" – space
"\t" – tab
"\v" – vertical tab
"\0" - null byte
"\x61" - "a" - hexadecimal
"\u{1F600}" - "😀" - unicode

:ok
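To make the difference between the two string sigils concrete, here is a minimal sketch with made-up data (the JSON snippet and the Windows-style path are illustrative values, not from any real system). ~s lets double quotes appear unescaped while still interpolating, and ~S keeps every backslash exactly as typed.

```elixir
# ~s: double quotes need no escaping, interpolation still works
json = ~s({"name": "sigil", "version": #{1 + 1}})

IO.puts(json)
# {"name": "sigil", "version": 2}

# ~S: backslashes are kept literally, so \n and \t are not unescaped
path = ~S(C:\new\table)

IO.puts(path)
# C:\new\table

IO.inspect(path == "C:\\new\\table")
# true
```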
  • ~w and ~W sigils split the given input binary on whitespace and create a list of terms. As with the charlist and string sigils, the lowercase ~w performs unescaping and interpolation while the uppercase ~W does neither. Both accept one of three flags: s (the default) creates a list of strings, a creates a list of atoms and c creates a list of charlists.
~W[word1 wo\trd2 word3] 
["word1", "wo\\trd2", "word3"]

~w[word1 wo\trd2 word3]s
["word1", "wo", "rd2", "word3"]

~w[atom1 atom#{1+1} atom3]a
[:atom1, :atom2, :atom3]

~W[chlist1 chlist#{1+1} chlist3]c
[~c"chlist1", ~c"chlist\#{1+1}", ~c"chlist3"]
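One way to read the flags, sketched below, is as a whitespace split followed by a per-word conversion. The comparisons against String.split/1, String.to_atom/1 and String.to_charlist/1 are only meant to describe the observable result, not how the sigil is implemented internally.

```elixir
# Any run of whitespace separates words; ~w unescapes \t into a tab first
words = ~w(one   two\tthree)

IO.inspect(words)
# ["one", "two", "three"]

IO.inspect(words == String.split("one   two\tthree"))
# true

# The a and c flags convert each word after the split
IO.inspect(~w(one two)a == Enum.map(~w(one two), &String.to_atom/1))
# true

IO.inspect(~w(one two)c == Enum.map(~w(one two), &String.to_charlist/1))
# true
```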
  • ~r sigil is used for defining regular expressions. It performs unescaping and interpolation, and its internal macro issues a call to the Regex.compile!/2 function, so it supports all the flags or modifiers that Regex.compile!/2 supports. Although any of the 8 supported delimiter pairs can be used with the regex sigil, the // pair is the most common. The other delimiter characters often appear inside the regex itself and would then need explicit escaping, which makes the expression harder to read and write.
regex = ~r/^[1-5a-z]*$/ 
~r/^[1-5a-z]*$/

Regex.match?(regex, "abc123")
true

Regex.match?(regex, "Abc128")
false
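The following sketch shows a modifier in use and how the choice of delimiter affects escaping. The URL pattern is a made-up example; the i modifier (case-insensitive matching) is one of the modifiers documented for the Regex module.

```elixir
# The i modifier makes the match case-insensitive
case_insensitive = ~r/^[1-5a-z]*$/i

IO.inspect(Regex.match?(case_insensitive, "Abc123"))
# true

# A pattern containing slashes needs escaping inside //
with_slashes = ~r/^https?:\/\/[^\/]+\/docs/

# The same pattern reads more easily with a different delimiter pair
with_braces = ~r{^https?://[^/]+/docs}

IO.inspect(Regex.match?(with_slashes, "https://hexdocs.pm/docs"))
# true

IO.inspect(Regex.match?(with_braces, "https://hexdocs.pm/docs"))
# true
```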
  • ~T sigil creates and returns a Time struct. The internal macro parses the input binary, extracts the hours, minutes, seconds, microseconds and calendar, validates them and generates a Time struct. The input binary should be in the format hh:mm:ss, optionally followed by fractional seconds as in hh:mm:ss.sss. The default calendar is Calendar.ISO and need not be given explicitly in the binary data. Any custom calendar created by implementing the Calendar behaviour can be used with the ~T sigil using the format customTimeFormat customCalendar.
time = ~T[13:34:45.123456]
~T[13:34:45.123456]

time.hour
13

time.calendar
Calendar.ISO

~T[06:10:00 Calendar.ISO]
~T[06:10:00]
  • ~D sigil creates and returns a Date struct similar to the ~T sigil. It takes in data in the format yyyy-mm-dd with Calendar.ISO as its default calendar. It can also be used with a custom calendar with the format of customDateFormat customCalendar.
date = ~D[2024-02-25]
~D[2024-02-25]

date.year
2024
  • ~N sigil creates and returns a NaiveDateTime struct, similar to the date and time sigils above. It requires the input binary in one of the formats yyyy-mm-dd hh:mm:ss, yyyy-mm-ddThh:mm:ss, yyyy-mm-dd hh:mm:ss.sss or yyyy-mm-ddThh:mm:ss.sss. The default calendar is Calendar.ISO and custom calendars can be used with the binary data format customNaiveDateTimeFormat customCalendar.
naive_date_time = ~N[2024-02-25T23:00:07]
~N[2024-02-25 23:00:07]
  • ~U sigil creates and returns a DateTime struct in the UTC timezone. It requires the input binary in the same formats as the ~N sigil above, with an additional Z or +00:00 at the end to denote the UTC offset. As with the other date and time sigils, the default calendar is Calendar.ISO and custom calendars can also be used.
utc_date_time = ~U[2024-02-25 23:00:07Z]
~U[2024-02-25 23:00:07Z]

utc_date_time.time_zone
"Etc/UTC"

utc_date_time2 = ~U[2024-02-25T23:00:07+00:00]
~U[2024-02-25 23:00:07Z]
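As a rough cross-check of the four date and time sigils, the sketch below compares each one with the struct built through the corresponding new!/N function. The last part uses Code.eval_string/1 so that an invalid date can be tried without stopping the script; the :rejected atom is just an illustrative marker.

```elixir
date = ~D[2024-02-25]
time = ~T[23:00:07]

IO.inspect(Date.compare(date, Date.new!(2024, 2, 25)))
# :eq

IO.inspect(Time.compare(time, Time.new!(23, 0, 7)))
# :eq

IO.inspect(NaiveDateTime.compare(~N[2024-02-25 23:00:07], NaiveDateTime.new!(date, time)))
# :eq

IO.inspect(DateTime.compare(~U[2024-02-25 23:00:07Z], DateTime.new!(date, time, "Etc/UTC")))
# :eq

# The sigils validate their input: February 30th does not exist
invalid =
  try do
    Code.eval_string("~D[2024-02-30]")
  rescue
    _error -> :rejected
  end

IO.inspect(invalid)
# :rejected
```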

Creating custom sigils

Sigils in Elixir are extensible: custom sigils can be defined and used just like the built-in ones. Every sigil has an internal function or macro that is called whenever the sigil is used in code. Its name must be sigil_ followed by the single lowercase letter or the group of uppercase letters that appears after the ~ in the sigil. All of the built-in sigils mentioned above have corresponding macros defined in the Kernel module.

~W(list1 list2 list3)s
["list1", "list2", "list3"]

Kernel.sigil_W(<<"list1 list2 list3">>, [?s])
["list1", "list2", "list3"]
--------------------------------------------------------------------------
~s => Kernel.sigil_s/2
~S => Kernel.sigil_S/2
~N => Kernel.sigil_N/2
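The mapping from a sigil to its sigil_ call can also be observed by quoting a sigil and inspecting the resulting AST. This is only a quick inspection sketch; the variable names are illustrative.

```elixir
# Quote the sigil instead of evaluating it to see the call it stands for
ast = quote do: ~W(list1 list2 list3)s

# The AST is a call to sigil_W with the binary and the flags charlist
{name, _meta, [{:<<>>, _binary_meta, [data]}, flags]} = ast

IO.inspect(name)
# :sigil_W

IO.inspect(data)
# "list1 list2 list3"

IO.inspect(flags == [?s])
# true
```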

Creating a custom sigil starts with choosing a lowercase letter or a group of uppercase letters that is not already used by another sigil. A public function/macro named sigil_{identifier} with arity 2 is then defined inside a module. It receives the input binary from within the delimiters as the first argument and the charlist of flags that follow the delimiters as the second argument. If the sigil contains no data, an empty binary is passed in; likewise, if no flags are used, an empty charlist is passed in. The function can read, parse and manipulate the input binary and flags to build and return the required literal or struct. To use the custom sigil in code, the module defining the internal function/macro must first be imported.
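To see exactly what arrives in the two arguments, here is a minimal sketch of a hypothetical ~ECHO sigil (the module and sigil names are made up for illustration) that returns its arguments untouched. It assumes Elixir 1.15 or later because the name has more than one letter.

```elixir
defmodule EchoSigil do
  # Hypothetical sigil: returns the data and the flags exactly as received
  def sigil_ECHO(string, modifiers), do: {string, modifiers}
end

import EchoSigil

with_both = ~ECHO[some data]xy
IO.inspect(with_both == {"some data", [?x, ?y]})
# true

# No data and no flags: an empty binary and an empty charlist are passed in
with_nothing = ~ECHO[]
IO.inspect(with_nothing)
# {"", []}
```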

Let’s now create a custom sigil ~TIME that returns the current UTC time without the microseconds. It doesn’t need any input data and supports two flags: h for the 24h format and a for the am/pm format. You could take in a timezone or an offset as input data and return the current time in that particular timezone or offset, but let’s keep it simple.

defmodule CustomSigil do
  # No flags: default to the 24h format
  def sigil_TIME(<<>>, []), do: sigil_TIME(<<>>, [?h])

  # h flag: 24h format with the microseconds dropped
  def sigil_TIME(<<>>, [?h]) do
    Time.utc_now() |> Time.truncate(:second) |> Time.to_string()
  end

  # a flag: 12h format followed by am/pm
  def sigil_TIME(<<>>, [?a]) do
    t = Time.utc_now()
    {h_12, am_pm} = convert_24_to_12h(t.hour)
    "#{pad(h_12)}:#{pad(t.minute)}:#{pad(t.second)} #{am_pm}"
  end

  # Left-pads single-digit numbers with a zero
  defp pad(n), do: n |> Integer.to_string() |> String.pad_leading(2, "0")

  defp convert_24_to_12h(hour_24) do
    cond do
      hour_24 == 0 -> {12, "am"}            # midnight
      hour_24 == 12 -> {12, "pm"}           # noon
      hour_24 > 12 -> {hour_24 - 12, "pm"}
      true -> {hour_24, "am"}
    end
  end
end
---------------------------------------------------------------------------
import CustomSigil

~TIME//
"12:50:27"

~TIME//h
"12:50:30"

~TIME//a
"12:50:35 pm"

Sigil functions vs macros

The custom sigil defined above is backed by a function, sigil_TIME/2. All of the built-in sigils covered earlier, however, are backed by macros. Sigils support both; the difference is that a macro is executed and expanded during compilation, while a function is executed at runtime. The ~TIME sigil does not return a static value: every use is expected to produce the current time, so it needs a function that runs at runtime. For the built-in sigils, by contrast, the output depends only on the input binary and the flags. A macro suits that case better, since it runs at compile time and replaces the sigil construct with the resulting literal. (When the input contains interpolation, the macro instead expands into code that builds the value at runtime.)

Consider the ~W sigil, which takes the input binary and splits it to create a list of terms. If it were backed by a function, that function would run wherever the sigil is used, splitting the binary and creating the terms at runtime. Instead, it is backed by a macro that performs the split and builds the list during compilation, replacing the sigil construct with the list literal before the bytecode is generated. The runtime cost of parsing the binary and creating the terms is avoided. It is therefore important to identify the actual use case before deciding whether a sigil should be backed by a macro or a function.
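For contrast with the function-based ~TIME sigil, here is a minimal sketch of a macro-based sigil. The ~UP sigil and the UpcaseSigil and Greeter modules are hypothetical names invented for this example, and multi-letter names assume Elixir 1.15 or later. Because the sigil is a macro, it receives the quoted form of the binary rather than the binary itself, and whatever it returns replaces the sigil in the compiled code.

```elixir
defmodule UpcaseSigil do
  # Hypothetical ~UP sigil backed by a macro. The first argument is the
  # quoted binary, so the string is extracted by pattern matching on the AST.
  defmacro sigil_UP({:<<>>, _meta, [string]}, _modifiers) when is_binary(string) do
    # This runs while the calling code is compiled; the returned string
    # literal takes the place of the sigil.
    String.upcase(string)
  end
end

defmodule Greeter do
  import UpcaseSigil

  # After expansion, the body of this function is just the literal "HELLO"
  def shout, do: ~UP[hello]
end

IO.inspect(Greeter.shout())
# "HELLO"
```

The upcasing happens once, when Greeter is compiled, rather than on every call to Greeter.shout/0.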
