Elixir: Internal data representation of the Float datatype
Elixir follows the IEEE 754 double-precision (64-bit) representation of floating-point numbers, which is widely used across programming languages. To understand the internals of Elixir's data representation, please first go through this article, which gives an overview of the underlying Erlang tagging system that this article builds on. This article explains the internal data representation of the float datatype in Elixir.
Boxed term
The float datatype in Elixir follows the same representation as the underlying Erlang type system. A float is a boxed term: the actual data and its metadata are stored on the heap. Whenever a floating-point number is defined, a boxed term is created on the stack or on the heap, depending on where the float is defined. When the float is bound directly to a variable in a function, the boxed term is created on the stack as a local variable. When the float is part of another data structure, such as a list or a tuple, the boxed term for the float is created on the heap. The boxed term has the value 10 as its primary tag bits at the least significant end, and the rest of its bits hold a pointer to the location of a header word on the heap.
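To make the tagging concrete, here is a minimal Python sketch. Python is used only as a convenient way to manipulate bits, not as a model of how the BEAM is implemented. It assumes heap addresses are word-aligned, so the two least significant bits of a pointer are always zero and are free to carry the primary tag 10. The helper names and the address 0x7F3A1000 are hypothetical.

```python
PRIMARY_TAG_MASK = 0b11   # the two least significant bits of a term
TAG_BOXED = 0b10          # primary tag of a boxed term

def make_boxed(address: int) -> int:
    """Tag a word-aligned heap address as a boxed term (illustrative only)."""
    # Word alignment means the two low bits of the address are zero,
    # so they can hold the tag without losing any pointer information.
    assert address % 4 == 0, "address must be at least 4-byte aligned"
    return address | TAG_BOXED

def primary_tag(term: int) -> int:
    """Read the primary tag from the least significant end."""
    return term & PRIMARY_TAG_MASK

def boxed_pointer(term: int) -> int:
    """Recover the header word's address by clearing the tag bits."""
    assert primary_tag(term) == TAG_BOXED, "not a boxed term"
    return term & ~PRIMARY_TAG_MASK

term = make_boxed(0x7F3A1000)     # hypothetical heap address
print(bin(primary_tag(term)))     # 0b10
print(hex(boxed_pointer(term)))   # 0x7f3a1000
```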
Header term
The header word that the boxed term points to has the primary tag bits 00 at its least significant end, and the next four least significant bits contain the secondary tag 0110, indicating that the data being stored is a float. The remaining bits of the header word hold the arity, that is, the number of data words that follow the header. For a float this arity is fixed: 1 on a 64-bit system and 2 on a 32-bit system. Erlang stores floats in the IEEE 754 double-precision format, which always needs 64 bits (1 word on a 64-bit system, 2 words on a 32-bit system) to represent the whole value, and this is why the number of data words never changes for the float datatype.
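The header word layout can be sketched the same way. The hypothetical Python helpers below place the arity above the 2 primary-tag bits and the 4 secondary-tag bits described above, so the arity starts at bit 6. This is an illustration of the bit layout, not code taken from the Erlang runtime.

```python
TAG_HEADER = 0b00        # primary tag of a header word
SUBTAG_FLOAT = 0b0110    # secondary tag identifying a float
ARITY_SHIFT = 6          # 2 primary-tag bits + 4 secondary-tag bits

def float_header(word_bits: int) -> int:
    """Build a float header word for a given word size (illustrative)."""
    arity = 64 // word_bits          # data words needed for a 64-bit double
    return (arity << ARITY_SHIFT) | (SUBTAG_FLOAT << 2) | TAG_HEADER

def decode_header(header: int) -> tuple[int, int, int]:
    """Split a header word into (primary tag, secondary tag, arity)."""
    primary = header & 0b11
    subtag = (header >> 2) & 0b1111
    arity = header >> ARITY_SHIFT
    return primary, subtag, arity

for bits in (64, 32):
    h = float_header(bits)
    print(bits, bin(h), decode_header(h))
# 64 0b1011000 (0, 6, 1)
# 32 0b10011000 (0, 6, 2)
```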
Actual data bit representation
The header word on the heap is followed by the contiguously stored data words that hold the actual value as bits. The rules for converting the value to its bit representation are defined by the IEEE 754 double-precision standard. To understand the basics of floating-point numbers and how they are converted to bits and back, check out the following resources: Floating point numbers, The IEEE 754 Format, decimal to IEEE 754 bit representation conversion, and IEEE 754 bit representation to decimal conversion. If you want to play around with the bits to see how they affect the decimal value, check out Float toy. Once the value is converted into its bit representation, the bits are stored in the data words: a single word on a 64-bit system, or two words on a 32-bit system. Counting the boxed term, the header word, and the data words, a floating-point number therefore takes up a total of 3 words on a 64-bit system and 4 words on a 32-bit system.
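As a quick sanity check on those word counts, the Python sketch below (helper name hypothetical) adds up the boxed term, the header word, and the data words for each word size, and uses the standard `struct` module to show that the double itself is exactly 64 bits. The last lines only split those 64 bits into two 32-bit halves; the order in which the halves are laid out in memory on a real 32-bit system is not modelled here.

```python
import struct

def float_size_in_words(word_bits: int) -> int:
    """Total words used by one float: boxed term + header + data words."""
    boxed_term = 1               # the tagged pointer
    header = 1                   # header word on the heap
    data = 64 // word_bits       # IEEE 754 double-precision payload
    return boxed_term + header + data

raw = struct.pack(">d", -250.125)   # the 8 payload bytes, big-endian
print(len(raw) * 8)                 # 64
print(float_size_in_words(64))      # 3
print(float_size_in_words(32))      # 4

# On a 32-bit system the same 64 bits need two data words.
high, low = struct.unpack(">II", raw)
print(hex(high), hex(low))          # 0xc06f4400 0x0
```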
When it comes to representing floating-point numbers, the main attributes are range, which refers to how wide the representation can go (its largest and smallest numbers), and precision, which refers to how many digits it can represent accurately before rounding. In IEEE 754, the exponent bits determine the range and the mantissa bits determine the precision. Even though IEEE 754 defines special values such as infinity and NaN, Elixir and Erlang do not represent these values; operations that would produce them typically raise an ArithmeticError instead.
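The link between exponent bits and range, and between mantissa bits and precision, can be observed from Python. Note that, unlike Elixir and Erlang, Python does produce infinity and NaN, so this sketch shows what the IEEE 754 standard reserves, not how the BEAM behaves. `exponent_field` is a hypothetical helper built on the standard `struct` module.

```python
import struct
import sys

def exponent_field(x: float) -> int:
    """Return the raw 11-bit exponent field of a double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF

# Range: the largest finite double uses the largest non-reserved exponent.
print(sys.float_info.max)                  # 1.7976931348623157e+308
print(exponent_field(sys.float_info.max))  # 2046

# Precision: 52 stored mantissa bits + 1 implicit bit = 53 significant bits,
# so 2**53 + 1 cannot be represented and rounds to 2**53.
print(sys.float_info.mant_dig)             # 53
print(float(2**53) == float(2**53 + 1))    # True

# The all-ones exponent (2047) is reserved for infinity and NaN.
print(exponent_field(float("inf")))        # 2047
print(exponent_field(float("nan")))        # 2047
```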
Example
Now, let's look at a real example: the number -250.125 on a 64-bit system. Following the IEEE 754 double-precision standard, we convert the decimal number to its bit representation. Of the 64 available bits, the first (most significant) bit is the sign bit: 1 for a negative number and 0 for a positive number. In our case the sign bit is 1. The next 11 bits are the exponent bits, and the remaining 52 bits are the mantissa bits. First, let us convert the magnitude into binary. The integer part 250 is 11111010 in binary and the fractional part .125 is .001, giving the binary value 11111010.001.
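The two conversions used here, repeated division by 2 for the integer part and repeated multiplication by 2 for the fractional part, can be sketched in Python as follows (helper names hypothetical). The fractional conversion is capped at 52 bits because many fractions, such as .1, never terminate in binary; .125 terminates after three bits.

```python
def int_to_binary(n: int) -> str:
    """Repeated division by 2, reading the remainders bottom-up."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))
        n //= 2
    return "".join(reversed(digits))

def frac_to_binary(f: float, max_bits: int = 52) -> str:
    """Repeated multiplication by 2, reading off the integer parts."""
    digits = []
    while f > 0 and len(digits) < max_bits:
        f *= 2
        bit = int(f)
        digits.append(str(bit))
        f -= bit
    return "".join(digits)

print(int_to_binary(250))                               # 11111010
print(frac_to_binary(0.125))                            # 001
print(f"{int_to_binary(250)}.{frac_to_binary(0.125)}")  # 11111010.001
```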
Next, we normalise this binary value by moving the binary point 7 places to the left, which gives 1.1111010001 x 2⁷. The mantissa bits are the digits after the binary point, 1111010001, padded with trailing zeros to fill all 52 mantissa bits (the leading 1 is implicit and is not stored). The exponent 7 must be adjusted by the double-precision bias of 1023, giving 1023 + 7 = 1030, whose 11-bit binary equivalent is 10000000110; these are our exponent bits. We now have all three parts of the bit representation (sign 1, exponent 10000000110, and mantissa 1111010001 followed by 42 zeros), which together go into the data word.
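The encoding steps above can be checked with a short Python sketch that assembles the sign, biased exponent, and zero-padded mantissa into one 64-bit integer and compares it with the encoding produced by the standard `struct` module.

```python
import struct

sign = 1                                        # negative number
exponent = 7 + 1023                             # biased exponent: 1030
mantissa_str = "1111010001"                     # digits after the binary point
mantissa = int(mantissa_str.ljust(52, "0"), 2)  # pad to 52 bits with zeros

# Sign in bit 63, exponent in bits 62..52, mantissa in bits 51..0.
bits = (sign << 63) | (exponent << 52) | mantissa
print(f"{exponent:011b}")                       # 10000000110
print(hex(bits))                                # 0xc06f440000000000

# Cross-check against the machine's own IEEE 754 encoding.
(expected,) = struct.unpack(">Q", struct.pack(">d", -250.125))
print(bits == expected)                         # True
```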
To decode this representation, start with the term whose primary tag is 10, which tells you it is a boxed term. Take the pointer from the remaining bits and follow it to the header word on the heap, whose primary tag is 00. The header's secondary tag is 0110, indicating a float, and the remaining bits of the header word give the arity, which is 1 on a 64-bit system. The actual data is in the next word in memory, right after the header word. It is encoded according to the IEEE 754 double-precision standard, so the first bit is the sign bit; ours is 1, indicating a negative number. The next 11 bits contain the biased exponent, and subtracting the bias (1023) from their decimal value gives the actual exponent, 1030 - 1023 = 7. The mantissa bits are then converted into their equivalent fractional value and added to the implicit leading 1, which gives 0.9541015625 + 1 = 1.9541015625. Finally, this value is multiplied by 2 raised to the exponent, and the sign is applied to produce the decimal number: -1 x 1.9541015625 x 2⁷ = -250.125.
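The IEEE 754 part of this decoding can likewise be sketched in Python. The hypothetical `decode_double` below handles only normal numbers (biased exponent strictly between 0 and 2047); zero, subnormals, infinity, and NaN follow different rules and are rejected in this sketch.

```python
import struct

def decode_double(bits: int) -> float:
    """Decode a 64-bit IEEE 754 pattern (normal numbers only)."""
    sign = bits >> 63                      # 1 sign bit
    biased_exp = (bits >> 52) & 0x7FF      # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)      # 52 mantissa bits
    if not 0 < biased_exp < 0x7FF:
        raise ValueError("only normal numbers are handled in this sketch")
    exponent = biased_exp - 1023           # remove the bias: 1030 - 1023 = 7
    significand = 1 + mantissa / 2**52     # add the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** exponent

bits = 0xC06F440000000000                  # -250.125 from the worked example
print(decode_double(bits))                 # -250.125

# Cross-check with the machine's own decoding of the same 8 bytes.
(native,) = struct.unpack(">d", bits.to_bytes(8, "big"))
print(native)                              # -250.125
```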