Elixir : Internal data representation of Small Integers

Arunmuthuram M
4 min readAug 31, 2023

--

Even though elixir has a single integer datatype, the integers are internally represented as small integers and big integers based on the size they would take up for representation. In order to understand the internal working of erlang’s data representation, please go through this article that gives you an overview of the erlang’s tagging system which is required to understand this article. This article explains the internal data representation of the small integers in elixir.

Small Integers

A small integer is a whole number that is small enough to be stored within a single word along with its associated tag bits(metadata). In a 32-bit architecture, the size of a word is 4 bytes or 32 bits. So these 32 bits must contain both actual data and the meta data in form of the tag bits. The small integers fall under the immediate term category and the primary tag for immediate terms is the binary value 11. So, that would take up 2 bits. And the secondary immediate tag for small integers is also the binary value 11. That would take a total of four bits for tags. We now have 32-4 = 28 bits remaining for storing the actual data.

2’s Complement

Small integers follow 2’s complement to represent the signed numbers. In 2’s complement representation, the most significant bit is called the sign bit and the rest of the bits are called the magnitude bits. In our case, we have 28 bits for data and hence we have 27 magnitude bits and a single sign bit. The range of numbers that can be stored using 2’s complement is determined by the formula −2²⁷ to (2²⁷-1) which equals -134,217,728 and 134,217,727 for 32 bit architectures having 27 magnitude bits.

Example bit representation

Let us take the number 100,000 in a 32-bit system which falls within the range for 27 magnitude bits. The binary value of the number in 28 bits is 0000 0000 0001 1000 0110 1010 0000. Hence the representation would be…

Bit representation of positive small integer 100,000

Now let us take the negative number -150,000 in a 32 bit system which also falls within the range for 27 magnitude bits. In order to get the bit representation of this negative number, we must use 2’s complement. The first step is to get the binary representation of the magnitude 150,000 in our allotted 28 data bits which is 0000 0000 0010 0100 1001 1111 0000. Then we should invert all the digits, 1111 1111 1101 1011 0110 0000 1111. Now we add one to this value which would give us 1111 1111 1101 1011 0110 0001 and this is what goes into our 28 bits allotted for the actual data.

Bit representation of negative small integer -150,000

In order to convert the above bit representation back to actual numbers, you first see the primary tag which is 11 for the immediate term. Then you check the secondary immediate tag which is again 11, meaning a small integer. Then you check the most significant bit for the sign of the number.

If the sign bit contains the value 0, then it is a positive number and so you convert all 28 bits of binary representation directly into a decimal number to get the magnitude of the number.

If the sign bit contains the value 1, then it is a negative number. You take all the 28 bits of binary representation and compute the 2’s complement again by inverting all the bits and then adding one to it. Then you convert the resulting binary value into decimal to get the magnitude of the number.

The advantages of using 2’s complement for representing signed numbers is that the operations addition, subtraction and multiplication are easier to handle. Since you don’t need to represent the number zero in two ways(-0 and +0), you get an additional spot to represent an extra number in the negative end. And this is why, when you see the range of signed integers that you can store using 2’s complement, the negative side’s magnitude is always one more than the positive side’s magnitude e.g. -128 to 127 for an 8 bit capacity.

--

--

No responses yet