Thursday, February 02, 2006

Data alignment

Data alignment is (very?) important; or at least is very important to be aware of it when you write your code.
Consider this:
struct Broadcast
char timezone; // 1 byte data
int frequency; // 4 byte data
short timeofday; // 2 byte data

applying the sizeof operator on the Broadcast (sizeof can be applied to a class) the result is 12 on a 32 bit operating system.
this is because compilers usually round up a data structure's size to make it divisible by 2, 4, or 8—depending on the hardware's memory alignment requirements.

Most CPUs require that objects and variables reside at particular offsets in the system's memory. For example, 32-bit processors require a 4-byte integer to reside at a memory address that is evenly divisible by 4. This requirement is called "memory alignment". Thus, a 4-byte int can be located at memory address 0x2000 or 0x2004, but not at 0x2001. On most Unix systems, an attempt to use misaligned data results in a bus error, which terminates the program altogether. On Intel processors, the use of misaligned data is supported but at a substantial performance penalty.

Secondly, compilers insert padding bytes between data members to ensure that each member's address is properly aligned. The problem is that when members are declared in a random order, the compiler may need to insert more padding bytes between them, thereby inflating the data structure's size.

The padding which compiler is doing is because each data type must be in memory at memory addressed which are divisible by their own size.
for example:
an int object should be always at a 4 divisible memory address
a short object should be always at a 2 divisible memory address
an double object should be always at a 8 divisible memory address

Hence Broadcast after compilation will be like this:
struct Broadcast
char timezone; // 4 byte data (padded with 3 bytes)
int frequency; // 4 byte data
short timeofday; // 4 byte data (padded with 2 bytes)

without padding the data looks like this:
0 1 5 7 8 12
1 1111 11 1 1111 1 1

this is taken from this:

also look here

you can think that even if you try to arrange in this optimized way the structures, the processor might take some time for example for this:
char timezone; // 1 byte data //gets 1 padding byte
short timeofday; // 2 byte data -> these 2 will get into 4 byte memory location
int frequency; // 4 byte data

which is properly aligned (each member is on a good memory boundary)
you can think that, on a 32 bit processor for example, reading the short variable within the 4 byte address( the 4 byte memory location will contain the char, a padding byte and the 2 bytes from short ) might take some time, but it doesnt.
the main point of this alignament is this: the goal is to have a structure where member data does not get bad aligned:
when a variable value needs to be read, the CPU should not be forced (due to bad alignament) to read the value from 2 memory locations (on the 32 bit CPU).
for example, if an int (4bytes) variable is not on a memory address divisible by 4, this means that the variable 'stay' in 2 memory addresses.
the CPU in this case needs to fetch 2 memory addresses, extract the parts from each other and concatenate them in order to get the value.

the compiler by default (if you dont change the default #pragma) will try to aligned the structs on a 2, 4 or 8 respectively boundary in order to avoid the CPU to get into the above described overhead; what it does not do is to arrange for you the structs members such that to take less size (which you should do).

No comments: