Diving Deep into Data Alignment

Recently I came across an interesting topic on memory access optimization named data alignment. I know that struct can be seen as public class, and the performance of struct of array is better than array of struct thanks to contiguous memory. But how can the order of elements in a class/struct make a difference?

With those questions in mind, I did a little research. This post is organized in the following way:

  • What is data alignment
  • Calculating the size of struct
  • Optimizing with data alignment

1. What is Data Alignment

Consider a simple struct as following,

struct S {
    short a;    // 2 bytes
    int b;      // 4 bytes
    char c, d;  // 1 byte per char
}st;

You may naturally think the total size of st is simply the sum of all members, which is 8. However, sizeof(st) will give you 12.

This happens because gcc/g++ by default enables data alignment. Data alignment is a technique used to fit data into approperate boundary in memory, so that we can reduce the number of memory access. It facilitates generating single-instruction fetches and puts of the typed data. Without alignment constraints, on the other hand, the code might end up having to do two or more accesses spanning machine-word boundaries.

Without alignment, it would be laid out in memory like this (assuming a 32-bit architecture)1:

 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|  bytes
|       |       |  words

The problem is that on some CPU architectures, the instruction to load a 4-byte integer from memory only works on word boundaries. So your program would have to fetch each half of st.b with separate instructions. This is going to penalize you when you need to access st.b for many time

But if the memory was laid out as:

 0 1 2 3 4 5 6 7 8 9 A B
|a|a| | |b|b|b|b|c|d| | |
|       |       |       |

Then access to b becomes straightforward. (The disadvantage is that more memory is required, because of the padding bytes.)

Now that we know what is data alignment, how to use it?

Alignment Requirements

From C++ standard,

Every object type has the property called alignment requirement, which is an integer value (of type std::size_t, always a power of 2) representing the number of bytes between successive addresses at which objects of this type can be allocated. 2

In simple English, it means the size of word boundaries should be a small power of two, and the data type can only start at an offset that is an integer mulltiple of it. Usually alignment requirement falls into {1, 2, 4, 8}, and gcc by default uses 4. This can be overriden by user, and we will see how soon.

Primitive Types

For primitive data types, it’s common that:

  • char is 1-byte aligned
  • short is 2-byte aligned
  • int, float, and pointers (on 32-bit systems1) is 4-byte aligned
  • double is 8-byte aligned

The alignment requirement of a data type can be queried using std::alignment_of.

malloc is required by the C standard to return a pointer that’s properly aligned for any data type. 3

Class Objects

Not many posts on the internet cover data alignment with class objects. I think it is mainly because for class objects, things are a little more complicated and undetermined. Due to the unspecified order of class members (pre-C++11 standard), I shall also restrict the discussion to struct, whose order is determined.


According to C++11 standard,

Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified.

It means that for the following code, address of x<y<z, since they are under the same access control.

struct foo { public: int x; public: int y; public: int z; };

However, there is another story before C++11, where the order of class member without an intervening access-specifier is unspeficied.


At this step, it is natural that we want to see some examples. I shall begin by calculating the size of a struct instance.

2. Calculating the size of struct

Rule of Thumb for calculating

By following the rules below, you should get the same result of sizeof.4

  1. Member Alignment: For the members of union or struct, the first member is placed at index 0 (offset=0), and succeeding elements will be placed at an integer multiple of the size of member, or sub-member of the member. For example, int takes 4 bytes on 32-bit/64-bit machine, thus int should be placed at index [0,4,8,…].

  2. Struct as a Member: If a struct A contains member struct B, then the offset of member struct B should be the integer multiple of largest element of B. For example, if struct A contains struct B, where B contains char, int, double, then B should start with offset = 8*n because double is the largest member in B and it is 8-byte aligned.

  3. Ending: Total size of struct must be equal to the integer multiple of its largest member. Padding will be applied where necessary.

Example 1

#include <iostream>

typedef struct b
{
    int id;         // [0,3], Rule 1
    double height;  // [8,15], Rule 2
    float weight;   // [16,19]+[20,23], need to pad for double -> Rule 3
}SubRecord;


typedef struct a
{
    char name[2];   // [0,1]
    int age;        // [4,7]
    bool gender;    // [8,23], need to pad for SubRecord
    SubRecord rec;  // [16,39], starting position is determined by SubRecord.height,
                    //         length is determined by SubRecord
    double score;   // [40,47]
    float bmi;      // [48,51]
    short grade;    // [52,53], padded by [54,55] so that total size of Record 
                    //         is devisible by 8
}Record;

using namespace std;

int main()
{
    Record rec;
    cout<<"Size of Record: "<<sizeof(rec)<<endl;
    cout<<"Size of subrecord: "<<sizeof(rec.rec)<<endl;

    return 0;
}

Try to compile it on your machine. You should get the following output:

Size of Record: 56
Size of subrecord: 24

Is this what you expected?

A rule above all: #pragma pack(n)

We know now that compiler will insert padding between members to ensure that they are aligned to appropriate addresses in memory. Thus, we can avoid the performance penalty (or outright error).

#pragma pack(n) lets compiler pack structure members using particular alignment size n. In plain English, it sets n to be the size of word boundaries for each struct member. Hence, it effects the number of padding bits. For larger n, more bits will be needed to pad smaller data types, while the performance will be improved for accessing larger data types.

If you change the alignment of a structure, it may not use as much space in memory, but you may see a decrease in performance or even get a hardware-generated exception for unaligned access.

For example, for Test

struct Test
{
   char AA;
   int BB;
   char CC;
};

gcc by default will use 4-byte alignment on Test, thus it looks like the following:

|   1   |   2   |   3   |   4   |  

| AA(1) | pad.................. |
| BB(1) | BB(2) | BB(3) | BB(4) | 
| CC(1) | pad.................. |

where it takes 4*3 = 12 bytes.

However, if you specify #pragma pack(1) beforehand, the alignment result will look like:

|   1   |

| AA(1) |
| BB(1) |
| BB(2) |
| BB(3) |
| BB(4) |
| CC(1) |

and Test will take 1+4+1 = 6 bytes. This is commonly used to reduce the padding.

If you use #pragma pack(2), the alignment may look like:

|   1   |   2   | 

| AA(1) | pad.. |
| BB(1) | BB(2) |
| BB(3) | BB(4) |
| CC(1) | pad.. |

and Test will use 4*2 = 8 bytes.

Revising the Rule of Thumb

Therefore, we need to revise our third rule and add n to restrict the maximum size of memory cell.

  1. Ending: The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller. Padding will be applied when necessary.

3. Optimizing with data alignment

At this point, you know how data alignment works, and how to override the default behaviour using #pragma pack(n) to reduce memory footprint or increase efficiency for larger data types. Although there is no general rule of thumb, you can reorder your members by placing the larger ones in front and fit the smaller one below, just like playing LEGOs.

For more examples, I would strongly recommend Reference 1.

Reference

  1. http://www.catb.org/esr/structure-packing/
  2. https://msdn.microsoft.com/en-us/library/2e70t5y1.aspx

  1. It dependes on the compiler’s implementation. 64-bit/32-bit are merely the size of register, yet it is the compiler’s responsibility to decide which implementation to use. To know the actual size of data, use sizeof()↩︎

  2. https://en.cppreference.com/w/cpp/language/object#Alignment ↩︎

  3. https://en.cppreference.com/w/cpp/language/alignas ↩︎

  4. https://blog.csdn.net/hairetz/article/details/4084088 (in Mandrin) ↩︎

Ziji SHI(史子骥)
Ziji SHI(史子骥)
Ph.D. candidate

My research interests include distributed machine learning and high-performance computing.

Related