Understanding C++ Performance (1) : Stack and Heap

Last updated on Mar 16, 2021 4 min read

One thing about C++ that attracts me a lot is that it gives me control over the program to a great extent. Therefore, I am starting an article series on why C++ is efficient from the implementation perspective. I have also benchmarked some of the claims and will post them when they are ready.

I hope this article will be helpful to you.

Stack vs Heap

Stack

The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer, which makes it fast.
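To make the "adjusting one pointer" point concrete, here is a minimal sketch of a LIFO arena. It is not how a real call stack is implemented (the compiler and CPU handle that); `StackArena`, `push`, and `pop` are hypothetical names I made up for illustration, and the sketch ignores alignment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size LIFO arena; illustrative only, ignores alignment.
class StackArena {
public:
    // Reserve `bytes` at the top; returns nullptr when the arena is full.
    void* push(std::size_t bytes) {
        if (bytes > kCapacity - top_) return nullptr;
        void* block = buffer_ + top_;
        top_ += bytes;  // "allocation" is a single addition
        return block;
    }

    // Release the most recently reserved `bytes`.
    void pop(std::size_t bytes) {
        assert(bytes <= top_);
        top_ -= bytes;  // "deallocation" is a single subtraction
    }

    std::size_t used() const { return top_; }

private:
    static constexpr std::size_t kCapacity = 1024;
    unsigned char buffer_[kCapacity];
    std::size_t top_ = 0;
};
```

Blocks have to be popped in the reverse order they were pushed, which is exactly the restriction that keeps the bookkeeping down to one counter.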

Heap

The heap is memory set aside for dynamic allocation, for instance through new or malloc. Unlike the stack, there’s no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.
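As a small illustration of "allocate at any time, free at any time", the sketch below allocates three blocks and releases them in an order that is neither first-in-first-out nor last-in-first-out. The function name `heap_any_order_demo` is made up for this example.

```cpp
#include <cassert>
#include <cstdlib>

// Allocates three heap blocks and frees them in a non-LIFO order.
// Returns the sum of the stored values, or -1 if malloc fails.
int heap_any_order_demo() {
    int* a = new int(1);
    int* b = static_cast<int*>(std::malloc(sizeof(int)));
    int* c = new int(3);
    if (b == nullptr) {  // malloc signals failure with a null pointer
        delete a;
        delete c;
        return -1;
    }
    *b = 2;
    int sum = *a + *b + *c;

    delete a;      // the oldest block goes first
    delete c;      // then the newest
    std::free(b);  // memory from malloc must be released with free
    return sum;
}
```

Note that each block is released with the function that matches how it was obtained: delete for new, free for malloc.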

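Furthermore, unless RAII is adopted, heap memory must be released manually after use, which adds overhead and is easy to get wrong. Neither delete nor free comes at zero cost: delete runs the destructor and then calls a deallocation function that is commonly implemented on top of free, and free itself can run to several hundred lines of code in a typical allocator. Both are relatively cheap when the allocator can recycle the block internally, and more expensive when memory has to be returned to the OS through a system call.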
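To show the difference between manual release and RAII, here is a minimal sketch using std::unique_ptr. `Tracked` is a hypothetical type that only counts live instances so the effect is observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Manual management: every exit path has to remember the delete.
int manual_lifetime() {
    Tracked* t = new Tracked();
    int seen = Tracked::live;
    delete t;
    return seen;
}

// RAII: the unique_ptr destructor calls delete when `t` goes out of scope.
int raii_lifetime() {
    std::unique_ptr<Tracked> t = std::make_unique<Tracked>();
    return Tracked::live;  // `t` is destroyed after this value is read
}
```

Both functions observe one live object while they run and leave none behind, but only the manual version can leak if an early return or exception skips the delete.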

Each thread gets a stack, while there’s typically only one heap for the application (although it isn’t uncommon to have multiple heaps for different types of allocation).

Thread vs Process

Heap is shared across threads of same process, while stack is private to each thread

The picture above makes this clear: the heap is shared among the threads of the same process, whereas each thread gets its own stack.
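The same idea can be sketched in code. In the example below, two threads add to one heap-allocated counter, while each keeps a local variable on its own stack. `thread_demo` and `ThreadDemoResult` are names I invented for this sketch, and the spin-wait exists only to keep both locals alive at the same moment so their addresses can be compared.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <thread>

struct ThreadDemoResult {
    int shared_total;      // value accumulated in the shared heap object
    bool distinct_stacks;  // whether the two locals had different addresses
};

ThreadDemoResult thread_demo() {
    // One heap object, reachable from both threads.
    auto shared = std::make_unique<std::atomic<int>>(0);
    std::atomic<int> arrived{0};
    std::uintptr_t addr1 = 0, addr2 = 0;

    auto work = [&](std::uintptr_t* out) {
        int local = 0;  // lives on the stack of whichever thread runs this
        *out = reinterpret_cast<std::uintptr_t>(&local);
        arrived.fetch_add(1);
        while (arrived.load() < 2) std::this_thread::yield();  // overlap lifetimes
        for (int i = 0; i < 1000; ++i) ++local;
        shared->fetch_add(local);  // the heap object is shared, so synchronize
    };

    std::thread t1(work, &addr1);
    std::thread t2(work, &addr2);
    t1.join();
    t2.join();
    return {shared->load(), addr1 != addr2};
}
```

On some toolchains this needs the -pthread flag when compiling and linking.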

The operating system (OS) manages memory for different processes so that they do not interfere with each other. The initial size of the heap is set when the application starts, but the heap can grow as more space is needed (the allocator requests more memory from the OS).

As for threads, the stack is allocated when a thread is created. Its size is set at that point and is determined by the OS defaults, the compiler, the runtime, and some other factors.

A stack is reclaimed when its thread exits, and the heap is reclaimed when the process exits. If a memory leak happens, the OS therefore recovers the leaked memory when the program finishes.

What makes one faster?

Stack is faster for the following reasons:

access pattern: it is trivial to allocate and deallocate memory from the stack (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or deallocation.
caching effect: each byte in the stack tends to be reused very frequently, which means it tends to stay in the processor’s cache, making it very fast.
synchronization takes time: the heap, being mostly a global resource, typically has to be thread-safe, i.e. each allocation and deallocation typically needs to be synchronized with other heap accesses in the program.
failure handling: a failed malloc returns a null pointer that has to be checked, and a failed new throws std::bad_alloc; exception handling is expensive when it actually throws.
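On the failure-handling point, it may help to see how each allocation function reports that it could not get memory: malloc returns a null pointer, the default operator new throws std::bad_alloc, and the std::nothrow form of operator new returns a null pointer. The sketch below assumes that a request of about half the address space cannot be satisfied; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>

// Assumption: no system can satisfy a request this large.
std::size_t impossible_size() {
    return std::numeric_limits<std::size_t>::max() / 2;
}

bool malloc_returns_null() {
    // volatile keeps the optimizer from eliding the allocation call
    void* volatile p = std::malloc(impossible_size());
    bool failed = (p == nullptr);
    std::free(p);  // free(nullptr) is a no-op
    return failed;
}

bool new_throws_bad_alloc() {
    try {
        void* volatile p = ::operator new(impossible_size());
        ::operator delete(p);
        return false;
    } catch (const std::bad_alloc&) {
        return true;  // the throwing form reports failure via an exception
    }
}

bool nothrow_new_returns_null() {
    void* volatile p = ::operator new(impossible_size(), std::nothrow);
    bool failed = (p == nullptr);
    ::operator delete(p);  // deleting a null pointer is a no-op
    return failed;
}
```

A stack allocation has no comparable error path to check on every use, which is part of why it stays cheap.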

Edit : Placement New vs New

Placement new is a variation of the new expression. Unlike ordinary new, which allocates memory on the heap at an address chosen by the allocator, placement new constructs an object at a memory address that has already been allocated.

The syntax is as below:

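	Object *oldObj = new Object();
	oldObj->~Object();                      // end the old object's lifetime; the storage stays allocated
	Object *myObj = new (oldObj) Object();  // construct a new Object in oldObj's storage (needs <new>)
	delete myObj;                           // destroy the new object and free the storage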

Normally, a new expression allocates memory first, then constructs the new object at that memory address. Placement new skips the first step, so it can be used to optimize code when suitable memory is already available.
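The two steps can be written out by hand. The following is a rough sketch of what an ordinary new expression does, next to a placement-only version that reuses a stack buffer. `Point` and both function names are hypothetical, and the real new expression also handles cleanup if the constructor throws, which this sketch leaves out.

```cpp
#include <cassert>
#include <new>

// Hypothetical type used only for this illustration.
struct Point {
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

// Roughly what `new Point(1, 2)` followed by `delete p` does, spelled out.
int two_step_new() {
    void* raw = ::operator new(sizeof(Point));  // step 1: allocate raw storage
    Point* p = new (raw) Point(1, 2);           // step 2: construct in that storage
    int sum = p->x + p->y;
    p->~Point();                                // undo step 2
    ::operator delete(raw);                     // undo step 1
    return sum;
}

// Placement new alone: the storage already exists, so step 1 is skipped.
int placement_only() {
    alignas(Point) unsigned char storage[sizeof(Point)];
    Point* p = new (storage) Point(3, 4);
    int sum = p->x + p->y;
    p->~Point();
    return sum;
}
```

In the second function no allocator is involved at all; the buffer is ordinary stack memory with the right size and alignment.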

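There is no placement delete expression. The compiler does not automatically destroy an object created with placement new: the programmer calls its destructor explicitly, and the underlying memory is then released in whatever way it was originally obtained (for example with operator delete or free, or automatically when a stack buffer goes out of scope).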
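A short sketch of what the explicit destructor call changes: `Resource` is a hypothetical type that counts destructor runs, and the two functions differ only in whether the destructor is invoked by hand.

```cpp
#include <cassert>
#include <new>

// Hypothetical type that counts how many times its destructor has run.
struct Resource {
    static int destroyed;
    ~Resource() { ++destroyed; }
};
int Resource::destroyed = 0;

int destructor_runs_without_explicit_call() {
    Resource::destroyed = 0;
    {
        alignas(Resource) unsigned char storage[sizeof(Resource)];
        new (storage) Resource();
        // The buffer goes out of scope here, but ~Resource() is not called.
    }
    return Resource::destroyed;
}

int destructor_runs_with_explicit_call() {
    Resource::destroyed = 0;
    {
        alignas(Resource) unsigned char storage[sizeof(Resource)];
        Resource* r = new (storage) Resource();
        r->~Resource();  // explicit destructor call; the buffer is released by scope exit
    }
    return Resource::destroyed;
}
```

For a type that owns files, locks, or other memory, skipping the explicit call means those resources are never released.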