Introduction to OpenMP

Why OpenMP

  • Syntax is easy to pick up
  • Scaling can be close to linear for loops with independent iterations


Existing native parallel solutions in C and Fortran

  • Unified Parallel C (UPC) in C

  • Fortran:

    • CoArray Fortran

    • DO CONCURRENT from Fortran 2008

    • None of them has become mainstream

OpenMP programming model

a. Based on threads instead of cores (so hyperthreading counts)

b. Each thread executes the same code and keeps its private variables in its own memory, while shared variables live in memory common to all threads

c. Each OpenMP thread is managed by the OMP runtime system

i. The runtime decides the best way to run the threads, in contrast to MPI, where the programmer distributes the work explicitly

d. Thread safe: a function executes correctly even when it is executed concurrently by multiple threads

How to use

a. Compiler Directives

b. Environment Variables

c. Runtime Library Routines (e.g.: call runtime routine to determine the thread ID)
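
The three pieces above can be seen working together in a minimal sketch, assuming a C compiler with OpenMP support. The function names `default_team_size` and `team_size_with_clause` are hypothetical, and the `#else` branch is only a serial fallback (an assumption of this sketch) so the file still builds when OpenMP is switched off.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without OpenMP). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Team size when the code requests nothing: the default is taken from
   the OMP_NUM_THREADS environment variable, if it is set. */
int default_team_size(void) {
    int size = 0;
    #pragma omp parallel                      /* compiler directive */
    {
        if (omp_get_thread_num() == 0)        /* runtime library routine */
            size = omp_get_num_threads();     /* runtime library routine */
    }
    return size;
}

/* Team size when the directive itself asks for `requested` threads (> 0). */
int team_size_with_clause(int requested) {
    int size = 0;
    #pragma omp parallel num_threads(requested)
    {
        if (omp_get_thread_num() == 0)
            size = omp_get_num_threads();
    }
    return size;
}
```

With `OMP_NUM_THREADS=4` exported, `default_team_size()` would typically return 4, while `team_size_with_clause(2)` returns at most 2 because the clause takes precedence over the environment variable.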

Feature set

1. Parallel Construct

2. Work-sharing constructs (loop, section, single, workshare (Fortran only))

3. Data-sharing, nowait, and schedule clauses

4. Synchronization constructs (barrier, critical, atomic, locks, master (for sync.))
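
As a rough illustration of features 1, 2 and 4 together, the sketch below combines a parallel region, a work-shared loop, and two synchronization constructs. The function names are hypothetical; the code uses only directives, so it also compiles (and runs serially) without OpenMP.

```c
/* Counts the even numbers in 0 .. n-1 using a shared counter.
   The atomic directive makes each increment indivisible. */
int count_even_atomic(int n) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            #pragma omp atomic
            count += 1;
        }
    }
    return count;
}

/* Finds the largest value in v[0 .. n-1] (n >= 1). The critical section
   lets one thread at a time compare and update the shared maximum. */
int max_critical(const int *v, int n) {
    int best = v[0];
    #pragma omp parallel for
    for (int i = 1; i < n; ++i) {
        #pragma omp critical
        {
            if (v[i] > best)
                best = v[i];
        }
    }
    return best;
}
```

Both functions serialize the update itself, so they are meant to show the constructs rather than to be fast; a `reduction` clause is usually the better tool for sums and maxima.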

## How to compile

icc -qopenmp omp_hello.c -o hello
gcc -fopenmp omp_hello.c -o hello
pgcc -mp omp_hello.c -o hello
clang -fopenmp omp_hello.c -o hello

Practice 1 : OpenMP Hello World

#include <stdio.h>
#include <omp.h>
int main(void) {
#pragma omp parallel
    printf("Hello World... from thread = %d\n", omp_get_thread_num());
}

$ cd Example

$ gcc -fopenmp omp_hello.c -o hello


Note 1: You can set the number of threads visible to OpenMP by

$ export OMP_NUM_THREADS=num_of_threads_you_want

Note 2: Note the difference between

omp_get_thread_num();  // get thread ID
omp_get_num_threads(); // get total number of threads
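
To make the difference concrete, here is a small sketch that checks the relationship between the two calls: every thread ID should fall in the range 0 to team size minus 1. The function name `ids_fit_team` is hypothetical, and the `#else` branch is a serial fallback assumed here so the code builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without OpenMP). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Returns 1 if every thread's ID lies in 0 .. team size - 1, else 0. */
int ids_fit_team(void) {
    int ok = 1;
    #pragma omp parallel
    {
        /* Declared inside the region, so each thread has its own copy. */
        int tid = omp_get_thread_num();        /* differs from thread to thread */
        int nthreads = omp_get_num_threads();  /* same for the whole team */
        if (tid < 0 || tid >= nthreads) {
            #pragma omp atomic write
            ok = 0;
        }
    }
    return ok;
}
```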

Q1: Why do nthreads and tid need to be private?

Q2: Try ```OMP_NUM_THREADS=1000``` and let the elapsed time be t1; then try ```OMP_NUM_THREADS=10000``` and let the elapsed time be t2 on NSCC (note that the machine has 24 threads). Do you observe any difference between 10 × t1 and t2?

OpenMP uses fork-join model

  • Starts as a single master thread on one core

  • Continues as a single thread until it reaches a parallel construct, where it forks a team of threads that join again at the end of the region
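
The fork-join behaviour can be observed with a short sketch that records the team size before, inside, and after a parallel region. The function name `fork_join_sizes` is hypothetical, and the `#else` branch is a serial fallback assumed here so the code builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without OpenMP). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Records the team size before, inside, and after a parallel region. */
void fork_join_sizes(int sizes[3]) {
    sizes[0] = omp_get_num_threads();      /* serial part: master thread only */
    #pragma omp parallel                   /* fork: a team of threads starts */
    {
        if (omp_get_thread_num() == 0)
            sizes[1] = omp_get_num_threads();
    }                                      /* join: implicit barrier, team ends */
    sizes[2] = omp_get_num_threads();      /* serial again */
}
```

Outside a parallel region `omp_get_num_threads()` reports 1, so the first and last entries stay at 1 whatever the team size in the middle.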

  1. Parallel region

Rules for Fortran: a) no `GOTO` into or out of the parallel region b) `STOP` statements are okay

Rules for C/C++

  • Directives are case sensitive

  • Each directive applies to the structured block that follows it

Hands-on with OpenMP

Do’s and Don’ts

  • Must be a DO or for loop (not DO WHILE or WHILE)

  • Avoid loop-carried dependencies such as A(i) = A(i-1)*2 inside a DO ... ENDDO loop, where each iteration needs the result of the previous one

  • Variables declared inside the code block are private by default

  #pragma omp parallel for
  for (int i = 0; i < 100; ++i) {
    // do your work
    // i is private to each worker thread
  }
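
The dependency warning above is easier to see in code. The sketch below, with hypothetical function names, puts the dependent loop next to one possible rewrite in which every iteration depends only on its own index and can therefore be shared among threads.

```c
/* Serial version: a[i] needs a[i-1], so the iterations must run in order. */
void doubling_serial(long long *a, int n) {
    for (int i = 1; i < n; ++i)
        a[i] = a[i - 1] * 2;
}

/* Rewritten so that a[i] depends only on a[0] and i: a[i] = a[0] * 2^i.
   Assumes n >= 1 and that a[0] * 2^(n-1) still fits in a long long. */
void doubling_parallel(long long *a, int n) {
    long long first = a[0];
    #pragma omp parallel for
    for (int i = 1; i < n; ++i)
        a[i] = first * (1LL << i);   /* i is private to each thread */
}
```

Not every dependency has a closed form like this one; when none exists, the loop generally has to stay serial or be restructured in some other way.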

Section Construct

Allows threads to execute different blocks of code concurrently.

Each section is executed once by a thread in the team.

#pragma omp parallel
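
A minimal sketch of the construct, using the combined `parallel sections` form: two independent tasks, each placed in its own `section`. The function name `min_and_max` is hypothetical, and `lo` and `hi` are assumed to point to different variables.

```c
/* Finds the minimum and maximum of v[0 .. n-1] (n >= 1).
   The two searches are independent, so each gets its own section. */
void min_and_max(const int *v, int n, int *lo, int *hi) {
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int m = v[0];
            for (int i = 1; i < n; ++i)
                if (v[i] < m) m = v[i];
            *lo = m;                 /* only this section writes *lo */
        }
        #pragma omp section
        {
            int m = v[0];
            for (int i = 1; i < n; ++i)
                if (v[i] > m) m = v[i];
            *hi = m;                 /* only this section writes *hi */
        }
    }
}
```

With only two sections, at most two threads have work to do; any other threads in the team wait at the implicit barrier at the end of the construct.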

Single Construct

  • Allows only one thread in the team to execute the enclosed block
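
The sketch below illustrates this by counting how often a `single` block runs compared with how many threads pass through the surrounding region. The function name `single_demo` and its parameter are hypothetical.

```c
/* Returns how many times the single block ran (expected: once) and
   stores in *visits how many threads went through the parallel region. */
int single_demo(int *visits) {
    int init_count = 0;   /* times the single block was executed */
    int total = 0;        /* threads that passed through the region */
    #pragma omp parallel
    {
        #pragma omp single
        {
            init_count += 1;     /* executed by exactly one thread */
        }                        /* implicit barrier: the others wait here */

        #pragma omp atomic
        total += 1;              /* executed by every thread */
    }
    *visits = total;
    return init_count;
}
```

A typical use is one-off work inside a parallel region, such as reading input or printing a progress message.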

OpenMP Clauses

  • shared

  • private

  • default

  • if

  • firstprivate, lastprivate

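    • firstprivate initializes each thread's private copy with the value the variable had before the region; lastprivate copies the value from the last iteration (or section) back to the original variable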
  • num_threads

  • reduction

  • copyin
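
Three of these clauses can be shown on one loop. This is a sketch only: the function name `clause_demo`, its parameters, and the `offset` value are made up for illustration.

```c
/* Sums (i*i + offset) for i in 0 .. n-1 (n >= 1) and reports the square
   computed in the last iteration through *last_square. */
int clause_demo(int n, int *last_square) {
    int offset = 10;   /* firstprivate: each thread's copy starts at 10 */
    int sq = 0;        /* lastprivate: receives the value from i == n-1 */
    int sum = 0;       /* reduction: per-thread partial sums are added up */
    #pragma omp parallel for firstprivate(offset) lastprivate(sq) reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sq = i * i;
        sum += sq + offset;
    }
    *last_square = sq;
    return sum;
}
```

For `n = 5` the squares are 0, 1, 4, 9 and 16, so the function returns 30 + 5 × 10 = 80 and stores 16 in `*last_square`, whatever the number of threads.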


This tutorial is based on the OpenMP exercises by Lawrence Livermore National Laboratory and on NAG training.

Ziji SHI(史子骥)
Ph.D. candidate

My research interests include distributed machine learning and high-performance computing.