parallel algorithm course 02

parallel algorithm
note
Author

Shiguang Wu

Published

March 16, 2022

review


problem \(T_1\)

-> reduce dependency ->

tasks \(T_\infin\)

work sharing

blocks

synchronization (better to avoid)

false sharing


sync

Single program multiple data

seq

for(i=0;i<N;i++) a[i] = a[i] + b[i];

omp par

#pragma omp parallel
{
    id = omp_get_thread_num();
    Nthrds
    i_start = id * N / Nthrds
    i_end = (id+1) * N / Nthrds
    if (id==Nthrds - 1) iend=N
    for (i=istart;i<iend;i++) ...    
}

shorter (but slower)

#pragma omp parallel {
    #pragma omp for
        for () ...
}

// or equiv

#pragma omp parallel for
...

loop worksharing constructs

schedule clause

  • static: least work for scheduling (done at compiling)

  • dynamic (complex scheduling at run-time)


reduction

op should be associative

+,*,-,min,max

#pragma omp parallel for reduction (+: sum)
    for (i=0; i < num_steps; i++)
        x = ...
        sum = sum + x

slower than SPMD critical


omp for loop has barrier by default

has implicit barrier

use nowait to cancel it


master construct

#pragma omp master

only master thread will execute

no barrier for other threads


#pragma omp single

only one thread will execute, has barrier

use nowait to cancel barrier


sections

#pragma omp sections
{
    #pragma omp section
        thread A
    #pragma omp section
        thread B
}

A, B will execute in parallel

has barrier at the end of sections


lock routines

omp_init_lock();

omp_set_lock(), omp_unset_lock()

omp_destroy_lock()

e.g. array

if use critical on a[i], then the whole array a will be lock

#bins \(\gg\) #threads

low prob of false sharing, conficts are rare, use locks


OMP refernce card


omp_set_dynamic()


env vars

OMP_NUM_THREADS


heap: shared

stack: private

int tmp=0; // in heap
#pragma omp parallel for private(tmp)
    for (int i; i < N; i++) {
        tmp += j; // not initialized, local copy ( in stack )
    }
print(tmp); // 0 here
int tmp=999; // in heap
#pragma omp parallel for firstprivate(tmp)
    for ()
        tmp += j; // has initialized with 1, local copy ( in stack ), still private

printf(tmp) // 999

not use often: lastprivate

int tmp=0; // in heap
#pragma omp parallel for lastprivate(j)
    for ()
        tmp = j; // copy the last j to tmp
    
print(tmp); // print the last j