parallel algorithm course 02
review
problem \(T_1\)
-> reduce dependency ->
tasks \(T_\infin\)
work sharing
blocks
synchronization (better to avoid)
false sharing
sync
Single program multiple data
seq
for(i=0;i<N;i++) a[i] = a[i] + b[i];
omp par
#pragma omp parallel
{
= omp_get_thread_num();
id
Nthrds= id * N / Nthrds
i_start = (id+1) * N / Nthrds
i_end if (id==Nthrds - 1) iend=N
for (i=istart;i<iend;i++) ...
}
shorter (but slower)
#pragma omp parallel {
#pragma omp for
for () ...
}
// or equiv
#pragma omp parallel for
...
loop worksharing constructs
schedule clause
static: least work for scheduling (done at compiling)
dynamic (complex scheduling at run-time)
reduction
op should be associative
+,*,-,min,max
#pragma omp parallel for reduction (+: sum)
for (i=0; i < num_steps; i++)
= ...
x = sum + x sum
slower than SPMD critical
omp for loop has barrier by default
has implicit barrier
use nowait
to cancel it
master construct
#pragma omp master
only master thread will execute
no barrier for other threads
#pragma omp single
only one thread will execute, has barrier
use nowait
to cancel barrier
sections
#pragma omp sections
{
#pragma omp section
thread A#pragma omp section
thread B}
A, B will execute in parallel
has barrier at the end of sections
lock routines
omp_init_lock();
omp_set_lock(), omp_unset_lock()
omp_destroy_lock()
e.g. array
if use critical on a[i]
, then the whole array a
will be lock
#bins \(\gg\) #threads
low prob of false sharing, conficts are rare, use locks
omp_set_dynamic()
env vars
OMP_NUM_THREADS
heap: shared
stack: private
int tmp=0; // in heap
#pragma omp parallel for private(tmp)
for (int i; i < N; i++) {
+= j; // not initialized, local copy ( in stack )
tmp }
(tmp); // 0 here print
int tmp=999; // in heap
#pragma omp parallel for firstprivate(tmp)
for ()
+= j; // has initialized with 1, local copy ( in stack ), still private
tmp
(tmp) // 999 printf
not use often: lastprivate
int tmp=0; // in heap
#pragma omp parallel for lastprivate(j)
for ()
= j; // copy the last j to tmp
tmp
(tmp); // print the last j print