parallel algorithm course 01
parallel algorithm
note
gcc -fopenmp
omp_set_num_threads(int)
: request a thread count (the runtime may provide fewer)
multi-data: every thread runs the same code on its own part of the data
omp_get_thread_num(): get the calling thread's id (0 .. team size - 1)
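A minimal sketch of the two calls above. The helper name `count_team` is made up for this note, and the serial stubs in the `#else` branch are only an assumption so the file still builds when `-fopenmp` is left out.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the file still builds without -fopenmp. */
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_thread_num(void)   { return 0; }
static int  omp_get_num_threads(void)  { return 1; }
#endif

/* Hypothetical helper: asks for `requested` threads, lets each thread
   print its id, and returns the team size the runtime actually gave. */
int count_team(int requested)
{
    int team = 0;

    omp_set_num_threads(requested);      /* a request, not a guarantee */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* private: declared inside the region */
        printf("hello from thread %d\n", id);

        if (id == 0)                     /* only thread 0 writes the shared variable */
            team = omp_get_num_threads();
    }
    return team;
}
```

Build with `gcc -fopenmp` and call `count_team(4)` from `main`; the order of the printed lines changes from run to run.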
SMP: every processor has the same access cost to all memory, in theory
NUMA: access cost depends on where the memory is located, in practice
False Sharing
cache line
two processors write to different variables that sit on the same cache line; the line bounces between their caches and causes many useless write-backs
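A sketch of the effect, under assumptions: the 64-byte line size is typical but not universal, and the names `bump_packed`, `bump_padded`, and `padded_slot` are invented for this note. Both functions return the same total; the padded layout is only expected to run faster when several threads are active.

```c
#define NSLOTS     8
#define CACHE_LINE 64   /* assumption: 64-byte cache lines, check your CPU */

/* Adjacent counters: several of them fit in one cache line. */
static volatile long packed[NSLOTS];

/* One counter per 64 bytes: neighbours can never share a line. */
struct padded_slot {
    volatile long value;
    char pad[CACHE_LINE - sizeof(long)];
};
static struct padded_slot padded[NSLOTS];

long bump_packed(long iters)
{
    long total = 0;
    for (int t = 0; t < NSLOTS; t++) packed[t] = 0;

    #pragma omp parallel for
    for (int t = 0; t < NSLOTS; t++)
        for (long i = 0; i < iters; i++)
            packed[t]++;            /* different slots, same line: false sharing */

    for (int t = 0; t < NSLOTS; t++) total += packed[t];
    return total;
}

long bump_padded(long iters)
{
    long total = 0;
    for (int t = 0; t < NSLOTS; t++) padded[t].value = 0;

    #pragma omp parallel for
    for (int t = 0; t < NSLOTS; t++)
        for (long i = 0; i < iters; i++)
            padded[t].value++;      /* each slot lives on its own line */

    for (int t = 0; t < NSLOTS; t++) total += padded[t].value;
    return total;
}
```

The `volatile` qualifier is there only to keep the compiler from collapsing the inner loop into a single add, so that the repeated writes really reach the cache.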
Synchronization: avoids data races; to avoid false sharing, avoid a shared global array
barrier
#pragma omp barrier
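A small two-phase sketch of a barrier. The names `neighbour_check` and `TEAM` are hypothetical, and the stubs in the `#else` branch are only there so the file builds without `-fopenmp`. Each thread writes its own slot, waits at the barrier, then reads its neighbour's slot.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the file still builds without -fopenmp. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define TEAM 4   /* upper bound on the team size requested below */

/* Returns 1 if every thread saw its neighbour's phase-1 value. */
int neighbour_check(void)
{
    int stage[TEAM] = {0};
    int ok = 1;

    #pragma omp parallel num_threads(TEAM)
    {
        int id = omp_get_thread_num();
        int n  = omp_get_num_threads();   /* may be smaller than TEAM */

        stage[id] = id + 1;               /* phase 1: publish my value */

        #pragma omp barrier               /* nobody continues until all have written */

        int next = (id + 1) % n;          /* phase 2: read my neighbour's slot */
        if (stage[next] != next + 1) {
            #pragma omp atomic write
            ok = 0;
        }
    }
    return ok;
}
```

Without the barrier, a fast thread could read `stage[next]` before its neighbour has written it.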
critical
only one thread can enter at a time (often cheap), mutual exclusion, avoids data races
(software support)
#pragma omp critical
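A sketch of a case that needs `critical`: two shared variables must change together. The helper name `argmax` is made up for this note. Putting the whole comparison inside the critical section is the simplest correct form, not the fastest one.

```c
/* Hypothetical helper: index of the largest element of a[0..n-1], n > 0.
   With ties, which index is returned depends on the thread schedule. */
int argmax(const double *a, int n)
{
    double best_val = a[0];
    int    best_idx = 0;

    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        /* Compare and update as one unit: only one thread at a time
           may read and change best_val / best_idx. */
        #pragma omp critical
        {
            if (a[i] > best_val) {
                best_val = a[i];
                best_idx = i;
            }
        }
    }
    return best_idx;
}
```

A cheaper variant would keep a per-thread best and enter the critical section once per thread; that is left out to keep the sketch short.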
atomic
only supports simple updates of one variable (hardware support):
x binop= expr
x++, ++x, x--, --x
#pragma omp atomic
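A short sketch of the two supported forms. The function names `count_even` and `sum_all` are hypothetical; each `atomic` protects only the single update statement that follows it.

```c
/* Counts the even entries of a[0..n-1] using the x++ form. */
long count_even(const int *a, int n)
{
    long hits = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            #pragma omp atomic
            hits++;
        }
    }
    return hits;
}

/* Sums a[0..n-1] using the x binop= expr form. */
long sum_all(const int *a, int n)
{
    long total = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        total += a[i];
    }
    return total;
}
```

For a plain sum, a `reduction(+:total)` clause is usually the better choice; `atomic` is shown here only because it is the construct under discussion.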