parallel algorithm course 04
synchronization has costs
simple list traversal
// linked list
p = head;
while (p) {
    process(p);
    p = p->next;
}
compute all p in advance
#pragma omp parallel for schedule(static, 1)
for (int i = 0; i < count; ++i)
    process(parr[i]);
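A sketch of how parr could be built first, assuming the node type, head, and process() from the traversal above; the two-pass structure itself is illustrative:

// pass 1 (serial): count the nodes and record every pointer (needs <stdlib.h> for malloc/free)
int count = 0;
for (node *p = head; p; p = p->next) count++;
node **parr = (node **)malloc(count * sizeof(node *));
int k = 0;
for (node *p = head; p; p = p->next) parr[k++] = p;

// pass 2 (parallel): every iteration is now an independent array access
#pragma omp parallel for schedule(static, 1)
for (int i = 0; i < count; ++i)
    process(parr[i]);

free(parr);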
task
#pragma omp parallel
{
    // not often used: every thread in the team creates a task
    #pragma omp task
    foo();
    #pragma omp barrier

    // more common case: only one thread creates the tasks
    #pragma omp single
    {
        #pragma omp task
        bar();
    }
}
example (with bug)
int fib(int n) {
    int x, y;                // local to this invocation
    if (n < 2) return n;
    #pragma omp task         // bug: needs `shared(x)`; by default the task gets its own
                             // firstprivate copy, so this x lives at a different address
                             // than the x read below
    x = fib(n - 1);
    #pragma omp task         // bug: needs `shared(y)`, default is firstprivate in a task
    y = fib(n - 2);
    #pragma omp taskwait
    return x + y;            // wrong: the parent's x and y were never written
}
#pragma omp parallel
#pragma omp single
fib(n);
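A corrected sketch (one common fix; a real implementation would also cut over to serial recursion for small n to limit task overhead):

int fib(int n) {
    int x, y;
    if (n < 2) return n;
    #pragma omp task shared(x)   // child writes into the parent's x
    x = fib(n - 1);
    #pragma omp task shared(y)   // child writes into the parent's y
    y = fib(n - 2);
    #pragma omp taskwait         // wait for both children before reading x, y
    return x + y;
}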
two important points for tasks
barriers (and taskwait)
data environment (shared vs. firstprivate defaults)
example 2 (with bug)
List *ml;
Element *e;
#pragma omp parallel
#pragma omp single
{
    for (e = ml->first; e; e = e->next)
        #pragma omp task        // bug: needs firstprivate(e)
        process(e);             // data race: e is shared, so the task may see a later node
}
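The fix named in the comment, sketched (firstprivate captures the value of e at task-creation time):

#pragma omp parallel
#pragma omp single
{
    for (e = ml->first; e; e = e->next)
        #pragma omp task firstprivate(e)   // each task gets its own snapshot of e
        process(e);
}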
return to first example
#pragma omp parallel
{
    #pragma omp single
    {
        node *p = head;
        while (p) {
            #pragma omp task firstprivate(p)
            process(p);
            p = p->next;
        }
    }
}
merge sort
directly parallelizing the two recursive calls (serial merge)
\(T_1(n)=2T_1(\frac{n}{2})+O(n)=O(n\log n)\)
\(T_\infty(n)=T_\infty(\frac{n}{2})+O(n)=O(n)\)
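A sketch of this direct version with OpenMP tasks; merge() is an assumed serial helper that merges a[lo..mid) and a[mid..hi) using the scratch array tmp:

void merge_sort(int *a, int *tmp, int lo, int hi) {    // sorts a[lo..hi)
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    #pragma omp task           // a, tmp, lo, mid are firstprivate by default, which is fine here
    merge_sort(a, tmp, lo, mid);
    #pragma omp task
    merge_sort(a, tmp, mid, hi);
    #pragma omp taskwait
    merge(a, tmp, lo, mid, hi);    // serial O(n) merge: this is what keeps the span at O(n)
}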
use a different merge method
parallel merge
say we have two sorted sequences (non-decreasing, \(\le\))
\(A=a_1,\dots,a_n\) and \(B=b_1,\dots,b_n\)
let \(t=a_{\lceil \frac{n}{2}\rceil}\); split A and B into subarrays \(A_1,A_2,B_1,B_2\) such that \(A_1\le t\), \(B_1\le t\) and \(A_2\gt t\), \(B_2\gt t\)
the sequence \(A_1,B_1,A_2,B_2\) is then semi-merged by t; recurse on \(merge(A_1,B_1)\) and \(merge(A_2,B_2)\), which can run in parallel
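A minimal sketch of this scheme with tasks (illustrative; a real version would fall back to a serial merge below some size). a and b are sorted, the result goes to out:

// merge sorted a[0..na) and b[0..nb) into out[0..na+nb)
void p_merge(int *a, int na, int *b, int nb, int *out) {
    if (na < nb) { p_merge(b, nb, a, na, out); return; }   // keep a as the longer run
    if (na == 0) return;                                   // then both runs are empty
    int ma = na / 2;                                       // pivot t = a[ma], a's median
    int lo = 0, hi = nb;                                   // binary search: first b[i] >= t
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (b[mid] < a[ma]) lo = mid + 1; else hi = mid;
    }
    int mb = lo;                        // A1 = a[0..ma), B1 = b[0..mb); A2, B2 are the rest
    out[ma + mb] = a[ma];               // t is already at its final position
    #pragma omp task
    p_merge(a, ma, b, mb, out);                                            // merge(A1, B1)
    #pragma omp task
    p_merge(a + ma + 1, na - ma - 1, b + mb, nb - mb, out + ma + mb + 1);  // merge(A2, B2)
    #pragma omp taskwait
}

With equal keys the split may put duplicates of t on either side of the pivot; the output is still sorted.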
random number generation problem
problem
example: approximate \(\pi\) using Monte Carlo
seed(SEED);
#pragma omp parallel for private(x, y) reduction(+:n_in_circle)
for (int i = 0; i < num_trials; i++) {
    x = random();
    y = random();
    if (x*x + y*y <= r*r) n_in_circle++;
}
pi = 4.0 * (1.0 * n_in_circle / num_trials);   // fraction inside the circle ≈ π/4 for x, y uniform in [0, r]
Linear Congruential Generator (LCG) (has a data race problem)
int random_last = 0;
// need:
// #pragma omp threadprivate(random_last)
int random() {
    int random_next = (MULTIPLIER * random_last + ADDEND) % PMOD;
    random_last = random_next;
    return random_next;
}
parameters (one possible parameter set): MULTIPLIER = 1366, ADDEND = 150889, PMOD = 714025
the generator has a finite period (at most PMOD)
single-threaded and multi-threaded runs give different accuracy (the multi-threaded estimate is worse)
data race on the variable random_last
solution 1
add #pragma omp threadprivate(random_last) right after int random_last = 0
each thread then gets its own copy of random_last when the thread is created
similar in spirit to firstprivate (see the sketch below)
problem: still worse than the single-threaded version
the random sequences used by different threads may overlap
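A minimal sketch of solution 1; the per-thread seeding shown here (SEED plus the thread id) is an illustrative choice, and ad-hoc seeding like this is why the per-thread sequences can overlap, as noted above:

int random_last = 0;
#pragma omp threadprivate(random_last)           // every thread gets its own copy

#pragma omp parallel
{
    random_last = SEED + omp_get_thread_num();   // illustrative per-thread seed (needs <omp.h>)
    // each thread now runs random() on private state: no data race,
    // but the per-thread sequences may still overlap
}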
solution 2
Leap Frog method
generate a different starting seed for each thread
the case ADDEND = 0 (pure multiplicative generator)
#pragma omp single
{
    nthreads = ...;                        // number of threads in the team
    iseed = PMOD / MULTIPLIER;             // pick an initial seed
    pseed[0] = iseed;
    mult_n = MULTIPLIER;
    for (int i = 1; i < nthreads; i++) {   // pseed[0] is already set, so start at 1
        iseed = (unsigned long long)((MULTIPLIER * iseed) % PMOD);
        pseed[i] = iseed;
        mult_n = (mult_n * MULTIPLIER) % PMOD;   // ends up as MULTIPLIER^nthreads mod PMOD
    }
}
// still inside the enclosing parallel region: each thread picks up its own seed
id = ...;                                  // this thread's id
random_last = (unsigned long long) pseed[id];
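Since mult_n ends up as MULTIPLIER^nthreads mod PMOD and ADDEND is 0, advancing nthreads positions in the original sequence is a single multiplication. A sketch of the per-thread generator under these assumptions (random_last threadprivate, mult_n shared, 64-bit arithmetic to avoid overflow):

// leapfrog step: x_{k+nthreads} = (MULTIPLIER^nthreads * x_k) % PMOD = (mult_n * x_k) % PMOD
long long random() {
    long long random_next = (long long)(((unsigned long long)mult_n * random_last) % PMOD);
    random_last = random_next;     // threadprivate: no race, and threads never revisit each other's values
    return random_next;
}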