Basic Block Data Decomposition in Perl
March 3rd, 2009
I was playing around with the idea of parallelizing something the other day to eke out some performance. Unfortunately, I’ve gotten a bit rusty since writing some MPI code for a parallel computing course a few years back. I got stuck on what should be the simple part: dividing up my input across the threads.
The goal is to divvy things up into contiguous blocks of roughly equal size. For example, if the size of your input is 38 (n) and you start four threads (p), you don’t want to give the first three threads chunks of 12 and leave the last thread with 2. You want slices of 9, 10, 9 and 10.
So I flailed away with loops and POSIX::floor for a little while and came pretty close to what I remembered. I finally had to drag out my textbook (and translate from the C macros) to get it right.
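#!/usr/bin/perl
# Block Data Decomposition:
# Divide array n into p contiguous blocks of roughly equal size
use strict;
use warnings;
use POSIX qw(floor);

# Index of the first element in block $i (0-based) of $p blocks over $n items.
sub block_start {
    my ($i, $p, $n) = @_;
    return floor(($i * $n) / $p);
}

# Index of the last element in block $i: one before the next block's start.
sub block_end {
    my ($i, $p, $n) = @_;
    return block_start($i + 1, $p, $n) - 1;
}

my @input = get_input();    # placeholder: supply your own input list
my $n = scalar @input;
my $p = 4;                  # number of blocks (one per thread)

for my $i (0 .. $p - 1) {
    my $start = block_start($i, $p, $n);
    my $end   = block_end($i, $p, $n);
    my @range = @input[$start .. $end];
    do_something(\@range);  # placeholder: hand the slice to a worker
}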
The idea is that do_something() sends a slice of input off for processing by one of your threads. It’s a pretty useful algorithm when doing this sort of thing. Certainly not rocket science. Which is why we should all be happy I’m not a rocket scientist.
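As a quick sanity check on the arithmetic, here is a minimal sketch that prints the block boundaries for the 38-item, four-thread case. block_size() is a hypothetical helper added only for this illustration; it is not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);

# First index of block $i when $n items are split into $p blocks.
sub block_start {
    my ($i, $p, $n) = @_;
    return floor(($i * $n) / $p);
}

# Last index of block $i (one before the next block's start).
sub block_end {
    my ($i, $p, $n) = @_;
    return block_start($i + 1, $p, $n) - 1;
}

# Hypothetical helper (illustration only): number of items in block $i.
sub block_size {
    my ($i, $p, $n) = @_;
    return block_end($i, $p, $n) - block_start($i, $p, $n) + 1;
}

my ($n, $p) = (38, 4);
my @sizes = map { block_size($_, $p, $n) } 0 .. $p - 1;

for my $i (0 .. $p - 1) {
    printf "block %d: indices %2d..%2d (%d items)\n",
        $i, block_start($i, $p, $n), block_end($i, $p, $n), $sizes[$i];
}
```

With these definitions the blocks cover indices 0..8, 9..18, 19..27 and 28..37, so they hold 9, 10, 9 and 10 items. No block differs from another by more than one element, and together they cover the input exactly once.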



March 3rd, 2009 at 6:35 pm
How does thread handling work in Perl? Does Perl provide any facilities for it, or are you stuck using the OS system primitives?
March 4th, 2009 at 7:45 am
I know nothing of the internals. Superficially, I know Perl has its own threading mechanism called “ithreads”. They're like POSIX threads in a lot of ways, but the key difference is that variables aren't shared between threads unless explicitly requested. Each thread gets a copy of all the current process data instead.
For example…
use threads;
my $foo = 1;
# Pass a reference to the sub; a bare &increment would call it right here.
my $thr = threads->create(\&increment);
$thr->join();
print $foo, "\n";
sub increment {
    $foo++;
    print $foo, "\n";
}
This is going to print “2” and then “1”. If you're used to POSIX threads you'd think the increment() subroutine is ++-ing a $foo shared with the main scope, but it doesn't; the thread increments its own copy. If you want $foo to be shared you have to enable shared data and also declare $foo differently…
use threads;
use threads::shared;
my $foo :shared = 1;
…
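For comparison, here is a minimal sketch of what a complete shared version might look like. It assumes a perl built with ithreads support; the lock() call and the $seen_in_thread variable are additions for illustration, not something from the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Declared :shared, so every thread sees the same variable
# instead of getting its own private copy.
my $foo :shared = 1;

sub increment {
    lock($foo);    # held until the end of this sub's scope
    $foo++;
    return $foo;   # handed back to the parent through join()
}

my $thr = threads->create(\&increment);
my $seen_in_thread = $thr->join();   # hypothetical name, illustration only

print "$seen_in_thread\n";   # value the thread saw after incrementing
print "$foo\n";              # the parent now sees the increment too
```

Under those assumptions both lines print 2, because the thread and the main program are working on the same $foo. The lock() is not strictly needed with a single worker, but it is the usual habit once more than one thread touches the same variable.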