Parallel Looping in PHP with SPL’s MultipleIterator

source link: https://markbakeruk.net/2019/12/31/parallel-looping-in-php-with-spls-multipleiterator/

There was a time, back when I was working with PHP 5, when I found the Standard PHP Library (SPL) an extremely powerful and useful toolbox; and I particularly enjoyed working with SPL’s Iterators and Datastructures. Sadly, SPL was always a somewhat forgotten component of PHP, even when it was incorporated into core, probably due to the lack of documentation (although Pete Cowburn and the other PHP documenters did work hard to improve it over the years). With the arrival of PHP 7, SPL seems to have dropped even further into a backwater: generally we create our own exception classes, though we may sometimes use those in the SPL; we rely on Composer to handle autoloading, rather than using SPL directly to set up our autoloading manually; and Datastructures (one of my favourite SPL components) are better implemented in Rudi Theunissen’s ext-ds library.

But although much of the SPL is now outdated, or has been superseded by better language features, some components are still useful. SPL’s Iterators are confusingly named, and there’s little in the documentation that explains when and how they can or should be used; but if you can learn to understand them, they are still powerful tools even with PHP 7, especially as they work with any Iterator (including Generators), and through IteratorIterator with any other Traversable (like the Collection objects provided by many frameworks), not simply with arrays.

One of my favourite SPL Iterators is the MultipleIterator, which allows a single loop to iterate over several Traversable objects in parallel.

Without MultipleIterator, the most common approaches rely on matching array keys: either a for loop (when the keys are simple enumerated values), or a foreach loop over one of the arrays by key and value, using that key to look up the matching entries in the other arrays that we need to access:


$dates = ['Q1-2019', 'Q2-2019', 'Q3-2019', 'Q4-2019'];
$budget = [1200, 1300, 1400, 1500];
$expenditure = [1250, 1315, 1440, 1485];

$budgetElements = count($budget);
for ($i = 0; $i < $budgetElements; $i++) {
    echo "{$dates[$i]} - {$budget[$i]}, {$expenditure[$i]}", PHP_EOL;
}

Or using a foreach loop if the keys are associative:


$budget = ['Q1-2019' => 1200, 'Q2-2019' => 1300, 'Q3-2019' => 1400, 'Q4-2019' => 1500];
$expenditure = ['Q1-2019' => 1250, 'Q2-2019' => 1315, 'Q3-2019' => 1440, 'Q4-2019' => 1485];

foreach ($budget as $key => $value) {
    echo "{$key} - {$value}, {$expenditure[$key]}", PHP_EOL;
}

This approach works well enough as long as the keys match up, and we have the same number of entries in each array that we’re working with: if the arrays are of different lengths, then we need to ensure that we’re iterating over the most appropriate array (if using foreach), and we need more code inside the loop to handle those cases where there may not be entries with matching keys.
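To make that extra code concrete, here is a minimal sketch of the key-matching foreach when the expenditure figure for Q4-2019 is missing (the same scenario looked at later in this post). It assumes PHP 7.0+ for the null coalescing operator, and the 'n/a' placeholder is an arbitrary choice for illustration:

```php
<?php
// Budget has four quarters, but the Q4 expenditure figure is not yet available.
$budget = ['Q1-2019' => 1200, 'Q2-2019' => 1300, 'Q3-2019' => 1400, 'Q4-2019' => 1500];
$expenditure = ['Q1-2019' => 1250, 'Q2-2019' => 1315, 'Q3-2019' => 1440];

$lines = [];
// Loop over the longer array; the shorter one has to be looked up by key.
foreach ($budget as $key => $value) {
    // Without ??, reading $expenditure['Q4-2019'] would raise an undefined index/key warning.
    $spent = $expenditure[$key] ?? 'n/a';
    $lines[] = "{$key} - {$value}, {$spent}";
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

Had we looped over $expenditure instead, Q4-2019 would simply be skipped without any warning, which is why the choice of array to loop over matters.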

SPL’s MultipleIterator is better able to cope with those missing entries, or with keys that don’t match up, because it doesn’t match entries by key at all (an option to match on keys would be useful, though). Instead, it pairs entries by their position in each array, which means that the entries in every array do need to be in the same order.


$budget = ['Q1-2019' => 1200, 'Q2-2019' => 1300, 'Q3-2019' => 1400, 'Q4-2019' => 1500];
$expenditure = ['Q1-2019' => 1250, 'Q2-2019' => 1315, 'Q3-2019' => 1440, 'Q4-2019' => 1485];

function wrapArrayAsIterator(array $array) {
    return new ArrayIterator($array);
}

$budgetIterator = wrapArrayAsIterator($budget);
$expenditureIterator = wrapArrayAsIterator($expenditure);

$combinedIterable = new MultipleIterator();
$combinedIterable->attachIterator($budgetIterator);
$combinedIterable->attachIterator($expenditureIterator);

foreach($combinedIterable as $keys => $values) {
    echo "{$keys[0]} - {$values[0]}, {$values[1]}", PHP_EOL;
}

The first point of note is that MultipleIterator works only with Iterators, not with every iterable: an array is iterable, but it isn’t an Iterator, so we need to wrap it in an ArrayIterator before we can attach it to the MultipleIterator. It’s an extra, slightly cumbersome step, and one which we don’t need if the values we want to loop over are already an Iterator, such as a Generator. Other Traversables that aren’t Iterators themselves (a DatePeriod, for example, or many database resultsets and framework Collections) need a similar wrapper: IteratorIterator.
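As a sketch of how that wrapping step might be handled generically, the hypothetical toIterator() helper below (not part of SPL) accepts any iterable and returns something attachIterator() will accept. It is then used to run a Generator, a DatePeriod and a plain array in parallel. It assumes PHP 7.1+ for the iterable type; the quarter-start dates are illustrative data, not from the original post:

```php
<?php
// Hypothetical helper (not part of SPL): adapt any iterable for attachIterator().
function toIterator(iterable $source): Iterator
{
    if (is_array($source)) {
        return new ArrayIterator($source);
    }
    if ($source instanceof Iterator) {
        return $source; // Generators are already Iterators, so no wrapping needed
    }
    return new IteratorIterator($source); // any other Traversable, e.g. DatePeriod
}

// A Generator producing the quarter labels lazily.
function quarterLabels(int $year): Generator
{
    for ($quarter = 1; $quarter <= 4; $quarter++) {
        yield "Q{$quarter}-{$year}";
    }
}

// DatePeriod is Traversable but not an Iterator, so attachIterator() rejects it
// unwrapped (a TypeError in PHP 8).
$quarterStarts = new DatePeriod(
    new DateTimeImmutable('2019-01-01'),
    new DateInterval('P3M'),
    3 // recurrences after the start date, so 4 dates in total
);

$combined = new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_ASSOC);
$combined->attachIterator(toIterator(quarterLabels(2019)), 'label');
$combined->attachIterator(toIterator($quarterStarts), 'start');
$combined->attachIterator(toIterator([1200, 1300, 1400, 1500]), 'budget');

$rows = [];
foreach ($combined as $values) {
    $start = $values['start']->format('Y-m-d');
    $rows[] = "{$values['label']} (from {$start}) - {$values['budget']}";
}
echo implode(PHP_EOL, $rows), PHP_EOL;
```

Note that a Generator can only be rewound before it has advanced past its first yield, so a MultipleIterator holding one can be looped over only once.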

By default, both $keys and $values in the foreach loop are arrays, holding one entry from each attached Iterator in turn, with simple numeric indexing. That isn’t great for the readability of our code, but it is easily fixed, because MultipleIterator allows us to define associative indexes:


$combinedIterable = new MultipleIterator(MultipleIterator::MIT_KEYS_ASSOC);
$combinedIterable->attachIterator($budgetIterator, 'budget');
$combinedIterable->attachIterator($expenditureIterator, 'expenditure');

foreach($combinedIterable as $keys => $values) {
    echo "{$keys['budget']} - {$values['budget']}, {$values['expenditure']}", PHP_EOL;
}

Note that it isn’t only the $values array that is associative, but that the $keys array is also associative when using the MIT_KEYS_ASSOC flag.
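To see the difference between the two key modes side by side, this sketch captures the $keys and $values arrays from the first iteration in each mode. firstIteration() is a hypothetical helper written only for this illustration, and the sample data is cut down to two quarters:

```php
<?php
$budgetIterator = new ArrayIterator(['Q1-2019' => 1200, 'Q2-2019' => 1300]);
$expenditureIterator = new ArrayIterator(['Q1-2019' => 1250, 'Q2-2019' => 1315]);

// Hypothetical helper: return [$keys, $values] from the first iteration only.
function firstIteration(MultipleIterator $iterator): array
{
    foreach ($iterator as $keys => $values) {
        return [$keys, $values];
    }
    return [[], []];
}

$numeric = new MultipleIterator(); // MIT_NEED_ALL | MIT_KEYS_NUMERIC
$numeric->attachIterator($budgetIterator);
$numeric->attachIterator($expenditureIterator);
[$numericKeys, $numericValues] = firstIteration($numeric);
// $numericKeys   is ['Q1-2019', 'Q1-2019'] (indexes 0 and 1)
// $numericValues is [1200, 1250]

$assoc = new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_ASSOC);
$assoc->attachIterator($budgetIterator, 'budget');
$assoc->attachIterator($expenditureIterator, 'expenditure');
[$assocKeys, $assocValues] = firstIteration($assoc);
// $assocKeys   is ['budget' => 'Q1-2019', 'expenditure' => 'Q1-2019']
// $assocValues is ['budget' => 1200, 'expenditure' => 1250]

var_export($assocKeys);
echo PHP_EOL;
```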

An alternative is to keep the default numeric keys and unpack $keys and $values with list() (or its short [] syntax), but then we need to know the order in which each Iterator was attached to the MultipleIterator:


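$combinedIterable = new MultipleIterator(); // numeric keys: 0 = budget, 1 = expenditure
$combinedIterable->attachIterator($budgetIterator);
$combinedIterable->attachIterator($expenditureIterator);

foreach($combinedIterable as $keys => $values) {
    [$date] = $keys;
    [$budget, $expenditure] = $values;
    echo "{$date} - {$budget}, {$expenditure}", PHP_EOL;
}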

Or, since in this case the date comes from the array keys, we can extract those keys into an array of their own and attach it to the MultipleIterator as another Iterator:


$dates = wrapArrayAsIterator(array_keys($budget));
$budgetIterator = wrapArrayAsIterator($budget);
$expenditureIterator = wrapArrayAsIterator($expenditure);

$combinedIterable = new MultipleIterator();
$combinedIterable->attachIterator($dates);
$combinedIterable->attachIterator($budgetIterator);
$combinedIterable->attachIterator($expenditureIterator);

foreach($combinedIterable as $keys => $values) {
    [$date, $budget, $expenditure] = $values;
    echo "{$date} - {$budget}, {$expenditure}", PHP_EOL;
}

Or we could combine MultipleIterator’s associative keys with extract(), which gives us much more readable code:


$combinedIterable = new MultipleIterator(MultipleIterator::MIT_KEYS_ASSOC);
$combinedIterable->attachIterator($dates, 'date');
$combinedIterable->attachIterator($budgetIterator, 'budget');
$combinedIterable->attachIterator($expenditureIterator, 'expenditure');

foreach($combinedIterable as $keys => $values) {
    extract($values);
    echo "{$date} - {$budget}, {$expenditure}", PHP_EOL;
}

Of course, not everybody likes the use of functions like extract() that create new variables: if we don’t know the keys in our array, then we don’t have control over the variables that will be created. In this case, though, we do control the array’s associative keys, so we’re also controlling the variables that extract() creates. Note that if we’re using an IDE like PhpStorm, we will have to declare those variables manually; but as we’ve defined the names ourselves, we can do this.
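If extract() still feels too implicit, one alternative (a swapped-in technique, not the approach used in this post) is keyed array destructuring, available since PHP 7.1. It unpacks the associative $values array into explicitly named variables that an IDE can see without extra annotations. A minimal sketch using the same quarterly figures:

```php
<?php
$combinedIterable = new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_ASSOC);
$combinedIterable->attachIterator(new ArrayIterator(['Q1-2019', 'Q2-2019', 'Q3-2019', 'Q4-2019']), 'date');
$combinedIterable->attachIterator(new ArrayIterator([1200, 1300, 1400, 1500]), 'budget');
$combinedIterable->attachIterator(new ArrayIterator([1250, 1315, 1440, 1485]), 'expenditure');

$lines = [];
foreach ($combinedIterable as $values) {
    // Keyed destructuring: only these three variables are created, and each
    // one is named in the source, unlike with extract().
    ['date' => $date, 'budget' => $budget, 'expenditure' => $expenditure] = $values;
    $lines[] = "{$date} - {$budget}, {$expenditure}";
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```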

And when we’re using associative keys with extract(), it doesn’t matter what order the Iterators are added to the MultipleIterator.
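The attachment order may not matter, but the order of entries within each Iterator still does, because MultipleIterator pairs entries by position rather than by key. The sketch below stores the expenditure figures deliberately out of order. pairByPosition() is a hypothetical helper written for this illustration, and ksort() only restores the alignment here because the quarter labels happen to sort into the same order as the budget keys:

```php
<?php
// Hypothetical helper for this illustration: pair two arrays by position.
function pairByPosition(array $left, array $right): array
{
    $iterator = new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_NUMERIC);
    $iterator->attachIterator(new ArrayIterator($left));
    $iterator->attachIterator(new ArrayIterator($right));

    $pairs = [];
    foreach ($iterator as $keys => $values) {
        // The keys are returned but never compared: pairing is purely positional.
        $pairs[] = [$keys[0], $keys[1], $values[0], $values[1]];
    }
    return $pairs;
}

$budget = ['Q1-2019' => 1200, 'Q2-2019' => 1300, 'Q3-2019' => 1400, 'Q4-2019' => 1500];
// Same figures as before, but stored in a different order.
$expenditure = ['Q2-2019' => 1315, 'Q1-2019' => 1250, 'Q4-2019' => 1485, 'Q3-2019' => 1440];

$misaligned = pairByPosition($budget, $expenditure); // Q1 budget ends up paired with Q2 expenditure

ksort($expenditure); // reorder so the keys line up with $budget
$aligned = pairByPosition($budget, $expenditure);

foreach ($aligned as [$budgetKey, $expenditureKey, $planned, $spent]) {
    echo "{$budgetKey}/{$expenditureKey} - {$planned}, {$spent}", PHP_EOL;
}
```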


But what of missing values? One of the problems with the more usual key-matching for() or foreach() approaches to parallel iteration is that we need additional checks in the code to handle values missing from any of the additional arrays. Suppose that we don’t yet have the expenditure figures for the last quarter of 2019:


$budget = ['Q1-2019' => 1200, 'Q2-2019' => 1300, 'Q3-2019' => 1400, 'Q4-2019' => 1500];
$expenditure = ['Q1-2019' => 1250, 'Q2-2019' => 1315, 'Q3-2019' => 1440];

MultipleIterator has two “modes”: MIT_NEED_ALL and MIT_NEED_ANY. When MIT_NEED_ALL is used, the foreach loop terminates as soon as any of the Iterators has no more values to return, even if other Iterators still have values. So even though $budget has four values, $expenditure has only three, and foreach will only loop three times:


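// The ArrayIterators created earlier still wrap the old four-quarter arrays, so re-wrap the new data
$dates = wrapArrayAsIterator(array_keys($budget));
$budgetIterator = wrapArrayAsIterator($budget);
$expenditureIterator = wrapArrayAsIterator($expenditure);

$combinedIterable = new MultipleIterator(MultipleIterator::MIT_NEED_ALL|MultipleIterator::MIT_KEYS_ASSOC);
$combinedIterable->attachIterator($dates, 'date');
$combinedIterable->attachIterator($budgetIterator, 'budget');
$combinedIterable->attachIterator($expenditureIterator, 'expenditure');

foreach($combinedIterable as $keys => $values) {
    extract($values);
    echo "{$date} - {$budget}, {$expenditure}", PHP_EOL;
}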

Q1-2019 - 1200, 1250
Q2-2019 - 1300, 1315
Q3-2019 - 1400, 1440

But when MIT_NEED_ANY is used, the loop will only terminate when all the values for the “longest” Iterator have been returned. For those Iterators that have completed their own iterations, a null value (and key) will be returned.


$combinedIterable = new MultipleIterator(MultipleIterator::MIT_NEED_ANY|MultipleIterator::MIT_KEYS_ASSOC);
$combinedIterable->attachIterator($dates, 'date');
$combinedIterable->attachIterator($budgetIterator, 'budget');
$combinedIterable->attachIterator($expenditureIterator, 'expenditure');

foreach($combinedIterable as $keys => $values) {
    extract($values);
    echo "{$date} - {$budget}, {$expenditure}", PHP_EOL;
}

Q1-2019 - 1200, 1250
Q2-2019 - 1300, 1315
Q3-2019 - 1400, 1440
Q4-2019 - 1500, 

We may still need to write some additional code inside the loop to handle these cases (depending on exactly what we need to do in those circumstances) but we’re testing against an explicit null rather than the existence of an array entry, so it should allow our code to be simpler.
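As a sketch of what that null check might look like, the loop below reports a variance for complete quarters and marks the quarter with no expenditure figure as pending. The 'pending' label and the variance calculation are illustrative choices, not part of the original example:

```php
<?php
$combinedIterable = new MultipleIterator(MultipleIterator::MIT_NEED_ANY | MultipleIterator::MIT_KEYS_ASSOC);
$combinedIterable->attachIterator(new ArrayIterator(['Q1-2019', 'Q2-2019', 'Q3-2019', 'Q4-2019']), 'date');
$combinedIterable->attachIterator(new ArrayIterator([1200, 1300, 1400, 1500]), 'budget');
$combinedIterable->attachIterator(new ArrayIterator([1250, 1315, 1440]), 'expenditure'); // no Q4 figure yet

$lines = [];
foreach ($combinedIterable as $values) {
    extract($values);
    // An exhausted Iterator contributes null, so a single strict check is enough.
    if ($expenditure === null) {
        $lines[] = "{$date} - {$budget}, pending";
        continue;
    }
    $variance = $expenditure - $budget;
    $lines[] = "{$date} - {$budget}, {$expenditure} ({$variance})";
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```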

Here’s where we have a discrepancy/ambiguity (or at least a GOTCHA!) in the documentation though! The documentation for the MultipleIterator Constructor specifies that:


Defaults to MultipleIterator::MIT_NEED_ALL|MultipleIterator::MIT_KEYS_NUMERIC. 

So the documentation indicates that if we don’t provide any flags (e.g. $combinedIterable = new MultipleIterator();), then behaviour defaults to MIT_NEED_ALL and MIT_KEYS_NUMERIC, and that is correct. However, if we provide a value for the MIT_KEYS_* flag but none for the MIT_NEED_* flag (e.g. $combinedIterable = new MultipleIterator(MultipleIterator::MIT_KEYS_ASSOC);), then behaviour changes to MIT_NEED_ANY rather than retaining the MIT_NEED_ALL default. Conversely, if we specify only a MIT_NEED_* flag, the default MIT_KEYS_NUMERIC behaviour still applies. The reason is that MIT_NEED_ANY and MIT_KEYS_NUMERIC both have the value 0: whatever flags we pass replace the default bitmask entirely, and any setting we leave out falls back to its zero-valued option. So it is always best to set both flags explicitly if we want to change either behaviour.
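A quick way to confirm this on your own PHP version is to inspect the flags with MultipleIterator::getFlags(). The $describe closure below is a hypothetical helper that simply decodes the bitmask into the constant names:

```php
<?php
// The constant values behind the behaviour described above.
$constants = [
    'MIT_NEED_ANY'     => MultipleIterator::MIT_NEED_ANY,     // 0
    'MIT_NEED_ALL'     => MultipleIterator::MIT_NEED_ALL,     // 1
    'MIT_KEYS_NUMERIC' => MultipleIterator::MIT_KEYS_NUMERIC, // 0
    'MIT_KEYS_ASSOC'   => MultipleIterator::MIT_KEYS_ASSOC,   // 2
];

// Hypothetical helper: translate an iterator's flags back into constant names.
$describe = function (MultipleIterator $iterator): string {
    $flags = $iterator->getFlags();
    $need = ($flags & MultipleIterator::MIT_NEED_ALL) ? 'MIT_NEED_ALL' : 'MIT_NEED_ANY';
    $keys = ($flags & MultipleIterator::MIT_KEYS_ASSOC) ? 'MIT_KEYS_ASSOC' : 'MIT_KEYS_NUMERIC';
    return "{$need} | {$keys}";
};

$results = [
    'default'       => $describe(new MultipleIterator()),
    'assoc only'    => $describe(new MultipleIterator(MultipleIterator::MIT_KEYS_ASSOC)),
    'need all only' => $describe(new MultipleIterator(MultipleIterator::MIT_NEED_ALL)),
    'both explicit' => $describe(new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_ASSOC)),
];

foreach ($results as $label => $description) {
    echo str_pad($label, 15), $description, PHP_EOL;
}
```

Only the "assoc only" case silently switches to MIT_NEED_ANY, which is exactly the gotcha described above.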


Hopefully, this short post about MultipleIterator will persuade some developers that the SPL Iterators are worth looking at for use in their own code. MultipleIterator isn’t perfect, but in the right circumstances it can be a powerful tool for looping over several Traversables (or arrays) in parallel, and it can help reduce the complexity of that code.

Over the next few months, I’ll make a few more posts about some of SPL’s other Iterators, and how they can be used.

Happy New Year 2020!
