Pairing functions

Version	Date	Notes	By
0.1	2025-03-05	Initial release	ROB

Introduction

This chapter introduces pairing functions as a mathematical concept and explains how they are used inside WeMigrane™ to establish relations between source and target tables in unusual circumstances, or to join multiple source tables into one target table.

Pairing functions

A pairing function combines two numbers (or more, by applying it repeatedly) into a single output number that is unique to the values entered and to their order. In other words, pairing 1 with 2 gives a different result than pairing 2 with 1: pair(1, 2) !== pair(2, 1).

WeMigrane can use the Cantor, Piker and Optimus pairing functions. All three were implemented to find out which of them makes the most efficient use of the number space.

The Cantor function is the one commonly used, since testing showed no significant difference in efficiency between the three.
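For reference, the Cantor pairing function maps two non-negative integers a and b to (a + b)(a + b + 1) / 2 + b, and the mapping can be inverted. The sketch below only illustrates that formula. The names cantorPair and cantorUnpair are hypothetical; this is not WeMigrane's own cantorPairingFunction, whose implementation is not shown in this chapter.

```php
<?php
// Illustrative sketch only: cantorPair / cantorUnpair are hypothetical names.
// No overflow handling: very large inputs exceed PHP's integer range.

/** Cantor pairing of two non-negative integers. */
function cantorPair(int $a, int $b): int
{
    $sum = $a + $b;
    return intdiv($sum * ($sum + 1), 2) + $b;
}

/** Inverse of cantorPair: returns [$a, $b]. */
function cantorUnpair(int $z): array
{
    // Estimate w = a + b. sqrt() works on floats, so correct any rounding error afterwards.
    $w = (int) floor((sqrt(8 * $z + 1) - 1) / 2);
    while (intdiv($w * ($w + 1), 2) > $z) {
        $w--;
    }
    while (intdiv(($w + 1) * ($w + 2), 2) <= $z) {
        $w++;
    }
    $b = $z - intdiv($w * ($w + 1), 2);
    return [$w - $b, $b];
}

var_dump(cantorPair(1, 2) !== cantorPair(2, 1));     // order matters
var_dump(cantorUnpair(cantorPair(1, 2)) === [1, 2]); // pairing is reversible
```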

Joining two or more tables into one

A pairing function can be used to join two or more tables into one: the id of each source row is paired with the number of the table it comes from (1 to n), which generates a unique id for the target table. The id thus generated can be reversed back into the original id plus table number.

For example:

/**
 * Parameter 1 is the id of the row in the source table
 * Parameter 2 designates that this is table 1 in the fusion process
 * - So long as the parameters are consistently used, this will generate unique ids for the destination table.
 */
cantorPairingFunction($item['id'], 1);

This technique is used to join into one table what used to be recorded in two tables, but the ids generated tend to become very large very quickly. This is a side effect of using a pairing function. It can either be accepted, which is the safe option because later migrations can unpair the target id to obtain the original id, or the ids can be packed, which compresses them to their lowest possible value but destroys the relation with the source tables.
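The following sketch shows the idea with two source tables that reuse the same ids. It is an illustration under assumptions: the function names and the sample rows are hypothetical, and the Cantor formula is written out inline rather than calling WeMigrane's cantorPairingFunction.

```php
<?php
// Illustrative sketch only: function names and sample rows are hypothetical.

/** Cantor-pairs a source row id with the number of the table it comes from. */
function pairIdWithTable(int $id, int $table): int
{
    $sum = $id + $table;
    return intdiv($sum * ($sum + 1), 2) + $table;
}

/** Recovers [source id, table number] from a target id. */
function unpairTargetId(int $targetId): array
{
    $w = (int) floor((sqrt(8 * $targetId + 1) - 1) / 2);
    while (intdiv($w * ($w + 1), 2) > $targetId) {
        $w--;
    }
    while (intdiv(($w + 1) * ($w + 2), 2) <= $targetId) {
        $w++;
    }
    $table = $targetId - intdiv($w * ($w + 1), 2);
    return [$w - $table, $table];
}

// Two source tables that reuse the same ids.
$sources = [
    1 => [['id' => 1], ['id' => 1000]],
    2 => [['id' => 1], ['id' => 1000]],
];

$targetIds = [];
foreach ($sources as $tableNumber => $rows) {
    foreach ($rows as $item) {
        $targetIds[] = pairIdWithTable($item['id'], $tableNumber);
    }
}

print_r($targetIds);                    // four distinct ids, no collisions
print_r(unpairTargetId($targetIds[3])); // back to source id 1000, table 2
```

In this sketch a source id of 1000 already produces a target id above 500000, which illustrates how fast the generated ids grow.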

Non-standard usage of pairing functions

Documents module - Documentation tree

The document tree is migrated from v9 (t01w016) to t01_documentation_tree_document. Each tree record in v10 corresponds to multiple rows in v9 (the logic was very different). All the ids for a tree element are added to an array, padded with 0's up to the ARRAY_MAX_SIZE defined in the DocumentationKeyHelper class, paired together and then packed.

In essence, instead of fusing two tables into one we are fusing multiple rows (which in v9 represented only one branch of the tree) into one row for v10.
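A minimal sketch of pairing a padded array of ids follows. Everything in it is an assumption made for illustration: the constant value 4, the function names, and the left-to-right fold order are hypothetical, and the actual DocumentationKeyHelper may combine the ids differently.

```php
<?php
// Illustrative sketch only: constant value, names and fold order are assumptions.

// Hypothetical size; the real ARRAY_MAX_SIZE is defined in DocumentationKeyHelper.
const SKETCH_ARRAY_MAX_SIZE = 4;

/** Cantor pairing of two non-negative integers. */
function cantorPairSketch(int $a, int $b): int
{
    $sum = $a + $b;
    return intdiv($sum * ($sum + 1), 2) + $b;
}

/** Pads the ids with 0's to a fixed length, then pairs them from left to right. */
function pairBranchIds(array $ids): int
{
    $padded = array_pad($ids, SKETCH_ARRAY_MAX_SIZE, 0);
    $first = array_shift($padded);
    // Each step pairs the running result with the next id in the array.
    return array_reduce($padded, 'cantorPairSketch', $first);
}

var_dump(pairBranchIds([3, 5]));                           // one key for the whole branch
var_dump(pairBranchIds([3, 5]) !== pairBranchIds([5, 3])); // order of the ids matters
```

Because every pairing step roughly squares the running value, the resulting keys grow extremely fast, which is consistent with the keys being packed afterwards.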

Packing

Packing simply "pushes" the generated ids together to the lowest available value to prevent the rapid rise in id values. There is a helper class that can be used to perform this packing.

Simply instantiate a packer (one is required per target table) and give it a name. The name identifies the relation between unpacked and packed ids for the duration of the migration, which is useful for foreign keys that use the packed id.

Warning: There is not yet a way to persist the packed / unpacked equivalence table to disk for future use, so, when the migration finishes, the information will be gone. Consider this information when planning a migration.

$packer = new PairingPackerHelper(
    'equipments', // name of this packer instance (keep consistent to use in foreign keys)
    0, // increment start at (0 means 1st id will be 1, 1 will be 2, etc ...)
    true // read only flag (false if primary table | true if related table / foreign key)
);

/**
 * ... further on, probably within a column transformer ...
 */

// id + unifier (where unifier is a different integer value per table [1, 2, ..., N])
// if read only, will return null for value not originally packed
$packer->getPacked(cantorPairingFunction($item['id'], $item['unifier']));
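To make the packing behaviour concrete, here is a minimal in-memory sketch of a packer. It is not the PairingPackerHelper class: SketchPacker and its internals are hypothetical, and the assumption that instances sharing a name share one equivalence table is inferred from the description above.

```php
<?php
// Illustrative sketch only: SketchPacker is hypothetical, not PairingPackerHelper.

final class SketchPacker
{
    /** @var array<string, array<int, int>> unpacked => packed ids, per packer name */
    private static array $maps = [];

    private string $name;
    private int $start;
    private bool $readOnly;

    public function __construct(string $name, int $start = 0, bool $readOnly = false)
    {
        $this->name = $name;
        $this->start = $start;
        $this->readOnly = $readOnly;
        self::$maps[$name] = self::$maps[$name] ?? [];
    }

    /** Returns the packed id, assigning the next free value unless read only. */
    public function getPacked(int $unpackedId): ?int
    {
        if (isset(self::$maps[$this->name][$unpackedId])) {
            return self::$maps[$this->name][$unpackedId];
        }
        if ($this->readOnly) {
            return null; // value was never packed by the primary table
        }
        // With start 0 the first packed id is 1, the second is 2, and so on.
        $next = $this->start + count(self::$maps[$this->name]) + 1;
        return self::$maps[$this->name][$unpackedId] = $next;
    }
}

$primary = new SketchPacker('equipments', 0, false);   // primary table
$foreignKey = new SketchPacker('equipments', 0, true); // related table, read only

var_dump($primary->getPacked(502505)); // large paired id mapped to a small packed id
var_dump($foreignKey->getPacked(502505)); // same packed id, looked up by name
var_dump($foreignKey->getPacked(999)); // never packed, so null
```

The equivalence table in this sketch lives only in memory, so it disappears when the script ends, which mirrors the warning above.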

Summary

In short, pairing functions can be used to generate unique ids from combinations of ids / numbers. Optionally, these ids can be packed (made to use the number space efficiently) before being stored in the database.