A brief history

In search for a replication tool in my profession as a developer for Delphi and Sybase systems, I got in touch with Sybase SQL Remote. Intrigued by the power and the lack of complexity of the concept, I searched for a similar tool for my preferred Open Source database, Firebird. I ended up with Fibre , a utility written in Java. After a close examination and a short thought of contributing to this project, I discovered some faults (which could be easily solved), but worse, I found some design flaws which conflicted with my own design goals.

Finally, I decided to write my own tool. Given the experience I had from the Sybase product SQL Remote and the Fibre tool, I wanted a tool which would combine the strength of the two. Furthermore, the weaknesses in both products should be resolved, see comparison sheet.

Lessons from SQL Remote

SQL Remote is a Sybase product, designed for replication between different Sybase database servers, ASE and ASA. I was attracted by it's simplicity-in-concept. In short, this is how SQL Remote works: a change on the master database is recorded in a log table, an agent converts the change into a file message. An agent from the remote database picks up this file and executes SQL commands against the remote database, thus replicating the data change.

There we're some things I didn't like about SQL Remote.

No transparent processing. The process it self was absolutely not transparent. In no way you could look into the message it self, to discover what information is sent over. Normally, you have no interest in these things, but when anything goes wrong (bugs?) or just for studying practices, you definitely want to see the contents of the file. But, as SQL Remote is a closed source product, the inner working is kept as a secret. Because the message format is not open, things like data transformation by modifying the message file is impossible. Also, using SQL Remote solely as a message producer or consumer is impossible, thus reducing the usability greatly.
Difficult to implement partitioning. When partitioning is involved, it is sheer impossible to configure this tool. That's because all tables which depend on the partitioned table have to be manually configured, which is asking for trouble. That's why I wanted this to be automatically configured in Frext.
Difficult roll-out of extracts. SQL Remote (only the ASA version) is capable of doing an extract, but executing this on the remote virtually requires local access and/or DBA-involvement. This poses a problem when remotes are located elsewhere, i.e. somewhere on the globe.

Still, it's a very simple, but also very effective replication method. This ended up relying on the concept for Frext by taking the good aspects from SQL Remote and expanding them with my own idea's. Re-inventing the wheel is not good, re-designing might be.

Lessons from Fibre

(... and why I didn't participate in the Fibre project)

Fibre looked very promising: a replication tool for Firebird, written in Java so it is implicit cross-platform. It actually cost me an hour or so to get two databases replicate data between each other. I discovered one bug, it had to do with default values for dates. After downloading the code from Sourceforge, I was quickly able to repair this bug. This was quite astonishing, with zero Java experience!

After a quick start, I began a thorough investigation and troubles arose.

Dependency problem by grouping operations. First of all, I discovered that Fibre was grouping types of operations. So, on the remote database, first all inserts were executed, then all updates and at last all deletes. This looks as a very efficient method of replication, but on second thought, there is a huge problem with this: it cannot handle table dependencies automatically. For that to work, each table has to have a hierarchy index: the table with no dependencies, normally containing static data, comes on top. The table with only dependencies to the lowest table, comes next, etc., etc. Fibre does the inserts and updates in this order, deletes are executed in the opposite way. One could say that the dependency problem is solved. It sure is, but to manually define a correct hierarchy index for each table is requiring in depth knowledge of the application. In real world applications with tens or hundreds of tables, it's sheer impossible to do such an ordering flawlessly. What's more, an error made in the schema will not be visible immediately. Only in very specific situations, for example an insert on a static table and a second insert on a table with depends on the previous insert, where the static table is defined after the dependent table, the error pops up with probably a foreign key violation. Also, when it occurs, it's very difficult to retrack the error on a misconfiguration of the table order. According to Murphy's Law, this problem will certainly occur.
No missing message mechanism. Although Fibre does have a registration which message is to expected on the remote, there is no mechanism to resend a message when a message is missing or malformed. Real world applications cannot expect that messages are always transported correctly and in the right order, so a robust replication solution must provide resend capability.
No transaction grouping. Because Fibre groups operations per type (inserts together, updates together, etc.), all transaction information is lost. When one or more operations are executed on the master database in one transaction, for example in a balance application where both sides always have to be equal, it's at least recommended that these operations are grouped together on the remote database also in one transaction: when one operation fails, the other should be roll backed. This is a severe shortcoming of Fibre, thereby not suitable for databases with strict constraints, i.e. real world applications.
Required modification of production table structure. Fibre depends on it's own unique key field per table. This means that each table has to be expanded with an extra field. In (complex) production systems, this is not a desirable situation. A replication tool should be as non-intrusive as possible, adding a few triggers like Frext does is acceptable, table expansions are certainly not.
No builtin data extract and partitioning. Fibre does not have a built in data extract so it depends on external processing. Also, partitioning is not available, at least according from the documentation. I have serious doubts about the correct architecture of partitioning within Fibre, partitioning is surely the most complex part inside replication.

By now it should be clear why I regard Fibre not suitable as a replication tool for real world applications. It also means that I have to build one on my own, but that's not a pity, on the contrary, it's fun!

See: