A brief history
In search of a replication tool
In my work as a developer of Delphi and Sybase systems, I came into
contact with Sybase SQL Remote. Intrigued by the power and the
conceptual simplicity of the tool, I searched for something similar for
my preferred Open Source database, Firebird. I ended up with Fibre, a
utility written in Java.
After a close examination, and after briefly considering contributing
to this project, I discovered some faults (which could easily be
solved) but, worse, some design flaws that conflicted with my own
design goals. Finally, I decided to write my own tool. Given my
experience with the Sybase product SQL Remote and with the Fibre tool,
I wanted a tool that would combine the strengths of the two.
Furthermore, the weaknesses of both products should be resolved; see
the comparison sheet.
Lessons from SQL Remote
SQL Remote is a Sybase product, designed for replication
between the different Sybase database servers, ASE and ASA. I was
attracted by its simplicity of concept. In short, this is how SQL
Remote works: a change on the master database is recorded in a log
table, and an agent converts the change into a file message. An agent
on the remote database picks up this file and executes SQL commands
against the remote database, thus replicating the data change.
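The flow just described can be sketched in a few lines. This is a minimal simulation, not SQL Remote's actual mechanism: the function names, the JSON message format, and the in-memory log are all my own illustrative assumptions.

```python
import json

change_log = []  # stands in for the log table on the master database

def record_change(table, operation, values):
    """Trigger-like hook: record every data change in the log."""
    change_log.append({"table": table, "op": operation, "values": values})

def build_message(log):
    """Master-side agent: turn pending log entries into a file message."""
    return json.dumps(log)

def apply_message(message, execute):
    """Remote-side agent: replay each logged change as an SQL command."""
    for entry in json.loads(message):
        if entry["op"] == "INSERT":
            cols = ", ".join(entry["values"])
            marks = ", ".join("?" for _ in entry["values"])
            execute(f"INSERT INTO {entry['table']} ({cols}) VALUES ({marks})",
                    list(entry["values"].values()))

# A change on the master ends up as an SQL command on the remote:
record_change("CUSTOMER", "INSERT", {"ID": 1, "NAME": "Acme"})
executed = []
apply_message(build_message(change_log),
              lambda sql, params: executed.append((sql, params)))
```

The point of the sketch is the decoupling: the message file in the middle is an ordinary, inspectable artifact, which is exactly what SQL Remote's closed format prevents.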
There were some things I didn't like about SQL Remote.
- No transparent processing. The process itself was absolutely not
transparent: there was no way to look inside a message to discover
what information is sent over. Normally you have no interest in these
things, but when anything goes wrong (bugs?), or simply to study how
it works, you definitely want to see the contents of the file. But as
SQL Remote is a closed-source product, its inner workings are kept
secret. Because the message format is not open, things like data
transformation by modifying the message file are impossible. Also,
using SQL Remote solely as a message producer or consumer is
impossible, which greatly reduces its usability.
- Difficult to implement partitioning. When partitioning is involved,
the tool is nearly impossible to configure, because every table that
depends on the partitioned table has to be configured manually, which
is asking for trouble. That's why I wanted this to be configured
automatically in Frext.
- Difficult roll-out of extracts. SQL Remote (only the ASA version) is
capable of producing an extract, but executing it on the remote
virtually requires local access and/or DBA involvement. This poses a
problem when remotes are located elsewhere, i.e. anywhere on the
globe.
Still, it is a very simple but very effective replication method. I
ended up basing the concept of Frext on it, taking the good aspects of
SQL Remote and expanding them with my own ideas. Re-inventing the
wheel is not good; re-designing might be.
Lessons from Fibre
(... and why I didn't participate in the Fibre project)
Fibre looked very promising: a replication tool for Firebird, written
in Java and therefore implicitly cross-platform. It took me only an
hour or so to get two databases replicating data with each other. I
discovered one bug, which had to do with default values for dates.
After downloading the code from Sourceforge, I was quickly able to
repair it. Quite astonishing, with zero Java experience!
After this quick start, I began a thorough investigation, and troubles
arose.
- Dependency problems caused by grouping operations. First of all, I
discovered that Fibre groups operations by type: on the remote
database, first all inserts are executed, then all updates, and
finally all deletes. This looks like a very efficient method of
replication, but on second thought there is a huge problem with it: it
cannot handle table dependencies automatically. For that to work, each
table has to be given a hierarchy index: the table with no
dependencies, normally containing static data, comes on top; a table
that depends only on that table comes next, and so on. Fibre performs
the inserts and updates in this order and the deletes in the opposite
order. One could say that the dependency problem is thereby solved. It
sure is, but manually defining a correct hierarchy index for each
table requires in-depth knowledge of the application. In real-world
applications with tens or hundreds of tables, it is nearly impossible
to do such an ordering flawlessly. What's more, an error made in the
ordering will not be visible immediately. Only in very specific
situations, for example an insert on a static table followed by an
insert on a table that depends on it, where the static table is
ordered after the dependent table, does the error pop up, probably as
a foreign key violation. And when it does occur, it is very difficult
to trace the error back to a misconfiguration of the table order.
According to Murphy's Law, this problem will certainly occur.
- No missing-message mechanism. Although Fibre does keep track of
which message is expected next on the remote, there is no mechanism to
resend a message when one is missing or malformed. Real-world
applications cannot assume that messages are always transported
correctly and in the right order, so a robust replication solution
must provide resend capability.
- No transaction grouping. Because Fibre groups operations by type
(inserts together, updates together, etc.), all transaction
information is lost. When several operations are executed on the
master database in one transaction, for example in a balance
application where both sides always have to be equal, these operations
should be grouped into one transaction on the remote database as well:
when one operation fails, the others should be rolled back. This is a
severe shortcoming of Fibre, making it unsuitable for databases with
strict constraints, i.e. real-world applications.
- Required modification of the production table structure. Fibre
depends on its own unique key field per table, which means each table
has to be expanded with an extra field. In (complex) production
systems this is not a desirable situation. A replication tool should
be as non-intrusive as possible: adding a few triggers, as Frext does,
is acceptable; table expansions certainly are not.
- No built-in data extract or partitioning. Fibre does not have a
built-in data extract, so it depends on external processing. Also,
partitioning is not available, at least according to the
documentation. I have serious doubts whether Fibre's architecture
could support partitioning correctly; partitioning is surely the most
complex part of replication.
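The hierarchy index from the first point above need not be maintained by hand: it can be derived automatically from foreign-key declarations, which is the kind of automatic configuration I wanted for Frext. A minimal sketch, using an invented example schema:

```python
from graphlib import TopologicalSorter

# Invented schema: each table maps to the tables it references via
# foreign keys (its "parents" in the hierarchy).
foreign_keys = {
    "ORDER_LINE": {"ORDERS", "PRODUCT"},
    "ORDERS": {"CUSTOMER"},
    "PRODUCT": set(),
    "CUSTOMER": set(),
}

# Referenced tables sort before referencing ones, so inserts and
# updates can be applied in this order; deletes run in the opposite
# order, exactly as described above.
insert_order = list(TopologicalSorter(foreign_keys).static_order())
delete_order = list(reversed(insert_order))
```

A real implementation would read the foreign keys from the database's system tables instead of a hard-coded dictionary, so an error in the ordering simply cannot be made.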
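The missing-message mechanism from the second point boils down to numbering messages consecutively and letting the remote ask for anything it has not seen. A sketch of the detection side (the function name is my own):

```python
def detect_missing(expected_seq, received_seqs):
    """Return the sequence numbers the remote should ask the master to
    resend: every gap between the next expected message and the highest
    message received so far."""
    if not received_seqs:
        return []
    seen = set(received_seqs)
    return [s for s in range(expected_seq, max(seen) + 1) if s not in seen]
```

For example, if the remote expects message 5 and receives 5, 6, 8 and 9, it should request a resend of message 7 before applying 8 and 9.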
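The balance example from the transaction-grouping point can be made concrete: changes must be replayed on the remote per original transaction, so that one failure rolls back the whole group. A sketch using SQLite for brevity; the table and statements are illustrative, not Frext's:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE balance (side TEXT PRIMARY KEY, amount INTEGER)")
con.execute("INSERT INTO balance VALUES ('debit', 0), ('credit', 0)")
con.commit()

def apply_transaction(con, statements):
    """Replay one master transaction atomically on the remote:
    commit if every statement succeeds, roll back everything otherwise."""
    try:
        with con:  # the connection context manager commits or rolls back
            for sql, params in statements:
                con.execute(sql, params)
        return True
    except sqlite3.Error:
        return False

# Both sides of the balance change together...
ok = apply_transaction(con, [
    ("UPDATE balance SET amount = amount + ? WHERE side = 'debit'", (100,)),
    ("UPDATE balance SET amount = amount + ? WHERE side = 'credit'", (100,)),
])

# ...or not at all: the second statement violates the primary key,
# so the first update is rolled back as well.
failed = apply_transaction(con, [
    ("UPDATE balance SET amount = amount + ? WHERE side = 'debit'", (50,)),
    ("INSERT INTO balance VALUES ('debit', 0)", ()),
])
```

Replaying per transaction preserves exactly the constraint the master enforced; grouping by operation type, as Fibre does, cannot.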
By now it should be clear why I do not regard Fibre as suitable as a
replication tool for real-world applications. It also means that I had
to build one of my own, but that is no pity; on the contrary, it's
fun!