Design goals
The ultimate goal of Frext is to build the best-of-breed sync tool,
combining the strengths of existing products while eliminating their
imperfections, most of which are recorded in the comparison sheet.
Frext will be Open Source under the GPL license and cross-platform,
fully transparent, versatile, and yet easy to install, configure and
maintain.
(* Not implemented yet, ** Partially implemented)
General
- Cross-platform. Because the Firebird database server runs on many
platforms (Linux, Windows, etc.), Frext cannot be limited to one
platform or a small selection of them. Choosing Java as the development
language means that Frext can run wherever Java runs. As a consequence,
Frext runs on every platform where Firebird runs, provided a Java
runtime is available there.
- Open Source. Because Firebird is Open Source, it is only natural that
Frext should be Open Source too. There are some closed source
replication tools (e.g. IBReplicator), but it does not feel natural to
use a closed source replication utility on top of an Open Source
database server. Releasing Frext as Open Source under the GPL license
enhances the capabilities of Firebird and thereby contributes to the
success of Firebird.
- Thin engine. As much work as possible should be done within the
database itself. Coding as close to the data as possible makes sense,
because such code does not carry the overhead of classic client
programming (say, in Delphi or Java) and is therefore clean and
straightforward.
- Multi purpose engine. The client has multiple functions and serves as
the central point for replication processes. It is the engine that
extracts messages from one database and injects them into another
database. It is also the configurator for all settings, for example the
replication schema. Installation of the Frext system tables and adding
triggers to existing tables must be done from within the engine.
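The multi purpose engine goal, together with the 'Install' option mentioned under Ease of installation, suggests a single entry point whose role is chosen on the command line. This is a minimal sketch under that assumption; apart from "Install", the option names (`CONFIGURE`, `RUN`) and the class `FrextLauncher` are hypothetical and not Frext's actual interface.

```java
import java.util.Locale;

/** Hypothetical entry point: one engine whose role is chosen by a command-line option. */
class FrextLauncher {
    enum Mode { INSTALL, CONFIGURE, RUN }

    /** No option means normal replication; an unknown option throws IllegalArgumentException. */
    static Mode parse(String[] args) {
        if (args.length == 0) {
            return Mode.RUN;
        }
        return Mode.valueOf(args[0].toUpperCase(Locale.ROOT));
    }

    static String describe(Mode mode) {
        switch (mode) {
            case INSTALL:
                return "create Frext system tables and add triggers to replicated tables";
            case CONFIGURE:
                return "edit settings such as the replication schema";
            default:
                return "extract outgoing messages and inject incoming ones";
        }
    }

    public static void main(String[] args) {
        Mode mode = parse(args);
        System.out.println(mode + ": " + describe(mode));
    }
}
```

Keeping all roles in one executable means a remote site needs only one program to install, configure and run replication.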
Architecture
- Asynchronous processing. Asynchronous replication is imperative,
because databases within the replication model must operate separately
and must not have to wait for each other to complete an operation. This
is essential for keeping applications available while disconnected, a
situation common in mobile applications. Frext implements concepts
derived from asynchronous replication, such as store-and-forward,
send-and-forget and guaranteed delivery.
- No physical contact needed. While synchronizing, databases must not
have physical contact (a direct connection) with each other. Changes
must be propagated through messaging, which implicitly provides
disconnected or asynchronous replication. Physical contact is often very
difficult to achieve (firewalls) and also poses a security risk, because
the database is directly exposed to the open network, e.g. the internet.
- File based messaging. Messages must be transferred through files,
which are simple, extendable and flexible. Files are the most
straightforward storage and transport mechanism: computer systems can
set up replication simply by using file sharing. Because file based
messaging is very easy to understand and handle, it is also very easy to
add other transport mechanisms, like email, FTP, SSH, etc. These can be
implemented at the OS level by scripting and are as such not part of
Frext. As a consequence, the transport mechanism is completely
transparent from Frext's point of view, so multiple transport channels
can be active at the same time. Imagine a master database that receives
messages from the first remote through file sharing, from the second one
by email, and from the third one through FTP.
- Transaction based. Replication must be based on transaction log
processing. Transactions occur in the database as a direct result of
data modifications. This means that only data modifications are
replicated; only differences are taken into account. Databases will
definitely not be synchronized by direct table comparison, because that
would require reading in complete tables and comparing them locally,
causing both network and performance problems. Ideally, the replication
process would be based on the output of a log reader that reads an
existing transaction log; that is how products like Sybase SQL Remote
work. Unfortunately, Firebird does not have a transaction log, so we
have to build one ourselves, but this has no impact on the production
table structure.
- Transaction grouping. All operations within one transaction must be
grouped and also executed within one transaction on the remote: either
all operations succeed, or none do.
- Message sequencer. Operations, grouped by transaction, must be
executed on the receiver in the same sequence as they were produced on
the sender; this ensures that foreign key violations cannot occur.
Whenever messages are missing, the replication system should request a
resend (missing message mechanism) and, in the meantime, temporarily
stop replication.
- Transparent processing. The tool must be as transparent as possible.
Every aspect and every stage of messaging must be traceable and
reproducible. First of all, this makes debugging and understanding the
process very easy. Transport files must be in XML format, so it is
immediately clear what kind of data will be replicated. All transactions
must be traceable and re-doable. Log data of executed operations must be
replicated back to the sender to provide a central point of system
management.
- Hierarchical replication model. Any database that is part of the
replication model is called a node. A node only exchanges data with its
parent and with its own descendants, of which it can have one or more.
There are no other connections between nodes, i.e. no peer-to-peer
interaction. As a result, every model always has a single top node.
However, there can be multiple levels, i.e. more than two.
- Heterogeneous database servers. Frext must be able to function as a
front end (message consumer) or as a back end (message producer); the
other end does not have to be an instance of Frext/Firebird. Thus, Frext
acts as a connector between different, heterogeneous servers or
applications. Purpose: connect different information systems with each
other by providing a common interface.
- Provide data transformation. It should be easy to change (enhance)
data as it is replicated from the sender to the receiver. This can be
implemented in two ways. The first is to change the produced message
files; this is possible because messages are in a plain, open,
transparent XML format, so you can use any XML tool you prefer. The
other way is to use an intermediate database where changes are made by
triggers on that database. The latter option is cleaner, in that only
standard Frext functions are used. Because both implementations are
outside Frext (but facilitated by Frext through standard operation),
data transformation is an implementation issue.
- **Data partitioning. With partitioning, each remote node can receive a
different set of data, based on individual property values. A common
application: in a sales organization, each salesman should have only his
own customers, which also limits the bandwidth needed for data
transport. When partitioning is used, partitioning of sub tables (lower
in the table dependency rank) should follow automatically. Extracts
should also be adjusted automatically for partitioning. When data is
transferred (e.g. when a customer in a sales organization is transferred
to another salesman), realignment should be automatic, without the need
to write customized triggers or anything else. (Implemented:
configuration, creating partitioned extracts. Not implemented: data
realignment.)
- *Synchronous replication. In principle, Frext syncs asynchronously, so
operations are not propagated immediately. Sometimes this is not
desirable, for example when ACID properties must hold across databases.
For this to work, Frext should also be able to work synchronously.
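The store-and-forward, send-and-forget and guaranteed-delivery concepts from the asynchronous processing goal can be sketched as an outbox that keeps every message until the receiver acknowledges it. This is a minimal in-memory sketch; the class and method names (`Outbox`, `enqueue`, `toForward`, `acknowledge`) are hypothetical and not part of Frext.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store-and-forward outbox: a message stays until it is acknowledged. */
class Outbox {
    // Insertion order = production order, so resends keep the original sequence.
    private final Map<Long, String> pending = new LinkedHashMap<>();
    private long nextSeq = 1;

    /** Send-and-forget: store the message locally and return immediately. */
    long enqueue(String payload) {
        long seq = nextSeq++;
        pending.put(seq, payload);
        return seq;
    }

    /** Every unacknowledged message is forwarded (again) on the next transport run. */
    List<Long> toForward() {
        return new ArrayList<>(pending.keySet());
    }

    /** Guaranteed delivery: only the receiver's acknowledgement removes a message. */
    void acknowledge(long seq) {
        pending.remove(seq);
    }

    public static void main(String[] args) {
        Outbox box = new Outbox();
        long first = box.enqueue("<transaction id='1'/>");
        box.enqueue("<transaction id='2'/>");
        box.acknowledge(first);
        System.out.println(box.toForward()); // [2]
    }
}
```

The sender never blocks on the receiver; resending until acknowledgement is what turns "send-and-forget" into guaranteed delivery.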
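For the file based messaging goal, one way to keep the transport transparent is to give every message its own file and publish it atomically, so a copy, mail or FTP script never picks up a half-written file. This sketch assumes Java 11+; the directory layout and the zero-padded file naming are assumptions, not Frext's actual format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical file drop: one XML file per message, picked up by any transport script. */
class FileMessageDrop {
    private final Path dir;

    FileMessageDrop(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    /** Write to a temporary name first, then rename, so readers never see a partial file. */
    Path publish(long seq, String xml) throws IOException {
        Path tmp = dir.resolve(String.format("%010d.xml.tmp", seq));
        Path target = dir.resolve(String.format("%010d.xml", seq));
        Files.writeString(tmp, xml, StandardCharsets.UTF_8);
        // ATOMIC_MOVE may throw AtomicMoveNotSupportedException on some file systems.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Complete messages only, in sequence order (zero-padded names sort numerically). */
    List<Path> pickUp() throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".xml"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

A transport script then only has to copy, mail or upload the `*.xml` files; Frext itself never needs to know which channel was used.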
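The transaction based goal says Frext builds its own transaction log. The layout of that log is not described here, so the following Java record is a hypothetical model of one log entry (records need Java 16+). It illustrates what "only differences are taken into account" can mean for an UPDATE: only the columns whose value actually changed are recorded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical row of a trigger-fed change log, modelled as a Java record. */
record ChangeLogEntry(long txId, long seq, String table, String op,
                      Map<String, Object> key, Map<String, Object> changed) {

    /** Builds an UPDATE entry that keeps only the columns whose value differs. */
    static ChangeLogEntry update(long txId, long seq, String table, Map<String, Object> key,
                                 Map<String, Object> oldRow, Map<String, Object> newRow) {
        Map<String, Object> diff = new LinkedHashMap<>();
        for (Map.Entry<String, Object> col : newRow.entrySet()) {
            // Objects.equals also copes with NULL column values on either side.
            if (!Objects.equals(oldRow.get(col.getKey()), col.getValue())) {
                diff.put(col.getKey(), col.getValue());
            }
        }
        return new ChangeLogEntry(txId, seq, table, "UPDATE", key, diff);
    }
}
```

In the database itself such entries would typically be written by triggers comparing old and new values, which is consistent with the goal of leaving the production tables unchanged.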
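The transaction grouping goal can be sketched in two steps: group logged operations by source transaction, then replay each group on the receiver inside a single JDBC transaction that is rolled back if any statement fails. The `Op` record and carrying each operation as a ready-made SQL string are simplifying assumptions. Groups keep the order in which each transaction first appears in the log; a real implementation would more likely order them by commit sequence.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical transaction grouping on the sender and all-or-nothing replay on the receiver. */
class TransactionGrouper {
    /** One logged operation; carrying it as a SQL string is a simplification. */
    record Op(long txId, long seq, String sql) {}

    /** Groups operations per source transaction, keeping log order inside each group. */
    static Map<Long, List<Op>> group(List<Op> log) {
        Map<Long, List<Op>> byTx = new LinkedHashMap<>();
        for (Op op : log) {
            byTx.computeIfAbsent(op.txId(), id -> new ArrayList<>()).add(op);
        }
        return byTx;
    }

    /** Replays one group in a single receiver transaction: all statements succeed, or none. */
    static void applyAtomically(Connection con, List<Op> ops) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try (Statement st = con.createStatement()) {
            for (Op op : ops) {
                st.executeUpdate(op.sql());
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```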
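The message sequencer goal amounts to applying messages strictly by sequence number, buffering anything that arrives early, and reporting the gaps so a resend can be requested. A minimal sketch; `MessageSequencer` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Hypothetical receiver-side sequencer: applies messages strictly in sequence order. */
class MessageSequencer {
    private final TreeMap<Long, String> buffered = new TreeMap<>();
    private final Consumer<String> applier;
    private long expected;

    MessageSequencer(long firstExpected, Consumer<String> applier) {
        this.expected = firstExpected;
        this.applier = applier;
    }

    /** Buffers an incoming message and applies every message that is now in sequence. */
    void receive(long seq, String message) {
        if (seq < expected) {
            return;                                   // duplicate of an applied message
        }
        buffered.put(seq, message);
        while (buffered.containsKey(expected)) {
            applier.accept(buffered.remove(expected));
            expected++;
        }
    }

    /** Sequence numbers to request for resend; replication stalls until they arrive. */
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (buffered.isEmpty()) {
            return gaps;
        }
        for (long s = expected; s < buffered.lastKey(); s++) {
            if (!buffered.containsKey(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }
}
```

Because nothing after a gap is applied, the receiver never sees a child row before its parent, which is how sequencing prevents foreign key violations.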
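To make the XML transport format of the transparent processing goal concrete, the following sketch builds a message with the JDK's DOM and `Transformer` APIs. The element and attribute names (`transaction`, `operation`, `field`) are invented for illustration; Frext's real message schema is not described in this section.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical writer for a human-readable XML transport message. */
class XmlMessageWriter {
    static String toXml(long txId, String table, String op, Map<String, String> fields)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element tx = doc.createElement("transaction");
        tx.setAttribute("id", Long.toString(txId));
        doc.appendChild(tx);

        Element operation = doc.createElement("operation");
        operation.setAttribute("table", table);
        operation.setAttribute("type", op);
        tx.appendChild(operation);

        for (Map.Entry<String, String> f : fields.entrySet()) {
            Element field = doc.createElement("field");
            field.setAttribute("name", f.getKey());
            field.setTextContent(f.getValue());   // special characters are escaped on output
            operation.appendChild(field);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```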
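A hierarchical replication model can be validated by recording a single parent per node and checking that there is exactly one top node and no cycles. A sketch; representing the model as a node-to-parent map (with `null` for the top node) is an assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical check that a replication model forms a single tree. */
class ReplicationTree {
    /**
     * @param parentOf maps every node to its parent; the top node maps to null
     * @return true if there is exactly one top node and every node leads up to it
     */
    static boolean isValidTree(Map<String, String> parentOf) {
        long tops = parentOf.values().stream().filter(p -> p == null).count();
        if (tops != 1) {
            return false;
        }
        for (String node : parentOf.keySet()) {
            Set<String> seen = new HashSet<>();
            String current = node;
            while (current != null) {
                if (!seen.add(current)) {
                    return false;                      // cycle: no path to the top
                }
                if (!parentOf.containsKey(current)) {
                    return false;                      // parent is not part of the model
                }
                current = parentOf.get(current);
            }
        }
        return true;
    }
}
```

Since a map holds one parent per key, a node can never have two parents, which rules out peer-to-peer links by construction.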
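The first data transformation route, editing the produced XML message files with an XML tool, can be done with the XSLT processor built into the JDK. This sketch assumes a hypothetical message layout with `<field name="...">` elements and simply drops a `PRICE` field before the message reaches the receiver.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical message transformation with the JDK's built-in XSLT processor. */
class MessageTransformer {
    // Identity copy of everything, except <field name="PRICE"> elements, which are dropped.
    static final String DROP_PRICE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"field[@name='PRICE']\"/>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```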
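The salesman example from the data partitioning goal can be sketched as two filters: one on the partitioning property of the parent table, and one that lets sub table rows follow their parent through the foreign key. The table and column names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical partition filter: a salesman's node only gets his customers and their orders. */
class SalesPartition {
    record Customer(int id, String salesman) {}
    record Order(int id, int customerId) {}

    /** Parent table: filter on the partitioning property. */
    static List<Customer> customersFor(String salesman, List<Customer> all) {
        return all.stream()
                  .filter(c -> c.salesman().equals(salesman))
                  .collect(Collectors.toList());
    }

    /** Sub table: rows follow their parent automatically via the foreign key. */
    static List<Order> ordersFor(List<Customer> partition, List<Order> all) {
        Set<Integer> ids = partition.stream().map(Customer::id).collect(Collectors.toSet());
        return all.stream()
                  .filter(o -> ids.contains(o.customerId()))
                  .collect(Collectors.toList());
    }
}
```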
Installation & Configuration
- Loosely definable replication model. The model represents all
databases involved. The replication model must be as loose as possible:
no hard registration of databases, servers, etc. is allowed. In that
way, servers in the replication process interact relatively
independently of each other.
- No configuration of order of tables for operation execution. There
must be no need to indicate the order in which tables are to be updated,
in order to keep the design of a replication schema as simple as
possible. This is achieved by executing operations sequentially, in the
order in which they were created. This goal is a result of my Fibre
experience, where each table must be given an order identifier. In that
way, different operations (inserts, updates and deletes) on different
tables can be executed without conflicting with any foreign key
constraints. Yet, to achieve that, one must have deep knowledge of the
table relationships within the database. Also, a schema error will
usually not show up immediately when the replication process starts, but
later, in a very specific situation, which makes it very hard (if not
impossible) to debug.
- Minimizing impact on the production database structure. The impact on
databases on which replication works must be kept to the absolute
minimum; no modification of existing tables is allowed. This goal is a
result of my Fibre experience, where each table in the replication
schema has to have an extra identifier field. This is unacceptable,
because real world applications often cannot be changed. Leaving the
production database virtually unchanged is therefore imperative.
- Ease of installation. Installation of the Frext system tables should
be as easy as starting the Frext engine with the 'Install' option.
Installation of triggers on existing tables must also be done from
within Frext.
- Versatile configuration. Configuration is kept in a set of database
tables. One of the ways to configure Frext must be an easy to use tool;
a flat file editor comes to mind. Export and import of the configuration
must be possible. When exported, the configuration must be stored in one
and only one configuration file. This file can then easily be copied to
different locations and modified with any standard flat file editor. The
file must be in XML format, so it is readable by humans. Through import,
it should be easy to set up a remote computer. Because the configuration
is primarily kept in the database itself, directly manipulating it
through SQL commands must be offered as another way to configure Frext.
- Auto-configurator for table dependencies and required fields. When a
new table is added to the replication schema, the engine must help by
automatically adding the tables (and fields) on which that table
depends. Also, the configurator should give warnings when dependent
tables are not in the replication schema or when not all table key
fields (primary or foreign key) are part of the replication schema.
- Publication-subscription model. The sending database should publish
the tables that might be replicated in a publication; multiple receivers
can subscribe to that publication, each making a subscription. The
publication-subscription model provides a clean, flexible and robust
configuration. What's more, such a loosely defined configuration is
especially suitable for mobile applications.
- Field level replication, by inclusion and exclusion. There must be an
option to replicate not only a whole table, but also just a few fields
of that table. This must be configurable by including fields, but also
by excluding fields. Exclusion comes close to solving real world
problems: in many situations, you want to replicate a whole table except
one or two fields.
- *Remote configuration push. Remote nodes should be configurable
through replication, thus making remote administration and configuration
possible.
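For the export/import part of the versatile configuration goal, the JDK already offers a single-file XML format in `java.util.Properties` (`storeToXML` and `loadFromXML`). In Frext the configuration lives in database tables, so treating it as flat key/value pairs, and the key names used below, are assumptions of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

/** Hypothetical export/import of the configuration as one human-readable XML document. */
class ConfigTransfer {
    static byte[] export(Properties config) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        config.storeToXML(out, "Frext configuration");   // UTF-8 XML, one <entry> per key
        return out.toByteArray();
    }

    static Properties importFrom(byte[] xml) throws IOException {
        Properties config = new Properties();
        config.loadFromXML(new ByteArrayInputStream(xml));
        return config;
    }
}
```

Writing the bytes to a file gives exactly one configuration file that can be copied to a remote computer and edited with any flat file editor before import.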
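The table part of the auto-configurator goal is a transitive closure over foreign key references: adding a table should also add every table it references, directly or indirectly. A sketch; the input map of table to referenced tables is an assumption (in practice it would be read from the database's system tables).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical auto-configurator step: add every table a new table depends on. */
class DependencyResolver {
    /**
     * @param references maps a table to the tables its foreign keys reference
     * @return the table itself plus all directly and indirectly referenced tables
     */
    static Set<String> withDependencies(String table, Map<String, List<String>> references) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(table));
        while (!todo.isEmpty()) {
            String t = todo.pop();
            if (result.add(t)) {                       // skip tables already handled (cycles)
                todo.addAll(references.getOrDefault(t, List.of()));
            }
        }
        return result;
    }
}
```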
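Field level replication by inclusion and exclusion can be resolved into one column set per table. In this sketch an empty inclusion list is assumed to mean "all columns", and exclusion always wins; both rules are assumptions, not documented Frext behaviour.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical resolution of the replicated column set for one table. */
class FieldSelection {
    /**
     * @param tableColumns all columns of the table, in table order
     * @param included     columns to replicate; empty means "all columns"
     * @param excluded     columns never to replicate
     */
    static Set<String> replicated(List<String> tableColumns, Set<String> included,
                                  Set<String> excluded) {
        Set<String> result = new LinkedHashSet<>();
        for (String col : tableColumns) {
            boolean in = included.isEmpty() || included.contains(col);
            if (in && !excluded.contains(col)) {
                result.add(col);
            }
        }
        return result;
    }
}
```

The common case from the text, "the whole table except one or two fields", then needs only a short exclusion list.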
System Management
- Remote log viewing. The whole process must be monitored from one
single point. Therefore, all operations should produce confirmations,
and all these confirmations must be replicated back to the master. The
overhead involved is taken for granted: control is not something you can
do without, and this is just the (little) price you have to pay.
- Background synchronization. Frext should work in the background,
without manual interaction. But it should also be possible to operate
Frext manually when needed.
- *Undo/redo capability. It should be possible to undo or redo database
operations up to a specified point in time. This is useful when a
database has to be recovered.
- *Remote SQL command execution. Frext should be able to execute an SQL
command through the replication mechanism. Possible applications:
on-the-fly modification of table structures, modifying stored
procedures, etc. This also facilitates online access simulation.
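The background synchronization goal (automatic, but still operable by hand) can be sketched with a single-threaded `ScheduledExecutorService`. `SyncScheduler` and the `syncOnce` callback, which would run one extract/transport/inject cycle, are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler: the same sync routine runs periodically or on demand. */
class SyncScheduler {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Runnable syncOnce;

    SyncScheduler(Runnable syncOnce) {
        this.syncOnce = syncOnce;
    }

    /**
     * Background mode: start a new cycle a fixed delay after the previous one ends.
     * syncOnce should catch its own exceptions: an uncaught one cancels later runs.
     */
    void startBackground(long delaySeconds) {
        timer.scheduleWithFixedDelay(syncOnce, 0, delaySeconds, TimeUnit.SECONDS);
    }

    /** Manual mode: run one cycle now on the same thread, so cycles never overlap. */
    void syncNow() throws Exception {
        timer.submit(syncOnce).get();
    }

    void stop() {
        timer.shutdown();
    }
}
```

Using one thread for both modes is a deliberate choice in this sketch: a manual run simply queues behind a running background cycle instead of competing with it.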
Database Initialization
- Built in extract capabilities. When setting up a remote database, in
most cases that database must be filled with data. This data is most
likely derived directly from the master database, and it should be very
easy to perform such a data extraction. An extract takes care of filling
remote databases with data. When partitioning is configured, it should
automatically be reflected in the extract contents.
- Easy data initialization of remote databases. Extracts should travel
over the same transport channel as normal messages, to allow offline
rebuilding of a remote database without DBA or user interaction. This
extract message should have the same format as other data messages, so
the same transport mechanism as for normal data messages can (or must)
be used. Reading such an extract message should be possible without
making any changes to the database server or user configuration at any
level.
- *Remote extract request. Remote databases should have an option to
request a new data extraction.