Design goals

The ultimate goal of Frext is to build a best-of-breed sync tool that provides all the good from existing products while closing down their imperfections, most of which are recorded in the comparison sheet. Frext will be Open Source under the GPL license and cross-platform, fully transparent, versatile and yet easy to install, configure and maintain.

(* Not implemented yet, ** Partially implemented)

General

  • Cross-platform. Because the database server Firebird runs on many platforms (Linux, Windows, etc.), Frext cannot be limited to one or a small selection of platforms. The choice of Java as the development language means that Frext can run wherever Java runs. As a consequence, Frext runs on all platforms where Firebird runs.
  • Open Source. Because Firebird is Open Source, it is only natural that Frext should be Open Source as well. There are some closed source replication tools (e.g. IBReplicator), but it does not feel natural to use a closed source replication utility on top of an Open Source database server. Releasing Frext as Open Source under a GPL license enhances the capabilities of Firebird and thereby contributes to its success.
  • Thin engine. As much work as possible should be done within the database itself. Coding as close to the data as possible makes sense because such code does not have the overhead of classic client programming, say in Delphi or Java, and is therefore very clean and straightforward.
  • Multi-purpose engine. The client has multiple functions and serves as the central point for replication processes. It is the engine that extracts messages from one database and injects them into another. It is also the configurator for all settings, for example the replication schema. Installation of the Frext system tables and the addition of triggers to existing tables must be done from within the engine.

Architecture

  • Asynchronous processing. Asynchronous replication is imperative, because databases within the replication model must operate separately and must not have to wait for each other to complete an operation. This is essential for keeping applications available when not connected, a situation common in mobile applications. Frext implements concepts derived from asynchronous replication, such as store-and-forward, send-and-forget and guaranteed delivery.
  • No physical contact needed. While synchronizing, databases must not have physical contact with each other. Changes must be propagated through messaging, which implicitly provides disconnected or asynchronous replication. Physical contact is often very difficult to achieve (firewalls) and also poses a security risk, because the database is directly exposed to the outside world, e.g. the internet.
  • File based messaging. Messages must be transferred through files: simple, extendable and flexible. Files are the most straightforward storage and transport mechanism. Computer systems can set up replication by making use of file sharing. Because file based messaging is very easy to understand and handle, it is also very easy to use additional transport mechanisms such as email, FTP or SSH; these can be implemented at the OS level by scripting and are as such not part of Frext. As a consequence, the transport mechanism is completely transparent from the Frext point of view, so it is possible to have multiple transport channels active at the same time. Imagine a master database which receives messages from the first remote by file sharing, from the second one by mail, and from the third one through FTP.
  • Transaction based. Replication must be based on transaction log processing. Transactions occur in the database as a direct result of modification of data. This means that only data modifications will be replicated; only differences are taken into account. Databases will definitely not be synchronized using direct table comparison, because complete tables would then have to be read in and compared locally, causing both a network and a performance problem. Ideally, the replication process would be based on the results of a log viewer reading an existing transaction log; that is how products like Sybase SQL Remote work. Unfortunately, Firebird does not have a transaction log, so we have to build one of our own, but this has no impact on the production table structure.
  • Transaction grouping. All operations within one transaction must be grouped and also executed within one transaction on the remote. All operations must succeed, or none.
  • Message sequencer. Operations, grouped by transactions, must be executed in the same sequence on the receiver as they were produced on the sender; this ensures that foreign key violations cannot occur. Whenever messages are missing, the replication system should request a resend (the missing message mechanism). In the meantime, it should temporarily stop replication.
  • Transparent processing. The tool must be as transparent as possible. Every aspect and every stage of messaging must be traceable and reproducible. First of all, this makes debugging or understanding the process very easy. Transport files must be in XML format, so it is immediately clear what kind of data will be replicated. All transactions must be traceable and re-doable. Log data of executed operations must be replicated to the sender to provide a central point of system management.
  • Hierarchical replication model. Any database which is part of the replication model, called a node, can only have one or more descendants. There are no other connections between nodes, i.e. no peer-to-peer interaction. As a result, every model always has a top node. However, there can be multiple levels, i.e. more than two.
  • Heterogeneous database servers. Frext must be able to function as a front end (message consumer) or as a back end (message producer); the other end does not have to be an instance of Frext/Firebird. Thus, Frext acts as a connector between different, heterogeneous servers or applications. Purpose: connect different information systems with each other by providing a common interface.
  • Provide data transformation. It should be easy to change (enhance) data as it is replicated from the sender to the receiver. This can be implemented in two ways. The first is to change the produced message files; this is possible because messages are in a plain, open, transparent XML format, so you can use any preferred XML tool. The other way is to use an intermediate database where changes are made by triggers on that database. The latter option is cleaner in that only standard functions of Frext are used. Because both implementations are outside Frext (but facilitated by Frext through standard operation), data transformation is an implementation issue.
  • **Data partitioning. Partitioning ensures that each remote node can receive a different set of data, based on individual property values. A common application: in a sales organization, each salesman should have only his own customers, which limits the bandwidth needed for data transport. When using partitioning, sub tables (lower in table dependency rank) should be partitioned automatically. Extracts should also be adjusted automatically for partitioning. When data is transferred (e.g. when a customer in a sales organization is transferred to another salesman), realignment should be automatic, without the need to write customized triggers or the like. (Implemented: configuration, creating partitioned extracts. Not implemented: data realignment.)
  • *Synchronous replication. In principle, Frext synchronizes asynchronously, so operations are not propagated immediately. Sometimes this is not desirable, for instance when ACID behaviour is required. For such cases, Frext should also be able to work synchronously.
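
The message sequencer goal above can be illustrated with a small Java sketch. It is not taken from the Frext source: the class name MessageSequencer, the numeric sequence numbers and the String payloads are assumptions made purely for illustration. The sketch applies messages strictly in order, holds back anything that arrives ahead of a gap, and reports which sequence numbers would have to be requested again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch; names and message format are NOT Frext's actual API.
class MessageSequencer {
    private long nextExpected;                                      // next sequence number to apply
    private final TreeMap<Long, String> pending = new TreeMap<>();  // arrivals waiting behind a gap
    private final List<String> applied = new ArrayList<>();         // stands in for executing the transaction

    MessageSequencer(long firstSequence) {
        this.nextExpected = firstSequence;
    }

    // Accepts one message; applies it and any buffered successors once the sequence has no gap.
    void receive(long sequence, String payload) {
        if (sequence < nextExpected) {
            return; // duplicate delivery of an already applied message
        }
        pending.put(sequence, payload);
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            applied.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
    }

    // Sequence numbers to request again from the sender; empty when nothing is held back.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (pending.isEmpty()) {
            return gaps;
        }
        for (long s = nextExpected; s < pending.lastKey(); s++) {
            if (!pending.containsKey(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }

    List<String> applied() {
        return applied;
    }
}
```

In this sketch, receiving messages 1 and 3 applies only message 1 and reports 2 as missing; replication of message 3 resumes as soon as message 2 arrives. A real implementation would execute each message inside one database transaction instead of appending to a list.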

Installation & Configuration

  • Loosely definable replication model. The model represents all databases involved. The replication model must be as loose as possible. No hard registration of databases, servers, etc. is allowed. In that way, servers in the replication process interact relatively independently of each other.
  • No configuration of order of tables for operation execution. There must be no need to indicate the order in which tables are to be updated, so that designing a replication schema is as simple as possible. This will be achieved by executing operations sequentially in the order in which they were created. This goal was a result of my Fibre experience, where each table must be given an order identifier. In that way, different operations (inserts, updates and deletes) on different tables can be done without conflicting with any foreign key constraints. Yet, to achieve that, one must have a deep knowledge of the table relationships within the database. Also, a schema error will usually not be raised immediately when starting the replication process, but will occur later, in a very specific situation, making it very hard (or impossible) to debug.
  • Minimizing impact on the production database structure. The impact on databases on which replication should work must be kept to the absolute minimum; no modification of existing tables is allowed. This goal was a result of my Fibre experience, where each table in the replication schema has to have an extra identifier field. This is unacceptable because real world applications often cannot be changed. Leaving the production database virtually unchanged is therefore imperative.
  • Ease of installation. Installation of the Frext system tables should be as easy as this: start the Frext engine with the 'Install' option. Installation of triggers on existing tables must also be done from within Frext.
  • Versatile configuration. Configuration is kept in a set of database tables. One of the ways to configure Frext must be an easy to use tool; a flat file editor comes to mind. Export and import of configuration must be possible. When exported, the configuration must be stored in one and only one configuration file. This file can then be copied easily to different locations and modified with any standard flat file editor. The file must be in XML format, so it is readable by humans. Through import, it should be easy to set up a remote computer. Because configuration is primarily kept in the database itself, directly manipulating it through SQL commands must be offered as another way to configure Frext.
  • Auto-configurator for table dependencies and required fields. When a new table is added to the replication schema, the engine must help by automatically adding the tables (and fields) on which that table depends. Also, the configurator should give warnings when dependent tables are not in the replication schema or when not all table key fields (primary or foreign key) are part of the replication schema.
  • Publication-subscription model. The sending database should publish the tables which may be replicated in a publication; multiple receivers can subscribe to that publication, each thus creating a subscription. The publication-subscription model provides a clean, flexible and robust configuration. What's more, a loosely defined configuration is especially suitable for mobile applications.
  • Field level replication, by inclusion and exclusion. There must be an option to replicate not only a whole table, but also just a few fields from that table. This must be configurable by including fields, but also by excluding fields. Exclusion comes closer to solving real world problems, where in many situations you are likely to replicate a whole table except for one or two fields.
  • *Remote configuration push. Several remote nodes should be configured through replication, thus making remote administration and configuration possible.
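
Field level replication by inclusion and exclusion, as described above, can be sketched in a few lines of Java. This is only an illustration under assumptions: the class FieldFilter, the convention that an empty include set means "the whole table", and the representation of a row as a Map are all hypothetical and not part of Frext.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; class and field names are illustrative, not Frext's configuration model.
class FieldFilter {
    private final Set<String> included; // assumption: an empty set means "replicate the whole table"
    private final Set<String> excluded; // fields that are never replicated

    FieldFilter(Set<String> included, Set<String> excluded) {
        this.included = included;
        this.excluded = excluded;
    }

    // A field is replicated when it is included (or no include list is given) and not excluded.
    boolean replicates(String field) {
        boolean isIncluded = included.isEmpty() || included.contains(field);
        return isIncluded && !excluded.contains(field);
    }

    // Returns a copy of the row that contains only the fields to replicate, in their original order.
    Map<String, Object> apply(Map<String, Object> row) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            if (replicates(column.getKey())) {
                result.put(column.getKey(), column.getValue());
            }
        }
        return result;
    }
}
```

With this sketch, "the whole table except PHOTO" is written as new FieldFilter(Set.of(), Set.of("PHOTO")), which is much shorter than listing every remaining field in an include list.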

System Management

  • Remote log viewing. It must be possible to monitor the whole process from one single point. Therefore, all operations should give confirmations and all these confirmations must be replicated back to the master. The overhead involved is accepted. Control is not something you can do without; this is just the (small) price you have to pay.
  • Background synchronization. Frext should work in the background, without any manual interaction. But it should also be possible to operate Frext manually when needed.
  • *Undo/redo capability. It should be possible to undo or redo database operations up to a specified point in time. This is useful when a database has to be recovered.
  • *Remote SQL command execution. Frext should be able to execute an SQL command through the replication mechanism. Possible applications: on the fly modification of table structures, modification of stored procedures, etc. This facilitates the simulation of online access.

Database Initialization

  • Built in extract capabilities. When setting up a remote database, in most cases that database must be filled with data. This data is most likely derived directly from the master database. It should be very easy to do such a data extraction. An extract takes care of filling remote databases with data. When partitioning is configured, it should automatically impact extract contents. 
  • Easy data initialization of remote databases. Extracts should travel on the same transport channel as normal messages, so that a remote database can be rebuilt offline without DBA or user interaction. An extract message should have the same format as other data messages, so that the same transport mechanism as for normal data messages can be used. Reading such an extract message should be possible without making any changes to the database server or user configuration on any level.
  • *Remote extract request. Remote databases should have an option to request a new data extraction.