What is replication?

Definition of replication

Replication is the process of copying a database to another server, and then keeping the two copies in sync, so that they behave as near as possible.

Replication copies data or changes of data from one database to an another. Accessing the first database or the second, it does not matter because they are the same, i.e. they are in sync. Replication is not limited to two databases, any number of database can take place into the replication model.

It does not mean that the replicated databases are exactly the same, because replication can be configured that only a subset of tables or columns or rows will be replicated, i.e. partial replication. However, replication ensures that those specific configured objects (others are left alone) are kept in sync between the different databases.

With replication, maintenance of multiple copies of data to different locations is automated. Because each location has its one database, each location works essentially local with maximum performance and security; databases don't have to be accessed outside the local network or workstation.

Replication is a process which needs no interaction from the user. In that way replication is a low level process which is (or should be) completely invisible for the application user. This means that replication by copying databases (through backup & restore) or by using export- and import-scripts are not viable replication mechanisms.

Why use replication?

The major advantages of replication is to:
  • improve data availability and scalability
  • provide fail safe backup
  • provide load spreading
  • provide disconnected operation (mobile solutions)
... without sacrificing performance, security and application usability.

Some examples of usage of replication:
  • enhance availability of data on different locations by distributing data by providing a mirror
  • split workload to different servers by distributing data by providing a mirror
  • merge data from different locations (consolidation)
  • provide a mirror for disaster recovery
  • emulate a multi-company application by sharing a limited number of tables through replication
  • facilitate mobile solutions: application is made independent of network availability (off line)
  • application integration: connect different applications without a common interface
  • provide data transformation: change data in a controlled way
  • audit or analyze database usage
See Appliances: what can replication do for you? for a detailed view.

Types of replication

In replication terms, direct communication is referred to as synchronous replication. Indirect communication is called asynchronous replication. To explain these concepts, first a non-technical example.

Let's imagine, you have to invite someone for a party. You pick up the telephone, dial a number, the other sides picks up the phone, you ask the question and wait for the answer. Once the answer is given, both sides can hang up. This is called synchronous communication: both sides have to be active and the operation, getting answer on a question asked, will succeed always. What happens when you do the telephone call and the other side is away so he or she can't pick up? Then you have to wait and try it later.

Luckily there's a technical solution for this problem: the answering machine. After a predefined number of rings, the machine picks up. Then you're allowed to speak to the machine and leave a message. Once the called notices that new messages have arrived, he or she will call you back. This is called asynchronous communication. Asynchronous communication is also referred to as messaging. Communication takes place, but indirect through an auxiliary device.

As we look closely to the different types of communication, we quickly see the differences:

  • Asynchronous communication:
    • both sides don't have to be in direct contact with each other
    • the caller doesn't know if or when the called will respond; the answer to the question (the invitation) will not be given immediately and, in bad luck, will sometimes never be given. 
    • the caller, in case of absence of the called, can go on with whatever he's been doing and never have to wait for the caller to respond, so there's no lock up
  • Synchronous communication:
    • works immediate; you're always sure that the question can be asked and that the answer is given in a very short time, mostly real time
    • both sides must have contact at the same time with each other for this to happen
    • the caller is completely is locked up when the called is in absence and the caller must know the answer before proceeding to other invitations.

Asynchronous replication

This is like implementing an answering machine on the level of a database. Operations such as inserts, updates and deletes are transferred trough messaging: each operation forms a message. This message is transferred to the other database and processed there. As a result, the operation executed on the master, is remotely executed on the remote. Because the databases don't actually connect to each other but rather communicate through a external message, the process of extracting, transferring and processing messages on the remote could all be different processes. There's no need of doing all the different stages all at once. The different components within the replication architecture don't have to wait for each other to complete.

info.png Messaging is inherently connected to (asynchronous) replication. However, they are not the same, replication only make use of the message concept.

Asynchronous replication systems also makes use of several derived concepts:

  • Store-and-forward means that a message is stored somewhere by the originator, picked up by a message transport system, called a channel, and finally delivered to the receiver.
  • Send-and-forget means that the originator sends a message and then forget about it. Therefore the originator can go on with further processing, even if that means that new messages will be created. Of course, the originator doesn't forget the message. In real world systems, a confirmation of the receiver is needed to actually erase the message from the originator's send queue.
  • Guarantied-delivery means that whenever a message is send to a receiver, the originator can be sure about the fact that the message is actually delivered to the receiver. If not, some kind of notification must be done to let the originator know that deliverance was not possible. Often, guarantied delivery is mistaken by 'guarantied faultless execution': it is guarantied that the message will be applied, but it is not certain if the operation within the message is applied successfully. See ACID.

Synchronous replication

Implementing something like a telephone conversation in database terms is relatively easy because the message component and therefore the indirect contact is missing. First you must ensure that the systems can contact each other whenever necessary. In practice this means that connections are always on. Once a change in the first system, the originator, has occurred, a user has changed a name of a customer for example, the originator contacts the receiver and immediately pushes the change to the other data store. Note however that the change in the originator can only be committed once the receiver has accepted the change also. Committing is now a two phase process. There's no need to store the operation in some kind of queuing mechanism, this makes the overall design very simple.

But there are some (severe) problems with synchronous replication:

  • If for any reason the receiver cannot answer (network down or very slow), the originator cannot commit the change, even in his own system! 
  • Highly dependent of network latency: overall performance is determined by the sum of time needed for all operation and the transport time; the latter can be excessive in low latency network. The committing time (this is the time to actually write a change to a database) is a sum of the times that the two systems need to can commit the operation individually. So, in slow connection conditions, particularly WAN, committing can be slowed down. Sometimes this effect is unacceptable for the end user. 
  • Databases have to have physical contact with each other. Apart from some practical hindrances (can we connect through internet?), one may reject this option for security reasons alone.

Synchronous replication has it opportunities, but only in a strict, controlled environment where network conditions are guarantied.

Mobile applications: off line vs. online

When it comes to mobile applications, immediate raises the question: off line vs. online. It has to with the network connection availability: off line means not connected to the network and thus to the back end.

info.png Technically a "sometimes not connected" application is exactly the same as a "almost never connected" application. Thus a "sometimes not connected" application must be considered as an off line.

Online applications

Online applications work directly on the back end data source, without needing replication. Examples of online applications: web-access (browser on the local device) or terminal serving (Citrix, MS Terminal Server, LTSP icon-external-link.png). These applications rely highly on network connection: when there is none, the application is unusable.
  • Advantages:
    • Data is always correct, no discrepancy with the back office dataset 
    • ACID because there's only one dataset
    • Easy to implement on the data layer, no additional replication layer involved
    • No risk for data loss; there is only one dataset
  • Disadvantages:
    • When there's no network available, the application is not usable
    • Performance highly dependent from bandwidth conditions: low bandwidth does have a large effect on user performance on the mobile application
    • Highly dependent of network latency
    • Higher communication costs when time or volume based network cost billing
    • Harder to implement on the connection layer, additional connection component involved
Online applications do not need a replication layer like Frext, so these are out of scope as Frext concerns.

Off line applications

Off line applications work on a local data source (database), needing replication to copy changes to and from the back end. These applications are virtually independent of a network connection: when there is none, the application is still usable. However, once in a while data should be synced to prevent that data is outdated and to prevent the risk for data loss.
  • Advantages:
    • Always available (data and therefore it's application)
    • Performance not dependent from bandwidth conditions: low bandwidth does not have any effect on user performance on the mobile application
    • Application performance is not dependent from network latency
    • Lower communication costs when time or volume based network cost billing
    • Easy to implement on the connection layer, no additional connection component involved
  • Disadvantages:
    • Data might be outdated
    • Not ACID
    • More difficult to implement on the data layer, an additional replication layer is involved
    • Risk for data loss; non-replicated changes can get lost when the remote database gets corrupted
Frext is a asynchronous replication tool, so it can be used in off line situations.

What is synchronization?

Synchronization is defined as follows: bi-directional or two way replication. This means that data changes can travel both sides: from the master database down to any other remote databases, but also from a remote database up to the master database.

Imagine the following example: on the master database which serves as a data store for the main office, there's a table defined Customer which has the following fields: cust_id, name. Someone on the main office adds a new customer, cust_id = 1, name = 'Customer A'. Through replication, this new customer is added to remote the database. The remote user then can view the new record and then decides to alter the name into 'Customer B'. Again through replication, this modification is propagated back to the master database. In the end, the master database has the following customer record: cust_id = 1, name = 'Customer B'.

warning_small.png Because asynchronous replication can not ensure consistency among all databases (so: not ACID) and because two-way replication even worsens this situation, some problems with synchronization can be anticipated.

info.png Synchronization is often mentioned together with replication, for some they are interchangeable or synonymous. However, pure replication is a one way process.

How do we know which data must be replicated?

When synchronizing two databases, there are basically two options: table comparison and transaction based replication. 

Table comparison

Compare two tables from two locations, distinguishing any differences, and update the other table.

This replication concept looks easy: loop through any record on the master table, check if the record exists on the remote table. Of not, do an insert. Then, compare the records themselves on values of each field. If there are any differences, do an update on the remote table. To detect deletions done on the remote table, thus requiring a delete on the master table to be kept in sync, there's a second comparison needed: each record on the remote table must exist on the master. In other words, delete all records on the master table that doesn't exist in the remote table.

So far so good, but what do we need exactly for making such comparisons: a direct database connection, enough bandwidth and a solution to prevent foreign key violations.

First of all, both databases should be connected directly to the comparison agent. For practical reasons, this can be a problem, let say we have to connect a remote user on a laptop directly to the master database; this is typical not what you want. For flexibility reasons, databases should be loosely coupled, so they should have no knowledge of each other existence. 

There should be enough bandwidth available. This is because the complete contents of both the master and remote tables will be sent over more than once. This is not a problem when replicating data between two databases on the same server on the same network (server-to-server replication), but in all other cases, specifically mobile applications, this is a major bottle neck.

A problem can rise concerning referencing tables, also known as foreign key violations. Imagine a comparison of table A, which is referencing table B. This means that any record in table B which is referenced from table A, must be there to ensure that any insert or update on table A succeeds. In practical terms, table B should be compared first. So, we have to compare tables is the correct order, dependent from the position of the table in the referencing tree. This one was easy, only two tables involved, but in real world applications, we have many more tables, up to a few hundred; this cannot be handled manually. This aside from the problem that deletions on the lowest tables, generic data like countries, only succeed when there are now records in higher tables referencing the lower table; in fact, this is the reverse foreign key dependency problem.

Some things can be organized to avoid these problems: for the direct connect problem, you can consider to extract data from both tables to a file and then do the comparison, but this will make the process more difficult. First of all, in any point in time, some agent must do the extraction. This can originate from a central point, but then a direct connection is still needed. If it originates locally, possibly the extracted data files are already outdated before any comparison takes place. Of course there's the problem how the local engine can be informed that an extraction must take place, how do we organize this one?

For the bandwidth problem, you can add per table an extra field (which is not desired), indicating if the record has been modified. Aside from the problem that this involves changing the table structure by adding a new field, you must organize some kind of mechanism to set those indicators, watch for changes, and reset them after syncing.

As a conclusion, surely there can be places where table comparison is a solution fro replication, providing you have enough bandwidth combined with a simple data model where modifications are allowed for adding extra fields for data modification detection.

Transaction based replication

Record or log which modifications occur, and only replicate those modifications, no more, no less.

Because table comparison is not a viable option by posing a considerable network and processing overhead, a better option to use for replication is transaction log processing. Transactions always occur in the database as a direct result of modification of data. These transactions are recorded and thus can be viewed and processed later, (long) after the operation it self is finished. This means that only data modifications will be replicated; only differences are taken into account so the processing and therefore network and processing overhead is kept to a minimum. 

Databases will definitely not be synchronized by Frext using direct table comparison, because this would involve that complete tables must be read in and compared locally, meaning a network and a performance problem. Transaction based replication is the type of replication used by Frext.

What is partitioning?

Partitioning is used when not every database within the replication model needs the same dataset. Imagine a sales force where each individual has a different, regional working field or responsibility, e.g everyone handles customers from a different country or region; plus: there's no need to see each others data. In that way, the complete customer base is split up, thus: partitioned into smaller pieces. Each salesmen sees only a fraction of the data, but all the data is consolidated on a master database, at the back office. Partitioning is good for reducing data transfers (updates to customers outside the 'owned' set will not be received) and introduces also the concept of data-ownership: who is owner thus responsible for a specific part of the data. Partitioned applications are very scalable because of the low network overhead.

How does Frext fits into this?

  • Frext replicates data asynchronous. An option for synchronous replication will be provided. 
  • Frext uses transaction based replication, thus replicating only differences.
  • Frext handles partitioning
See: Frext architecture