MySQL: Advantages and Disadvantages of Galera

Galera cluster replication are now supported in MySQL InnoDB and Percona XtraDB. Even it is new, this technology is having bright future to be developed and is coming into trend to achieve high availability and scalability on database server.

Advantages:

  • No need to learn new storage engine technology like NDBCluster. Learning new technology will require some time to learn. It just similar to InnoDB with added of cluster functionality.
  • All nodes are equal on read and write at all times. It provides synchronous replication, and can guarantee that replicas are up-to-date and ready for reads or to be promoted to master.
  • It protects you against data loss when a node fails. In fact, because all nodes have all the data, you can lose every node but one and still not lose the data (even if the cluster has a split brain and stops working). This is different from NDB, where the data is partitioned across node groups and some data can be lost if all servers in a node group are lost.
  • Since it is using synchronous replication, replicas cannot fall behind. The write sets are propagated to and certified on every node in the cluster before the transaction commits.
  • Easy to scale with more simple implementation on adding and removing nodes, joining cluster and monitor the cluster status. No need to have management node like MySQL cluster.

Disadvantages:

  • It’s new. There isn’t a huge body of experience with its strengths, weaknesses, and appropriate use cases.
  • The whole cluster performs writes as slowly as the weakest node. Thus, all nodes need similar hardware, and if one node slows down (e.g., because the RAID card does a battery-learn cycle), all of them slow down. If one node has probability P of being slow to accept writes, a three-node cluster has probability 3P of being slow.
  • It isn’t as space-efficient as NDB, because every node has all the data, not just a portion. On the other hand, it is based on Percona XtraDB (which is an enhanced version of InnoDB), so it doesn’t have NDB’s limitations regarding on-disk data.
  • It currently disallows some operational tricks that are possible with asynchronous replication, such as making schema changes offline on a replica and promoting it to be master so you can repeat the changes on other nodes offline. The current alternative is to use a technique such as Percona Toolkit’s online schema change tool. Rolling schema upgrades are nearly ready for release at the time of writing, however.
  • Adding a new node to a cluster requires copying data to it, plus the ability to keep up with ongoing writes, so a big cluster with lots of writes could be hard to grow. This will put a practical limit on the cluster’s data size. We aren’t sure how large this is, but a pessimistic estimate is that it could be as low as 100 GB or so. It could be much larger; time and experience will tell.
  • The replication protocol seems to be somewhat sensitive to network hiccups at the time of writing, and that can cause nodes to stop themselves and drop out of the cluster, so we recommend a high-performance network with good redundancy. If you don’t have a reliable network, you might end up adding nodes back to the cluster too often. This requires a resynchronization of the data. At the time of writing, incremental state transfer to avoid a full copy of the dataset is almost ready to use, so this should not be as much of a problem in the future. It’s also possible to configure Galera to be more tolerant of network timeouts (at the cost of delayed failure detection), and more reliable algorithms are planned for future releases.
  • If you aren’t watching carefully, your cluster could grow too big to restart nodes that fail, just as backups can get too big to restore in a reasonable amount of time if you don’t practice it routinely. We need more practical experience to know how this will work in reality.
  • Because of the cross-node communication required at transaction commit, writes will get slower, and deadlocks and  rollbacks will get more frequent, as you add nodes to the cluster.

Reference: High Performance MySQL 3rd Edition