
Announcing ClusterControl 1.7.2: Improved PostgreSQL Backup & Support for TimescaleDB & MySQL 8.0


We are excited to announce the 1.7.2 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure.

ClusterControl 1.7.2 marks our first release to support time-series data, with TimescaleDB, strengthening our mission to provide complete life-cycle support for the best open source databases and expanding our ability to support applications like IoT, Fintech and smart technology.

We continue to improve our PostgreSQL support by adding new monitoring capabilities to forecast database growth, as well as by integrating pgBackRest, a top backup technology that gives you new ways to manage your PostgreSQL backups.

Release Highlights

New Database Technology Support

  • Support for TimescaleDB 1.2.2 (New Time-Series Database)
  • Support for Oracle MySQL Server 8.0.15
  • Support for Percona Server for MySQL 8.0
  • Improved support for MaxScale 2.2 (Load Balancing)

Improved Monitoring & Backup Management for PostgreSQL

  • New database growth graphs
  • Create or restore full, differential and incremental backups using the pgBackRest technology
  • A new way to do Point in Time Recovery
  • Enable and configure backup compression options

Launch of CC Spotlight

  • CC Spotlight allows you to quickly find and open any page in ClusterControl with a few keystrokes. It also allows you to find individual nodes to quickly perform cluster actions on them.

CMON HA (BETA)

  • CMON HA enables multiple controller servers for fault tolerance. It uses a consensus protocol (Raft) to keep multiple controller processes in sync. A production setup consists of a 'leader' CMON process and a set of 'followers', which share storage and state using a MySQL Galera cluster.

View Release Details and Resources

Release Details

New Database Technology Support

TimescaleDB: With the 1.7.2 release of ClusterControl, we are proud to announce an expansion of the databases we support to include TimescaleDB, a revolutionary new time-series database that leverages the stability, maturity and power of PostgreSQL. Learn more about it here.

MySQL 8.0: This new and exciting version of MySQL boasts improvements to user management, the introduction of a NoSQL document store, common table expressions, window functions, and improved spatial support.

MaxScale 2.2: MariaDB MaxScale 2.2 offers full support for MariaDB 10.2 (support for which was introduced in ClusterControl 1.5), as well as a variety of other improvements. MariaDB MaxScale is a database proxy that extends the high availability, scalability, and security of MariaDB Server while simplifying application development by decoupling it from the underlying database infrastructure.

Improved Monitoring & Backup Management for PostgreSQL

PostgreSQL Database Growth Graph: This new graph allows you to track the dataset growth of your PostgreSQL database, letting you stay on top of performance and plan for your future needs.

Integration of pgBackRest: pgBackRest is among the top open source backup tools for PostgreSQL, mainly because of its efficiency in coping with very large volumes of data and the extreme care its creators put into validating backups via checksums. With it, ClusterControl users can create and restore full, differential or incremental backups. It also enables a new method to achieve point-in-time recovery for PostgreSQL.
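Under the hood, these operations map onto pgBackRest's own command line. As a rough sketch of the equivalent manual commands (assuming a stanza named "main" is already configured; the timestamp is a placeholder):

$ pgbackrest --stanza=main --type=full backup      # complete copy of the database
$ pgbackrest --stanza=main --type=diff backup      # changes since the last full backup
$ pgbackrest --stanza=main --type=incr backup      # changes since the last backup of any type
$ pgbackrest --stanza=main --type=time "--target=2019-04-01 12:00:00" restore

ClusterControl drives all of this from the UI; the commands are shown only to illustrate what each backup type means.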

Launch of CC Spotlight

This new ClusterControl navigation and search tool helps you move easily across both ClusterControl's features and your specific deployments. Just click on the search icon or hit CTRL+SPACE on your keyboard to activate Spotlight.

CMON HA (BETA)

CMON HA uses a consensus protocol (Raft) to provide a high availability setup with more than one cmon process. It allows you to set up a 'leader' CMON process and a set of 'followers', which share storage and state using a MySQL Galera cluster. In case of failure of the active controller, a new one is promoted to leader.

This enables ClusterControl users to build highly available deployments that are immune to network partitioning and split-brain conditions.


Advanced Database Monitoring & Management for TimescaleDB


Included in the 1.7.2 release of ClusterControl, we are proud to announce an expansion of the databases we support to include TimescaleDB, a revolutionary new time-series database that leverages the stability, maturity and power of PostgreSQL.

For ClusterControl, this marks our first time supporting time-series data, strengthening our mission to provide complete life-cycle support for the best open source databases and expanding our ability to support applications like IoT, Fintech and smart technology.

TimescaleDB can ingest large amounts of data and then measure how it changes over time. This ability is crucial to analyzing any data-intensive, time-series workload. In addition, anyone who is familiar with SQL-based databases, such as PostgreSQL, can utilize TimescaleDB.
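To illustrate, turning a plain PostgreSQL table into a time-partitioned hypertable is a single function call, after which regular SQL keeps working on top of it. A minimal sketch (the table and column names are made up for the example; connection details are omitted):

$ psql -U postgres -c "CREATE TABLE conditions (time TIMESTAMPTZ NOT NULL, device_id TEXT, temperature DOUBLE PRECISION);"
$ psql -U postgres -c "SELECT create_hypertable('conditions', 'time');"
$ psql -U postgres -c "SELECT time_bucket('1 hour', time) AS hour, avg(temperature) FROM conditions GROUP BY hour;"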

TimescaleDB Management Features

ClusterControl gives TimescaleDB users the ability to quickly and easily deploy highly available TimescaleDB setups, point-and-click, using the ClusterControl GUI.

Out of the box monitoring and performance management features provide deep insights into production database workload and query performance.

ClusterControl automates failover and recovery in replication setups, and makes use of HAProxy to provide a single endpoint for applications, ensuring maximum uptime.

In addition, ClusterControl also provides a full-suite of backup management features for TimescaleDB including backup verification, data compression & encryption, Point-in-Time Recovery (PITR), retention management and cloud archiving.

Vinay Joosery, Severalnines CEO, had this to say about Timescale: “TimescaleDB is the first time-series database to be compatible with SQL. This allows the user to leverage the power of the technology with the stability and support of an existing open-source community. ClusterControl is a better way to run TimescaleDB because you do not need to cobble together multiple tools to run and manage the database.”


Applications for Timescale and ClusterControl

TimescaleDB is perfectly suited to applications that need to track a large amount of incoming data and examine how that data changes over time.

  • Internet of Things (IoT) - While the array of products that fall into the IoT category is vast, TimescaleDB enables many scenarios to succeed. Timescale allows IoT companies to “go deep” in their analysis of the data hidden inside the usage of these devices, information that can be used to build new products and features.
  • Systems Monitoring - High-traffic applications require time-series data to analyze and understand the usage patterns of their users. TimescaleDB provides the ability to do this at scale, handling and making sense of large amounts of data inputs.
  • Business Analytics - Analyzing time-series data allows businesses to extract meaningful statistics and other characteristics. This data could include transaction data, trends, or pricing.
  • FinTech - Time-series financial analysis allows users to better understand the marketplace and improves their ability to generate quality forecasts. Because TimescaleDB is built for volume and speed, FinTech companies can utilize it to “cast a wide net” and process data at an even faster rate than before.

Timescale can be deployed and monitored for free using the ClusterControl Community Edition, and all additional features can be tested in a 30-day trial of ClusterControl Enterprise, which can be downloaded for free.

Benchmarking Manual Database Deployments vs Automated Deployments


There are multiple ways of deploying a database. You can install it by hand, or you can rely on widely available infrastructure orchestration tools like Ansible, Chef, Puppet or Salt. Those tools are very popular, and it is quite easy to find scripts, recipes, playbooks, you name it, which will help you automate the installation of a database cluster. There are also more specialized database automation platforms, like ClusterControl, which can also be used to automate deployment. What would be the best way of deploying your cluster? How much time will you actually need to deploy it?

First, let us clarify what we want to do. Let’s assume we will be deploying Percona XtraDB Cluster 5.7. It will consist of three nodes, and for that we will use three Vagrant virtual machines running Ubuntu 16.04 (bento/ubuntu-16.04 image). We will attempt to deploy a cluster manually, then using Ansible, and then using ClusterControl. Let’s see what the results look like.

Manual Deployment

Repository Setup - 1 minute, 45 seconds.

First of all, we have to configure Percona repositories on all Ubuntu nodes. A quick Google search, SSHing into the virtual machines, and running the required commands took 1 minute and 45 seconds.

We found the following page with instructions:
https://www.percona.com/doc/percona-repo-config/percona-release.html

and we executed the steps described in the “DEB-BASED GNU/LINUX DISTRIBUTIONS” section. We also ran apt update to refresh apt’s cache.
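For reference, the DEB-based steps boil down to roughly the following (the package URL points at Percona's generic repository package and may change over time):

$ wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
$ dpkg -i percona-release_latest.generic_all.deb
$ apt update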

Installing PXC Nodes - 2 minutes 45 seconds

This step basically consists of executing:

root@vagrant:~# apt install percona-xtradb-cluster-5.7

The rest mostly depends on your internet connection speed, as packages are being downloaded. Your input will also be needed (you’ll be providing a password for the superuser), so it is not an unattended installation. When everything is done, you will end up with three running Percona XtraDB Cluster nodes:

root     15488  0.0  0.2   4504  1788 ?        S    10:12   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    15847  0.3 28.3 1339576 215084 ?      Sl   10:12   0:00  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1

Configuring PXC nodes - 3 minutes, 25 seconds

Here starts the tricky part. It is really hard to quantify experience and how much time one needs to actually understand what has to be done. The good news is that a Google search for “how to install percona xtradb cluster” points to Percona’s documentation, which describes what the process should look like. It may still take more or less time, depending on how familiar you are with PXC and Galera in general. In the worst case scenario, you will not be aware of any additional required actions, and you will connect to your PXC and start working with it, not realizing that you in fact have three nodes, each forming a cluster of its own.

Let’s assume we follow the recommendation from Percona and time just the steps to be executed. In short, we modified the configuration files as per the instructions on the Percona website, and we also attempted to bootstrap the first node:

root@vagrant:~# /etc/init.d/mysql bootstrap-pxc
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
 * Bootstrapping Percona XtraDB Cluster database server mysqld                                                                                                                                                                                                                     ^C

This did not look correct. Unfortunately, the instructions weren’t crystal clear. Again, if you don’t know what is going on, you will spend more time trying to understand what happened. Luckily, stackoverflow.com proved very helpful (although not via the first result we got), and we realized we were missing the [mysqld] section header in the /etc/mysql/my.cnf file. Adding this on all nodes and repeating the bootstrap process solved the issue. In total, we spent 3 minutes and 25 seconds (not including googling for the error, as we noticed immediately what the problem was).
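For illustration, the fix is simply to make sure the wsrep settings sit under a proper section header, along these lines (the option values shown are placeholders):

[mysqld]
wsrep_provider=/usr/lib/galera3/libgalera_smm.so
wsrep_cluster_name=pxc-cluster
binlog_format=ROW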

Configuring for SST, Bringing Other Nodes Into the Cluster - Starting From 8 Minutes to Infinity

The instructions on the Percona website are quite clear. Once you have one node up and running, just start the remaining nodes and you will be fine. We tried that, but we were unable to see more nodes joining the cluster. This is where it is virtually impossible to tell how long it will take to diagnose the issue. It took us 6-7 minutes, but to be able to do it quickly you have to:

  1. Be familiar with how PXC configuration is structured:
    root@vagrant:~# tree  /etc/mysql/
    /etc/mysql/
    ├── conf.d
    │   ├── mysql.cnf
    │   └── mysqldump.cnf
    ├── my.cnf -> /etc/alternatives/my.cnf
    ├── my.cnf.fallback
    ├── my.cnf.old
    ├── percona-xtradb-cluster.cnf
    └── percona-xtradb-cluster.conf.d
        ├── client.cnf
        ├── mysqld.cnf
        ├── mysqld_safe.cnf
        └── wsrep.cnf
  2. Know how the !include and !includedir directives work in MySQL configuration files
  3. Know how MySQL handles the same variables included in multiple files
  4. Know what to look for and be aware of configurations that would result in a node bootstrapping itself to form a cluster on its own

The problem was related to the fact that the instructions did not mention any file except /etc/mysql/my.cnf when, in fact, we should have been modifying /etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf. That file contained an empty variable:

wsrep_cluster_address=gcomm://

and such a configuration forces the node to bootstrap, as it does not have information about other nodes to join. We set that variable in /etc/mysql/my.cnf, but the wsrep.cnf file was included later, overwriting our setup.
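The fix is therefore to set the cluster address in the file that wins, i.e. wsrep.cnf itself (the node IPs below are placeholders):

# /etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf
wsrep_cluster_address=gcomm://192.168.10.1,192.168.10.2,192.168.10.3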

This issue might be a serious blocker for people who are not really familiar with how MySQL and Galera work, resulting in hours, if not more, of debugging.

Total Installation Time - 16 minutes (If You Are a MySQL DBA Like I Am)

We managed to install Percona XtraDB Cluster in 16 minutes. You have to keep a couple of things in mind: we did not tune the configuration. That is something which requires more time and knowledge. A PXC node comes with some simple configuration, related mostly to binary logging and Galera writeset replication. There is no InnoDB tuning. If you are not familiar with MySQL internals, this means hours if not days of reading and familiarizing yourself with the internal mechanisms. Another important thing is that this is a process you would have to re-apply for every cluster you deploy. Finally, we managed to identify the issue and solve it very fast thanks to our experience with Percona XtraDB Cluster and MySQL in general. A casual user will most likely spend significantly more time trying to understand what is going on and why.

Ansible Playbook

Now, on to automation with Ansible. Let’s try to find an Ansible playbook we can reuse for all further deployments, and let’s see how long that will take.

Configuring SSH Connectivity - 1 minute

Ansible requires SSH connectivity across all the nodes to connect and configure them. We generated an SSH key and manually distributed it across the nodes.
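A sketch of what that amounts to, with placeholder node IPs:

$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
$ for h in 192.168.10.1 192.168.10.2 192.168.10.3; do ssh-copy-id root@$h; done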

Finding Ansible Playbook - 2 minutes 15 seconds

The main issue here is that there are so many playbooks available out there that it is impossible to decide which is best. As such, we decided to go with the top 3 Google results and try to pick one. We settled on https://github.com/cdelgehier/ansible-role-XtraDB-Cluster, as it seemed more configurable than the remaining ones.

Cloning Repository and Installing Ansible - 30 seconds

This was quick; all we needed to do was:

apt install ansible git
git clone https://github.com/cdelgehier/ansible-role-XtraDB-Cluster.git

Preparing Inventory File - 1 minute 10 seconds

This step was also very simple: we created an inventory file using the example from the documentation, substituting the IP addresses of the nodes with those configured in our environment.
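A minimal sketch of such an inventory, with placeholder IPs (the group name the role expects comes from its README, so treat it as illustrative):

[pxc]
192.168.10.1
192.168.10.2
192.168.10.3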

Preparing a Playbook - 1 minute 45 seconds

We decided to use the most extensive example from the documentation, which also includes a bit of configuration tuning. We prepared the correct directory structure for Ansible (the documentation contained no such information):

/root/pxcansible/
├── inventory
├── pxcplay.yml
└── roles
    └── ansible-role-XtraDB-Cluster
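For completeness, a minimal sketch of what pxcplay.yml itself can contain; the host group has to match the inventory, and any role variables come from the role's documented examples:

- hosts: pxc
  become: yes
  roles:
    - ansible-role-XtraDB-Cluster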

Then we ran it, but we immediately got an error:

root@vagrant:~/pxcansible# ansible-playbook pxcplay.yml
 [WARNING]: provided hosts list is empty, only localhost is available

ERROR! no action detected in task

The error appears to have been in '/root/pxcansible/roles/ansible-role-XtraDB-Cluster/tasks/main.yml': line 28, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: "Include {{ ansible_distribution }} tasks"
  ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes.  Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"

This took 1 minute and 45 seconds.

Fixing the Playbook Syntax Issue - 3 minutes 25 seconds

The error was misleading, but the general rule of thumb is to try a more recent Ansible version, which we did. We googled and found good instructions on the Ansible website.
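On Ubuntu, getting a more recent Ansible typically means pulling it from the upstream PPA, roughly as follows (assuming software-properties-common is installed):

$ apt-add-repository ppa:ansible/ansible
$ apt update
$ apt install ansible

The next attempt to run the playbook also failed: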

TASK [ansible-role-XtraDB-Cluster : Delete anonymous connections] *****************************************************************************************************************************************************************************************************************
fatal: [node2]: FAILED! => {"changed": false, "msg": "The PyMySQL (Python 2.7 and Python 3.X) or MySQL-python (Python 2.X) module is required."}
fatal: [node3]: FAILED! => {"changed": false, "msg": "The PyMySQL (Python 2.7 and Python 3.X) or MySQL-python (Python 2.X) module is required."}
fatal: [node1]: FAILED! => {"changed": false, "msg": "The PyMySQL (Python 2.7 and Python 3.X) or MySQL-python (Python 2.X) module is required."}

Setting up the new Ansible version and running the playbook up to this error took 3 minutes and 25 seconds.

Fixing the Missing Python Module - 3 minutes 20 seconds

Apparently, the role we used did not take care of its prerequisites, and a Python module was missing for connecting to and securing the Galera cluster. We first tried to install MySQL-python via pip, but it became apparent that it would take more time, as it required mysql_config:

root@vagrant:~# pip install MySQL-python
Collecting MySQL-python
  Downloading https://files.pythonhosted.org/packages/a5/e9/51b544da85a36a68debe7a7091f068d802fc515a3a202652828c73453cad/MySQL-python-1.2.5.zip (108kB)
    100% |████████████████████████████████| 112kB 278kB/s
    Complete output from command python setup.py egg_info:
    sh: 1: mysql_config: not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-zzwUtq/MySQL-python/setup.py", line 17, in <module>
        metadata, options = get_config()
      File "/tmp/pip-build-zzwUtq/MySQL-python/setup_posix.py", line 43, in get_config
        libs = mysql_config("libs_r")
      File "/tmp/pip-build-zzwUtq/MySQL-python/setup_posix.py", line 25, in mysql_config
        raise EnvironmentError("%s not found" % (mysql_config.path,))
    EnvironmentError: mysql_config not found

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-zzwUtq/MySQL-python/

That is provided by the MySQL development libraries, so we would have had to install them manually, which was pretty much pointless. We decided to go with PyMySQL, which did not require other packages to install. This brought us to another issue:

TASK [ansible-role-XtraDB-Cluster : Delete anonymous connections] *****************************************************************************************************************************************************************************************************************
fatal: [node3]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1698, u\"Access denied for user 'root'@'localhost'\")"}
fatal: [node2]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1698, u\"Access denied for user 'root'@'localhost'\")"}
fatal: [node1]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1698, u\"Access denied for user 'root'@'localhost'\")"}
    to retry, use: --limit @/root/pxcansible/pxcplay.retry

Up to this point we spent 3 minutes and 20 seconds.

Fixing “Access Denied” Error - 18 minutes 55 seconds

As per the error, we ensured that the MySQL configuration was prepared correctly and that it included the correct user and password to connect to the database. This, unfortunately, did not work as expected. We investigated further and found that the role did not create the root user properly, even though it marked the step as completed. After a short investigation, we decided to make a manual fix instead of trying to debug the playbook, which would have taken way more time. We simply created the users root@127.0.0.1 and root@localhost by hand, with the correct passwords.
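A sketch of that manual fix, run on each node (the password is a placeholder, and we assume the system root user can still reach MySQL over the local socket):

root@vagrant:~# mysql -e "CREATE USER 'root'@'127.0.0.1' IDENTIFIED BY 'SecretPass'; GRANT ALL PRIVILEGES ON *.* TO 'root'@'127.0.0.1' WITH GRANT OPTION; FLUSH PRIVILEGES;"

This allowed us to pass this step and move on to another error: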

TASK [ansible-role-XtraDB-Cluster : Start the master node] ************************************************************************************************************************************************************************************************************************
skipping: [node1]
skipping: [node2]
skipping: [node3]

TASK [ansible-role-XtraDB-Cluster : Start the master node] ************************************************************************************************************************************************************************************************************************
skipping: [node1]
skipping: [node2]
skipping: [node3]

TASK [ansible-role-XtraDB-Cluster : Create SST user] ******************************************************************************************************************************************************************************************************************************
skipping: [node1]
skipping: [node2]
skipping: [node3]

TASK [ansible-role-XtraDB-Cluster : Start the slave nodes] ************************************************************************************************************************************************************************************************************************
fatal: [node3]: FAILED! => {"changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code. See \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node2]: FAILED! => {"changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code. See \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node1]: FAILED! => {"changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code. See \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
    to retry, use: --limit @/root/pxcansible/pxcplay.retry

For this section we spent 18 minutes and 55 seconds.

Fixing “Start the Slave Nodes” Issue (part 1) - 7 minutes 40 seconds

We tried a couple of things to solve this problem. We tried to specify the node using its name, and we tried to switch group names; nothing solved the issue. We decided to clean up the environment using the script provided in the documentation and start from scratch. It did not clean anything up, but just made things even worse. After 7 minutes and 40 seconds we decided to wipe out the virtual machines, recreate the environment, and start from scratch, hoping that adding the Python dependencies would solve our issue.

Fixing “Start the Slave Nodes” Issue (part 2) - 13 minutes 15 seconds

Unfortunately, setting up the Python prerequisites did not help at all. We decided to finish the process manually, bootstrapping the first node and then configuring the SST user and starting the remaining slaves. This completed the “automated” setup, and it took us 13 minutes and 15 seconds to debug and then finally accept that it would not work the way the playbook designer expected.

Further Debugging - 10 minutes 45 seconds

We did not stop there and decided to try one more thing. Instead of relying on Ansible variables, we just put the IP of one of the nodes as the master node. This solved that part of the problem, and we ended up with:

TASK [ansible-role-XtraDB-Cluster : Create SST user] ******************************************************************************************************************************************************************************************************************************
skipping: [node2]
skipping: [node3]
fatal: [node1]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1045, u\"Access denied for user 'root'@'::1' (using password: YES)\")"}

This was the end of our attempts: we tried to add this user, but it did not work correctly through the Ansible playbook, even though we could connect via the IPv6 localhost address when using the MySQL client.

Total Installation Time - Unknown (Automated Installation Failed)

In total we spent 64 minutes and still hadn’t managed to get things going automatically. The remaining problems are root password creation, which doesn’t seem to work, and then getting the Galera Cluster started (the SST user issue). It is hard to tell how long it would take to debug further. It is surely possible - it is just hard to quantify, because it really depends on your experience with Ansible and MySQL. It is definitely not something anyone can just download, configure and run. Well, maybe another playbook would have worked differently? It is possible, but it may just as well result in different issues. Ok, so there is a learning curve to climb and debugging to do, but then, when you are all set, you will just run a script. Well, that’s sort of true - as long as changes introduced by the maintainer don’t break something you depend on, a new Ansible version doesn’t break the playbook, and the maintainer doesn’t simply forget about the project and stop developing it (for the role that we used, there’s a quite useful pull request that has been waiting for almost a year, which might solve the Python dependency issue - it has not been merged). Unless you accept that you will have to maintain this code, you cannot really rely on it being 100% accurate and working in your environment, especially given that the original developer has no incentive to keep the code up to date.

Also, what about other versions? You cannot use this particular playbook to install PXC 5.6 or any MariaDB version. Sure, there are other playbooks you can find. Will they work better, or will you spend another bunch of hours trying to make them work?


ClusterControl

Finally, let’s take a look at how ClusterControl can be used to deploy Percona XtraDB Cluster.

Configuring SSH Connectivity - 1 minute

ClusterControl requires SSH connectivity across all the nodes to connect and configure them. We generated an SSH key and manually distributed it across the nodes.

Setting Up ClusterControl - 3 minutes 15 seconds

A quick search for “ClusterControl install” pointed us to the relevant ClusterControl documentation page. We were looking for a “simpler way to install ClusterControl”, so we followed the link and found the following instructions.

Downloading the script and running it took 3 minutes and 15 seconds. We had to take some actions while the installation proceeded, so it is not an unattended installation.

Logging Into UI and Deployment Start - 1 minute 10 seconds

We pointed our browser to the IP of ClusterControl node.

We passed the required contact information and we were presented with the Welcome screen:

Next step - we picked the deployment option.

We had to pass SSH connectivity details.

We also decided on the vendor, version, password and hosts to use. This whole process took 1 minute and 10 seconds.

Percona XtraDB Cluster Deployment - 12 minutes 5 seconds

The only thing left was to wait for ClusterControl to finish the deployment. After 12 minutes and 5 seconds the cluster was ready:

Total Installation Time - 17 minutes 30 seconds

We managed to deploy ClusterControl and then a PXC cluster using ClusterControl in 17 minutes and 30 seconds. The PXC deployment itself took 12 minutes and 5 seconds. At the end, we have a working cluster, deployed according to best practices. ClusterControl also ensures that the configuration of the cluster makes sense. In short, even if you don’t really know anything about MySQL or Galera Cluster, you can have a production-ready cluster deployed in a couple of minutes. ClusterControl is not just a deployment tool, it is also a management platform: it makes it even easier for people not experienced with MySQL and Galera to identify performance problems (through advisors) and perform management actions (scaling the cluster up and down, running backups, creating asynchronous slaves off Galera). Importantly, ClusterControl will always be maintained and can be used to deploy all MySQL flavors (and not only MySQL/MariaDB - it also supports TimescaleDB, PostgreSQL and MongoDB). It also worked out of the box, something which cannot be said about the other methods we tested.

If you would like to experience the same, you can download ClusterControl for free. Let us know how you liked it.

Introducing ClusterControl Spotlight Search


Included in the latest release, ClusterControl 1.7.2 introduces our new and exciting search functionality we’re calling “ClusterControl Spotlight.”

ClusterControl Spotlight allows you to...

  • Navigate the application faster
  • Execute any action from any page within the application
  • Discover new and existing features faster
  • Find what you are looking for faster than ever before

With ClusterControl Spotlight you will be able to speed up your daily workflow and navigate through the application, executing actions with only a few keystrokes and without leaving your keyboard.

How Does ClusterControl Spotlight Work?

ClusterControl Spotlight gives you the ability to search and quickly find your clusters and cluster actions, nodes and node actions, and all other pages of the application that don’t necessarily belong to a cluster.

ClusterControl Spotlight can be opened by clicking on the search icon in the main left navigation bar or, for your convenience, by using the keyboard shortcut: Control ⌃ + SPACE on Mac, or CTRL + SPACE on Windows and Linux.

Now let’s dig deeper into how Spotlight works...

Finding a Specific Node and Navigating to its Overview Page

First, let’s open ClusterControl Spotlight using the shortcut mentioned above. Right after you open Spotlight, you will see all of your clusters listed.

Let’s select the first item in the list. You will see a list of all the nodes which are part of this cluster. At this stage you have three different options: select a node from your cluster, execute cluster actions, or navigate to a cluster inner page. Since we would like to go to the overview page of a specific node, let’s select one of the nodes from that cluster. You will then be presented with the specific actions you can take on this node, as well as the option to go to the overview page.

With this functionality you can navigate to a specific page a lot faster than with the traditional method of clicking through the menu items.

ClusterControl Spotlight: Finding a specific node and navigating to its overview page

Executing Actions with Spotlight

In order to execute actions with Spotlight, we again need to open it by clicking the search icon in the left side menu or by using the shortcut combination mentioned above.

Spotlight shows contextual actions, meaning that if you select a cluster, you will be able to see and select all the main cluster actions. (If you select a node, then node actions will be presented.)

You can also just type the name of the action you are trying to perform, and Spotlight will list all the clusters or nodes where this action can be executed. Spotlight gives you the freedom to execute any action from any place in ClusterControl.

ClusterControl Spotlight: Executing node actions

Navigating to Any ClusterControl Page

Now that we have shown you how to quickly perform actions on your nodes and clusters, I would like to show you how you can go to any page, anywhere in the application. Let’s say we are in the “Backup” section of a Galera Cluster setup, and we would like to go to the “Schemas and Users” page, which lives in the “Manage” section of our Replication Cluster.

Simply open Spotlight, select the Replication Cluster, and type “Manage.” Select the Manage section listed below and voilà! We are already there, without having to move our fingers from the keyboard.

Conclusion

I would like to encourage you to try ClusterControl Spotlight, as we believe it will let you execute your daily tasks much faster. The automation at the core of ClusterControl has always been there to save you time and money in your database management tasks, and CC Spotlight adds an even greater way to perform more actions with speed and precision.

If you are a ClusterControl user, try it by clicking the search icon in the left side menu or by using the keyboard shortcut: Control ⌃ + SPACE on Mac, or CTRL + SPACE on Windows and Linux. If you are not, what are you waiting for? Download a free trial today.

Let us know what you think of this new feature in the comments below.

How to Deploy PostgreSQL to a Docker Container Using ClusterControl


Docker has become the most common tool to create, deploy, and run applications using containers. It allows us to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package. Docker can be compared to a virtual machine, but instead of creating a whole virtual operating system, it allows applications to use the same Linux kernel as the system they’re running on, requiring only that applications be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application.

In this blog, we’ll see how we can easily deploy a PostgreSQL setup via Docker, and how to turn our setup into a primary/standby replication setup by using ClusterControl.

How to Deploy PostgreSQL with Docker

First, let’s see how to deploy PostgreSQL with Docker manually by using a PostgreSQL Docker Image.

The image is available on Docker Hub and you can find it from the command line:

$ docker search postgres
NAME                                         DESCRIPTION                                     STARS               OFFICIAL            AUTOMATED
postgres                                     The PostgreSQL object-relational database sy…   6519                [OK]

We’ll take the first result, the official one, so we need to pull the image:

$ docker pull postgres

And run the node containers, mapping a local port to the database port in each container:

$ docker run -d --name node1 -p 6551:5432 postgres
$ docker run -d --name node2 -p 6552:5432 postgres
$ docker run -d --name node3 -p 6553:5432 postgres

After running these commands, you should have this Docker environment created:

$ docker ps
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS                 PORTS                                                                                     NAMES
51038dbe21f8        postgres                      "docker-entrypoint.s…"   About an hour ago   Up About an hour       0.0.0.0:6553->5432/tcp                                                                    node3
b7a4211744e3        postgres                      "docker-entrypoint.s…"   About an hour ago   Up About an hour       0.0.0.0:6552->5432/tcp                                                                    node2
229c6bd23ff4        postgres                      "docker-entrypoint.s…"   About an hour ago   Up About an hour       0.0.0.0:6551->5432/tcp                                                                    node1

Now, you can access each node with the following command:

$ docker exec -ti [db-container] bash
$ su postgres
$ psql
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.
postgres=#

Then, you can create a database user, change the configuration according to your requirements or configure replication between the nodes manually.
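For instance, creating an application user on one of the nodes can be done straight through docker exec (the user name and password are placeholders):

$ docker exec -ti node1 psql -U postgres -c "CREATE USER app WITH PASSWORD 'app_pass';"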

How to Import Your PostgreSQL Containers into ClusterControl

Now that you’ve set up your PostgreSQL cluster, you still need to monitor it, get alerted in case of performance issues, manage backups, detect failures, and automatically fail over to a healthy server.

If you already have a PostgreSQL cluster running on Docker and you want ClusterControl to manage it, you can simply run the ClusterControl container in the same Docker network as the database containers. The only requirement is to ensure the target containers have SSH related packages installed (openssh-server, openssh-clients). Then allow passwordless SSH from ClusterControl to the database containers. Once done, use the “Import Existing Server/Cluster” feature and the cluster should be imported into ClusterControl.

First, let’s install the OpenSSH related packages on the database containers, allow root login, start the SSH service, and set the root password:

$ docker exec -ti [db-container] apt-get update
$ docker exec -ti [db-container] apt-get install -y openssh-server openssh-client
$ docker exec -it [db-container] sed -i 's|^PermitRootLogin.*|PermitRootLogin yes|g' /etc/ssh/sshd_config
$ docker exec -it [db-container] sed -i 's|^#PermitRootLogin.*|PermitRootLogin yes|g' /etc/ssh/sshd_config
$ docker exec -ti [db-container] service ssh start
$ docker exec -it [db-container] passwd

Start the ClusterControl container (if it’s not started) and forward port 80 on the container to port 5000 on the host:

$ docker run -d --name clustercontrol -p 5000:80 severalnines/clustercontrol

Verify the ClusterControl container is up:

$ docker ps | grep clustercontrol
7eadb6bb72fb        severalnines/clustercontrol   "/entrypoint.sh"         4 hours ago         Up 4 hours (healthy)   22/tcp, 443/tcp, 3306/tcp, 9500-9501/tcp, 9510-9511/tcp, 9999/tcp, 0.0.0.0:5000->80/tcp   clustercontrol

Open a web browser, go to http://[Docker_Host]:5000/clustercontrol and create a default admin user and password. You should now see the ClusterControl main page.

The last step is setting up passwordless SSH to all database containers. For this, we need to know the IP address of each database node. To find it, we can run the following command for each node:

$ docker inspect [db-container] |grep IPAddress
            "IPAddress": "172.17.0.6",

Then, attach to the ClusterControl container’s interactive console:

$ docker exec -it clustercontrol bash

Copy the SSH key to all database containers:

$ ssh-copy-id 172.17.0.6
$ ssh-copy-id 172.17.0.7
$ ssh-copy-id 172.17.0.8

Now, we can start to import the cluster into ClusterControl. Open a web browser and go to the Docker physical host’s IP address with the mapped port, e.g. http://192.168.100.150:5000/clustercontrol, click on “Import Existing Server/Cluster”, and then add the following information.

We must specify the user, key or password, and port to connect by SSH to our servers. We also need a name for our new cluster.

After setting up the SSH access information, we must define the database user, version, basedir, and the IP address or hostname of each database node.

Make sure you get the green tick when entering the hostname or IP address, indicating ClusterControl is able to communicate with the node. Then, click the Import button and wait until ClusterControl finishes its job. You can monitor the process in the ClusterControl Activity Section.

The database cluster will be listed under the ClusterControl dashboard once imported.

Note that if you only have a PostgreSQL master node, you can add it to ClusterControl. You can then add the standby nodes from the ClusterControl UI to allow ClusterControl to configure them for you.


How to Deploy Your PostgreSQL Containers with ClusterControl

Now, let’s see how to deploy PostgreSQL with Docker by using a CentOS Docker Image (severalnines/centos-ssh) and a ClusterControl Docker Image (severalnines/clustercontrol).

First, we’ll deploy a ClusterControl Docker Container using the latest version, so we need to pull the severalnines/clustercontrol Docker Image.

$ docker pull severalnines/clustercontrol

Then, we’ll run the ClusterControl container and publish port 5000 to access it:

$ docker run -d --name clustercontrol -p 5000:80 severalnines/clustercontrol

Now you can open the ClusterControl UI at http://[Docker_Host]:5000/clustercontrol and create a default admin user and password.

The severalnines/centos-ssh image comes with the SSH service enabled and an auto-deployment feature, but the latter is only valid for Galera Cluster; PostgreSQL is not supported yet. So, we’ll set the AUTO_DEPLOYMENT variable to 0 in the docker run command to create the database nodes:

$ docker run -d --name postgres1 -p 5551:5432 --link clustercontrol:clustercontrol -e AUTO_DEPLOYMENT=0 severalnines/centos-ssh
$ docker run -d --name postgres2 -p 5552:5432 --link clustercontrol:clustercontrol -e AUTO_DEPLOYMENT=0 severalnines/centos-ssh
$ docker run -d --name postgres3 -p 5553:5432 --link clustercontrol:clustercontrol -e AUTO_DEPLOYMENT=0 severalnines/centos-ssh

After running these commands, we should have the following Docker environment:

$ docker ps
CONTAINER ID        IMAGE                         COMMAND             CREATED             STATUS                    PORTS                                                                                     NAMES
0df916b918a9        severalnines/centos-ssh       "/entrypoint.sh"    4 seconds ago       Up 3 seconds              22/tcp, 3306/tcp, 9999/tcp, 27107/tcp, 0.0.0.0:5553->5432/tcp                             postgres3
4c1829371b5e        severalnines/centos-ssh       "/entrypoint.sh"    11 seconds ago      Up 10 seconds             22/tcp, 3306/tcp, 9999/tcp, 27107/tcp, 0.0.0.0:5552->5432/tcp                             postgres2
79d4263dd7a1        severalnines/centos-ssh       "/entrypoint.sh"    32 seconds ago      Up 31 seconds             22/tcp, 3306/tcp, 9999/tcp, 27107/tcp, 0.0.0.0:5551->5432/tcp                             postgres1
7eadb6bb72fb        severalnines/clustercontrol   "/entrypoint.sh"    38 minutes ago      Up 38 minutes (healthy)   22/tcp, 443/tcp, 3306/tcp, 9500-9501/tcp, 9510-9511/tcp, 9999/tcp, 0.0.0.0:5000->80/tcp   clustercontrol

We need to know the IP address of each database node. To find it, we can run the following command for each node:

$ docker inspect [db-container] |grep IPAddress
            "IPAddress": "172.17.0.3",

Now that we have the server nodes up and running, we need to deploy our database cluster. To make this easy, we’ll use ClusterControl.

To perform a deployment from ClusterControl, open the ClusterControl UI at http://[Docker_Host]:5000/clustercontrol, then select the option “Deploy” and follow the instructions that appear.

When selecting PostgreSQL, we must specify the user, key or password, and port to connect by SSH to our servers. We also need a name for our new cluster, and to indicate whether we want ClusterControl to install the corresponding software and configurations for us.

After setting up the SSH access information, we must define the database user, version and datadir (optional). We can also specify which repository to use.

In the next step, we need to add our servers to the cluster that we are going to create.

When adding our servers, we can enter IP or hostname. Here we must use the IP Address that we got from each container previously.

In the last step, we can choose whether our replication will be synchronous or asynchronous.

We can monitor the status of the creation of our new cluster from the ClusterControl activity monitor.

Once the task is finished, we can see our cluster in the main ClusterControl screen.

Conclusion

As we have seen, deploying PostgreSQL with Docker can be easy at the beginning, but it requires a bit more work to configure replication. Finally, you should monitor your cluster to see what is happening. With ClusterControl, you can import or deploy your PostgreSQL cluster with Docker, as well as automate monitoring and management tasks like backup and automatic failover/recovery. Try it out.

How to Automate Deployment of MySQL Galera Cluster using s9s CLI and Chef


In our previous blog, we showed how DevOps teams can automate daily database tasks with Chef. Now, let's see how we can quickly deploy a MySQL Galera Cluster with Chef using the s9s CLI.

Setting up a Galera Cluster manually can be fast for an experienced DBA. Automating it with something like Ansible may take a hell of a lot longer, as our colleague Krzysztof found out in this blog. While automating the deployment of a database cluster with Chef is certainly doable, it is not an easy task, as you can end up with hundreds of lines of code which are hard to maintain when there are updates to the software. We will demonstrate here how you can integrate the s9s CLI into your deployment cookbook and speed up the process.

About the s9s CLI

s9s is the CLI tool for ClusterControl. You can use it to integrate ClusterControl with your automation runbooks in Ansible, Puppet, Chef or Salt, allowing you to easily build database management functionality into your orchestration scripts. The command line tool lets you interact with, control and manage your database infrastructure using ClusterControl. For instance, we used it here to automate deployment when running ClusterControl and Galera Cluster on Docker Swarm. It is noteworthy that this tool is open source, so you can freely use or contribute to it.

How to get the s9s CLI

The CLI can be installed by adding the s9s tools repository and using a package manager, or it can be compiled from source. The current ClusterControl installation script, install-cc, installs the command line client automatically. The client can also be installed on another computer or workstation for remote management. More information is available in our documentation.

Some of the things you can do from the CLI:

  • Deploy and manage database clusters
    • MySQL and MariaDB
    • PostgreSQL
    • MongoDB to be added soon
    • TimescaleDB
  • Monitor your databases
    • Status of nodes and clusters
    • Cluster properties can be extracted
    • Gives detailed enough information about your clusters
  • Manage your systems and integrate with DevOps tools
    • Create, stop or start clusters
    • Add, remove, or restart nodes in the cluster
    • Create database users (CREATE USER, GRANT privileges to user)
      • Users created in the CLI are traceable through the system
    • Create load balancers (HAProxy, ProxySQL)
    • Create and Restore backups
    • Use maintenance mode
    • Conduct configuration changes of db nodes
    • Integrate with existing deployment automation
      • Ansible, Puppet, Chef or Salt, …
    • Integrate with chatbots like Slack, FlowDock and Hipchat
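To get a feel for the syntax, inspecting what the controller currently manages is a single command per object type:

$ s9s cluster --list --long
$ s9s node --list --long
$ s9s job --list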

Automating your Galera Cluster Setup

In our previous blog, we discussed the flow of Chef, in which you set up your workstation, Chef server, and the nodes/clients. Now, let's first look at the diagram for automating our Galera Cluster setup:

The workstation serves as your development machine, where you write your code. You then push the cookbooks/recipes to the Chef server, which runs them on the target node, which in this setup is the ClusterControl host. This target ClusterControl host must be a clean/dedicated host. As mentioned earlier, we'll be using s9s-tools to leverage the installation and setup of the Galera nodes without writing lots of lines of code. Instead, code it like a boss.

Here are the prerequisites:

  • Workstation/Node
  • Chef Server
  • ClusterControl controller/server - The controller here is a requirement for our s9s CLI to operate. Our community version lets you deploy your database clusters.
  • Three Galera nodes. For this setup, I used the following IPs: 192.168.70.70, 192.168.70.80, 192.168.70.100
  • Set up your ClusterControl OS user's public keys on all of the targeted Galera nodes to avoid SSH errors later (see the sketch after this list).
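A sketch of that key distribution, using the node IPs above (adjust the OS user to whatever --os-user will be in the recipe):

$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
$ for h in 192.168.70.70 192.168.70.80 192.168.70.100; do ssh-copy-id vagrant@$h; done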

Integrating ClusterControl into your automation tools

Installing ClusterControl can be done in several ways. You can use a package manager such as your favorite yum or apt together with our repository, you can use install-cc, or you can use our automation scripts for Puppet, Ansible, or Chef.

For the purpose of this blog, we will use the S9s_cookbook and integrate the automation process for our Galera Cluster setup. There are two ways to get the S9s_cookbook: you can use the GitHub repository https://github.com/severalnines/S9s_cookbooks, or fetch it from the marketplace using knife. We'll use the marketplace.

  1. On your workstation, download the cookbook using Chef's knife tool and uncompress the tarball:

    $ cd ~/dba-tasks-repo/cookbooks
    $ knife cookbook site download clustercontrol
    $ tar -xzvf clustercontrol-*
    $ unlink clustercontrol-*.tar.gz
  2. Then run s9s_helper.sh, located in the clustercontrol cookbook:

    $ cd ~/dba-tasks-repo/cookbooks/clustercontrol/files/default
    $ ./s9s_helper.sh

    For example, you'll see the following as you run the script:

    [vagrant@node2 default]$ ./s9s_helper.sh 
    ==============================================
    Helper script for ClusterControl Chef cookbook
    ==============================================
    
    ClusterControl will install a MySQL server and setup the MySQL root user.
    Enter the password for MySQL root user [password] : R00tP@55
    
    ClusterControl will create a MySQL user called 'cmon' for automation tasks.
    Enter the password for user cmon [cmon] : cm0nP@55
    
    Generating config.json..
    {
        "id" : "config",
        "mysql_root_password" : "R00tP@55",
        "cmon_password" : "cm0nP@55",
        "clustercontrol_api_token" : "f38389ba5d1cd87a1aa5f5b1c15b3ca0ee5a2b0f"
    }
    
    Data bag file generated at /home/vagrant/dba-tasks-repo/cookbooks/clustercontrol/files/default/config.json
    To upload the data bag, you can use following command:
    $ knife data bag create clustercontrol
    $ knife data bag from file clustercontrol /home/vagrant/dba-tasks-repo/cookbooks/clustercontrol/files/default/config.json
    
    ** We highly recommend you to use encrypted data bag since it contains confidential information **
  3. Then create a data bag as per the instructions in the last output of the s9s_helper.sh script:

    [vagrant@node2 clustercontrol]$ knife data bag create clustercontrol
    Created data_bag[clustercontrol]
    [vagrant@node2 clustercontrol]$ knife data bag from file clustercontrol ~/dba-tasks-repo/cookbooks/clustercontrol/files/default/config.json 
    Updated data_bag_item[clustercontrol::config]
  4. Before you upload to the Chef Server, ensure that your templates/default/configure_cmon_db.sql.erb has contents similar to the following:

    SET SQL_LOG_BIN=0;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'127.0.0.1' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'localhost' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'<%= node['ipaddress'] %>' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'<%= node['fqdn'] %>' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    REPLACE INTO dcps.apis(id, company_id, user_id, url, token) VALUES (1, 1, 1, 'http://127.0.0.1/cmonapi', '<%= node['api_token'] %>');
    FLUSH PRIVILEGES;
  5. Also ensure in recipes/controller.rb that the cmon service will be restarted. See below:

    service "cmon" do
        supports :restart => true, :start => true, :stop => true, :reload => true
        action [ :enable, :restart ]
    end
  6. Upload the cookbook to the Chef Server

    $ cd ~/dba-tasks-repo/cookbooks/
    $ knife cookbook upload clustercontrol
  7. Alternatively, you can create a role and attach it to the node. We'll use roles again, just like in our previous blog. For example, you can do the following:

    1. Create a file called cc_controller.rb in path ~/dba-tasks-repo/cookbooks/clustercontrol with the following contents:

      name "cc_controller"
      description "ClusterControl Controller"
      run_list ["recipe[clustercontrol]"]
    2. Create a role from the file we created as follows:

      [vagrant@node2 clustercontrol]$ knife role from file cc_controller.rb 
      Updated Role cc_controller
    3. Assign the roles to the target nodes/client as follows:

      $ knife node run_list add <cluster_control_host> "role[cc_controller]"

      Where <cluster_control_host> is your ClusterControl controller's hostname.

    4. Verify which nodes the role is attached to and check its run list. For example, I have the following:

      [vagrant@node2 clustercontrol]$ knife role show cc_controller
      chef_type:           role
      default_attributes:
      description:         ClusterControl Controller
      env_run_lists:
      json_class:          Chef::Role
      name:                cc_controller
      override_attributes:
      run_list:
        recipe[clustercontrol]
        recipe[db_galera_install]

Now, we're not yet finished. We'll proceed with integrating our Galera Cluster automation cookbook.


Writing our Chef Cookbook for MySQL Galera Cluster

As mentioned earlier, we will be integrating s9s CLI into our automation code. Let's proceed with the steps.

  1. Let's generate a cookbook. Let's name it db_galera_install:

    $ cd ~/dba-tasks-repo/cookbooks/
    $ chef generate cookbook db_galera_install
  2. Let's also generate the attribute file,

    $  chef generate attribute default
  3. Go to attributes/default.rb and add the following contents to the file:

    default['cmon']['s9s_bin'] = '/bin/s9s'
    default['cmon']['galera_cluster_name'] = 'PS-galera_cluster'
    default['cmon']['db_pass'] = 'R00tP@55'
  4. Go to recipes/default.rb and add the following contents to the file:

    # Pull the cluster name and the s9s binary path from the node attributes
    galera_cluster_name = "#{node['cmon']['galera_cluster_name']}"
    s9s_bin = "#{node['cmon']['s9s_bin']}"

    # Reuse the MySQL root password stored in the clustercontrol data bag
    cc_config = data_bag_item('clustercontrol','config')
    db_pass = cc_config['mysql_root_password']

    # Create the Galera cluster through the s9s CLI
    bash "install-galera-nodes" do
      user "root"
      code <<-EOH
      #{s9s_bin} cluster --create --cluster-type=galera --nodes="192.168.70.70,192.168.70.80,192.168.70.100" \
        --vendor=percona \
        --provider-version=5.7 \
        --db-admin-passwd='#{db_pass}' \
        --os-user=vagrant \
        --cluster-name='#{galera_cluster_name}' \
        --wait \
        --log
      EOH

      # Guard: do nothing if a cluster with this name is already registered
      not_if "[[ ! -z $(#{s9s_bin} cluster --list --cluster-format='%I' --cluster-name '#{galera_cluster_name}') ]]"
    end

    A bit of explanation about the code: it uses the s9s cluster --create command to create a Galera-type cluster. The nodes are specified by their IP addresses in the --nodes argument. We also reuse the same password set up for the ClusterControl database, read from the data bag named clustercontrol; you can create a separate data bag if you prefer. The rest is self-explanatory, but you can check our documentation for details.
    Lastly, the not_if conditional is very important. It ensures that the bash resource, which handles the setup of the Galera cluster, is not invoked again once the Galera cluster named PS-galera_cluster has been provisioned.

  5. Now that the cookbook is set up, upload it to the Chef server as follows:

    $ cd ~/dba-tasks-repo/cookbooks/
    $ knife cookbook upload db_galera_install
  6. Let's verify the list of roles and then add the new recipe to the role we set up earlier, namely cc_controller:

    [vagrant@node2 cookbooks]$ knife role list
    cc_controller
    pg_nodes

    Then edit the role by running the command below:

    $ export EDITOR=vi; 
    $ knife role edit cc_controller

    You should end up with something that looks like the following:

    {
      "name": "cc_controller",
      "description": "ClusterControl Controller",
      "json_class": "Chef::Role",
      "default_attributes": {
    
      },
      "override_attributes": {
    
      },
      "chef_type": "role",
      "run_list": [
        "recipe[clustercontrol]",
        "recipe[db_galera_install]"
      ],
      "env_run_lists": {
    
      }
    }

    Ensure that the run_list contains the following:

    "recipe[clustercontrol]",
    "recipe[db_galera_install]"

That's all and we are ready to roll!

Executing the Runbooks

We're done preparing both the ClusterControl and Galera Cluster cookbooks, ready to be tested. Let's run them and look at the results of our Chef automation.

Go to the target node/client. On my end, I use node9 with IP 192.168.70.90. Then run the command below:

$ sudo chef-client -l info

In my client node, this shows as follows:

Setting and Installing the ClusterControl Server
Installing the Galera Cluster Nodes at Host 192.168.70.70
Installing the Galera Cluster Nodes at Host 192.168.70.80

Once it's all done, you'll have your ClusterControl server and Galera Cluster nodes set up with ease!
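To double-check from the ClusterControl host that the cluster was registered, you can query the s9s CLI directly; the second command below is the same check our not_if guard uses:

$ s9s cluster --list --long
$ s9s cluster --list --cluster-format='%I' --cluster-name 'PS-galera_cluster'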

Lastly, here's a screenshot of our Galera Cluster within ClusterControl:

ClusterControl Home Page
Our Galera Overview Dashboard

Conclusion

We have shown you how easy it is to integrate our s9s CLI tool into your common database activities. This is very helpful in organizations where Chef is used as the de facto automation tool. s9s can make you more productive by automating your daily database administration work.

How to Setup Asynchronous Replication Between MariaDB Galera Clusters


Galera Cluster, with its (virtually) synchronous replication, is commonly used in many different types of environments. Scaling it by adding new nodes is not hard (and just a couple of clicks when you use ClusterControl).

The main problem with synchronous replication is, well, the synchronous part, which often results in the whole cluster being only as fast as its slowest node. Any write executed on the cluster has to be replicated to all of the nodes and certified on them. If, for whatever reason, this process slows down, it can seriously impact the cluster’s ability to accommodate writes. Flow control will then kick in to ensure that the slowest node can still keep up with the load. This makes things quite tricky in some of the common scenarios that happen in a real-world environment.

First off, let’s discuss geographically distributed disaster recovery. Sure, you can run clusters across a Wide Area Network, but the increased latency will have a significant impact on the cluster’s performance. This seriously limits the usefulness of such a setup, especially over longer distances where latency is higher.

Another quite common use case is a test environment for a major version upgrade. It is not a good idea to mix different versions of MariaDB Galera Cluster nodes in the same cluster, even if it is possible. On the other hand, migration to a more recent version requires detailed tests; ideally, both reads and writes would be tested. One way to achieve that is to create a separate Galera cluster and run the tests, but you would like to run those tests in an environment as close to production as possible. Once provisioned, a cluster can be used for tests with real-world queries, but it would be hard to generate a workload close to that of production. You cannot move part of the production traffic to such a test system because its data is not current.

Finally, the migration itself. As we said earlier, even if it is possible to mix old and new versions of Galera nodes in the same cluster, it is not the safest way to do it.

Luckily, the simplest solution for all three of these issues is to connect separate Galera clusters with asynchronous replication. What makes it such a good solution? Well, it’s asynchronous, so it does not affect the Galera replication. There is no flow control, thus the performance of the “master” cluster will not be affected by the performance of the “slave” cluster. As with any asynchronous replication, lag may show up, but as long as it stays within acceptable limits, it can work perfectly fine. You also have to keep in mind that nowadays asynchronous replication can be parallelized (multiple threads can work together to increase bandwidth), reducing replication lag even further.
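As a side note on that last point, MariaDB exposes parallel replication through a couple of server variables; a minimal my.cnf sketch for the replicating side (the thread count and mode are assumptions - tune them to your workload):

# enable multi-threaded replication on the replica
slave_parallel_threads = 4
slave_parallel_mode    = optimistic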

In this blog post we will discuss the steps to deploy asynchronous replication between MariaDB Galera clusters.

How to Configure Asynchronous Replication Between MariaDB Galera Clusters?

First off, we have to deploy a cluster. For our purposes we set up a three-node cluster. We will keep the setup to a minimum, thus we will not discuss the complexity of the application and proxy layer. A proxy layer may be very useful for handling the tasks for which you would deploy asynchronous replication - redirecting a subset of the read-only traffic to the test cluster, or helping in a disaster recovery situation when the “main” cluster is not available by redirecting the traffic to the DR cluster. There are numerous proxies you can try, depending on your preference - HAProxy, MaxScale or ProxySQL - all can be used in such setups and, depending on the case, some of them may be able to help you manage your traffic.

Configuring the Source Cluster

Our cluster consists of three MariaDB 10.3 nodes; we also deployed ProxySQL to do the read-write split and distribute the traffic across all nodes in the cluster. This is not a production-grade deployment - for that we would have to deploy more ProxySQL nodes and Keepalived on top of them - but it is enough for our purposes. To set up asynchronous replication, we have to have the binary log enabled on our cluster. On at least one node, that is, but it’s better to keep it enabled on all of them in case the only node with binlogs enabled goes down - then you want another node in the cluster up and running that you can slave off.

When enabling the binary log, make sure that you configure binary log rotation so the old logs are removed at some point. You should use the ROW binary log format. You should also ensure that you have GTID configured and in use - it will come in very handy if you ever have to reslave your “slave” cluster or need to enable multi-threaded replication. As this is a Galera cluster, you want to have ‘wsrep_gtid_domain_id’ configured and ‘wsrep_gtid_mode’ enabled. Those settings will ensure that GTIDs are generated for the traffic coming from the Galera cluster. More information can be found in the documentation.
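Putting these recommendations together, a minimal my.cnf sketch for the nodes of the source cluster could look as follows (the path, retention period and domain id are example values - adjust them to your environment):

log_bin              = /var/lib/mysql-binlog/binlog
binlog_format        = ROW
expire_logs_days     = 7      # rotate old binary logs away
server_id            = 1002   # unique per node
wsrep_gtid_domain_id = 9999   # same value on all nodes of this cluster
wsrep_gtid_mode      = ON     # generate GTIDs for Galera traffic

Once this is all done, you can proceed with setting up the second cluster.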

Setting Up the Target Cluster

Given that there is currently no target cluster, we have to start by deploying it. We will not cover those steps in detail; you can find instructions in the documentation. Generally speaking, the process consists of several steps:

  1. Configure MariaDB repositories
  2. Install MariaDB 10.3 packages
  3. Configure nodes to form a cluster

At the beginning we will start with just one node. You can set up all of them to form a cluster, but then you should stop them and use just one for the next step. That one node will become a slave to the original cluster. We will use mariabackup to provision it, and then we will configure the replication.

First, we have to create a directory where we will store the backup:

mkdir /mnt/mariabackup

Then we execute the backup and create it in the directory prepared in the step above. Please make sure you use the correct user and password to connect to the database:

mariabackup --backup --user=root --password=pass --target-dir=/mnt/mariabackup/

Next, we have to copy the backup files to the first node in the second cluster. We used scp for that; you can use whatever you like - rsync, netcat, anything that will work.

scp -r /mnt/mariabackup/* 10.0.0.104:/root/mariabackup/

After the backup has been copied, we have to prepare it by applying the log files:

mariabackup --prepare --target-dir=/root/mariabackup/
mariabackup based on MariaDB server 10.3.16-MariaDB debian-linux-gnu (x86_64)
[00] 2019-06-24 08:35:39 cd to /root/mariabackup/
[00] 2019-06-24 08:35:39 This target seems to be not prepared yet.
[00] 2019-06-24 08:35:39 mariabackup: using the following InnoDB configuration for recovery:
[00] 2019-06-24 08:35:39 innodb_data_home_dir = .
[00] 2019-06-24 08:35:39 innodb_data_file_path = ibdata1:100M:autoextend
[00] 2019-06-24 08:35:39 innodb_log_group_home_dir = .
[00] 2019-06-24 08:35:39 InnoDB: Using Linux native AIO
[00] 2019-06-24 08:35:39 Starting InnoDB instance for recovery.
[00] 2019-06-24 08:35:39 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2019-06-24  8:35:39 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-06-24  8:35:39 0 [Note] InnoDB: Uses event mutexes
2019-06-24  8:35:39 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2019-06-24  8:35:39 0 [Note] InnoDB: Number of pools: 1
2019-06-24  8:35:39 0 [Note] InnoDB: Using SSE2 crc32 instructions
2019-06-24  8:35:39 0 [Note] InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
2019-06-24  8:35:39 0 [Note] InnoDB: Completed initialization of buffer pool
2019-06-24  8:35:39 0 [Note] InnoDB: page_cleaner coordinator priority: -20
2019-06-24  8:35:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=3448619491
2019-06-24  8:35:40 0 [Note] InnoDB: Starting final batch to recover 759 pages from redo log.
2019-06-24  8:35:40 0 [Note] InnoDB: Last binlog file '/var/lib/mysql-binlog/binlog.000003', position 865364970
[00] 2019-06-24 08:35:40 Last binlog file /var/lib/mysql-binlog/binlog.000003, position 865364970
[00] 2019-06-24 08:35:40 mariabackup: Recovered WSREP position: e79a3494-964f-11e9-8a5c-53809a3c5017:25740

[00] 2019-06-24 08:35:41 completed OK!

In case of any error, you may have to re-execute the backup. If everything went OK, we can remove the old data and replace it with the backup contents:

rm -rf /var/lib/mysql/*
mariabackup --copy-back --target-dir=/root/mariabackup/
…
[01] 2019-06-24 08:37:06 Copying ./sbtest/sbtest10.frm to /var/lib/mysql/sbtest/sbtest10.frm
[01] 2019-06-24 08:37:06         ...done
[00] 2019-06-24 08:37:06 completed OK!

We also want to set the correct owner of the files:

chown -R mysql.mysql /var/lib/mysql/

We will be relying on GTID to keep the replication consistent, thus we need to see the last GTID applied in this backup. That information can be found in the xtrabackup_info file that’s part of the backup:

root@vagrant:~/mariabackup# cat /var/lib/mysql/xtrabackup_info | grep binlog_pos
binlog_pos = filename 'binlog.000003', position '865364970', GTID of the last change '9999-1002-23012'

We also have to ensure that the slave node has binary logs enabled along with ‘log_slave_updates’. Ideally, this will be enabled on all of the nodes in the second cluster - just in case the “slave” node fails and you have to set up the replication using another node in the slave cluster.
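On the nodes of the second cluster this boils down to a couple of extra my.cnf lines; a minimal sketch (the domain id is an example value - it should differ from the one used on the source cluster so the two GTID streams stay distinct):

log_bin              = /var/lib/mysql-binlog/binlog
log_slave_updates    = 1      # also write replicated events to the binary log
wsrep_gtid_domain_id = 9998   # different from the "master" cluster
wsrep_gtid_mode      = ON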

The last bit we need to do before we can set up the replication is to create a user which we will use to run the replication:

MariaDB [(none)]> CREATE USER 'repuser'@'10.0.0.104' IDENTIFIED BY 'reppass';
Query OK, 0 rows affected (0.077 sec)
MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.*  TO 'repuser'@'10.0.0.104';
Query OK, 0 rows affected (0.012 sec)

That’s all we need. Now, we can start the first node in the second cluster, our to-be-slave:

galera_new_cluster

Once it’s started, we can enter the MySQL CLI and configure it to become a slave, using the GTID position we found a couple of steps earlier:

mysql -ppass
MariaDB [(none)]> SET GLOBAL gtid_slave_pos = '9999-1002-23012';
Query OK, 0 rows affected (0.026 sec)

Once that’s done, we can finally set up the replication and start it:

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_PORT=3306, MASTER_USER='repuser', MASTER_PASSWORD='reppass', MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.016 sec)
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.010 sec)

At this point we have a Galera cluster consisting of one node. That node is also a slave of the original cluster (in particular, its master is node 10.0.0.101). To join the other nodes we will use SST, but to make it work we first have to ensure that the SST configuration is correct - please keep in mind that we just replaced all the users in our second cluster with the contents of the source cluster. What you have to do now is to ensure that the ‘wsrep_sst_auth’ configuration of the second cluster matches the one of the first cluster. Once that’s done, you can start the remaining nodes one by one and they should join the existing node (10.0.0.104), get the data over SST and form the Galera cluster. Eventually, you should end up with two clusters of three nodes each, with an asynchronous replication link across them (from 10.0.0.101 to 10.0.0.104 in our example). You can confirm that the replication is working by checking the value of:

MariaDB [(none)]> show global status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 106   |
+----------------------+-------+
1 row in set (0.001 sec)
MariaDB [(none)]> show global status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 114   |
+----------------------+-------+
1 row in set (0.001 sec)

How to Configure Asynchronous Replication Between MariaDB Galera Clusters Using ClusterControl?

As of the time of this blog, ClusterControl does not have the functionality to configure asynchronous replication across multiple clusters - we are working on it as I type this. Nonetheless, ClusterControl can be of great help in this process; we will show you how you can speed up the laborious manual steps using the automation provided by ClusterControl.

From what we showed before, we can conclude that those are the general steps to take when setting up replication between two Galera clusters:

  1. Deploy a new Galera cluster
  2. Provision new cluster using data from the old one
  3. Configure new cluster (SST configuration, binary logs)
  4. Set up the replication between the old and the new cluster

The first three points are something you can easily do using ClusterControl even now. We are going to show you how to do that.

Deploy and Provision a New MariaDB Galera Cluster Using ClusterControl

The initial situation is similar - we have one cluster up and running and we have to set up a second one. One of the more recent features of ClusterControl is the option to deploy a new cluster and provision it using data from a backup. This is very useful for creating test environments, and it is also the option we will use to provision our new cluster for the replication setup. Therefore, the first step we will take is to create a backup using mariabackup:

In a few steps we pick the node to take the backup from. This node (10.0.0.101) will become the master. It has to have binary logs enabled. In our case all of the nodes have binlogs enabled, but if they hadn’t, it’s very easy to enable them from ClusterControl - we will show the steps later, when we do it for the second cluster.

Once the backup is completed, it will become visible on the list. We can then proceed and restore it:

If we wanted, we could even do a Point-In-Time Recovery, but in our case it does not really matter: once the replication is configured, all required transactions from the binlogs will be applied on the new cluster.

Then we pick the option to create a cluster from the backup. This opens another dialog:

It confirms which backup will be used, which host the backup was taken from, what method was used to create it, and some metadata to help verify whether the backup looks sound.

Then we basically go to the regular deployment wizard, in which we have to define SSH connectivity between the ClusterControl host and the nodes to deploy the cluster on (a requirement for ClusterControl) and, in the second step, the vendor, version, password and nodes to deploy on:

That’s all regarding deployment and provisioning. ClusterControl will set up the new cluster and it will provision it using the data from the old one.

We can monitor the progress in the Activity tab. Once completed, the second cluster will show up on the cluster list in ClusterControl.

Reconfiguration of the New Cluster Using ClusterControl

Now we have to reconfigure the cluster - we will enable binary logs. In the manual process we had to make changes to the wsrep_sst_auth config and also to the configuration entries in the [mysqldump] and [xtrabackup] sections of the config (those settings can be found in the secrets-backup.cnf file). This time it is not needed, as ClusterControl generated new passwords for the cluster and configured the files correctly. What is important to keep in mind, though, is that should you change the password of the ‘backupuser’@’127.0.0.1’ user in the original cluster, you will have to make the corresponding configuration changes in the second cluster too, as changes in the first cluster will replicate to the second cluster.

Binary logs can be enabled from the Nodes section. You have to pick node by node and run “Enable Binary Logging” job. You will be presented with a dialog:

Here you can define how long you would like to keep the logs, where they should be stored, and whether ClusterControl should restart the node for you to apply the changes - the binary log configuration is not dynamic and MariaDB has to be restarted to apply them.

When the changes complete, you will see all nodes marked as “master”, which means that those nodes have binary logs enabled and can act as a master.

If we do not have a replication user created already, we have to do that. In the first cluster we go to Manage -> Schemas and Users:

On the right hand side we have an option to create a new user:

This concludes the configuration required to set up the replication.

Setting Up Replication Between Clusters Using ClusterControl

As we stated, we are working on automating this part. Currently it has to be done manually. As you may remember, we need the GTID position of our backup and then run a couple of commands using the MySQL CLI. The GTID data is available in the backup. ClusterControl creates the backup using xbstream/mbstream and compresses it afterwards. Our backup is stored on the ClusterControl host, where we don’t have access to the mbstream binary. You can try to install it, or you can copy the backup file to a location where such a binary is available:

scp /root/backups/BACKUP-2/backup-full-2019-06-24_144329.xbstream.gz 10.0.0.104:/root/mariabackup/

Once that’s done, on 10.0.0.104 we want to check the contents of xtrabackup_info file:

cd /root/mariabackup
zcat backup-full-2019-06-24_144329.xbstream.gz | mbstream -x
root@vagrant:~/mariabackup# cat /root/mariabackup/xtrabackup_info | grep binlog_pos
binlog_pos = filename 'binlog.000007', position '379', GTID of the last change '9999-1002-846116'

Finally, we configure the replication and start it:

MariaDB [(none)]> SET GLOBAL gtid_slave_pos ='9999-1002-846116';
Query OK, 0 rows affected (0.024 sec)
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_PORT=3306, MASTER_USER='repuser', MASTER_PASSWORD='reppass', MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.024 sec)
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.010 sec)

That’s it - we have just configured asynchronous replication between two MariaDB Galera clusters using ClusterControl. As you can see, ClusterControl was able to automate the majority of the steps we had to take in order to set up this environment.

Announcing ClusterControl 1.7.3: Improved PostgreSQL Support & New Cloud Deployment Options


We’re excited to announce the 1.7.3 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

In this release we have added support for running multiple PostgreSQL instances on the same server as well as improvements to PgBackRest to continue our expanding support for PostgreSQL environments.

We have also added additional cluster types to our cloud deployment and support for scaling out cloud-deployed clusters with automated instance creation. You can now deploy MySQL Replication, PostgreSQL, and TimescaleDB clusters from ClusterControl onto Amazon AWS, Google Cloud Platform, and Microsoft Azure.

Release Highlights

PostgreSQL Improvements

  • Manage multiple PostgreSQL instances on the same host
  • Improvements to our support for pgBackRest by adding non-standard instance ports and custom stanzas
  • New PostgreSQL Configuration Management page to manage your database configuration files
  • Newly added metrics allowing you to monitor PostgreSQL Logical Replication setups

Improved Cloud Integration

  • Automatically launch a cloud instance and scale out your database cluster by adding a new DB node (Galera) or replication slave (Replication).
  • Deploy the following new replication database clusters:
    • Oracle MySQL Server 8.0
    • Percona Server 8.0
    • MariaDB Server 10.3
    • PostgreSQL 11.0 (Streaming Replication)
    • TimescaleDB 11.0 (Streaming Replication)

Additional Improvements

  • Backup verification jobs with xtrabackup can use the --use-memory parameter to limit the memory usage.
  • A running backup verification server now shows up in the Topology viewer
  • MongoDB sharded clusters can now add or register an existing MongoDB configuration node
  • Improved Configuration Management for MySQL, MongoDB, and MySQL NDB Cluster.
  • Improved Email Notification Settings
  • New Performance->Transaction Logs
  • Code Clean-up: legacy ExtJS pages have been migrated to AngularJS
  • CMON API Deprecation: The clustercontrol-cmonapi package is deprecated from now on, as it is no longer required for ClusterControl operations

View Release Details and Resources

Release Details

Running Multiple PostgreSQL Instances from a Single Host

Saving money and resources is something every SysAdmin and DBA is looking to do. One way this can be achieved is by leveraging the same hardware to run multiple database instances, letting the operating system (rather than separate servers) handle the traffic routing. In this release we are enabling support for this type of setup for PostgreSQL. Look for more information soon on how this can be achieved.

New Cloud Deployment Options

ClusterControl has offered the ability to deploy databases (and backup databases) in the cloud since ClusterControl 1.6. In this new version we have expanded this functionality to include new database types, rounding out support for the most popular open source databases on the market. With this release ClusterControl can now deploy…

  • MySQL
  • MySQL Galera Cluster
  • MySQL Replication
  • MariaDB
  • PostgreSQL Streaming Replication
  • Percona Server
  • TimescaleDB Streaming Replication
  • MongoDB Replica Set

How to Automate Database Failover with ClusterControl


Recovery Time Objective (RTO) is the time period within which a service must be restored to avoid unacceptable consequences. By calculating how long it may take to recover from a database failure, we know the level of preparation required. If the RTO is a few minutes, then significant investment in failover is required; an RTO of 36 hours requires a significantly lower investment. This is where failover automation comes in.

In our previous blogs, we have discussed failover for MongoDB, MySQL/MariaDB/Percona, PostgreSQL and TimescaleDB. To sum it up, “failover” is the ability of a system to continue functioning even if some failure occurs. It implies that the functions of the system are assumed by secondary components if the primary components fail. Failover is a natural part of any high availability system, and in some cases it even has to be automated. Manual failovers simply take too long, but there are cases where automation will not work well - for instance, in the case of a split brain, where database replication is broken and the two ‘halves’ keep receiving updates, effectively leading to diverging data sets and inconsistency.

We previously wrote about the guiding principles behind ClusterControl automatic failover procedures. Where possible, automated failover provides efficiency as it enables quick recovery from failures. In this blog, we’ll look at how to achieve automatic failover in a master-slave (or primary-standby) replication setup using ClusterControl.

Technology Stack Requirements

A stack can be assembled from open source software components, and there are a number of options available - some more appropriate than others, depending on failover characteristics and the level of expertise available for managing and maintaining the solution. Hardware and networking are also important aspects.

Software

There are lots of options available in the open source ecosystem that you can use to implement failover. For MySQL, you can take advantage of MHA, MMM, MaxScale/MRM, mysqlfailover, or Orchestrator. A previous blog compares MHA to MaxScale/MRM. PostgreSQL has repmgr, Patroni, PostgreSQL Automatic Failover (PAF), pglookout, pgPool-II, or stolon - these different high availability options were covered previously. MongoDB has replica sets with support for automated failover.

ClusterControl provides automatic failover functionality for MySQL, MariaDB, PostgreSQL and MongoDB, which we will cover further down. It’s worth noting that it also has functionality to automatically recover broken nodes or clusters.

Hardware

Automatic failover is typically performed by a separate daemon set up on its own hardware, separate from the database nodes. It monitors the status of the databases and uses the information to make decisions on how to react in case of failure.

Commodity servers can work fine, unless the server is monitoring a huge number of instances. Typically, system checks and health analysis are lightweight in terms of processing. However, if you have a large number of nodes to check, ample CPU and memory are a must, especially when checks have to be queued up while the server tries to ping and collect information from the nodes. The nodes being monitored and supervised might stall sometimes due to network issues or high load, or, in the worst case, they might be down due to a hardware failure or some VM host corruption. So the server that runs the health and system checks must be able to withstand such stalls, since the processing queues can grow while responses from each monitored node take time to confirm that it is no longer available or that a timeout has been reached.

For cloud-based environments, there are services that offer automatic failover. For instance, Amazon RDS uses DRBD to replicate storage to a standby node. Or if you are storing your volumes in EBS, these are replicated in multiple zones.

Network

Automated failover software often relies on agents that are set up on the database nodes. The agent harvests information locally from the database instance and sends it to the server whenever requested.

In terms of network requirements, make sure that you have good bandwidth and a stable network connection. Checks need to be performed frequently, and missed heartbeats due to an unstable network may lead the failover software to (wrongly) deduce that a node is down.

ClusterControl does not require any agent installed on the database nodes, as it will SSH into each database node at regular intervals and perform a number of checks.

Automated Failover with ClusterControl

ClusterControl offers the ability to perform manual as well as automated failovers. Let’s see how this can be done.

Failover in ClusterControl can be configured to be automatic or not. If you prefer to take care of failover manually, you can disable automated cluster recovery. When doing a manual failover, you can go to Cluster → Topology in ClusterControl. See the screenshot below:

By default, cluster recovery is enabled and automated failover is used. Once you make changes in the UI, the runtime configuration is changed. If you would like the setting to survive a restart of the controller, then make sure you also make the change in the cmon configuration, i.e. /etc/cmon.d/cmon_<cluster_id>.cnf, and set enable_cluster_autorecovery to ‘0’.
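A minimal sketch of that persistent change, assuming a cluster id of 1:

# /etc/cmon.d/cmon_1.cnf - disable automated cluster recovery/failover
enable_cluster_autorecovery=0

See the image below for an example of enabling the automatic recovery: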

In MySQL/MariaDB/Percona Server, automatic failover is initiated by ClusterControl when it detects that there is no host with the read_only flag disabled. That can happen because the master (which has read_only set to 0) is not available, or it can be triggered by a user or some external software that changed this flag on the master. If you make manual changes to the database nodes or have software that may fiddle with the read_only settings, then you should disable automatic failover. ClusterControl’s automated failover is attempted only once; therefore, a failed failover will not be followed by a subsequent failover - not until cmon is restarted.

For PostgreSQL, ClusterControl will pick the most advanced slave, using for this purpose pg_current_xlog_location (PostgreSQL 9) or pg_current_wal_lsn (PostgreSQL 10+), depending on the version of the database. ClusterControl also performs several checks over the failover process in order to avoid some common mistakes. One example is that if we manage to recover our old failed master, it will not be reintroduced automatically to the cluster, neither as a master nor as a slave; we need to do it manually. This avoids the possibility of data loss or inconsistency in case our promoted slave was delayed at the time of the failure. We might also want to analyze the issue in detail before re-introducing the old master to the replication setup, so we would want to preserve diagnostic information.

Also, if failover fails, no further attempts are made (this applies to both PostgreSQL and MySQL-based clusters); manual intervention is required to analyze the problem and perform the corresponding actions. This is to avoid the situation where ClusterControl, which handles the automatic failover, tries to promote the next slave and then the next one. There might be a problem, and we do not want to make things worse by attempting multiple failovers.

ClusterControl offers whitelisting and blacklisting of the servers that you want to participate in the failover, or to exclude as candidates.

For MySQL-type clusters, ClusterControl builds a list of slaves which can be promoted to master. Most of the time, it will contain all slaves in the topology but the user has some additional control over it. There are two variables you can set in the cmon configuration:

replication_failover_whitelist

and

replication_failover_blacklist

The configuration variable replication_failover_whitelist contains a list of IPs or hostnames of slaves which should be used as potential master candidates. If this variable is set, only those hosts will be considered. The variable replication_failover_blacklist contains a list of hosts which will never be considered master candidates. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware.

replication_failover_whitelist takes precedence, meaning the replication_failover_blacklist is ignored if replication_failover_whitelist is set.
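For illustration, a sketch of how these could look in the cmon configuration file (the IP addresses are hypothetical):

# /etc/cmon.d/cmon_<cluster_id>.cnf
# only these slaves may be promoted to master
replication_failover_whitelist=10.0.0.11,10.0.0.12
# never promote the backup/analytics slave (ignored while the whitelist is set)
replication_failover_blacklist=10.0.0.13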

Once the list of slaves which may be promoted to master is ready, ClusterControl starts to compare their states, looking for the most up-to-date slave. Here, the handling of MariaDB and MySQL-based setups differs. For MariaDB setups, ClusterControl picks the slave which has the lowest replication lag of all slaves available. For MySQL setups, ClusterControl picks such a slave as well, but then it checks for additional, missing transactions which could have been executed on some of the remaining slaves. If such a transaction is found, ClusterControl slaves the master candidate off that host in order to retrieve all missing transactions. You can skip this process and just use the most advanced slave by setting the replication_skip_apply_missing_txs variable in your cmon configuration:

e.g.

replication_skip_apply_missing_txs=1

Check our documentation here for more information about these variables.

The caveat is that you should only set this if you know what you are doing, as there might be errant transactions. These might cause replication to break, as well as data inconsistency across the cluster. If the errant transaction happened way in the past, it may no longer be available in the binary logs. In that case, replication will break because the slaves won’t be able to retrieve the missing data. Therefore, ClusterControl, by default, checks for any errant transactions before it promotes a master candidate to become a master. If such a problem is detected, the master switch is aborted and ClusterControl lets the user fix the problem manually.

If you want to be 100% certain that ClusterControl will promote a new master even if some issues are detected, you can do that using the replication_stop_on_error variable. See below:

e.g.

replication_stop_on_error=0

Set this variable in your cmon configuration file. As mentioned earlier, it may lead to problems with replication as slaves may start asking for a binary log event which is not available anymore. To handle such cases we added experimental support for slave rebuilding. If you set the variable

replication_auto_rebuild_slave=1

in the cmon configuration and if your slave is marked as down with the following error in MySQL:

Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

ClusterControl will attempt to rebuild the slave using data from the master. Such a setting may not always be appropriate as the rebuilding process will induce an increased load on the master. It may also be that your dataset is very large and a regular rebuild is not an option - that’s why this behavior is disabled by default.

Once we ensure that no errant transaction exists and we are good to go, there is still one more issue we need to handle somehow - it may happen that all slaves are lagging behind the master.

As you probably know, replication in MySQL works in a rather simple way. The master stores writes in binary logs. The slave’s I/O thread connects to the master and pulls any binary log events it is missing, then stores them in the form of relay logs. The SQL thread parses them and applies the events. Slave lag is a condition in which the SQL thread (or threads) cannot cope with the number of events and is unable to apply them as soon as they are pulled from the master by the I/O thread. Such a situation may happen no matter what type of replication you are using. Even if you use semi-sync replication, it can only guarantee that all events from the master are stored on one of the slaves in the relay log. It doesn’t say anything about applying those events to the slave.

The problem here is that, if a slave is promoted to master, relay logs will be wiped out. If a slave is lagging and hasn’t applied all transactions, it will lose data - events that are not yet applied from relay logs will be lost forever.

There is no one-size-fits-all way of solving this situation. ClusterControl gives users control over how it should be done, maintaining safe defaults. It is done in cmon configuration using the following setting:

replication_failover_wait_to_apply_timeout=-1

By default it takes a value of ‘-1’, which means that failover won’t happen while a master candidate is lagging: ClusterControl will wait indefinitely for it to apply all missing transactions from its relay logs. This is safe, but if, for some reason, the most up-to-date slave is lagging badly, failover may take hours to complete. On the other side of the spectrum is setting it to ‘0’, which means that failover happens immediately, no matter whether the master candidate is lagging or not. You can also take the middle way and set it to some value in seconds, for example 30 seconds:

replication_failover_wait_to_apply_timeout=30

When set to > 0, ClusterControl will wait for the master candidate to apply missing transactions from its relay logs until the value is met (30 seconds in the example). Failover happens after the defined time or when the master candidate catches up on replication, whichever happens first. This may be a good choice if your application has specific requirements regarding downtime and you have to elect a new master within a short time window.

For more details about how ClusterControl handles automatic failover in PostgreSQL and MySQL, check out our previous blogs titled "Failover for PostgreSQL Replication 101" and "Automatic failover of MySQL Replication - New in ClusterControl 1.4".

Conclusion

Automated Failover is a valuable feature, especially for businesses that require 24/7 operations with minimal downtime. The business must define how much control is given up to the automation process during unplanned outages. A high availability solution like ClusterControl offers a customizable level of interaction in failover processing. For some organizations, automated failover may not be an option, even though the user interaction during failover can eat time and impact RTO. The assumption is that it is too risky in case automated failover does not work correctly or, even worse, it results in data being messed up and partially missing (although one might argue that a human can also make disastrous mistakes leading to similar consequences). Those who prefer to keep close control over their database may choose to skip automated failover and use a manual process instead. Such a process takes more time, but it allows an experienced admin to assess the state of a system and take corrective actions based on what happened.

How to Manage MariaDB 10.3 with ClusterControl


MariaDB Server is no longer a straight imitation of MySQL. It has grown into a mature fork, implementing new functionality similar to what proprietary database systems offer, ahead of the upstream. MariaDB 10.3 greatly extends the list of enterprise features, and with the new SQL_MODE=Oracle it becomes an exciting choice for companies that would like to migrate their Oracle databases to an open source database. However, operational management is an area where there is still some catching up to do, and MariaDB requires that you build your own scripts.

Perhaps a good opportunity to look into an automation system?

Automated procedures are accurate and consistent. They can give you much-needed repeatability so you can minimize the risk of change in production systems. However, as modern open source databases develop so fast, it’s more challenging to keep your management systems on par with all the new features.

The natural next step is to look for automation platforms. There are many platforms that you can use to deploy systems. Puppet, Chef, and Ansible are probably the best examples of that new trend. These platforms are suitable for the fast deployment of various software services. They are perfect for deployments, but still require you to maintain the code, cover feature changes, and usually, they cover just one aspect of your work. Things like backups, performance, and maintenance still need external tools or scripts.

On the other side, we have cloud platforms, with polished interfaces and a variety of additional services for a fully managed experience. However, they may not be feasible; for instance, in hybrid environments where you might be using the cloud but still have a significant on-prem footprint.

So, how about a dedicated management layer for your MariaDB databases?

ClusterControl was designed to automate the deployment and management of MariaDB as well as other open-source databases. At the core of ClusterControl is functionality that lets you automate the database tasks you have to perform regularly, like deploying new database instances and clusters, managing backups, high availability and failover, topology changes, upgrades, scaling new nodes and more.

ClusterControl installation

To start with ClusterControl, you need a dedicated virtual machine or host. The VM and supported system requirements are described here. At minimum, you can start with a tiny VM with 2 GB RAM, 2 CPU cores and 20 GB of storage space, either on-prem or in the cloud.

The primary installation method is to download an installation wizard that walks you through all the steps (OS configuration, package download and installation, metadata creation, and others).

For environments without internet access, you can use the offline installation process.

ClusterControl is agentless so you don't need to install additional software. It requires only SSH access to the database hosts. It also supports agent-based monitoring for higher resolution monitoring data.

To set up passwordless SSH to all target nodes (ClusterControl and all database hosts), run the following commands on the ClusterControl server:

$ ssh-keygen -t rsa # press enter on all prompts
$ ssh-copy-id -i ~/.ssh/id_rsa [ClusterControl IP address]
$ ssh-copy-id -i ~/.ssh/id_rsa [Database nodes IP address] # repeat this to all target database nodes

One of the most convenient ways to try out ClusterControl may be to run it in a Docker container:

docker run -d --name clustercontrol \
--network db-cluster \
--ip 192.168.10.10 \
-h clustercontrol \
-p 5000:80 \
-p 5001:443 \
-v /storage/clustercontrol/cmon.d:/etc/cmon.d \
-v /storage/clustercontrol/datadir:/var/lib/mysql \
-v /storage/clustercontrol/sshkey:/root/.ssh \
-v /storage/clustercontrol/cmonlib:/var/lib/cmon \
-v /storage/clustercontrol/backups:/root/backups \
severalnines/clustercontrol

After successful deployment, you should be able to access the ClusterControl Web UI at {host's IP address}:{host's port}, for example:

HTTP: http://192.168.10.100:5000/clustercontrol
HTTPS: https://192.168.10.100:5001/clustercontrol

Installation of MariaDB Cluster

Once we enter the ClusterControl interface, the first thing to do is to deploy a new database or import an existing one. Version 1.7.2 introduced support for MariaDB 10.3 (along with 10.0, 10.1 and 10.2). In 1.7.3, which was released this week, we can see improved deployment of installations in the cloud.

ClusterControl: Deploy/Import

At the time of writing this blog, the current version is 10.3.16. The latest packages are picked up by default. Select the option "Deploy Database Cluster" and follow the instructions that appear.

Now it is time to provide the data needed for the connection between ClusterControl and the DB nodes. At this step, you would have clean VMs or the OS images you use inside your organization. When choosing MariaDB, we must specify the user, key or password, and port to connect by SSH to our servers.

ClusterControl: Deploy Database Cluster

After setting up the SSH access information, we must enter the data to access our database; for MariaDB that will be the superuser root. We can also specify which repository to use. You can have three types of repositories when deploying a database server/cluster using ClusterControl:

  • Use Vendor Repository. Provision software by setting up and using the database vendor's preferred software repository. ClusterControl will install the latest version of what is provided by the database vendor repository.
  • Do Not Setup Vendor Repositories. No repositories will be set up by ClusterControl. ClusterControl will rely on the system configuration (your default repository files).
  • Create and mirror the current database vendor's repository and then deploy using the local mirrored repository. This allows you to "freeze" the current versions of the software packages.

When all is set, hit the deploy button. The deployment process will also take care of the installation of additional tools provided by MariaDB, like mariabackup, as well as popular database administration tools from external vendors.

Import a New Cluster

We also have the option to manage an existing setup by importing it into ClusterControl. Such an environment could have been created by ClusterControl or by other methods (Puppet, Chef, Ansible, Docker, …). The process is simple and doesn’t require specialized knowledge.

First, we must enter the SSH access credentials to our existing database servers. Then we enter the access credentials to our database, the server data directory, and the version. We add the nodes by IP or hostname, in the same way as when we deploy, and press Import. Once the task is finished, we are ready to manage our cluster from ClusterControl. At this point, we can also define the options for node or cluster auto-recovery.

ClusterControl: Import existing 10.3 database cluster

Scaling MariaDB, Adding More Nodes to DB Cluster

With ClusterControl, adding more servers to the cluster is an easy step. You can do that from the GUI or the CLI. For more advanced users, you can use ClusterControl Developer Studio and write a resource-based condition to expand your cluster automatically.

ClusterControl: Adding MariaDB Node

ClusterControl supports an option to use an existing backup, so there is no need to overwhelm the production master node with additional work.

Securing MariaDB

The default MariaDB installation comes with relaxed security. This has improved in recent versions; however, production-grade systems still require tweaks to the default my.cnf configuration. ClusterControl deployments come with non-default my.cnf settings (different for different cluster types).

ClusterControl removes human error and provides access to a suite of security features, to automatically protect your databases from hacks and other threats.

ClusterControl: Security Panel

ClusterControl enables SSL support for MariaDB connections. Enabling SSL adds another level of security for communication between the applications (including ClusterControl) and database. MariaDB clients open encrypted connections to the database servers and verify the identity of those servers before transferring any sensitive information.

ClusterControl will execute all necessary steps, including creating certificates on all database nodes. Such certificates can be maintained later on in the Key Management tab.

With ClusterControl you can also enable auditing. It uses the audit plugin provided by MariaDB. Continuous auditing is an imperative task for monitoring your database environment. By auditing your database, you can achieve accountability for actions taken or content accessed. Moreover, the audit may include critical system components, such as the ones associated with financial data, to support regulations like SOX or the EU GDPR. The guided process lets you choose what should be audited and how to maintain the audit log files.
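Under the hood this relies on the MariaDB audit plugin; if you were enabling it by hand instead of through the guided process, the SQL would look roughly like this (the selection of event classes is just an example):

-- load the MariaDB audit plugin and switch logging on
INSTALL SONAME 'server_audit';
SET GLOBAL server_audit_events = 'CONNECT,QUERY_DDL,QUERY_DCL';
SET GLOBAL server_audit_logging = ON;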

Monitoring and Alerting

When working with database systems, you should be able to monitor them. That will enable you to identify trends, plan for upgrades or improvements or react effectively to any problems or errors that may arise.

ClusterControl: Overview

The new ClusterControl uses Prometheus as the data store, with the PromQL query language. The list of dashboards includes Server General, Server Caches, InnoDB Metrics, Replication Master, Replication Slave, System Overview, and Cluster Overview.

ClusterControl: Dashboard

ClusterControl installs Prometheus agents, configures metrics, and maintains access to the Prometheus exporter configuration via its GUI, so you can better manage parameters like collector flags for the exporters.

As database operators, we need to be informed whenever something critical occurs in our database. The three main methods in ClusterControl to get an alert include:

  • email notifications
  • integrations
  • advisors
ClusterControl: Integration Services

You can set email notifications on a user level. Go to Settings > Email Notifications, where you can choose the criticality and type of alerts to be sent.

The next method is to use the Integration services. These pass specific categories of events to other services like ServiceNow, Slack, PagerDuty, etc., so you can create advanced notification methods and integrations within your organization.

The last one is to involve sophisticated metrics analysis in the Advisor section, where you can build intelligent checks and triggers.

ClusterControl: Advisors

SQL Monitoring

The SQL Monitoring is divided into three sections.

  • Top Queries - presents information about queries that take a significant chunk of resources.
    Query Monitor: Top Queries
  • Running Queries - a processlist of information combined from all database cluster nodes into one view. You can use it to kill queries that affect your database operations.
    Query Monitor: Running Queries
  • Query Outliers - presents the list of queries with execution times longer than average.
    Query Monitor: Query Outliers

Backup and Recovery

Now that you have your MariaDB up and running, and have your monitoring in place, it is time for the next step: ensure you have a backup of your data.

ClusterControl: Backup Repository

ClusterControl provides an interface for MariaDB backup management, with support for scheduling and creating reports. It gives you two options for backup methods:

  • Logical backup (text): mysqldump
  • Binary backups: xtrabackup (lower versions), mariabackup

A good backup strategy is a critical part of any database management system. ClusterControl offers many options for backups and recovery/restore.

ClusterControl backup retention is configurable; you can choose to retain your backups for any time period or to never delete them. AES256 encryption is employed to secure your backups against rogue elements. For rapid recovery, backups can be restored directly into a new cluster - ClusterControl handles the full restore process from the launch of a new database setup to the recovery of data, removing error-prone manual steps from the process.

Backups can be automatically verified upon completion, and then uploaded to cloud storage services (AWS, Azure and Google). Different retention policies can be defined for local backups in the data center as well as backups that are uploaded in the cloud.
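The same backup operations can also be scripted through the s9s CLI; a rough sketch of an ad-hoc full backup (the cluster id, node and directory are placeholders, and the exact method name may vary with your s9s version):

s9s backup --create --backup-method=mariabackupfull \
    --cluster-id=1 \
    --nodes=10.0.0.11:3306 \
    --backup-directory=/var/backups \
    --wait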

Node and Cluster Auto-Recovery

ClusterControl provides advanced support for failure detection and handling. It also allows you to deploy different proxies to integrate with your HA stack, so there is no need to adjust the application connection string or DNS entry to redirect the application to the new master node.

When the master server is down, ClusterControl will create a job to perform automatic failover. ClusterControl does all the background work to elect a new master, deploy failover slave servers, and configure load balancers.

ClusterControl automatic failover was designed with the following principles:

  • Make sure the master is really dead before you failover
  • Failover only once
  • Do not failover to an inconsistent slave
  • Only write to the master
  • Do not automatically recover the failed master

With the built-in algorithms, failover can often be performed pretty quickly, so you can assure the highest SLAs for your database environment.

ClusterControl: Auto Recovery

The process is highly configurable. It comes with multiple parameters that you can use to adapt recovery to the specifics of your environment. Among the different options you can find replication_stop_on_error, replication_auto_rebuild_slave, replication_failover_blacklist, replication_failover_whitelist, replication_skip_apply_missing_txs, replication_onfail_failover_script and many others.

Failover is the process of moving to a healthy standby component during a failure or maintenance event in order to preserve uptime. The quicker it can be done, the faster you can be back online. If you're looking to minimize downtime and meet your SLAs through an automated approach for MariaDB, then this blog is for you.

MaxScale Load Balancer

In addition to MariaDB 10.3, ClusterControl adds the option of the MaxScale 2.3 load balancer. MaxScale is a SQL-aware proxy that can be used to build highly available environments. It comes with numerous features; however, the main goal is to enable load balancing and high availability.

ClusterControl: MaxScale

MaxScale can be used to track the health of the master MariaDB node and, should it fail, perform a fast, automatic failover. Automated failover is crucial in building up a highly available solution that can recover promptly from the failure.

Load Balance Database Sessions

Read-write splitting is a critical feature for read scaling. It is enough for the application to connect to MaxScale; it detects the topology, determines which MariaDB node acts as a master and which act as slaves, and routes the traffic accordingly.
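For reference, the read-write split is just a service definition in MaxScale's maxscale.cnf; a minimal sketch (server names, credentials and port are placeholders - ClusterControl generates the real file for you):

[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2,server3
user=maxscale
password=maxscalepw

[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=4006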

Summary

We hope this blog helps you get familiar with ClusterControl and the MariaDB 10.3 administration modules. The best option is to download ClusterControl and test each of them.

Database Failover for WordPress Websites


Every profitable enterprise requires high availability. Websites and blogs are no different, as even smaller companies and individuals require their sites to stay live to keep their reputation.

WordPress is, by far, the most popular CMS in the world, powering millions of websites from small to large. But how can you ensure that your website stays live? More specifically, how can you ensure that the unavailability of your database will not impact your website?

In this blog post we will show how to achieve failover for your WordPress website using ClusterControl.

The setup we will use for this blog is based on Percona Server 5.7. We will have another host which contains the Apache and WordPress application. We will not touch the application's high-availability portion, but this is also something you want to make sure to have. We will use ClusterControl, installed on a third host, to manage the databases and ensure their availability.

Assuming that ClusterControl is up and running, we will need to import our existing database into it.

Importing a Database Cluster with ClusterControl

ClusterControl Import Cluster

Go to the Import Existing Server/Database option in the deployment wizard.

Importing an Existing Cluster with ClusterControl

We have to configure the SSH connectivity as this is a requirement for ClusterControl to be able to manage the nodes.

Configuring an Imported Cluster with ClusterControl

We now have to define some details about the vendor, version, root user access, the node itself, and if we want ClusterControl to manage autorecovery for us or not. That’s all, once the job succeeds, you will be presented with a cluster on the list.

Database Cluster List

To set up the highly-available environment, we need to execute a couple of actions. Our environment will consist of:

  • Master - Slave pair
  • Two ProxySQL instances for read/write split and topology detection
  • Two Keepalived instances for Virtual IP management

The idea is simple - we will deploy a slave to our master so we have a second instance to fail over to should the master fail. ClusterControl will be responsible for failure detection and it will promote the slave should the master become unavailable. ProxySQL will keep track of the replication topology and redirect the traffic to the correct node: writes will be sent to the master, no matter which node that is; reads can either be sent to the master only or distributed across the master and slaves. Finally, Keepalived will be collocated with ProxySQL and will provide a VIP for the application to connect to. That VIP will always be assigned to one of the ProxySQL instances, and Keepalived will move it to the second one should the “main” ProxySQL node fail.

Having said all of that, let’s configure this using ClusterControl. All of it can be done in just a couple of clicks. We’ll start with adding the slave.

Adding a Database Slave with ClusterControl

Adding a Database Slave with ClusterControl

We start by picking the “Add Replication Slave” job. Then we are asked to fill in a form:

Adding a Replication Slave

We have to pick the master (in our case we don’t really have many options) and pass the IP or hostname of the new slave. If we had previously created backups, we could use one of them to provision the slave. In our case this is not available, so ClusterControl will provision the slave directly from the master. That’s all; the job starts and ClusterControl performs the required actions. You can monitor the progress in the Activity tab.

ClusterControl Activity Tab

Finally, once the job completes successfully, the slave should be visible on the cluster list.

Cluster List

Now we will proceed with configuring the ProxySQL instances. In our case the environment is minimal so, to keep things simpler, we will locate ProxySQL on one of the database nodes. This is not, however, the best option in a real production environment. Ideally, ProxySQL would either be located on a separate node or collocated with the other application hosts.

Configure ProxySQL ClusterControl

The place to start the job is Manage -> Loadbalancers.

ProxySQL Load Balancer Configuration ClusterControl

Here you have to pick where ProxySQL should be installed, pass administrative credentials, and add a database user. In our case, we will use our existing user, as our WordPress application already uses it for connecting to the database. We then have to pick which nodes to use in ProxySQL (we want both master and slave here) and let ClusterControl know whether we use explicit transactions. This is not really relevant in our case, as we will reconfigure ProxySQL once it is deployed. When that option is enabled, read/write split will not be enabled; otherwise ClusterControl will configure ProxySQL for read/write split. In our minimal setup we should seriously consider whether we want the read/write split to happen. Let’s analyze that.

The Advantages & Disadvantages of Read/Write Split in ProxySQL

The main advantage of using the read/write split is that all the SELECT traffic will be distributed between the master and the slave. This means that the load on the nodes will be lower and response time should also be lower. This sounds good but keep in mind that should one node fail, the other node will have to be able to accommodate all of the traffic. There is little point in having automated failover in place if the loss of one node means that the second node will be overloaded and, de facto, unavailable too. 

It might make sense to distribute the load if you have multiple slaves - losing one node out of five is less impactful than losing one out of two. No matter what you decide on, you can easily change the behavior by going to ProxySQL node and clicking on the Rules tab.

ProxySQL Rules - ClusterControl

Make sure to look at rule 200 (the one which catches all SELECT statements). On the screenshot below you can see that the destination hostgroup is 20, which means all nodes in the cluster - read/write split and scale-out are enabled. We can easily disable this by editing the rule and changing the Destination Hostgroup to 10 (the one which contains the master).

ProxySQL Configuration - ClusterControl

If you would like to enable the read/write split later, you can easily do so by editing this query rule again and setting the destination hostgroup back to 20.
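If you prefer the command line, the same change can be made through the ProxySQL admin interface (port 6032; the admin credentials below are ProxySQL defaults and may differ in your deployment):

$ mysql -u admin -padmin -h 127.0.0.1 -P 6032
mysql> UPDATE mysql_query_rules SET destination_hostgroup = 20 WHERE rule_id = 200;
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
mysql> SAVE MYSQL QUERY RULES TO DISK;

The LOAD and SAVE statements are what make the change take effect immediately and survive a ProxySQL restart.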

Now, let’s deploy the second ProxySQL.

Deploy ProxySQL ClusterControl

To avoid passing all the configuration options again we can use the “Import Configuration” option and pick our existing ProxySQL as the source.

When this job completes, we still have to perform the last step in setting up our environment: we have to deploy Keepalived on top of the ProxySQL instances.

Deploying Keepalived on Top of ProxySQL Instances

Deploy Keepalived with ProxySQL - ClusterControl

Here we picked ProxySQL as the load balancer type, passed both ProxySQL instances for Keepalived to be installed on, and typed in our VIP and network interface.

Topology View - ClusterControl

As you can see, we now have the whole setup up and ready. We have a VIP of 10.0.0.111 which is assigned to one of the ProxySQL instances. The ProxySQL instances will redirect our traffic to the correct backend MySQL nodes and ClusterControl will keep an eye on the environment, performing a failover if needed. The last action we have to take is to reconfigure WordPress to use the Virtual IP to connect to the database.

To do that, we have to edit wp-config.php and change the DB_HOST variable to our Virtual IP:

/** MySQL hostname */
define( 'DB_HOST', '10.0.0.111' );
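Before switching the application over, it is worth verifying that the database is reachable through the VIP. A quick check from the application host might look like this (the user name is a placeholder, and 6033 is ProxySQL's default client port - omit -P if your ProxySQL listens on 3306):

$ mysql -h 10.0.0.111 -P 6033 -u wordpress_user -p \
    -e "SELECT @@hostname AS served_by, @@read_only;"

The served_by column shows which backend MySQL node answered the query through ProxySQL.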

Conclusion

From now on WordPress will connect to the database using the VIP and ProxySQL. Should the master node fail, ClusterControl will perform the failover.

ClusterControl Failover with ProxySQL

As you can see, a new master has been elected and ProxySQL now points to the new master in hostgroup 10.

We hope this blog post gives you some idea of how to design a highly-available database environment for a WordPress website and how ClusterControl can be used to deploy all of its elements.

Monitoring & Ops Management of MySQL 8.0 with ClusterControl


Users of open source databases often have to use a mixture of tools and homegrown scripts to manage their production database environments. Even with homegrown scripts in place, however, it is hard to maintain them and keep up with new database features, security requirements, or upgrades. With new major versions of a database, including MySQL 8.0, this task becomes even harder.

At the heart of ClusterControl is its automation functionality that lets you automate the database tasks you have to perform regularly, like deploying new databases, adding and scaling new nodes, managing backups, high availability and failover, topology changes, upgrades, and more. Automated procedures are accurate, consistent, and repeatable so you can minimize the risk of changes on the production environments.

Moreover, with ClusterControl, MySQL users are no longer subject to vendor lock-in - a concern raised by many recently. You can deploy and import a variety of MySQL versions and vendors from a single console for free.

In this article, we will show you how to deploy MySQL 8.0 with a battle-tested configuration and manage it in an automated way. We will cover:

  • ClusterControl installation
  • MySQL deployment process
    • Deploy a new cluster
    • Import existing cluster
  • Scaling MySQL
  • Securing MySQL
  • Monitoring and Trending
  • Backup and Recovery
  • Node and Cluster autorecovery (auto failover)

ClusterControl installation

To start with ClusterControl you need a dedicated virtual machine or host. The VM and supported system requirements are described here. The base VM can start from 2 GB of RAM, 2 CPU cores, and 20 GB of disk space, either on-prem or in the cloud.

The installation is well described in the documentation, but basically you download an installation script which walks you through the steps. The wizard script sets up the internal database, installs the necessary packages and repositories, and applies other necessary tweaks. For environments without internet access, you can use the offline installation process.
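In practice, the online installation boils down to a few commands like these (the script location shown is the one documented by Severalnines at the time of writing; run as root or a sudo user):

$ wget https://severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
$ ./install-cc    # the wizard prompts for the remaining details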

ClusterControl requires SSH access to the database hosts, and monitoring can be agent-based or agentless. Management is agentless.

To set up passwordless SSH to all target nodes (ClusterControl and all database hosts), run the following commands on the ClusterControl server:

$ ssh-keygen -t rsa # press enter on all prompts
$ ssh-copy-id -i ~/.ssh/id_rsa [ClusterControl IP address]
$ ssh-copy-id -i ~/.ssh/id_rsa [Database nodes IP address] # repeat this to all target database nodes

One of the most convenient ways to try out ClusterControl may be to run it in a Docker container:

docker run -d --name clustercontrol \
--network db-cluster \
--ip 192.168.10.10 \
-h clustercontrol \
-p 5000:80 \
-p 5001:443 \
-v /storage/clustercontrol/cmon.d:/etc/cmon.d \
-v /storage/clustercontrol/datadir:/var/lib/mysql \
-v /storage/clustercontrol/sshkey:/root/.ssh \
-v /storage/clustercontrol/cmonlib:/var/lib/cmon \
-v /storage/clustercontrol/backups:/root/backups \
severalnines/clustercontrol

After successful deployment, you should be able to access the ClusterControl Web UI at {host's IP address}:{host's port}, for example:

HTTP: http://192.168.10.100:5000/clustercontrol
HTTPS: https://192.168.10.100:5001/clustercontrol

Deployment and Scaling

Deploy MySQL 8.0

Once we enter the ClusterControl interface, the first thing to do is to deploy a new database or import an existing one. The new version 1.7.2 introduces support for version 8.0 of Oracle MySQL Community Edition and Percona Server. At the time of writing, the current versions are Oracle MySQL Server 8.0.15 and Percona Server for MySQL 8.0.15. Select the option “Deploy Database Cluster” and follow the instructions that appear.

ClusterControl: Deploy Database Cluster

When choosing MySQL, we must specify the user, key or password, and port for connecting over SSH to our servers. We also need a name for our new cluster and to choose whether we want ClusterControl to install the corresponding software and configurations for us.

After setting up the SSH access information, we must enter the data to access our database. We can also specify which repository to use. Repository configuration is an important aspect for database servers and clusters. You can have three types of repositories when deploying database server/cluster using ClusterControl:

  • Use Vendor Repository
    Provision software by setting up and using the database vendor’s preferred software repository. ClusterControl will install the latest version of what is provided by the database vendor repository.
  • Do Not Setup Vendor Repositories
    Provision software by using the pre-existing software repository already set up on the nodes. The user has to set up the software repository manually on each database node, and ClusterControl will use this repository for deployment. This is good if the database nodes are running without an internet connection.
  • Use Mirrored Repositories (Create new repository)
    Create and mirror the current database vendor’s repository and then deploy using the local mirrored repository. This allows you to “freeze” the current versions of the software packages.

In the next step, we need to add our servers to the cluster that we are going to create. When adding our servers, we can enter an IP or hostname and then choose the network interface. For the latter, we must have a DNS server, or have added our MySQL servers to the local resolution file (/etc/hosts) on the ClusterControl host, so it can resolve the corresponding names we want to add.
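For example, the /etc/hosts file on the ClusterControl host could contain entries like the following (IP addresses and hostnames are illustrative):

# /etc/hosts on the ClusterControl host
10.0.0.11    mysql-master-1
10.0.0.12    mysql-slave-1
10.0.0.13    mysql-slave-2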

On the screen we can see an example deployment with one master and two slave servers. The server list is dynamic and allows you to create sophisticated topologies which can be extended after the initial installation.

ClusterControl: Define Topology

When all is set, hit the deploy button. You can monitor the status of the creation of your new replication setup from the ClusterControl activity monitor. The deployment process will also take care of the installation of popular MySQL tools like Percona Toolkit and Percona XtraBackup.

ClusterControl: Deploy Cluster Details

Once the task is finished, we can see our cluster in the main ClusterControl screen and on the topology view. Note that we also added a load balancer (ProxySQL) in front of the database instances.

ClusterControl: Topology

As we can see in the image, once we have our cluster created, we can perform several tasks on it, directly from the topology section.

ClusterControl: Topology Management

Import a New Cluster

We also have the option to manage an existing setup by importing it into ClusterControl. Such an environment can have been created by ClusterControl or by other methods (Puppet, Chef, Ansible, Docker, ...). The process is simple and doesn’t require specialized knowledge.

ClusterControl: Import Existing Cluster

First, we must enter the SSH access credentials to our servers. Then we enter the access credentials to our database, the server data directory, and the version. We add the nodes by IP or hostname, in the same way as when we deploy, and press on Import. Once the task is finished, we are ready to manage our cluster from ClusterControl. At this point we can also define the options for node or cluster auto recovery.

Scaling MySQL

With ClusterControl, adding more servers to the cluster is an easy step. You can do that from the GUI or the CLI. For more advanced users, you can use ClusterControl Developer Studio and write resource-based conditions to expand your cluster automatically.

When adding a new node to the setup, you have the option to use an existing backup, so there is no need to overwhelm the production master node with additional work.
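As a sketch of the CLI route, adding a new replication slave with the s9s command line client might look like this (the cluster ID and node IP are placeholders for your own values):

$ s9s cluster --add-node \
    --cluster-id=1 \
    --nodes="10.0.0.14" \
    --wait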

ClusterControl Scaling MySQL

With the built-in support for load balancers (ProxySQL, MaxScale, HAProxy), you can add and remove MySQL nodes dynamically. If you wish to know more in-depth how best to manage MySQL replication and clustering, please read the MySQL Replication for High Availability whitepaper.

Securing MySQL

MySQL comes with very little security out of the box. This has improved with recent versions; however, production-grade systems still require tweaks to the default my.cnf configuration.

ClusterControl removes human error and provides access to a suite of security features, to automatically protect your databases from hacks and other threats.

ClusterControl enables SSL support for MySQL connections. Enabling SSL adds another level of security for communication between the applications (including ClusterControl) and database. MySQL clients open encrypted connections to the database servers and verify the identity of those servers before transferring any sensitive information.

ClusterControl will execute all necessary steps, including creating certificates on all database nodes. Such certificates can be maintained later on in the Key Management tab.

ClusterControl: Manage SSL Keys

Percona Server installations come with additional support for an audit plugin. Continuous auditing is an imperative task for monitoring your database environment. By auditing your database, you can achieve accountability for actions taken or content accessed. Moreover, the audit may cover some critical system components, such as those associated with financial data, to support a precise set of regulations like SOX or the EU GDPR. The guided process lets you choose what should be audited and how to maintain the audit log files.

ClusterControl: Enable Audit Log for Percona Server 8.0

Monitoring

When working with database systems, you should be able to monitor them. That will enable you to identify trends, plan for upgrades or improvements or react effectively to any problems or errors that may arise.

The new ClusterControl 1.7.2 comes with updated high-resolution monitoring for MySQL 8.0. It uses Prometheus as the data store, with the PromQL query language. The list of dashboards includes MySQL Server General, MySQL Server Caches, MySQL InnoDB Metrics, MySQL Replication Master, MySQL Replication Slave, System Overview, and Cluster Overview.

ClusterControl installs Prometheus agents, configures metrics, and maintains access to the Prometheus exporter configuration via its GUI, so you can better manage parameters like collector flags for the exporters. We recently described in detail what can be monitored in the article How to Monitor MySQL with Prometheus & ClusterControl.

ClusterControl: Dashboard

Alerting

As database operators, we need to be informed whenever something critical occurs in our databases. The three main methods in ClusterControl to get an alert include:

  • email notifications
  • integrations
  • advisors

You can set the email notifications at the user level. Go to Settings > Email Notifications, where you can choose the criticality and type of alerts to be sent.

ClusterControl: Notification

The next method is to use integration services. These pass specific categories of events to other services like ServiceNow, Slack, PagerDuty, etc., so you can create advanced notification methods and integrations within your organization.

ClusterControl: Integration

The last one involves sophisticated metrics analysis in the Advisors section, where you can build intelligent checks and triggers.

ClusterControl: Automatic Advisors

Backup and Recovery

Now that you have your MySQL up and running, and have your monitoring in place, it is time for the next step: ensure you have a backup of your data.

ClusterControl: Create Backup

ClusterControl provides an interface for MySQL backup management with support for scheduling and reporting. It gives you two options for backup methods.

  • Logical: mysqldump
  • Binary: xtrabackup/mariabackup

ClusterControl: Create Backup Options

A good backup strategy is a critical part of any database management system. ClusterControl offers many options for backups and recovery/restore.

ClusterControl: Backup schedule and Backup Repository

ClusterControl backup retention is configurable; you can choose to retain your backup for any time period or to never delete backups. AES256 encryption is employed to secure your backups against rogue elements. For rapid recovery, backups can be restored directly into a new cluster - ClusterControl handles the full restore process from the launch of a new database setup to the recovery of data, removing error-prone manual steps from the process.

Backups can be automatically verified upon completion, and then uploaded to cloud storage services (AWS, Azure and Google). Different retention policies can be defined for local backups in the datacenter as well as backups that are uploaded in the cloud.

Node and cluster autorecovery

ClusterControl provides advanced support for failure detection and handling. It also allows you to deploy different proxies and integrate them with your HA stack, so there is no need to adjust the application connection string or DNS entry to redirect the application to the new master node.

When the master server is down, ClusterControl will create a job to perform automatic failover. ClusterControl does all the background work to elect a new master, deploy failover slave servers, and configure load balancers.

ClusterControl: Node autorecovery

ClusterControl automatic failover was designed with the following principles:

  • Make sure the master is really dead before you failover
  • Failover only once
  • Do not failover to an inconsistent slave
  • Only write to the master
  • Do not automatically recover the failed master

With the built-in algorithms, failover can often be performed very quickly, so you can ensure the highest SLAs for your database environment.

The process is highly configurable. It comes with multiple parameters which you can use to adapt recovery to the specifics of your environment. Among the different options you can find replication_stop_on_error, replication_auto_rebuild_slave, replication_failover_blacklist, replication_failover_whitelist, replication_skip_apply_missing_txs, replication_onfail_failover_script and many others.

Comparing DBaaS Failover Solutions to Manual Recovery Setups


We have recently written several blogs covering how different cloud providers handle database failover. We compared failover performance in Amazon Aurora, Amazon RDS and ClusterControl, tested the failover behavior in Amazon RDS, and also on Google Cloud Platform. While those services provide great options when it comes to failover, they may not be right for every application.

In this blog post we will spend a bit of time analysing the pros and cons of using the DBaaS solutions compared with designing an environment manually or by using a database management platform, like ClusterControl.

Implementing High Availability Databases with Managed Solutions

The primary reason to use existing solutions is ease of use. You can deploy a highly available solution with automated failover in just a couple of clicks. There’s no need to combine different tools, manage the databases by hand, deploy tools, write scripts, design the monitoring, or perform any other database management operations. Everything is already in place. This can seriously reduce the learning curve and requires less experience to set up a highly-available environment for the databases, allowing basically anyone to deploy such setups.

In most cases with these solutions, the failover process is executed within a reasonable time. It may be blazing fast, as with Amazon Aurora, or somewhat slower, as with Google Cloud Platform SQL nodes. For the majority of cases, these types of results are acceptable.

The bottom line: if you can accept 30-60 seconds of downtime, you should be OK using any of the DBaaS platforms.

The Downside of Using a Managed Solution for HA

While DBaaS solutions are simple to use, they also come with some serious drawbacks. For starters, there is always a vendor lock-in component to consider. Once you deploy a cluster in Amazon Web Services, it is quite tricky to migrate out of that provider. There are no easy methods to download the full dataset through a physical backup; with most providers, only manually executed logical backups are available. Sure, there are always options to achieve this, but it is typically a complex, time-consuming process, which may still require some downtime after all.

Using a provider like Amazon RDS also comes with limitations. Some actions which would be very simple to accomplish in environments deployed in a fully user-controlled manner (e.g. on AWS EC2) cannot be easily performed. Some of these limitations have already been covered in other blogs, but to summarize: no DBaaS service gives you the same level of flexibility as regular MySQL GTID-based replication. You can promote any slave, you can re-slave every node off any other... virtually every action is possible. With tools like RDS you face design-induced limitations you cannot bypass.

Another problem is the ability to understand performance details. When you design your own highly available setup, you become knowledgeable about potential performance issues that may show up. On the other hand, RDS and similar environments are pretty much “black boxes.” Yes, we have learned that Amazon RDS uses DRBD to create a shadow copy of the master, and we know that Aurora uses shared, replicated storage to implement very fast failovers. But that is just general knowledge. We cannot tell what the performance implications of those solutions are beyond what we might casually notice. What are the common issues associated with them? How stable are those solutions? Only the developers behind the solution know for sure.

What is the Alternative to DBaaS Solutions?

You may wonder, is there an alternative to DBaaS? After all, it is so convenient to run a managed service where you can access most of the typical actions via the UI. You can create and restore backups, and failover is handled automatically for you. The environment is easy to use, which can be compelling for companies that do not have dedicated, experienced staff for dealing with databases.

ClusterControl provides a great alternative to cloud-based DBaaS services. It provides you with a graphical user interface, which can be used to deploy, manage, and monitor open source databases. 

In a couple of clicks you can easily deploy a highly-available database cluster, with automated failover (faster than most of the DBaaS offerings), backup management, advanced monitoring, and other features like integration with external tools (e.g. Slack or PagerDuty) or upgrade management. All this while completely avoiding vendor lock-in.

ClusterControl doesn’t care where your databases are located as long as it can connect to them using SSH. You can have setups in the cloud, on-prem, or in a mixed environment spanning multiple cloud providers. As long as connectivity is there, ClusterControl will be able to manage the environment. Utilizing the solutions you want (rather than ones you are not familiar with) allows you to take full control over the environment at any point in time.

Whatever setup you deploy with ClusterControl, you can easily manage it in a more traditional, manual, or scripted way. ClusterControl even provides you with a command line interface, which lets you incorporate tasks executed by ClusterControl into your shell scripts. You have all the control you want - nothing is a black box; every piece of the environment is built using open source solutions combined together and deployed by ClusterControl.
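For example, listing your clusters and triggering a backup from a shell script might look like this with the s9s client (the cluster ID, node, and backup method below are illustrative):

$ s9s cluster --list --long
$ s9s backup --create \
    --backup-method=xtrabackupfull \
    --cluster-id=1 \
    --nodes=10.0.0.12 \
    --wait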

Let’s take a look at how easily you can deploy a MySQL Replication cluster using ClusterControl. Let’s assume you have the environment prepared with ClusterControl installed on one instance and all other nodes accessible via SSH from ClusterControl host.

ClusterControl Deployment Wizard

We will start with picking the “Deploy” wizard.

ClusterControl Deployment Wizard

In the first step we have to define how ClusterControl should connect to the nodes on which the databases are to be deployed. Both root access and sudo (with or without a password) are supported.

ClusterControl Deployment Wizard

Then, we pick a vendor and version, and set the password for the administrative user of our MySQL database.

ClusterControl Deployment Wizard

Finally, we want to define the topology for our new cluster. As you can see, this is already quite a complex setup, unlike anything you can deploy using AWS RDS or a GCP SQL node.

ClusterControl Jobs

All we have to do now is wait for the process to complete. ClusterControl will do its best to understand the environment it is deploying to and install the required set of packages, including the database itself.

ClusterControl Cluster List

Once the cluster is up-and-running, you can proceed with deploying the proxy layer (which will provide your application with a single point of entry into the database layer). This is more or less what happens behind the scenes with DBaaS, where you also have endpoints to connect to the database cluster. It is quite common to use a single endpoint for writes and multiple endpoints for reaching particular replicas.

Database Cluster Topology

Here we will use ProxySQL, which will do the dirty work for us: it will understand the topology, send writes only to the master, and load balance read-only queries across all the replicas we have.

To deploy ProxySQL we will go to Manage -> Load Balancers.

Add Database Load Balancer ClusterControl

We have to fill in all the required fields: hosts to deploy on, plus credentials for the administrative and monitoring users. We may import an existing user from MySQL into ProxySQL or create a new one. All the details about ProxySQL can easily be found in multiple posts in our blog section.

We want at least two ProxySQL nodes deployed to ensure high availability. Then, once they are deployed, we will deploy Keepalived on top of ProxySQL. This ensures that a Virtual IP will be configured and pointing to one of the ProxySQL instances, as long as at least one node remains healthy.

Add ProxySQL ClusterControl

The only potential problem here is with cloud environments where routing works in a way that prevents you from easily bringing up a network interface. In such cases you will have to modify the Keepalived configuration, introduce a ‘notify_master’ script, and have that script make the necessary IP changes - in the case of EC2 it would have to detach the Elastic IP from one host and attach it to the other.
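A hedged sketch of that approach for EC2: Keepalived invokes a notify_master script, which uses the AWS CLI to re-associate the Elastic IP with the new master ProxySQL host. The interface name, allocation ID, and script path below are placeholders, not values from this setup:

# /etc/keepalived/keepalived.conf - minimal fragment; no virtual_ipaddress,
# since the Elastic IP is moved by the script instead
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    notify_master /usr/local/bin/attach-eip.sh
}

#!/bin/bash
# /usr/local/bin/attach-eip.sh - move the Elastic IP to this instance
# Instance ID is read from the EC2 instance metadata endpoint
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 associate-address \
    --instance-id "$INSTANCE_ID" \
    --allocation-id eipalloc-0123456789abcdef0 \
    --allow-reassociation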

There are plenty of instructions on how to do that using widely-tested open source software in setups deployed by ClusterControl. You can easily find additional information, tips, and how-to’s which are relevant to your particular environment.

Database Cluster Topology with Load Balancer

Conclusion

We hope you found this blog post insightful. If you would like to test ClusterControl, it comes with a 30-day enterprise trial with all features available. You can download it for free and test whether it fits your environment.

Failover & Failback for PostgreSQL on Microsoft Azure


It’s pretty common to use the cloud to store your data or as a failover option in the case of master failure. There are several cloud providers which allow you to store, manage, retrieve, and manipulate data via a cloud platform; accessible over the internet. Each cloud provider has its own product offerings and unique features, each with different cost models. 

Microsoft Azure is one of these cloud providers. In this blog, we’ll take a look at what features Microsoft Azure offers for primary storage and as a disaster recovery site, and specifically look at how it handles a mixed PostgreSQL database environment.

Deploying a PostgreSQL Database Instance on Microsoft Azure

Before performing this task, you need to decide how you will use this instance and which Azure product is best for you. There are two basic ways to deploy a PostgreSQL instance on Microsoft Azure.

Microsoft Azure Marketplace
  1. Azure Database for PostgreSQL: A managed service that you can use to run, manage, and scale highly-available PostgreSQL databases in the cloud. It’s available in two deployment options: Single Server and Hyperscale.
  2. Virtual Machine: Provides an on-demand, high-scale, secure, virtualized infrastructure. It has support for Ubuntu Server, RedHat Enterprise Linux, SUSE Linux Enterprise Server, CentOS, Debian, and Windows Server and it allows you to develop, test, run applications, and extend your datacenter in just a few seconds.

For this blog we will look at how to create both an Azure Database for PostgreSQL and an Azure Virtual Machine, from the Microsoft Azure Portal.

Deploying Azure Database for PostgreSQL

If you go to your Azure Portal -> Create a Resource -> Databases -> Azure Database for PostgreSQL, you’ll be able to choose between Single Server and Hyperscale. For this blog we’ll use Single Server, as the Hyperscale option is in preview and doesn’t offer an SLA yet.

Azure Database for PostgreSQL

Here you need to add some information about your new PostgreSQL instance, such as subscription, server name, user credentials, and location. You can also choose which PostgreSQL version to use (versions 9.5, 9.6, 10, and 11 are currently available) and the virtual hardware to run it on (Compute + Storage).

Azure Database for PostgreSQL

When you specify the hardware, you’ll see the estimated price in real-time. This is really useful to avoid a big surprise next month. After this step, you just have to confirm the resource configuration and wait a couple of minutes until Azure finishes the creation job.
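If you prefer the command line, an equivalent resource can be created with the Azure CLI; a sketch, where the resource group, server name, password, and SKU are placeholders:

$ az postgres server create \
    --resource-group my-resource-group \
    --name pg1blog \
    --location eastus \
    --admin-user severalnines \
    --admin-password 'S3cureP@ssw0rd!' \
    --sku-name GP_Gen5_2 \
    --version 11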

When you have the new resource created, you can go to All Resources to see the resource options available.

Azure Database for PostgreSQL

In the created resource options, you can go to Replication to enable it and replicate from the master server to up to five replicas. You should also check the Connection Security section to enable or disable external access. The access information can be found in the resource’s Overview section.

$ psql -h pg1blog.postgres.database.azure.com -U severalnines@pg1blog postgres

Password for user severalnines@pg1blog:

psql (11.5, server 11.4)

SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)

Type "help" for help.

postgres=>

Failover on Azure Database for PostgreSQL

Unfortunately, automated failover between master and replica servers is not available. If you delete the master instance, however, Azure will perform a failover process to promote the replica automatically.

Failover on Azure Database for PostgreSQL

There is an option to perform this failover task manually, which requires you to stop the replica and configure the new endpoint in your application to point to the new master. The replica will be promoted and delinked from the master; there is no way to relink this replica to your master again.

Deploying PostgreSQL on Azure Virtual Machine

If you go to your Azure Portal -> Create a Resource -> Compute -> Virtual Machine, you’ll open the Create a virtual machine section where you can specify different configurations for your new Azure Virtual Machine.

Azure Create a Virtual Machine

In the basic tab, you must specify the Azure subscription, Region, Availability options, Operating System, Server Size, access credentials (username/password or SSH Key), and inbound firewall rules.

Azure Create a Virtual Machine - Disk Options

In the disk tab, you must specify the storage (type and size) for your new virtual machine. The disk type can be Standard HDD, Standard SSD, or Premium SSD. The last one is recommended for high IOPS workloads.

Azure Create a Virtual Machine - Networking

In the networking tab, you can specify the virtual network, public IP address, and the allowed inbound ports. You can also add this new virtual machine behind an existing Azure load balancing solution.

Azure Create a Virtual Machine Management Settings

In the next tab, we have some management options, like monitoring and backups. 

Azure Create a Virtual Machine - Advanced Settings

And finally, in the advanced tab, we can add extensions, cloud-init, or host groups.

After reviewing the previous options and confirming them, you’ll have your new virtual machine created and accessible from the Azure Portal. In the Resource -> Overview section, you can see the virtual machine access information (public/private IP address).

Azure Resource Overview

Now, you can access it via SSH and install the PostgreSQL database using ClusterControl.

$ ssh 23.102.177.27

Last login: Mon Sep 23 21:33:27 2019

[sinsausti@vm1test ~]$

You can check this link to see the steps to perform the PostgreSQL deployment with ClusterControl.

PostgreSQL Failover on Azure Virtual Machine

Disaster recovery is a Virtual Machine feature under the Operations section that allows you to replicate your environment in another Azure region. To enable it, you need to choose the target region. In the advanced tab, you can modify the specific target details; such as virtual network, storage settings, and replication settings.

Microsoft Azure Disaster Recovery

When the disaster recovery is enabled, you’ll be able to check the replication status, test the failover process, or manually failover to it.

Microsoft Azure Disaster Recovery

Enabling this gives you a failover option in the case of failure. This, however, will be a failover of the entire environment, not just the database service.

An Improved PostgreSQL Failover Process for Microsoft Azure

As you have SSH access, you can improve this failover process by importing the virtual machine (or even deploying the PostgreSQL database) with ClusterControl.

If you’re managing the database nodes with ClusterControl (and the “Auto Recovery” option is ON), then in the case of master failure, ClusterControl will promote the most advanced slave (if it is not blacklisted) to master and notify you of the problem. It also automatically fails over the rest of the slaves to replicate from the new master.

With ClusterControl, you can even deploy a mixed environment with some nodes in the cloud and other nodes on-prem. You can also add load balancers to your topology to improve your high availability environment. You can find more information about this topic here.

Conclusion

Azure has a lot of features and products to offer an enterprise-level solution. During these tests, however, the main issue I found was that creation and failover times were too lengthy for most application needs.

If you need fast failover and recovery, you should improve the availability of the environment by using a load balancer, or an external system like ClusterControl, to decrease downtime. For more detailed information about running PostgreSQL on Microsoft Azure you can take a look at our deep dive blog.

How to Troubleshoot MySQL Database Issues


As soon as you start running a database server and your usage grows, you are exposed to many types of technical problems, performance degradation, and database malfunctions. Each of these could lead to much bigger problems, such as catastrophic failure or data loss. It’s like a chain reaction, where one thing can lead to another, causing more and more issues. Proactive countermeasures must be taken in order to keep your environment stable for as long as possible.

In this blog post, we are going to look at a bunch of cool features offered by ClusterControl that can greatly help us troubleshoot and fix our MySQL database issues when they happen.

Database Alarms and Notifications

For all undesired events, ClusterControl logs everything under Alarms, accessible from the Activity section (top menu) of the ClusterControl page. This is commonly the first step in troubleshooting when something goes wrong. From this page, we can get an idea of what is actually going on with our database cluster:

ClusterControl Database Alarms

The above screenshot shows an example of a server unreachable event, with severity CRITICAL, detected by two components, Network and Node. If you have configured the email notifications setting, you should get a copy of these alarms in your mailbox. 

When clicking on the “Full Alarm Details,” you can get the important details of the alarm like hostname, timestamp, cluster name and so on. It also provides the next recommended step to take. You can also send out this alarm as an email to other recipients configured under the Email Notification Settings. 

You may also opt to silence an alarm by clicking the “Ignore Alarm” button, and it will not appear in the list again. Ignoring an alarm might be useful if you have a low-severity alarm and know how to handle or work around it - for example, if ClusterControl detects a duplicate index in your database which, in some cases, might be needed by your legacy applications.

By looking at this page, we can obtain an immediate understanding of what is going on with our database cluster and what the next step is to do to solve the problem. As in this case, one of the database nodes went down and became unreachable via SSH from the ClusterControl host. Even a beginner SysAdmin would now know what to do next if this alarm appears.

Centralized Database Log Files

This is where we can drill down into what went wrong with our database server. Under ClusterControl -> Logs -> System Logs, you can see all log files related to the database cluster. For MySQL-based database clusters, ClusterControl pulls the ProxySQL log, the MySQL error log, and the backup logs:

ClusterControl System Logs

Click on "Refresh Log" to retrieve the latest logs from all hosts that are accessible at that particular time. If a node is unreachable, ClusterControl will still show the last retrieved log, since this information is stored inside the CMON database. By default, ClusterControl retrieves the system logs every 10 minutes, configurable under Settings -> Log Interval.

ClusterControl will trigger the job to pull the latest log from each server, as shown in the following "Collect Logs" job:

ClusterControl Database Job Details

A centralized view of log files allows us to understand more quickly what went wrong. For a database cluster, which commonly involves multiple nodes and tiers, this feature greatly improves log reading: a SysAdmin can compare the logs side-by-side and pinpoint critical events, reducing the total troubleshooting time.

Web SSH Console

ClusterControl provides a web-based SSH console so you can access the DB server directly via the ClusterControl UI (as the SSH user is configured to connect to the database hosts). From here, we can gather much more information which allows us to fix the problem even faster. Everyone knows when a database issue hits the production system, every second of downtime counts.

To access the SSH console via web, simply pick the nodes under Nodes -> Node Actions -> SSH Console, or simply click on the gear icon for a shortcut:

ClusterControl Web SSH Console Access

Due to the security concerns this feature might impose, especially in multi-user or multi-tenant environments, you can disable it by going to /var/www/html/clustercontrol/bootstrap.php on the ClusterControl server and setting the following constant to false:

define('SSH_ENABLED', false);

Refresh the ClusterControl UI page to load the new changes.

Database Performance Issues

Apart from monitoring and trending features, ClusterControl proactively sends you various alarms and advisors related to database performance, for example:

  • Excessive usage - Resource that passes certain thresholds like CPU, memory, swap usage and disk space.
  • Cluster degradation - Cluster and network partitioning.
  • System time drift - Time difference among all nodes in the cluster (including ClusterControl node).
  • Various other MySQL related advisors:
    • Replication - replication lag, binlog expiration, location and growth
    • Galera - SST method, scan GRA logfile, cluster address checker
    • Schema check - Non-transactional table existence on Galera Cluster.
    • Connections - Threads connected ratio
    • InnoDB - Dirty pages ratio, InnoDB log file growth
    • Slow queries - By default ClusterControl will raise an alarm if it finds a query running for more than 30 seconds. This is of course configurable under Settings -> Runtime Configuration -> Long Query.
    • Deadlocks - InnoDB transactions deadlock and Galera deadlock.
    • Indexes - Duplicate keys, table without primary keys.

Check out the Advisors page under Performance -> Advisors to get the details of things that can be improved, as suggested by ClusterControl. For every advisor, it provides justifications and advice, as shown in the following example for the "Checking Disk Space Usage" advisor:

ClusterControl Disk Space Usage Check

When a performance issue occurs, you will get a "Warning" (yellow) or "Critical" (red) status on these advisors. Further tuning is commonly required to overcome the problem. Advisors raise alarms, which means users will get a copy of these alarms in their mailbox if Email Notifications are configured accordingly. For every alarm raised by ClusterControl or its advisors, users will also get an email when the alarm has been cleared. These are pre-configured within ClusterControl and require no initial configuration. Further customization is always possible under Manage -> Developer Studio. You can check out this blog post on how to write your own advisor.

ClusterControl also provides a dedicated page for database performance under ClusterControl -> Performance. It provides all sorts of database insights following best practices, like a centralized view of DB Status, Variables, InnoDB Status, the Schema Analyzer, and Transaction Logs. These are pretty self-explanatory and straightforward to understand.

For query performance, you can inspect Top Queries and Query Outliers, where ClusterControl highlights queries whose performance differs significantly from their average. We have covered this topic in detail in the blog post MySQL Query Performance Tuning.

Database Error Reports

ClusterControl comes with an error report generator tool to collect debugging information about your database cluster, helping us understand the current situation and status. To generate an error report, simply go to ClusterControl -> Logs -> Error Reports -> Create Error Report:

ClusterControl Database Error Reports

The generated error report can be downloaded from this page once ready. It comes as a tarball (tar.gz), and you may attach it to a support request. Since a support ticket has a file size limit of 10 MB, if the tarball is bigger than that, you can upload it to a cloud drive and share only the download link with us, with proper permissions; you may remove it later once we have retrieved the file. You can also generate the error report via the command line, as explained in the Error Report documentation page.

In the event of an outage, we highly recommend that you generate multiple error reports during and right after the outage. Those reports will be very useful for understanding what went wrong and the consequences of the outage, and for verifying that the cluster is in fact back to operational status after a disastrous event.

Conclusion

ClusterControl's proactive monitoring, together with its set of troubleshooting features, provides an efficient platform for users to troubleshoot any kind of MySQL database issue. Long gone is the legacy way of troubleshooting where one had to open multiple SSH sessions to multiple hosts and execute multiple commands repeatedly in order to pinpoint the root cause.

If the above-mentioned features do not help you solve the problem or troubleshoot the database issue, you can always contact the Severalnines Support Team to back you up. Our 24/7/365 dedicated technical experts are available to attend to your request at any time. Our average first reply time is usually less than 30 minutes.


Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part Two


In the previous blog in the series we discussed the pros and cons of using Galera Cluster to create a geo-distributed cluster. In this post we will design a Galera-based geo-distributed cluster and show how you can deploy all the required pieces using ClusterControl.

Designing a Geo-Distributed Galera Cluster

We will start by explaining the environment we want to build. We will use three remote data centers, connected via a Wide Area Network (WAN). Each datacenter will receive writes from local application servers. Reads will also be local only. This is intended to avoid unnecessary traffic crossing the WAN.

For this setup we assume the connectivity is in place and secured, but we won’t describe exactly how this can be achieved. There are numerous methods to secure the connectivity, ranging from proprietary hardware and software solutions, through OpenVPN, to SSH tunnels.

We will use ProxySQL as a load balancer. ProxySQL will be deployed locally in each datacenter and will route traffic only to the local nodes. Remote nodes can always be added manually, and we will explain cases where this might be a good solution. The application can be configured to connect to one of the local ProxySQL nodes using a round-robin algorithm. We can also use Keepalived and a Virtual IP to route the traffic towards a single ProxySQL node, as long as that single ProxySQL node can handle all of the traffic.

Another possible solution is to collocate ProxySQL with the application nodes and configure the application to connect to the proxy on localhost. This approach works quite well under the assumption that it is unlikely for ProxySQL to become unavailable while the application on the same node keeps working fine. Typically what we see is either node failure or network failure, which would affect both ProxySQL and the application at the same time.

Geo-Distributed MySQL Galera Cluster with ProxySQL

The diagram above shows the version of the environment where ProxySQL is collocated on the same node as the application. ProxySQL is configured to distribute the workload across all Galera nodes in the local datacenter. One of those nodes is picked as the node to send writes to, while SELECTs are distributed across all nodes. Having one dedicated writer node in a datacenter helps to reduce the number of possible certification conflicts, which typically leads to better performance. To reduce conflicts even further we would have to start sending the traffic over the WAN connection, which is not ideal as bandwidth utilization would increase significantly. Right now, with segments in place, only two copies of each writeset are sent across datacenters - one per remote datacenter.

The main concern with geo-distributed Galera Cluster deployments is latency. This is something you always have to test prior to launching the environment. Am I OK with the commit time? Certification has to happen at every commit, so writesets have to be sent to and certified on all nodes in the cluster, including remote ones. It may be that the high latency makes the setup unsuitable for your application. In that case you may find multiple Galera clusters connected via asynchronous replication more suitable. That, however, is a topic for another blog post.

Deploying a Geo-Distributed Galera Cluster Using ClusterControl

To clarify things, we will show here what a deployment may look like. We won’t use an actual multi-DC setup; everything will be deployed in a local lab. We assume that the latency is acceptable and the whole setup is viable. What is great about ClusterControl is that it is infrastructure-agnostic. It doesn’t care if the nodes are close to each other, located in the same datacenter, or distributed across multiple cloud providers. As long as there is SSH connectivity from the ClusterControl instance to all of the nodes, the deployment process looks exactly the same. That’s why we can show it to you step by step using just a local lab.

Installing ClusterControl

First, you have to install ClusterControl. You can download it for free. After registering, you will reach a page with a guide to download and install ClusterControl; it is as simple as running a shell script. Once you have ClusterControl installed, you will be presented with a form to create an administrative user:

Installing ClusterControl

Once you fill it, you will be presented with a Welcome screen and access to deployment wizards:

ClusterControl Welcome Screen

We’ll go with deploy. This will open a deployment wizard:

ClusterControl Deployment Wizard

We will pick MySQL Galera. We have to pass SSH connectivity details; either the root user or a sudo user is supported. In the next step we define the servers in the cluster.

Deploy Database Cluster

We are going to deploy three nodes in one of the data centers. Then we will be able to extend the cluster, configuring new nodes in different segments. For now all we have to do is click on “Deploy” and watch ClusterControl deploy the Galera cluster.

Cluster List - ClusterControl

Our first three nodes are up and running; we can now proceed to adding nodes in the other datacenters.

Add a Database Node - ClusterControl

You can do that from the action menu, as shown on the screenshot above.

Add a Database Node - ClusterControl

Here we can add additional nodes, one at a time. Importantly, you should change the Galera segment to a non-zero value (0 is used for the initial three nodes).
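Under the hood, the segment is a Galera provider option; on the nodes added to the second datacenter the relevant MySQL configuration ends up looking roughly like this (other provider options may be present on the same line):

# my.cnf fragment on nodes in the second datacenter
wsrep_provider_options="gmcast.segment=1"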

After a while we end up with all nine nodes, distributed across three segments.

ClusterControl Geo-Distributed Database Nodes

Now we have to deploy the proxy layer. We will use ProxySQL for that. You can deploy it in ClusterControl via Manage -> Load Balancer:

Add a Load Balancer - ClusterControl

This opens a deployment field:

Deploy Load Balancer - ClusterControl

First, we have to decide where to deploy ProxySQL. We will use the existing Galera nodes, but you can type anything in the field, so it is perfectly possible to deploy ProxySQL on top of the application nodes. In addition, you have to pass access credentials for the administrative and monitoring users.

Deploy Load Balancer - ClusterControl

Then we have to either pick one of the existing users in MySQL or create one right now. We also want to ensure that ProxySQL is configured to use only the Galera nodes located in the same datacenter.

When you have one ProxySQL ready in the datacenter, you can use it as a source of the configuration:

Deploy ProxySQL - ClusterControl

This has to be repeated for every application server you have across all datacenters. Then the application has to be configured to connect to the local ProxySQL instance, ideally over a Unix socket, which gives the best performance and the lowest latency.

Reducing Latency - ClusterControl

After the last ProxySQL is deployed, our environment is ready. Application nodes connect to local ProxySQL. Each ProxySQL is configured to work with Galera nodes in the same datacenter:

ProxySQL Server Setup - ClusterControl

Conclusion

We hope this two-part series has helped you understand the strengths and weaknesses of geo-distributed Galera Clusters and how ClusterControl makes it very easy to deploy and manage such a cluster.

Announcing ClusterControl 1.7.4: Cluster-to-Cluster Replication - Ultimate Disaster Recovery


We’re excited to announce the 1.7.4 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

In this release we launch a new function that could be the ultimate way to minimize RTO as part of your disaster recovery strategy. Cluster-to-Cluster Replication for MySQL and PostgreSQL lets you build out a clone of your entire database infrastructure and deploy it to a secondary data center, while keeping both synced. This ensures you always have an available, up-to-date database setup ready to switch over to should disaster strike.

In addition, we are announcing support for the new MariaDB 10.4 / Galera Cluster 4.x as well as support for ProxySQL 2.0, the latest release of the industry-leading MySQL load balancer.

Lastly, we continue our commitment to PostgreSQL by releasing new user management functions, giving you complete control over who can access or administer your PostgreSQL setup.

Release Highlights

Cluster-to-Cluster Database Replication

  • Asynchronous MySQL Replication Between MySQL Galera Clusters.
  • Streaming Replication Between PostgreSQL Clusters.
  • Ability to Build Clusters from a Backup or by Streaming Directly from a Master Cluster.

Added Support for MariaDB 10.4 & Galera Cluster 4.x

  • Deployment, Configuration, Monitoring and Management of the Newest Version of Galera Cluster Technology, initially released by MariaDB
    • New Streaming Replication Ability
    • New Support for Dealing with Long Running & Large Transactions
    • New Backup Locks for SST

Added Support for ProxySQL 2.0

  • Deployment and Configuration of the Newest Version of the Best MySQL Load Balancer on the Market
    • Native Support for Galera Cluster
    • Enables Causal Reads Using GTID
    • New Support for SSL Frontend Connections
    • Query Caching Improvements

New User Management Functions for PostgreSQL Clusters

  • Take full control of who is able to access your PostgreSQL database.
 

View Release Details and Resources

Release Details

Cluster-to-Cluster Replication

Whether streaming from your master or built from a backup, the new Cluster-to-Cluster Replication function in ClusterControl lets you create a complete disaster recovery database system in another data center, which you can then easily fail over to should something go wrong, during maintenance, or during a major outage.

In addition to disaster recovery, this function also allows you to create a copy of your database infrastructure (in just a couple of clicks) which you can use to test upgrades, patches, or to try some database performance enhancements.

You can also use this function to deploy an analytics or reporting setup, allowing you to separate your reporting load from your OLTP traffic.

Cluster to Cluster Replication

PostgreSQL User Management

You now have the ability to add or remove user access to your PostgreSQL setup. With the simple interface, you can assign specific permissions or restrictions at the individual level. It also provides a view of all defined users who have access to the database, with their respective permissions. For tips on best practices around PostgreSQL user management you can check out this blog.
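Under the hood, such actions map to standard PostgreSQL statements. A minimal sketch of granting read-only access (role name, database, and password are illustrative):

-- Create a login role and grant it read-only access
CREATE ROLE reporting LOGIN PASSWORD 'change_me';
GRANT CONNECT ON DATABASE mydb TO reporting;
GRANT USAGE ON SCHEMA public TO reporting;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting;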

MariaDB 10.4 / Galera Cluster 4.x Support

In an effort to boost performance for long-running or large transactions, MariaDB & Codership have partnered to add Streaming Replication to the new MariaDB Cluster 10.4. This addition solves many challenges that this technology has previously experienced with these types of transactions. Three new system tables and new synchronisation functions have been added to the release to support this function. You can read more about what’s included in this release here.
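Streaming replication is controlled per session through the new Galera 4 variables; a minimal sketch (the fragment size is illustrative):

-- Replicate a large transaction in fragments of 10,000 rows
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;
-- ... run the large transaction here ...
-- then disable streaming replication again
SET SESSION wsrep_trx_fragment_size = 0;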

Deploy MariaDB Cluster 10.4
 

9 ClusterControl Features You Won't Find in Other Database Management Tools


Ensuring the smooth operation of your production databases is not a trivial task, and there are a number of tools and utilities to help with the job: tools for monitoring health, server performance, analyzing queries, deployments, managing failover, upgrades, and the list goes on. ClusterControl, as a management and monitoring platform for your database infrastructure, stands out with its ability to manage the full lifecycle from deployment to monitoring, ongoing management and scaling.

Although ClusterControl offers important features like automatic database failover, encryption in-transit/at-rest, backup management, point-in-time recovery, Prometheus integration, and database scaling, these can be found in other enterprise management/monitoring tools on the market. However, there are some features that you won’t find that easily. In this blog post, we’ll present 9 features that you won't find in any other management and monitoring tools on the market (at the time of this writing).

Backup Verification

A backup is not really a backup until you know it can be recovered - by actually verifying that it can be restored. ClusterControl allows a backup to be verified after it has been taken by spinning up a new server and testing the restore. Verifying a backup is a critical process to make sure you meet your Recovery Point Objective (RPO) policy in the event of disaster recovery. The verification process performs the restoration on a new standalone host (where ClusterControl will install the necessary database packages before restoring) or on a server dedicated to backup verification.

To configure backup verification, simply select an existing backup and click on Restore. There will be an option to Restore and Verify:

Database Backup Verification

Then, simply specify the IP address of the server on which you want to restore and verify:

Database Backup Verification

Make sure the specified host is accessible via passwordless SSH beforehand. You also have a handful of options for the provisioning process, and you can shut down the verification server after restoration to save costs and resources once the backup has been verified. ClusterControl checks the restoration process exit code and the restore log to determine whether the verification succeeded or failed.

Simplifying ProxySQL Management Through a GUI

Many would agree that having a graphical user interface is more efficient and less prone to human error when configuring a system. ProxySQL is part of the critical database layer (although it sits on top of it) and must be visible enough to the DBA's eyes to spot common problems and issues. ClusterControl provides a comprehensive graphical user interface for ProxySQL.

ProxySQL instances can be deployed on fresh hosts, or existing ones can be imported into ClusterControl. ClusterControl can configure ProxySQL to be integrated with a virtual IP address (provided by Keepalived) for single-endpoint access to the database servers. It also provides monitoring insight into key ProxySQL components like Queries Backend, Slow Queries, Top Queries, Query Hits, and a bunch of other monitoring stats. The following is a screenshot showing how to add a new query rule:

ProxySQL Management GUI

If you are adding a very complex query rule, you will be more comfortable doing it via the graphical user interface. Every field has a tooltip to assist you when filling in the Query Rule form. When adding or modifying any ProxySQL configuration, ClusterControl makes sure the changes are applied to runtime and saved to disk for persistence.

ClusterControl 1.7.4 now supports both ProxySQL 1.x and ProxySQL 2.x.

Operational Reports

Operational Reports are a set of summary reports of your database infrastructure that can be generated on the fly or scheduled to be sent to different recipients. These reports consist of different checks and address various day-to-day DBA tasks. The idea behind ClusterControl operational reporting is to put all of the most relevant data into a single document which can be quickly analyzed in order to get a clear understanding of the status of the databases and their processes.

With ClusterControl you can schedule cross-cluster environment reports like the Daily System Report, Package Upgrade Report, and Schema Change Report, as well as Backups and Availability reports. These reports will help you keep your environment secure and operational, and include recommendations on how to fix gaps. Reports can be addressed to SysOps, DevOps or even managers who would like to get regular status updates about a given system’s health.

The following is a sample daily operational report on availability, sent to your mailbox:

Database Operational Report

We have covered this in detail in this blog post, An Overview of Database Operational Reporting in ClusterControl.

Resync a Slave via Backup

ClusterControl allows staging a slave (whether a new slave or a broken one) from the latest full or incremental backup. It doesn't sound very exciting, but this feature is huge if you have large datasets of 100GB and above. The common practice when resyncing a slave is to stream a backup of the current master, which takes time depending on the database size and adds a burden to the master that may jeopardize its performance.

To resync a slave via backup, pick the slave node on the Nodes page and go to Node Actions -> Rebuild Replication Slave -> Rebuild from a backup. Only PITR-compatible backups will be listed in the dropdown:

Rebuild Database Replication Slave

Resyncing a slave from a backup brings no additional overhead to the master: ClusterControl extracts and streams the backup from the backup storage location to the slave and then configures the replication link between the slave and the master. The slave will catch up with the master once the replication link is established. The master is untouched during the whole process, and you can monitor the progress under Activity -> Jobs.

Bootstrap a Galera Cluster

Galera Cluster is a very popular choice when implementing high availability for MySQL or MariaDB, but the wrong management commands can lead to disastrous consequences. Take a look at this blog post on how to bootstrap a Galera Cluster under different conditions; it illustrates that bootstrapping a Galera Cluster has many variables and must be performed with extreme care. Otherwise, you may lose data or cause a split-brain. ClusterControl understands the database topology and knows exactly what to do in order to bootstrap a database cluster properly. To bootstrap a cluster via ClusterControl, click Cluster Actions -> Bootstrap Cluster:

Bootstrap a Galera Cluster

You will have the option to let ClusterControl pick the right bootstrap node automatically, or to perform an initial bootstrap where you pick one of the database nodes from the list to become the reference node and wipe out the MySQL datadir on the joiner nodes to force an SST from the bootstrapped node. If the bootstrapping process fails, ClusterControl will pull up the MySQL error log.

If you would like to perform a manual bootstrap, you can also use the "Find Most Advanced Node" feature and perform the cluster bootstrap operation on the most advanced node reported by ClusterControl.
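For comparison, a manual bootstrap on a MariaDB-based cluster typically looks like the sketch below (paths assume default packaging):

# Inspect the last committed seqno on each stopped node; the node with the
# highest seqno (and safe_to_bootstrap: 1) is the bootstrap candidate
cat /var/lib/mysql/grastate.dat

# Bootstrap the chosen node, then start the others so they join via IST/SST
galera_new_cluster
systemctl start mariadb   # on each of the remaining nodes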

Centralized Configuration and Logging

ClusterControl pulls a number of important configuration and logging files and displays them in a tree structure within ClusterControl. A centralized view of these files is key to efficiently understanding and troubleshooting distributed database setups. The traditional way of tailing/grepping these files is long gone with ClusterControl. The following screenshot shows ClusterControl's configuration file manager, which lists all related configuration files for this cluster in a single view (with syntax highlighting, of course):

Centralized Database Configuration and Logging

ClusterControl eliminates the repetitiveness of changing a configuration option across a database cluster. Changing a configuration option on multiple nodes can be performed via a single interface and will be applied to the selected database nodes accordingly. When you click on "Change/Set Parameter", you can select the database instances that you want to change and specify the configuration group, parameter and value:

Centralized Database Configuration and Logging

You can add a new parameter to the configuration file or modify an existing one. The parameter will be applied to the chosen database nodes' runtime and written to the configuration file if the option passes the variable validation process. Some variables might require a server restart, and ClusterControl will advise you when that is the case.

Database Cluster Cloning

With ClusterControl, you can quickly clone an existing MySQL Galera Cluster so you have an exact copy of the dataset on the other cluster. ClusterControl performs the cloning operation online, without locking or bringing downtime to the existing cluster. It's like a cluster scale-out operation, except that both clusters are independent from each other after the syncing completes. The cloned cluster does not need to be the same size as the existing one: we could start with a “one-node cluster” and scale it out with more database nodes at a later stage.

Database Cluster Cloning

Another similar feature offered by ClusterControl is "Create Cluster from Backup". This feature was introduced in ClusterControl 1.7.1, specifically for Galera Cluster and PostgreSQL clusters, and lets you create a new cluster from an existing backup. Unlike cluster cloning, this operation does not bring additional load to the source cluster, with the tradeoff that the new cluster will not be in the same state as the source cluster.

We have covered this topic in detail in this blog post, How to Create a Clone of Your MySQL or PostgreSQL Database Cluster.

Restore Physical Backup

Most database management tools allow backing up a database, and only a handful of them support restoration, usually of logical backups only. ClusterControl supports full restoration not only of logical backups but also of physical backups, whether full or incremental. Restoring a physical backup requires a number of critical steps (especially for incremental backups), which basically involve preparing the backup, copying the prepared data into the data directory, assigning correct permissions/ownership, and starting the nodes in the correct order to maintain data consistency across all members of the cluster. ClusterControl performs all of these operations automatically.

You can also restore a physical backup onto another node that is not part of a cluster. In ClusterControl, the option for this is called "Create Cluster from Backup". You can start with a “one-node cluster” to test out the restoration process on another server or to copy out your database cluster to another location.

ClusterControl also supports restoring an external backup - a backup that was not taken through ClusterControl. You just need to upload the backup to the ClusterControl server and specify the physical path to the backup file when restoring; ClusterControl will take care of the rest.

Cluster-to-Cluster Replication

This is a new feature introduced in ClusterControl 1.7.4. ClusterControl can now deploy and monitor cluster-to-cluster replication, which extends asynchronous database replication between multiple clusters in multiple geographical locations. A cluster can be set as a master cluster (the active cluster, processing reads/writes), with the slave cluster set as a read-only cluster (a standby cluster which can also process reads). ClusterControl supports asynchronous cluster-to-cluster replication for Galera Cluster (the binary log must be enabled) as well as master-slave replication for PostgreSQL Streaming Replication.
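For the Galera case, the nodes acting as replication endpoints need binary logging enabled. A minimal my.cnf sketch (values are illustrative; the GTID options shown apply to Oracle MySQL builds):

[mysqld]
server_id                = 100      # unique per node
log_bin                  = binlog
log_slave_updates        = ON       # write applied events to the binlog
binlog_format            = ROW
gtid_mode                = ON
enforce_gtid_consistency = ON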

To create a new cluster that replicates from another cluster, go to Cluster Actions -> Create Slave Cluster:

Cluster-to-Cluster Replication

The result of the above deployment is presented clearly on the Database Cluster List dashboard:

Cluster-to-Cluster Replication

The slave cluster is automatically configured as read-only, replicating from the primary cluster and acting as a standby cluster. If disaster strikes the primary cluster and you want to activate the secondary site, simply pick the "Disable Readonly" menu item under the Nodes -> Node Actions dropdown to promote it to an active cluster.

Avoiding Database Vendor Lock-In for MySQL or MariaDB


Vendor lock-in is defined as "Proprietary lock-in or customer lock-in, which makes a customer dependent on a vendor for their products and services; unable to use another vendor without substantial cost" (wikipedia). Undeniably for many software companies that would be the desired business model. But is it good for their customers?

Proprietary databases have great support for migrations from other popular database software solutions. However, that would just cause another vendor lock-in. Is open source, then, the solution?

Due to the limitations open source had years back, many chose expensive proprietary database solutions; unfortunately, for many, open source was not an option.

Over the years, however, open-source databases have gained the enterprise support and maturity needed to run critical and complex transactional systems.

With their new versions, databases like Percona Server and MariaDB have added some great new features, both compatibility features and enterprise necessities like 24/7 support, security, auditing, clustering, online backup and fast restore. All of that has made the migration process more accessible than ever before.

Migration may be a wise move, but it comes with risk. Whether you're planning to migrate from proprietary to open source manually or with the help of a commercial tool to automate the entire migration process, you need to know all the possible bottlenecks and methods involved in the process, and how to validate the results.

Changing the database system is also an excellent time to consider further vendor lock-in risks. During the migration process, you can think about how to avoid being locked into a particular technology. In this article, we are going to focus on some leading aspects of vendor lock-in for MySQL and MariaDB.

Avoiding Lock-in for Database Monitoring

Users of open source databases often have to use a mixture of tools and homegrown scripts to monitor their production database environments. However, homegrown scripts are hard to maintain and rarely keep up with new database features.

Fortunately, there are many interesting free monitoring tools for MySQL/MariaDB. The free tools most recommended by DBAs are PMM, Zabbix, ClusterControl Community Edition, and the Nagios MySQL plugin, although only PMM and ClusterControl are dedicated database solutions.

Percona Monitoring and Management (PMM) is a fully open-source solution for managing MySQL platform performance and tuning query performance. PMM is an on-premises solution that retains all of your performance and query data inside the confines of your environment. You can find the PMM demo at the link below.

PMM by Percona

Traditional server monitoring tools are not built for modern distributed database architectures. Most production databases today run in some high availability setup - from more straightforward master-slave replication to multi-master clusters fronted by redundant load balancers. Operations teams deal with dozens, often hundreds of services that make up the database environment.

Free Database Monitoring from ClusterControl

Having multiple database systems means your organization will become more agile on the development side and gives developers a choice, but it also imposes additional knowledge demands on the operations side. Extending your infrastructure from only MySQL to other storage backends like MongoDB and PostgreSQL implies you also have to monitor, manage, and scale them. As every storage backend excels at different use cases, this also means you have to reinvent the wheel for every one of them.

ClusterControl: Replication Setup

ClusterControl was designed to address modern, highly distributed database setups based on replication or clustering. It shows the status of the entire cluster, although it works equally well for a single instance. ClusterControl exposes many advanced metrics, and built-in advisors help you understand them. You can find the ClusterControl demo at the link below.

ClusterControl: Advisors

Avoiding Lock-in for Database Backup Solutions

There are multiple ways to take backups, but which method fits your specific needs? How do you implement point-in-time recovery?

If you are migrating from Oracle or SQL Server, we recommend the XtraBackup tool from Percona or the similar mariabackup from MariaDB.

Percona XtraBackup is the most popular, open-source, MySQL/MariaDB hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. It falls into the physical backup category, which consists of exact copies of the MySQL data directory and files underneath it.

XtraBackup does not lock your database during the backup process. For large databases (100+ GB), it provides a much better restoration time compared to mysqldump. The restoration process involves preparing the MySQL data from the backup files before replacing or switching it with the current data directory on the target node.
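The typical XtraBackup cycle looks like the sketch below (the target directory is illustrative):

# Take a non-blocking physical backup
xtrabackup --backup --target-dir=/data/backups/full

# Prepare the backup (apply the redo log) so it is consistent
xtrabackup --prepare --target-dir=/data/backups/full

# Restore: with mysqld stopped and an empty datadir, copy the files back
xtrabackup --copy-back --target-dir=/data/backups/full
chown -R mysql:mysql /var/lib/mysql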

Avoiding Lock-in for Database High Availability and Scalability

It is said that if you are not designing for failure, then you are heading for a crash. How do you create a database system from the ground up to withstand failure? This can be a challenge as failures happen in many different ways, sometimes in ways that would be hard to imagine. It is a consequence of the complexity of today's database environments.

Clustering is an expensive feature of databases like Oracle and SQL Server. It requires extra licenses. 

Galera Cluster is a mainstream option for high availability MySQL and MariaDB. And though it has established itself as a credible replacement for traditional MySQL master-slave architectures, it is not a drop-in replacement.

Galera Cluster is a synchronous, active-active database clustering technology for MySQL and MariaDB. It differs from what is known as Oracle's MySQL Cluster (NDB). MariaDB Cluster is based on the multi-master replication plugin provided by Codership.

While the Galera Cluster has some characteristics that make it unsuitable for specific use cases, most applications can still be adapted to run on it.

The benefits are clear: multi-master InnoDB setup with built-in failover and read scalability.

Avoiding Lock-in for Database Load Balancing

Proxies are the building blocks of high availability setups for MySQL. They can detect failed nodes and route queries to hosts that are still available. If your master fails and you have to promote one of your slaves, proxies will detect the topology change and route your traffic accordingly.

More advanced proxies can do much more, such as route traffic based on precise query rules, cache queries, or mirror them. They can be even used to implement different types of sharding.

The most useful ones are ProxySQL, HAProxy, and MaxScale (limited free usage).
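As an illustration, a basic read/write split in ProxySQL can be expressed as query rules on its admin interface (the hostgroup numbers are illustrative):

-- On the ProxySQL admin interface (port 6032):
-- locking reads go to the writer hostgroup, plain reads to the readers
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT.*', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;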

ClusterControl: ProxySQL Load Balancing

Avoiding Lock-in When Migrating to the Cloud

In the last ten years, many businesses have moved to cloud-based technology to avoid the budgetary limitations of data centers and to enable agile software development. Utilizing the cloud lets your company and applications benefit from the cost savings and versatility that come with cloud computing.

While cloud solutions offer companies many benefits, they still introduce some risks. For example, vendor lock-in is as high in the cloud as it was in the data center.

As more companies run their workloads in the cloud, cloud database services are increasingly being used to manage data. One of the advantages of using a cloud database service instead of maintaining your own database is that it reduces the management overhead. Database services from the leading cloud vendors share many similarities, but they have individual characteristics that may make them well- or ill-suited to your workload.

ClusterControl: Deploy various database systems in the cloud

The Database Hosting Hybrid Model

As more enterprises move to the cloud, the hybrid model is becoming more popular; many businesses see it as a safe model.

In fact, it's challenging to do a heart transplant and port everything over immediately. Many companies do a slow migration that usually takes a year, or perhaps never fully completes. The move should be made at an acceptable pace.

The hybrid model will not only allow you to build a highly available, scalable system; by its nature, it is also a great first step toward avoiding lock-in. By architectural design, your systems will work in a kind of mixed mode.

An example of such an architecture could be a cluster that operates in an in-house data center with a copy located in the cloud.

ClusterControl: Cluster to Cluster Replication

Conclusion

Migrating from a proprietary database to open source can come with several benefits: lower cost of ownership, access to and use of an open-source database engine, and tight integration with the web. Open source has much to offer and, due to its nature, is a great option for avoiding vendor lock-in.

 

Best Practices for Archiving Your Database in the Cloud


With the technology available today, there is no excuse for failing to recover your data due to a lack of backup policies or of understanding how vital it is to take backups as part of your daily, weekly, or monthly routine. Database backups must be taken on a regular basis as part of your overall disaster recovery strategy.

The technology for handling backups has never been more efficient and many best practices have been adopted (or bundled) as part of a certain database technology or service that offers it.

To some extent, people still don't understand how to store data backups efficiently, nor do they understand the difference between data backups and archived data.

Archiving your data provides many benefits, especially in terms of efficiency: storage costs, optimized data retrieval, data facility expenses, and payroll for the skilled people needed to maintain your backup storage and its underlying hardware. In this blog, we'll look at the best practices for archiving your data in the cloud.

Data Backups vs Data Archives

For some folks in the data tech industry, these topics are often confusing, especially for newcomers.

Data backups are copies of your physical, raw data, stored locally or offsite, which can be accessed in an emergency or for data recovery. They are used to restore data in case it is lost, corrupted or destroyed.

Archived data, on the other hand, is data (it can even be backup data) that is no longer used or is less critical to your business needs, such as stagnant data that is not yet obsolete and still has value. This means the data is still important but doesn't need to be accessed or modified frequently (if at all). Its purposes can include:

  • Reducing primary storage consumption, since archived data can be stored on lower-performance machines; it doesn't need to be retrieved every day or immediately.
  • Keeping your data infrastructure cost-efficient to maintain.
  • Worrying less about ever-growing data, especially data that is old or only infrequently changed.
  • Avoiding large expenses for maintaining backup appliances or software integrated into the backup system.
  • Meeting regulatory standards like HIPAA, PCI DSS or GDPR that require legacy data to be kept.

For databases specifically, the benefits are very promising:

  • It helps reduce data complexity: even when data grows drastically, archiving helps maintain the size of your working data set.
  • It helps your daily, weekly, or monthly backups perform optimally, since they no longer need to process old data that, while not useless, is not useful for daily or frequent needs.
  • It helps your queries perform efficiently and more consistently, since they don't have to scan large volumes of old data.
  • Data storage space can be managed and controlled according to your data retention policy.

The data archiving facility doesn't have to have the same power and resources as your backup storage. Tape drives, magnetic disks, or optical drives can be used for archiving purposes, since archived data is accessed infrequently or not any time soon, but it must still be accessible when needed.

Additionally, people involved in data archival need to identify what archived data means. Data archives should hold data that is not reproducible; data that can be regenerated or self-generated can be excluded. For example, if the records stored in the database are the result of deterministic calculations that are predictably reproducible, they can be regenerated if needed and left out of your archives.

Data Retention Standards

It's true that pruning the records stored in your database and moving them to your archives has some great benefits. It doesn't mean, however, that you are free to do this as you please; it depends on your business requirements. In fact, different countries have laws and regulations that you are required to follow (or at least implement). You will need to determine what archived data means for your business application, and which data is infrequently accessed.

For example, healthcare providers are commonly required (depending on the country) to retain patients' information for long periods of time, while in finance the rules depend on the specific country. Verify which data you need to retain so you can safely prune the rest for archival purposes and store it in a safe, secure place.

The Data Life-Cycle

Data backups and data archives are usually produced together through a backup life-cycle process, which has to be defined in your backup policy. Most backup policies cover the items listed below:

  • the schedule on which backups are taken (daily, weekly, monthly),
  • whether each backup is full or incremental,
  • the format of the backup, i.e. whether it is compressed or stored in an archive file format,
  • whether the data is encrypted or not,
  • the designated location to store the backup (locally on the same machine or over the local network),
  • the secondary location to store the backup (cloud storage, or a colocation facility),
  • and the data retention rules defining how old the data can be before it reaches end-of-life and is destroyed.

What Applications Need Data Archiving?

While everyone can enjoy the benefits of data archiving, there are certain fields that regularly practice this process for managing and maintaining their data. 

Government institutions fall into this category. Security and public safety data (such as video surveillance, and records of threats to personal, residential, social, and business safety) must be retained. This type of data must be stored securely for years to come for forensic and investigative purposes.

Digital media companies often have to store large amounts of content, and these files are often very large in size. Digital libraries also have to store tons of data for research or public information.

Healthcare providers, including insurers, are required to retain large amounts of information on their patients for many years. Such data can grow quickly, and it can affect the efficiency of the database when it's not maintained properly.

Cloud Storage Options For Your Archived Data

The top cloud companies are actively competing to offer you great features for storing your archived data in the cloud. It starts with a low price and offers the flexibility to access your data off-site. Cloud storage is a useful and reliable off-site option for data backups and data archiving, especially because it's very cost-efficient: you don't need to maintain large amounts of data hardware and storage services at your local or primary site, and it's less expensive in terms of electricity bills as well.

These points are important, as you might not need to access your archived data in real time. On certain occasions, especially when a recovery or investigation has to be done, you might need access to your data at short notice. Some businesses offer their customers access to their old data, but you may have to wait hours or days before they provide access to download the archived data.

For example, AWS offers S3 Glacier, which is highly flexible. You can store your data via S3, set up a life-cycle policy, and define when your data reaches its end and will be destroyed. Check out the documentation on How Do I Create a Lifecycle Policy for an S3 Bucket?. See the waterfall model below:

Image Courtesy of Amazon's Documentation "Transitioning Objects Using Amazon S3 Lifecycle".

At this level, you can store your backups in S3 and let the life-cycle rules defined on that bucket handle the archival.
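As a sketch, the same kind of rule can also be applied from the AWS CLI with put-bucket-lifecycle-configuration (bucket name, prefix, and timings are illustrative):

# lifecycle.json - move backups to Glacier after 30 days, expire after 365
# apply with:
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket my-backup-bucket --lifecycle-configuration file://lifecycle.json
{
  "Rules": [{
    "ID": "archive-backups",
    "Filter": { "Prefix": "backups/" },
    "Status": "Enabled",
    "Transitions": [{ "Days": 30, "StorageClass": "GLACIER" }],
    "Expiration": { "Days": 365 }
  }]
}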

If you're using GCP (Google Cloud Platform), they offer a similar approach. Check out their documentation on Object Lifecycle Management. GCP uses a TTL (Time-to-Live) approach for retaining objects stored in Cloud Storage. The great thing about the GCP offering is its archival Cloud Storage classes, Nearline and Coldline.

Coldline is ideal for data that is modified or accessed at most once a year, whereas Nearline suits data accessed more frequently (around once a month, possibly multiple times throughout the year). Data stored in either class can be accessed with sub-second latency.
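A comparable rule on GCP can be set with gsutil (bucket name and age are illustrative):

# lifecycle.json - move objects older than 30 days to Coldline
# apply with:  gsutil lifecycle set lifecycle.json gs://my-backup-bucket
{
  "rule": [{
    "action": { "type": "SetStorageClass", "storageClass": "COLDLINE" },
    "condition": { "age": 30 }
  }]
}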

With Microsoft Azure, the offering is plain and simple. Like AWS and GCP, it lets you move archived data into hot or cool tiers. You may be able to prioritize rehydration of requested archived data into the hot or cool tier when needed, but that comes at a price compared to a standard request. Check out their documentation on rehydrating blob data from the archive tier.

Overall, the cloud makes storing archived data hassle-free. You need to define your requirements, and of course the costs involved, when determining which cloud to use.

Best Practices for Your Archived Data in the Cloud

Now that we have covered the differences between data backups and archived data (or data archives), and some of the top cloud vendors' offerings, let's list the best practices to follow when storing archives in the cloud.

  • Identify the type of data to be archived. As stated earlier, data backups are not data archives, though a backup can become archived data. Archives are stagnant, old, infrequently accessed data. Identify this data first, and tag or label it so you can recognize it when it's stored off-site.
  • Determine data access frequency. Before archiving, identify how frequently you will need to access the archived data; pricing can differ based on how quickly you need access. For example, Amazon S3 charges more for Expedited retrievals using Provisioned capacity instead of On-Demand, and likewise Microsoft Azure charges more when you rehydrate archived data with high priority.
  • Ensure multiple copies are spread out. Even for archived or stagnant data, you still need your copies to be highly available and highly durable when needed. The cloud vendors mentioned earlier offer SLAs that give you an overview of how they store data for efficiency and fast accessibility. When configuring your life-cycle/backup policy, ensure that you store archives in multiple regions or replicate your archived data to a different region. These tech-giant cloud vendors store their archival cloud storage across multiple zones to provide scalability and durability whenever retrieval is requested.
  • Data compliance. Ensure that data compliance and regulations are followed accordingly, and make it happen during the initial phase, not later. If the data doesn't affect customer profiles and is just business logic data and history, destroying it may be harmless, but it's better to keep things in accord with regulations.
  • Provider standards. Choose the right cloud backup and data-retention provider. Walking the path of online data archiving and backup with an experienced service provider could save you from unrecoverable data loss. The top three cloud tech giants are a safe choice, but you're also free to choose other promising cloud vendors such as Alibaba, IBM or Oracle Archive Storage. It can be best to try them out before making your final decision.

Data Archiving Tools and Software

Databases using MariaDB, MySQL, or Percona Server can benefit from pt-archiver, which has been widely used for almost a decade and allows you to prune your data while archiving it. For example, orphan records can be removed with a command like:

pt-archiver --source h=host,D=db,t=child --purge \
  --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'

or the rows can be sent to a different host, such as an OLAP server:

pt-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
  --file '/var/log/archive/%Y-%m-%d-%D.%t'                           \
  --where "1=1" --limit 1000 --commit-each

For PostgreSQL or TimescaleDB, you can use CTEs (Common Table Expressions) to achieve this. For example:

CREATE TABLE public.user_info_new (LIKE public.user_info INCLUDING ALL);

ALTER TABLE public.user_info_new OWNER TO sysadmin;

GRANT select ON public.user_info_new TO read_only;
GRANT select, insert, update, delete ON public.user_info TO user1;
GRANT all ON public.user_info TO admin;

ALTER TABLE public.user_info INHERIT public.user_info_new;

BEGIN;
LOCK TABLE public.user_info IN ACCESS EXCLUSIVE MODE;
LOCK TABLE public.user_info_new IN ACCESS EXCLUSIVE MODE;
ALTER TABLE public.user_info RENAME TO user_info_old;
ALTER TABLE public.user_info_new RENAME TO user_info;
COMMIT;  -- (or ROLLBACK; if there's a problem)

Then move rows in batches:

WITH row_batch AS (
    SELECT id FROM public.user_info_old
    WHERE updated_at >= '2016-10-18 00:00:00'::timestamp LIMIT 20000 ),
delete_rows AS (
    DELETE FROM public.user_info_old o USING row_batch b WHERE b.id = o.id
    RETURNING o.id, account_id, created_at, updated_at, resource_id, notifier_id, notifier_type)
INSERT INTO public.user_info SELECT * FROM delete_rows;

Using CTEs with PostgreSQL can incur performance issues, so you might have to run this during off-peak hours. See this external blog on the caveats of using CTEs with PostgreSQL.

For MongoDB, you can use mongodump with the --archive parameter, as below:

mongodump --archive=test.$(date +"%Y_%m_%d").archive --db=test

This will dump an archive file named test.<current-date>.archive.
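To restore such an archive, mongorestore accepts the same parameter (the file name is illustrative; --nsInclude limits the restore to the test database):

mongorestore --archive=test.2019_12_02.archive --nsInclude="test.*"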

Using ClusterControl for Data Archival

ClusterControl allows you to set a backup policy and upload data off-site to your desired cloud storage location. ClusterControl supports the top three clouds (AWS, GCP, and Microsoft Azure). Please check out our previous blog on Best Practices for Database Backups to learn more.

With ClusterControl you can take a backup by first defining the backup policy, choosing the database, and archiving the table, as shown below...

Make sure that "Upload Backup to the cloud" is enabled, as shown above. Define the backup settings and set the retention:

Then define the cloud settings as below.

For the selected bucket, ensure that you have set up lifecycle management; in this scenario, we're using AWS S3. To set up the lifecycle rule, simply select the bucket, then go to the Management tab, as below:

Then set up the lifecycle rules as follows,

and configure its transitions:

In the example above, we're ensuring the transition goes to Amazon S3 Glacier, which is our best choice for retaining archived data.

Once you are done with the setup, you're ready to take the backup. Your archived data will follow the lifecycle you set up within AWS in this example. If you use GCP or Microsoft Azure, the process is the same: set up the backup along with its lifecycle.

Conclusion

Adopting the best practices for archiving your data in the cloud can be cumbersome at the beginning; however, with the right set of tools or bundled software, implementing the process becomes much easier.

 