Running tests with TPAexec

Gianni Ciolli

TPA (Trusted PostgreSQL Architecture) is a set of reference architectures. They are 2ndQuadrant's detailed recommendations on how to deploy Postgres, Postgres-BDR, and Postgres-XL in production.

TPA architectures are deployed using TPAexec; this article assumes that TPAexec has already been correctly installed and tested. For customers who need to use TPAexec, this is usually done during an initial "TPAexec Walkthrough" screen-sharing session.

This article shows how to prepare automated test files so that they can be placed in a TPAexec cluster directory and run with TPAexec.

This is particularly useful for developing new tests, or for maintaining customer-specific tests that must not become part of the TPAexec source code.

Requirements

You must use TPAexec version 7.6.2 or later.

Introduction

The tpaexec test command accepts the name of a test as an optional argument after the cluster name, as in the following example:

tpaexec test $ClusterDir mytest

where $ClusterDir is the directory of an already-provisioned, deployed and running TPAexec cluster.

If there is a file $ClusterDir/tests/mytest.yml then TPAexec will run it, and its output will be logged in the usual TPAexec way.

Test Files

The test file must be a valid Ansible playbook, which can use predefined TPAexec roles and tasks, as in this basic example:

- name: My development test playbook
  any_errors_fatal: True
  max_fail_percentage: 0
  hosts: all
  tasks:
  - include_role: name=test tasks_from=prereqs.yml
    tags: always
  - name: Execute a test command
    shell: "lsblk"
    become: yes
    become_user: root
    register: testreg
  - name: Write the output of the test command to a file
    include_role: name=test tasks_from=output.yml
    vars:
      output_file: mytest.txt
      content: |
        {{ testreg.stdout }}
    tags: always

This playbook is composed of a header, followed by a list of tasks; in this case there are three.

The first task includes predefined prereqs.yml tasks from the test TPAexec role:

tasks:
- include_role: name=test tasks_from=prereqs.yml
  tags: always

These tasks carry out some initial actions, such as creating the subdirectories where the test will place its output files.

The second task executes a test command:

- name: Execute a test command
  shell: "lsblk"
  become: yes
  become_user: root
  register: testreg

Note that we switch to the root user, and that we save the output of the command in an Ansible register named testreg.
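The register can also be used to make the test pass or fail explicitly. Here is a minimal sketch, not part of TPAexec itself, that uses Ansible's built-in assert module to fail the play if lsblk returned a non-zero exit code:

- name: Fail the test if the command did not succeed
  # testreg.rc holds the return code of the registered shell task
  assert:
    that:
      - testreg.rc == 0
    fail_msg: "lsblk exited with return code {{ testreg.rc }}"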

In the third and final task, the output collected in the testreg register is written to an output file:

- name: Write the output of the test command to a file
  include_role: name=test tasks_from=output.yml
  vars:
    output_file: mytest.txt
    content: |
      {{ testreg.stdout }}
  tags: always

Note that we only need to specify the file name, because we are using the predefined output.yml tasks from the test TPAexec role to write the file; these tasks place the file in a separate output directory for each test, with a subdirectory for each node.
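If a test needs to produce more than one output file, the same output.yml tasks can be included again with different variables. The following is a hypothetical sketch in the same pattern (the file name mytest-stderr.txt is made up for illustration) that also records the standard error captured in the register:

- name: Write the standard error of the test command to a second file
  include_role: name=test tasks_from=output.yml
  vars:
    # hypothetical second output file, written alongside mytest.txt
    output_file: mytest-stderr.txt
    content: |
      {{ testreg.stderr }}
  tags: always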

Test Output

When running the test we can see the usual TPAexec output, which ends with a test recap and test timing information:

$ tpaexec test mycluster mytest

(...)

PLAY RECAP *********************************************************************
c1 : ok=34 changed=3 unreachable=0 failed=0 
c2 : ok=27 changed=3 unreachable=0 failed=0 
c3 : ok=27 changed=3 unreachable=0 failed=0 
cb : ok=22 changed=3 unreachable=0 failed=0 


real 0m10.238s
user 0m15.328s
sys 0m2.832s

The output displayed on screen is also copied to $ClusterDir/ansible.log so you do not need to redirect it to a file.

The output collected from each cluster node is placed in a directory

$ClusterDir/test/$Epoch

where $Epoch is the time when the test was started, in Unix epoch format. In our example the cluster has four nodes c1, c2, c3 and cb, so we find four output files:

$ find test -type f
test/1555594823/c3/mytest.txt
test/1555594823/c1/mytest.txt
test/1555594823/c2/mytest.txt
test/1555594823/cb/mytest.txt

We can verify that each file has the expected output:

$ cat test/1555594823/c1/mytest.txt
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 9.9G 0 disk 
├─sda1 8:1 0 8.9G 0 part /
├─sda2 8:2 0 1K 0 part 
└─sda5 8:5 0 1022M 0 part [SWAP]

Destructive Tests

By default, a test should not write to the cluster, or perform any activity that could cause problems for the cluster.

However, some tests need to do exactly that, in order to verify the property they are supposed to test.

So you should ask yourself: "Can I run this test in production?"

If the answer is "no", then you must mark the test as destructive by adding an appropriate variable to the task that includes prerequisites, as in the following example:

- include_role: name=test tasks_from=prereqs.yml
  vars:
    destructive: yes
  tags: always

This variable is designed to protect users from accidentally running the wrong test on a production cluster.

It works as follows: if a test is marked as destructive, TPAexec will refuse to run it unless you add the command-line option --destroy-this-cluster.

In this example we try to run a destructive test without adding --destroy-this-cluster, and TPAexec stops us:

$ tpaexec test mycluster mytest

(...)

TASK [test : Check if destructive tests should be run] *************************
fatal: [cb]: FAILED! => {
    "assertion": "destroy_cluster|default(False)",
    "changed": false,
    "evaluated_to": false,
    "msg": "You must specify --destroy-this-cluster to run destructive tests"
}
NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
c1 : ok=31 changed=1 unreachable=0 failed=0 
c2 : ok=24 changed=1 unreachable=0 failed=0 
c3 : ok=24 changed=1 unreachable=0 failed=0 
cb : ok=19 changed=1 unreachable=0 failed=1 


real 0m8.793s
user 0m13.160s
sys 0m2.388s

After we add the --destroy-this-cluster option, the test is carried out normally:

$ tpaexec test mycluster mytest --destroy-this-cluster

(...)

PLAY RECAP *********************************************************************
c1 : ok=40 changed=5 unreachable=0 failed=0 
c2 : ok=32 changed=5 unreachable=0 failed=0 
c3 : ok=32 changed=5 unreachable=0 failed=0 
cb : ok=27 changed=5 unreachable=0 failed=0 


real 0m11.978s
user 0m16.832s
sys 0m3.260s

Using Vagrant Snapshots

One of the problems with destructive tests is that they destroy the cluster (no big surprise). This inconvenience can be mitigated with snapshots, if the platform supports them to a sufficient extent.

The example described in this section uses the vagrant platform on VirtualBox.

When taking a snapshot of a cluster, you need to take separate snapshots of each node. To avoid inconsistencies while keeping things simple, it is sufficient to shut down all the nodes and then take a snapshot of each one:

vagrant halt
vagrant snapshot save -f node1 snapshot_node1
vagrant snapshot save -f node2 snapshot_node2
vagrant snapshot save -f node3 snapshot_node3

At this point we have taken three single-node snapshots.

If we have any "state query" (i.e. a query that displays the state of the cluster), we can run it now and remember its output for later.

We can now perform any destructive action, and verify that the output of the state query has changed.

Then we can restore the snapshots with the following commands:

vagrant halt
vagrant snapshot restore --no-provision node1 snapshot_node1
vagrant snapshot restore --no-provision node2 snapshot_node2
vagrant snapshot restore --no-provision node3 snapshot_node3
vagrant up

First, we shut down all the nodes, to avoid problems like the post-restore version of node 1 talking with the pre-restore version of node 3. Then we restore each node snapshot, specifying each target node, and finally we start the restored instances.

At this point, the state query should return the initial state, proving that we have restored the state of the cluster that was previously saved.
