Given a precious pool called tank and a secondary pool called sink:
zelta backup tank sink/Backups/tank
The above command will create a dated snapshot of the pool tank and replicate it recursively to sink/Backups/tank. To perform this action every night at midnight, put the same command in cron via crontab -e:
@daily zelta backup tank sink/Backups/tank
You can also make a remote backup via ssh by defining an endpoint and dataset in the format [user@]host:target/dataset. For example:
zelta backup tank backup@remote.host:sink/Backups/tank
We frequently use zelta match to identify whether a replica is up to date. To confirm the backup in the first example is complete, you can run the following:
% zelta match tank sink/Backups/tank
target has latest source snapshot: @2024-01-26_14.38.15
target has latest source snapshot: /ROOT@2024-01-26_14.38.15
...
If you see target has latest source snapshot, everything is up to date. Otherwise, zelta match's output can help you evaluate your replica's status and decide what to do next. Some responses are:

- guid mismatch on: @...: Snapshots were taken on both the source and target and must be resolved before replicating. Consider making a new replica with zelta backup or carefully use zfs rollback, and check your snapshot policy on the source and target.
- need snapshot for source: ...: Your dataset tree is "Swiss cheesed" with missing snapshots, so it cannot be replicated safely with zelta sync's default mode. Using zelta backup or zelta sync -S will create a snapshot for you and perform the replication.
- no source snapshots found or invalid target: These messages often appear when the source endpoint/dataset name has been entered incorrectly.

Zelta does not assume that backups will be writeable. For recovery, always first consider retrieving backup files from the .zfs/snapshot directory on your source/primary replica as a potentially safe and efficient recovery option. However, depending on security concerns, disaster recovery, and DR testing scenarios, it may be necessary to work with a writeable temporary clone of the backup, which consumes almost no additional space on the pool. Use the zelta clone command to create this environment.
zelta clone sink/Backups/tank sink/temp/cloned-tank
Consider naming the clones clearly to remind you that they can be safely deleted. When done with the clones, you can destroy them with a privileged user without affecting your backup:
zfs destroy -r sink/temp/cloned-tank
zfs send -R
Although you will likely find zelta sync -I or zelta backup more complete and useful, using zfs send -R for both new and updated streams can be much faster because fewer snapshots need to be listed before computing the replication stream. If you're confident you're always using recursive snapshots consistently, use:
zelta sync -R -d1 tank sink/Backups/tank
You can also instruct Zelta sync to only create a new snapshot if new data has been written to the source dataset:
zelta sync -sRd1 tank sink/Backups/tank
This example describes the process of moving a VM between twin hosts. Zelta reduces the number of commands needed versus using ssh and additional zfs commands manually, and will help you avoid common pitfalls in a failover/failback process. This example is expanded upon below in the Automatic Instance Migration use case example. Once set up, the failover/failback requires 5 commands to perform safely, which can easily be added to your vm/instance control scripts. *To-do: A soon-to-be-added zelta sync -s switch will snapshot a pool only if it has been written to since its last snapshot, which will provide an additional safety check before starting a host.*
In this example, we have a pair of twin hosts, host1 and host2, with similar configurations:

- host1 has a pool called ssd1 currently running a VM called win11.belltower.atlantis2.
- host2 has a pool called ssd2, with a hypervisor configuration which can also load a VM beneath this dataset.
- lan is available on both systems, linked to the same physical switch or VLAN.
- twin exists on both systems with ssh access between the two systems and zfs allow permissions for send,snapshot,hold,receive,mount,create,readonly, plus any local property permissions needed on the VM dataset and its sub-datasets.
- Replication will be initiated as twin on host1.
- vm stop and vm start are the names of our control scripts which start and stop the hypervisor process.

Review the list above and be sure to double-check that ssh access works for the twin user (preferably via key, without a password) and that the critical zfs allow permissions are set. You will likely need compress,volmode,recordsize or other property permissions to replicate a variety of VMs. No destructive permissions are required for the proposed workflow.
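As a rough sketch of that delegation (the dataset name is taken from this example, and the exact permission list depends on the properties your VMs use; note that property permissions are spelled with the full ZFS property names, e.g. compression), it might look something like:
# Hypothetical delegation for the twin user on host1; repeat on host2 for ssd2/vm.
# Adjust the permission and property list to your environment.
zfs allow twin send,snapshot,hold,receive,mount,create,readonly,compression,volmode,recordsize ssd1/vm
# Review what is currently delegated:
zfs allow ssd1/vm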
To minimize downtime, perform the initial sync while our VM is still running on host1. We'll use the zelta sync -S flag to make a snapshot and replicate it to host2.
myuser% sudo su - twin
twin% zelta sync -Sm ssd1/vm/win11.belltower.atlantis2 host2:ssd2/vm/win11.belltower.atlantis2
snapshot created: ssd1/vm/win11.belltower.atlantis2@2024-01-26_09.43.08
36G sent, 1/1 streams received in 29.88 seconds
To reduce downtime as much as possible, we'll use the following workflow:

- Sync the running VM from host1 to host2.
- Stop the VM on host1.
- Set readonly permissions on the VM to prevent further writes. (At Bell Tower, we also unmount the VM.)
- Perform a final sync to host2.
- Start the VM on host2.

For simplicity, we will assume you are running as an account with full sudo privileges and will run vm and zfs commands as root.
On host1:
sudo -u twin zelta sync -S ssd1/vm/win11.belltower.atlantis2 host2:ssd2/vm/win11.belltower.atlantis2
sudo vm stop win11.belltower.atlantis2
sudo zfs set readonly=on ssd1/vm/win11.belltower.atlantis2
sudo -u twin zelta sync -S ssd1/vm/win11.belltower.atlantis2 host2:ssd2/vm/win11.belltower.atlantis2
On host2:
sudo zfs set readonly=off ssd2/vm/win11.belltower.atlantis2
sudo zfs mount ssd2/vm/win11.belltower.atlantis2
sudo vm start win11.belltower.atlantis2
To fail back, on host2:
sudo vm stop win11.belltower.atlantis2
sudo zfs set readonly=on ssd2/vm/win11.belltower.atlantis2
Then on host1:
sudo -u twin zelta sync -S host2:ssd2/vm/win11.belltower.atlantis2 ssd1/vm/win11.belltower.atlantis2
sudo zfs set readonly=off ssd1/vm/win11.belltower.atlantis2
sudo vm start win11.belltower.atlantis2
Using the readonly flag or unmounting datasets before the final sync ensures that syncing can be performed between hosts without requiring a zfs rollback. Consider adding readonly checks to your VM control scripts to ensure this is maintained in a disciplined way by your system administrators.
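As a minimal sketch of such a check (the wrapper itself is hypothetical; the dataset and vm command are the ones used in this example), a start wrapper could refuse to boot a VM whose dataset is still readonly:
#!/bin/sh
# Hypothetical "vm start" wrapper: refuse to start the VM while its dataset
# is still readonly, i.e. while this host holds the passive replica.
DATASET="ssd1/vm/win11.belltower.atlantis2"
if [ "$(zfs get -H -o value readonly "$DATASET")" = "on" ]; then
    echo "refusing to start: $DATASET is readonly" >&2
    exit 1
fi
vm start win11.belltower.atlantis2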
After performing the first sync, using zelta sync or zelta policy to continue frequent replication between hosts will further reduce the time and stress required to perform these actions. We recommend this process be implemented in addition to, not as a replacement for, your overall local and remote backup procedures.
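For example, a crontab entry for the twin user on host1 (the schedule below is only illustrative) could keep the replica on host2 warm between planned migrations:
# Hypothetical crontab -e entry for twin on host1: snapshot and replicate
# the running VM to host2 every 15 minutes.
*/15 * * * * zelta sync -S ssd1/vm/win11.belltower.atlantis2 host2:ssd2/vm/win11.belltower.atlantis2
For more than a handful of datasets, a zelta policy configuration is the more manageable way to schedule this.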
Using zelta's JSON output mode, it's easy to create various backup visualizations in Grafana. These can be helpful for getting an overview of smaller bottlenecks that would otherwise be difficult to monitor, and provide early warnings for hosts that are slow to list and need to be pruned. The following examples were created using a subset of Bell Tower's production backups. To make this happen, use zelta -j or the JSON: 1 option in the policy. Here's some example Python code for importing the Zelta Policy output into a new or existing SQLite database.
import json
import sqlite3
import sys
def read_json_objects(file):
    obj_str = ''
    depth = 0
    for line in file:
        for ch in line:
            if ch == '{':
                depth += 1
            if depth > 0:
                obj_str += ch
            if ch == '}':
                depth -= 1
                if depth == 0:
                    yield obj_str
                    obj_str = ''
# Connect to SQLite database
conn = sqlite3.connect(sys.argv[2])
cursor = conn.cursor()
# Create a table
cursor.execute('''CREATE TABLE IF NOT EXISTS data (json_data TEXT)''')
# Read and insert JSON data from file
with open(sys.argv[1], 'r') as file:
    for json_obj_str in read_json_objects(file):
        try:
            json_object = json.loads(json_obj_str)
            cursor.execute("INSERT INTO data (json_data) VALUES (?)", [json.dumps(json_object)])
        except json.JSONDecodeError as e:
            print(f"Error decoding JSON: {e}")
# Commit changes and close the connection
conn.commit()
conn.close()
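A sketch of how the script above might be invoked, assuming the policy's JSON output has already been captured to a log file (the script, log, and database names here are illustrative):
# argv[1] is the captured JSON log; argv[2] is the SQLite database to create or append to.
python3 zelta_json_to_sqlite.py /var/log/zelta/policy.json zelta.sqlite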
Be sure to capture Time appropriately so you can zoom in and out based on time. Here's an example visualization of a BACKUP_ROOT called rustpool/Backups:
SELECT
json_extract(json_data, '$.startTime') as Time,
REPLACE(json_extract(json_data, '$.targetVolume'), 'rustpool/Backups/', '') as Volume,
json_extract(json_data, '$.replicationSize') as ""
FROM data
ORDER BY Time
And here's a visualization of the cumulative time consumed for each replication over the selected date range:
SELECT
REPLACE(json_extract(json_data, '$.targetVolume'), 'rustpool/Backups/', '') as Volume,
(sum(json_extract(json_data, '$.endTime')) - sum(json_extract(json_data, '$.startTime'))) as TotalTime
FROM data
WHERE
json_extract(json_data, '$.startTime') >= ${__from:date:seconds} AND
json_extract(json_data, '$.startTime') <= ${__to:date:seconds}
GROUP by Volume
ORDER by TotalTime DESC
This is a proof of concept for providing a warm failover scenario simply by setting the appropriate ZFS properties. Snapshots are taken, and replications are performed, only on the host where the instance is running. By avoiding snapshots unless the written property has been incremented, and keeping the readonly property set for the offline replica, a "split brain" scenario is impossible.
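To illustrate the idea on a single dataset (the name is illustrative), the written property can be checked before deciding whether a snapshot is warranted:
# Snapshot only if data has been written since the last snapshot (dataset name illustrative).
DATASET="tank/jail/inst1"
if [ "$(zfs get -Hp -o value written "$DATASET")" -gt 0 ]; then
    zfs snapshot "$DATASET@$(date +%Y-%m-%d_%H.%M.%S)"
fi
The sentinel script below applies the same check across all instances on both hosts.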
A server called zelta-test-1 has 4 jails in tank/jail. We want them to sync to zelta-test-2:tank/jail bidirectionally, based on where the jail is started. This example is an attempt to provide the minimum code to do that.
LetsGo:
  zelta-test-1:
  - tank/jail: zelta-test-2:tank/jail
  zelta-test-2:
  - tank/jail: zelta-test-1:tank/jail
This script could be run on either of the servers, or a third "tie breaker" server that could handle exceptions.
INSTANCES="tank/jail/inst1 tank/jail/inst2 tank/jail/inst3 tank/jail/inst4"
RETENTION_POLICY='--2d'
TIME_FORMAT='date +%Y-%m-%d_%H.%M.%S'
SNAP=$($TIME_FORMAT)$RETENTION_POLICY
ssh zelta-test-1 "zfs list -Hpr -oname,written $INSTANCES |
awk '{if (\$2) print \$1}' | \
xargs -n1 -I% zfs snapshot %@$SNAP"
ssh zelta-test-2 "zfs list -Hpr -oname,written $INSTANCES |
awk '{if (\$2) print \$1}' | \
xargs -n1 -I% zfs snapshot %@$SNAP"
zelta
Run the sentinel process frequently to keep the servers in sync and to optimize for your expected RPO.
To safely cut over a VM or container, unmount it after shutting it down. Once unmounted or set to readonly, run your sentinel manually to snapshot and sync one last time.
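A minimal sketch of that cutover on the host currently running the instance (the jail control command and the sentinel path are placeholders for whatever you use):
# Hypothetical cutover for inst1 on the active host.
# Stop the instance with your own control command first (placeholder), e.g.:
#   service jail stop inst1
zfs set readonly=on tank/jail/inst1    # block further writes
zfs unmount tank/jail/inst1            # optional, see above
sh /usr/local/sbin/sentinel.sh         # placeholder path: snapshot (if written) and replicate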
To cause a VM or container to sync, simply mount and start it. Any changes to the instance will increment the written property, causing the sentinel to snapshot and replicate it.
On zelta-test-1, we've started inst1, inst2, and inst4.
We've started inst3, stopped it, and restarted it on zelta-test-2, and back again.
Zelta is able to thread the changes back and forth successfully, regardless of where the instances are running.
LetsGo
zelta-test-1
tank/jail: 234K: ✔ transferred in 1.78s
zelta-test-2
tank/jail: 32K: ✔ transferred in 0.57s
LetsGo
zelta-test-1
tank/jail: 238K: ✔ transferred in 1.82s
zelta-test-2
tank/jail: 78K: ✔ transferred in 0.65s
LetsGo
zelta-test-1
tank/jail: 234K: ✔ transferred in 1.7s
zelta-test-2
tank/jail: 78K: ✔ transferred in 0.63s
Ideally, the sentinel process should do a few more things, for example running the snapshot step only when no zfs command is already in flight:
pgrep zfs || [run snapshot process]
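A sketch of that guard at the top of the sentinel script (whether to skip or wait is a judgment call):
# Skip this pass entirely if any zfs send/receive/snapshot is still running.
if pgrep -x zfs >/dev/null; then
    echo "zfs still busy; skipping this sentinel pass" >&2
    exit 0
fi
# ...snapshot and zelta replication as above...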