mirror of
https://github.com/lephisto/crossover.git
synced 2025-12-06 04:09:20 +01:00
# crossover

[License: GPL v3](https://www.gnu.org/licenses/gpl-3.0.en.html)

Cross-Pool (live) Replication and near-live migration for Proxmox VE

```text
 _____
|     |___ ___ ___ ___ ___ _ _ ___ ___
|   --|  _| . |_ -|_ -| . | | | -_|  _|
|_____|_| |___|___|___|___|\_/|___|_|

Cross Pool (live) replication and near-live migration for Proxmox VE

Usage:
    crossover <COMMAND> [ARGS] [OPTIONS]
    crossover help
    crossover version

    crossover mirror --vmid=<string> --destination=<destinationhost> --pool=<targetpool> --keeplocal=[n][d|s] --keepremote=[n][d|s]

Commands:
    version          Show program version
    help             Show program help
    mirror           Replicate a stopped VM to another Cluster (full clone)

Options:
    --vmid           The source+target ID of the VM as source:target pairs, comma separated (e.g. --vmid=100:100,101:101)
                     (Specifying a different target VMID avoids clashes with VMIDs on the
                     target cluster, or marks mirrored VMs on the destination)
    --prefixid       Prefix for VMIDs on the target system [optional]
    --excludevmids   Exclude VM IDs when using --vmid=all
    --destination    Target PVE host in target pool, e.g. --destination=pve04
    --pool           Ceph pool name in target pool, e.g. --pool=data
    --keeplocal      How long to keep additional snapshots locally, in seconds (s) or days (d), e.g. --keeplocal=2d
    --keepremote     How long to keep additional snapshots remotely, in seconds (s) or days (d), e.g. --keepremote=7d
    --rewrite        PCRE regex to rewrite the config files (e.g. --rewrite='s/(net0:)(.*)tag=([0-9]+)/\1\2tag=1/g' would
                     set the VLAN tag of net0 to 1)
    --influxurl      Influx API URL (e.g. --influxurl=https://your-influxserver.com/api/)
    --influxtoken    Influx API token with write permission
    --influxbucket   Influx bucket to write to (e.g. --influxbucket=telegraf/autogen)

Switches:
    --online         Allow online copy
    --nolock         Don't lock source VM on transfer (mainly for test purposes)
    --keep-slock     Keep source VM locked on transfer
    --keep-dlock     Keep VM locked on destination after transfer
    --overwrite      Overwrite destination
    --protect        Protect Ceph snapshots
    --debug          Show debug output

Report bugs to the Github repo at https://github.com/lephisto/crossover/
```

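How `--vmid` pairs and `--prefixid` relate can be illustrated with two small helpers. This is a sketch, not code taken from crossover: `parse_vmid_map` and `prefixed_vmid` are hypothetical names. It assumes that a bare ID without a colon keeps its own number on the target (Example 2 below mirrors `--vmid=100` to VM 100) and that the prefix is simply prepended as a string (Example 1 below turns VM 100 into VM 99100 with `--prefixid=99`).

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the --vmid / --prefixid semantics;
# they are not part of crossover itself.

# parse_vmid_map "100:100,101:200" prints one "source target" pair per line.
# A bare ID such as "100" is assumed to map to the same ID on the target.
parse_vmid_map() {
    local spec=$1 entry
    local -a entries
    IFS=',' read -r -a entries <<< "$spec"
    for entry in "${entries[@]}"; do
        if [[ $entry == *:* ]]; then
            printf '%s %s\n' "${entry%%:*}" "${entry#*:}"
        else
            printf '%s %s\n' "$entry" "$entry"
        fi
    done
}

# prefixed_vmid 99 100 prints 99100: the prefix is prepended as a string.
prefixed_vmid() {
    local prefix=$1 vmid=$2
    printf '%s%s\n' "$prefix" "$vmid"
}

# Usage:
#   parse_vmid_map "100:100,101:200"   # prints "100 100" and "101 200"
#   prefixed_vmid 99 100               # prints 99100
```

Keeping the pair parsing separate from the prefix logic makes it easy to check either rule on its own against what crossover prints in its log.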
## Introduction

When working with hyperconverged Proxmox HA Clusters you sometimes need to migrate VMs to another cluster, or keep a cold-standby copy of a VM ready to start there in case your main datacenter goes boom. Crossover lets you:

- Transfer a non-running VM to another Cluster
- Transfer a running VM to another Cluster
- Continuously update a previously transferred VM in another Cluster with incremental snapshots

Currently this only works with Ceph-based storage backends, since the incremental logic relies heavily on RADOS Block Device (RBD) features.

It works according to this scheme:

```
.:::::::::.                                                           .:::::::::.
|Cluster-A|                                                           |Cluster-B|
|         |                                                           |         |
| _______ |  rbd export-diff [..] | ssh pve04 | rbd import-diff [..]  | _______ |
| pve01  -|-----------------------------------------------------------|->pve04  |
| _______ |                                                           | _______ |
| pve02   |                                                           | pve05   |
| _______ |                                                           | _______ |
| pve03   |                                                           | pve06   |
| _______ |                                                           | _______ |
|         |                                                           |         |
|:::::::::|                                                           |:::::::::|
```

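The arrow in the scheme is a plain shell pipeline. The function below is a minimal sketch of one incremental step for a single disk. It assumes the previous mirror snapshot already exists on both sides and that root can SSH to the destination without a password. It is not crossover's implementation (which also handles the initial full copy, locking, freezing and housekeeping); the function name and argument order are hypothetical, and the image names in the usage comment are borrowed from the example logs further down.

```bash
#!/usr/bin/env bash
# Sketch: ship the changes between two RBD snapshots of one disk to another
# cluster. Hypothetical helper, not part of crossover.
#
#   $1  source image        e.g. data/vm-100-disk-0
#   $2  previous snapshot   must exist on the source AND the destination image
#   $3  new snapshot        must already exist on the source image
#   $4  destination host    e.g. pve04
#   $5  destination image   e.g. data2/vm-99100-disk-0-data
#
# The body runs in a subshell so that "pipefail" does not leak into the caller.
mirror_disk_incremental() (
    set -o pipefail
    src_image=$1 from_snap=$2 to_snap=$3 dst_host=$4 dst_image=$5

    # export-diff writes only the blocks changed between from_snap and to_snap
    # to stdout ("-"); import-diff on the destination applies them and then
    # creates to_snap there, so the next run can diff against it.
    # (pv could be inserted into the pipe to display progress.)
    rbd export-diff --from-snap "$from_snap" "$src_image@$to_snap" - \
        | ssh "$dst_host" rbd import-diff - "$dst_image"
)

# Usage, on the source node after "rbd snap create data/vm-100-disk-0@mirror-new":
#   mirror_disk_incremental data/vm-100-disk-0 mirror-old mirror-new pve04 data2/vm-99100-disk-0-data
```

With `pipefail`, a failing `rbd export-diff` on the source makes the whole step fail even if the remote side exits cleanly.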
## Main features

* Currently only for KVM. I might add LXC support when I need to.
* Can keep multiple backups (snapshots)
* Retention policy (e.g. keep x snapshots on the source and y snapshots in the destination cluster)
* Rewrites VM configurations so they match the new VMID and/or pool name on the destination
* Secure, encrypted transfer (SSH), so it's safe to mirror between datacenters without an additional VPN
* Near-live migration: to move a VM from one Cluster to another, make an initial copy and re-run with --migrate. This shuts down the VM on the source cluster and starts it on the destination cluster.

## Installation of prerequisites

```
apt install git pv gawk jq

## Install the script somewhere, e.g. to /opt/crossover
git clone https://github.com/lephisto/crossover/ /opt/crossover
```

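Before the first run, it can help to verify that the tools installed above are actually on the `PATH`. This is a minimal sketch; `missing_tools` is a hypothetical name, and the tool list in the usage comment is just the one from the `apt install` line.

```bash
#!/usr/bin/env bash
# Print every command from the argument list that cannot be found in PATH.
# Prints nothing and returns 0 if all of them are available.
missing_tools() {
    local tool rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   missing_tools git pv gawk jq || echo "install the tools listed above first"
```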
Make sure you can SSH without a password from the node you plan to mirror _from_ to _all_ nodes in the destination cluster, as well as to localhost.

## Continuous replication between Clusters

Example 1: Mirror VMs to another Cluster:

```
root@pve01:~/crossover# ./crossover mirror --vmid=all --prefixid=99 --excludevmids=101 --destination=pve04 --pool=data2 --overwrite --online
ACTION: Onlinemirror
Start mirror 2022-11-01 19:21:44
VM 100 - Starting mirror for testubuntu
VM 100 - Checking for VM 99100 on Destination Host pve04 /etc/pve/nodes/*/qemu-server
VM 100 - Transmitting Config for to destination pve04 VMID 99100
VM 100 - locked 100 [rc:0]
VM 99100 - locked 99100 [rc:0]
VM 100 - Creating snapshot data/vm-100-disk-0@mirror-20221101192144
VM 100 - Creating snapshot data/vm-100-disk-1@mirror-20221101192144
VM 100 - unlocked source VM 100 [rc:0]
VM 100 - I data/vm-100-disk-0@mirror-20221101192144: e:0:00:01 c:[ 227KiB/s] a:[ 227KiB/s] 372KiB
VM 100 - Housekeeping: localhost data/vm-100-disk-0, keeping Snapshots for 0s
VM 100 - Removing Snapshot localhost data/vm-100-disk-0@mirror-20221101192032 (106s) [rc:0]
VM 100 - Housekeeping: pve04 data2/vm-99100-disk-0-data, keeping Snapshots for 0s
VM 100 - Removing Snapshot pve04 data2/vm-99100-disk-0-data@mirror-20221101192032 (108s) [rc:0]
VM 100 - Disk Summary: Took 2 Seconds to transfer 372.89 KiB in a incremental run
VM 100 - I data/vm-100-disk-1@mirror-20221101192144: e:0:00:00 c:[ 346 B/s] a:[ 346 B/s] 74.0 B
VM 100 - Housekeeping: localhost data/vm-100-disk-1, keeping Snapshots for 0s
VM 100 - Removing Snapshot localhost data/vm-100-disk-1@mirror-20221101192032 (114s) [rc:0]
VM 100 - Housekeeping: pve04 data2/vm-99100-disk-1-data, keeping Snapshots for 0s
VM 100 - Removing Snapshot pve04 data2/vm-99100-disk-1-data@mirror-20221101192032 (115s) [rc:0]
VM 100 - Disk Summary: Took 1 Seconds to transfer 372.96 KiB in a incremental run
VM 99100 - Unlocking destination VM 99100
Finnished mirror 2022-11-01 19:22:30
Job Summary: Bytes transferd 2 bytes for 2 Disks on 1 VMs in 00 hours 00 minutes 46 seconds
VM Freeze OK/failed...: 1/0
RBD Snapshot OK/failed: 2/0
Full xmitted..........: 0 byte
Differential Bytes ...: 372.96 KiB
```

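This example mirrors every VM of the source cluster except VM 101 (`--vmid=all --excludevmids=101`). `--prefixid=99` prefixes the target VMIDs, so VM 100 in the source cluster becomes VM 99100 in the destination cluster, and all attached disks are stored in the Ceph pool "data2". `--online` allows mirroring VMs while they are running, and `--overwrite` allows overwriting the destination. Since neither `--keeplocal` nor `--keepremote` is given, older mirror snapshots are removed on both sides after each run (see the `keeping Snapshots for 0s` housekeeping lines); pass e.g. `--keeplocal=2d --keepremote=7d` to retain more. Add `--keep-dlock` to keep the VM on the target Cluster locked after the transfer, which prevents an accidental start there (and the split-brain issues that would cause).
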
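The retention options take a number followed by `d` (days) or `s` (seconds). Judging from the `keeping Snapshots for 0s` lines and the age printed next to each removed snapshot in the log above, they act as a maximum snapshot age. The sketch below models that assumption: it converts a value such as `2d` to seconds and decides whether a `mirror-YYYYmmddHHMMSS` snapshot is older than that. It is not crossover's code; the function names are hypothetical, and `snapshot_epoch` relies on GNU `date -d`, as shipped with Debian-based Proxmox VE.

```bash
#!/usr/bin/env bash
# Hypothetical helpers modelling --keeplocal/--keepremote as a maximum age.

# retention_seconds 2d -> 172800, retention_seconds 3600s -> 3600
retention_seconds() {
    if [[ $1 =~ ^([0-9]+)([ds])$ ]]; then
        local n=${BASH_REMATCH[1]} unit=${BASH_REMATCH[2]}
        # 10# forces base 10, so values like "08d" are not read as octal.
        if [[ $unit == d ]]; then echo $(( 10#$n * 86400 )); else echo $(( 10#$n )); fi
    else
        echo "invalid retention value: $1" >&2
        return 1
    fi
}

# snapshot_epoch mirror-20221101192032 -> Unix time of 2022-11-01 19:20:32
# (interpreted in the local time zone of the node running it).
snapshot_epoch() {
    local ts=${1#mirror-}
    [[ $ts =~ ^[0-9]{14}$ ]] || return 1
    date -d "${ts:0:4}-${ts:4:2}-${ts:6:2} ${ts:8:2}:${ts:10:2}:${ts:12:2}" +%s
}

# snapshot_expired NAME NOW KEEP
# Exit status 0 if NAME is older than KEEP, 1 if not, 2 on invalid input.
snapshot_expired() {
    local name=$1 now=$2 keep created
    keep=$(retention_seconds "$3") || return 2
    created=$(snapshot_epoch "$name") || return 2
    (( now - created > keep ))
}

# Usage:
#   snapshot_expired mirror-20221101192032 "$(date +%s)" 2d && echo "would be removed"
```

The logs only ever show older snapshots being removed, never the one just created, which the next incremental run needs as its base; a real housekeeping loop would have to skip that snapshot regardless of its age.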
The use case is keeping a cold-standby copy of a certain VM on another Cluster. If you need to start it on the target cluster, you just have to unlock it there with `qm unlock VMID`.

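A failover to the cold standby could be scripted roughly like this. It is a sketch under assumptions: `failover_to_standby` is a hypothetical name, it runs on a machine that can SSH to both clusters, and it refuses to start the copy while the source VM still reports `status: running` (the output format of `qm status`), to avoid split brain. If the source node cannot be reached at all, it proceeds with a warning, since that is exactly the disaster case a cold standby is for.

```bash
#!/usr/bin/env bash
# Hypothetical failover helper for a mirrored cold-standby VM.
#   $1 source node       $2 source VMID
#   $3 destination node  $4 destination VMID
failover_to_standby() {
    local src_host=$1 src_vmid=$2 dst_host=$3 dst_vmid=$4 state

    # qm status prints e.g. "status: running" or "status: stopped".
    if state=$(ssh -o ConnectTimeout=5 "$src_host" qm status "$src_vmid" 2>/dev/null); then
        if [[ $state == "status: running" ]]; then
            echo "VM $src_vmid is still running on $src_host, not starting the copy" >&2
            return 1
        fi
    else
        echo "warning: could not query $src_host, assuming the source is down" >&2
    fi

    # Unlock the copy (it may have been left locked, e.g. with --keep-dlock),
    # then start it.
    ssh "$dst_host" qm unlock "$dst_vmid" && ssh "$dst_host" qm start "$dst_vmid"
}

# Usage: failover_to_standby pve01 100 pve04 99100
```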
Another use case is migrating a VM from one cluster to another with the least downtime possible. Real live migration, as you know it inside one cluster, is hard to achieve across clusters. But you can easily make an initial copy while the VM is still running on the source cluster (fully transferring the block devices), shut it down on the source, run the mirror process again (which is much faster now, because it only needs to transfer the diff since the initial snapshot) and start it up on the target cluster. This way the migration basically takes one boot plus a few seconds for transferring the incremental snapshot.

## Near-live Migration

To minimize downtime and achieve a near-live migration from one Cluster to another, it's recommended to do an initial sync of the VM from the source to the destination cluster. After that, run the job again and add the --migrate switch. This causes the source VM to be shut down prior to snapshot + transfer, and to be started on the destination cluster as soon as the incremental transfer is complete. Using --migrate will always try to start the VM on the destination cluster.

Example 2: Near-live migrate a VM from one cluster to another (run the initial replication first, which works online, then run with --migrate to shut down on the source, copy incrementally and start on the destination):

```
root@pve01:~/crossover# ./crossover mirror --jobname=migrate --vmid=100 --destination=pve04 --pool=data2 --online
ACTION: Onlinemirror
Start mirror 2023-04-26 15:02:24
VM 100 - Starting mirror for testubuntu
VM 100 - Checking for VM 100 on destination cluster pve04 /etc/pve/nodes/*/qemu-server
VM 100 - Transmitting Config for to destination pve04 VMID 100
VM 100 - locked 100 [rc:0] on source
VM 100 - locked 100 [rc:0] on destination
VM 100 - Creating snapshot data/vm-100-disk-0@mirror-20230426150224
VM 100 - Creating snapshot data/vm-100-disk-1@mirror-20230426150224
VM 100 - unlocked source VM 100 [rc:0]
VM 100 - F data/vm-100-disk-0@mirror-20230426150224: e:0:09:20 r: c:[36.6MiB/s] a:[36.6MiB/s] 20.0GiB [===============================>] 100%
VM 100 - created snapshot on 100 [rc:0]
VM 100 - Disk Summary: Took 560 Seconds to transfer 20.00 GiB in a full run
VM 100 - F data/vm-100-disk-1@mirror-20230426150224: e:0:00:40 r: c:[50.7MiB/s] a:[50.7MiB/s] 2.00GiB [===============================>] 100%
VM 100 - created snapshot on 100 [rc:0]
VM 100 - Disk Summary: Took 40 Seconds to transfer 22.00 GiB in a full run
VM 100 - Unlocking destination VM 100
Finnished mirror 2023-04-26 15:13:47
Job Summary: Bytes transferred 22.00 GiB for 2 Disks on 1 VMs in 00 hours 11 minutes 23 seconds
VM Freeze OK/failed.......: 1/0
RBD Snapshot OK/failed....: 2/0
RBD export-full OK/failed.: 2/0
RBD export-diff OK/failed.: 0/0
Full xmitted..............: 22.00 GiB
Differential Bytes .......: 0 Bytes

root@pve01:~/crossover# ./crossover mirror --jobname=migrate --vmid=100 --destination=pve04 --pool=data2 --online --migrate
ACTION: Onlinemirror
Start mirror 2023-04-26 15:22:35
VM 100 - Starting mirror for testubuntu
VM 100 - Checking for VM 100 on destination cluster pve04 /etc/pve/nodes/*/qemu-server
VM 100 - Migration requested, shutting down VM on pve01
VM 100 - locked 100 [rc:0] on source
VM 100 - locked 100 [rc:0] on destination
VM 100 - Creating snapshot data/vm-100-disk-0@mirror-20230426152235
VM 100 - Creating snapshot data/vm-100-disk-1@mirror-20230426152235
VM 100 - I data/vm-100-disk-0@mirror-20230426152235: e:0:00:03 c:[1.29MiB/s] a:[1.29MiB/s] 4.38MiB
VM 100 - Housekeeping: localhost data/vm-100-disk-0, keeping Snapshots for 0s
VM 100 - Removing Snapshot localhost data/vm-100-disk-0@mirror-20230323162532 (2930293s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-0@mirror-20230426144911 (2076s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-0@mirror-20230426145632 (1637s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-0@mirror-20230426145859 (1492s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-0@mirror-20230426150224 (1290s) [rc:0]
VM 100 - Housekeeping: pve04 data2/vm-100-disk-0-data, keeping Snapshots for 0s
VM 100 - Removing Snapshot pve04 data2/vm-100-disk-0-data@mirror-20230426150224 (1293s) [rc:0]
VM 100 - Disk Summary: Took 4 Seconds to transfer 4.37 MiB in a incremental run
VM 100 - I data/vm-100-disk-1@mirror-20230426152235: e:0:00:00 c:[ 227 B/s] a:[ 227 B/s] 74.0 B
VM 100 - Housekeeping: localhost data/vm-100-disk-1, keeping Snapshots for 0s
VM 100 - Removing Snapshot localhost data/vm-100-disk-1@mirror-20230323162532 (2930315s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-1@mirror-20230426144911 (2098s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-1@mirror-20230426145632 (1659s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-1@mirror-20230426145859 (1513s) [rc:0]
VM 100 - Removing Snapshot localhost data/vm-100-disk-1@mirror-20230426150224 (1310s) [rc:0]
VM 100 - Housekeeping: pve04 data2/vm-100-disk-1-data, keeping Snapshots for 0s
VM 100 - Removing Snapshot pve04 data2/vm-100-disk-1-data@mirror-20230426150224 (1313s) [rc:0]
VM 100 - Disk Summary: Took 2 Seconds to transfer 4.37 MiB in a incremental run
VM 100 - Unlocking destination VM 100
VM 100 - Starting VM on pve01
Finnished mirror 2023-04-26 15:24:25
Job Summary: Bytes transferred 4.37 MiB for 2 Disks on 1 VMs in 00 hours 01 minutes 50 seconds
VM Freeze OK/failed.......: 0/0
RBD Snapshot OK/failed....: 2/0
RBD export-full OK/failed.: 0/0
RBD export-diff OK/failed.: 2/0
Full xmitted..............: 0 Bytes
Differential Bytes .......: 4.37 MiB
```

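The two runs from Example 2 can be wrapped so that the `--migrate` run only happens if the initial online sync succeeded. This is a sketch assuming crossover is called as `./crossover`, as in the examples. The function name and the `CROSSOVER` variable are hypothetical; the flags are exactly the ones used above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two crossover runs of a near-live migration.
#   $1 VMID   $2 destination node   $3 destination Ceph pool
# CROSSOVER may point to another crossover location (default: ./crossover).
near_live_migrate() {
    local vmid=$1 dest=$2 pool=$3
    local -a base=(mirror --jobname=migrate --vmid="$vmid" --destination="$dest" --pool="$pool" --online)
    local cmd=${CROSSOVER:-./crossover}

    # 1) Full (first time) or incremental copy while the VM keeps running.
    "$cmd" "${base[@]}" || { echo "initial sync failed, VM left untouched" >&2; return 1; }

    # 2) Shut down on the source, send the last diff, start on the destination.
    "$cmd" "${base[@]}" --migrate
}

# Usage: near_live_migrate 100 pve04 data2
```

Running the first step well ahead of the maintenance window (or several times) keeps the final diff, and thus the downtime, small.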
## Things to check

From the Proxmox VE hosts you want to back up, you need to be able to SSH without a password to all other Cluster hosts that may hold VMs or Containers. This applies to the source and to the destination Cluster.

This is required for the freeze/unfreeze and lock/unlock functions, which have to be called locally on the host the guest is currently running on. Usually this works out of the box for the source cluster, but you may want to make sure that you can `ssh root@pvehost1...n` from every host to every other host in the cluster.

For the destination Cluster, copy your SSH key to the first host in the cluster (on Proxmox VE, root's `authorized_keys` is shared across the cluster through `/etc/pve`), and log in once to every node in your cluster so that its host key gets recorded.

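A quick way to verify the passwordless SSH requirement is to run a no-op command on every node with `BatchMode=yes`, which makes `ssh` fail instead of prompting for a password. This is a sketch; the function name and node names are placeholders. `StrictHostKeyChecking=accept-new` (OpenSSH 7.6 or newer) records the host key of nodes you have not logged in to yet, which replaces the one-time manual login described above; drop it if you prefer to confirm host keys by hand.

```bash
#!/usr/bin/env bash
# Check passwordless root SSH to each node given as an argument.
# Prints one line per node and returns non-zero if any node failed.
check_ssh_nodes() {
    local node rc=0
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 \
               -o StrictHostKeyChecking=accept-new \
               "root@$node" true </dev/null >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, on the node you mirror from (placeholder node names):
#   check_ssh_nodes localhost pve01 pve02 pve03 pve04 pve05 pve06
```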
## Some words about Snapshot consistency and what qemu-guest-agent can do for you

Bear in mind that taking a snapshot of a running VM is basically like pulling the power cord of a server. Often this is not catastrophic, as the next fsck will try to fix filesystem issues, but in the worst case it could leave you with a severely damaged filesystem, or, even worse, with silent data corruption caused by half-written inodes that were in flight when the power failed. To mitigate this, the qemu-guest-agent improves the consistency of the filesystem while taking a snapshot. It does not guarantee a perfectly clean state, but it sync()s outstanding writes and halts all I/O until the snapshot is complete. Still, there might be issues on the application layer: database processes might have unwritten data in memory, which is the most common case. Here you have the opportunity to do additional tuning and use hooks that tell your vital processes what to do before and after a freeze.

First, make sure that your guest has the qemu-guest-agent running and that it works properly. Then use custom hooks to tell services holding volatile data to flush all unwritten data to disk. On Debian-based Linux systems the hook file can be set in ```/etc/default/qemu-guest-agent```, which could simply contain this line:

```
DAEMON_ARGS="-F/etc/qemu/fsfreeze-hook"
```

Create ```/etc/qemu/fsfreeze-hook``` and make it look like this:

```
#!/bin/sh

# This script is executed when a guest agent receives fsfreeze-freeze and
# fsfreeze-thaw command, if it is specified in --fsfreeze-hook (-F)
# option of qemu-ga or placed in default path (/etc/qemu/fsfreeze-hook).
# When the agent receives fsfreeze-freeze request, this script is issued with
# "freeze" argument before the filesystem is frozen. And for fsfreeze-thaw
# request, it is issued with "thaw" argument after filesystem is thawed.

LOGFILE=/var/log/qga-fsfreeze-hook.log
FSFREEZE_D=$(dirname -- "$0")/fsfreeze-hook.d

# Check whether file $1 is a backup or rpm-generated file and should be ignored
is_ignored_file() {
    case "$1" in
        *~ | *.bak | *.orig | *.rpmnew | *.rpmorig | *.rpmsave | *.sample | *.dpkg-old | *.dpkg-new | *.dpkg-tmp | *.dpkg-dist | *.dpkg-bak | *.dpkg-backup | *.dpkg-remove)
            return 0 ;;
    esac
    return 1
}

# Iterate executables in directory "fsfreeze-hook.d" with the specified args
[ ! -d "$FSFREEZE_D" ] && exit 0
for file in "$FSFREEZE_D"/* ; do
    is_ignored_file "$file" && continue
    [ -x "$file" ] || continue
    # Fixed printf format strings, so a '%' in a file name cannot break logging
    printf '%s: execute %s %s\n' "$(date)" "$file" "$*" >>"$LOGFILE"
    "$file" "$@" >>"$LOGFILE" 2>&1
    STATUS=$?
    printf '%s: %s finished with status=%s\n' "$(date)" "$file" "$STATUS" >>"$LOGFILE"
done

exit 0
```

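For testing purposes, place this into ```/etc/qemu/fsfreeze-hook.d/10-info``` and make both files executable (`chmod +x /etc/qemu/fsfreeze-hook /etc/qemu/fsfreeze-hook.d/10-info`), since the hook above skips files in `fsfreeze-hook.d` that are not executable:
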
```
#!/bin/bash
dt=$(date +%s)

case "$1" in
    freeze)
        # Start a fresh record for this freeze/thaw cycle
        echo "frozen on $dt" | tee /tmp/fsfreeze
        ;;
    thaw)
        # Append to the record written by the freeze step
        echo "thawed on $dt" | tee -a /tmp/fsfreeze
        ;;
esac
```

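To exercise the test hook, a freeze/thaw cycle can be triggered from the Proxmox VE node through the guest agent. This sketch assumes the agent is enabled for the VM (`agent: 1` in its config) and uses `qm guest cmd`, whose `ping`, `fsfreeze-freeze` and `fsfreeze-thaw` commands map to the corresponding guest-agent calls; the function name is hypothetical. Afterwards, `/tmp/fsfreeze` inside the guest should contain a "frozen on" and a "thawed on" line.

```bash
#!/usr/bin/env bash
# Trigger one fsfreeze-freeze / fsfreeze-thaw cycle through the guest agent.
# Run on the Proxmox VE node that currently hosts the VM.
fsfreeze_cycle() {
    local vmid=$1 rc=0

    qm guest cmd "$vmid" ping || { echo "guest agent of VM $vmid not responding" >&2; return 1; }

    qm guest cmd "$vmid" fsfreeze-freeze || rc=1
    # Always thaw, even if the freeze reported an error: a guest that is left
    # frozen blocks all writes.
    qm guest cmd "$vmid" fsfreeze-thaw || rc=1
    return "$rc"
}

# Usage: fsfreeze_cycle 100   (then check /tmp/fsfreeze inside the guest)
```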
Now you can place files for different services in ```/etc/qemu/fsfreeze-hook.d/``` that tell those services what to do before and after snapshots. A very common example is MySQL. Create an executable file ```/etc/qemu/fsfreeze-hook.d/20-mysql``` containing:

```
#!/bin/sh

# Flush MySQL tables to the disk before the filesystem is frozen.
# At the same time, this keeps a read lock in order to avoid write accesses
# from the other clients until the filesystem is thawed.

MYSQL="/usr/bin/mysql"
#MYSQL_OPTS="-uroot" #"-prootpassword"
MYSQL_OPTS="--defaults-extra-file=/etc/mysql/debian.cnf"
FIFO=/var/run/mysql-flush.fifo

# Check mysql is installed and the server running
[ -x "$MYSQL" ] && "$MYSQL" $MYSQL_OPTS < /dev/null || exit 0

flush_and_wait() {
    printf "FLUSH TABLES WITH READ LOCK \\G\n"
    trap 'printf "$(date): $0 is killed\n">&2' HUP INT QUIT ALRM TERM
    read < $FIFO
    printf "UNLOCK TABLES \\G\n"
    rm -f $FIFO
}

case "$1" in
    freeze)
        mkfifo $FIFO || exit 1
        flush_and_wait | "$MYSQL" $MYSQL_OPTS &
        # wait until every block is flushed
        while [ "$(echo 'SHOW STATUS LIKE "Key_blocks_not_flushed"' |\
                 "$MYSQL" $MYSQL_OPTS | tail -1 | cut -f 2)" -gt 0 ]; do
            sleep 1
        done
        # for InnoDB, wait until every log is flushed
        INNODB_STATUS=$(mktemp /tmp/mysql-flush.XXXXXX)
        [ $? -ne 0 ] && exit 2
        trap "rm -f $INNODB_STATUS; exit 1" HUP INT QUIT ALRM TERM
        while :; do
            printf "SHOW ENGINE INNODB STATUS \\G" |\
                "$MYSQL" $MYSQL_OPTS > $INNODB_STATUS
            LOG_CURRENT=$(grep 'Log sequence number' $INNODB_STATUS |\
                          tr -s ' ' | cut -d' ' -f4)
            LOG_FLUSHED=$(grep 'Log flushed up to' $INNODB_STATUS |\
                          tr -s ' ' | cut -d' ' -f5)
            [ "$LOG_CURRENT" = "$LOG_FLUSHED" ] && break
            sleep 1
        done
        rm -f $INNODB_STATUS
        ;;

    thaw)
        [ ! -p $FIFO ] && exit 1
        echo > $FIFO
        ;;

    *)
        exit 1
        ;;
esac
```

## Last remarks

_Test your backups on a regular basis. Restore them and see if you can mount and/or boot them. Snapshots are not meant to be a full replacement for traditional backups; don't rely on them as the only source, even if it looks very convenient. Follow the n+1 principle and do file-based backups from within your VMs (with Bacula, Borg, rsync, you name it). If one concept fails for some reason, you always have another way to get your data back._

## Useful resources

Ceph Documentation:

- [Incremental snapshots with rbd](http://ceph.com/dev-notes/incremental-snapshots-with-rbd/)
- [rbd – manage rados block device (RBD) images](http://docs.ceph.com/docs/master/man/8/rbd/)

Proxmox Wiki:

- https://pve.proxmox.com/wiki/