Backup

This chapter describes the options available for creating backups.

$ benji backup --help
usage: benji backup [-h] [-s SNAPSHOT_NAME] [-r RBD_HINTS]
                    [-f BASE_VERSION_UID] [-b BLOCK_SIZE] [-l label]
                    [-S STORAGE]
                    source version_name

positional arguments:
  source                Source URL
  version_name          Backup version name (e.g. the hostname)

optional arguments:
  -h, --help            show this help message and exit
  -s SNAPSHOT_NAME, --snapshot-name SNAPSHOT_NAME
                        Snapshot name (e.g. the name of the RBD snapshot)
  -r RBD_HINTS, --rbd-hints RBD_HINTS
                        Hints in rbd diff JSON format
  -f BASE_VERSION_UID, --base-version BASE_VERSION_UID
                        Base version UID
  -b BLOCK_SIZE, --block-size BLOCK_SIZE
                        Block size in bytes
  -l label, --label label
                        Labels for this version (can be repeated)
  -S STORAGE, --storage STORAGE
                        Destination storage (if unspecified the default is
                        used)

Simple backup

This is how you can create a normal backup:

$ benji backup source name

where source is a URI and name is the name for the backup, which may contain any quotable character.

Note

The name and all other identifiers are stored in SQL VARCHAR columns which are created by SQLAlchemy’s String type. Please refer to http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.String for reference.

The supported schemes for source are file and rbd. So these are realistic examples:

$ benji backup file:///var/lib/vms/database.img database
$ benji backup rbd://poolname/database@snapshot1 database

Versions

An instance of a backup is called a version. A version contains these metadata fields:

  • date: date and time of the backup (set by Benji)
  • uid: unique identifier for this version (set by Benji)
  • name: name from the command line
  • snapshot_name: snapshot name (option -s) from the command line
  • size: size of the backed-up image in bytes
  • block_size: block size in bytes
  • valid: validity of this version (True or False). This is False while the backup for this version is running and is set to True as soon as the backup has finished and all writers have flushed their data. Scrubbing may set this to False if the backup is found to be invalid for any reason.
  • protected: indication if the version may be deleted (True or False)
  • tags: list of (string) tags for this version

You can output this data with:

$ benji ls
    INFO: $ benji ls
+---------------------+-------------+------+---------------+----------+------------+-------+-----------+------+
|         date        |     uid     | name | snapshot_name |     size | block_size | valid | protected | tags |
+---------------------+-------------+------+---------------+----------+------------+-------+-----------+------+
| 2018-06-07T12:51:19 | V0000000001 | test |               | 41943040 |    4194304 |  True |   False   |      |
+---------------------+-------------+------+---------------+----------+------------+-------+-----------+------+

Hint

You can filter the output with various parameters:

$ benji ls --help
usage: benji ls [-h] [-l] [filter_expression]

positional arguments:
  filter_expression     Version filter expression

optional arguments:
  -h, --help            show this help message and exit
  -l, --include-labels  Include labels in output

Differential Backup

Benji only backs up changed blocks. It can do this in two different ways:

  1. It can read the whole image: Benji checksums each block and looks the checksum up in the database backend. If it is found, only a reference to the existing block is stored, so nothing is written to the data backend.
  2. It can receive a hints file: The hints file is a JSON-formatted list of (offset, length) tuples (see The Hints File for an example) which indicate the status (used or changed) of each region of the image. Fortunately, the format exactly matches the output of rbd diff --format=json. In this case Benji only reads the blocks hinted at by the hints file, checksums each block and looks the checksum up in the database backend. If it is found (which happens only rarely, for example for file copies or all-zero blocks), only a reference to the existing block is stored. Otherwise the block is written to the data backend. The hints file is passed via the -r or --rbd-hints option to benji backup.
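
The checksum lookup that both approaches share can be pictured with a small shell sketch. It is not Benji's implementation: the flat index file stands in for the database backend, sha256sum stands in for whatever hash function Benji is configured with, and the function name dedup_block is made up for illustration.

```shell
# Hypothetical sketch of checksum-based deduplication (not Benji's code).
# $1: index file standing in for the database backend
# $2: file containing the data of one block
dedup_block() {
    local index="$1" block_file="$2" sum
    sum=$(sha256sum < "$block_file" | cut -d ' ' -f 1)
    if [ -f "$index" ] && grep -q -x -F -- "$sum" "$index"; then
        # Checksum already known: only a reference would be stored.
        echo "reference $sum"
    else
        # New checksum: remember it; the block would go to the data backend.
        echo "$sum" >> "$index"
        echo "write $sum"
    fi
}
```

Calling dedup_block twice with identical block contents prints a line starting with write the first time and reference the second time, which is why unchanged blocks cost no additional space on the data backend.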

Note

Benji does forward-incremental backups. So in contrast to backward-incremental backups, there will never be any need to create another full backup after the first full backup. If you don’t trust Benji, you are encouraged to use benji deep-scrub, possibly with the [-s] parameter to see if the backup matches the source.

In any case, if the backup source changes size, Benji will assume that the size change has happened at the end of the volume, which is the case if you resize partitions, logical volumes or Ceph RBD images.

Examples

LVM and other Images

Day 1 (initial backup):

$ lvcreate --size 1G --snapshot --name snap /dev/vg00/lvol1
$ benji backup file:///dev/vg00/snap lvol1
$ lvremove -y /dev/vg00/snap

Day 2..n (differential backups):

$ lvcreate --size 1G --snapshot --name snap /dev/vg00/lvol1
$ benji backup file:///dev/vg00/snap lvol1
$ lvremove -y /dev/vg00/snap

Important

With LVM snapshots, the snapshot increases in size as the origin volume changes. If the snapshot is 100% full, it is lost and invalid. It is important to monitor the snapshot usage with the lvs command to make sure the snapshot doesn’t fill up completely. The --size parameter defines the space reserved for changes during the snapshot’s existence.

Also note that LVM does read-write-write for any overwritten block while a snapshot exists. This may hurt your performance.
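
The monitoring advice above could be scripted roughly as follows. This is only a sketch: the function name snapshot_too_full and the threshold are made up for illustration, and the function merely interprets a usage value that has already been read from lvs.

```shell
# Hypothetical helper: succeed (exit status 0) when a snapshot usage value,
# as printed by lvs, has reached the given whole-number threshold in percent.
snapshot_too_full() {
    local usage threshold="$2"
    usage=$(printf '%s' "$1" | tr -d '[:space:]')  # lvs pads its output with spaces
    usage=${usage%%[.,]*}                          # keep only the integer part
    [ "${usage:-0}" -ge "$threshold" ]
}
```

It could be fed from lvs like this, assuming your LVM version reports snapshot usage in the data_percent field:

    usage=$(lvs --noheadings -o data_percent /dev/vg00/snap)
    snapshot_too_full "$usage" 80 && echo 'LVM snapshot is filling up' >&2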

Ceph RBD

With Ceph RBD it’s possible to let Ceph calculate the changes between two snapshots. Since the Jewel release of Ceph this is a very fast process if the fast-diff feature is enabled, because only metadata has to be compared.

Manually

In this example, we will back up an RBD image called vm1 which is in the pool pool.

  1. Create an initial backup:

    $ rbd snap create pool/vm1@backup1
    $ rbd diff --whole-object pool/vm1@backup1 --format=json > /tmp/vm1.diff
    $ benji backup -s backup1 -r /tmp/vm1.diff rbd://pool/vm1@backup1 vm1
    
  2. Create a differential backup:

    $ rbd snap create pool/vm1@backup2
    $ rbd diff --whole-object pool/vm1@backup2 --from-snap backup1 --format=json > /tmp/vm1.diff
    
    # delete old snapshot
    $ rbd snap rm pool/vm1@backup1
    
    # get the uid of the version corresponding to the old rbd snapshot. This
    # looks like "V001234567". Copy it.
    $ benji ls 'name == "vm1" and snapshot_name == "backup1"'
    
    # and backup
    $ benji backup -s backup2 -r /tmp/vm1.diff -f V001234567 rbd://pool/vm1@backup2 vm1
    
Automation

This is how you can automate forward differential backups including automatic initial backups where necessary:

#!/usr/bin/env bash

function benji::backup::ceph::initial {
    local NAME="$1"
    local POOL="$2"
    local IMAGE="$3"
    shift 3
    local LABELS=("$@")

    local SNAPNAME="b-$(date '+%Y-%m-%dT%H:%M:%S')"  # b-2017-04-19T11:33:23
    local TEMPFILE=$(mktemp --tmpdir ceph-rbd-diff-tmp.XXXXXXXXXX)

    echo "Performing initial backup of $NAME:$POOL/$IMAGE."

    rbd snap create "$POOL"/"$IMAGE"@"$SNAPNAME"
    rbd diff --whole-object "$POOL"/"$IMAGE"@"$SNAPNAME" --format=json >"$TEMPFILE"
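    # Pass labels as an array so that an empty label list adds no stray -l
    local LABEL LABEL_ARGS=()
    for LABEL in "${LABELS[@]}"; do
        LABEL_ARGS+=(-l "$LABEL")
    done
    benji --log-level "$BENJI_LOG_LEVEL" backup -s "$SNAPNAME" -r "$TEMPFILE" \
        "${LABEL_ARGS[@]}" rbd://"$POOL"/"$IMAGE"@"$SNAPNAME" "$NAME"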

    rm -f "$TEMPFILE"
}

function benji::backup::ceph::differential {
    local NAME="$1"
    local POOL="$2"
    local IMAGE="$3"
    local LAST_RBD_SNAP="$4"
    local BENJI_SNAP_VERSION_UID="$5"
    shift 5
    local LABELS=("$@")

    local SNAPNAME="b-$(date '+%Y-%m-%dT%H:%M:%S')"  # b-2017-04-20T11:33:23
    local TEMPFILE=$(mktemp --tmpdir ceph-rbd-diff-tmp.XXXXXXXXXX)

    echo "Performing differential backup of $NAME:$POOL/$IMAGE from RBD snapshot $LAST_RBD_SNAP and Benji version $BENJI_SNAP_VERSION_UID."

    rbd snap create "$POOL"/"$IMAGE"@"$SNAPNAME"
    rbd diff --whole-object "$POOL"/"$IMAGE"@"$SNAPNAME" --from-snap "$LAST_RBD_SNAP" --format=json >"$TEMPFILE"
    # delete old snapshot
    rbd snap rm "$POOL"/"$IMAGE"@"$LAST_RBD_SNAP"
    # and backup
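    # Pass labels as an array so that an empty label list adds no stray -l
    local LABEL LABEL_ARGS=()
    for LABEL in "${LABELS[@]}"; do
        LABEL_ARGS+=(-l "$LABEL")
    done
    benji --log-level "$BENJI_LOG_LEVEL" backup -s "$SNAPNAME" -r "$TEMPFILE" -f "$BENJI_SNAP_VERSION_UID" \
        "${LABEL_ARGS[@]}" rbd://"$POOL"/"$IMAGE"@"$SNAPNAME" "$NAME"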
    
    rm -f "$TEMPFILE"
}

function benji::backup::ceph {
    local NAME="$1"
    local POOL="$2"
    local IMAGE="$3"
    shift 3
    local LABELS=("$@")

    # find the latest snapshot name from rbd
    local LAST_RBD_SNAP=$(rbd snap ls "$POOL"/"$IMAGE" --format=json | jq -r '[.[].name] | map(select(test("^b-"))) | sort | .[-1] // ""')
    echo "Snapshot found for $POOL/$IMAGE is $LAST_RBD_SNAP."
    if [[ -z $LAST_RBD_SNAP ]]; then
        echo 'No previous RBD snapshot found, reverting to initial backup.'
        local START_TIME=$(date +'%s')
        benji_backup_start_time -command=backup -auxiliary_data=initial -version_name="$NAME" set $(date +'%s.%N')
        try {
            benji::backup::ceph::initial "$NAME" "$POOL" "$IMAGE" "${LABELS[@]}"
        } catch {
            benji_backup_status_failed -command=backup -auxiliary_data=initial -version_name="$NAME" set 1
        } onsuccess {
            benji_backup_status_succeeded -command=backup -auxiliary_data=initial -version_name="$NAME" set 1
        }
        benji_backup_completion_time -command=backup -auxiliary_data=initial -version_name="$NAME" set $(date +'%s.%N')
        benji_backup_runtime_seconds -command=backup -auxiliary_data=initial -version_name="$NAME" set $(( $(date +'%s') - START_TIME ))
    else
        # check if a valid version of this RBD snapshot exists
        local BENJI_SNAP_VERSION_UID=$(benji -m ls 'name == "'"$NAME"'" and snapshot_name == "'"$LAST_RBD_SNAP"'"' | jq -r '.versions[0] | select(.status == "valid") | .uid // ""')
        if [[ -z $BENJI_SNAP_VERSION_UID ]]; then
            echo 'Existing RBD snapshot not found in Benji, deleting it and reverting to initial backup.'
            local START_TIME=$(date +'%s')
            benji_backup_start_time -command=backup -auxiliary_data=initial -version_name="$NAME" set $(date +'%s.%N')
            try {
                rbd snap rm "$POOL"/"$IMAGE"@"$LAST_RBD_SNAP"
                benji::backup::ceph::initial "$NAME" "$POOL" "$IMAGE" "${LABELS[@]}"
            } catch {
                benji_backup_status_failed -command=backup -auxiliary_data=initial -version_name="$NAME" set 1
            } onsuccess {
                benji_backup_status_succeeded -command=backup -auxiliary_data=initial -version_name="$NAME" set 1
            }
            benji_backup_completion_time -command=backup -auxiliary_data=initial -version_name="$NAME" set $(date +'%s.%N')
            benji_backup_runtime_seconds -command=backup -auxiliary_data=initial -version_name="$NAME" set $(( $(date +'%s') - START_TIME ))
        else
            local START_TIME=$(date +'%s')
            benji_backup_start_time -version_name="$NAME" -command=backup -auxiliary_data=differential set $(date +'%s.%N')
            try {
                benji::backup::ceph::differential "$NAME" "$POOL" "$IMAGE" "$LAST_RBD_SNAP" "$BENJI_SNAP_VERSION_UID" "${LABELS[@]}"
            } catch {
                benji_backup_status_failed -command=backup -auxiliary_data=differential -version_name="$NAME" set 1
            } onsuccess {
                benji_backup_status_succeeded -command=backup -auxiliary_data=differential -version_name="$NAME" set 1
            }
            benji_backup_completion_time -command=backup -auxiliary_data=differential -version_name="$NAME" set $(date +'%s.%N')
            benji_backup_runtime_seconds -command=backup -auxiliary_data=differential -version_name="$NAME" set $(( $(date +'%s') - START_TIME ))
        fi
    fi
}

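Note that try, catch, onsuccess and the benji_backup_* commands are not part of Bash; the script assumes they are defined elsewhere (presumably by a helper library that is not shown here).

This is what the script does: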

  • When benji::backup::ceph is called, it searches for the latest RBD snapshot. As RBD snapshots have no date assigned, it takes the last one from the sorted list of snapshot names returned by rbd snap ls.

Note

Only RBD snapshots that begin with the prefix b- are considered. All other snapshots are left alone. This makes it possible to have manual snapshots that aren’t touched by Benji.

  • If no RBD snapshot is found, an initial backup is performed.
  • If there is an RBD snapshot, Benji is asked whether it has a valid version of this snapshot. If not, the RBD snapshot is deleted and an initial backup is performed.
  • If Benji has a version of this snapshot, a hints file is created via rbd diff --whole-object <new snapshot> --from-snap <old snapshot> --format=json.
  • Benji then only backs up the changes listed in the hints file.

These functions could be called each day by a small script (or even multiple times a day) and will automatically keep only one snapshot and create forward-differential backups.
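
Such a small script might look like the following sketch. The list file format and the names backup_all and BACKUP_CMD are made up for illustration, and the sketch assumes that the functions above have been loaded into the same shell.

```shell
# Hypothetical wrapper: run one backup per line of a plain-text list.
# Each non-empty, non-comment line holds: <version name> <pool> <image>
# BACKUP_CMD defaults to the function from the script above; it can be
# overridden so that the loop can be tried out without touching Ceph.
backup_all() {
    local list_file="$1" name pool image failed=0
    while read -r name pool image; do
        case "$name" in ''|'#'*) continue ;; esac
        # stdin comes from /dev/null so the command cannot swallow the list
        "${BACKUP_CMD:-benji::backup::ceph}" "$name" "$pool" "$image" < /dev/null || failed=1
    done < "$list_file"
    return "$failed"
}
```

Setting BACKUP_CMD=echo gives a dry run that only prints the name, pool and image each backup would be started with. The function returns a non-zero status if any of the backup commands did so.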

Note

This alone won’t be enough to be on the safe side. You will have to check the validity of the backup data regularly. Please refer to section Scrub.

Specifying a Block Size

To perform a backup, Benji splits the image into equal-sized blocks. [1]

By default the block size specified in the configuration file is used. The block size can also be set on the command line (option -b) on a version-by-version basis, but be aware that mixing block sizes will affect deduplication and increase space usage.

One possible use case for different block sizes would be backing up LVM volumes and Ceph images with the same Benji installation. While for Ceph 4MB is usually the best size, LVM volumes might profit from a smaller block size.

If you want to base a new version on an old version (as it can be the case when doing a differential backup) the block size of the old and new version have to match. Benji will terminate with an error if that is not the case.
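
To make the splitting into blocks concrete, here is a small arithmetic sketch. The helper names block_count and last_block_size are made up for illustration and are not part of Benji; both take the image size and the block size in bytes.

```shell
# Hypothetical helpers illustrating how an image divides into blocks.
block_count() {
    # Round up: a partial last block still counts as a block.
    echo $(( ($1 + $2 - 1) / $2 ))
}

last_block_size() {
    local rest="$(( $1 % $2 ))"
    if [ "$rest" -eq 0 ] && [ "$1" -gt 0 ]; then
        echo "$2"       # image size is an exact multiple of the block size
    else
        echo "$rest"    # shorter final block (or an empty image)
    fi
}
```

For the version shown in the benji ls output earlier in this chapter (size 41943040, block size 4194304), block_count yields 10 and the last block is a full 4194304 bytes. An image that is a single byte larger would need an eleventh block of 1 byte.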

Tag Backups

A version can have multiple tags. They are just for use by the administrator and have no function in Benji. To specify a tag the backup command provides the command line switch -t or --tag:

$ benji backup -t mytag rbd://cephstorage/test_vm test_vm

You can also use multiple tags for one version:

$ benji backup -t mytag -t anothertag rbd://cephstorage/test_vm test_vm

Later on you can modify tags with the commands add-tag and rm-tag:

$ benji add-tag V0000000001 mytag
$ benji rm-tag V0000000001 anothertag

With add-tag and rm-tag you can also specify multiple tags; just list them after the first one. Adding a tag that already exists or removing one that doesn’t exist is not an error; Benji just emits a warning in these cases.

Export Metadata

Benji has now backed up all image data to a (hopefully) safe place. However, the blocks are of no use without the corresponding metadata. Benji will need this information to get the blocks back in the correct order and restore your image.

This information is stored in the database backend. Additionally Benji will save the metadata on the data backend automatically. Should you lose your database backend, you can restore these metadata backups by using benji metadata-restore.

$ benji metadata-restore --help
usage: benji metadata-restore [-h] [-S STORAGE] VERSION_UID [VERSION_UID ...]

positional arguments:
  VERSION_UID           Version UID

optional arguments:
  -h, --help            show this help message and exit
  -S STORAGE, --storage STORAGE
                        Source storage (if unspecified the default is used)

There is currently no mechanism to import the metadata backups of all versions from the data backend at once, but you can obtain a list of all versions manually from the data backend.

Note

This metadata backup is compressed and encrypted like the blocks if you have these features enabled.

If you want to make your own copies of your metadata you can do so by using benji metadata-export.

$ benji metadata-export --help
usage: benji metadata-export [-h] [-f] [-o OUTPUT_FILE] [filter_expression]

positional arguments:
  filter_expression     Version filter expression

optional arguments:
  -h, --help            show this help message and exit
  -f, --force           Overwrite an existing output file
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output file (standard output if missing)

If you’re doing this programmatically and are exporting to STDOUT you should probably add -m to your export command to reduce the logging level of Benji.

$ benji -m metadata-export V1
{
  "metadataVersion": "1.0.0",
  "versions": [
    {
      "uid": 1,
      "date": "2018-06-07T12:51:19",
      "name": "test",
      "snapshot_name": "",
      "size": 41943040,
      "block_size": 4194304,
      "valid": true,
      "protected": false,
      "tags": [],
      "blocks": [
        {
          "uid": {
            "left": 1,
            "right": 1
          },
          "date": "2018-06-07T14:51:20",
          "id": 0,
          "size": 4194304,
          "valid": true,
          "checksum": "aed3116b4e7fad9a3188f5ba7c8e73bf158dabec387ef1a7bca84c58fe72f319"
        },
[...]

You can import such a dump of a version’s metadata with benji metadata-import.

$ benji metadata-import --help
usage: benji metadata-import [-h] [-i INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        Input file (standard input if missing)

You can’t import versions that already exist in the database backend.

The Hints File

Example of a hints file:

[{"offset":0,"length":4194304,"exists":"true"},
{"offset":4194304,"length":4194304,"exists":"true"},
{"offset":8388608,"length":4194304,"exists":"true"},
{"offset":12582912,"length":4194304,"exists":"true"},
{"offset":16777216,"length":4194304,"exists":"true"},
{"offset":20971520,"length":4194304,"exists":"true"},
{"offset":25165824,"length":4194304,"exists":"true"},
{"offset":952107008,"length":4194304,"exists":"true"}]

Note

The length may vary; however, it is nicely aligned to 4 MB when using rbd diff --whole-object. As Benji also uses 4 MB blocks by default, it does not have to work out which blocks are affected by a larger number of smaller offset+length tuples (not that this would take very long).
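
How hinted regions map onto blocks can be sketched as follows. This is an illustration only, not Benji's code: the function name hinted_blocks is made up, it expects plain "offset length" pairs instead of JSON, and it ignores the exists field.

```shell
# Hypothetical helper: read "offset length" pairs (in bytes) from standard
# input and print the IDs of all blocks of size $1 that these regions touch.
hinted_blocks() {
    local block_size="$1" offset length id last
    while read -r offset length; do
        [ "${length:-0}" -gt 0 ] || continue    # skip blank lines and empty regions
        id=$(( $offset / $block_size ))
        last=$(( ($offset + $length - 1) / $block_size ))
        while [ "$id" -le "$last" ]; do
            echo "$id"
            id=$(( $id + 1 ))
        done
    done | sort -n -u
}
```

With jq, which the automation script already relies on, a hints file could be converted into such pairs like this:

    jq -r '.[] | "\(.offset) \(.length)"' /tmp/vm1.diff | hinted_blocks 4194304

An aligned region such as offset 952107008 with length 4194304 maps to the single block 227, whereas an unaligned region such as offset 4194000 with length 1000 touches blocks 0 and 1.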

[1] Except the last block, which may be shorter.