January 01, 2017

Rsync BackUp


rsync files to backup server

Using a centralized backup server might be one of the most important safety nets you can implement. At some point data is going to be lost and you are going to be relieved to know the files are accessible on the backup server. But, we are not talking about a tape server with proprietary backup software where it could take hours to get the data back.

We are going to design the backup logic around the open source binary rsync and back up to online storage: a RAID array or a server with redundant disks pulls the data from other servers and user systems onto the backup server. Rsync is an incredible backup tool.

This is what was said about rsync as a backup tool in "Using rsync to backup user files":

Rsync is one of the most powerful open source tools you can use. It allows you to specify a list of files or a directory structure and make an exact copy to another location. What sets rsync apart from a plain copy is that it only transfers the data that differs between the source and target locations.

If you have 100 gigs of data you need to back up, but only 4 gigs have changed since the last copy you made, there is no need to copy over the whole 100 gigs again. Doing so would waste your time and put a strain on network bandwidth. You also do not want to go through the files manually using "find" to figure out what changed. This is where rsync comes in. You can set up rsync to mirror a list of directories and/or files and compare that list against the target location where you want the files to be backed up. The efficiency of rsync comes from the method used to back up the data.

When data is added to or deleted from a file, the entire file does not change, only part of it does. Rsync recognizes this and only copies over the changed data. If you had a 100 meg file and only 1 meg changed, then only about 1 meg would need to be transferred. Very efficient indeed.
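To see which files rsync considers changed before copying anything, the comparison can be run on its own. The following is a minimal sketch and not part of the original scripts: the function name list_changed and its two arguments are made up for illustration, while --dry-run and --itemize-changes are standard rsync options.

```shell
#!/bin/sh
# list_changed SRC DEST  (hypothetical helper, for illustration only)
# Prints one line per file or directory that rsync would update in DEST
# to make it match SRC. Nothing is copied because of --dry-run.
list_changed() {
  src=$1
  dest=$2
  rsync -a --dry-run --itemize-changes "$src"/ "$dest"/
}
```

Calling list_changed with a source directory and an existing copy of it lists only the entries that differ. No output means rsync found nothing to transfer.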

With online backups you can have all of the data available online all the time. If users delete their own mail folder they can retrieve it themselves through any method you allow. For example, you could set up an sftp or password-protected ftp server that allows your users to browse their backups and retrieve the files of their choosing. This approach can save you hours of tedious backup retrieval time per week if you have users who are clumsy with "rm -rf".

Setting up the backup file structure

The files being backed up in our exercises will be put into a directory called /Raid6. As the name implies, we are using a disk array of at least 4 disks in RAID 6 format. You can use anything from a single large disk to a set of disks mirroring themselves (RAID 1) or any other format you choose. Please make sure that the data you are going to put on the backup server cannot be lost through a single failed disk, otherwise this will be a very short exercise.

Under the directory /Raid6 we will make a subdirectory for each of the machines we want to back up. Let's back up the mail server and the user machine for bob. The directory of the user machine will be owned by bob himself and chmod'd to 700 so he can get to his own backups and nothing else. We also need to make a subdirectory called "current" under both mail_server and bob. The "current" directory is where the current day's backups will be put.

Finally, the backup scripts will go onto the RAID partition. The scripts are as important as the user data, so putting them on a RAID partition makes good sense. Make the backup scripts directory in /Raid6/BACKUP_Config.

This is an example of what our directory structure will look like:

[root@machine]# ls -la /Raid6/
drwxr-xr-x  2 root  root  1234 Jan 10  2020 .
drwxr-xr-x  2 root  root  1234 Jan 10  2020 ..
drwx------  1 root  root  4096 Jan 10  2020 BACKUP_Config
drwx------  1 root  root  4096 Jan 10  2020 mail_server
drwx------  1 bob   root  4096 Jan 10  2020 bob

...and...

[root@machine]# ls -la /Raid6/mail_server
drwxr-xr-x  2 root  root  1234 Jan 10  2020 .
drwxr-xr-x  2 root  root  1234 Jan 10  2020 ..
drwx------  1 root  root  4096 Jan 10  2020 current

...and...

[root@machine]# ls -la /Raid6/bob
drwxr-xr-x  2 bob   root  1234 Jan 10  2020 .
drwxr-xr-x  2 root  root  1234 Jan 10  2020 ..
drwx------  1 bob   root  4096 Jan 10  2020 current
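The per-machine directories shown above can be created with a few commands. This is a minimal sketch under stated assumptions: the helper name make_backup_tree and its arguments are hypothetical, and handing a tree to another user with chown requires running as root on the backup server.

```shell
#!/bin/sh
# make_backup_tree ROOT NAME [OWNER]  (hypothetical helper)
# Creates ROOT/NAME/current with mode 700 on both directories.
# If OWNER is given, the machine's tree is handed to that user,
# which only works when the function is run as root.
make_backup_tree() {
  root=$1
  name=$2
  owner=${3:-}
  mkdir -p "$root/$name/current" || return 1
  chmod 700 "$root/$name" "$root/$name/current" || return 1
  if [ -n "$owner" ]; then
    chown -R "$owner" "$root/$name" || return 1
  fi
}

# Example calls matching the layout above (run as root):
#   make_backup_tree /Raid6 mail_server
#   make_backup_tree /Raid6 bob bob
```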

The primary backup script: BACKUP_MasterServer.sh

Below you will find the BACKUP_MasterServer.sh shell script. Let's take a look at each of the lines we are going to use. Remember that we are on the backup server and pulling the files from the remote machines.

The mail server backups will use BACKUP_filter to exclude files from the backups. Rsync will connect over ssh using the arcfour encryption algorithm to increase speed and log in as root to the machine mail_server. (Recent OpenSSH releases no longer ship arcfour, so name a cipher your version supports if ssh rejects it.) All files and directories in /BACKUP on mail_server will be backed up to /Raid6/mail_server/current. The output of the transaction log will be mailed to root. In the end we will "touch" the /Raid6/mail_server/current directory to reflect the date when the script was run.

The backups for Bob's machine will also use BACKUP_filter to exclude files from the backups. Rsync will connect over ssh using the arcfour encryption algorithm and log in as bob to the machine bob. All files and directories in /BACKUP on bob will be backed up to /Raid6/bob/current. The output of the transaction log will be mailed to the user bob, with a copy to backup_admin. Mailing the log to bob is good practice, as users who have questions about what is being backed up can simply look at this email. Finally, we will chown everything under /Raid6/bob to bob and "touch" the /Raid6/bob/current directory to reflect the date when the script was run.

#!/bin/sh
#
## moneyslow.com  BACKUP_MasterServer.sh
## (remember you can use --dry-run for testing)
## (recent OpenSSH releases dropped arcfour; pick a supported cipher if ssh rejects it)
#
# Run from the script's own directory so the relative BACKUP_filter path resolves
cd "$(dirname "$0")" || exit 1
# Mail Server backups on the machine mail_server (2>&1 puts errors in the mail too)
rsync -avx --timeout=30 --delete-excluded --exclude-from=BACKUP_filter \
  --rsh="ssh -c arcfour -l root" \
  mail_server:/BACKUP/ /Raid6/mail_server/current 2>&1 \
  | mail -s "Backup DONE for mail-server ($(hostname))" root
touch /Raid6/mail_server/current
#
# Bob's backups on the machine called bob
rsync -avx --timeout=30 --delete-excluded --exclude-from=BACKUP_filter \
  --rsh="ssh -c arcfour -l bob" \
  bob:/BACKUP/ /Raid6/bob/current 2>&1 \
  | mail -s "Backup DONE for bob ($(hostname))" -c backup_admin@internal.domain.lan bob@internal.domain.lan
chown -R bob:bob /Raid6/bob/
touch /Raid6/bob/current
#
# time stamp when the Master backup script completes
logger "BACKUP_MasterServer Completed"

The exclude file pattern list: BACKUP_filter

This text file called BACKUP_filter is the list of file names or wildcard patterns you DO NOT want to be backed up. This may be for security reasons or just to save space. For example, we do not need any ogg music files, movie files or iso images backed up from the remote systems. The password_list file is also excluded.

#
## moneyslow.com  BACKUP_filter
#
*.ogg
*.mov
*.iso
password_list
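Before trusting a filter file with real backups, it can help to preview what it lets through. The following is a minimal sketch, assuming rsync is installed locally: the helper name preview_filter and its arguments are hypothetical, and the scratch directory exists only to give rsync an empty destination to compare against.

```shell
#!/bin/sh
# preview_filter SRC FILTER  (hypothetical helper)
# Lists what rsync would copy from SRC when the patterns in FILTER are
# excluded. The destination is an empty scratch directory and --dry-run
# keeps rsync from writing anything into it.
preview_filter() {
  src=$1
  filter=$2
  scratch=$(mktemp -d) || return 1
  rsync -a --dry-run --itemize-changes --exclude-from="$filter" "$src"/ "$scratch"/
  status=$?
  rmdir "$scratch"
  return $status
}
```

Any file matching a pattern in the filter file is simply absent from the output, so a quick scan of the list shows whether the patterns behave as intended.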

Compressing "current" for a long term archive

Now that the backups from the remote machines are working, let's set up a script called BACKUP_Compress.sh to compress the "current" directories. This allows one to create a time stamped long term backup.

These lines will change to the directory of the machine and tar the current directory with bzip2 compression. The name of each file will start with "archive_", followed by the date, and end with ".tar.bz2". Then the ownership of the archive files is set: root owns them, and bob's archives are placed in the group bob so that he can read them.

#!/bin/sh
#
## BACKUP_Compress.sh
#
# Mail Server on mail_server (&& stops the line if cd or tar fails)
cd /Raid6/mail_server/ && tar jcf archive_$(date +%Y.%m.%d).tar.bz2 current/ && chown root:root archive_*
#
# Bob on the machine bob
cd /Raid6/bob/ && tar jcf archive_$(date +%Y.%m.%d).tar.bz2 current/ && chown root:bob archive_*

The archive file will look like this:

[root@machine]# ls -la /Raid6/bob
drwxr-xr-x  2 root  root   1234 Jan 10  2020 .
drwxr-xr-x  2 root  root   1234 Jan 10  2020 ..
-rw-r-----  1 root  bob  321456 Jan 10  2020 archive_2020.01.10.tar.bz2
drwx------  1 root  root   4096 Jan 10  2020 current
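An archive is only useful if it can be read back, so it is worth checking each one after it is written. This is a minimal sketch, not part of the original scripts: the helper name verify_archive is hypothetical, and it relies on tar's "t" (list) mode with the same "j" bzip2 flag used to create the archive.

```shell
#!/bin/sh
# verify_archive FILE  (hypothetical helper)
# Reads the whole bzip2-compressed tar archive and discards the listing.
# Returns 0 if tar could read it to the end, non-zero otherwise.
verify_archive() {
  tar tjf "$1" > /dev/null 2>&1
}

# Example:
#   verify_archive /Raid6/bob/archive_2020.01.10.tar.bz2 && echo "archive OK"
```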

Verify the backup files, directories and permissions...

At this point you should have three files in your /Raid6/BACKUP_Config directory. They are BACKUP_MasterServer.sh, BACKUP_Compress.sh, and BACKUP_filter. The scripts BACKUP_MasterServer.sh and BACKUP_Compress.sh should be owned by root and have the permissions 700. The text file BACKUP_filter should be owned by root and set with permissions 600. Your /Raid6/BACKUP_Config/ should look something like this:

[root@machine]# ls -la /Raid6/BACKUP_Config/
drwxr-xr-x  2 root root  1234 Jan 10  2020 .
drwxr-xr-x  2 root root  1234 Jan 10  2020 ..
-rwx------  1 root root   123 Jan 10  2020 BACKUP_Compress.sh
-rw-------  1 root root    12 Jan 10  2020 BACKUP_filter
-rwx------  1 root root   123 Jan 10  2020 BACKUP_MasterServer.sh
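The ownership and permissions in this listing can be applied in one step. The following is a minimal sketch: the helper name secure_backup_config is hypothetical, and it assumes the three files already exist in the directory passed to it.

```shell
#!/bin/sh
# secure_backup_config DIR  (hypothetical helper)
# Applies the permissions shown in the listing above to the three files
# in DIR. Ownership is only changed when running as root, since chown
# to root:root fails for ordinary users.
secure_backup_config() {
  dir=$1
  chmod 700 "$dir/BACKUP_MasterServer.sh" "$dir/BACKUP_Compress.sh" || return 1
  chmod 600 "$dir/BACKUP_filter" || return 1
  if [ "$(id -u)" -eq 0 ]; then
    chown root:root "$dir/BACKUP_MasterServer.sh" \
      "$dir/BACKUP_Compress.sh" "$dir/BACKUP_filter" || return 1
  fi
}

# Example (run as root):
#   secure_backup_config /Raid6/BACKUP_Config
```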

Running the backups

It is now time to test out the backup scripts. Remember you can also insert the "--dry-run" argument into the rsync line to run the script without copying any files. Now, change to /Raid6/BACKUP_Config and execute the BACKUP_MasterServer.sh script. It will not show any feedback in the terminal, as all the output will be sent to root in an email. When the script finishes you should see the directories and files from the remote machine's /BACKUP directory in "current" under the name of the server.

If you do not see the files you expected in the "current" directory then you can always run the rsync line by hand. Remove the "|" (pipe) that sends the output to mail and you will see the errors in the terminal.

When you do see that the backups are working correctly, look at using BACKUP_Compress.sh to bzip2 the files into an archive.

Automating backups with cron

Now that we have the backups working and the archive process down, let's set up cron to run the backups automatically outside business hours. We will set the backups to run at 3am on every second day of the week (Sunday, Tuesday, Thursday and Saturday) and make compressed archives every Monday morning at 5am.

#minute (0-59)
#|   hour (0-23)
#|   |    day of the month (1-31)
#|   |    |   month of the year (1-12 or Jan-Dec)
#|   |    |   |   day of the week (0-6 with 0=Sun or Sun-Sat)
#|   |    |   |   |   commands
#|   |    |   |   |   |
### Automated Backups
 0   3    *   *   */2 cd /Raid6/BACKUP_Config && ./BACKUP_MasterServer.sh > /dev/null 2>&1
 0   5    *   *   1   cd /Raid6/BACKUP_Config && ./BACKUP_Compress.sh > /dev/null 2>&1

Wrapping ssh for safety and security

Since we are using ssh keys to connect from one server to another, it is a good idea to limit which commands the backup server can execute on the remote machine. This is an example of an ssh wrapper which will limit the commands the backup server can send to the client to rsync requests only.

First, use the "command" directive on the target system and put the wrapper script at the beginning of the line of the ssh key from the backup server in the authorized_keys2 file like so:

[root@target_machine ~]# cat .ssh/authorized_keys2
command="/tools/rsync-wrapper.sh" ssh-dss pbxHkhasd3c3... root@origin_machine

Next put this wrapper script on the target system.

#!/bin/bash
#
## moneyslow.com  /tools/rsync-wrapper.sh
#
# This script is executed when the user from a remote
# machine successfully authenticates with a public key.
# The privileges of that user on this server are limited
# to the functionality of this script.  This wrapper
# will verify the client is sending the rsync command.
# If it is, the original, unaltered command is run,
# if not, an error is returned.

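RSYNC_HOME=/usr/bin
log="$HOME/rsync.log"

command="$SSH_ORIGINAL_COMMAND"; export command

# An empty request (for example an interactive login) is logged and refused.
if [ "N$command" = "N" ] ;  then
  echo "`date`: ERROR: no command supplied" >> "$log"
  exit 127
fi

cmdbase=`echo "$command" | awk '{print $1}'`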
if [ "$cmdbase" != "rsync" ] ; then
  echo "ERROR: Invalid command"
  exit 127
fi
 
ok="false"
for arg in $command
do
  if [ "N$arg" = "N--server" ] ; then
     ok="true"
  fi
done
   
if [ "$ok" = "true" ] ; then
  $RSYNC_HOME/$command
else
  echo "ERROR: Invalid rsync request"
  exit 127
fi

Once this wrapper is in place you can test it. Try ssh'ing from the backup server to the mail_server. You will be denied the connection because you are asking to execute a shell (bash, sh, tcsh, etc.) and not rsync. If you instead run the backup script then the rsync command will be sent and the connection will be accepted.
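The wrapper's accept or reject decision can also be exercised without an ssh connection. This is a minimal sketch, not the wrapper itself: the function name check_request is hypothetical and only mirrors the wrapper's tests, and the sample rsync command string is illustrative, since the exact server-side arguments vary between rsync versions.

```shell
#!/bin/sh
# check_request COMMAND  (hypothetical helper mirroring the wrapper's tests)
# Returns 0 when COMMAND starts with "rsync" and contains the --server
# argument, and non-zero for anything else, including an empty request.
check_request() {
  request=$1
  [ -n "$request" ] || return 1
  first=$(printf '%s\n' "$request" | awk '{print $1}')
  [ "$first" = "rsync" ] || return 1
  # Word splitting is intentional here, as in the wrapper's own loop.
  for arg in $request; do
    if [ "$arg" = "--server" ]; then
      return 0
    fi
  done
  return 1
}

# Illustrative requests: an rsync server invocation and a plain shell.
check_request "rsync --server --sender -logDtprxe.iLsf . /BACKUP/" && echo "accepted"
check_request "/bin/bash" || echo "rejected"
```

Run as written, the two sample calls print "accepted" and then "rejected".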

Questions?

Do you have more examples using rsync?

Yes, we do. Check the moneyslow.com home page for instructions on automated server and user based backups. We go through the options of declaring backup schedules, ignoring specified file types and compressing long term archives. We also have tips and hints for incremental backups.