Rsync is one of the most powerful open source tools you can use. It lets you specify a list of files or a directory structure and make an exact copy in another location. What sets rsync apart from a plain copy is that it only transfers the data that differs between the source and the target.
If you have 100 gigs of data you need to back up, but only 4 gigs have changed since the last copy you made, there is no need to copy the whole 100 gigs again. Doing so would waste your time and put a strain on network bandwidth. You also do not want to go through the files manually using "find" to figure out what changed. This is where rsync comes in. You can set up rsync to mirror a list of directories and/or files against the target location where you want the files backed up. The efficiency of rsync comes from the method it uses to back up the data.
When data is added to or deleted from a file, the entire file does not change; only part of it does. Rsync recognizes this and copies over only the changed data. If you had a 100 meg file and only 1 meg changed, then only 1 meg would need to be transferred. Very efficient indeed.
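As a quick, hypothetical illustration, rsync's --stats output shows how little data actually moves on a repeat run; the host "backuphost" and the file path below are made up for this example:

# the first run copies the whole file; later runs send only the changed blocks
rsync -av --stats /data/bigfile backuphost:/BACKUP/
# after making a small change to the file, run it again and compare the
# "Total transferred file size" line in the stats output to the real file size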
Let's set up backups for a local user and put the files onto the local machine. If users delete their own files they can check the /BACKUP directory and restore the files themselves. This saves you time by not having to do file restores all day long and shifts the work of restoring backups to the users.
In this exercise we are going to use a set of scripts to back up a list of user files and directories to a target location called /BACKUP. The files from the source and target will be compared and mirrored. We will also use a file listing patterns of files to be excluded from the backups. Finally, we will compress the backup directory at a specified time so we can maintain a long-term archive.
The files being backed up will be put into the /BACKUP directory under the host name of the local machine. /BACKUP can be a real directory on the current disk or a symlink to another, larger backup disk. The reason we are using /BACKUP is that we can keep the backup directory the same on all machines if we use this scheme in a multi-machine environment. A user only needs to know that all backups are in /BACKUP on any machine they use. The reason the directory under /BACKUP is named after the host is that we are also going to keep compressed archive files directly in /BACKUP. It is also a gentle reminder of which machine's backup files you are looking at, especially if you are working in a large multi-machine environment.
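Either way works; for example (the /mnt/bigdisk mount point here is hypothetical):

# /BACKUP as a real directory on the current disk:
mkdir /BACKUP
# ...or /BACKUP as a symlink to a larger backup disk:
ln -s /mnt/bigdisk/BACKUP /BACKUP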
For our example we will set up the backup scripts and config files in /tools/BACKUP_Config to keep maintenance tasks simple. Both the admin and the user will access this directory to edit the backup file list and the exclude list. Only the admin will be able to access the backup and compress scripts. The following files will be placed in /tools/BACKUP_Config: BACKUP_MasterScript.sh, BACKUP_user, BACKUP_filter, and BACKUP_Compress.sh.
In the following text window you will find the BACKUP_MasterScript.sh shell script. The script will change directory to /tools/BACKUP_Config/ and run rsync. Rsync will exclude any file patterns found in the "BACKUP_filter" file and include all files and directories listed in "BACKUP_user". The backups will be placed into /BACKUP/`hostname -s`, where `hostname -s` is the short hostname of the local machine. If the hostname is "blackbox.domain.work" then the short name is "blackbox". Finally, the script will email the backup output to root and write a line to the system log noting that the backup has completed.
#!/bin/sh
#
## moneyslow.com BACKUP_MasterScript.sh
#
cd /tools/BACKUP_Config/
#
# backup data listed in BACKUP_user
# (add --dry-run for testing)
rsync -Ravx --timeout=30 --delete-excluded --exclude-from=BACKUP_filter `cat BACKUP_user` /BACKUP/`hostname -s` | mail -s "Backup DONE for `hostname -s`" root
#
# time stamp when the backup script completes
logger "BACKUP_Master Completed for `hostname`"
The text file BACKUP_user is the list of files and directories that will be mirrored to /BACKUP. The user edits this file and adds what they want backed up. Notice directories end with a forward slash and files do not. In the example below the directories /important/documents/, /tools/ and /home/user/ will be mirrored, as well as the single file /data/some_file.
#
## moneyslow.com BACKUP_user
#
/important/documents/
/tools/
/home/user/
/data/some_file
The text file called BACKUP_filter is the list of file names or wildcard patterns you DO NOT want backed up. This may be for security reasons or just to save space. For example, we do not need any ogg music files or iso images backed up if they are found in any of the BACKUP_user directories. We also do not care about the OpenOffice and GNOME dot files and directories. The password_list file is also exempt.
#
## moneyslow.com BACKUP_filter
#
*.ogg
*.iso
.openoffice*
.gnome*
password_list
At this point you should have the three files we talked about in your /tools/BACKUP_Config directory: BACKUP_MasterScript.sh, BACKUP_user, and BACKUP_filter. The script BACKUP_MasterScript.sh should be owned by root and have permissions 700. The text files BACKUP_user and BACKUP_filter should be owned by the user whose files we are going to back up and set with permissions 600. The /BACKUP directory itself should be root owned with perms of 755. Finally, the /BACKUP/`hostname -s` directory (blackbox is the hostname in this example) should be made, root owned, with perms of 755. Your /tools/BACKUP_Config/ and /BACKUP should look something like so:
[root@machine]# ls -la /tools/BACKUP_Config/
drwxr-xr-x 2 root root 1234 Jan 10 2020 .
drwxr-xr-x 2 root root 1234 Jan 10 2020 ..
-rwx------ 1 root root  123 Jan 10 2020 BACKUP_Compress.sh
-rw------- 1 user user   12 Jan 10 2020 BACKUP_filter
-rwx------ 1 root root  123 Jan 10 2020 BACKUP_MasterScript.sh
-rw------- 1 user user  123 Jan 10 2020 BACKUP_user

...and...

[root@machine]# ls -la /BACKUP/
drwxr-xr-x 3 root root 1234 Jan 10 2020 .
drwxr-xr-x 4 root root 1234 Jan 10 2020 ..
drwxr-xr-x 6 root root 1234 Jan 10 2020 blackbox
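For reference, the ownership and permissions shown above can be set as follows; run these as root and replace "user" with the owner of the files being backed up:

# scripts: root owned, mode 700
chown root:root /tools/BACKUP_Config/BACKUP_MasterScript.sh /tools/BACKUP_Config/BACKUP_Compress.sh
chmod 700 /tools/BACKUP_Config/BACKUP_MasterScript.sh /tools/BACKUP_Config/BACKUP_Compress.sh
# user-editable lists: user owned, mode 600
chown user:user /tools/BACKUP_Config/BACKUP_user /tools/BACKUP_Config/BACKUP_filter
chmod 600 /tools/BACKUP_Config/BACKUP_user /tools/BACKUP_Config/BACKUP_filter
# backup target: root owned, mode 755
mkdir -p /BACKUP/`hostname -s`
chown root:root /BACKUP /BACKUP/`hostname -s`
chmod 755 /BACKUP /BACKUP/`hostname -s`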
It is now time to test the backup script. Remember you can also insert the "--dry-run" argument into the rsync line to run the script without doing any work. Now, change to /tools/BACKUP_Config and execute the BACKUP_MasterScript.sh script, as shown below. It will not show any feedback in the terminal, as all the output is sent to root in an email. When the script finishes you should see the directories and files you specified in the BACKUP_user file in /BACKUP/blackbox.
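For example:

[root@machine]# cd /tools/BACKUP_Config/
[root@machine]# ./BACKUP_MasterScript.sh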
If you do not see the files you expected in the /BACKUP directory, you can always run the rsync line by hand. Remove the "|" (pipe) that sends the output to mail and you will see the errors in the terminal.
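For example, here is the same rsync line from the script run by hand with the mail pipe removed; add --dry-run if you want to see what would be copied without copying anything:

cd /tools/BACKUP_Config/
rsync -Ravx --timeout=30 --delete-excluded --exclude-from=BACKUP_filter `cat BACKUP_user` /BACKUP/`hostname -s`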
When you see that the backups are working correctly, move on to the following section, where we use bzip2 to compress the files into an archive.
Once in a while the backups in the /BACKUP/hostname directory structure should be archived. This script will bzip2 compress the entire tree, preserving uid/gid ownership and permissions. The compressed files are named using the pattern "archive_", followed by the short hostname and today's date, ending with .tar.bz2, and are saved in /BACKUP. Archive files older than 180 days will be deleted with the "find" command.
#!/bin/sh
#
## moneyslow.com BACKUP_Compress.sh
#
# delete old archives after 180 days
find /BACKUP/ -maxdepth 1 -name 'archive_*' -type f -mtime +180 -exec rm {} \;
#
# compress latest backup to archive
cd /BACKUP/; tar jcf archive_`hostname -s`_`date +%Y.%m.%d`.tar.bz2 `hostname -s`/; chown root:root archive_*
#
# time stamp when the backup script completes
logger "BACKUP_Compress Completed for `hostname`"
After you run the BACKUP_Compress.sh script you should see an archive file in your /BACKUP directory. Perhaps something similar to the following:
[root@machine]# ls -la /BACKUP/
drwxr-xr-x 3 root root    4096 Jan 10 2020 .
drwxr-xr-x 4 root root    4096 Jan 10 2020 ..
-rw-r--r-- 6 root root 4321567 Jan 10 2020 archive_blackbox_2020.01.10.tar.bz2
drwxr-xr-x 6 root root    4096 Jan 10 2020 blackbox
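Should you ever need files out of an old archive, one approach is to list the archive contents and then extract it into a scratch directory rather than on top of live data; the /tmp/restore path here is just an example:

# list the contents of an archive
tar jtf /BACKUP/archive_blackbox_2020.01.10.tar.bz2
# extract it somewhere safe and copy back only what is needed
mkdir /tmp/restore
tar jxf /BACKUP/archive_blackbox_2020.01.10.tar.bz2 -C /tmp/restore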
Now that we have the backups working and the archive process down, let's set up cron to run the backups automatically outside business hours. We will set the backups to run at 3am every other day (Sunday, Tuesday, Thursday, and Saturday) and make compressed archives every Monday morning at 5am.
#minute (0-59)
#|     hour (0-23)
#|     |     day of the month (1-31)
#|     |     |     month of the year (1-12 or Jan-Dec)
#|     |     |     |     day of the week (0-6 with 0=Sun or Sun-Sat)
#|     |     |     |     |     commands
#|     |     |     |     |     |
### Automated Backups
0      3     *     *     */2   /tools/BACKUP_Config/BACKUP_MasterScript.sh >> /dev/null 2>&1
0      5     *     *     1     /tools/BACKUP_Config/BACKUP_Compress.sh >> /dev/null 2>&1
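Add the two entries above to root's crontab so the scripts run with the permissions they need; use "crontab -e" to edit and "crontab -l" to verify:

[root@machine]# crontab -e
[root@machine]# crontab -l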
Do you have more examples using rsync? Yes we do. Check the moneyslow.com Home Page for instructions on automated server and user based backups. We go through the options of declaring backup schedules, ignoring specified file types, and compressing long-term archives. We also have tips and hints for incremental backups.