wolfie
14-02-2005 09:43:08
Okay, I have finally solved my problem with rdiff-backup and would like some feedback to see if I can determine the root of the problem.
I was using rdiff-backup to back up certain directories only. I was not interested in being able to restore a whole machine from backup; I just wanted some assurance that if the box died, I would not have to recreate everything from scratch.
The dirs of interest were /etc, /home/myusername, /root and /var. For /etc and the home directories, files changing while the backup was running was not much of a problem, so nothing special was needed to deal with that. /var was a different story, so I took an LVM snapshot to keep things from changing during the backup.
That is just background, and now on to the good stuff. The root of the problem is that I would get Assertion Errors that complained about things being out of order. That is just great, but I really didn't understand that error. So for the longest time I just ignored the errors and pretended everything was getting backed up. :) I then broke out the /var/log portion of my rdiff-backup script into a separate statement. That part started completing without errors.
It was great that that part was working, but the other ones were still erroring out. I did have a lot of excludes to prevent sockets and even symlinks from being backed up, but they were no help. I then moved /var/www to its own statement and, lo and behold, it started working as well, and even the home directories and /etc were now backing up (they are still in the same statement together).
This is quite bewildering and I am not sure of the logic behind it. Is it a limit on the total number of files, or is it something else that my non-Python self is running into? I have not tried trimming my excludes to see whether things still work with the sockets and symlinks back in, now that the backups are running again, but I am thinking about trying it to see if the results would be different.
Any thoughts or conjectures as to this behavior?
Thanks,
wolfie
robertngreen
14-02-2005 14:37:59
We have been using rdiff-backup here at the office for a while now. Some of the problems that you are seeing are new to me. We have not had any problems with symlinks or sockets; rdiff-backup deals with them appropriately. BTW we are using the version from the web site (we are using 0.12.7-1). I do see occasionally that it will ignore a socket, and sometimes it crashes spectacularly.
One of the major problems that we had was a motherboard that had some bad capacitors on it. That would crash rdiff-backup nicely. We do see problems where a machine will not completely back up, but the next night it will go fine.
It is a little odd that you are having to break things out as much as you have. Just out of curiosity, could you post a copy of the script you use?
Here is what ours looks like:
EXCLUDES="
--exclude /home
--exclude /net
--exclude /proc
--exclude /sys
--exclude /dev/pts
--exclude /dev/shm
--exclude /var/cache
--exclude /var/tmp
--exclude /var/lib/slocate
--exclude /var/lib/mysql/mysql.sock
--exclude /var/run/dbus/system_bus_socket
--exclude /var/spool/courier/sqwebmail.sock
--exclude /var/spool/courier/authdaemon/socket
--exclude /mnt
--exclude /tmp
--exclude /dev
--exclude /export/home/*/.netscape/*cache
--exclude /export/home/*/.mozilla/*/*.slt/Cache
--exclude /export/home/*/.phoenix/*/*.slt/Cache*
--exclude /export/home/*/.opera/cache*
--exclude /export/home/*/.galeon/mozilla/galeon/Cache
--exclude /export/home/*/.kde/share/apps/kio_http/cache
--exclude /export/home/*/.kde/share/apps/nsplugins/cache
--exclude /export/home/*/.jpi_cache
--exclude /export/home/*/.netbeans/*/cache
--exclude /export/home/*/.gimp*/gimpswap.*
--exclude /export/home/*/.ee
--exclude /export/home/*/tmp"
echo wonko
rdiff-backup --ssh-no-compression $EXCLUDES \
--exclude /export/home/james/.pub/OCR wonko::/ /net/backup/wonko
echo vortex
rdiff-backup --ssh-no-compression --include /home $EXCLUDES \
--exclude /var/spool/squid vortex::/ /net/backup/vortex
echo drdan
rdiff-backup --ssh-no-compression $EXCLUDES --exclude /scans \
drdan::/ /net/backup/drdan
echo prefect
rdiff-backup --ssh-no-compression $EXCLUDES prefect::/ /net/backup/prefect
echo fenchurch
rdiff-backup --ssh-no-compression $EXCLUDES --exclude /export/tunes \
fenchurch::/ /net/backup/fenchurch
echo lunkwill
rdiff-backup --ssh-no-compression $EXCLUDES lunkwill::/ /net/backup/lunkwill
echo bistro
rdiff-backup $EXCLUDES \
--exclude /export/net/ocr/work --exclude /export/net/images \
--exclude /export/net/scans --exclude /export/net/backup \
--exclude /export/net/images / /net/backup/bistro
I have not been overly impressed with rdiff-backup's robustness. The idea is good but the implementation falls a bit short. Also, have fun restoring files :-D Not sure this has helped any, but I have seen some of the same problems as well.
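One thing that may be worth checking in a script like the one above: because $EXCLUDES is expanded unquoted, the local shell will try to expand patterns such as /export/home/*/tmp against the machine running the script before rdiff-backup ever sees them. The following is only a minimal sketch of one way around that, using set -f to switch off pathname expansion. The backup_host helper and the DRY_RUN switch are made up for illustration and are not part of rdiff-backup, and the exclude list is shortened.

```shell
#!/bin/sh
# Sketch only: stop the local shell from expanding the glob patterns
# stored in $EXCLUDES before rdiff-backup sees them.
set -f   # disable pathname expansion; word splitting still applies

# A shortened copy of the exclude list from the post above.
EXCLUDES="
--exclude /proc
--exclude /sys
--exclude /tmp
--exclude /export/home/*/.opera/cache*
--exclude /export/home/*/tmp"

# backup_host is a hypothetical helper, not part of rdiff-backup.
# Usage: backup_host HOST [extra rdiff-backup options...]
# It only prints the command unless DRY_RUN=0 is set.
backup_host() {
    host=$1
    shift
    # $EXCLUDES is left unquoted on purpose so it splits into words.
    set -- rdiff-backup --ssh-no-compression $EXCLUDES "$@" \
        "${host}::/" "/net/backup/${host}"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

backup_host wonko --exclude /export/home/james/.pub/OCR
```

Running it as-is just prints the command line that would be used for wonko, so the patterns can be inspected before anything is backed up. A bash array would be another way to get the same effect.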
wolfie
14-02-2005 16:21:45
I excluded the sockets and symlinks to try to resolve the issues I was having, but I am not sure that ultimately helped. Restoring files, from what I have seen, is pretty messy, and that is by far its worst feature, methinks.
Below is my script. It is not quite as pretty or as organized as yours (my excludes are all inline), but nonetheless here it is, with names removed to protect the innocent. :) I have tried to make it a little more readable.
#!/bin/bash
# rdiff_command
#
# Small script to run rdiff-backup on all the files that I want
# replicated/backed up. This script has been modified to make
# lvm snapshots to help prevent errors.
#
# Creator: Jeff Wolf
#
# Last modified: 2/8/2005
#
#
# This makes a snapshot of /var since it can cause problems because of
# open files.
ssh webserver lvcreate -L1G -s -n varbackup /dev/vg/var
# This mounts the snapshot for backup.
ssh webserver mount -o ro,sb=131072 -t ext3 /dev/vg/varbackup /mnt/varbackup
sleep 60
# This is to back up webserver to fileserver with problem or unneeded
# directories removed.
rdiff-backup -v7 --print-statistics \
--exclude /etc/apache \
--exclude /etc/apache2/apache2-builtin-mods \
--exclude /etc/apache2/lib \
--exclude /etc/apache2/modules \
--exclude /etc/filesystems \
--exclude /etc/modules.autoload \
--exclude /etc/init.d/depscan.sh \
--exclude /etc/init.d/runscript.sh \
--exclude /etc/init.d/functions.sh \
--exclude /etc/localtime \
--exclude /etc/rmt \
--exclude /etc/make.profile \
--exclude /etc/X11/rstart/commands/x11 \
--exclude /etc/X11/rstart/commands/x \
--exclude /etc/X11/rstart/contents/x11 \
--exclude /etc/X11/rstart/contents/x \
--exclude /etc/X11/xdm/authdir \
--exclude /etc/X11/xkb \
--exclude /etc/X11/X \
--exclude /etc/ssl/certs/ \
--exclude /etc/apache2/conf/php.ini \
--exclude /etc/apache2/modules \
--exclude /etc/apache2/extramodules \
--exclude /etc/apache/logs \
--exclude /etc/php4/lib \
--exclude /etc/php4/php.ini \
--exclude /etc/init_d.old/ \
--exclude /etc/php/apache1-php4/lib \
--exclude /etc/php/apache2-php4/lib \
--exclude /etc/terminfo/v/vt200 \
--exclude /etc/apcupsd/powerout \
--exclude /etc/spamassassin \
--exclude /etc/lvmconf \
--exclude /etc/lvmtab.d \
--exclude /etc/mtab \
--exclude /etc/runlevels \
--include /etc \
--exclude /home/username/.maildir/.uidvalidity \
--exclude /home/username/.maildir/courierimaphieracl \
--exclude /home/username/.maildir/courierimapkeywords \
--exclude /home/username/.maildir/courierpop3dsizelist \
--include /home/username \
--exclude /root/sync_output.log \
--exclude /root/webalizer_output.log \
--include /root \
--exclude /mnt/varbackup/run \
--exclude /mnt/varbackup/tmp \
--exclude /mnt/varbackup/lib/ntp \
--exclude /mnt/varbackup/KV \
--exclude /mnt/varbackup/cache \
--exclude /mnt/varbackup/db \
--exclude /mnt/varbackup/empty \
--exclude /mnt/varbackup/fileVfNsq7 \
--exclude /mnt/varbackup/lib/clamav \
--exclude /mnt/varbackup/lib/courier-imap \
--exclude /mnt/varbackup/lib/mysql \
--exclude /mnt/varbackup/lib/init.d \
--exclude /mnt/varbackup/lib/slocate \
--exclude /mnt/varbackup/lock \
--exclude /mnt/varbackup/log \
--exclude /mnt/varbackup/spool/cron/lastrun \
--exclude /mnt/varbackup/www \
--include /mnt/varbackup \
--exclude '**' webserver::/ fileserver::/home/export/backup/webserver
echo ""
echo "*****************************************************************"
echo ""
echo "Finished with Files now progressing to logs!"
echo ""
echo "*****************************************************************"
echo ""
sleep 30
# This is to backup system logs on webserver to fileserver.
rdiff-backup -v7 --print-statistics --include /mnt/varbackup/log --exclude '**' webserver::/ fileserver::/home/export/backup/webserverlogs
echo ""
echo "*****************************************************************"
echo ""
echo "Finished with logs now progressing to www!"
echo ""
echo "*****************************************************************"
echo ""
sleep 30
# This is to backup the www directory on webserver to fileserver.
rdiff-backup -v7 --print-statistics --include /mnt/varbackup/www --exclude '**' webserver::/ fileserver::/home/export/backup/webserverwww
sleep 60
# This unmounts the snapshot.
ssh webserver umount /mnt/varbackup
# This removes the snapshot
ssh webserver lvremove -f /dev/vg/varbackup
By the way, thanks for posting your rdiff-backup script. I am now going to try to tidy mine up a bit.
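A script like the one above leaves the snapshot mounted if one of the rdiff-backup steps fails and the run is aborted partway. What follows is only a rough sketch of one way to make sure the unmount and lvremove steps always run, assuming the same host, volume and mount names as above. The run, cleanup and snapshot_backup functions and the DRY_RUN switch are invented for illustration, and only the www backup is shown.

```shell
#!/bin/sh
# Sketch only: always unmount and remove the LVM snapshot, even when a
# backup step fails. Host, volume and mount names are copied from the
# script above; run() and DRY_RUN are hypothetical helpers.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Unmount and remove the snapshot.
cleanup() {
    run ssh webserver umount /mnt/varbackup
    run ssh webserver lvremove -f /dev/vg/varbackup
}

# Create the snapshot, mount it, back up one directory, then clean up.
snapshot_backup() {
    status=0
    run ssh webserver lvcreate -L1G -s -n varbackup /dev/vg/var &&
    run ssh webserver mount -o ro,sb=131072 -t ext3 /dev/vg/varbackup /mnt/varbackup &&
    run rdiff-backup --include /mnt/varbackup/www --exclude '**' \
        webserver::/ fileserver::/home/export/backup/webserverwww ||
    status=$?
    # Runs whether or not the steps above succeeded.
    cleanup
    return "$status"
}

snapshot_backup
```

As written it only prints the five commands in order. Setting DRY_RUN=0 would execute them for real. The function returns the status of the first step that failed, so a calling script or cron job can still notice the failure.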
robertngreen
15-02-2005 12:58:13
It looks like you are running rdiff-backup on one computer, backing up a second computer to a third one. You might try running rdiff-backup from the machine you are backing up to, which would help remove a layer of problems. We have one box that we back up to, and it initiates the backup of all of the other computers, including itself. We also run a second set that is initiated from off site over a VPN.
wolfie
15-02-2005 13:06:52
Yes, that is correct. I believe I have solved my problems and have eliminated a ton of excludes just by breaking up the different dirs. If I get some time I might set it up that way, but for now I am cool. You might be on to something since it does depend on the length of the backup. If it has a lot of files to check through it does seem to fail. I am wondering if there is some timeout happening that is causing some of the errors because of the "proxy".
Well, if I make the change I will let you know, thanks for the conjecture.
wolfie
20-02-2005 10:51:26
Robert,
Good call on the "proxied" rdiff-backup. I have changed the script (to be much simpler) and launched it from the machine I was backing up to, and all the problems seem to have disappeared and things are behaving quite nicely. Looks like they still have some work to do on getting the remote-to-remote case working correctly.
Thanks,
wolfie
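The simplified script itself was not posted. Purely as an illustration of the "run it from the machine you are backing up to" arrangement, a pull-style script might look something like the sketch below. It assumes it runs on fileserver, so the destination is a plain local path instead of fileserver::..., and it reuses the source and destination paths from the earlier script. The pull_backup helper and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: meant to run on the backup destination (fileserver in this
# thread), pulling from webserver so only two machines are involved.
# pull_backup and DRY_RUN are hypothetical helpers, not rdiff-backup features.

# Usage: pull_backup SOURCE_DIR_ON_WEBSERVER LOCAL_DEST_DIR
# It only prints the command unless DRY_RUN=0 is set.
pull_backup() {
    src_dir=$1    # directory on webserver to back up
    dest_dir=$2   # local directory on the backup machine
    set -- rdiff-backup --print-statistics \
        --include "$src_dir" --exclude '**' \
        webserver::/ "$dest_dir"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

pull_backup /mnt/varbackup/log /home/export/backup/webserverlogs
pull_backup /mnt/varbackup/www /home/export/backup/webserverwww
```

Only the source side goes over ssh here, which matches the setup Robert describes.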
robertngreen
21-02-2005 13:24:42
Yeah, in my experience rdiff-backup seems to be a bit fragile. It also doesn't seem to deal with many things very gracefully. I personally would like to rewrite it, but there just is not enough time or money to do it at this point.