Introduction to Rdist

"Using rdist doesn't make sense unless you are dealing with a sufficiently large number of machines---let's say at least two." (Markus Moster)

Introduction

In a Nutshell

Synchronizing files and file hierarchies across multiple machines is an ever-recurring task in almost any reasonably sized computer network environment. Since the arrival of 4.3BSD a widely ignored tool called rdist has been available to deal with this task, at least among machines running Un*x-style operating systems.

Simple Real World Examples

What does rdist provide? The basic functionality is best demonstrated with some simplified real-world examples.

First consider a WAN connected through low-bandwidth (64 kbit/s) links. At about 40 sites there are general-purpose servers. Each of these servers provides a software repository to, say, twenty clients. Whenever users feel like it they may install software from that repository on their desktop clients. The repository is frequently updated and contains about 500 MB worth of software. Using rdist you simply define one server as the "master" or "reference" machine. Whenever you've modified the repository on that machine you run rdist, for large modifications preferably overnight, and it updates the remote servers without further sysadmin intervention. Since rdist only updates files that have changed, this is a very bandwidth- and time-preserving way to deal with the distribution issue. If you ever fear that a repository has got out of sync for some reason, you re-run rdist. Unless things have actually changed this will have little impact on the servers or the bandwidth available in between. Of course the same approach works for the document roots of multiple web servers behind a load balancer, too.

Consider the same servers again. Most of their configuration is the same on all machines, or at least should be.
If you set them up the same and then use rdist to distribute all configuration files that shouldn't be adjusted individually, you won't have to update 40 copies of /etc/profile by hand (eventually getting things out of sync anyway) but rather do that only once and then run rdist. If you want to test new configurations you do so on only one machine. If things work out you run rdist from that machine; otherwise you can easily roll back by running rdist from one of the remaining functional machines. This has proven to be particularly useful in cluster environments where cluster members must be kept identically configured, no excuses. Guess what Markus Moster (see the quote above) is habitually working on.

If you are responsible for quality control in a software development team you may want to test the installation packages you get from the development team in a clearly defined test environment. Of course you might want to reinstall those machines either from backup or through an install server (like a Sun[TM] Solaris[TM] JumpStart[TM]). You might however want to find out what system files the package installation has messed up. If you define a reference file system somewhere, rdist lets you do all this with a reasonable amount of trouble. And imagine the educational value of test machines that are always in a clearly defined state every morning.

In a security-sensitive environment where you need to maintain a consistent user base on multiple machines but can afford neither the security issues of NIS nor the hassles and Solaris-onliness of NIS+, you can easily set up one machine as the "master" passwd server and distribute whatever /etc/passwd, /etc/shadow, /etc/group and such it has through cron to all other machines. New user accounts get created on that machine, or people change their password on that machine, and maybe a minute later the new account or password is active on all machines.

How rdist Works

How does rdist provide this functionality?
Well, assume that you have a machine master that wants to copy some files to a list of machines slave_1 to slave_n. Here's what happens:
This general behaviour can be modified in various ways. Among others you can

Setting Up rdist

Here's an outline of how to set up rdist to work in your environment. If you want to use it in conjunction with ssh, make sure that ssh is running first.

Now the environment is ready to use rdist, at least for one-shot jobs. The real power of rdist is in its distfiles however, so read on.

Elementary Distfiles

Basics

A distfile defines which files and file hierarchies to copy where and with which options. You may specify a particular distfile using the "-f distfile" option. Otherwise rdist looks for a file named "Distfile" in the current working directory. The syntax is somewhat similar to a standard makefile. Here's a distfile that would copy our previous test file /tmp/distribute-me to some slave machine slave_1.

    # Distfile 1
    /tmp/distribute-me -> slave_1
            install /var/tmp ;

This distfile tells rdist to install our file on slave_1 in its /var/tmp directory. In detail, the first line specifies the source file and the destination machine. The second line says that the file specified needs to be installed in the /var/tmp directory of the slave.

Now we may want to specify multiple files and/or multiple destination machines. To do so we need to put parentheses around them, like this:

    # Distfile 2
    ( /var/www /var/ftp ) -> ( slave_1 slave_2 slave_3 )
            install -oremove /var ;

This will copy the contents of both /var/www and /var/ftp to all three slave machines. The -o option to install allows us to specify a wide range of additional options. In this particular case we tell rdist to remove all files on the slaves that don't exist on the master. Use this option with care: a typo in the wrong spot may wipe out lots of data on the slaves.

If you just want to see what would happen if you ran rdist, without actually doing the update, you may add the option "verify" to the install command. That makes rdist do a dry run that doesn't modify anything on the target machine. See below for how to add this option on the command line.

Synchronizing directory trees with lots of small files will take some time even if no files need to be copied at all.
To speed things up we may define makefile-style targets that can be invoked on the command line. So we redo our previous distfile again.

    # Distfile 3
    www: /var/www -> ( slave_1 slave_2 slave_3 )
            install -oremove /var ;

    ftp: /var/ftp -> ( slave_1 slave_2 slave_3 )
            install -oremove /var ;

Unlike make, an invocation of rdist that doesn't specify a particular target or set of targets will have all targets installed, not just the first one. So if we invoke rdist with this distfile it'll behave exactly like No. 2. But if we append an additional argument "www" or "ftp" to the end of our rdist invocation, only the target of that name will be installed.

There are other commands besides the "install" command we've seen. You may exclude files and/or file patterns from distribution using the "except" and "except_pat" commands. The former takes a list of file names as arguments and excludes them from installation:

    # Distfile 4
    ftp: /var/ftp -> ( slave_1 slave_2 slave_3 )
            install -oremove /var ;
            except ( /var/ftp/Distfile /var/ftp/incoming ) ;

This will prevent rdist from installing Distfile and the incoming directory. Even more powerful is the except_pat command. It will exclude files according to an ed(1)-style regular expression:

    # Distfile 5
    etc: /etc -> ( archivehost )
            install -oremove /archive/master/etc ;
            except_pat ~\$ ;

Note that you need to escape the dollar character (marking the end of line) in this pattern with a backslash. The pattern will skip all files ending with a tilde character.

Variables

Similar to makefiles we may use variables in our distfiles. These variables provide a means to "recycle" definitions of file and target groups to simplify the maintenance of distfiles. Here's a very simple example that shows what variables may be good for:

    # Distfile 6
    HOSTS= ( slave_1 slave_2 slave_3 )
    FILES= ( /etc/hosts /etc/inet/ntp.conf /etc/services )

    base-configs: ${FILES} -> ${HOSTS}
            install -owhole / ;

So far this doesn't look particularly useful.
It does however show how we use variables; we'll see what they're good for in the following section. The syntax for both variable definition and variable expansion is again very much like in a makefile, except that we must use curly braces around the variable name, not parentheses. Yes, you may skip the braces if your variable name is only a single character long.

Set Operations

The distfiles so far have basically distributed files using a one-to-one scheme. Especially in the last example we may however see a hint of the problems to come: What if we want to maintain configuration files for multiple machines running different Un*x versions? We may use set operations to define which files are host specific, OS specific or generally used. Here is another (simplified) example:

    # Distfile 7

    # Consider your lists of hosts here
    SOLARISHOSTS= ( sun1 sun2 sun3 )
    FREEBSDHOSTS= ( chuck1 chuck2 )
    NETBSDHOSTS= ( pdp1 pdp2 )
    OPENBSDHOSTS= ( marvin1 marvin2 )
    LEAFNODES= ( sun1 sun2 )

    # The list of files going to /etc (BSD) or /etc/inet (Solaris) here.
    INETCFG= ( /etc/services /etc/protocols /etc/hosts /etc/ntp.conf )
    INSECUREINETCFG= ( /etc/inetd.conf )

This example shows what you can do with those set operations. Even though we maintain a single master copy of various configuration files, we place them on the destination machines in different directories according to their architecture and/or operating system. In the case of "inetd.conf" we explicitly exclude the extra-paranoid OpenBSD installations because they don't run an inetd in our example. And for the Solaris workstations we make sure they don't try to route anything by installing the file "/etc/notrouter" on them.

The exact syntax for set operations is this: Every individual object (like a file or target machine name) is considered a set. So is a sequence of objects surrounded by parentheses. So is a variable reference referring to a set.
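
The following is a minimal sketch of rules that could build on the variables of Distfile 7 and match the behaviour described above. The target names and the intermediate variables FREENETBSD and BSDHOSTS are assumptions made for illustration, not the author's original rules, and the sketch assumes that a file /etc/notrouter exists on the master.

```text
# Sketch only: target names and intermediate variables are hypothetical.
# "+" is union, "-" is exclusion, "&" is intersection. Only one operation
# is used per expression, hence the intermediate variables.
FREENETBSD= ${FREEBSDHOSTS} + ${NETBSDHOSTS}
BSDHOSTS= ${FREENETBSD} + ${OPENBSDHOSTS}

# Same master files, different directories depending on the operating system.
inet-bsd: ${INETCFG} -> ${BSDHOSTS}
        install /etc ;

inet-solaris: ${INETCFG} -> ${SOLARISHOSTS}
        install /etc/inet ;

# inetd.conf goes to every BSD host except the OpenBSD ones.
inetd: ${INSECUREINETCFG} -> ${BSDHOSTS} - ${OPENBSDHOSTS}
        install ;

# /etc/notrouter goes only to Solaris machines that are also leaf nodes.
notrouter: /etc/notrouter -> ${SOLARISHOSTS} & ${LEAFNODES}
        install ;
```

Where install is given no destination name, rdist uses the source path on the target machine. It is worth doing a dry run with the "verify" option before trusting a sketch like this on real hosts.
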
We may use the set operations "+", "&" and "-" on multiple sets to compute the union, intersection and exclusion of those sets, respectively. These set operations may not be nested within an individual expression. You may however use intermediate variables to work around this. Yes, this is considered a silly limitation even within the man page.

The "special" And "cmdspecial" Commands

It may happen that you want to execute some sort of command on the target machines whenever you have changed a file. There are two commands to do so. The "special" command is executed for every file that has been updated by the rule it belongs to, while the "cmdspecial" command is executed once when at least one of the files specified in a rule has been updated. The command specified is run on the target machine from within the home directory of the user you connect as. It is run through the Bourne shell, so you may use whatever the shell has to offer.

The "special" command has the environment variable "FILE" set to the path of the file on the master machine and the variable "REMFILE" set to the path of the file on the target machine. The "cmdspecial" command executes with the environment variable "FILES" containing a colon-separated list of remote file names that have been updated. The command string needs to be quoted using double quotes. I haven't bothered to figure out the exact details of the quoting mechanisms. Instead I prefer to send a full-blown shell script to the target machine and then execute it.

Here is an example. It logs every updated file to the target's syslogd using logger. Once it has finished it also appends all the new files to a tar file.

    ( /usr /bin /sbin ) -> ( test1 test2 test3 )
            install / ;
            special "logger Updated $REMFILE" ;
            cmdspecial "tar rf /tmp/rdist-update.tar `echo $FILES | tr : ' '`" ;

Command Line Options

You may specify several command line options to rdist. Among the more useful are these:

There are several other options to rdist which are of less importance than the ones shown here. Consult the manual pages for the rest of them.

Advanced Features

Now that we've mastered the basics of rdist we take a look at some additional features that are not of fundamental importance to the everyday use of rdist but still prove useful for more special purposes.

Additional install Options

The "quiet" Option

If you want to make the install command less talkative you specify the additional option "quiet". As a personal preference however I prefer to keep the output from an rdist run in a file and either run "grep -v" on that file afterwards or run rdist again to get a short "everything is up to date"-style output.

Modifying the Update Conditions

By default rdist always updates files whenever the stat(2) syscall returns differences in file size, time stamps, permissions or owners. In case of the former two criteria the file content is updated, in case of the latter two only the permissions and ownership. This default behaviour can be modified using the following options to install.
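
As a hedged illustration of how such options attach to an install command, the sketch below uses the "compare" and "younger" options as described in the rdist version 6 man page; check them against the man page of your local version. Host names, target names and paths are placeholders.

```text
# Sketch only: hosts, targets and paths are hypothetical.
HOSTS= ( slave_1 slave_2 )

# "compare": decide by comparing file contents rather than size and time stamp.
keys: /etc/ssh/ssh_known_hosts -> ${HOSTS}
        install -ocompare ;

# "younger": do not overwrite files that are newer on the slave than on the master.
docs: /var/www -> ${HOSTS}
        install -oyounger ;
```

Several options may be combined in one comma-separated list, for example "-ocompare,remove".
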
Modifying the Update Behaviour

Message Logging and Change Notification

It is possible to make rdist generate various kinds of logging output. By default it already writes copious output to stdout when running. You may however want it to log even more things to stdout, to the syslog facility or a log file on the master or slave machines, or to an e-mail address. Since these features are somewhat beyond the scope of an "introductory" text, and since I personally prefer to use sed or perl to extract whatever I want from the output, I won't go into detail about them. Take a look at the man page to find out more. It has a section dedicated to this issue.

There is also a mechanism available to generate lists of files that have changed since some reference file was last modified. Again, see the man page. I personally prefer to run rdist with the -overify option instead.

Strategic Considerations and Best Practices

Now that we've made it through the features of rdist it is time to consider how best to use it. Here is a list of issues worth considering:

Conclusion

The features rdist provides are simple but extremely useful to handle groups of machines that share some common files that occasionally need to be updated in a systematic way. Considering the simplicity and scope of rdist it is surprising that it isn't way more widely known and used. I sincerely hope that this little documentation (which has become slightly longer than the original man page) will help to change this.