Next: 3.8 Security In Condor Up: 3 Administrators' Manual Previous: 3.6 Setting Up IP/Host-Based


3.7 Setting up Condor for Special Environments

The following section describes how to set up Condor for use in a number of special environments or configurations.

3.7.1 Using Condor with AFS

If you are using AFS at your site, be sure to read section 3.3.5, "Shared Filesystem Config Files Entries", for details on configuring your machines to interact with and use shared filesystems, AFS in particular.

Condor does not currently have a way to authenticate itself to AFS. This is true both of the Condor daemons, which would like to authenticate as AFS user Condor, and of the condor_shadow, which would like to authenticate as the user who submitted the job it is serving. Since neither of these can happen yet, people who use AFS with Condor must take a number of special steps. Some of these must be done by the administrator(s) installing Condor; others must be done by the Condor users who submit jobs.

AFS and Condor for Administrators

The most important point is that, since the Condor daemons cannot authenticate to AFS, the LOCAL_DIR (and its subdirectories, such as "log" and "spool") on each machine must either be writable by unauthenticated users or not be on AFS at all. The first option is a VERY bad security hole, so you should NOT put your local directory on AFS. If you also have NFS installed and want the LOCAL_DIR for each machine on a shared file system, use NFS. Otherwise, put the LOCAL_DIR on a local partition on each machine in your pool. In practice, this means you run condor_install to install your release directory and configure your pool, setting the LOCAL_DIR parameter to some local partition. When that is complete, log into each machine in your pool and run condor_init to set up the local Condor directory.
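On a pool with many machines, the per-machine condor_init step can be scripted. The sketch below is only an illustration under several assumptions not taken from this manual: every host accepts root logins over ssh (many sites of this era use rsh instead), condor_init is installed at /usr/local/condor/bin/condor_init on each host (override CONDOR_INIT if yours lives elsewhere), and the function name init_pool is hypothetical. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run condor_init on every machine in the pool, after condor_install
# has set up the release directory with LOCAL_DIR on a local partition.
# Assumptions (not from the manual): root ssh access to each host, and
# condor_init located at $CONDOR_INIT on every host.

CONDOR_INIT=${CONDOR_INIT:-/usr/local/condor/bin/condor_init}

init_pool() {
    # Usage: init_pool host1 host2 ...
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Dry run: show the command instead of executing it.
            echo "ssh root@$host $CONDOR_INIT"
        else
            ssh "root@$host" "$CONDOR_INIT" || echo "condor_init failed on $host" >&2
        fi
    done
}
```

For example, `DRY_RUN=1 init_pool node1 node2` (hypothetical host names) lists the two commands without contacting either machine.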

The RELEASE_DIR, which holds all the Condor binaries, libraries, and scripts, can and probably should be on AFS. None of the Condor daemons need to write to these files; they only need to read them. So you just have to make your RELEASE_DIR world-readable, and Condor will work fine. Keeping it on AFS makes it easier to upgrade your binaries later, lets your users find the Condor tools in a consistent location on every machine in your pool, and lets you keep the Condor config files in a central location. This is what we do in the UW-Madison CS department Condor pool, and it works quite well.
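In AFS, "world-readable" means granting rights to the built-in group system:anyuser, and AFS ACLs are set per directory rather than per file. The sketch below walks a release directory and grants rl (read and lookup) on every directory with the standard AFS `fs setacl` command. The function name open_release_dir and the example path are hypothetical; with DRY_RUN=1 the commands are printed instead of run.

```shell
#!/bin/sh
# Sketch: make an AFS-resident RELEASE_DIR readable by unauthenticated
# processes (system:anyuser), so Condor daemons without AFS tokens can read it.
# AFS ACLs apply per directory, so every directory in the tree is visited.
# With DRY_RUN=1 the fs commands are printed rather than executed.

open_release_dir() {
    # Usage: open_release_dir /afs/your.cell/condor   (hypothetical path)
    find "$1" -type d | while IFS= read -r dir; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "fs setacl -dir $dir -acl system:anyuser rl"
        else
            fs setacl -dir "$dir" -acl system:anyuser rl
        fi
    done
}
```

Running it once with DRY_RUN=1 is a cheap way to review exactly which directories will be opened before changing any ACLs.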

Finally, you might want to set up some special AFS groups to help your users deal with Condor and AFS (you will want to read the section below anyway, since you will probably have to explain this to your users). If you can, create an AFS group that contains all unauthenticated users but is restricted to a given host or subnet. AFS is supposed to support such host-based ACLs, but we have had trouble getting them to work here at UW-Madison. What we have instead is a special group for all machines in our department, so users here only have to make their AFS output directories writable by any process running on one of our machines, instead of by any process on any machine with AFS on the Internet.

AFS and Condor for Users

The condor_shadow process runs on the machine where you submitted your Condor jobs and performs all file system access for your jobs. Because this process is not authenticated to AFS as the user who submitted the job, it will not normally be able to write any output. So, when you submit jobs, any directory in which your job will create output files must be world-writable (to unauthenticated AFS users). In addition, if your program writes to stdout or stderr, or you are using a user log for your jobs, those files must also be in a world-writable directory.

Any input for your job, whether the file you specify as input in your submit file or any file your program opens explicitly, must be world-readable.
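As a sketch of what this means in AFS terms: output directories need the "write" rights (rlidwk) and input directories need the "read" rights (rl) for the unauthenticated group, set with `fs setacl`. The function names below are hypothetical, and AFS_GROUP defaults to system:anyuser, which is every unauthenticated process on the Internet; substitute a host-restricted group if your site has one. Note that in AFS each parent directory on the path also needs at least l (lookup) for the same group, or the shadow cannot reach the directory at all.

```shell
#!/bin/sh
# Sketch: open a job's directories to unauthenticated AFS access so the
# unauthenticated condor_shadow can read input and write output there.
# AFS ACLs apply to directories, not individual files.
# AFS_GROUP defaults to system:anyuser; use a site-provided,
# host-restricted group instead if one exists.
# With DRY_RUN=1 the fs commands are printed rather than executed.

AFS_GROUP=${AFS_GROUP:-system:anyuser}

afs_setacl() {
    # Usage: afs_setacl DIR RIGHTS
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "fs setacl -dir $1 -acl $AFS_GROUP $2"
    else
        fs setacl -dir "$1" -acl "$AFS_GROUP" "$2"
    fi
}

afs_open_job_dirs() {
    # Usage: afs_open_job_dirs OUTPUT_DIR INPUT_DIR
    # "write" (rlidwk): output files, stdout/stderr files and the user log.
    afs_setacl "$1" write
    # "read" (rl): the submit file's input file and files the program opens.
    afs_setacl "$2" read
}
```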

Some sites may have special AFS groups set up that make this unauthenticated access to your files less risky. For example, AFS is supposed to provide a way to grant access to any unauthenticated process on a given host. That way, you only have to grant write access to unauthenticated processes on your submit machine, instead of to any unauthenticated process on the Internet. Similarly, unauthenticated read access could be granted only to processes running on your submit machine. Ask your AFS administrators whether such AFS groups exist and how to use them.

The other solution to this problem is simply not to use AFS at all. If you have disk space on your submit machine in a partition that is not on AFS, you can submit your jobs from there. While the condor_shadow is not authenticated to AFS, it does run with the effective UID of the user who submitted the jobs. So, on a local (or NFS) file system, the condor_shadow will be able to access your files normally, and you will not have to grant any special permissions to anyone other than yourself. However, if the Condor daemons are not started as root, the shadow will not be able to run with your effective UID, and you will have a problem similar to the one with files on AFS. See the section on "Running Condor as Non-Root" for details.
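Before submitting, it can help to check whether the current directory really lives outside AFS. The sketch below assumes AFS is mounted at /afs, the conventional mount point, and resolves symbolic links with `pwd -P` so that a link pointing into /afs is still detected. The function name on_afs is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a directory resolves to a path inside AFS.
# Assumption: AFS is mounted at /afs (the conventional mount point).
# Returns 0 if inside /afs, 1 if not, 2 if the directory cannot be entered.

on_afs() {
    # Usage: on_afs DIR
    # Resolve symlinks so a link into /afs is caught; CDPATH is cleared so
    # cd prints nothing extra.
    resolved=$(CDPATH= cd -- "$1" 2>/dev/null && pwd -P) || return 2
    case $resolved in
        /afs|/afs/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `on_afs . && echo "this directory is in AFS"` before running condor_submit warns you when you are about to submit from an AFS directory.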

3.7.2 Full Installation of condor_compile

In order to take advantage of two major Condor features, checkpointing and remote system calls, users of the Condor system need to relink their binaries. Programs that are not relinked for Condor can run in Condor's "vanilla" universe just fine; however, they cannot checkpoint and migrate, or run on machines without a shared filesystem.

To relink your programs with Condor, we provide a special tool, condor_compile. As installed by default, condor_compile works with the following commands: gcc, g++, g77, cc, acc, c89, CC, f77, fort77, ld. On Solaris and Digital Unix, f90 is also supported. See the condor_compile(1) man page for details on using condor_compile.

However, you can make condor_compile work transparently with any command on your system, including make.

The basic idea is to replace the system linker (ld) with the Condor linker. When a program is linked, the Condor linker determines whether the binary is being linked for Condor or as a normal binary. For a normal link, the old ld is called. For a Condor link, the script performs the operations needed to prepare a binary that can be used with Condor. To differentiate between normal builds and Condor builds, the user simply places condor_compile before the build command; condor_compile sets an environment variable that tells the Condor linker script to perform the Condor-specific link.
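The dispatch described above can be sketched in a few lines of shell. This is NOT Condor's actual linker script: the manual does not name the environment variable condor_compile sets, so HYPOTHETICAL_CONDOR_LINK stands in for it, and condor_link is a placeholder for the Condor-specific link step. REAL_LD points at the renamed system linker.

```shell
#!/bin/sh
# Sketch of the linker dispatch idea; not Condor's real script.
# HYPOTHETICAL_CONDOR_LINK and condor_link are placeholders.

REAL_LD=${REAL_LD:-/usr/bin/ld.real}   # the renamed system linker

condor_link() {
    # Placeholder for the Condor-specific relinking step (hypothetical).
    echo "condor link: $*"
}

ld_wrapper() {
    if [ -n "${HYPOTHETICAL_CONDOR_LINK:-}" ]; then
        # condor_compile is driving this build: relink for Condor.
        condor_link "$@"
    else
        # Ordinary build: pass every argument to the real linker unchanged.
        "$REAL_LD" "$@"
    fi
}
```

A drop-in ld script would use `exec "$REAL_LD" "$@"` so the real linker replaces the wrapper process; the function form here is only for illustration. After a full installation, `condor_compile make` relinks for Condor, while a plain `make` links normal binaries.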

In order to perform this full installation of condor_compile, the following steps need to be taken:

1. Rename the system linker from ld to ld.real.
2. Copy the Condor linker to the location of the previous ld.
3. Set the owner of the linker to root.
4. Set the permissions on the new linker to 755.

The actual commands you must execute depend on your system. The location of the system linker (ld) is as follows:

	Operating System              Location of ld (ld-path)
	Linux                         /usr/bin
	Solaris 2.X                   /usr/ccs/bin
	OSF/1 (Digital Unix)          /usr/lib/cmplrs/cc
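The table can be turned into a lookup keyed on `uname` output. The mapping below assumes Solaris 2.X reports `SunOS` with a 5.x release and Digital Unix reports `OSF1`; check this on your own machines. The function name ld_path_for is hypothetical, and IRIX is deliberately not covered, since it needs the separate procedure described for it.

```shell
#!/bin/sh
# Sketch: print the directory holding the system ld for this platform,
# following the table of ld locations. Returns 1 for unlisted systems.

ld_path_for() {
    # Usage: ld_path_for [OS [RELEASE]]   (defaults to this machine's uname)
    os=${1:-$(uname -s)}
    release=${2:-$(uname -r)}
    case $os in
        Linux) echo /usr/bin ;;
        SunOS)
            case $release in
                5.*) echo /usr/ccs/bin ;;   # Solaris 2.X
                *) return 1 ;;              # SunOS 4.x is not in the table
            esac ;;
        OSF1) echo /usr/lib/cmplrs/cc ;;    # Digital Unix
        *) return 1 ;;                      # IRIX and others: not handled
    esac
}
```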

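On these platforms, issue the following commands as root, first setting LD_PATH to the ld-path for your system from the table above (the commands assume your release directory is /usr/local/condor):

        LD_PATH=/usr/ccs/bin    # example for Solaris 2.X; use your ld-path from the table
        mv $LD_PATH/ld $LD_PATH/ld.real
        cp /usr/local/condor/lib/ld $LD_PATH/ld
        chown root $LD_PATH/ld
        chmod 755 $LD_PATH/ld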
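One hazard with these commands is running them twice: the second `mv` would overwrite ld.real, the real system linker, with the Condor linker. The sketch below wraps the four steps in a hypothetical function that refuses to run if ld.real already exists and rolls back the rename if the copy fails. It skips the chown step when not run as root, so it can be tried first in a scratch directory; for the real installation, run it as root. The default Condor linker path is the one used in the commands above.

```shell
#!/bin/sh
# Sketch: the four installation steps, guarded against repeat runs.

install_condor_ld() {
    # Usage: install_condor_ld LD_DIR [CONDOR_LD]
    ld_dir=$1
    condor_ld=${2:-/usr/local/condor/lib/ld}
    if [ -e "$ld_dir/ld.real" ]; then
        # A second run would clobber the real linker saved by the first.
        echo "$ld_dir/ld.real already exists; refusing to reinstall" >&2
        return 1
    fi
    if [ ! -f "$ld_dir/ld" ] || [ ! -f "$condor_ld" ]; then
        echo "missing $ld_dir/ld or $condor_ld" >&2
        return 1
    fi
    mv "$ld_dir/ld" "$ld_dir/ld.real" || return 1        # step 1
    if ! cp "$condor_ld" "$ld_dir/ld"; then               # step 2
        mv "$ld_dir/ld.real" "$ld_dir/ld"                 # undo step 1
        return 1
    fi
    if [ "$(id -u)" -eq 0 ]; then
        chown root "$ld_dir/ld" || return 1               # step 3
    else
        echo "not root: skipping chown" >&2
    fi
    chmod 755 "$ld_dir/ld"                                # step 4
}
```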

On IRIX, things are more complicated: there are multiple ld binaries that need to be moved, and symbolic links need to be made to convince the linker to work, since it looks at the name of its own binary to figure out what to do.

        mv /usr/lib/ld /usr/lib/ld.real
        mv /usr/lib/uld /usr/lib/uld.real
        cp /usr/local/condor/lib/ld /usr/lib/ld
        ln /usr/lib/ld /usr/lib/uld
        chown root /usr/lib/ld /usr/lib/uld
        chmod 755 /usr/lib/ld /usr/lib/uld
        mkdir /usr/lib/condor
        chown root /usr/lib/condor
        chmod 755 /usr/lib/condor
        ln -s /usr/lib/uld.real /usr/lib/condor/uld
        ln -s /usr/lib/uld.real /usr/lib/condor/old_ld

If you remove Condor from your system later on, linking will continue to work, since the Condor linker always defaults to producing normal binaries and simply calls the real ld. In the interest of simplicity, however, we recommend that you reverse the above changes by moving your ld.real linker back to its former position as ld, overwriting the Condor linker. On IRIX, you need to do this for both linkers, and you will probably want to remove the symbolic links as well.
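Reversing the installation amounts to moving each saved `.real` linker back into place. The sketch below uses a hypothetical function that refuses to act if the saved linker is missing, so it cannot delete the only working linker. On IRIX, it would be called once for ld and once for uld in /usr/lib, after which the /usr/lib/condor directory of symbolic links can be removed by hand.

```shell
#!/bin/sh
# Sketch: undo the full condor_compile installation for one linker name.

restore_system_ld() {
    # Usage: restore_system_ld LD_DIR [NAME]   (NAME defaults to ld)
    dir=$1
    name=${2:-ld}
    if [ ! -f "$dir/$name.real" ]; then
        # Without the saved copy there is nothing safe to restore.
        echo "$dir/$name.real not found; nothing to restore" >&2
        return 1
    fi
    # Overwrite the Condor linker with the original system linker.
    mv -f "$dir/$name.real" "$dir/$name"
}
```

For example, on IRIX: `restore_system_ld /usr/lib ld` followed by `restore_system_ld /usr/lib uld`.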

3.7.3 Installing the condor_kbdd

The condor keyboard daemon (condor_kbdd) monitors X events on machines where the operating system does not provide a way of monitoring the idle time of the keyboard or mouse. In particular, this is necessary on Digital Unix machines and IRIX machines.

NOTE: If you are running on Solaris, Linux, or HP/UX, you do not need to use the keyboard daemon.

Although great measures have been taken to make this daemon as robust as possible, the X window system was not designed to support this kind of monitoring, and thus the daemon is less than optimal on machines where many users log in and out on the console frequently.

In order to work with X authority, the system by which X authorizes processes to connect to X servers, the Condor keyboard daemon needs to run with super user privileges. Currently, the daemon assumes that X uses the HOME environment variable to locate a file named .Xauthority, which contains the keys necessary to connect to an X server. The keyboard daemon sets this environment variable to various users' home directories in turn in order to gain a connection to the X server and monitor events. This may fail on your system if you are using a non-standard approach. If the keyboard daemon is not allowed to attach to the X server, the state of a machine may be incorrectly set to idle when a user is, in fact, using the machine.

In some environments, the keyboard daemon will not be able to connect to the X server because the user currently logged into the system keeps their authentication token for the X server in a place that no local user on the machine can get to. This may be the case if you are running AFS and the user's X authority file is in an AFS home directory. There may also be cases where you cannot run the daemon with super user privileges for political reasons, but would still like to monitor X activity. In these cases, you will need to change your XDM configuration to start the keyboard daemon with the permissions of the user who is logging in. Although your situation may differ, if you are running X11R6.3, you will probably want to edit the files in /usr/X11R6/lib/X11/xdm. The Xsession file should start the keyboard daemon at the end, and the Xreset file should shut it down.

As of patch level 4 of Condor version 6.0, the keyboard daemon has some additional command-line options to facilitate this. The -l option writes the daemon's log file to a place where the user running the daemon has permission to write. We recommend something akin to $HOME/.kbdd.log, since every user can write there and it won't get in the way. The -pidfile and -k options allow easy shutdown of the daemon by storing its process id in a file. You will need to add lines to your XDM configuration that look something like this:

	condor_kbdd -l $HOME/.kbdd.log -pidfile $HOME/.kbdd.pid

This will start the keyboard daemon as the user who is currently logging in, write its log to the file $HOME/.kbdd.log, and save the process id of the daemon to $HOME/.kbdd.pid, so that when the user logs out, XDM can simply run:

	condor_kbdd -k $HOME/.kbdd.pid

This shuts down the daemon whose process id is recorded in $HOME/.kbdd.pid, and then exits.
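The two XDM hooks can be sketched as shell functions to make their placement explicit. Several assumptions here are not from this manual: condor_kbdd is on PATH, it detaches from the terminal on its own (if yours does not, append `&` to the start command), and HOME names the user's home directory in both Xsession and Xreset. Since Xsession commonly ends by exec'ing the user's session, the start command has to come before that final exec. The function names start_kbdd and stop_kbdd are hypothetical.

```shell
#!/bin/sh
# Sketch of the XDM hooks for a per-user condor_kbdd.

# Call near the end of Xsession, before any final exec; runs as the user
# who is logging in.
start_kbdd() {
    condor_kbdd -l "$HOME/.kbdd.log" -pidfile "$HOME/.kbdd.pid"
}

# Call from Xreset when the user logs out; does nothing if no pid file.
stop_kbdd() {
    if [ -f "$HOME/.kbdd.pid" ]; then
        condor_kbdd -k "$HOME/.kbdd.pid"
    fi
}
```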

To see how well the keyboard daemon is working on your system, review the log for the daemon and look for successful connections to the X server. If you see none, the keyboard daemon may be unable to connect to your machine's X server. If this happens, please send mail to condor-admin@cs.wisc.edu and let us know about your situation.

3.7.4 Installing a Checkpoint Server

This section has not yet been written.

3.7.5 Installing PVM Support in Condor

To install support for PVM in Condor, download the file archive from http://www.cs.wisc.edu/condor/condor-pvm and follow the directions found in the INSTALL file contained in the archive.


Derek Wright
5/22/1998