As part of the support strategy for MkLinux (described last month), Apple has set up an FTP server (ftp://mklinux.apple.com) and a WWW server (http://www.mklinux.apple.com). In practice, a single machine provides both services.

It soon became apparent that FTP traffic was degrading response times for both WWW and Telnet (administrative) users. It also became apparent that much of the impact could be alleviated by slowing down a few long-running FTP sessions.

In order to understand this, some background may be useful. The MkLinux DR1 distribution is quite large. A minimal system is about 105 MB; with full sources, it comes to 226 MB. A 28.8 Kbps modem, with no contention, can download about 11 MB/hour. 105 MB thus takes at least nine hours to transfer; 226 MB takes at least twenty hours.
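The transfer times follow from simple division by the 11 MB/hour figure. As a sanity check, an uncontended 28.8 Kbps line can carry at most about 13 MB of raw data per hour. The working figure of 11 MB/hour presumably allows for framing and protocol overhead; that is our assumption, not something stated above.

```latex
\[
\frac{28\,800\ \text{bit/s}}{8\ \text{bit/byte}} \times 3600\ \text{s/h}
  = 12\,960\,000\ \text{bytes/h} \approx 13\ \text{MB/h}
\]
\[
t_{\text{minimal}} = \frac{105\ \text{MB}}{11\ \text{MB/h}} \approx 9.5\ \text{h},
\qquad
t_{\text{full}} = \frac{226\ \text{MB}}{11\ \text{MB/h}} \approx 20.5\ \text{h}
\]
```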

Long downloads like these may seem a bit silly, but we did not wish to prohibit them entirely. On the other hand, it did not seem reasonable to let them swamp the server's resources. "Update" customers (who only need a megabyte or so) should be able to download their files on a priority basis. WWW and Telnet customers should receive even better treatment, lest they go away in disgust.

A Nice Hack

nice(1) runs a specified command at a given "nice value", ranging from -20 to +19, with a default value of zero. renice(8) adjusts the nice level of running processes. Paraphrasing the SunOS 4.1.3 manual page:

The nice value is one of the factors used by the kernel to determine a process's scheduling priority. The higher the value, the lower the command's scheduling priority; the lower the value, the higher the command's scheduling priority. In addition to the nice value, the kernel also considers recent CPU usage by the process, the time the process has been waiting to run, and other factors to arrive at scheduling priority.
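To see a nice value in action, here is a minimal sketch. It assumes a POSIX-style nice and ps (the "-n" option and the "nice" output field are the POSIX spellings), and uses "sleep 30" purely as a stand-in for a long-running job.

```shell
#!/bin/sh
# Start a stand-in long-running job with its nice value raised by 10.
# ("nice -n N" is the POSIX spelling of the option.)
nice -n 10 sleep 30 &
job=$!
sleep 1    # let nice apply the new value before we look

# Read back the NI column for this shell and for the job;
# the trailing "=" suppresses ps's header line.
shell_ni=$(ps -o nice= -p "$$" | tr -d ' ')
job_ni=$(ps -o nice= -p "$job" | tr -d ' ')
echo "shell nice: $shell_ni, job nice: $job_ni"   # job = shell + 10, capped at 19

kill "$job"    # clean up the stand-in job
```

Note that nice adds to the caller's current value rather than setting an absolute level, which is why the job ends up ten levels above the shell that started it.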

Although nice levels are easy to misuse, they are a very handy way to fine-tune the activities of the UNIX scheduler. To avoid administrative problems, use of the renice command is rather carefully limited:

Users other than the super-user may only alter the priority of processes they own, and can only monotonically increase their "nice value" within the range 0 to 20. (This prevents overriding administrative fiats.) The super-user may alter the priority of any process and set the priority to any value in the range -20 to 19. Useful nice values are 19 (the affected processes will run only when nothing else in the system wants to), 0 (the default nice value), and any negative value (to make things go faster).
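The restriction is easy to observe. In this minimal sketch we assume the "renice VALUE -p PID" form, which sets an absolute nice value on BSD-derived systems and on Linux (util-linux); we avoid "-n", whose meaning (absolute vs. increment) varies between implementations. As before, "sleep 30" stands in for a real process.

```shell
#!/bin/sh
# Demonstrate renice on a running process that we own.
sleep 30 &
job=$!

# Raising the nice value (from the default 0) is allowed for the owner.
renice 5 -p "$job"
ni_after_raise=$(ps -o nice= -p "$job" | tr -d ' ')

# Lowering it again normally requires the super-user (on Linux, a
# sufficient RLIMIT_NICE also permits it), so this may fail.
if renice 0 -p "$job" >/dev/null 2>&1; then
    echo "lowered back to 0 (running with privileges)"
else
    echo "could not lower the nice value (expected for ordinary users)"
fi

kill "$job"    # clean up the stand-in process
echo "nice value after raise: $ni_after_raise"
```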

In our case, we wished to make all FTP sessions "nicer" than any other system activities. We then wanted to make long-running FTP sessions nicer yet. We accomplished this with a simple shell script (tune_ftp).

Every five minutes, tune_ftp wakes up and runs ps(1), obtaining the nice levels of all FTP daemons. It then uses renice to adjust their nice levels.

The actual tune_ftp code (available on request) isn't all that interesting. In fact, due to AIX peculiarities, it's a bit hacky. Here, however, is some boiled-down pseudo-code:
      while :; do	# loop forever
        ps ...     |	# get pids, nice levels, etc.
        awk ...    |	# create renice commands
        sh		# execute renice commands
        sleep 300	# go away for five minutes
      done
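For readers who want something runnable, here is a minimal sketch of the same ps | awk | sh loop. It is not the author's tune_ftp: the "ftpd" command name, the POSIX ps options, the absolute "renice VALUE -p PID" form, and the bump-to-5-then-add-one policy are all assumptions, chosen to mimic the behavior described in this article.

```shell
#!/bin/sh
# tune_ftp.sh -- a minimal sketch of the ps | awk | sh loop.
# Assumptions (not from the original script):
#   * each FTP session runs as a process whose command name is "ftpd";
#   * ps accepts the POSIX -e and -o options;
#   * "renice VALUE -p PID" sets an absolute nice value (BSD, util-linux).
# Hypothetical policy: sessions below 5 jump to 5, then gain one level
# per pass until they reach 19.

DAEMON=${DAEMON:-ftpd}      # command name to match (adjust locally)
INTERVAL=${INTERVAL:-300}   # seconds between passes: five minutes

# Read "pid nice command" lines on stdin; print a renice command for
# each FTP daemon whose nice level should change.
make_renice_cmds() {
    awk -v daemon="$DAEMON" '
        $3 == daemon {
            pid = $1 + 0; ni = $2 + 0
            if (ni < 5)       new = 5
            else if (ni < 19) new = ni + 1
            else              next        # already at 19: leave it alone
            printf "renice %d -p %d\n", new, pid
        }'
}

# One pass: list processes, generate renice commands, run them.
tune_once() {
    ps -e -o pid= -o nice= -o comm= | make_renice_cmds | sh
}

# Loop forever only when invoked as "tune_ftp.sh loop" (normally as root),
# so the functions above can also be sourced and tested on their own.
if [ "${1:-}" = loop ]; then
    while :; do
        tune_once
        sleep "$INTERVAL"
    done
fi
```

Started as root with "sh tune_ftp.sh loop &", this sketch moves a session first seen at 0 to 5, then adds one level per pass, so it reaches 19 about 14 passes (roughly 70 minutes) later. Because awk prints the PIDs with %d, only integers ever reach the sh at the end of the pipeline.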
    
In practice, tune_ftp gives new FTP sessions a few minutes of "free" time, then bumps them to level five. Over time, their nice levels drift up, but short-lived sessions never get very high. Longer-lived sessions drift up to 19 (in an hour or so), where they stay. Here is a graph of typical nice levels for FTP sessions on mklinux:
       Nice level:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
       ========================================================================
       Sessions:     3              2  1  2  4  3  2              1           8
    

We see a few new sessions at level zero, followed by a clump between levels five and ten. We then see an outlier at fifteen and a large clump at 19. This appears to correspond nicely with our expectations.
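The article does not say how the graph was produced. One plausible way to gather such counts, assuming the same hypothetical "ftpd" command name and POSIX ps options, is to tally ps output with awk:

```shell
#!/bin/sh
# Tally FTP daemons by nice level, printing only the levels that occur.
# Assumption: the daemon's command name is "ftpd" (hypothetical; check
# ps on your own system). Input lines look like "<nice> <command>".
nice_histogram() {
    awk -v daemon="${DAEMON:-ftpd}" '
        $2 == daemon { count[$1 + 0]++ }
        END {
            for (lvl = 0; lvl <= 19; lvl++)
                if (lvl in count)
                    printf "%2d %s\n", lvl, count[lvl]
        }'
}

ps -e -o nice= -o comm= | nice_histogram
```

Each output line is a nice level followed by the number of FTP sessions at that level; sampling it every few minutes would show the drift described above.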

So What?

The actual tune_ftp code may not be all that interesting, but some of the underlying technology is. First, note that we are dynamically tuning the behavior of the UNIX operating system. UNIX is designed as a general-purpose timesharing system, but it can be tweaked.

Second, note that our "dynamic scheduling priority modification daemon" is, in fact, a trivial shell script. Administrators should not feel that they need to hack kernels in order to tune system performance. In many cases, a far simpler solution exists. Even when the kernel needs adjustment, the required change is usually limited to a single line in a configuration file.

Finally, shell scripters should note the general structure of the script. A system command (with suitable arguments) is used to retrieve needed information. awk is used to process the tabular output data, generating a sequence of commands. The commands are fed into an instance of the shell for execution. These are all useful idioms to remember.