Date:      Tue, 26 Jan 1999 12:53:31 -0600 (CST)
From:      Kevin Day <toasty@home.dragondata.com>
To:        hackers@FreeBSD.ORG
Subject:   High Load cron patches - comments?
Message-ID:  <199901261853.MAA15095@home.dragondata.com>




I have a somewhat unusual setup: a server used by several hundred customers,
with all of /home on an NFS mount, and each customer has quite a few cron
jobs that they like to run every ten minutes or so.

The problem is that they all want to run their cron jobs on a */10 minute
schedule, so on every minute ending in '0', cron suddenly spawns a few
hundred processes, pushing the machine's load average above 15.0 and
saturating my NFS link for quite a while.

No amount of pleading with my users did much good, since they were just
following a template given to them by the software they were using.

I talked briefly about this with Paul Vixie (cron's author); while we had
differing ideas about how to accomplish it, my patches have been running
for over a month now on a production system and have worked very well.

These patches limit the number of jobs cron will start per second, with an
initial burst and a hard limit, as well as a 'burst mode' for when the
number of jobs on the 'to do list' gets excessively high.


Giving no options to cron makes it behave exactly as it did without the
patches.

The format for enabling this load balancing is as follows:

cron [-x debugflag[,...]] [-a addweight [-c tickdecay] [-t threshold]]

The -a parameter controls how many 'points' are added on every job execution.
The -c parameter controls how many points are subtracted every second.
The -t parameter controls how many points are needed before jobs are queued
instead of run.

The flow is as follows:

        If (NumPoints < Threshold) {
                Execute job
                NumPoints += AddWeight;
        }

        Every second {
                NumPoints -= TickDecay;
                if (really behind in running jobs) {
                        Turn on burst mode
                }
        }
        
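The flow above is essentially a token-bucket throttle. Here is a minimal,
self-contained sketch of it in C; the struct and function names
(throttle_try_run, throttle_tick) are my own for illustration, not the
patch's, and for simplicity it queues at exactly the threshold rather
than one job past it:

```c
#include <assert.h>

/* Point-based throttle: fields mirror the -a/-c/-t options. */
struct throttle {
	int points;      /* current accumulated 'load' in points */
	int add_weight;  /* -a: points added per job started     */
	int tick_decay;  /* -c: points drained per second        */
	int threshold;   /* -t: queue jobs at or above this      */
};

/* Try to start one job; returns 1 if it may run now, 0 if it must queue. */
static int throttle_try_run(struct throttle *t)
{
	if (t->points >= t->threshold)
		return 0;               /* over budget: leave it queued */
	t->points += t->add_weight;
	return 1;
}

/* Called once per second: drain points, never going below zero. */
static void throttle_tick(struct throttle *t)
{
	t->points -= t->tick_decay;
	if (t->points < 0)
		t->points = 0;
}
```

With -a 10 -c 100 -t 200, this lets 20 jobs start immediately (20 * 10
points reaches the threshold), then 10 more per second as each tick drains
100 points.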

Burst mode keeps tuning itself higher the further behind jobs get, until
cron has caught up. This prevents a user who puts 50,000 jobs in their
crontab from making cron eat RAM like mad.


For me, cron -a 10 -c 100 -t 200 works very well. (Allow 10 jobs per
second, but allow 20 at the start to hurry things up.)
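Those two figures fall straight out of the option values; these hypothetical
helpers (not part of the patch) just make the arithmetic explicit:

```c
#include <assert.h>

/* Steady-state rate: points drained per second over points per job. */
static int steady_jobs_per_sec(int add_weight, int tick_decay)
{
	return tick_decay / add_weight;
}

/* Initial burst: how many jobs fit under the threshold from a cold start. */
static int initial_burst(int add_weight, int threshold)
{
	return threshold / add_weight;
}
```

So -a 10 -c 100 gives 100/10 = 10 jobs per second sustained, and -t 200
allows 200/10 = 20 jobs before throttling first kicks in.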


Paul's idea was to limit the number of children cron has running at a time;
however, for me this wasn't effective, as my users' jobs tend to hang
around for a long time.

Can I get comments/suggestions about this?


Kevin


        

[Attachment: hlcron.patch]

--- ../oldcron/cron.c	Sat Jul 18 06:09:09 1998
+++ cron.c	Sat Jan  2 20:04:02 1999
@@ -47,16 +47,19 @@
 static void
 usage() {
     char **dflags;
 
-	fprintf(stderr, "usage: cron [-x debugflag[,...]]\n");
+	fprintf(stderr, "usage: cron [-x debugflag[,...]] [-a addweight [-c tickdecay] [-t threshold]]\n");
 	fprintf(stderr, "\ndebugflags: ");
 
         for(dflags = DebugFlagNames; *dflags; dflags++) {
 		fprintf(stderr, "%s ", *dflags);
 	}
-        fprintf(stderr, "\n");
+        fprintf(stderr, "\n\n");
+	fprintf(stderr, "-a addweight         Number of 'points' added each time a job is run\n");
+	fprintf(stderr, "-c tickdecay         Number of 'points' subtracted every second\n");
+	fprintf(stderr, "-t threshold         Number of 'points' at which jobs are queued instead of run\n");
+	fprintf(stderr, "\n");

 	exit(ERROR_EXIT);
 }
 
 
@@ -126,19 +129,29 @@
 	while (TRUE) {
 # if DEBUGGING
 	    /* if (!(DebugFlags & DTEST)) */
 # endif /*DEBUGGING*/
-			cron_sleep();
-
-		load_database(&database);
+		cron_sleep();
 
-		/* do this iteration
+		/* Prevent misconfigured options from making cron take
+		 * over system RAM.  This may not be desirable for
+		 * production systems where cron jobs must run.
 		 */
-		cron_tick(&database);
+		if (BurstRate < 7) {
+		
+			load_database(&database);
+
+			/* do this iteration
+			 */
+			cron_tick(&database);
+		}
 
 		/* sleep 1 minute
 		 */
 		TargetTime += 60;
 	}
 }
 
 
@@ -226,31 +239,64 @@
 
 
 static void
 cron_sleep() {
-	register int	seconds_to_wait;
+	register int	seconds_to_wait, seconds_to_delay = 0;
 
-	do {
-		seconds_to_wait = (int) (TargetTime - time((time_t*)0));
-		Debug(DSCH, ("[%d] TargetTime=%ld, sec-to-wait=%d\n",
-			getpid(), (long)TargetTime, seconds_to_wait))
-
-		/* if we intend to sleep, this means that it's finally
-		 * time to empty the job queue (execute it).
-		 *
-		 * if we run any jobs, we'll probably screw up our timing,
-		 * so go recompute.
-		 *
-		 * note that we depend here on the left-to-right nature
-		 * of &&, and the short-circuiting.
-		 */
-	} while (seconds_to_wait > 0 && job_runqueue());
-
-	while (seconds_to_wait > 0) {
-		Debug(DSCH, ("[%d] sleeping for %d seconds\n",
-			getpid(), seconds_to_wait))
-		seconds_to_wait = (int) sleep((unsigned int) seconds_to_wait);
-	}
+		do {
+			seconds_to_wait = (int) (TargetTime - time((time_t*)0));
+			if (LoadAverage > LoadThreshold) {
+				/* if we denied jobs to run last time around,
+				 * see if we should sleep for a short period
+				 * before exiting
+				 */
+				if (seconds_to_wait > 0) { 
+					/* decide if we should take a short nap
+					 * or just go ahead with the normal
+					 * return
+					 */
+					seconds_to_delay = MIN(seconds_to_wait, 
+						((LoadAverage - LoadThreshold) /
+						(TickDecay << BurstRate)) + 1);
+					if (seconds_to_delay == 0)
+						seconds_to_delay = 1;
+					Debug(DSCH, ("[%d] short sleeping for %d seconds\n",
+						getpid(), seconds_to_delay))
+					sleep(seconds_to_delay);
+				}
+			}
+			if (LoadAverage > ((TickDecay << BurstRate) * seconds_to_delay))
+				LoadAverage -= (TickDecay << BurstRate) * seconds_to_delay;
+			else
+				LoadAverage = 0;
+			/* if we're bursting jobs, and still not catching up
+			 * increase the burst speed
+			 */
+			if (NumJobs > (MAXJOBLENHIGH << BurstRate))
+				BurstRate++;
+			/* Put the burst rate back down if we're caught up */
+			else if (BurstRate && (NumJobs < (MAXJOBLENLOW << (BurstRate - 1))))
+				BurstRate--;
+			Debug(DSCH, ("[%d] TargetTime=%ld, sec-to-wait=%d, load=%d, jobs=%d, burst=%d\n",
+				getpid(), (long)TargetTime, seconds_to_wait, LoadAverage, NumJobs,
+				BurstRate))
+			/* if we intend to sleep, this means that it's finally
+			 * time to empty the job queue (execute it).
+			 *
+			 * if we run any jobs, we'll probably screw up our timing,
+			 * so go recompute.
+			 *
+			 * note that we depend here on the left-to-right nature
+			 * of &&, and the short-circuiting.
+			 */
+		} while	((seconds_to_wait > 0) && job_runqueue());
+
+		while (seconds_to_wait > 0) {
+			Debug(DSCH, ("[%d] sleeping for %d seconds\n",
+				getpid(), seconds_to_wait))
+			seconds_to_wait = (int) sleep((unsigned int) seconds_to_wait);
+		}
+		
 }
 
 
 #ifdef USE_SIGCHLD
@@ -296,13 +342,34 @@
 	char	*argv[];
 {
 	int	argch;
 
-	while ((argch = getopt(argc, argv, "x:")) != -1) {
+	while ((argch = getopt(argc, argv, "x:a:t:c:")) != -1) {
 		switch (argch) {
 		case 'x':
 			if (!set_debug_flags(optarg))
 				usage();
+			break;
+		case 'a':
+			AddWeight = atoi(optarg);
+			if (AddWeight > 100) {		/* arbitrary value */
+				fprintf(stderr, "-a parameter %i too high. Max: 100\n\n", AddWeight);
+				usage();
+			}
+			break;
+		case 't':
+			LoadThreshold = atoi(optarg);
+			if (LoadThreshold < 1) {
+				fprintf(stderr, "-t parameter %i too low. Min: 1\n\n", LoadThreshold);
+				usage();
+			}
+			break;
+		case 'c':
+			TickDecay = atoi(optarg);
+			if (TickDecay < 1) {
+				fprintf(stderr, "-c parameter %i too low. Min: 1\n\n", TickDecay);
+				usage();
+			}
 			break;
 		default:
 			usage();
 		}
--- ../oldcron/cron.h	Mon Mar  9 05:41:41 1998
+++ cron.h	Sat Jan  2 16:50:13 1999
@@ -72,8 +72,16 @@
 #define	MAX_COMMAND	1000	/* max length of internally generated cmd */
 #define	MAX_ENVSTR	1000	/* max length of envvar=value\0 strings */
 #define	MAX_TEMPSTR	100	/* obvious */
 #define	MAX_UNAME	20	/* max length of username, should be overkill */
+#define	MAXJOBLENHIGH	512	/* How many jobs in the run queue before
+				 * increasing the run speed; each further
+				 * doubling above this increases the speed
+				 * again
+				 */
+#define MAXJOBLENLOW	128	/* How many jobs in the run queue before
+				 * returning to normal
+				 */
 #define	ROOT_UID	0	/* don't change this, it really must be root */
 #define	ROOT_USER	"root"	/* ditto */
 
 				/* NOTE: these correspond to DebugFlagNames,
@@ -266,8 +274,15 @@
 
 char	*ProgramName;
 int	LineNumber;
 time_t	TargetTime;
+int	LoadAverage = 0;
+int	AddWeight = 0;		/* default load balancing off */
+int	TickDecay = 10;         /* sane value if not given */
+int	LoadThreshold = 100;	/* sane value if not given */
+int	BurstRate = 0;
+int	NumJobs = 0;
 
 # if DEBUGGING
 int	DebugFlags;
 char	*DebugFlagNames[] = {	/* sync with #defines */
@@ -281,8 +296,14 @@
 		*DowNames[],
 		*ProgramName;
 extern	int	LineNumber;
 extern	time_t	TargetTime;
+extern	int	LoadAverage;
+extern	int	AddWeight;
+extern	int	TickDecay;
+extern	int	BurstRate;
+extern	int	LoadThreshold;
+extern	int	NumJobs;
 # if DEBUGGING
 extern	int	DebugFlags;
 extern	char	*DebugFlagNames[];
 # endif /* DEBUGGING */
--- ../oldcron/job.c	Mon Mar  9 05:41:47 1998
+++ job.c	Sat Jan  2 19:51:33 1999
@@ -55,22 +55,29 @@
 	/* add it to the tail */
 	if (!jhead) { jhead=j; }
 	else { jtail->next=j; }
 	jtail = j;
+	NumJobs++;
 }
 
 int
 job_runqueue()
 {
 	register job	*j, *jn;
 	register int	run = 0;
 
 	for (j=jhead; j; j=jn) {
+		if (LoadAverage > LoadThreshold) {
+			/* We've executed too much, clean up and stop. */
+			jhead = j;
+			return 1;
+		}
 		do_command(j->e, j->u);
 		jn = j->next;
 		free(j);
 		run++;
+		NumJobs--;
+		LoadAverage += AddWeight;
 	}
 	jhead = jtail = NULL;
 	return run;
 }


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


