Date: Wed, 26 Jan 2005 11:11:09 -0800 (PST) From: Doug Ambrisko <ambrisko@ambrisko.com> To: Chris Landauer <cal@rushg.aero.org> Cc: freebsd-hackers@freebsd.org Subject: Re: bug in calcru() in kernel: integer overflow computing user time Message-ID: <200501261911.j0QJB9jo088720@ambrisko.com> In-Reply-To: <200501251558.j0PFwFjn029638@calamari.aero.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Chris Landauer writes: | | hihi, all - | | well, i have an "almost" fix for the problem - read on, ... | (this is for discussin before send-pr submission) | | description | | in FreeBSD 5.3-RELEASE (and 5.2.1R, and 5.1R), | in file /usr/src/sys/kern/kern_resource.c, | lines 657-750 define a function calcru(), | and there is a bug in that function - | | line 713 in that file is | | uu = (tu * ut) / tt, | | which is trying to compute a proportion (since ut<=tt) - the problem | is that for my application, the values of tt and ut are very large and | the value of the intermediate product overflows, leading to incorrect | values (i reported the symptoms to freebsd-hackers and a very helpful | description and localization of the problem to calcru() was provided | by peter jeremy, who also wrote that it is a very old, but only partly | known, problem) | | repetition | | use time in csh or /usr/bin/time or getrusage() to time any program | that takes more than a week to run | | fix (almost) | | it turns out that this problem can be (partly) fixed by replacing that | one line above with the following lines: | | /* we know 0 <= ut <= tt and 1 <= tt */ | if (tu >= tt) | { | **whatever type they need to be** q, r; | q = tu / tt; | r = tu % tt; | uu = (q * ut) + (r * ut) / tt; | } | else uu = (tu * ut) / tt | | this change does not solve all the arithmetic problems (especially | when ut is very large), and a similar change is needed for system | time, computing su in line 714 of the file, but it suffices for me - | | analysis (yup, proof that it should work 8-)) - | | i expect that all of these counters are increasing throughout the life | of the process - | tu is total time in microseconds | ut is user 128hz ticks | tt is total 128hz ticks | i assume therefore that | tu ~ tt * 10^6/128 | strictly because of the clock rates | (machines with other kinds of clock rate ratios will need a different | analysis, but the same idea should work there, too) - | | in my case ut ~ tt, so we see overflow in the old computation when | tu * ut >= 2^64 | tt^2 * 10^6/128 >= 2^64 | tt * 10^3/8*sqrt(2) >= 2^32 ~ 4 * 10^9 | tt >= 32*sqrt(2)*10^6, | which is about | sqrt(2)*10^6 / 4 ~ 3.54*10^5 seconds, | or | ~ 98 hours | (this is the phenomenon i observed) | | in the new computation offered above, since we know that | ut <= tt, | we also know that | uu <= tu, | and | (q * ut) <= uu, | so there can be no overflow in the (q * ut) term - | therefore, this changed code will overflow only when r is large | r ~ tt, | and then only when | r * ut >= 2^64 | tt^2 >= 2^64 | tt >= 2^32, | which is about | ~ 2^25 seconds | ~ 9300 hours | ~ 388 days | (i can live with that for now) | | for everyone else, it adds the one test in the if statement to every | call to calcru() (or two, if system time is similarly fixed) - | <philosophy> instrumentation is costly, and correct instrumentation is | even more costly, but it's worth it, every time, to get the right | answer </philosophy> | | i'm about to try it out this week - i will report the results when i | get some data (which will be a few weeks) | | i'm thinking about how to solve the problem correctly for really | long-running programs, but it's all pretty special case ugly so far This is my fix: Index: kern_resource.c =================================================================== RCS file: /usr/local/cvsroot/freebsd/src/sys/kern/kern_resource.c,v retrieving revision 1.55.2.5 diff -u -p -r1.55.2.5 kern_resource.c --- kern_resource.c 3 Nov 2001 01:41:08 -0000 1.55.2.5 +++ kern_resource.c 26 Jan 2005 19:03:27 -0000 @@ -552,10 +552,10 @@ calcru(p, up, sp, ip) tu = ptu; } - /* Subdivide tu. */ - uu = (tu * ut) / tt; - su = (tu * st) / tt; - iu = tu - uu - su; + /* Subdivide tu. try to becareful of overflow */ + su = tu * (st * 1024 / tt) / 1024; + iu = tu * (it * 1024 / tt) / 1024; + uu = tu - (su + iu); /* Enforce monotonicity. */ if (uu < p->p_uu || su < p->p_su || iu < p->p_iu) { If people like it I can commit it. It works for us. We had problems with this as well. It's pretty simple fix. I used 1k since usage of this tends to be % so rounding should effect that much. Doug A.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200501261911.j0QJB9jo088720>