From: Alexander Motin
Message-ID: <4C7A5C28.1090904@FreeBSD.org>
Date: Sun, 29 Aug 2010 16:10:00 +0300
To: FreeBSD-Current, freebsd-hackers@freebsd.org
Subject: One-shot-oriented event timers management

Hi.

I would like to present my new work on the timer management code. In my previous work I mostly focused on reimplementing existing functionality in a better way. The result seemed not bad, but after looking at the prospects of using event timers in one-shot (aperiodic) mode, I realized that the complexity of the implemented code made that hardly possible. So I had to cut it down significantly and rewrite it with a new approach, which is instead primarily oriented toward using timers in one-shot mode. Since some systems have only periodic timers, I have kept that functionality, though in a slightly limited form.

The new management code implements two modes of operation: one-shot and periodic. The specific mode used depends on hardware capabilities and can be controlled.

In one-shot mode, hardware timers are programmed to generate a single interrupt precisely at the time of the next wanted event. This is done by comparing the current binuptime with the next scheduled times of system events (hard-/stat-/profclock). This approach has several benefits: event timer precision is now irrelevant for system timekeeping; hardclock and statclock are not aliased, even though only one timer is used for both; and, most important, it allows us to define exactly which events we really want to handle and when, without strict dependence on the fixed hz, stathz and profhz periods. Sure, our callout system still depends heavily on the hz value, but now at least we can skip interrupts when we have no callouts to handle at that time. Later we can go further.

Periodic mode now also uses similar principles for scheduling events.
However, a timer running in periodic mode is simply unable to handle arbitrary events, and since event timers may not be synchronized to the system timecounter, they may drift from it, causing jitter effects. So I have used the timer events themselves as the time source for scheduling. As a result, the periodic timer runs at a fixed frequency that is a multiple of the hz rate, while statclock and profclock are generated by dividing it accordingly. (If somebody told me that hardclock jitter is not really a big problem, I would happily rip that artificial timekeeping out to simplify the code.) Unfortunately, this approach makes it impossible to use two event timers to completely separate hardclock and statclock any more, but as I have said, this mode is required only for the limited set of systems without one-shot capable timers. Judging from my recent experience with different platforms, that is not a big fraction.

The management code still handles both per-CPU and global timers. Per-CPU timer usage is obvious. A global timer is programmed to serve the needs of all CPUs. In periodic mode, the global timer generates periodic interrupts to one CPU, and the management code then redistributes them via IPI to the CPUs that really need them. In one-shot mode, the timer is always programmed to handle the first scheduled event throughout the system. When that interrupt arrives, it is likewise redistributed via IPI to the CPUs that want it.

To demonstrate the features that such flexibility makes possible, I have incorporated the idea and some parts of the dynamic ticks patches by Tsuyoshi Ozawa. Now, when a CPU goes down into the C2/C3 ACPI sleep state, that CPU stops scheduling hard-/stat-/profclock events until the next registered callout event. If the CPU is woken before that time by some unrelated interrupt, the missed ticks are called artificially (this is needed for now to keep system stats realistic). Once the system is up to date, the interrupt is handled.
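The catch-up of missed ticks after an early wakeup can be sketched roughly like this. Again, this is only an illustration under my own assumptions, not the code from the patch: struct cpu_ticks and catch_up_ticks() are hypothetical names, times are nanoseconds of uptime, and a counter increment stands in for the real hardclock call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state; times are nanoseconds of uptime. */
struct cpu_ticks {
	uint64_t next_hard;	/* time at which the next tick is due */
	uint64_t period;	/* tick period, e.g. 1000000 for hz = 1000 */
	unsigned long handled;	/* ticks accounted so far */
};

/*
 * Replay every tick that became due while the CPU was asleep and
 * return how many were replayed.  Afterwards next_hard lies in the
 * future again, so the pending interrupt sees up-to-date state.
 */
unsigned
catch_up_ticks(struct cpu_ticks *c, uint64_t now)
{
	unsigned missed = 0;

	assert(c->period > 0);
	while (c->next_hard <= now) {
		c->handled++;		/* stand-in for the hardclock call */
		c->next_hard += c->period;
		missed++;
	}
	return (missed);
}
```

For example, with a 1 ms period and the next tick due at 1 ms, waking up at 3.5 ms replays three ticks (those due at 1, 2 and 3 ms) and leaves the next tick scheduled at 4 ms.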
For now, this tick skipping is implemented only for ACPI systems with C2/C3 state support, because ACPI resumes the CPU with interrupts disabled, which makes it possible to catch up on missed time before an interrupt handler or some other process (in case of an unexpected task switch) may need it. As far as I can see, Linux does similar things at the beginning of every interrupt handler.

I have actively tested this code for a few days on my amd64 Core2Duo laptop and i386 Core-i5 desktop system. With C2/C3 states enabled, the systems experience only about 100-150 interrupts per second with HZ set to 1000. These events are mostly caused by several event-greedy processes in our tree. I have traced and hacked several of the most aggressive ones in this patch: http://people.freebsd.org/~mav/tm6292_idle.patch . That allowed me to get down to as low as 50 interrupts per system, including IPIs! Here is the output of `systat -vm 1` from my test system: http://people.freebsd.org/~mav/systat_w_oneshot.txt . Obviously, with additional tuning the results can be improved even more.

My latest patch against 9-CURRENT can be found here: http://people.freebsd.org/~mav/timers_oneshot4.patch

Comments, ideas and proposals are welcome!

Thanks to all who read this. ;)

--
Alexander Motin