From: Scott Long
Date: Thu, 13 Feb 2003 16:36:20 -0800 (PST)
To: current@freebsd.org
Subject: 5-STABLE Roadmap

All,

Thanks to the hard work of everyone, FreeBSD 5.0 became a reality and is working better than most even hoped. However, there is still a lot of work to be done before we can create the RELENG_5/5-STABLE branch and declare success. Below is a document that I have drafted with the input and review of the Release Engineering Team, the Technical Review Board, and the Core Team that defines what needs to be done in order to reach 5-STABLE. I'm happy to take further input into this, and I will also mark it up and make it available online.

The Roadmap for 5-STABLE

1. Introduction and background

After nearly three years of work, FreeBSD 5.0 was released in January of 2003.
Features like the GEOM block layer, Mandatory Access Controls, ACPI, sparc64 and ia64 platform support, UFS snapshots, background filesystem checks, and 64-bit inode sizes make it an exciting operating system for both desktop and production users. However, some important features are not complete. The foundations for fine-grained locking and preemption in the kernel exist, but much more work is left to be done. Work on Kernel Schedulable Entities, also known as Scheduler Activations, has been ongoing but needs a push to realize its benefit. Performance compared to FreeBSD 4.x has declined and must be restored and surpassed.

This is somewhat similar to the situation that FreeBSD faced in the 3.x series. Work on 3-CURRENT trudged along seemingly forever, and finally a cry was made to 'just ship it' and clean up later. This decision resulted in the 3.0 and 3.1 releases being very unsatisfying for most, and it wasn't until 3.2 that the series was considered 'stable'. To make matters worse, the RELENG_3 branch was created along with the 3.0 release, and the HEAD branch was allowed to advance immediately towards 4-CURRENT. This resulted in a quick divergence between HEAD and RELENG_3, making maintenance of the RELENG_3 branch very difficult. FreeBSD 2.2.8 was left for quite a while as the last production-quality version of FreeBSD.

Our intent is to avoid repeating that scenario with FreeBSD 5.x. Delaying the RELENG_5 branch until it is stable and production quality will ensure that it stays maintainable and provides a compelling reason to upgrade from 4.x. To do this, we must identify the current areas of weakness and set clear goals for resolving them. This document contains what we as the release engineering team feel are the milestones and issues that must be resolved for the RELENG_5 branch. It does not dictate every aspect of FreeBSD development, and we welcome further input.
Nothing that follows is meant to be a slight against any person or group, or to trivialize any work that has been done. There are some significant issues, though, that need decisive and unbiased action.

2. Major issues

The state of SMPng and kernel lockdown is the biggest concern for 5.x. To date, few major systems have come out from under the kernel-wide mutex known as 'Giant'. The SMP status page at http://www.FreeBSD.org/smp provides a comprehensive breakdown of the overall SMPng status. Status specific to SMPng progress in device drivers can be found at http://www.FreeBSD.org/projects/busdma. In summary:

- VM - the kmem_malloc(M_NOWAIT) path no longer needs Giant held. The kmem_malloc(M_WAITOK) path is in progress and is expected to be finished in the coming weeks. Other facets of the VM system, like the vfs interface, buffer/cache, etc, are largely untouched.
- GEOM - The GEOM block layer was designed to run free of Giant, but at this time no block drivers can run without Giant. Additionally, it has the potential to suffer performance loss due to its upcall/downcall data paths happening in kernel threads. Lightweight context switches might help this.
- Network - Work is in progress to lock the TCP and UDP portions of the stack. This also includes locking the routing tree, ARP code, and ifaddr and inet data structures. RawIP, IPv6, Appletalk, etc, have not been touched. Locking the socket layer is in progress but is largely untested. None of the hardware drivers have been locked.
- VFS - Initial pre-cleanup started.
- buffer/cache - Initial work complete.
- Proc - Work on locking the proc structure was ongoing for a while but seems to have stalled.
- CAM - No significant work has occurred on the CAM SCSI layer.
- Newbus - some work has started on locking down the device_t structure.
- Pipes - complete with the exception of VM-related optimizations.
- File descriptors - complete.
- Process accounting - jails, credentials, MAC labels, and scheduler are out from under Giant.
- MAC Framework - complete
- Timekeeping - complete
- kernel encryption - crypto drivers and core crypto framework are Giant-free. KAME IPsec and FAST IPSec have not been locked.
- Sound subsystem - complete
- kernel preemption - preemption for interrupt threads is enabled. However, contention due to Giant covering much of the kernel and most of the device driver interrupt routines causes excessive context switches and might actually be hurting performance. Work is underway to explore ways to make preemption conditional.

Another issue with SMPng is interrupt latency. The overhead of doing a complete context switch to a kernel interrupt thread is high and shows noticeable latency. Work is ongoing to implement lazy context switching on all platforms. Fine-grained locking of drivers will also help this, as will converting drivers to be as efficient as possible in their interrupt routines.

Next, the state of KSE must be resolved for RELENG_5. Work on it has slowed noticeably in the past 6 months but appears to be picking up again. There are a number of issues that must be addressed:

- Signal delivery to threads is not defined. Signals are delivered to the process, but which thread actually receives a given signal is random.
- There is confusion over whether upcalls are generated on every system call or when a thread blocks. The former is highly undesirable and needs to be investigated.
- The userland threading library, currently called libkse, is incomplete and has not been used for any significant threaded application.
- KSE has the potential to uncover latent race conditions and create new ones. An audit needs to be performed to ensure that no obvious problems exist.

According to the release schedule below, KSE kernel and userland components must be functionally complete by June 2003 in order to be included in the RELENG_5 branch.
For security and stability reasons, if KSE cannot be finished in time then, by default, all KSE-specific syscalls should be modified to return ENOSYS and all other KSE-specific interfaces disabled. Deprecating KSE from RELENG_5 but keeping it in the HEAD branch will pose problems in porting bugfixes and features between the two branches, so every effort should be made to finish it on time.

3. Goals for 5-STABLE

The goals for the RELENG_5 branch point are:

- All subsystems and interfaces must be mature enough to be maintainable for improvements and bug fixes.
- stability equal to or better than FreeBSD 4.8.
- no functional regressions from 4.8. It is important to make sure that users do not avoid upgrading to 5.x because of lost functionality.
- performance on par with FreeBSD 4.8 for most common operations. Both UP and SMP configurations should be evaluated. SMP has the potential to perform much better than 4.x, though for the purposes of creating the RELENG_5 branch, comparable performance between the two should be acceptable.

It is unrealistic to expect that the SMPng project will be fully complete by RELENG_5, or that performance will be significantly better than 4.x. However, focusing on a subset of the outstanding tasks will give enough benefit for the branch to be viable and maintainable. To break it down:

- ABI/API/Infrastructure stability - Enough infrastructure must be in place and stable to allow fixes from HEAD to easily and safely be merged into RELENG_5. Also, we must draw a line as to what subsystems are to be locked down when we go into 5-STABLE.
- SMPng
  - VM - Most codepaths, other than the ones that interact with VFS, should be Giant-free for RELENG_5.
  - Network - Taking the network stack out from under Giant poses the risk of uncovering latent bugs and races. Locking it down but not removing Giant imposes further performance penalties.
    A decision on whether to continue with locking the network layers, and whether they should be free from Giant for RELENG_5, should be made no later than March 15. If the decision is made to allow the locking to go forward, the IPv4, UDP, and TCP layers should be free of Giant. IPv6 and the socket layers would be nice to have also, though it should be investigated whether they can be safely locked down in 5.x after the RELENG_5 branch. If the decision is to keep the network stack under Giant for the branch, then an investigation should be made to determine if the present locking work can be reverted and deferred to 6-CURRENT.

    Having a Giant-free path from the TCP/IP layers to the hardware should be investigated as it could allow significant performance gains in the network benchmarks. If this can be achieved then the hardware interface layer needs to allow for drivers to incrementally become free of Giant. Locking down at least two Ethernet drivers would be highly desirable. If the semantics are too complex to have the stack free of Giant but not the hardware drivers, investigation should be done into making it configurable.

    Lesser-used network stacks like netatalk, netipx, etc, should not break while this work is going on. However, locking them is not a high priority.
  - GEOM - At least 2 block drivers should be locked in order to demonstrate that others can also be locked without changing the interface to GEOM. The ATA driver is a good candidate for this, though caution should be taken as it is also extremely high-profile and any problems with it will affect nearly all users of FreeBSD.
  - Lazy context switching - sparc64 is the only platform that performs lazy context switching when entering the kernel. The performance gains promised by this are significant enough to require that it be implemented for all other Tier 1 platforms.
- KSE - The kernel side of KSE must be functionally complete and have undergone a security audit.
  libkse must be complete enough to demonstrate a real-world application running correctly on it using the standard POSIX Threads API. Examples would be apache 2.0, squid, and/or mozilla. A functional regression test suite is also a requirement for RELENG_5 and should test signal delivery, scheduling, performance, and process security/credentials for both KSE and non-KSE processes. KSE kernel and userland components must also reach the same level of functionality for all Tier-1 platforms in both UP and SMP configurations. The definition of 'Tier-1 platforms' can be found at http://www.freebsd.org/doc/en_US.ISO8859-1/articles/committers-guide/archs.html.
- busdma interface and drivers - architectures like PAE/i386 and sparc64 which don't have a direct mapping between host memory address space and expansion bus address space require the elimination of vtophys() and friends. The busdma interface was created to handle exactly this problem, but many drivers do not use it yet. The busdma project at http://www.FreeBSD.org/projects/busdma/index.html tracks the progress of this and should be used to determine which drivers must be converted for RELENG_5 and which can be left behind. Also, there has been talk by several developers and the original author of giving the busdma interface a minor overhaul. If this is to happen, it needs to happen before RELENG_5. Otherwise, differences between the old and new API will make driver maintenance difficult.
- PCI resource allocation - PC2003 compliance requires that x86 systems no longer configure PCI devices from the system BIOS, leaving this task solely to the OS. FreeBSD must gain the ability to manage and allocate PCI memory resources on its own. Implementing this should take into account cardbus, PCI-HotPlug, and laptop dockstation requirements. This feature will become increasingly critical through the lifetime of RELENG_5, and therefore is a requirement for the RELENG_5 branch.
- Performance - most performance gains hinge on the progress of SMPng. Areas that should be concentrated on are:
  - Storage I/O - I/O performance suffers from two problems: too many expensive context switches, and too much work being done in interrupt threads. Specifically, it takes 3 context switches for most drivers to get from the hardware completion interrupt to unblocking the user process: one for the interrupt thread, one for the GEOM g_up thread, and one to get back to the user thread. Drivers that attempt to be efficient and quick in their interrupt handlers (as all should be) usually also schedule a taskqueue, which adds a context switch in between the interrupt thread and the g_up thread and brings the total up to 4. Two things need to be done to attack this:
    - make all drivers defer most of their processing out of their interrupt thread. Significant performance gains have been shown recently in the aac(4) driver by making its interrupt handler be 'INTR_MPSAFE' and moving all processing to a taskqueue.
    - investigate eliminating the taskqueue context switch by adding a callback to the g_up thread that allows a driver to do its interrupt processing there instead of in the taskqueue.
  - Network - Network drivers suffer from the interrupt latency previously mentioned as well as from the network stack being partially locked down but not free from Giant. Possible strategies for addressing this are described in the previous section.
  - Other locking - XXX ?
- Benchmarks and performance testing - Having a source of reliable and useful benchmarks is essential to identifying performance problems and guarding against performance regressions. A 'performance team' that is made up of people and resources for formulating, developing, and executing benchmark tests should be put into place soon. Comparisons should be made against both FreeBSD 4.x and Linux 2.4.x.
  Tests to consider are:
  - the classic 'worldstone'
  - webstone - /usr/ports/www/webstone
  - Fstress - http://www.cs.duke.edu/ari/fstress
  - ApacheBench - /usr/ports/www/p5-ApacheBench
  - netperf - /usr/ports/benchmarks/netperf
- Features:
  - ACPI - Intel's ACPI power management and device configuration subsystem has become an integral part of FreeBSD's x86 and ia64 device configuration model. However, many bugs exist in Intel's vendor code, our OS-specific code, and motherboard BIOSes, causing many ACPI-enabled systems to fail to boot, misdetect drivers, and/or have many other problems. Fixing these problems seems to be an uphill battle and often causes a poor first impression of FreeBSD 5.0. Most x86 systems can function with ACPI disabled, and logic should be added to the bootloader and sysinstall to allow users to easily and intuitively turn it off. Turning off ACPI by default is also prone to problems, as many newer systems rely on it to provide correct interrupt routing information. Also, a centralized resource should be created to track ACPI problems and solutions. Linux uses the same Intel vendor sources as FreeBSD, so we should investigate how they have handled some of the known problems.
  - NEWCARD/OLDCARD - The NEWCARD subsystem was made the default for 5.0. Unfortunately, it contains no support for non-Cardbus bridges and falls victim to interrupt routine problems on some laptops. The classic 16-bit bridge support, OLDCARD, still exists and can be compiled in, but this is highly inconvenient for users of older laptops. If OLDCARD cannot be completely deprecated for RELENG_5, then provisions must be made to allow users to easily install an OLDCARD-enabled kernel. Documentation should be written to help transition users from OLDCARD to NEWCARD and from 'pccardd' to 'devd'. The power management and 'dumpcis' functionality of pccardc(1) needs to be brought forward to work with NEWCARD, along with the ability to load CIS quirk entries.
    Most of this functionality can be integrated into devd and devctl.
  - New scheduler framework - The new scheduler framework is in place, and users can select between the classic 44bsd scheduler and the new ULE scheduler. A scheduler that demonstrates processor affinity, HyperThreading and KSE awareness, and no regressions in performance or interactivity characteristics must be available for RELENG_5.
  - sparc64 local console - neither syscons nor vt work on sparc64, leaving it with only serial and 'fake' OFW console support. This is a major support hole for what is a Tier 1 platform. Whether syscons can be shoe-horned in or wscons be adopted from NetBSD is up for debate. However, sparc64 must have local console support for RELENG_5. Having this will also allow the XFree86 server to run, which is also a requirement for RELENG_5.
  - gcc/toolchain - gcc 3.3 might be available in time for RELENG_5 and might offer some attractive benefits, but is also likely to introduce ABI incompatibility with prior gcc versions. ABI compatibility should be locked down for the RELENG_5 branch. There has also been a request to move /usr/include/g++ to /usr/include/g++-v3 to be more compliant with the stock behavior of gcc. This should be investigated for RELENG_5 also.
  - gdb - gdb from the base system should work for sparc64. It should also understand KSE thread semantics, assuming that KSE is included in the RELENG_5 branch. gdb 5.3 is available and there are reports that it should address the sparc64 issue.
  - disklabel(8) regressions - The biggest casualty of the introduction of GEOM appears to be the disklabel utility. The '-r' option gives unpredictable results in most cases now and should be removed or fixed. Work is planned for a new unified interface for modifying labels and slices; however, this should not preclude disklabel from being fixed.
- Documentation:
  - The manual pages, Handbook, and FAQ should be free from content specific to FreeBSD 4.x, i.e.
    all text should be equally applicable to FreeBSD 5.x. The installation section of the handbook needs the most work in this area.
  - The release documentation needs to be complete and accurate for all Tier 1 architectures. The hardware notes and installation guides need specific attention.
  - If FreeBSD 5.1 is not the branch point for RELENG_5 then the Early Adopters Guide needs to be updated. This document should then be removed just before the release closest to the RELENG_5 branch point.

4. Schedule

If branching RELENG_5 at the 5.1 release is paramount, 5.1 will probably need to move out by at least 3 months. The schedule would be:

- Jun 30, 2003 - KSE and SMPng feature freeze
- Aug 4, 2003 - 5.1-BETA, general code freeze
- Aug 18, 2003 - 5.1-RC1, RELENG_5 and RELENG_5_1 branched
- Aug 25, 2003 - 5.1-RC2
- Sept 1, 2003 - 5.1-RELEASE

Taking an incremental approach might be more beneficial. Releasing 5.1 in time for USENIX ATC 2003 will provide a wide audience for productive feedback and will keep FreeBSD visible. In this scenario, 5.1 should offer a significant improvement over 5.0 in terms of bug fixes and performance. Lockdowns and improvements to the storage subsystem and scheduler should be expected, the NEWCARD/OLDCARD issues should be addressed, and all known bugs and regressions from the 5.0 errata list should be fixed. KSE and other SMPng tasks that cannot finish in time for 5.1 should also not reduce the stability of the release. The schedule for this would be:

- May 5, 2003 - 5.1-BETA, general code freeze
- May 19, 2003 - 5.1-RC1, RELENG_5_1 branched
- May 27, 2003 - 5.1-RC2
- Jun 2, 2003 - 5.1-RELEASE
- Jun 30, 2003 - KSE and SMPng feature freeze
- Sept 1, 2003 - 5.2-BETA, general code freeze
- Sept 15, 2003 - 5.2-RC1, RELENG_5 and RELENG_5_2 branched
- Sept 22, 2003 - 5.2-RC2
- Sept 29, 2003 - 5.2-RELEASE

5. Post RELENG_5 direction

As with all -STABLE development streams, the focus should be bug fixes and incremental improvements.
Just like normal, everything should be vetted through the HEAD branch first and committed to RELENG_5 with caution. As before, new device drivers, incremental features, etc, will be welcome in the branch once they have been proven in HEAD. Further SMPng lockdowns will be divided into two categories, driver and subsystem. The only subsystem that will be sufficiently locked down for RELENG_5 will be GEOM, so incrementally locking down device drivers under it is a worthy goal for the branch. Full subsystem lockdowns will have to be fully tested and proven in HEAD before consideration will be given to merging them into RELENG_5.
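
As a quick sanity check on the two candidate schedules in section 4, the following Python sketch compares their milestones with plain date arithmetic. It is only an illustration: the dates are copied from the schedules in this document, while the dictionary keys and the function names (days_between, compare_schedules) are made up for this example and are not part of any FreeBSD tooling.

```python
from datetime import date

# Milestone dates copied from section 4. The key names are hypothetical labels.
BRANCH_AT_5_1 = {
    "feature_freeze": date(2003, 6, 30),
    "code_freeze": date(2003, 8, 4),        # 5.1-BETA
    "releng_5_branch": date(2003, 8, 18),   # 5.1-RC1
    "release_5_1": date(2003, 9, 1),        # 5.1-RELEASE
}
INCREMENTAL = {
    "release_5_1": date(2003, 6, 2),        # 5.1-RELEASE
    "feature_freeze": date(2003, 6, 30),
    "code_freeze": date(2003, 9, 1),        # 5.2-BETA
    "releng_5_branch": date(2003, 9, 15),   # 5.2-RC1
    "release_5_2": date(2003, 9, 29),       # 5.2-RELEASE
}


def days_between(start, end):
    """Return whole days from start to end (negative if end is earlier)."""
    return (end - start).days


def compare_schedules():
    """Summarize how the two schedules differ, in days."""
    return {
        # How far 5.1 slips if RELENG_5 must be branched at 5.1.
        "5.1_release_slip": days_between(
            INCREMENTAL["release_5_1"], BRANCH_AT_5_1["release_5_1"]),
        # How much later RELENG_5 is branched under the incremental plan.
        "branch_delay": days_between(
            BRANCH_AT_5_1["releng_5_branch"], INCREMENTAL["releng_5_branch"]),
        # Stabilization time between feature freeze and general code freeze.
        "freeze_gap_branch_at_5_1": days_between(
            BRANCH_AT_5_1["feature_freeze"], BRANCH_AT_5_1["code_freeze"]),
        "freeze_gap_incremental": days_between(
            INCREMENTAL["feature_freeze"], INCREMENTAL["code_freeze"]),
    }


if __name__ == "__main__":
    for name, days in compare_schedules().items():
        print(f"{name}: {days} days")
```

With these dates, branching at 5.1 moves the 5.1 release out by 91 days (13 weeks), which matches the "at least 3 months" estimate. The incremental plan branches RELENG_5 only 28 days later, and leaves 63 days rather than 35 between the KSE and SMPng feature freeze and the general code freeze.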