From owner-freebsd-hackers Sun Jul 4 21:26:34 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from mycenae.ilion.eu.org (mycenae.ilion.eu.org [203.35.206.129])
    by hub.freebsd.org (Postfix) with ESMTP id A01BE14BF1
    for ; Sun, 4 Jul 1999 21:26:27 -0700 (PDT)
    (envelope-from patrykz@mycenae.ilion.eu.org)
Received: from mycenae.ilion.eu.org (patrykz@localhost [127.0.0.1])
    by mycenae.ilion.eu.org (8.9.2/8.9.2) with ESMTP id OAA12630;
    Mon, 5 Jul 1999 14:25:27 +1000 (EST)
    (envelope-from patrykz@mycenae.ilion.eu.org)
Message-Id: <199907050425.OAA12630@mycenae.ilion.eu.org>
To: Brian Dean
Cc: peter@netplex.com.au (Peter Wemm),
    rivers@dignus.com (Thomas David Rivers),
    freebsd-hackers@FreeBSD.ORG
Subject: Re: support for i386 hardware debug watch points
In-reply-to: Your message of "Sun, 04 Jul 1999 10:53:39 -0400."
    <199907041453.KAA03044@dean.pc.sas.com>
Date: Mon, 05 Jul 1999 14:25:26 +1000
From: Patryk Zadarnowski
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> I've got some prototype code in place which supports the context
> switching part of this. It's pretty simple right now, as I'm trying
> to keep changes to a minimum.
>
> What I've done is simply added the dr0-dr3,dr6,dr7 registers to
> 'struct pcb' in /usr/src/sys/i386/include/pcb.h. In cpu_switch(),
> during a save operation, I load %dr7, and check the lower 8 bits,
> which indicate if any breakpoints are in use. If they are, I save all
> the debug registers, then clear out %dr7, which disables the
> breakpoints. During a restore operation, I load the value of %dr7
> from the pcb structure of the new process, and if any of the lower 8
> bits are set, I restore all the debug registers.
>
> This is not as efficient as it could be implemented with a separate
> flag to indicate whether saving the debug registers is necessary since
> loading/storing the debug registers is fairly expensive (11 clocks on
> an i486).

22 clocks on i386, 10 on i486, 11 on Pentium.
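
For concreteness, here is a rough sketch of the DR7 check described in
the quoted text. It is written as ordinary user-space C so that it can
be compiled and tested; real code would live in cpu_switch() and use
the privileged MOV to/from DRx instructions. All names here (struct
pcb_dbregs, struct hw_dbregs, dbregs_save, dbregs_restore) are
hypothetical and not taken from the FreeBSD source, and struct
hw_dbregs is only a stand-in for the physical registers.

```c
#include <stdint.h>
#include <string.h>

/* Bits 0-7 of DR7 are the L0/G0 .. L3/G3 breakpoint enable bits. */
#define DR7_ENABLE_MASK 0xffu

/* Hypothetical per-process save area (the fields added to struct pcb). */
struct pcb_dbregs {
    uint32_t dr[4];   /* dr0..dr3: breakpoint addresses */
    uint32_t dr6;     /* debug status */
    uint32_t dr7;     /* debug control */
};

/* Stand-in for the hardware registers, so the sketch runs in user space. */
struct hw_dbregs {
    uint32_t dr[4];
    uint32_t dr6;
    uint32_t dr7;
};

/* Called when switching away from a process. Returns 1 if a save was done. */
static int dbregs_save(struct hw_dbregs *hw, struct pcb_dbregs *pcb)
{
    if ((hw->dr7 & DR7_ENABLE_MASK) == 0) {
        /* Assumption of this sketch: record that nothing needs restoring. */
        pcb->dr7 = 0;
        return 0;
    }
    memcpy(pcb->dr, hw->dr, sizeof pcb->dr);
    pcb->dr6 = hw->dr6;
    pcb->dr7 = hw->dr7;
    hw->dr7 = 0;      /* disable the breakpoints for whoever runs next */
    return 1;
}

/* Called when switching to a process. Returns 1 if a restore was done. */
static int dbregs_restore(struct hw_dbregs *hw, const struct pcb_dbregs *pcb)
{
    if ((pcb->dr7 & DR7_ENABLE_MASK) == 0)
        return 0;     /* the new process has no breakpoints in use */
    memcpy(hw->dr, pcb->dr, sizeof hw->dr);
    hw->dr6 = pcb->dr6;
    hw->dr7 = pcb->dr7;   /* written last, once the addresses are in place */
    return 1;
}
```

The save path still has to read DR7 on every switch; a separate
"debug registers in use" flag in the pcb, as suggested in the quoted
text, would avoid even that read for processes that never touch the
debug registers.
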
Also, on another topic, DRs are fairly portable, as they've been a
part of IA-32 since the i386.

> Comments?

I'm no expert on FreeBSD kernels, but I can speak for L4, and it's
always good to look at past experience in the area. (L4 is a very
lean microkernel running on x86 and MIPS, and soon on Alpha and ARM,
although I'm currently in the process of convincing the authors of the
latter two ports to use the BSD licence instead of the GPL ;) It
currently claims to have the fastest IPC and lightweight thread
implementation, so I guess it's a good role model.

From Jochen Liedtke's L4/x86 Reference Manual:

    User-level debug registers exist per thread. DR0..3, DR6 and DR7
    can be accessed by the machine instructions MOV DRx,n and
    MOV r,DRx. However, only task-local breakpoints can be activated,
    i.e. bits G0...3 in DR7 cannot be set. Breakpoints operate per
    thread. Breakpoints are signalled as #DB exception (INT 1).

    Note that user-level breakpoints are suspended when kernel
    breakpoints are set by the kernel debugger.

This says it all in terms of the user interface (which I think is
brilliant in that it doesn't introduce any ugly new system calls.)
Anyway, the MOV instructions are simulated in software, which should be
easy enough. When a MOV DRx,n instruction is executed, the kernel sets
a bit in the TCB (well, it would be a PCB on FreeBSD ;) telling the
scheduler that the DR registers are now valid (and hence should be
saved/restored on a context switch from/to that process.)

While on the topic of L4, would there be any interest in implementing
Liedtke's small address spaces in FreeBSD? Basically, the idea is to
improve TLB utilisation by simulating a tagged TLB on x86 using the
x86's segment registers.
Because a typical process does not need the full 4GB address space,
multiple processes could be multiplexed into a preallocated portion of
the 4GB address space, eliminating the need for TLB flushes and CR3
reloads when switching between small processes (which hopefully
constitute the bulk of the system load.) As a rough guide as to what's
up for grabs, Liedtke measured a reduction in the cost of a context
switch on L4 from somewhere between 95 and 914 clocks (on Pentium) down
to 23 clock cycles when using small address spaces. The performance
improvement is huge on Pentiums, 6x86s and Pentium IIs, although the
task is far from trivial (read: major changes to the memory manager and
dynamic libraries.)

Oh, if someone's interested, the original paper is at

http://i30www.ira.uka.de/publications/pubcat/As-pent.ps

And no, I don't volunteer to do it all by myself, although I'd be glad
to help, coordinate the project, or even do most of the work myself
given enough time.

Patryk.