From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 03:21:11 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 229EF1065670 for ; Sun, 16 Mar 2008 03:21:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au [211.29.133.51]) by mx1.freebsd.org (Postfix) with ESMTP id C0C088FC16 for ; Sun, 16 Mar 2008 03:21:10 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m2G3KteC026280 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 16 Mar 2008 14:21:01 +1100 Date: Sun, 16 Mar 2008 14:20:55 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Kostik Belousov In-Reply-To: <20080315194809.GN10374@deviant.kiev.zoral.com.ua> Message-ID: <20080316133138.J41270@delplex.bde.org> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org> <20080315194809.GN10374@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Ed Schouten , FreeBSD Arch Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 03:21:11 -0000 On Sat, 15 Mar 2008, Kostik Belousov wrote: > On Sun, Mar 16, 2008 at 03:55:18AM +1100, Bruce Evans wrote: >> Other problems near here: >> - neither vfs nor drivers currently know how many threads are in a >> driver. vfs uses vp->v_rdev->si_usecount, but this doesn't quite work > This is provided by si_threadcount. 
> See the dev(vn)_refthread and its usage in the devfs vnops and fops.

So why doesn't revoke() use it? :-). All uses of si_usecount, which normally happen via vcount() and count_dev(), are suspect, especially the latter.

vcount() is only used in revoke(), in svr4_fcntl.c to handle another revoke(), and for FreeBSD < 6 in reiserfs for an old multiple-mount check.

count_dev() is only used in two places. The first is ata-tape.c, which uses it to decide, in the same broken way as vfs, whether a close is the last one. This driver uses D_TRACKCLOSE to get d_close() called on all closes, which gives it the burden of deciding whether the close is the last one, and it can't do this any better than vfs. (D_TRACKCLOSE is used in a few other drivers which don't call count_dev().) The second is devfs_close(), which uses it to decide whether to release the controlling terminal and when to call d_close().

Hmm, it seems to be not vfs but only devfs which handles last-close specially. devfs is closer to devices, so it should know how to use si_threadcount here. Hopefully si_threadcount counts threads sleeping in open or close, although si_usecount doesn't. d_close (or something) should be called to wake up these threads even if si_usecount is 0. Drivers which support sleeping in open or close must support d_close (or something) being called to forcibly end such sleeps. revoke() should forcibly end such sleeps, so it needs to check si_threadcount too. si_usecount in its current form might end up being unused, so si_threadcount could be renamed back to it.

Bruce From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 05:44:25 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C4AFD1065670 for ; Sun, 16 Mar 2008 05:44:25 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.176]) by mx1.freebsd.org (Postfix) with ESMTP id AD4698FC20 for ; Sun, 16 Mar 2008 05:44:25 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so5496575waf.3 for ; Sat, 15 Mar 2008 22:44:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; bh=tDDn852zwgXDfq3/4MuOS85TX7+Hexks0q/EnE4R+Nw=; b=E5B68ksLq8oBhMXrMv0zZAKkftiIrvslSydnzRKQReOHy7qQoKCMvcRVeB+sQxeOMdFGfWpLAe10iEmC+AxdibSSKyrqqwt7e+Y5AlpYnrcipi/rhSGDO2bgBhpQ+JOSlSacSrXV3WnemQkURys31yhbADMxYEvNRXvGWWkuKGw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=kYr1tTG+XdiTRvpBnmzlvwwznZh0os40g1zWLFmztkN6IoI96hp+dB8zv6MNg2Rc1VdfwdxkibTcCr8gHAe+293SQW4/HT2SNvJNwsoa46WBtPEmmT0i+WpWPV8IlB3ewebV2bJI5wjMLsBt1awqoOSoDc/ajY9IRJNAGbAVQ0c= Received: by 10.114.89.1 with SMTP id m1mr15334677wab.77.1205646264624; Sat, 15 Mar 2008 22:44:24 -0700 (PDT) Received: by 10.115.22.10 with HTTP; Sat, 15 Mar 2008 22:44:24 -0700 (PDT) Message-ID: Date: Sat, 15 Mar 2008 22:44:24 -0700 From: "Kip Macy" To: arch@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Cc: Subject: separating out memory checks from INVARIANTS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to 
FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 05:44:25 -0000 I find that the serialization of memory allocation frequently hides race conditions. I would like to, at the very least, add an option to disable the memory checks if not make the memory checks a completely separate option. My knee jerk reaction to avoiding bikesheds is to simply add it to my own tree and forget about it. However, this has come up often enough that I feel that it warrants consideration. Thoughts? -Kip From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 05:53:58 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DC45106564A for ; Sun, 16 Mar 2008 05:53:58 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 555028FC1A for ; Sun, 16 Mar 2008 05:53:58 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m2G5rt9x011538; Sun, 16 Mar 2008 01:53:56 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Sat, 15 Mar 2008 19:54:29 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Kip Macy In-Reply-To: Message-ID: <20080315195328.V910@desktop> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: separating out memory checks from INVARIANTS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 05:53:58 -0000 On Sat, 15 Mar 2008, Kip Macy wrote: > 
I find that the serialization of memory allocation frequently hides
> race conditions. I would like to, at the very least, add an option to
> disable the memory checks if not make the memory checks a completely
> separate option. My knee jerk reaction to avoiding bikesheds is to
> simply add it to my own tree and forget about it. However, this has
> come up often enough that I feel that it warrants consideration.
>
>
> Thoughts?

One other option that I have frequently considered is to convert UMA from using an array of bytes to using bitfields to represent the free space in a slab. Then you could use atomics to update the required information. It'd be a bit of work. Maybe a good SoC? :)

Jeff

>
> -Kip
>
From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 05:54:59 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3AA971065671 for ; Sun, 16 Mar 2008 05:54:59 +0000 (UTC) (envelope-from obrien@NUXI.org) Received: from dragon.nuxi.org (trang.nuxi.org [74.95.12.85]) by mx1.freebsd.org (Postfix) with ESMTP id 2A8CE8FC1D for ; Sun, 16 Mar 2008 05:54:59 +0000 (UTC) (envelope-from obrien@NUXI.org) Received: from dragon.nuxi.org (obrien@localhost [127.0.0.1]) by dragon.nuxi.org (8.14.1/8.14.1) with ESMTP id m2G5swnI088028; Sat, 15 Mar 2008 22:54:58 -0700 (PDT) (envelope-from obrien@dragon.nuxi.org) Received: (from obrien@localhost) by dragon.nuxi.org (8.14.2/8.14.1/Submit) id m2G5swPw088027; Sat, 15 Mar 2008 22:54:58 -0700 (PDT) (envelope-from obrien) Date: Sat, 15 Mar 2008 22:54:58 -0700 From: "David O'Brien" To: Joseph Koshy Message-ID: <20080316055458.GA87605@dragon.NUXI.org> Mail-Followup-To: obrien@freebsd.org, Joseph Koshy , 
freebsd-arch@freebsd.org References: <20080313180805.GA83406@dragon.NUXI.org> <200803131516.12284.jhb@freebsd.org> <84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com> <200803141431.53846.jhb@freebsd.org> <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com> X-Operating-System: FreeBSD 8.0-CURRENT User-Agent: Mutt/1.5.16 (2007-06-09) Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: obrien@freebsd.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 05:54:59 -0000 On Sat, Mar 15, 2008 at 11:13:00AM +0530, Joseph Koshy wrote: > > FreeBSD has been trying to not be quite as i386-centric as it used .. > HWPMC is very x86 centric, for obvious reasons. What is the obvious reason? Many non-x86 CPU's have HW event counters. (I assume you're not throwing ia64 into the x64 bucket). 
-- -- David (obrien@FreeBSD.org) From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 05:55:27 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4D5BB1065670 for ; Sun, 16 Mar 2008 05:55:27 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.181]) by mx1.freebsd.org (Postfix) with ESMTP id 31E018FC14 for ; Sun, 16 Mar 2008 05:55:27 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so5500743waf.3 for ; Sat, 15 Mar 2008 22:55:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=7Auzw4GbEotW2PsfLohK5brJDykYQxNo9aRBLN6/XDc=; b=G5AEhuBgcapBftEFsI6o8gm/07IL7hnN0rH2XcZ+3DRD/ELYpv2hv0Oi7QElNhtd159J3ssxPS2wbHPrCRmW5b2oK/xxgjwz+msbtzzSlr7BxeROLSc7/A2gB8SNHmbYcnMzbpp+l2owLHIQ8dyoLrX1/l8k/gsipueKpbpr3o8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=keDq+pZqUHViTjGtc1pX3WPjlDxZsj5QmzDF4NIrYdzqE4mkV2vaaF/3emLk9Hep6fYN173+Jlvx0F1Pq9xuSYFwqqS5VIqGTxVHFrPPjCElDegzn7BDFAtvhkB+S6XeUeRr6L4uIRRMGP6lasd938x+R1jZ5pOGzldcDm4VIwU= Received: by 10.114.195.19 with SMTP id s19mr15140268waf.57.1205646926748; Sat, 15 Mar 2008 22:55:26 -0700 (PDT) Received: by 10.115.22.10 with HTTP; Sat, 15 Mar 2008 22:55:26 -0700 (PDT) Message-ID: Date: Sat, 15 Mar 2008 22:55:26 -0700 From: "Kip Macy" To: "Jeff Roberson" In-Reply-To: <20080315195328.V910@desktop> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080315195328.V910@desktop> 
Cc: arch@freebsd.org Subject: Re: separating out memory checks from INVARIANTS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 05:55:27 -0000 On Sat, Mar 15, 2008 at 10:54 PM, Jeff Roberson wrote: > > On Sat, 15 Mar 2008, Kip Macy wrote: > > > I find that the serialization of memory allocation frequently hides > > race conditions. I would like to, at the very least, add an option to > > disable the memory checks if not make the memory checks a completely > > separate option. My knee jerk reaction to avoiding bikesheds is to > > simply add it to my own tree and forget about it. However, this has > > come up often enough that I feel that it warrants consideration. > > > > > > Thoughts? > > One other option that I have frequently considered is to convert UMA from > using an array of bytes to using bitfields to represent the free space in > a slab. Then you could use atomics to update the required information. > It'd be a bit of work. Maybe a good SoC? :) Would it make it possible to do memory allocation without holding a lock in the M_NOWAIT case? 
-Kip From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 06:12:14 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 405C9106564A for ; Sun, 16 Mar 2008 06:12:14 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 177D78FC37 for ; Sun, 16 Mar 2008 06:12:14 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m2G6CC13012954; Sun, 16 Mar 2008 02:12:13 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Sat, 15 Mar 2008 20:12:46 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Kip Macy In-Reply-To: Message-ID: <20080315201153.X910@desktop> References: <20080315195328.V910@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: separating out memory checks from INVARIANTS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 06:12:14 -0000 On Sat, 15 Mar 2008, Kip Macy wrote: > On Sat, Mar 15, 2008 at 10:54 PM, Jeff Roberson > wrote: >> >> On Sat, 15 Mar 2008, Kip Macy wrote: >> >> > I find that the serialization of memory allocation frequently hides >> > race conditions. I would like to, at the very least, add an option to >> > disable the memory checks if not make the memory checks a completely >> > separate option. My knee jerk reaction to avoiding bikesheds is to >> > simply add it to my own tree and forget about it. 
However, this has >> > come up often enough that I feel that it warrants consideration. >> > >> > >> > Thoughts? >> >> One other option that I have frequently considered is to convert UMA from >> using an array of bytes to using bitfields to represent the free space in >> a slab. Then you could use atomics to update the required information. >> It'd be a bit of work. Maybe a good SoC? :) > > > Would it make it possible to do memory allocation without holding a > lock in the M_NOWAIT case? Yes, when I originally wrote the code it didn't require a lock because I relied on byte writes being atomic. However, we had platforms for which that wasn't true. (alpha). It may be that it's safe not to lock even now on x86/amd64. I don't know the specifics of the memory architectures on powerpc, arm, mips, etc. though. Jeff > > -Kip > From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 13:46:44 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5A79F106564A for ; Sun, 16 Mar 2008 13:46:44 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au [211.29.133.218]) by mx1.freebsd.org (Postfix) with ESMTP id DCDA88FC28 for ; Sun, 16 Mar 2008 13:46:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m2GDkNnR019701 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 17 Mar 2008 00:46:25 +1100 Date: Mon, 17 Mar 2008 00:45:12 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jeff Roberson In-Reply-To: <20080315201153.X910@desktop> Message-ID: <20080317003655.A40697@besplex.bde.org> References: <20080315195328.V910@desktop> <20080315201153.X910@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed 
Cc: Kip Macy , arch@freebsd.org Subject: Re: separating out memory checks from INVARIANTS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 13:46:44 -0000 On Sat, 15 Mar 2008, Jeff Roberson wrote: > On Sat, 15 Mar 2008, Kip Macy wrote: >> Would it make it possible to do memory allocation without holding a >> lock in the M_NOWAIT case? > > Yes, when I originally wrote the code it didn't require a lock because I > relied on byte writes being atomic. However, we had platforms for which that > wasn't true. (alpha). It may be that it's safe not to lock even now on > x86/amd64. I don't know the specifics of the memory architectures on > powerpc, arm, mips, etc. though. sparc64 only supports atomic ops on sizes 32 and 64 bits. I think it and not alpha was responsible for eliminating use of 8 and 16 bit atomic ops in MI code. My version of atomic.h for i386 only supports atomic ops on size 32. 8 and 16 bit atomic ops are not usable in MI code and are or were not used in i386 MD code, so they are just interface bloat. 
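
[Editorial sketch, not part of the original message.] One common way to live with word-sized-only atomics is to update a single byte by doing a compare-and-swap on the 32-bit word that contains it. The snippet below is a minimal userland illustration under stated assumptions: it uses C11 <stdatomic.h> rather than the kernel's atomic(9) interface, the function name is hypothetical, and `idx` selects a byte by its position in the word's value (bits 8*idx..8*idx+7), not by memory address.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: replace one byte of a 32-bit word using only a
 * 32-bit compare-and-swap, so neighbouring bytes are never clobbered.
 */
static void
atomic_set_byte_via_32(_Atomic uint32_t *word, unsigned idx, uint8_t val)
{
	unsigned shift = (idx & 3) * 8;
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t old = atomic_load(word);
	uint32_t desired;

	do {
		/* Rebuild the word from the latest observed value. */
		desired = (old & ~mask) | ((uint32_t)val << shift);
		/* On failure, `old` is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(word, &old, desired));
}
```

The retry loop is what makes the update safe: if another CPU changes any byte of the word between the load and the swap, the swap fails and the new byte is merged into the fresh value instead of a stale one.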
Bruce From owner-freebsd-arch@FreeBSD.ORG Sun Mar 16 22:32:37 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 177BA1065672 for ; Sun, 16 Mar 2008 22:32:37 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id CB93C8FC13 for ; Sun, 16 Mar 2008 22:32:36 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m2GMWXak036310; Sun, 16 Mar 2008 18:32:35 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Sun, 16 Mar 2008 12:33:11 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Bruce Evans In-Reply-To: <20080317003655.A40697@besplex.bde.org> Message-ID: <20080316122539.N910@desktop> References: <20080315195328.V910@desktop> <20080315201153.X910@desktop> <20080317003655.A40697@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kip Macy , arch@freebsd.org Subject: Re: separating out memory checks from INVARIANTS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Mar 2008 22:32:37 -0000 On Mon, 17 Mar 2008, Bruce Evans wrote: > On Sat, 15 Mar 2008, Jeff Roberson wrote: > >> On Sat, 15 Mar 2008, Kip Macy wrote: >>> Would it make it possible to do memory allocation without holding a >>> lock in the M_NOWAIT case? >> >> Yes, when I originally wrote the code it didn't require a lock because I >> relied on byte writes being atomic. However, we had platforms for which >> that wasn't true. (alpha). 
It may be that it's safe not to lock even now
>> on x86/amd64. I don't know the specifics of the memory architectures on
>> powerpc, arm, mips, etc. though.
>
> sparc64 only supports atomic ops on sizes 32 and 64 bits. I think it
> and not alpha was responsible for eliminating use of 8 and 16 bit
> atomic ops in MI code.
>
> My version of atomic.h for i386 only supports atomic ops on size 32.
> 8 and 16 bit atomic ops are not usable in MI code and are or were not
> used in i386 MD code, so they are just interface bloat.

It actually doesn't require atomics, just a write memory barrier and the ability to write a single byte without affecting surrounding bytes. The 'owner' of the piece of memory sets its status in the array in the slab header. The setting marks the memory as allocated. You only need the write memory barrier to prevent one cpu from allocating and another from freeing the same piece of memory before the store to the slab header becomes visible. That race is probably impossible given how small store buffers usually are.

The real problem is that you can't write a single byte on earlier alphas. You would load 32 bits, set the byte you were interested in, and store 32 bits. On later alphas and other more reasonable machines, if you write a byte the cache hardware essentially does that with a cache line, but uses the coherency protocol to ensure that only that byte is written.

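
[Editorial sketch, not part of the original message.] The early-alpha problem described above can be replayed deterministically in ordinary C. This is only an illustration under assumptions: the function names are hypothetical, the "slab header" is reduced to one 32-bit word holding four status bytes, and the two "CPUs" are simulated by interleaving their steps by hand in a single thread.

```c
#include <stdint.h>

/* Model of a byte store on a CPU without byte stores: edit a 32-bit image. */
static uint32_t
insert_byte(uint32_t word, unsigned idx, uint8_t val)
{
	unsigned shift = (idx & 3) * 8;

	return ((word & ~((uint32_t)0xff << shift)) | ((uint32_t)val << shift));
}

static uint8_t
extract_byte(uint32_t word, unsigned idx)
{
	return ((uint8_t)(word >> ((idx & 3) * 8)));
}

/*
 * Replay the bad interleaving: both CPUs load the same word, each sets
 * its own status byte in a private copy, then both store the full word.
 */
static uint32_t
replay_lost_update(uint32_t slab_word)
{
	uint32_t a = slab_word;		/* CPU A: load 32 bits */
	uint32_t b = slab_word;		/* CPU B: load 32 bits */

	a = insert_byte(a, 0, 1);	/* A marks item 0 allocated */
	b = insert_byte(b, 1, 1);	/* B marks item 1 allocated */
	slab_word = a;			/* A: store 32 bits */
	slab_word = b;			/* B: store 32 bits, byte 0 is stale */
	return (slab_word);
}
```

Starting from an all-free word, the result has item 1 marked allocated but item 0 marked free again, even though neither CPU ever intended to touch the other's byte. With true byte stores each CPU writes only its own byte and the update is not lost.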
Jeff > > Bruce > From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 07:13:27 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ABEB9106566B for ; Mon, 17 Mar 2008 07:13:27 +0000 (UTC) (envelope-from jkoshy.freebsd@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.152]) by mx1.freebsd.org (Postfix) with ESMTP id C7EED8FC18 for ; Mon, 17 Mar 2008 07:13:26 +0000 (UTC) (envelope-from jkoshy.freebsd@gmail.com) Received: by fg-out-1718.google.com with SMTP id 16so4595687fgg.35 for ; Mon, 17 Mar 2008 00:13:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:to:subject:user-agent:mime-version:content-type:from:date:sender; bh=zzGD1pze9xS55BXgjLeyoqZor5v+owBlV8Grj2npvM0=; b=AmOoC7nOA3XxpK0DXgpFTjDTvxeiaWjWQYqFU9v0+UoSMcSmmgXqN6rZyOPKQXzODEQA+Gv+lVaUEuquLTLIArJOb5wtEvAo0C0DtUm17a204avD+KxyednlHtL0hPsKVFzX5IKFjpL6IqgBfFosu84wg1efTWRGhgY8NZclHZ4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:to:subject:user-agent:mime-version:content-type:from:date:sender; b=QF8lR3nfzUOm08ccutfww3JLUk3pXG4n2/ti8B2XAyWR06Gj8rmVQmOutcq4MCyFnA9cA0pKfEFy+mctlVhliIoHN31UJWjaWO2Xhp/6V/W2N66oeus4sXXTKmBw8Appq7sBFDA8VOyZUZUlCl5cgaGS7iMDmkV43IQ5SSZI1Ro= Received: by 10.82.145.7 with SMTP id s7mr35691543bud.24.1205736531936; Sun, 16 Mar 2008 23:48:51 -0700 (PDT) Received: from moria.unixconsulting.co.in ( [203.145.156.9]) by mx.google.com with ESMTPS id d13sm32187776fka.7.2008.03.16.23.47.49 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 16 Mar 2008 23:48:50 -0700 (PDT) Message-ID: <86prtt7svf.wl%koshy@unixconsulting.co.in> To: freebsd-arch@freebsd.org User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.7 Emacs/21.3 (amd64--freebsd) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by 
SEMI 1.14.6 - "Maruoka") Content-Type: multipart/mixed; boundary="Multipart_Mon_Mar_17_12:13:32_2008-1" From: Joseph Koshy Date: Mon, 17 Mar 2008 06:44:11 -0000 Sender: Joseph Koshy Subject: HWPMC changes: sparse CPU numbering and hot plug preliminaries X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 07:13:28 -0000 --Multipart_Mon_Mar_17_12:13:32_2008-1 Content-Type: text/plain; charset=US-ASCII

The following patch implements support for sparse CPU numbering in HWPMC and lays the groundwork for future hot plugging of CPUs.

Highlights:

*) CPUs are now numbered from 0..{PMC_CPU_MAX}, where today {PMC_CPU_MAX} is derived from `mp_maxid'.

*) CPUs are treated as being in one of the following states:

   ABSENT, i.e., not present.
   DISABLED, i.e., present but administratively disabled, perhaps in preparation to be pulled out.
   ACTIVE, i.e., present and participating in scheduling and capable of fielding interrupts.

   There is a set of new predicates in "sys/kern/kern_pmc.c" that the module can use.

*) Initialization and teardown have been split into two logical parts:

   - Initialization and teardown that need to be done for all CPUs, whether 'active' or not.
   - Initialization and teardown for 'active' CPUs.

   The second kind of initialization/teardown happens when a CPU changes state; the first is for things to be done at module load/unload time. Code for the existing PMC 'MD' layers has been changed to the new scheme.

*) The asserts have been changed to use the new support functions in kern_pmc.c; this reduces reliance on specifics of the kernel's implementation (e.g., the direct use of variables like `mp_maxid').

In this new scheme userland will have to cope with the possibility that a working PMC can become inaccessible.

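
[Editorial sketch, not part of the original message.] The three CPU states and the predicates built on them might look roughly like the following in plain C. Everything here is an assumption made for illustration: the `sketch_` names, the fixed four-slot table, and the exclusive upper bound are hypothetical and are not the kern_pmc.c implementation; only the ABSENT/DISABLED/ACTIVE distinction and the idea of an "is active" test and an "active count" come from the message and patch.

```c
#include <stdbool.h>

/* Hypothetical state model for a sparsely numbered set of CPUs. */
enum sketch_cpu_state { CPU_ST_ABSENT, CPU_ST_DISABLED, CPU_ST_ACTIVE };

#define SKETCH_CPU_MAX 4	/* assumed exclusive upper bound on CPU ids */

/* Example layout: id 1 is a hole, id 2 is present but disabled. */
static enum sketch_cpu_state sketch_cpus[SKETCH_CPU_MAX] = {
	CPU_ST_ACTIVE, CPU_ST_ABSENT, CPU_ST_DISABLED, CPU_ST_ACTIVE
};

static bool
sketch_cpu_in_range(int cpu)
{
	return (cpu >= 0 && cpu < SKETCH_CPU_MAX);
}

/* Present means DISABLED or ACTIVE; an out-of-range id is never present. */
static bool
sketch_cpu_is_present(int cpu)
{
	return (sketch_cpu_in_range(cpu) && sketch_cpus[cpu] != CPU_ST_ABSENT);
}

static bool
sketch_cpu_is_active(int cpu)
{
	return (sketch_cpu_in_range(cpu) && sketch_cpus[cpu] == CPU_ST_ACTIVE);
}

/* Count only CPUs that can actually run code and field interrupts. */
static int
sketch_cpu_active_count(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < SKETCH_CPU_MAX; cpu++)
		if (sketch_cpus[cpu] == CPU_ST_ACTIVE)
			n++;
	return (n);
}
```

With sparse numbering, a range check alone no longer says whether a CPU can be used, which is why callers that previously compared against a CPU count would instead ask whether the CPU is active before binding to it.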
The rest of the implementation of hot-plug support depends on how the base kernel notifies modules of the arrival or departure of a CPU. An eventhandler callback would be sufficient for CPU arrivals, but CPU departures are more complex. For example, we need to distinguish controlled CPU departures from uncontrolled ones. The controlled case is the one where HWPMC code gets a chance to run on the departing CPU before it leaves. In the uncontrolled case, all HWPMC can do is clean up its internal data structures. Regards, Koshy --Multipart_Mon_Mar_17_12:13:32_2008-1 Content-Type: text/plain; charset=US-ASCII Content-Disposition: inline; filename="p.txt" Content-Transfer-Encoding: 7bit Index: sys/dev/hwpmc/hwpmc_amd.c =================================================================== RCS file: /cvs/FreeBSD/src/sys/dev/hwpmc/hwpmc_amd.c,v retrieving revision 1.14 diff -u -r1.14 hwpmc_amd.c --- sys/dev/hwpmc/hwpmc_amd.c 7 Dec 2007 08:20:15 -0000 1.14 +++ sys/dev/hwpmc/hwpmc_amd.c 16 Mar 2008 11:11:32 -0000 @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -269,7 +270,7 @@ const struct pmc_hw *phw; pmc_value_t tmp; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] illegal row-index %d", __LINE__, ri)); @@ -324,7 +325,7 @@ const struct pmc_hw *phw; enum pmc_mode mode; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] illegal row-index %d", __LINE__, ri)); @@ -371,7 +372,7 @@ PMCDBG(MDP,CFG,1, "cpu=%d ri=%d pm=%p", cpu, ri, pm); - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] illegal row-index %d", __LINE__, ri)); @@ -453,7 +454,7 @@ (void) cpu; - KASSERT(cpu >= 0 && 
cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] illegal row index %d", __LINE__, ri)); @@ -547,7 +548,7 @@ (void) pmc; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] illegal row-index %d", __LINE__, ri)); @@ -579,7 +580,7 @@ struct pmc_hw *phw; const struct amd_descr *pd; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] illegal row-index %d", __LINE__, ri)); @@ -628,7 +629,7 @@ const struct amd_descr *pd; uint64_t config; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] illegal row-index %d", __LINE__, ri)); @@ -680,7 +681,7 @@ struct pmc_hw *phw; pmc_value_t v; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] out of range CPU %d", __LINE__, cpu)); PMCDBG(MDP,INT,1, "cpu=%d tf=0x%p um=%d", cpu, (void *) tf, @@ -760,7 +761,7 @@ const struct amd_descr *pd; struct pmc_hw *phw; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < AMD_NPMCS, ("[amd,%d] row-index %d out of range", __LINE__, ri)); @@ -829,7 +830,7 @@ struct amd_cpu *pcs; struct pmc_hw *phw; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] insane cpu number %d", __LINE__, cpu)); PMCDBG(MDP,INI,1,"amd-init cpu=%d", cpu); @@ -859,38 +860,44 @@ return 0; } - /* - * processor dependent cleanup prior to the KLD - * being unloaded + * Cleanup actions needed by an active CPU. 
*/ static int -amd_cleanup(int cpu) +amd_cpu_cleanup(int cpu) { int i; - uint32_t evsel; - struct pmc_cpu *pcs; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] insane cpu number (%d)", __LINE__, cpu)); - - PMCDBG(MDP,INI,1,"amd-cleanup cpu=%d", cpu); + PMCDBG(MDP,INI,1,"amd-cpu-cleanup cpu=%d", cpu); /* - * First, turn off all PMCs on this CPU. + * Turn off all PMCs on this CPU. */ + for (i = 0; i < 4; i++) + wrmsr(AMD_PMC_EVSEL_0 + i, 0); - for (i = 0; i < 4; i++) { /* XXX this loop is now not needed */ - evsel = rdmsr(AMD_PMC_EVSEL_0 + i); - evsel &= ~AMD_PMC_ENABLE; - wrmsr(AMD_PMC_EVSEL_0 + i, evsel); - } + return (0); +} + +/* + * Common cleanup. + */ + +static int +amd_cleanup(int cpu) +{ + struct pmc_cpu *pcs; + + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), + ("[amd,%d] insane cpu number (%d)", __LINE__, cpu)); + PMCDBG(MDP,INI,1,"amd-cleanup cpu=%d", cpu); /* - * Next, free up allocated space. + * Free up allocated space. */ - if ((pcs = pmc_pcpu[cpu]) == NULL) return 0; @@ -988,6 +995,8 @@ pmc_mdep->pmd_init = amd_init; pmc_mdep->pmd_cleanup = amd_cleanup; + pmc_mdep->pmd_cpu_init = NULL; + pmc_mdep->pmd_cpu_cleanup = amd_cpu_cleanup; pmc_mdep->pmd_switch_in = amd_switch_in; pmc_mdep->pmd_switch_out = amd_switch_out; pmc_mdep->pmd_read_pmc = amd_read_pmc; Index: sys/dev/hwpmc/hwpmc_mod.c =================================================================== RCS file: /cvs/FreeBSD/src/sys/dev/hwpmc/hwpmc_mod.c,v retrieving revision 1.32 diff -u -r1.32 hwpmc_mod.c --- sys/dev/hwpmc/hwpmc_mod.c 13 Jan 2008 14:44:02 -0000 1.32 +++ sys/dev/hwpmc/hwpmc_mod.c 17 Mar 2008 04:05:50 -0000 @@ -98,8 +98,8 @@ KASSERT(pmc_pmcdisp[(R)] <= 0, ("[pmc,%d] row disposition error", \ __LINE__)); \ atomic_add_int(&pmc_pmcdisp[(R)], -1); \ - KASSERT(pmc_pmcdisp[(R)] >= (-mp_ncpus), ("[pmc,%d] row " \ - "disposition error", __LINE__)); \ + KASSERT(pmc_pmcdisp[(R)] >= (-pmc_cpu_active_count()), \ + ("[pmc,%d] row disposition error", 
__LINE__)); \ } while (0) #define PMC_UNMARK_ROW_STANDALONE(R) do { \ @@ -637,12 +637,12 @@ static void pmc_select_cpu(int cpu) { - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[pmc,%d] bad cpu number %d", __LINE__, cpu)); - /* never move to a disabled CPU */ - KASSERT(pmc_cpu_is_disabled(cpu) == 0, ("[pmc,%d] selecting " - "disabled CPU %d", __LINE__, cpu)); + /* Never move to an inactive CPU. */ + KASSERT(pmc_cpu_is_active(cpu), ("[pmc,%d] selecting inactive " + "CPU %d", __LINE__, cpu)); PMCDBG(CPU,SEL,2, "select-cpu cpu=%d", cpu); thread_lock(curthread); @@ -1182,7 +1182,7 @@ PMCDBG(CSW,SWI,1, "cpu=%d proc=%p (%d, %s) pp=%p", cpu, p, p->p_pid, p->p_comm, pp); - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[pmc,%d] wierd CPU id %d", __LINE__, cpu)); pc = pmc_pcpu[cpu]; @@ -1307,7 +1307,7 @@ PMCDBG(CSW,SWO,1, "cpu=%d proc=%p (%d, %s) pp=%p", cpu, p, p->p_pid, p->p_comm, pp); - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[pmc,%d wierd CPU id %d", __LINE__, cpu)); pc = pmc_pcpu[cpu]; @@ -2034,7 +2034,7 @@ #ifdef DEBUG volatile int maxloop; - maxloop = 100 * mp_ncpus; + maxloop = 100 * pmc_cpu_max(); #endif /* @@ -2495,7 +2495,7 @@ cpu = PMC_TO_CPU(pm); - if (pmc_cpu_is_disabled(cpu)) + if (!pmc_cpu_is_active(cpu)) return ENXIO; pmc_select_cpu(cpu); @@ -2562,10 +2562,10 @@ cpu = PMC_TO_CPU(pm); - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[pmc,%d] illegal cpu=%d", __LINE__, cpu)); - if (pmc_cpu_is_disabled(cpu)) + if (!pmc_cpu_is_active(cpu)) return ENXIO; pmc_select_cpu(cpu); @@ -2730,7 +2730,7 @@ struct pmc_op_getcpuinfo gci; gci.pm_cputype = md->pmd_cputype; - gci.pm_ncpu = mp_ncpus; + gci.pm_ncpu = mp_ncpus; /* XXX: use pmc_cpu_max() */ gci.pm_npmc = md->pmd_npmc; gci.pm_nclass = md->pmd_nclass; bcopy(md->pmd_classes, &gci.pm_classes, @@ -2798,12 +2798,12 @@ if ((error = copyin(&gpi->pm_cpu, &cpu, sizeof(cpu))) != 0) 
break; - if (cpu >= (unsigned int) mp_ncpus) { + if (cpu >= pmc_cpu_max()) { error = EINVAL; break; } - if (pmc_cpu_is_disabled(cpu)) { + if (!pmc_cpu_is_active(cpu)) { error = ENXIO; break; } @@ -2892,12 +2892,12 @@ cpu = pma.pm_cpu; - if (cpu < 0 || cpu >= mp_ncpus) { + if (cpu < 0 || cpu >= (int) pmc_cpu_max()) { error = EINVAL; break; } - if (pmc_cpu_is_disabled(cpu)) { + if (!pmc_cpu_is_active(cpu)) { error = ENXIO; break; } @@ -2985,7 +2985,7 @@ if ((mode != PMC_MODE_SS && mode != PMC_MODE_SC && mode != PMC_MODE_TS && mode != PMC_MODE_TC) || - (cpu != (u_int) PMC_CPU_ANY && cpu >= (u_int) mp_ncpus)) { + (cpu != (u_int) PMC_CPU_ANY && cpu >= pmc_cpu_max())) { error = EINVAL; break; } @@ -3002,10 +3002,10 @@ } /* - * Check that a disabled CPU is not being asked for. + * Check that an inactive CPU is not being asked for. */ - if (PMC_IS_SYSTEM_MODE(mode) && pmc_cpu_is_disabled(cpu)) { + if (PMC_IS_SYSTEM_MODE(mode) && !pmc_cpu_is_active(cpu)) { error = ENXIO; break; } @@ -3518,7 +3518,7 @@ cpu = PMC_TO_CPU(pm); ri = PMC_TO_ROWINDEX(pm); - if (pmc_cpu_is_disabled(cpu)) { + if (!pmc_cpu_is_active(cpu)) { error = ENXIO; break; } @@ -4288,6 +4288,7 @@ pmc_initialize(void) { int cpu, error, n; + unsigned int maxcpu; struct pmc_binding pb; struct pmc_sample *ps; struct pmc_samplebuffer *sb; @@ -4345,21 +4346,38 @@ if (md == NULL || md->pmd_init == NULL) return ENOSYS; + maxcpu = pmc_cpu_max(); + /* allocate space for the per-cpu array */ - MALLOC(pmc_pcpu, struct pmc_cpu **, mp_ncpus * sizeof(struct pmc_cpu *), + MALLOC(pmc_pcpu, struct pmc_cpu **, maxcpu * sizeof(struct pmc_cpu *), M_PMC, M_WAITOK|M_ZERO); /* per-cpu 'saved values' for managing process-mode PMCs */ MALLOC(pmc_pcpu_saved, pmc_value_t *, - sizeof(pmc_value_t) * mp_ncpus * md->pmd_npmc, M_PMC, M_WAITOK); + sizeof(pmc_value_t) * maxcpu * md->pmd_npmc, M_PMC, M_WAITOK); - /* perform cpu dependent initialization */ + /* + * Perform MD layer initialization. 
This initialization has + * two parts: + * + * - Initialization that is needed irrespective of whether a + * CPU is active or not. + * - Initialization required for active CPUs. + */ pmc_save_cpu_binding(&pb); - for (cpu = 0; cpu < mp_ncpus; cpu++) { - if (pmc_cpu_is_disabled(cpu)) + for (cpu = 0; cpu < maxcpu; cpu++) { + if (md->pmd_init != NULL && + (error = md->pmd_init(cpu)) != 0) + break; + /* + * Next, we call the MD initialization code for + * currently `active' CPUs; the MD code can expect to + * run on the CPU it is initializing. + */ + if (!pmc_cpu_is_active(cpu) || md->pmd_cpu_init == NULL) continue; pmc_select_cpu(cpu); - if ((error = md->pmd_init(cpu)) != 0) + if ((error = md->pmd_cpu_init(cpu)) != 0) break; } pmc_restore_cpu_binding(&pb); @@ -4368,9 +4386,7 @@ return error; /* allocate space for the sample array */ - for (cpu = 0; cpu < mp_ncpus; cpu++) { - if (pmc_cpu_is_disabled(cpu)) - continue; + for (cpu = 0; cpu < maxcpu; cpu++) { MALLOC(sb, struct pmc_samplebuffer *, sizeof(struct pmc_samplebuffer) + pmc_nsamples * sizeof(struct pmc_sample), M_PMC, @@ -4459,6 +4475,7 @@ pmc_cleanup(void) { int cpu; + unsigned int maxcpu; struct pmc_ownerhash *ph; struct pmc_owner *po, *tmp; struct pmc_binding pb; @@ -4539,9 +4556,8 @@ ("[pmc,%d] Global SS count not empty", __LINE__)); /* free the per-cpu sample buffers */ - for (cpu = 0; cpu < mp_ncpus; cpu++) { - if (pmc_cpu_is_disabled(cpu)) - continue; + maxcpu = pmc_cpu_max(); + for (cpu = 0; cpu < maxcpu; cpu++) { KASSERT(pmc_pcpu[cpu]->pc_sb != NULL, ("[pmc,%d] Null cpu sample buffer cpu=%d", __LINE__, cpu)); @@ -4554,14 +4570,19 @@ PMCDBG(MOD,INI,3, "%s", "md cleanup"); if (md) { pmc_save_cpu_binding(&pb); - for (cpu = 0; cpu < mp_ncpus; cpu++) { + for (cpu = 0; cpu < maxcpu; cpu++) { PMCDBG(MOD,INI,1,"pmc-cleanup cpu=%d pcs=%p", cpu, pmc_pcpu[cpu]); - if (pmc_cpu_is_disabled(cpu)) + if (pmc_pcpu[cpu] == NULL) continue; - pmc_select_cpu(cpu); - if (pmc_pcpu[cpu]) - (void) md->pmd_cleanup(cpu); + if 
(pmc_cpu_is_active(cpu) && + md->pmd_cpu_cleanup != NULL) { + pmc_select_cpu(cpu); + (void) md->pmd_cpu_cleanup(cpu); + } + /* Do cleanup for inactive CPUs if any. */ + if (md->pmd_cleanup) + md->pmd_cleanup(cpu); } FREE(md, M_PMC); md = NULL; @@ -4602,8 +4623,8 @@ error = pmc_initialize(); if (error != 0) break; - PMCDBG(MOD,INI,1, "syscall=%d ncpus=%d", - pmc_syscall_num, mp_ncpus); + PMCDBG(MOD,INI,1, "syscall=%d maxcpu=%d", + pmc_syscall_num, pmc_cpu_max()); break; Index: sys/dev/hwpmc/hwpmc_piv.c =================================================================== RCS file: /cvs/FreeBSD/src/sys/dev/hwpmc/hwpmc_piv.c,v retrieving revision 1.15 diff -u -r1.15 hwpmc_piv.c --- sys/dev/hwpmc/hwpmc_piv.c 7 Dec 2007 08:20:15 -0000 1.15 +++ sys/dev/hwpmc/hwpmc_piv.c 16 Mar 2008 11:17:53 -0000 @@ -532,8 +532,8 @@ KASSERT(p4_escrdisp[(E)] <= 0, ("[p4,%d] row disposition error",\ __LINE__)); \ atomic_add_int(&p4_escrdisp[(E)], -1); \ - KASSERT(p4_escrdisp[(E)] >= (-mp_ncpus), ("[p4,%d] row " \ - "disposition error", __LINE__)); \ + KASSERT(p4_escrdisp[(E)] >= (-pmc_cpu_active_count()), \ + ("[p4,%d] row disposition error", __LINE__)); \ } while (0) #define P4_ESCR_UNMARK_ROW_STANDALONE(E) do { \ @@ -596,11 +596,11 @@ struct p4_logicalcpu *plcs; struct pmc_hw *phw; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] insane cpu number %d", __LINE__, cpu)); - PMCDBG(MDP,INI,0, "p4-init cpu=%d logical=%d", cpu, - pmc_cpu_is_logical(cpu) != 0); + PMCDBG(MDP,INI,0, "p4-init cpu=%d is-primary=%d", cpu, + pmc_cpu_is_primary(cpu) != 0); /* * The two CPUs in an HT pair share their per-cpu state. @@ -614,7 +614,7 @@ * secondary. 
*/ - if (pmc_cpu_is_logical(cpu) && (cpu & 1)) { + if (!pmc_cpu_is_primary(cpu) && (cpu & 1)) { p4_system_has_htt = 1; @@ -677,9 +677,20 @@ */ static int -p4_cleanup(int cpu) +p4_cpu_cleanup(int cpu) { int i; + + /* Turn off all PMCs on a primary CPU */ + if (!P4_CPU_IS_HTT_SECONDARY(cpu)) + for (i = 0; i < P4_NPMCS - 1; i++) + wrmsr(P4_CCCR_MSR_FIRST + i, 0); + return (0); +} + +static int +p4_cleanup(int cpu) +{ struct p4_cpu *pcs; PMCDBG(MDP,INI,0, "p4-cleanup cpu=%d", cpu); @@ -687,11 +698,6 @@ if ((pcs = (struct p4_cpu *) pmc_pcpu[cpu]) == NULL) return 0; - /* Turn off all PMCs on this CPU */ - for (i = 0; i < P4_NPMCS - 1; i++) - wrmsr(P4_CCCR_MSR_FIRST + i, - rdmsr(P4_CCCR_MSR_FIRST + i) & ~P4_CCCR_ENABLE); - /* * If the CPU is physical we need to teardown the * full MD state. @@ -761,7 +767,7 @@ struct pmc_hw *phw; pmc_value_t tmp; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P4_NPMCS, ("[p4,%d] illegal row-index %d", __LINE__, ri)); @@ -839,7 +845,7 @@ const struct pmc_hw *phw; const struct p4pmc_descr *pd; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[amd,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P4_NPMCS, ("[amd,%d] illegal row-index %d", __LINE__, ri)); @@ -913,7 +919,7 @@ struct p4_cpu *pc; int cfgflags, cpuflag; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] illegal CPU %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P4_NPMCS, ("[p4,%d] illegal row-index %d", __LINE__, ri)); @@ -1050,7 +1056,7 @@ struct p4_event_descr *pevent; const struct p4pmc_descr *pd; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] illegal CPU %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P4_NPMCS, ("[p4,%d] illegal row-index value %d", __LINE__, ri)); @@ -1297,7 +1303,7 @@ struct pmc_hw *phw; struct p4pmc_descr *pd; - 
KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P4_NPMCS, ("[p4,%d] illegal row-index %d", __LINE__, ri)); @@ -1449,7 +1455,7 @@ struct p4pmc_descr *pd; pmc_value_t tmp; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P4_NPMCS, ("[p4,%d] illegal row index %d", __LINE__, ri)); @@ -1722,7 +1728,7 @@ struct pmc_hw *phw; const struct p4pmc_descr *pd; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] illegal CPU %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P4_NPMCS, ("[p4,%d] row-index %d out of range", __LINE__, ri)); @@ -1791,6 +1797,8 @@ pmc_mdep->pmd_init = p4_init; pmc_mdep->pmd_cleanup = p4_cleanup; + pmc_mdep->pmd_cpu_init = NULL; + pmc_mdep->pmd_cpu_cleanup = p4_cpu_cleanup; pmc_mdep->pmd_switch_in = p4_switch_in; pmc_mdep->pmd_switch_out = p4_switch_out; pmc_mdep->pmd_read_pmc = p4_read_pmc; Index: sys/dev/hwpmc/hwpmc_ppro.c =================================================================== RCS file: /cvs/FreeBSD/src/sys/dev/hwpmc/hwpmc_ppro.c,v retrieving revision 1.10 diff -u -r1.10 hwpmc_ppro.c --- sys/dev/hwpmc/hwpmc_ppro.c 7 Dec 2007 08:20:15 -0000 1.10 +++ sys/dev/hwpmc/hwpmc_ppro.c 16 Mar 2008 10:48:06 -0000 @@ -336,7 +336,7 @@ struct p6_cpu *pcs; struct pmc_hw *phw; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p6,%d] bad cpu %d", __LINE__, cpu)); PMCDBG(MDP,INI,0,"p6-init cpu=%d", cpu); @@ -366,7 +366,7 @@ { struct pmc_cpu *pcs; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p6,%d] bad cpu %d", __LINE__, cpu)); PMCDBG(MDP,INI,0,"p6-cleanup cpu=%d", cpu); @@ -379,6 +379,21 @@ } static int +p6_cpu_cleanup(int cpu) +{ + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), + ("[p6,%d] bad cpu %d", __LINE__, cpu)); + + 
PMCDBG(MDP,INI,0,"p6-cpu-cleanup cpu=%d", cpu); + + /* Turn off PMCs. */ + wrmsr(P6_MSR_EVSEL1, 0); + wrmsr(P6_MSR_EVSEL0, 0); + + return (0); +} + +static int p6_switch_in(struct pmc_cpu *pc, struct pmc_process *pp) { (void) pc; @@ -512,7 +527,7 @@ (void) cpu; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p4,%d] illegal CPU %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P6_NPMCS, ("[p4,%d] illegal row-index value %d", __LINE__, ri)); @@ -616,7 +631,7 @@ PMCDBG(MDP,REL,1, "p6-release cpu=%d ri=%d pm=%p", cpu, ri, pm); - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p6,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P6_NPMCS, ("[p6,%d] illegal row-index %d", __LINE__, ri)); @@ -638,7 +653,7 @@ struct pmc_hw *phw; const struct p6pmc_descr *pd; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p6,%d] illegal CPU value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P6_NPMCS, ("[p6,%d] illegal row-index %d", __LINE__, ri)); @@ -682,7 +697,7 @@ struct pmc_hw *phw; struct p6pmc_descr *pd; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p6,%d] illegal cpu value %d", __LINE__, cpu)); KASSERT(ri >= 0 && ri < P6_NPMCS, ("[p6,%d] illegal row index %d", __LINE__, ri)); @@ -724,7 +739,7 @@ struct pmc_hw *phw; pmc_value_t v; - KASSERT(cpu >= 0 && cpu < mp_ncpus, + KASSERT(cpu >= 0 && cpu < pmc_cpu_max(), ("[p6,%d] CPU %d out of range", __LINE__, cpu)); retval = 0; @@ -847,6 +862,8 @@ pmc_mdep->pmd_init = p6_init; pmc_mdep->pmd_cleanup = p6_cleanup; + pmc_mdep->pmd_cpu_init = NULL; + pmc_mdep->pmd_cpu_cleanup = p6_cpu_cleanup; pmc_mdep->pmd_switch_in = p6_switch_in; pmc_mdep->pmd_switch_out = p6_switch_out; pmc_mdep->pmd_read_pmc = p6_read_pmc; Index: sys/kern/kern_pmc.c =================================================================== RCS file: /cvs/FreeBSD/src/sys/kern/kern_pmc.c,v retrieving revision 1.7 diff -u 
-r1.7 kern_pmc.c --- sys/kern/kern_pmc.c 7 Dec 2007 08:20:16 -0000 1.7 +++ sys/kern/kern_pmc.c 16 Mar 2008 10:50:12 -0000 @@ -1,5 +1,5 @@ /*- - * Copyright (c) 2003-2007 Joseph Koshy + * Copyright (c) 2003-2008 Joseph Koshy * Copyright (c) 2007 The FreeBSD Foundation * All rights reserved. * @@ -81,25 +81,95 @@ SYSINIT(pmcsx, SI_SUB_LOCK, SI_ORDER_MIDDLE, pmc_init_sx, NULL); /* - * Helper functions + * Helper functions. */ +/* + * A note on the CPU numbering scheme used by HWPMC. + * + * CPUs are denoted using numbers in the range 0..[pmc_cpu_max()-1]. + * CPUs could be numbered "sparsely" in this range; the pmc_cpu_is_present() + * predicate is used to test whether a given CPU exists. This is a + * runtime test in order to support hot-pluggable CPUs. + * + * A physically present CPU may be administratively disabled or + * otherwise unavailable for use by HWPMC. The pmc_cpu_is_active() + * predicate tests for CPU usability. + * + * On systems with hyperthreaded CPUs, multiple ``CPU''s may share PMC + * hardware resources. For such processors one ``CPU'' is denoted as + * the primary owner of the in-CPU PMC resources. The pmc_cpu_is_primary() + * predicate is used to distinguish this primary CPU from the others. + */ + +/* + * An `active' CPU is one which can be used for PMC operations. It + * should be participating in thread scheduling and should be able to + * field interrupts raised by PMC hardware. 
+ */ + +int +pmc_cpu_is_active(int cpu) +{ +#ifdef SMP + return (pmc_cpu_is_present(cpu) && + (hlt_cpus_mask & (1 << cpu)) == 0); +#else + return (1); +#endif +} + int -pmc_cpu_is_disabled(int cpu) +pmc_cpu_is_present(int cpu) { #ifdef SMP - return ((hlt_cpus_mask & (1 << cpu)) != 0); + return (!CPU_ABSENT(cpu)); #else - return 0; + return (1); #endif } int -pmc_cpu_is_logical(int cpu) +pmc_cpu_is_primary(int cpu) { #ifdef SMP - return ((logical_cpus_mask & (1 << cpu)) != 0); + return ((logical_cpus_mask & (1 << cpu)) == 0); #else - return 0; + return (1); #endif } + + +/* + * Return the maximum CPU number supported by the system. The return + * value is used for scaling internal data structures and for runtime + * checks. + */ + +unsigned int +pmc_cpu_max(void) +{ +#ifdef SMP + return (mp_maxid+1); +#else + return (1); +#endif +} + +#ifdef INVARIANTS + +/* + * Return the count of CPUs in the `active' state in the system. + */ + +int +pmc_cpu_active_count(void) +{ +#ifdef SMP + return (mp_ncpus); /* To be changed along with the base kernel. 
*/ +#else + return (1); +#endif +} + +#endif Index: sys/sys/pmc.h =================================================================== RCS file: /cvs/FreeBSD/src/sys/sys/pmc.h,v retrieving revision 1.14 diff -u -r1.14 pmc.h --- sys/sys/pmc.h 14 Jan 2008 06:33:41 -0000 1.14 +++ sys/sys/pmc.h 16 Mar 2008 10:25:35 -0000 @@ -871,6 +871,8 @@ int (*pmd_init)(int _cpu); /* machine dependent initialization */ int (*pmd_cleanup)(int _cpu); /* machine dependent cleanup */ + int (*pmd_cpu_init)(int _cpu); /* initialization for active CPUs */ + int (*pmd_cpu_cleanup)(int _cpu); /* cleanup for active CPUs */ /* thread context switch in/out */ int (*pmd_switch_in)(struct pmc_cpu *_p, struct pmc_process *_pp); Index: sys/sys/pmckern.h =================================================================== RCS file: /cvs/FreeBSD/src/sys/sys/pmckern.h,v retrieving revision 1.7 diff -u -r1.7 pmckern.h --- sys/sys/pmckern.h 7 Dec 2007 08:20:17 -0000 1.7 +++ sys/sys/pmckern.h 16 Mar 2008 09:14:06 -0000 @@ -124,8 +124,17 @@ /* Check if a CPU has recorded samples. */ #define PMC_CPU_HAS_SAMPLES(C) (__predict_false(pmc_cpumask & (1 << (C)))) -/* helper functions */ -int pmc_cpu_is_disabled(int _cpu); -int pmc_cpu_is_logical(int _cpu); +/* + * Helper functions. 
+ */ + +int pmc_cpu_is_active(int _cpu); +int pmc_cpu_is_present(int _cpu); +int pmc_cpu_is_primary(int _cpu); +unsigned int pmc_cpu_max(void); + +#ifdef INVARIANTS +int pmc_cpu_active_count(void); +#endif /* INVARIANTS */ #endif /* _SYS_PMCKERN_H_ */ --Multipart_Mon_Mar_17_12:13:32_2008-1 Content-Type: text/plain; charset=US-ASCII --Multipart_Mon_Mar_17_12:13:32_2008-1-- From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 08:53:53 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 35A3B106564A for ; Mon, 17 Mar 2008 08:53:53 +0000 (UTC) (envelope-from Arthur.Hartwig@nokia.com) Received: from mgw-fb01.nokia.com (mgw-fb01.nokia.com [192.100.122.235]) by mx1.freebsd.org (Postfix) with ESMTP id 94C2C8FC15 for ; Mon, 17 Mar 2008 08:53:52 +0000 (UTC) (envelope-from Arthur.Hartwig@nokia.com) Received: from mgw-mx03.nokia.com (mgw-mx03.nokia.com [192.100.122.230]) by mgw-fb01.nokia.com (Switch-3.2.6/Switch-3.2.6) with ESMTP id m2H8SfGv014908 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 17 Mar 2008 10:28:46 +0200 Received: from esebh107.NOE.Nokia.com (esebh107.ntc.nokia.com [172.21.143.143]) by mgw-mx03.nokia.com (Switch-3.2.6/Switch-3.2.6) with ESMTP id m2H8SWS9026220 for ; Mon, 17 Mar 2008 10:28:34 +0200 Received: from esebh104.NOE.Nokia.com ([172.21.143.34]) by esebh107.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 17 Mar 2008 10:28:27 +0200 Received: from syebe101.NOE.Nokia.com ([172.30.128.65]) by esebh104.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 17 Mar 2008 10:28:27 +0200 Received: from [172.30.67.77] ([172.30.67.77]) by syebe101.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 17 Mar 2008 19:28:23 +1100 Message-ID: <47DE2BA6.7080002@nokia.com> Date: Mon, 17 Mar 2008 18:28:22 +1000 From: Arthur Hartwig User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) MIME-Version: 1.0 To: FreeBSD Arch 
References: <20080315124008.GF80576@hoeg.nl> In-Reply-To: <20080315124008.GF80576@hoeg.nl> X-OriginalArrivalTime: 17 Mar 2008 08:28:23.0414 (UTC) FILETIME=[DD4F5960:01C88808] X-Nokia-AV: Clean Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Some devfs and tty issues X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 08:53:53 -0000

I'm developing a driver for a USB hardware modem. I want to allow dialup logins from the modem. The driver currently allows me to use cu to set and examine modem parameters, make outgoing calls and converse with the called system. I'm more interested in allowing logins over an incoming call to the modem. The driver is modeled on the uplcom driver.

If I plug in the modem and start getty on the ttyU0 device (# /usr/libexec/getty std.9600 ttyU0) and then pull the adapter out of the USB socket, a crash follows. I'm happy to file a PR and supply stack traces or otherwise assist in debugging. A recent message to this list and its replies suggest others have also found the interactions between the tty driver and devfs to be somewhat obscure, so I'm posting this in the hope that some other eyes or old hands might be able to point me to something I've missed.

These remarks apply to FreeBSD 6.3-RELEASE.

In destroy_devl() in kern_conf.c I think the call to devfs_destroy() appears too early in the function. The following scenario in destroy_devl() is possible:

1. In process A, devfs_destroy() in fs/devfs/devfs_devs.c is called, clearing CDP_ACTIVE.
2. msleep() is called; devmtx is released.
3. A context switch to process B occurs; B issues an open(), which results in a call to devfs_populate(), which calls devfs_populate_loop(), which finds CDP_ACTIVE clear and calls dev_rel(), which results in the device structure getting freed.
4. The sleep time expires and process A resumes, but it retains a pointer to the now-freed device structure.

The devfs_destroy() call would be better moved to somewhere towards the end of destroy_devl(), say after SI_ALIAS is cleared. The dev structure is still safe to reference after calling devfs_destroy() because the devmtx mutex is still held, preventing the freeing of the dev structure.

After I moved the devfs_destroy() call down past the msleep() call I could still provoke a problem with the following steps:

1. Plug in USB modem
2. # cu -l cuaU0 to set modem parameters to auto answer
3. kill cu
4. dial in from PSTN
5. remove USB modem from USB socket.

Now the kernel repeatedly reports:

Purging 4294967245 threads from cuaU0

I expect it will take a long while to purge that many threads :-) At least longer than I was prepared to wait. When this message was being output, the si_threadcount field of the cuaU0 cdev structure contained 0xffffffcd while much of the rest of the structure contained sensible looking values.

On repeating the scenario I observed:

All threads purged from cuaU0
Purging 4294967232 threads from ttyU0
Purging 4294967231 threads from ttyU0

On a second repeat of the same scenario, the cuaU0 cdev structure had 0xffffffd3 in the si_threadcount field when it was passed to destroy_devl().

Tomorrow I'll set hardware watchpoints on the si_threadcount field of the cdev structures for both ttyU0 and cuaU0 in an attempt to catch where they are being modified to these extravagant values.
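A note on the counts reported above: they are easier to read as signed values. Assuming si_threadcount is held in a 32-bit unsigned counter on this machine (an assumption based only on the printed values, not checked against the 6.3 sources), each figure corresponds to a small negative number, which would be consistent with the counter being decremented more times than it was incremented. The sketch below only does that arithmetic; as_signed32() and check_reported_counts() are hypothetical names used for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): map a 32-bit unsigned
 * counter value to the signed value that has the same bit pattern.
 * The result is widened to 64 bits so the subtraction cannot overflow.
 */
int64_t
as_signed32(uint32_t v)
{
	if (v > (uint32_t)INT32_MAX)
		return ((int64_t)v - ((int64_t)UINT32_MAX + 1));
	return ((int64_t)v);
}

/* The four values quoted in the report, checked against their signed reading. */
void
check_reported_counts(void)
{
	assert(as_signed32(4294967245u) == -51);	/* 0xffffffcd, cuaU0 */
	assert(as_signed32(0xffffffd3u) == -45);	/* cuaU0, second repeat */
	assert(as_signed32(4294967232u) == -64);	/* ttyU0, first message */
	assert(as_signed32(4294967231u) == -65);	/* ttyU0, next message */
}
```

Read this way, the ttyU0 values move from -64 to -65 between two consecutive messages, so the count appears to keep dropping while the purge loop runs. That is only an inference from the log lines quoted above; the hardware watchpoints should show which code path does the extra decrement.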
Arthur

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 11:07:00 2008 Return-Path: Delivered-To: freebsd-arch@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A905F106564A for ; Mon, 17 Mar 2008 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 861928FC38 for ; Mon, 17 Mar 2008 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2HB70fN055039 for ; Mon, 17 Mar 2008 11:07:00 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m2HB6xbN055035 for freebsd-arch@FreeBSD.org; Mon, 17 Mar 2008 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 17 Mar 2008 11:06:59 GMT Message-Id: <200803171106.m2HB6xbN055035@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-arch@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 11:07:00 -0000

Current FreeBSD problem reports

Critical problems

Serious problems

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.
From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 13:30:30 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9A4781065674 for ; Mon, 17 Mar 2008 13:30:30 +0000 (UTC) (envelope-from csjp@sub.vaned.net) Received: from sub.vaned.net (sub.vaned.net [205.200.235.40]) by mx1.freebsd.org (Postfix) with ESMTP id 666658FC1E for ; Mon, 17 Mar 2008 13:30:30 +0000 (UTC) (envelope-from csjp@sub.vaned.net) Received: by sub.vaned.net (Postfix, from userid 1001) id 350A72E1; Mon, 17 Mar 2008 08:30:29 -0500 (CDT) Date: Mon, 17 Mar 2008 08:30:29 -0500 From: "Christian S.J. Peron" To: freebsd-current@freebsd.org Message-ID: <20080317133029.GA19369@sub.vaned.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i Cc: arch@freebsd.org Subject: HEADS UP: zerocopy bpf commits impending X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 13:30:30 -0000

All,

Just wanted to give a heads up that I plan to start merging the work located in the zerocopy bpf perforce branch. We have been working on this project for about a year now and feel that it is ready to come into the tree.

I will begin to merge hopefully today [assuming nobody has any concerns] or tomorrow. Zerocopy bpf will be disabled by default, and can be enabled globally through the use of a sysctl variable. Once the kernel bits are in and we sort out a couple minor nits in libpcap+tcpdump, we will be looking at getting our libpcap patches committed upstream. I will post a patch for people to experiment with in the meantime after the kernel commits are complete.

We do not anticipate this will have any effect on existing bpf consumers like libpcap, tcpdump etc... so if something breaks, it shouldn't have and we need to know about it :) We were pretty careful about preserving the ABI. The only exception to this is that netstat will need a recompile because the size of its bpf stats structure changed.

So if there are any objections or concerns, now is the time to raise them.

Thanks

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 13:46:01 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9EB69106564A; Mon, 17 Mar 2008 13:46:01 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 6BF4E8FC24; Mon, 17 Mar 2008 13:46:01 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 0F18E46C7D; Mon, 17 Mar 2008 09:46:01 -0400 (EDT) Date: Mon, 17 Mar 2008 13:46:00 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: "Christian S.J. Peron" In-Reply-To: <20080317133029.GA19369@sub.vaned.net> Message-ID: <20080317134335.A3253@fledge.watson.org> References: <20080317133029.GA19369@sub.vaned.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, freebsd-current@freebsd.org Subject: Re: HEADS UP: zerocopy bpf commits impending X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 13:46:01 -0000

On Mon, 17 Mar 2008, Christian S.J. Peron wrote:

> Just wanted to give a heads up that I plan to start merging the work located
> in the zerocopy bpf perforce branch. We have been working on this project
> for about a year now and feel that it is ready to come into the tree.
>
> I will begin to merge hopefully today [assuming nobody has any concerns] or
> tomorrow. Zerocopy bpf will be disabled by default, and can be enabled
> globally through the use of a sysctl variable. Once the kernel bits are in
> and we sort out a couple minor nits in libpcap+tcpdump, we will be
> looking at getting our libpcap patches committed upstream. I will post a
> patch for people to experiment with in the meantime after the kernel commits
> are complete.
>
> We do not anticipate this will have any effect on existing bpf consumers
> like libpcap, tcpdump etc... so if something breaks, it shouldn't have and
> we need to know about it :) We were pretty careful about preserving the ABI.
> The only exception to this is that netstat will need a recompile because the
> size of its bpf stats structure changed.
>
> So if there are any objections or concerns, now is the time to raise them.

Per previous posts, interested parties can find the slides on the design from the BSDCan 2007 developer summit here:

http://www.watson.org/~robert/freebsd/2007bsdcan/20070517-devsummit-zerocopybpf.pdf

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 13:55:32 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 129571065677 for ; Mon, 17 Mar 2008 13:55:32 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id E6BFE8FC2C for ; Mon, 17 Mar 2008 13:55:31 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from zion.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by elvis.mu.org (Postfix) with ESMTP id 88FBB1A4D7C; Mon, 17 Mar 2008 06:54:17 -0700 (PDT) From: John Baldwin To: "Joseph Koshy" Date: Mon, 17 Mar 2008 09:47:25 -0400 User-Agent: KMail/1.9.7 References:
<20080313180805.GA83406@dragon.NUXI.org> <200803141431.53846.jhb@freebsd.org> <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com> In-Reply-To: <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200803170947.25205.jhb@freebsd.org> Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 13:55:32 -0000

On Saturday 15 March 2008 01:43:00 am Joseph Koshy wrote:
> > FreeBSD has been trying to not be quite as i386-centric as it used to
> > be. If you look at other code in the kernel that handles per-cpu data
> > such as UMA you will see that it uses mp_maxid and CPU_ABSENT(). There
> > are other places in the kernel that are broken though (such as ndis(4)).
>
> HWPMC is very x86 centric, for obvious reasons.

Considering other CPU architectures support various performance counters it really shouldn't be designed to be x86-centric even if it is currently only implemented for x86 CPUs.

> > > - Will sysctl hw.ncpus represent the count of present CPUs or will it
> > > represent the maximum CPU id?
> >
> > hw.ncpus is always mp_ncpus
> > kern.smp.cpus is also mp_ncpus
> > kern.smp.maxcpus is MAX_CPUS.
> >
> > Userland can just iterate from 0 to kern.smp.maxcpus while handling
> > absent CPUs. (For example, the kern.cp_time[] sysctl just writes out all
> > 0's for absent CPUs so that is how userland can determine an absent CPU
> > in that case.)
>
> I thought of that. For PMCTools use, using the proposed 'online_cpus'
> mask would be a better option. MAX_CPUS is a compile time value and could
> be large, whereas most machines will have far fewer CPUs than that limit.
> Why waste cycles needlessly?

Userland cycles are "cheaper". :) I think having both is fine and userland can choose which to use (maxcpus is probably easier to impl but perhaps less efficient).

> Now it appears to me that in the scheme of things described
> above one of mp_maxid and mp_ncpus is superfluous.
>
> Here is the reasoning:
>
> 0) We need a compile time limit for the kernel; this is kern.smp.maxcpus.
>
> 1) A given machine has a maximum number of CPUs that can fit in it.
> This is usually <<= MAXCPUS. Let us call this {MACHINE-MAX}.
> We need to scale kernel data structures based on {MACHINE-MAX}
> since using {MAXCPUS} is probably wasteful. We cannot just count the
> current number of CPUS, as we do today, because more could be
> hotplugged in later.
>
> 2) At any given instant a subset of CPUs 0..{MACHINE_MAX} will be
> online. This would be tracked by the kern.smp.online_cpus/all_cpus
> bitmask.
>
> Therefore we can use either a count (mp_ncpus) or a maximum id
> (mp_maxid) to represent {MACHINE-MAX}, but either one would do.
>
> However, x86 MD code uses both, with newer code seeming to prefer
> mp_maxid. So I am puzzled. There are far more uses of mp_ncpus
> there though.

The mp_ncpus uses are mostly bugs (e.g. ndis). I think mp_ncpus's primary use is for userland so people can do:

make -j $(sysctl -n hw.ncpu)

or the like. That is, if you need a simple count of CPUs in the system, that is what mp_ncpus is for. If you need to address individual CPUs by ID, then mp_ncpus is not appropriate and you need to iterate from 0 to mp_maxid subject to some bitmask (e.g. all_cpus via CPU_ABSENT, or a not-yet-implemented online_cpus with CPU_ONLINE/CPU_OFFLINE wrappers).

> jk> Changing HWPMC and its userland before the base kernel itself
> jk> changes does not seem to be the right thing to do.
> > jb> While the userland intIerface is somewhat lacking, all of the > in-kernel jb> infrastructure has been in place for at least the past 4 > years, and there is > jb> no excuse for any in-kernel code not properly handling sparse CPU IDs. > > I try keep userland, kernel and documentation associated with PmcTools > in sync. > > Looking around, there appear to be lots of nits that need correction. > For one, the kern.smp sysctl hierarchy is undocumented. Not entirely: > sysctl -d kern.smp kern.smp: Kernel SMP kern.smp.maxcpus: Max number of CPUs that the system was compiled for. kern.smp.active: Number of Auxillary Processors (APs) that were successfully started kern.smp.disabled: SMP has been disabled from the loader kern.smp.cpus: Number of CPUs online (On a UP 6.3 box) -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 14:00:41 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C1321065672 for ; Mon, 17 Mar 2008 14:00:41 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 045678FC2E for ; Mon, 17 Mar 2008 14:00:40 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 622FA1CC50; Mon, 17 Mar 2008 15:00:39 +0100 (CET) Date: Mon, 17 Mar 2008 15:00:39 +0100 From: Ed Schouten To: Bruce Evans Message-ID: <20080317140039.GJ80576@hoeg.nl> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="3TQuZyvpu40ebvIM" Content-Disposition: inline In-Reply-To: <20080316015903.N39516@delplex.bde.org> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: FreeBSD Arch Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads? 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 14:00:43 -0000 --3TQuZyvpu40ebvIM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello Bruce, * Bruce Evans wrote: > On Sat, 15 Mar 2008, Ed Schouten wrote: > >> The last couple of days I'm seeing some strange things in my mpsafetty >> branch related to terminal revocation. >> >> In my current TTY design, I hold a count (t_ldisccnt) of the amount of >> threads that are sleeping in the line discipline. I need to store such a >> count, because it's not possible to change line disciplines while some >> threads are still blocked inside the discipline. This means that when >> d_close() is called on a TTY, t_ldisccnt should always be 0. There >> cannot be any threads stuck inside the line discipline when there aren't >> any descriptors referencing it. >> >> Unfortunately, this isn't entirely true with the current VFS/devfs >> design. When vgone() is called, a VOP_CLOSE() is performed , which means >> there could be a dozen threads still stuck inside a device driver, but >> the close routine is already called to clean up stuff. There are a >> *real* lot of drivers that blindly clean up their stuff in the d_close() >> routine, expecting that the device is completely unused. This can >> easily be demonstrated by revoking a bpf device, while running tcpdump. > > Yes, most drivers are broken here, but the problem is rarely noticed > because revoke() isn't normally applied to any devices except ttys. > Even ordinary close() can cause problems when a thread is sleeping > in device open, but this too is only common for ttys (for callin and > callout devices). I think I could in theory work around these crashes by using the si_threadcount, but I'd rather not. 
Even though revoke() is only used on TTY devices, I don't think that's a valid reason to allow FreeBSD to crash in such cases. > [...] > >> - Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the >> vnode into deadfs, with the exception of the close() routine. Maybe >> it's better to add a new function to do this, vrevoke(). >> >> This means that when a revoke() call is performed, all blocked threads >> are woken up, will leave the driver, to find out their terminal has been >> revoked. Further system calls will fail, because the vnode is in deadfs, >> but when the processes close the descriptor, the device driver can still >> clean up everything. > > I think vfs already moves the vnode to deadfs. It doesn't do anything > to synchronize with threads running in device drivers. The forced > last-close() should complete synchronously as part of revoke(). Then > other threads leave the device driver asynchronously, hopefully not > much later. Then if the generation count stuff is working right, the > syscall is restarted, but now file descriptors point to deadfs so the > syscall normally fails. I think the async completion is OK provided > it is done right (don't delay it indefinitely, and don't do more > i/o on completion). It doesn't seem to be useful to make revoke() > wait for the completions. > > I don't think it would work well to move everything except d_close to > deadfs. It wasn't my idea to make revoke() wait for all threads to leave. It should just inform the device driver that a revoke() has been performed, to wake up sleeping threads, and change the vnode to prevent further access. The problem with the current implementation is that the device driver cannot sanely determine whether a revoke() or a real close() is called. 
Especially in my new TTY design, where a TTY could even be deallocated when a close() is performed - when the device driver has abandoned the TTY device - it would even destroy the TTY object that's being used by the sleeping threads. This is why I chose an approach that would allow threads to just leave the device driver as they normally would, which reduces complexity a lot. My question is: what approach would you take in such a situation? Thanks for your input so far. -- Ed Schouten WWW: http://g-rave.nl/ --3TQuZyvpu40ebvIM Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEUEARECAAYFAkfeeYcACgkQ52SDGA2eCwVhVgCfT5cXVGQT6eR4X0WM+37Sonwc 2A4AmLEMa5fRSC8ZxkEP4rf1zEB6gwc= =BhbG -----END PGP SIGNATURE----- --3TQuZyvpu40ebvIM-- From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 14:37:05 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E47251065672 for ; Mon, 17 Mar 2008 14:37:05 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id A82C88FC16 for ; Mon, 17 Mar 2008 14:37:05 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 1E6B017105 for ; Mon, 17 Mar 2008 14:37:03 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2HEb3wC003861 for ; Mon, 17 Mar 2008 14:37:03 GMT (envelope-from phk@critter.freebsd.dk) To: arch@freebsd.org From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 17 Mar 2008 14:18:47 GMT."
<20080317141717.U3253@fledge.watson.org> Date: Mon, 17 Mar 2008 14:37:03 +0000 Message-ID: <3860.1205764623@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: Subject: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 14:37:06 -0000

In message <20080317141717.U3253@fledge.watson.org>, Robert Watson writes:
>If cpufreq is going to be enabled by default, should we be enabling powerd by
>default [...]

[Moved to arch@]

In general, I think we must make power-aware computing our "next SMPng project", not in the sense of delaying the next major release five years, but in the sense that power consumption should permeate our thinking about the operating system from now on.

Overall, I think that means that we should:

* Enable performance-neutral power savings on servers
  - spin down unused disks. (geom/drivers)
  - use only as many CPU cores as necessary (scheduler)
  - light cpu-throttling.
  - downgrading 1GB to 100MB ether when idle.

* Aim to meet or exceed energystar 4.0/5.0[1] on desktops and plugged laptops.
  - Pretty much as above, but with specific targets.
  - http://www.energystar.gov/index.cfm?c=revisions.computer_spec

* Be as battery-frugal as possible on battery driven laptops.
  - Any trick in and off the book.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 14:42:53 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CB2A71065670; Mon, 17 Mar 2008 14:42:53 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id 4529F8FC31; Mon, 17 Mar 2008 14:42:53 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id m2HEgq03048108; Mon, 17 Mar 2008 09:42:52 -0500 (CDT) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.2/8.14.2/Submit) id m2HEgqh5048107; Mon, 17 Mar 2008 09:42:52 -0500 (CDT) (envelope-from brooks) Date: Mon, 17 Mar 2008 09:42:52 -0500 From: Brooks Davis To: John Baldwin Message-ID: <20080317144251.GA38485@lor.one-eyed-alien.net> References: <20080313180805.GA83406@dragon.NUXI.org> <200803141431.53846.jhb@freebsd.org> <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com> <200803170947.25205.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="82I3+IH0IqGh5yIs" Content-Disposition: inline In-Reply-To: <200803170947.25205.jhb@freebsd.org> User-Agent: Mutt/1.5.17 (2007-11-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Mon, 17 Mar 2008 09:42:52 -0500 (CDT) Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 14:42:53 -0000 --82I3+IH0IqGh5yIs Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Mar 17, 2008 at 09:47:25AM -0400, John Baldwin wrote: > On Saturday 15 March 2008 01:43:00 am Joseph Koshy wrote: > > > FreeBSD has been trying to not be quite as i386-centric as it used to > > > be. If you look at other code in the kernel that handles per-cpu data > > > such as UMA you will see that it uses mp_maxid and CPU_ABSENT(). There > > > are other places in the kernel that are broken though (such as ndis(4)). > > > > HWPMC is very x86 centric, for obvious reasons. > > Considering other CPU architectures support various performance counters it > really shouldn't be designed to be x86-centric even if it is currently only > implemented for x86 CPUs. We should take some care to make sure we don't overgeneralize. From what I've heard, the people who wrote the performance counter framework for x86 in Linux were very, very unhappy when told to rework everything to support a framework that went with ia64's exponentially more complex instrumentation. If we can make small changes to support more conventional non-x86 platforms, that's probably a good idea. If nothing else, these counters could be even more useful on CPU-poor embedded devices.
-- Brooks --82I3+IH0IqGh5yIs Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iD8DBQFH3oNrXY6L6fI4GtQRAjbYAKC+QcDLzwdkxSwfiHwvcfSLL7iOOgCgidXe juBHKIAfKToVaSnCGmVsI4Y= =99LI -----END PGP SIGNATURE----- --82I3+IH0IqGh5yIs-- From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 18:19:23 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5B277106566B for ; Mon, 17 Mar 2008 18:19:23 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outF.internet-mail-service.net (outF.internet-mail-service.net [216.240.47.229]) by mx1.freebsd.org (Postfix) with ESMTP id 46FE78FC1A for ; Mon, 17 Mar 2008 18:19:23 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Mon, 17 Mar 2008 11:19:21 -0700 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 45C582D6018; Mon, 17 Mar 2008 11:19:20 -0700 (PDT) Message-ID: <47DEB62A.4030301@elischer.org> Date: Mon, 17 Mar 2008 11:19:22 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Robert Watson References: <20080317133029.GA19369@sub.vaned.net> <20080317134335.A3253@fledge.watson.org> In-Reply-To: <20080317134335.A3253@fledge.watson.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, freebsd-current@freebsd.org, "Christian S.J. 
Peron" Subject: Re: HEADS UP: zerocopy bpf commits impending X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 18:19:23 -0000 Robert Watson wrote: > On Mon, 17 Mar 2008, Christian S.J. Peron wrote: > >> Just wanted to give a heads up that I plan to start merging the work >> located in the zerocopy bpf perforce branch. We have been working on >> this project for about a year now and feel that it is ready to come >> into the tree. >> >> I will begin to merge hopefully today [assuming nobody has any >> concerns] or tommorow. Zerocopy bpf will be disabled by default, and >> can be enabled globally though the use of a sysctl variable. Once the >> kernel bits are in and we sort out a couple minor nits in >> libpcap+tcpdump, we will be be looking at getting our libpcap patches >> committed upstream. I will post a patch for people to experiment with >> in the meantime after the kernel commits are complete. >> >> We do not anticipate this will have any effect on existing bpf >> consumers like libpcap, tcpdump etc... so if something breaks, it >> shouldn't have and we need to know about :) We were pretty careful >> about preserving the ABI. The only exception to this is, netstat will >> need a recompile because the size of it's bpf stats structure changed. >> >> So if there are any objections or concerns, now is the time to raise >> them. 
> > Per previous posts, interested parties can find the slides on the design > from the BSDCan 2008 developer summit here: > > > http://www.watson.org/~robert/freebsd/2007bsdcan/20070517-devsummit-zerocopybpf.pdf with the video of the talk at: http://www.freebsd.org/~julian/BSDCan-2007/rwatson_bpf.mov > > > Robert N M Watson > Computer Laboratory > University of Cambridge > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 18:45:53 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AFFA4106566B; Mon, 17 Mar 2008 18:45:53 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2E6C28FC13; Mon, 17 Mar 2008 18:45:52 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 1F98B46C79; Mon, 17 Mar 2008 14:45:52 -0400 (EDT) Date: Mon, 17 Mar 2008 18:45:52 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Julian Elischer In-Reply-To: <47DEB62A.4030301@elischer.org> Message-ID: <20080317183024.I80049@fledge.watson.org> References: <20080317133029.GA19369@sub.vaned.net> <20080317134335.A3253@fledge.watson.org> <47DEB62A.4030301@elischer.org> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="621616949-2070634317-1205779552=:80049" Cc: arch@freebsd.org, freebsd-current@freebsd.org, "Christian S.J. 
Peron" Subject: Re: HEADS UP: zerocopy bpf commits impending X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 18:45:53 -0000

This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools.

--621616949-2070634317-1205779552=:80049 Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE

On Mon, 17 Mar 2008, Julian Elischer wrote:

>> Per previous posts, interested parties can find the slides on the design
>> from the BSDCan 2008 developer summit here:
>>
>>
>> http://www.watson.org/~robert/freebsd/2007bsdcan/20070517-devsummit-zerocopybpf.pdf
>
> with the video of the talk at:
>
> http://www.freebsd.org/~julian/BSDCan-2007/rwatson_bpf.mov

The primary design change since that time is that we've eliminated the ioctl-driven monitoring and ACKing of shared memory buffers from userspace. All shared memory consumers must use the shared memory ACK model, and our libpcap changes do that. This removes redundancy (and complexity) from the set of ioctls we've added. I've attached the (new) text from bpf.4 below, which I think captures the changes best.

Robert N M Watson
Computer Laboratory
University of Cambridge

BUFFER MODES
     bpf devices deliver packet data to the application via memory buffers
     provided by the application. The buffer mode is set using the
     BIOCSETBUFMODE ioctl, and read using the BIOCGETBUFMODE ioctl.

   Buffered read mode
     By default, bpf devices operate in the BPF_BUFMODE_BUFFER mode, in which
     packet data is copied explicitly from the kernel to user memory using the
     read(2) system call.
     The user process will declare a fixed buffer size
     that will be used both for sizing internal buffers and for all read(2)
     operations on the file. This size is queried using the BIOCGBLEN ioctl,
     and is set using the BIOCSBLEN ioctl. Note that an individual packet
     larger than the buffer size is necessarily truncated.

   Zero-copy buffer mode
     bpf devices may also operate in the BPF_BUFMODE_ZEROCOPY mode, in which
     packet data is written directly into user memory buffers by the kernel,
     avoiding both system call and copying overhead. Buffers are of fixed
     (and equal) size, page-aligned, and an even multiple of the page size.
     The maximum zero-copy buffer size is returned by the BIOCGETZMAX ioctl.
     Note that an individual packet larger than the buffer size is necessarily
     truncated.

     The user process registers two memory buffers using the BIOCSETZBUF
     ioctl, which accepts a struct bpf_zbuf pointer as an argument:

     struct bpf_zbuf {
             void *bz_bufa;
             void *bz_bufb;
             size_t bz_buflen;
     };

     bz_bufa is a pointer to the userspace address of the first buffer that
     will be filled, and bz_bufb is a pointer to the second buffer. bpf will
     then cycle between the two buffers starting with bz_bufa.

     Each buffer begins with a fixed-length header to hold synchronization
     and data length information for the buffer:

     struct bpf_zbuf_header {
             volatile u_int bzh_kernel_gen; /* Kernel generation number. */
             volatile u_int bzh_kernel_len; /* Length of data in the buffer. */
             volatile u_int bzh_user_gen;   /* User generation number. */
             /* ...padding for future use... */
     };

     The header structure of each buffer, including all padding, should be
     zeroed before it is passed to the ioctl. Remaining space in the buffer
     will be used by the kernel to store packet data, laid out in the same
     format as with buffered read mode.
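As a rough illustration of the registration step described above, here is a sketch (not taken from the patch) of how a user process might prepare the two buffers before handing them to the kernel. The helper name zbuf_prepare() is made up for this example, the two structures are local copies of the definitions quoted above so that the code builds without FreeBSD's headers, the header padding is left out, and the use of posix_memalign(3) for page-aligned memory is only an assumption about one way to obtain such buffers.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Local copies of the structures quoted in the text (illustration only). */
struct bpf_zbuf {
	void	*bz_bufa;
	void	*bz_bufb;
	size_t	 bz_buflen;
};

struct bpf_zbuf_header {
	volatile unsigned int bzh_kernel_gen;	/* Kernel generation number. */
	volatile unsigned int bzh_kernel_len;	/* Length of data in the buffer. */
	volatile unsigned int bzh_user_gen;	/* User generation number. */
	/* Padding omitted in this sketch. */
};

/*
 * Hypothetical helper: allocate two equally sized, page-aligned buffers
 * of `pages` whole pages each and zero them, which also zeroes the
 * headers so that both generation numbers start out equal.
 * Returns 0 on success and -1 on failure.
 */
static int
zbuf_prepare(struct bpf_zbuf *bz, size_t pages)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	size_t len;

	assert(bz != NULL);
	if (pagesize <= 0 || pages == 0 || pages > SIZE_MAX / (size_t)pagesize)
		return (-1);
	len = pages * (size_t)pagesize;

	if (posix_memalign(&bz->bz_bufa, (size_t)pagesize, len) != 0)
		return (-1);
	if (posix_memalign(&bz->bz_bufb, (size_t)pagesize, len) != 0) {
		free(bz->bz_bufa);
		return (-1);
	}
	/* posix_memalign() guarantees the requested alignment. */
	assert((uintptr_t)bz->bz_bufa % (uintptr_t)pagesize == 0);
	assert((uintptr_t)bz->bz_bufb % (uintptr_t)pagesize == 0);

	memset(bz->bz_bufa, 0, len);
	memset(bz->bz_bufb, 0, len);
	bz->bz_buflen = len;
	return (0);
}
```

On FreeBSD one would presumably then pass the filled-in struct bpf_zbuf to the BIOCSETZBUF ioctl on an open bpf descriptor, after checking bz_buflen against the limit reported by BIOCGETZMAX; that part is left out here because it needs a real bpf device.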
     The kernel and the user process follow a simple acknowledgement protocol
     via the buffer header to synchronize access to the buffer: when the
     header generation numbers, bzh_kernel_gen and bzh_user_gen, hold the same
     value, the kernel owns the buffer, and when they differ, userspace owns
     the buffer.

     While the kernel owns the buffer, the contents are unstable and may
     change asynchronously; while the user process owns the buffer, its
     contents are stable and will not be changed until the buffer has been
     acknowledged.

     Initializing the buffer headers to all 0's before registering the
     buffer has the effect of assigning initial ownership of both buffers to
     the kernel. The kernel signals that a buffer has been assigned to
     userspace by modifying bzh_kernel_gen, and userspace acknowledges the
     buffer and returns it to the kernel by setting the value of bzh_user_gen
     to the value of bzh_kernel_gen.

     In order to avoid caching and memory re-ordering effects, the user
     process must use atomic operations and memory barriers when checking for
     and acknowledging buffers:

     #include <machine/atomic.h>

     /*
      * Return ownership of a buffer to the kernel for reuse.
      */
     static void
     buffer_acknowledge(struct bpf_zbuf_header *bzh)
     {

             atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
     }

     /*
      * Check whether a buffer has been assigned to userspace by the kernel.
      * Return true if userspace owns the buffer, and false otherwise.
      */
     static int
     buffer_check(struct bpf_zbuf_header *bzh)
     {

             return (bzh->bzh_user_gen !=
                 atomic_load_acq_int(&bzh->bzh_kernel_gen));
     }

     The user process may force the assignment of the next buffer, if any data
     is pending, to userspace using the BIOCROTZBUF ioctl.
     This allows the user process to retrieve data in a partially filled
     buffer before the buffer is full, such as following a timeout; the
     process must check for buffer ownership using the header generation
     numbers, as the buffer will not be assigned if no data was present.

     As in the buffered read mode, kqueue(2), poll(2), and select(2) may be
     used to sleep awaiting the availability of a completed buffer. They will
     return a readable file descriptor when ownership of the next buffer is
     assigned to user space.

     In the current implementation, the kernel will assign ownership of at
     most one buffer at a time to the user process. The user process must
     acknowledge the current buffer in order to be notified that the next
     buffer is ready for processing. Programs should not rely on this as an
     invariant, as it may change in future versions.

--621616949-2070634317-1205779552=:80049--

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 20:18:35 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 491EF1065686 for ; Mon, 17 Mar 2008 20:18:35 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A003B8FC1C; Mon, 17 Mar 2008 20:18:34 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <47DED21C.4070108@FreeBSD.org> Date: Mon, 17 Mar 2008 21:18:36 +0100 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Poul-Henning Kamp References: <3860.1205764623@critter.freebsd.dk> In-Reply-To: <3860.1205764623@critter.freebsd.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related
to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 20:18:35 -0000 Poul-Henning Kamp wrote: > In message <20080317141717.U3253@fledge.watson.org>, Robert Watson writes: > >> If cpufreq is going to be enabled by default, should we be enabling powerd by >> default [...] > > [Moved to arch@] > > In general, I think we must make power-aware computing our "next > SMPng project", not in the sense of delaying the next major release > five years, but in the sense that power consumption should permerate > our thinking about the operating system from now on. > > Overall, I think that means that we should: > > * Enable performance neutral power savings on servers > - spin down unused disks. (geom/drivers) > - use only as many CPU cores as necessary (scheduler) > - light cpu-throttling. > - downgrading 1GB to 100MB ether when idle. > > * Aim to meet or execeed energystar 4.0/5.0[1] on desktops and > plugged laptops. > - Pretty much as above, but with specific targets. > - http://www.energystar.gov/index.cfm?c=revisions.computer_spec > > * Be as battery-frugal as possible on battery driven laptops. > - Any trick in and off the book. I think this is a great idea, but one of the big problems is probably going to be dealing with hardware quirks. e.g. we can't even enable powerd by default because e.g. acpi_throttle hangs on some systems. It might be tricky to get power management to the stage where it works for everyone and can be done automatically. 
Kris From owner-freebsd-arch@FreeBSD.ORG Mon Mar 17 20:21:26 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8DA2B1065677; Mon, 17 Mar 2008 20:21:26 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id CB3E18FC40; Mon, 17 Mar 2008 20:21:25 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 9D13F17105; Mon, 17 Mar 2008 20:21:23 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2HKLNJ8005754; Mon, 17 Mar 2008 20:21:23 GMT (envelope-from phk@critter.freebsd.dk) To: Kris Kennaway From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 17 Mar 2008 21:18:36 +0100." <47DED21C.4070108@FreeBSD.org> Date: Mon, 17 Mar 2008 20:21:22 +0000 Message-ID: <5753.1205785282@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 20:21:27 -0000 In message <47DED21C.4070108@FreeBSD.org>, Kris Kennaway writes: >I think this is a great idea, but one of the big problems is probably >going to be dealing with hardware quirks. e.g. we can't even enable >powerd by default because e.g. acpi_throttle hangs on some systems. It >might be tricky to get power management to the stage where it works for >everyone and can be done automatically. I'd expect that this will improve over time, just like all other technologies from ISA to PCI bus implementations did. 
But yes, it will take time & effort, but given the current cleantech/greentech buzz, I think we'd better get moving. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 00:12:06 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7CC241065671 for ; Tue, 18 Mar 2008 00:12:06 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C75D38FC1E; Tue, 18 Mar 2008 00:12:05 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <47DF08D8.2070608@FreeBSD.org> Date: Tue, 18 Mar 2008 01:12:08 +0100 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Poul-Henning Kamp References: <5753.1205785282@critter.freebsd.dk> In-Reply-To: <5753.1205785282@critter.freebsd.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 00:12:06 -0000 Poul-Henning Kamp wrote: > In message <47DED21C.4070108@FreeBSD.org>, Kris Kennaway writes: > >> I think this is a great idea, but one of the big problems is probably >> going to be dealing with hardware quirks. e.g. we can't even enable >> powerd by default because e.g. acpi_throttle hangs on some systems. 
It >> might be tricky to get power management to the stage where it works for >> everyone and can be done automatically. > > I'd expect that this will improve over time, just like all other > technologies from ISA to PCI bus implementations did. > > But yes, it will take time & effort, but given the current > cleantech/greentech buzz, I think we'd better get moving. Yeah, absolute worst case is we can make progress in automatic deployment by whitelisting. What SoC projects can you think of in this arena? Kris From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 01:24:50 2008 Return-Path: Delivered-To: FreeBSD-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEFA3106566B for ; Tue, 18 Mar 2008 01:24:50 +0000 (UTC) (envelope-from chuckr@chuckr.org) Received: from mail7.sea5.speakeasy.net (mail7.sea5.speakeasy.net [69.17.117.9]) by mx1.freebsd.org (Postfix) with ESMTP id 8BB688FC12 for ; Tue, 18 Mar 2008 01:24:50 +0000 (UTC) (envelope-from chuckr@chuckr.org) Received: (qmail 1592 invoked from network); 18 Mar 2008 00:58:09 -0000 Received: from april.chuckr.org (chuckr@[66.92.151.30]) (envelope-sender ) by mail7.sea5.speakeasy.net (qmail-ldap-1.03) with AES256-SHA encrypted SMTP for ; 18 Mar 2008 00:58:09 -0000 Message-ID: <47DF1257.9080807@chuckr.org> Date: Mon, 17 Mar 2008 20:52:39 -0400 From: Chuck Robey User-Agent: Thunderbird 2.0.0.6 (X11/20071107) MIME-Version: 1.0 To: FreeBSD-arch@FreeBSD.org X-Enigmail-Version: 0.95.5 OpenPGP: id=F3DCA0E9; url=http://pgp.mit.edu Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: difference between this and that X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 01:24:50 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 
I was giving some thought recently to the trend towards adding more and more cores to a single chip, and wondering whether, in the years ahead, we might see things that sound loony today, like a 4096-core motherboard.

With this in mind, could I ask for a little discussion of the differences between the SMP management that FreeBSD and several other OSes perform, and what tools like Ganglia (see http://ganglia.sourceforge.net/ ) do, which manage multi-node networks as if they were one computer (or so I understand it)? Both try to spread out jobs and both do management, so I'm curious about how they differ, and especially about how those differences will shape future computing.

I dunno, I just got curious about this.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH3xJXz62J6PPcoOkRApdWAJ4tMJnMM7+pF2nl+qOYX2VJ7Pw7HwCfeEpz
7udOLtujRpFmLPomXDNRgKo=
=c99o
-----END PGP SIGNATURE-----

From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 02:23:08 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B0F551065745 for ; Tue, 18 Mar 2008 02:23:08 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 5E5A58FC1A for ; Tue, 18 Mar 2008 02:23:08 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m2I2N3gV054126; Mon, 17 Mar 2008 22:23:04 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Mon, 17 Mar 2008 16:23:48 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Poul-Henning Kamp In-Reply-To: <3860.1205764623@critter.freebsd.dk> Message-ID: <20080317161448.Q910@desktop> References: <3860.1205764623@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 02:23:08 -0000

On Mon, 17 Mar 2008, Poul-Henning Kamp wrote:

> In message <20080317141717.U3253@fledge.watson.org>, Robert Watson writes:
>
>> If cpufreq is going to be enabled by default, should we be enabling powerd by
>> default [...]
>
> [Moved to arch@]
>
> In general, I think we must make power-aware computing our "next
> SMPng project", not in the sense of delaying the next major release
> five years, but in the sense that power consumption should permeate
> our thinking about the operating system from now on.
>
> Overall, I think that means that we should:
>
> * Enable performance neutral power savings on servers
> - spin down unused disks. (geom/drivers)
> - use only as many CPU cores as necessary (scheduler)

This is an interesting notion which I have tried to leave room for in the current scheduler design. One thing which I have considered in the past is a policy of best power vs best performance.

For example, consider a multi-socket system with multi-core parts. With two unrelated runnable threads, you'll get the best performance by putting them on different sockets. Then they'll have the most cache and memory bandwidth available to them. You'd be able to spin down a socket if you put them on adjacent cores on the same socket. It's not clear that this would save power, however: what if each thread now runs at half the speed? Is that more power efficient than running two cores half the time?

And what about the Barcelona, which can power down individual cores and even individual parts of cores? And in this point-to-point bus topology you always need to have the DRAM controller and HT link on anyway. One further complication is of course that CPUs can idle in different states.

So someone really is going to have to explore the tradeoff between core speed, number of cores, power and performance. I think the answer to which scheduling algorithm is most power efficient is really going to come down to the CPU architecture and type of workload. This is why I have been reluctant to implement anything yet. I suspect that getting things done the fastest is going to be a good first approximation of using the least power in this regard.

> - light cpu-throttling.
> - downgrading 1GB to 100MB ether when idle.
>
> * Aim to meet or exceed energystar 4.0/5.0[1] on desktops and
> plugged laptops.
> - Pretty much as above, but with specific targets.
> - http://www.energystar.gov/index.cfm?c=revisions.computer_spec
>
> * Be as battery-frugal as possible on battery driven laptops.
> - Any trick in and off the book.

I think these are all good goals. You should also throw in there the tickless time keeping that Linux has done. We've talked about it for ages too but never gotten there. It's a shame that we keep playing catchup in these areas.

Thanks,
Jeff

>
> --
> Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG | TCP/IP since RFC 956
> FreeBSD committer | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
> _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 07:12:58 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5C301106566B for ; Tue, 18 Mar 2008 07:12:58 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 10DB98FC27 for ; Tue, 18 Mar 2008 07:12:57 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 1506817105; Tue, 18 Mar 2008 07:12:55 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2I7CslZ008484; Tue, 18 Mar 2008 07:12:55 GMT (envelope-from phk@critter.freebsd.dk) To: Jeff Roberson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 17 Mar 2008 16:23:48 -1000." <20080317161448.Q910@desktop> Date: Tue, 18 Mar 2008 07:12:54 +0000 Message-ID: <8483.1205824374@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 07:12:58 -0000 In message <20080317161448.Q910@desktop>, Jeff Roberson writes: >You should also throw in there the >tickless time keeping that linux has done. I'm actively working on that one. 
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 08:46:02 2008 Return-Path: Delivered-To: FreeBSD-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0411F1065677 for ; Tue, 18 Mar 2008 08:46:02 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 7F0D68FC12 for ; Tue, 18 Mar 2008 08:46:01 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from server.vk2pj.dyndns.org (c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m2I8jr64016710 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Mar 2008 19:45:54 +1100 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.2/8.14.1) with ESMTP id m2I8jrAY079225; Tue, 18 Mar 2008 19:45:53 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m2I8jrRu079224; Tue, 18 Mar 2008 19:45:53 +1100 (EST) (envelope-from peter) Date: Tue, 18 Mar 2008 19:45:53 +1100 From: Peter Jeremy To: Chuck Robey Message-ID: <20080318084553.GD44676@server.vk2pj.dyndns.org> References: <47DF1257.9080807@chuckr.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="xs+9IvWevLaxKUtW" Content-Disposition: inline In-Reply-To: <47DF1257.9080807@chuckr.org> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.17 (2007-11-01) Cc: FreeBSD-arch@freebsd.org Subject: Re: difference between this and that 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 08:46:02 -0000

--xs+9IvWevLaxKUtW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Mar 17, 2008 at 08:52:39PM -0400, Chuck Robey wrote:
>I was giving some thought recently, to the trend towards adding more and
>more cores to a single chip, and wondering if maybe, in the next years
>ahead, if we wouldn't be seeing things that sound loony today, like a 4096
>core motherboard.

Actually, there was a bit of a side-thread about this sort of thing on the CVS mailing lists a couple of weeks ago. Definitely lots of cores per system are on the way. You can buy a system with 64 hardware threads from Sun today. A 128-thread system (dual T-2) will arrive RSN and Sun have said they intend to double the threads-per-chip every year. 4096 cores is still a way off and it's not currently clear what will drive this sort of system into the mainstream.

One problem is that FreeBSD currently assumes that a CPU mask will fit into a long - this limits FreeBSD to 32 cores on arm/i386/ppc and 64 cores elsewhere. Getting rid of this limit is going to take some work.

>With this in mind, could I ask for a little bit of discussion on the
>differences between the SMP management that FreeBSD, and several other OSes
>perform, and the things that stuff like Ganglia

As a simplification, and if you consider that SMP systems are moving to NUMA, the difference is mainly a matter of scale: there is an additional knee in the cost of process migration and RAM access between different CPU cores, depending on whether they are on the same chip, on different chips within the same host, or on different hosts.
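[Illustration added to the archive, not part of the original message.] The CPU-mask limit mentioned above can be pictured with a small userland sketch. It is not FreeBSD kernel code and makes no claim about how the kernel does or should solve this; the names demo_cpuset, DEMO_MAXCPU and the demo_cpuset_*() helpers are hypothetical. It only shows the general idea of replacing a single long with an array of longs, so that the number of representable CPUs is no longer tied to the word size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound, chosen only for illustration. */
#define DEMO_MAXCPU		4096
#define DEMO_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
	((DEMO_MAXCPU + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* A CPU set stored as an array of words instead of one long. */
struct demo_cpuset {
	unsigned long bits[DEMO_WORDS];
};

/* Clear every CPU in the set. */
static void
demo_cpuset_zero(struct demo_cpuset *set)
{
	memset(set, 0, sizeof(*set));
}

/* Mark one CPU as a member of the set. */
static void
demo_cpuset_set(struct demo_cpuset *set, size_t cpu)
{
	assert(cpu < DEMO_MAXCPU);
	set->bits[cpu / DEMO_BITS_PER_WORD] |=
	    1UL << (cpu % DEMO_BITS_PER_WORD);
}

/* Return non-zero if the CPU is a member of the set. */
static int
demo_cpuset_isset(const struct demo_cpuset *set, size_t cpu)
{
	assert(cpu < DEMO_MAXCPU);
	return (((set->bits[cpu / DEMO_BITS_PER_WORD] >>
	    (cpu % DEMO_BITS_PER_WORD)) & 1UL) != 0);
}

/* Count the members by clearing the lowest set bit of each word in turn. */
static size_t
demo_cpuset_count(const struct demo_cpuset *set)
{
	size_t n = 0;

	for (size_t w = 0; w < DEMO_WORDS; w++)
		for (unsigned long v = set->bits[w]; v != 0; v &= v - 1)
			n++;
	return (n);
}
```

With this layout, CPU 4095 is as easy to represent as CPU 0; the cost is that every place that passed a mask by value as a long would have to be changed to pass a structure or pointer, which is presumably part of the "some work" referred to above.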
--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.

--xs+9IvWevLaxKUtW
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.8 (FreeBSD)

iEYEARECAAYFAkffgUEACgkQ/opHv/APuIfUeACguWAepUTlcKJNT/ZN6YxUzHad
qA8An35Yl5u6XMzMlTWpNKp1/BX2axIM
=fxLw
-----END PGP SIGNATURE-----

--xs+9IvWevLaxKUtW--

From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 09:00:08 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 022B81065678 for ; Tue, 18 Mar 2008 09:00:08 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mail.cksoft.de (mail.cksoft.de [62.111.66.27]) by mx1.freebsd.org (Postfix) with ESMTP id AFA9A8FC1C for ; Tue, 18 Mar 2008 09:00:07 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from localhost (amavis.str.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id 18AC441C799; Tue, 18 Mar 2008 10:00:06 +0100 (CET) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([62.111.66.27]) by localhost (amavis.str.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id 5m2ofGw5OFIx; Tue, 18 Mar 2008 10:00:05 +0100 (CET) Received: by mail.cksoft.de (Postfix, from userid 66) id 9A90241C798; Tue, 18 Mar 2008 10:00:05 +0100 (CET) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id CB45044487F; Tue, 18 Mar 2008 08:59:42 +0000 (UTC) Date: Tue, 18 Mar 2008 08:59:42 +0000 (UTC) From: "Bjoern A.
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: Poul-Henning Kamp In-Reply-To: <3860.1205764623@critter.freebsd.dk> Message-ID: <20080318085804.I50685@maildrop.int.zabbadoz.net> References: <3860.1205764623@critter.freebsd.dk> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 09:00:08 -0000

On Mon, 17 Mar 2008, Poul-Henning Kamp wrote:

Hi,

> [Moved to arch@]
>
> In general, I think we must make power-aware computing our "next
> SMPng project", not in the sense of delaying the next major release
> five years, but in the sense that power consumption should permeate
> our thinking about the operating system from now on.
>
> Overall, I think that means that we should:
>
> * Enable performance neutral power savings on servers
> - spin down unused disks. (geom/drivers)
> - use only as many CPU cores as necessary (scheduler)
> - light cpu-throttling.
> - downgrading 1GB to 100MB ether when idle.
>
> * Aim to meet or exceed energystar 4.0/5.0[1] on desktops and
> plugged laptops.
> - Pretty much as above, but with specific targets.
> - http://www.energystar.gov/index.cfm?c=revisions.computer_spec
>
> * Be as battery-frugal as possible on battery driven laptops.
> - Any trick in and off the book.

So while we are on this topic,

what actually happens currently to an unrecognized card, or to a card with no driver loaded? How much power does an unused card use, and can we do anything about that? Are we perhaps already doing something about that?

/bz

--
Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT
Software is harder than hardware so better get it right the first time.
From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 12:49:24 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7F817106566B for ; Tue, 18 Mar 2008 12:49:24 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 795588FC13 for ; Tue, 18 Mar 2008 12:49:24 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from zion.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by elvis.mu.org (Postfix) with ESMTP id B07AA1A4D7C; Tue, 18 Mar 2008 05:48:05 -0700 (PDT) From: John Baldwin To: freebsd-arch@freebsd.org Date: Tue, 18 Mar 2008 08:40:18 -0400 User-Agent: KMail/1.9.7 References: <3860.1205764623@critter.freebsd.dk> <20080318085804.I50685@maildrop.int.zabbadoz.net> In-Reply-To: <20080318085804.I50685@maildrop.int.zabbadoz.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200803180840.18275.jhb@freebsd.org> Cc: "Bjoern A. Zeeb" , Poul-Henning Kamp Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 12:49:24 -0000

On Tuesday 18 March 2008 04:59:42 am Bjoern A. Zeeb wrote:
> On Mon, 17 Mar 2008, Poul-Henning Kamp wrote:
>
> Hi,
>
> > [Moved to arch@]
> >
> > In general, I think we must make power-aware computing our "next
> > SMPng project", not in the sense of delaying the next major release
> > five years, but in the sense that power consumption should permeate
> > our thinking about the operating system from now on.
> >
> > Overall, I think that means that we should:
> >
> > * Enable performance neutral power savings on servers
> > - spin down unused disks. (geom/drivers)
> > - use only as many CPU cores as necessary (scheduler)
> > - light cpu-throttling.
> > - downgrading 1GB to 100MB ether when idle.
> >
> > * Aim to meet or exceed energystar 4.0/5.0[1] on desktops and
> > plugged laptops.
> > - Pretty much as above, but with specific targets.
> > - http://www.energystar.gov/index.cfm?c=revisions.computer_spec
> >
> > * Be as battery-frugal as possible on battery driven laptops.
> > - Any trick in and off the book.
>
> So while we are on this topic,
>
> what actually happens currently to an unrecognized card, or to a card
> with no driver loaded? How much power does an unused card use, and can
> we do anything about that? Are we perhaps already doing something about
> that?

We already power off (to D3) PCI cards that aren't recognized by a driver. However, what would be more useful would be to power down cards that have a driver but aren't in use. Ethernet NICs are one example. I see a few possibilities:

1) Shut off "down" interfaces (ifconfig foo down) and only turn them on when the user puts them "up".

2) If the device supports D1/D2 then put it into one of those when it has no link (no NICs that we support do D1/D2 currently though). Otherwise, if the device has no carrier, power it down to D3 but periodically (say, every 5 seconds, maybe configurable) power it back up to D0 to check for link.

3) Shut off "down" interfaces but use 2) for "up" interfaces.

I think 3) is what I'd prefer, especially if we make the timer configurable.

It would also be nice to power down sound cards when no userland app has them open, and to power down USB controllers if no USB devices are connected (ideally to a D1/D2 state where they still get an interrupt on device insertion).

To avoid lots of code duplication I think we would need to provide some sort of "idle" device support in new-bus.
Possibly something like this:

/*
 * Routines to manage putting the device into an "idle" power state when it
 * is not in use.
 */

/*
 * How long we have to be idle before we are turned off. This should
 * probably default to some sort of value (say 5 seconds). It should be
 * exposed via sysctl by new-bus itself (e.g. dev.foo.0.idle_timer). We may
 * want to have different defaults for different classes of devices (e.g.
 * maybe there is a NIC_IDLE_TIMER constant that NIC drivers use to set this
 * in their attach routines).
 */
int	device_set_idle_timer(device_t dev, int ticks);

/*
 * Note that a device is idle. If the device was previously active,
 * this starts the idle timer. If the timer completes w/o being cancelled
 * it invokes device_idle(dev);
 */
int	device_is_idle(device_t dev);

/*
 * Note that a device is now active. If the device was previously idle
 * then the idle timer is stopped. If the timer wasn't running then it
 * invokes device_active(dev);
 */
int	device_is_active(device_t dev);

/*
 * device_if.h method, so becomes DEVMETHOD(device_idle, foo_idle);
 * This is invoked when the idle timer expires (i.e. device has been
 * idle for a complete idle timer duration). This method should power
 * down the device in some way.
 */
int	DEVICE_IDLE(device_t parent, device_t child);

/*
 * device_if.h method invoked when a powered down device becomes active
 * again. This should power the device back up.
 */
int	DEVICE_ACTIVE(device_t parent, device_t child);

So one possible impl of 3) for a NIC might be:

foo_ioctl(struct ifnet *ifp)
{
	struct foo_softc *sc;

	sc = ifp->if_softc;
	...
	case SIOCSIFFLAGS:
		FOO_LOCK(sc);
		if (ifp->if_flags & IFF_UP)
			device_is_active(sc->foo_dev);
		else
			device_is_idle(sc->foo_dev);
		foo_init(sc);
		FOO_UNLOCK(sc);
		break;
	...
}

/* Routine that gets called on link status change interrupt. */
foo_handle_link(struct foo_softc *sc)
{
	...
	/* Only track link state while the interface is administratively up. */
	if (sc->foo_ifp->if_flags & IFF_UP) {
		if (link_active)
			device_is_active(sc->foo_dev);
		else
			device_is_idle(sc->foo_dev);
	}
}

/* If the device supports D1/D2 which interrupts on link status change: */
foo_intr(void *arg)
{
	struct foo_softc *sc;

	sc = arg;

	/* Invoked first so it can power on the device before we access it. */
	device_is_active(sc->foo_dev);
	...
}

int
foo_idle(device_t dev)
{
	struct foo_softc *sc;

	sc = device_get_softc(dev);
	/* An "up" interface stays in D2 so it can still see link changes. */
	if (sc->foo_ifp->if_flags & IFF_UP)
		device_set_powerstate(dev, D2);
	else
		device_set_powerstate(dev, D3);
	return (0);
}

int
foo_active(device_t dev)
{

	device_set_powerstate(dev, D0);
	return (0);
}

For the case where D1/D2 isn't supported, foo_intr() would remain unchanged and foo_active() would be as above. foo_idle() would be responsible for starting its own internal timer that would power the device up, check for link, then power the device down in the IFF_UP case.

Behind the scenes the new-bus code would have a task and callout backing the idle timer (the callout enqueues the task and the task invokes DEVICE_IDLE()). I think it would have to use its own internal locking to handle the various edge cases of cancelling the timer, etc.

This is also the first time I've written this down, and I'm still thinking about how we can provide some infrastructure in new-bus to avoid having to duplicate a lot of work in device drivers themselves.
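[Illustration added to the archive, not part of the original message.] The idle-timer behaviour described above can be modelled as a small state machine. The following is a userland sketch only, written under stated assumptions: it uses a plain tick counter in place of the callout and task, it has no locking, and every name in it (sim_device, sim_device_is_idle(), sim_tick() and so on) is hypothetical rather than an existing new-bus interface. The counters idle_calls and active_calls stand in for the points where DEVICE_IDLE() and DEVICE_ACTIVE() would be invoked.

```c
#include <assert.h>

/* Hypothetical states of the modelled device. */
enum sim_state {
	SIM_ACTIVE,		/* in use, powered up */
	SIM_IDLE_PENDING,	/* idle, timer running, still powered up */
	SIM_POWERED_DOWN	/* timer expired, device powered down */
};

struct sim_device {
	enum sim_state state;
	int idle_timer;		/* ticks of idleness before power-down */
	int remaining;		/* ticks left on the running timer */
	int idle_calls;		/* times the "power down" hook would run */
	int active_calls;	/* times the "power up" hook would run */
};

static void
sim_device_init(struct sim_device *dev, int idle_timer)
{
	dev->state = SIM_ACTIVE;
	dev->idle_timer = idle_timer;
	dev->remaining = 0;
	dev->idle_calls = 0;
	dev->active_calls = 0;
}

/* Note that the device is idle: start the timer if it was active. */
static void
sim_device_is_idle(struct sim_device *dev)
{
	if (dev->state == SIM_ACTIVE) {
		dev->state = SIM_IDLE_PENDING;
		dev->remaining = dev->idle_timer;
	}
}

/*
 * Note that the device is active: cancel a running timer, or power the
 * device back up if the timer had already expired.
 */
static void
sim_device_is_active(struct sim_device *dev)
{
	if (dev->state == SIM_IDLE_PENDING) {
		dev->state = SIM_ACTIVE;
	} else if (dev->state == SIM_POWERED_DOWN) {
		dev->active_calls++;
		dev->state = SIM_ACTIVE;
	}
}

/* Advance time by one tick; power down when the timer runs out. */
static void
sim_tick(struct sim_device *dev)
{
	if (dev->state != SIM_IDLE_PENDING)
		return;
	if (--dev->remaining <= 0) {
		dev->idle_calls++;
		dev->state = SIM_POWERED_DOWN;
	}
}
```

The model makes one of the edge cases visible: activity that arrives while the timer is still running must cancel it without calling the power-up hook, whereas activity after expiry must call it exactly once. A real implementation would additionally have to cope with the timer firing concurrently with the cancellation, which this single-threaded sketch does not attempt.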
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 13:54:58 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E44511065671; Tue, 18 Mar 2008 13:54:58 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id B8DF98FC15; Tue, 18 Mar 2008 13:54:58 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 990C317105; Tue, 18 Mar 2008 13:54:56 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2IDstHJ001032; Tue, 18 Mar 2008 13:54:55 GMT (envelope-from phk@critter.freebsd.dk) To: John Baldwin From: "Poul-Henning Kamp" In-Reply-To: Your message of "Tue, 18 Mar 2008 08:40:18 -0400." <200803180840.18275.jhb@freebsd.org> Date: Tue, 18 Mar 2008 13:54:55 +0000 Message-ID: <1031.1205848495@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: "Bjoern A. Zeeb" , freebsd-arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 13:54:59 -0000 >To avoid lots of code duplication I think we would need to provide some sort >of "idle" device support in new-bus. Possibly something like this: We need different levels of "active" also, EnergyStar requires machines to reduce ethernet speed from GE to 100M when idle, (this saves approx 2W, 1W in either end of the cable). 
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 15:33:01 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0D48A1065670 for ; Tue, 18 Mar 2008 15:33:01 +0000 (UTC) (envelope-from joao.barros@gmail.com) Received: from gv-out-0910.google.com (gv-out-0910.google.com [216.239.58.191]) by mx1.freebsd.org (Postfix) with ESMTP id A32078FC1B for ; Tue, 18 Mar 2008 15:33:00 +0000 (UTC) (envelope-from joao.barros@gmail.com) Received: by gv-out-0910.google.com with SMTP id n40so1250778gve.39 for ; Tue, 18 Mar 2008 08:32:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=qhFIyvavbSLPsn+K5gkgd2m5SDe4747TSQaPIcR22q4=; b=iSDT6qNHC19BXfsqRWG5MvBFFDsDbA2LsiBx8WoAqfQ6I6OZ29HVXY5z3PDwf5BWI/1NacJQeQBEDvrpZiG79OYM/bxLcPPyYNZAi4M6Q9m20vPhCfUsAkgJCjq/rLKThQOZBPyrwz/qqOZS2OrBi0R4qhJVwUXCd1dAdC5eC+0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=t9V6f3IGJZwEuumHzjDwsxm4RvNfxYnsc7LafDpSqlFQ9FjedPRuBSAEQw81LTj05bdKc7SPe1GmKSX5aLN8DkI4J28xJkzl+1Xfcv+yEPhMNfox3PPNV9XqtQ8MHHEnPViVOlY/U369mgJpEJnO4xa14uMPE2O4fF6D33gCl2U= Received: by 10.142.191.2 with SMTP id o2mr1039607wff.209.1205852676571; Tue, 18 Mar 2008 08:04:36 -0700 (PDT) Received: by 10.143.160.17 with HTTP; Tue, 18 Mar 2008 08:04:36 -0700 (PDT) Message-ID: <70e8236f0803180804v692c2abs7eb296317cb84ed1@mail.gmail.com> Date: Tue, 18 Mar 2008 15:04:36 +0000 
From: "Joao Barros" To: "Poul-Henning Kamp" In-Reply-To: <1031.1205848495@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <200803180840.18275.jhb@freebsd.org> <1031.1205848495@critter.freebsd.dk> Cc: freebsd-arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 15:33:01 -0000 On Tue, Mar 18, 2008 at 1:54 PM, Poul-Henning Kamp wrote: > > >To avoid lots of code duplication I think we would need to provide some sort > >of "idle" device support in new-bus. Possibly something like this: > > We need different levels of "active" also, EnergyStar requires machines > to reduce ethernet speed from GE to 100M when idle, (this saves approx > 2W, 1W in either end of the cable). It's common practice to force connection speeds on both switches and NICs in datacenter environments. When I find a machine with a nic connected at 10mbit Half Duplex I already know it's forced on the switch and someone forgot to setup the server correctly. What I'm saying is, if something is about to automagically happen, people should be aware of it. 
*Big neon lights* :-) -- Joao Barros From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 16:04:06 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC29E1065674 for ; Tue, 18 Mar 2008 16:04:06 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id 3AA2C8FC17 for ; Tue, 18 Mar 2008 16:04:06 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id m2IG4454066617; Tue, 18 Mar 2008 11:04:04 -0500 (CDT) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.2/8.14.2/Submit) id m2IG44ts066616; Tue, 18 Mar 2008 11:04:04 -0500 (CDT) (envelope-from brooks) Date: Tue, 18 Mar 2008 11:04:04 -0500 From: Brooks Davis To: Joao Barros Message-ID: <20080318160404.GD57911@lor.one-eyed-alien.net> References: <200803180840.18275.jhb@freebsd.org> <1031.1205848495@critter.freebsd.dk> <70e8236f0803180804v692c2abs7eb296317cb84ed1@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="AbQceqfdZEv+FvjW" Content-Disposition: inline In-Reply-To: <70e8236f0803180804v692c2abs7eb296317cb84ed1@mail.gmail.com> User-Agent: Mutt/1.5.17 (2007-11-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Tue, 18 Mar 2008 11:04:05 -0500 (CDT) Cc: Poul-Henning Kamp , freebsd-arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 16:04:06 -0000

--AbQceqfdZEv+FvjW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Mar 18, 2008 at 03:04:36PM +0000, Joao Barros wrote:
> On Tue, Mar 18, 2008 at 1:54 PM, Poul-Henning Kamp wrote:
> >
> > >To avoid lots of code duplication I think we would need to provide some sort
> > >of "idle" device support in new-bus. Possibly something like this:
> >
> > We need different levels of "active" also, EnergyStar requires machines
> > to reduce ethernet speed from GE to 100M when idle, (this saves approx
> > 2W, 1W in either end of the cable).
>
> It's common practice to force connection speeds on both switches and
> NICs in datacenter environments.
> When I find a machine with a nic connected at 10mbit Half Duplex I
> already know it's forced on the switch and someone forgot to setup the
> server correctly.
> What I'm saying is, if something is about to automagically happen,
> people should be aware of it. *Big neon lights* :-)

I'd think any implementation would only be on if autoneg is on...
-- Brooks --AbQceqfdZEv+FvjW Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iD8DBQFH3+fzXY6L6fI4GtQRAldEAKC1VuqdN7h1XwVxgPpZvWDKBrlBjgCeIXE/ Up6vQ95vyjBMS/l/WdTpFYQ= =Efzk -----END PGP SIGNATURE----- --AbQceqfdZEv+FvjW-- From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 18:26:12 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6CBA31065670; Tue, 18 Mar 2008 18:26:12 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 308518FC25; Tue, 18 Mar 2008 18:26:12 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 79D4546C72; Tue, 18 Mar 2008 14:26:11 -0400 (EDT) Date: Tue, 18 Mar 2008 18:26:11 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Poul-Henning Kamp In-Reply-To: <5753.1205785282@critter.freebsd.dk> Message-ID: <20080318182358.F34016@fledge.watson.org> References: <5753.1205785282@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, Kris Kennaway Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 18:26:12 -0000 On Mon, 17 Mar 2008, Poul-Henning Kamp wrote: > In message <47DED21C.4070108@FreeBSD.org>, Kris Kennaway writes: > >> I think this is a great idea, but one of the big problems is probably going >> to be dealing with hardware quirks. e.g. we can't even enable powerd by >> default because e.g. 
acpi_throttle hangs on some systems. It might be >> tricky to get power management to the stage where it works for everyone and >> can be done automatically. > > I'd expect that this will improve over time, just like all other > technologies from ISA to PCI bus implementations did. > > But yes, it will take time & effort, but given the current > cleantech/greentech buzz, I think we'd better get moving. I know we've talked about this, but I'll mention it for the benefits of the mailing list: one of the things that makes performance an "easy" target is that there are easy-to-gather metrics. Those metrics may require knowledge of statistics and a lifetime of experience to interpret correctly, but they are still numbers that are easily generated and compared. To drive work in power management, we would benefit from having similarly accessible metrics. Are there any decent documents describing how to do power use measurement, and are there any (relatively) accessible tools for doing it with? For example, on notebooks, can we sample an ACPI value before/after a benchmark, or do we really need to hook something up to the power supply in order to get a useful number? 
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 18:54:33 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 99C231065688; Tue, 18 Mar 2008 18:54:33 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id 33D738FC16; Tue, 18 Mar 2008 18:54:33 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.2/8.14.2) with ESMTP id m2IIsaxd002917; Tue, 18 Mar 2008 13:54:36 -0500 (CDT) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.2/8.14.2/Submit) id m2IIsZsV002916; Tue, 18 Mar 2008 13:54:35 -0500 (CDT) (envelope-from brooks) Date: Tue, 18 Mar 2008 13:54:35 -0500 From: Brooks Davis To: Robert Watson Message-ID: <20080318185435.GA2853@lor.one-eyed-alien.net> References: <5753.1205785282@critter.freebsd.dk> <20080318182358.F34016@fledge.watson.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ZGiS0Q5IWpPtfppv" Content-Disposition: inline In-Reply-To: <20080318182358.F34016@fledge.watson.org> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org, Poul-Henning Kamp Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 18:54:33 -0000 --ZGiS0Q5IWpPtfppv Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 18, 
2008 at 06:26:11PM +0000, Robert Watson wrote: > > On Mon, 17 Mar 2008, Poul-Henning Kamp wrote: > >> In message <47DED21C.4070108@FreeBSD.org>, Kris Kennaway writes: >> >>> I think this is a great idea, but one of the big problems is probably >>> going to be dealing with hardware quirks. e.g. we can't even enable >>> powerd by default because e.g. acpi_throttle hangs on some systems. It >>> might be tricky to get power management to the stage where it works for >>> everyone and can be done automatically. >> >> I'd expect that this will improve over time, just like all other >> technologies from ISA to PCI bus implementations did. >> >> But yes, it will take time & effort, but given the current >> cleantech/greentech buzz, I think we'd better get moving. > > I know we've talked about this, but I'll mention it for the benefits of the > mailing list: one of the things that makes performance an "easy" target is > that there are easy-to-gather metrics. Those metrics may require knowledge > of statistics and a lifetime of experience to interpret correctly, but they > are still numbers that are easily generated and compared. To drive work in > power management, we would benefit from having similarly accessible > metrics. Are there any decent documents describing how to do power use > measurement, and are there any (relatively) accessible tools for doing it > with? For example, on notebooks, can we sample an ACPI value before/after > a benchmark, or do we really need to hook something up to the power supply > in order to get a useful number? For servers, logging power meters with computer interfaces seem to be fairly expensive, but accumulating or instantaneous ones you have to look at to get data out of aren't too bad. For example, the Kill A Watt meter available in the US is $20-30. 
http://www.p3international.com/products/special/P4400/P4400-CE.html For amusement value, I had a dual P4-Xeon box hooked up to one once and found that power consumption with SETI@Home running was about 10W _lower_ than idle. -- Brooks --ZGiS0Q5IWpPtfppv Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iD8DBQFH4A/rXY6L6fI4GtQRAlkDAJ0e9vERJYu4wBY0BPUxVpNi6xP8mACdG+HB tTfJyYEJ3p8BzvXMsdy/uUI= =5llQ -----END PGP SIGNATURE----- --ZGiS0Q5IWpPtfppv-- From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 19:52:04 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 92327106564A; Tue, 18 Mar 2008 19:52:04 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 693D58FC1A; Tue, 18 Mar 2008 19:52:04 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 96FFB17105; Tue, 18 Mar 2008 19:52:02 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2IJq2WK001279; Tue, 18 Mar 2008 19:52:02 GMT (envelope-from phk@critter.freebsd.dk) To: Robert Watson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Tue, 18 Mar 2008 18:26:11 GMT." 
<20080318182358.F34016@fledge.watson.org> Date: Tue, 18 Mar 2008 19:52:02 +0000 Message-ID: <1278.1205869922@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org, Kris Kennaway Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 19:52:04 -0000 In message <20080318182358.F34016@fledge.watson.org>, Robert Watson writes: >I know we've talked about this, but I'll mention it for the benefits of the >mailing list: one of the things that makes performance an "easy" target is >that there are easy-to-gather metrics. [...] Are >there any decent documents describing how to do power use measurement, and are >there any (relatively) accessible tools for doing it with? I think performance and power-performance are pretty similar in this: you can easily collect some pretty crude performance indications and it takes a determined effort to get hard numbers. When we talk about macroscopic efforts, turning off hardware we don't use, spinning down disks, common sense says that power is saved and we can leave it at that. When we talk about optimizing scheduling to keep chips powered off as much as possible, it will take more than the average ammeter to inform us about the optimal algorithms. I have not tried to find out how exact the power measurements ACPI offers on laptops are; I know some of the chips used but have never double-checked the result. Obviously, if you have access to a decent bench supply, you can run your laptop off that (take the battery out so charging does not affect the results), but few of them allow you to measure power (i.e. watts) without hooking up GPIB and accumulating a lot of current measurements by hand. 
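As a rough illustration of what "accumulating a lot of current measurements" amounts to, here is a minimal C sketch that integrates evenly spaced current samples into energy. It assumes the supply voltage is constant and that the samples have already been read from the instrument by some other means; the function names and any numbers fed to them are hypothetical and do not correspond to any real GPIB library or bench supply.

```c
#include <stddef.h>

/*
 * Integrate evenly spaced current samples (amperes), taken at an assumed
 * constant supply voltage, into energy in joules using the trapezoidal
 * rule.  Hypothetical helper, for illustration only.
 *
 * amps:  array of n current samples
 * volts: supply voltage, assumed not to sag under load
 * dt:    time between consecutive samples, in seconds
 */
static double
energy_joules(const double *amps, size_t n, double volts, double dt)
{
	double charge = 0.0;	/* accumulated ampere-seconds */
	size_t i;

	for (i = 1; i < n; i++)
		charge += 0.5 * (amps[i - 1] + amps[i]) * dt;
	return (volts * charge);
}

/* Mean power in watts over the span covered by the n samples. */
static double
mean_watts(const double *amps, size_t n, double volts, double dt)
{
	if (n < 2 || dt <= 0.0)
		return (0.0);
	return (energy_joules(amps, n, volts, dt) / ((double)(n - 1) * dt));
}
```

The arithmetic is trivial; the hard part is getting samples fast enough that short current spikes are not missed, which this sketch does nothing to address.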
Summary: There isn't anything as handy as time(1), but it isn't rocket science either. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 19:59:42 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 14B3F1065674; Tue, 18 Mar 2008 19:59:42 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id C1E438FC1F; Tue, 18 Mar 2008 19:59:41 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 7D5E917105; Tue, 18 Mar 2008 19:59:40 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2IJxeF9001363; Tue, 18 Mar 2008 19:59:40 GMT (envelope-from phk@critter.freebsd.dk) To: Brooks Davis From: "Poul-Henning Kamp" In-Reply-To: Your message of "Tue, 18 Mar 2008 13:54:35 EST." 
<20080318185435.GA2853@lor.one-eyed-alien.net> Date: Tue, 18 Mar 2008 19:59:40 +0000 Message-ID: <1362.1205870380@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org, Robert Watson Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 19:59:42 -0000 In message <20080318185435.GA2853@lor.one-eyed-alien.net>, Brooks Davis writes: >For amusement value, I had a dual P4-Xeon box hooked up to one once >and found that power consumption with SETI@Home running was about 10W >_lower_ than idle. That is a very good indication that the meter is a piece of crap that does not have sufficient measurement rate to do a relevant job. Unfortunately, that is the case for most of the gadgets you can buy in shops. In general, you are much better off buying the real thing, for instance a single-phase DIN power-meter like: http://www.metermaid.co.uk/din_rail_tech_info.html They cost less than EUR100/USD150 and have 1% accuracy. The "SO" output can be hooked up to a parallel or serial port and you can accumulate and read the number of pulses using the PPS-API, giving you, in this case, 500mWh resolution. Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
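For what it is worth, turning a pulse count from such a meter into energy and average power is simple arithmetic. The C sketch below assumes one pulse per 0.5 Wh, matching the 500mWh resolution quoted above, and assumes the pulses have already been counted elsewhere (for example via the PPS-API, as suggested); it does not talk to any hardware, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed meter constant: one output pulse per 0.5 Wh. */
#define	WH_PER_PULSE	0.5

/* Energy, in watt-hours, represented by a number of counted pulses. */
static double
pulses_to_wh(uint64_t pulses)
{
	return ((double)pulses * WH_PER_PULSE);
}

/*
 * Average power, in watts, for `pulses` pulses counted over an interval
 * of `seconds` seconds.  Returns 0 for a non-positive interval.
 */
static double
average_watts(uint64_t pulses, double seconds)
{
	if (seconds <= 0.0)
		return (0.0);
	/* Wh -> joules (x 3600), then joules per second = watts. */
	return (pulses_to_wh(pulses) * 3600.0 / seconds);
}
```

Under that assumption, a machine drawing about 100 W produces a pulse roughly every 18 seconds, so a benchmark has to run for a good while before one pulse more or less stops dominating the result.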
From owner-freebsd-arch@FreeBSD.ORG Tue Mar 18 22:07:53 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 47CF81065673; Tue, 18 Mar 2008 22:07:53 +0000 (UTC) (envelope-from bakul@bitblocks.com) Received: from mail.bitblocks.com (bitblocks.com [64.142.15.60]) by mx1.freebsd.org (Postfix) with ESMTP id 3ABC38FC12; Tue, 18 Mar 2008 22:07:53 +0000 (UTC) (envelope-from bakul@bitblocks.com) Received: from bitblocks.com (localhost.bitblocks.com [127.0.0.1]) by mail.bitblocks.com (Postfix) with ESMTP id EB09F5B3B; Tue, 18 Mar 2008 14:43:38 -0700 (PDT) To: "Poul-Henning Kamp" In-reply-to: Your message of "Tue, 18 Mar 2008 19:59:40 -0000." <1362.1205870380@critter.freebsd.dk> Date: Tue, 18 Mar 2008 14:43:38 -0700 From: Bakul Shah Message-Id: <20080318214338.EB09F5B3B@mail.bitblocks.com> Cc: arch@freebsd.org, Brooks Davis , Robert Watson Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Mar 2008 22:07:53 -0000 On Tue, 18 Mar 2008 19:59:40 -0000 "Poul-Henning Kamp" wrote: > In message <20080318185435.GA2853@lor.one-eyed-alien.net>, Brooks Davis write > s: > > >For amusement value, I had a dual P4-Xeon box hooked up to one once > >and found that power consumption with SETI@Home running was about 10W > >_lower_ than idle. > > That is a very good indication that the meter is a piece of crap that > does not have sufficient measurement rate to do a relevant job. You are likely right but it is also possible the system is poorly designed or broken in some way and does in fact draw more power when idle. Brooks ought to hook up his particular p4-xeon box to another brand meter to verify. 
> Unfortunately, that is the case for most of the gadgets you can buy > in shops. > > In general, you are much better off buying the real thing, for instance > a single-phase DIN power-meter like: > > http://www.metermaid.co.uk/din_rail_tech_info.html > > They cost less than EUR100/USD150 and have 1% accuracy. > > The "SO" output can be hooked up to a parallel or serial port and > you can accumulate and read the number of pulses using the PPS-API, > giving you, in this case, 500mWh resolution. I have been pretty happy with the Wattsup Pro meter but I admit I have not calibrated with to any known good reference for computer loads. It does a pretty accurate job on constant loads (verified with a DMM with good AC amps capability). It measures power down to a watt or so, with 100mWh resolution. Very useful for measuring standby mode power use (chargers, computer, laser printer, TV, phone, etc. -- can easily add up to over a Megawatt-hour a year!). I have the older serial only meter but now it comes with a USB port (about $130) https://www.doubleed.com/secure/products.php From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 06:44:36 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 94A9B106564A for ; Wed, 19 Mar 2008 06:44:36 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from mail15.syd.optusnet.com.au (mail15.syd.optusnet.com.au [211.29.132.196]) by mx1.freebsd.org (Postfix) with ESMTP id 12B398FC22 for ; Wed, 19 Mar 2008 06:44:35 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from server.vk2pj.dyndns.org (c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82]) by mail15.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m2J6iXBL002633 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 19 Mar 2008 17:44:34 +1100 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by 
server.vk2pj.dyndns.org (8.14.2/8.14.1) with ESMTP id m2J6iXlQ072441; Wed, 19 Mar 2008 17:44:33 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m2J6iXM9072440; Wed, 19 Mar 2008 17:44:33 +1100 (EST) (envelope-from peter) Date: Wed, 19 Mar 2008 17:44:33 +1100 From: Peter Jeremy To: Poul-Henning Kamp Message-ID: <20080319064433.GA44676@server.vk2pj.dyndns.org> References: <20080318182358.F34016@fledge.watson.org> <1278.1205869922@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="YZVh52eu0Ophig4D" Content-Disposition: inline In-Reply-To: <1278.1205869922@critter.freebsd.dk> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 06:44:36 -0000 --YZVh52eu0Ophig4D Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 18, 2008 at 07:52:02PM +0000, Poul-Henning Kamp wrote: >When we talk about macroscopic efforts, turning of hardware we don't >use, spinning down disks, common sense says that power is saved and >we can leave it at that. Except that it takes more power to spin up a disk than keep is spinning. Even neglecting the disk life issue, powering a disk down for a short period and then powering it back up may use more energy than keeping it running. >I have not tried to find out how exact the power measurements ACPI >offers on laptops are, I know some of the chips used but have >never double-checked the result. 
I don't believe ACPI lets you get at the numbers with sufficient resolution to manage anything particularly meaningful. In any case, repeatability and monotonicity are more of an issue than absolute accuracy: As long as we can meaningfully do relational comparisons then we can make progress. I suspect the results are going to vary significantly between systems anyway. >affect the results), but few off them allow you to measure power >(ie: Watts) without hooking up GPIB and accumulating a lot of >current measurements by hand. Any decent bench supply should be stiff enough to treat the voltage as a constant so just monitoring the current is adequate to calculate power. If you want to monitor energy then, yes you probably need to hook it up to an external logger. You can buy a multimeter with a USB interface for AUD140 ( Delivered-To: arch@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B9FD41065670; Wed, 19 Mar 2008 08:19:00 +0000 (UTC) (envelope-from davidxu@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7C89E8FC18; Wed, 19 Mar 2008 08:19:00 +0000 (UTC) (envelope-from davidxu@FreeBSD.org) Received: from apple.my.domain (root@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2J8IvS9079221; Wed, 19 Mar 2008 08:18:58 GMT (envelope-from davidxu@freebsd.org) Message-ID: <47E0CCC4.8040503@freebsd.org> Date: Wed, 19 Mar 2008 16:20:20 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.9 (X11/20071211) MIME-Version: 1.0 To: Daniel Eischen References: <20080307020626.G920@desktop> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to 
FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 08:19:00 -0000 Daniel Eischen wrote: > I'm not sure if any of the above remove the priority from the API, > but it would be nice to get rid of msleep totally and replace it > with an equivalent cv_wait(). > And create sleep queue in each cv to get rid of shared sleep queue lock ? Regards, David Xu From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 09:52:25 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5DA2C1065671; Wed, 19 Mar 2008 09:52:25 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 182188FC20; Wed, 19 Mar 2008 09:52:25 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m2J9qMID087939; Wed, 19 Mar 2008 05:52:23 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Tue, 18 Mar 2008 23:53:14 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: David Xu In-Reply-To: <47E0CCC4.8040503@freebsd.org> Message-ID: <20080318235125.G910@desktop> References: <20080307020626.G920@desktop> <47E0CCC4.8040503@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.org Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 09:52:25 -0000 On Wed, 19 Mar 2008, David Xu wrote: > Daniel Eischen wrote: > >> I'm not 
sure if any of the above remove the priority from the API, >> but it would be nice to get rid of msleep totally and replace it >> with an equivalent cv_wait(). >> > > And create sleep queue in each cv to get rid of shared sleep queue > lock ? Some spinlock is required to interlock with the scheduler lock via thread_lock(). So I don't think you can get rid of that layer. You also wouldn't want to have the cost of a 'struct sleepqueue' everywhere you want a msleep/condvar. I personally don't see any real advantage to using condvar everywhere. The only thing you really get is protection against spurious wakeups. Thanks, Jeff > > Regards, > David Xu > From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 11:16:57 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A3D4A1065676 for ; Wed, 19 Mar 2008 11:16:57 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 61F648FC1F for ; Wed, 19 Mar 2008 11:16:57 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id m2JBFRIX016281; Wed, 19 Mar 2008 05:15:28 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Wed, 19 Mar 2008 05:16:04 -0600 (MDT) Message-Id: <20080319.051604.63052713.imp@bsdimp.com> To: bzeeb-lists@lists.zabbadoz.net From: "M. 
Warner Losh" In-Reply-To: <20080318085804.I50685@maildrop.int.zabbadoz.net> References: <3860.1205764623@critter.freebsd.dk> <20080318085804.I50685@maildrop.int.zabbadoz.net> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, phk@phk.freebsd.dk Subject: Re: Power-Mgt X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 11:16:57 -0000 In message: <20080318085804.I50685@maildrop.int.zabbadoz.net> "Bjoern A. Zeeb" writes: : what actually happens to an unrecognized card or a card with no driver : loaded currently? How much power does an unsued card use and can we do : anything about that? Are we perhaps already doing something about : that? For PCI it is set into D3 state. Or at least was until this caused a problem with some raid controllers that didn't follow the rules and had extra devices that the card used, but that the OS didn't have a driver for. For PC Card, the card is powered down entirely. For CardBus I think the same. For USB, ugen takes it, and therefore it is powered up. 
Warner From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 11:22:56 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 98F95106564A; Wed, 19 Mar 2008 11:22:56 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 3BB9F8FC2D; Wed, 19 Mar 2008 11:22:56 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id m2JBJQLs016312; Wed, 19 Mar 2008 05:19:26 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Wed, 19 Mar 2008 05:20:03 -0600 (MDT) Message-Id: <20080319.052003.1159136945.imp@bsdimp.com> To: rwatson@freebsd.org From: "M. Warner Losh" In-Reply-To: <20080318182358.F34016@fledge.watson.org> References: <5753.1205785282@critter.freebsd.dk> <20080318182358.F34016@fledge.watson.org> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, phk@phk.freebsd.dk Subject: Re: Power-Mgt X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 11:22:56 -0000 In message: <20080318182358.F34016@fledge.watson.org> Robert Watson writes: : For example, on notebooks, can we sample an ACPI value before/after : a benchmark, or do we really need to hook something up to the power : supply in order to get a useful number? The latter. Many notebooks do not provide reasonable power usage. They provide decent power drain at the moment statistics, but are too noisy to be used in traditional benchmarks. 
Warner From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 11:29:48 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52EED1065677 for ; Wed, 19 Mar 2008 11:29:48 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 16BD58FC34 for ; Wed, 19 Mar 2008 11:29:47 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 5C71917107; Wed, 19 Mar 2008 11:29:45 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2JBThmo008727; Wed, 19 Mar 2008 11:29:44 GMT (envelope-from phk@critter.freebsd.dk) To: "M. Warner Losh" From: "Poul-Henning Kamp" In-Reply-To: Your message of "Wed, 19 Mar 2008 05:16:04 CST." <20080319.051604.63052713.imp@bsdimp.com> Date: Wed, 19 Mar 2008 11:29:43 +0000 Message-ID: <8726.1205926183@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: bzeeb-lists@lists.zabbadoz.net, arch@freebsd.org Subject: Re: Power-Mgt X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 11:29:48 -0000 In message <20080319.051604.63052713.imp@bsdimp.com>, "M. Warner Losh" writes: >In message: <20080318085804.I50685@maildrop.int.zabbadoz.net> > "Bjoern A. Zeeb" writes: >: what actually happens to an unrecognized card or a card with no driver >: loaded currently? How much power does an unsued card use and can we do >: anything about that? Are we perhaps already doing something about >: that? > >For PCI it is set into D3 state. 
Or at least was until this caused a >problem with some raid controllers that didn't follow the rules and >had extra devices that the card used, but that the OS didn't have a >driver for. > >For PC Card, the card is powered down entirely. For CardBus I think >the same. For USB, ugen takes it, and therefore it is powered up. Not to mention this comment from acpi_cpu.c: /* * Check for bus master activity. If there was activity, clear * the bit and use the lowest non-C3 state. Note that the USB * driver polling for new devices keeps this bit set all the * time if USB is loaded. */ -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 12:19:33 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5623F1065672 for ; Wed, 19 Mar 2008 12:19:33 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 1C28C8FC1A for ; Wed, 19 Mar 2008 12:19:32 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id DD56817104; Wed, 19 Mar 2008 12:19:30 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2JCJT4u009665; Wed, 19 Mar 2008 12:19:30 GMT (envelope-from phk@critter.freebsd.dk) To: Peter Jeremy From: "Poul-Henning Kamp" In-Reply-To: Your message of "Wed, 19 Mar 2008 17:44:33 +1100." 
<20080319064433.GA44676@server.vk2pj.dyndns.org> Date: Wed, 19 Mar 2008 12:19:29 +0000 Message-ID: <9664.1205929169@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 12:19:33 -0000 In message <20080319064433.GA44676@server.vk2pj.dyndns.org>, Peter Jeremy write s: >On Tue, Mar 18, 2008 at 07:52:02PM +0000, Poul-Henning Kamp wrote: >>When we talk about macroscopic efforts, turning of hardware we don't >>use, spinning down disks, common sense says that power is saved and >>we can leave it at that. > >Except that it takes more power to spin up a disk than keep is >spinning. Even neglecting the disk life issue, powering a disk down >for a short period and then powering it back up may use more energy >than keeping it running. I was talking in the context of having a facility for spinning disks down vs. always letting them run. You're talking about when we spin them down, which is a matter of tuning. Yes, obviously our defaults should be sensible, as always. >>I have not tried to find out how exact the power measurements ACPI >>offers on laptops are, I know some of the chips used but have >>never double-checked the result. > >I don't believe ACPI lets you get at the numbers with sufficient >resolution to manage anything particularly meaningful. I'm not so sure, the chips have pretty good resolution and high accumulation rate, it's ACPI which only ask the chip every 30 seconds. >Any decent bench supply should be stiff enough to treat the voltage as >a constant so just monitoring the current is adequate to calculate >power. 
The problem with this approach is that you need to accumulate current measurements at least 500 times per second to get a realistic picture of the power content of the spikes. You can of course do a lot to smooth this out, but then it turns (even more) into an electronics task. The only PSUs I know of that can do this themselves are the HP/Agilent "extra 3" supplies like the 66311 and similar. If you want to measure on the high-voltage side, the best and cheapest strategy is to get a utility-class power meter (like the DIN unit I linked to in the other mail). -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 12:48:00 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C15BD106566B for ; Wed, 19 Mar 2008 12:48:00 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 8427D8FC1A for ; Wed, 19 Mar 2008 12:48:00 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id D94941CC50; Wed, 19 Mar 2008 13:47:59 +0100 (CET) Date: Wed, 19 Mar 2008 13:47:59 +0100 From: Ed Schouten To: Bruce Evans Message-ID: <20080319124759.GB51074@hoeg.nl> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org> <20080315194809.GN10374@deviant.kiev.zoral.com.ua> <20080316133138.J41270@delplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="yNb1oOkm5a9FJOVX" Content-Disposition: inline In-Reply-To: <20080316133138.J41270@delplex.bde.org> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: Kostik Belousov , FreeBSD Arch Subject: Re: vgone() calling VOP_CLOSE() -> 
blocked threads? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 12:48:00 -0000 --yNb1oOkm5a9FJOVX Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello Bruce, * Bruce Evans wrote: > On Sat, 15 Mar 2008, Kostik Belousov wrote: > >> On Sun, Mar 16, 2008 at 03:55:18AM +1100, Bruce Evans wrote: >>> Other problems near here: >>> - neither vfs nor drivers currently know how many threads are in a >>> driver. vfs uses vp->v_rdev->si_usecount, but this doesn't quite work >> This is provided by si_threadcount. >> See the dev(vn)_refthread and it usage in the devfs vnops and fops. > > So why doesn't reovoke() use it? :-). All uses of si_usecount, which > normally happen via vcount() and count_dev(), are suspect, especially > the latter. > > vcount() is only used in revoke(), in svr4_fcntl.c to handle another > revoke(), and for FreeBSD < 6 in reiserfs for an old multiple-mount > check. > > count_dev() is only used in ata-tape.c (to decide in the same broken > way as vfs if a close is the last one -- this driver uses D_TRACKCLOSE > to get d_close() called on all closes. This gives it the burden of > deciding whether the close is the last one, and it can't do this any > better than vfs. D_TRACKCLOSE is used in a few other drivers which > don't call count_dev()), in devfs_close() (to decide whether to release > the controlling terminal and to decide when to call d_close()). > > Hmm, it seems to be not vfs but only devfs which handles last-close > specially. devfs is closer to devices, so it should know how to use > si_threadcount here. Hopefully si_threadcount counts threads sleeping > in open or close, although si_usecount doesn't. 
d_close (or something) > should be called to wake up these threads even if si_usecount is 0. > Drivers which support sleeping in open or close must support d_close > (or something) being called to forcibly end such sleeps. revoke() > should forcibly end such sleeps, so it needs to check si_threadcount > too. si_usecount in its current form might end up being unused, so > si_threadcount could be renamed back to it. I just changed my TTY code to perform some garbage collecting on TTY's. It now only performs a device cleanup when si_threadcount == 1 and TF_OPENED is unset. Unfortunately, I'm checking for these conditions in all the cdev ops, which is quite expensive. It does the trick, but if someone has a better idea, I'm willing to implement it. Thanks! -- Ed Schouten WWW: http://g-rave.nl/ --yNb1oOkm5a9FJOVX Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkfhC38ACgkQ52SDGA2eCwWzhQCZAfkFEA+m0cZVS4782P9Bwnot r3YAn3oGxzHPffeV0t+wxKHzTuGIdcSh =XhOz -----END PGP SIGNATURE----- --yNb1oOkm5a9FJOVX-- From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 13:24:54 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A2571065670; Wed, 19 Mar 2008 13:24:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 3B0DB8FC1D; Wed, 19 Mar 2008 13:24:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from zion.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by elvis.mu.org (Postfix) with ESMTP id 41F411A4D7E; Wed, 19 Mar 2008 06:23:31 -0700 (PDT) From: John Baldwin To: freebsd-arch@freebsd.org Date: Wed, 19 Mar 2008 08:54:38 -0400 User-Agent: KMail/1.9.7 References: <8726.1205926183@critter.freebsd.dk> In-Reply-To: <8726.1205926183@critter.freebsd.dk> 
MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200803190854.39131.jhb@freebsd.org> Cc: bzeeb-lists@lists.zabbadoz.net, Poul-Henning Kamp , arch@freebsd.org Subject: Re: Power-Mgt X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 13:24:54 -0000 On Wednesday 19 March 2008 07:29:43 am Poul-Henning Kamp wrote: > In message <20080319.051604.63052713.imp@bsdimp.com>, "M. Warner Losh" writes: > >In message: <20080318085804.I50685@maildrop.int.zabbadoz.net> > > > > "Bjoern A. Zeeb" writes: > >: what actually happens to an unrecognized card or a card with no driver > >: loaded currently? How much power does an unsued card use and can we do > >: anything about that? Are we perhaps already doing something about > >: that? > > > >For PCI it is set into D3 state. Or at least was until this caused a > >problem with some raid controllers that didn't follow the rules and > >had extra devices that the card used, but that the OS didn't have a > >driver for. > > > >For PC Card, the card is powered down entirely. For CardBus I think > >the same. For USB, ugen takes it, and therefore it is powered up. > > Not to mention this comment from acpi_cpu.c: > > /* > * Check for bus master activity. If there was activity, clear > * the bit and use the lowest non-C3 state. Note that the USB > * driver polling for new devices keeps this bit set all the > * time if USB is loaded. > */ That is something to be fixed in the USB driver, but yes. Changing the USB driver to power down when nothing is plugged in may help. 
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 15:23:32 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 82F0F106564A for ; Wed, 19 Mar 2008 15:23:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id 2216D8FC21 for ; Wed, 19 Mar 2008 15:23:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m2JFNPIp028750 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Mar 2008 02:23:26 +1100 Date: Thu, 20 Mar 2008 02:23:24 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Ed Schouten In-Reply-To: <20080317140039.GJ80576@hoeg.nl> Message-ID: <20080320014118.I10895@besplex.bde.org> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org> <20080317140039.GJ80576@hoeg.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Arch Subject: 
Re: vgone() calling VOP_CLOSE() -> blocked threads? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 15:23:32 -0000 On Mon, 17 Mar 2008, Ed Schouten wrote: > * Bruce Evans wrote: >> ... >> I don't think it would work well to move everything except d_close to >> deadfs. > > It wasn't my idea to make revoke() wait for all threads to leave. It > should just inform the device driver that a revoke() has been performed, > to wake up sleeping threads, and change the vnode to prevent further > access. OK... Keep the move of d_close to deadfs too. The driver might need to call d_close to complete the effects of the revoke, but userland and vfs shouldn't. > The problem with the current implementation is that the device driver > cannot sanely determine whether a revoke() or a real close() is called. > Especially in my new TTY design, where a TTY could even be deallocated > when a close() is performed - when the device driver has abandoned the > TTY device - it would even destroy the TTY object that's being used by > the sleeping threads. > > This is why I chose an approach that would allow threads to just leave > the device driver as they normally would, which reduces complexity a > lot. Yes, kib's reply gives the rule that device close cannot destroy all device structs. It doesn't seem useful to destroy some structs earlier and then have to check all over not to access them while the synchronization is in progress. Some drivers (mainly rp?) get into trouble in another way, by calling dev_unbusy() in device close. Last-close doesn't work well enough for an unconditional dev_unbusy() to work there. vfs refcounts don't work well enough for a conditional dev_unbusy() (conditional on vfs counts alone) to work there either. 
Just checking si_threadcount in d_close() and providing d_purge() to call dev_unbusy() if d_close() missed doing it seems to be insufficient, since (I think) d_purge() and destroy_devl() are designed to destroy the whole device but not to synchronize when last-close is performed out of order. > My question is: what approach would you take in such a situation? > Thanks for your input so far. Something like the old tty approach (a generation count) with more wakeups and more checking of the generation count. It should be possible to check a generation count more efficiently than si_*. For si_*, some device locking is needed. Locking for si_* seems to be undocumented, but seems to be simply mtx_lock() on the global devmtx mutex. A generation count in struct tty would be automatically locked by a tty mutex. Bruce From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 15:39:56 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 27113106566C for ; Wed, 19 Mar 2008 15:39:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 9EDD88FC1D for ; Wed, 19 Mar 2008 15:39:55 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m2JFdo8Y030464 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Mar 2008 02:39:50 +1100 Date: Thu, 20 Mar 2008 02:39:49 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Ed Schouten In-Reply-To: <20080319124759.GB51074@hoeg.nl> Message-ID: <20080320022557.K10895@besplex.bde.org> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org> <20080315194809.GN10374@deviant.kiev.zoral.com.ua> 
<20080316133138.J41270@delplex.bde.org> <20080319124759.GB51074@hoeg.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kostik Belousov , FreeBSD Arch Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 15:39:56 -0000 On Wed, 19 Mar 2008, Ed Schouten wrote: > I just changed my TTY code to perform some garbage collecting on TTY's. > It now only performs a device cleanup when si_threadcount == 1 and > TF_OPENED is unset. Unfortunately, I'm checking for these conditions in > all the cdev ops, which is quite expensive. > > It does the trick, but if someone has a better idea, I'm willing to > implement it. When does si_threadcount go to 0 -- can it be 1 due to something other than a cdev op holding a reference? If revoke() is the only problem, and if non-cdev ops can hold a reference, then it might work to acquire a reference at the time of the revoke. Hold this reference in some process (could even be in userland), and consider releasing it some time later (and later again if the synchronization hasn't completed). While this reference is held, si_refcount cannot go to 0, so it is only necessary to check si_threadcount == 1 when considering releasing this reference. New opens on the device probably need to be blocked while the state is unsynchronized -- otherwise too many states are possible. 
Bruce From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 16:23:24 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 23D6A1065670; Wed, 19 Mar 2008 16:23:23 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id A77918FC17; Wed, 19 Mar 2008 16:23:23 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id m2JGLB1a023562; Wed, 19 Mar 2008 10:21:11 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Wed, 19 Mar 2008 10:21:49 -0600 (MDT) Message-Id: <20080319.102149.1723939928.imp@bsdimp.com> To: jhb@FreeBSD.org From: "M. Warner Losh" In-Reply-To: <200803190854.39131.jhb@freebsd.org> References: <8726.1205926183@critter.freebsd.dk> <200803190854.39131.jhb@freebsd.org> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: bzeeb-lists@lists.zabbadoz.net, phk@phk.freebsd.dk, arch@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Power-Mgt X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 16:23:24 -0000 In message: <200803190854.39131.jhb@freebsd.org> John Baldwin writes: : On Wednesday 19 March 2008 07:29:43 am Poul-Henning Kamp wrote: : > In message <20080319.051604.63052713.imp@bsdimp.com>, "M. Warner Losh" : writes: : > >In message: <20080318085804.I50685@maildrop.int.zabbadoz.net> : > > : > > "Bjoern A. Zeeb" writes: : > >: what actually happens to an unrecognized card or a card with no driver : > >: loaded currently? 
How much power does an unsued card use and can we do : > >: anything about that? Are we perhaps already doing something about : > >: that? : > > : > >For PCI it is set into D3 state. Or at least was until this caused a : > >problem with some raid controllers that didn't follow the rules and : > >had extra devices that the card used, but that the OS didn't have a : > >driver for. : > > : > >For PC Card, the card is powered down entirely. For CardBus I think : > >the same. For USB, ugen takes it, and therefore it is powered up. : > : > Not to mention this comment from acpi_cpu.c: : > : > /* : > * Check for bus master activity. If there was activity, clear : > * the bit and use the lowest non-C3 state. Note that the USB : > * driver polling for new devices keeps this bit set all the : > * time if USB is loaded. : > */ : : That is something to be fixed in the USB driver, but yes. Changing the USB : driver to power down when nothing is plugged in may help. hps' usb stack implements something in this area. Warner From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 17:25:08 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86E7F1065672; Wed, 19 Mar 2008 17:25:08 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 7A8498FC17; Wed, 19 Mar 2008 17:25:08 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id E81231A4D7E; Wed, 19 Mar 2008 10:23:44 -0700 (PDT) Date: Wed, 19 Mar 2008 10:23:44 -0700 From: Alfred Perlstein To: Jeff Roberson Message-ID: <20080319172344.GX67856@elvis.mu.org> References: <20080307020626.G920@desktop> <47E0CCC4.8040503@freebsd.org> <20080318235125.G910@desktop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080318235125.G910@desktop> User-Agent: Mutt/1.4.2.3i Cc: Daniel Eischen , arch@FreeBSD.org, David Xu Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 17:25:08 -0000 * Jeff Roberson [080319 02:51] wrote: > On Wed, 19 Mar 2008, David Xu wrote: > > >Daniel Eischen wrote: > > > >>I'm not sure if any of the above remove the priority from the API, > >>but it would be nice to get rid of msleep totally and replace it > >>with an equivalent cv_wait(). > >> > > > >And create sleep queue in each cv to get rid of shared sleep queue > >lock ? 
> > Some spinlock is required to interlock with the scheduler lock via > thread_lock(). So I don't think you can get rid of that layer. You also > wouldn't want to have the cost of a 'struct sleepqueue' everywhere you > want a msleep/condvar. > > I personally don't see any real advantage to using condvar everywhere. > The only thing you really get is protection against spurious wakeups. In theory can't you protect the waitq hung off of condvars with the mutex/spinlock used for the condvar instead of a global (hashed) lock on the global waitq? (although doing a condvar_signal/broadcast without the lock would require that the internal code reacquire the lock) -- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 18:40:24 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 512FA1065671 for ; Wed, 19 Mar 2008 18:40:24 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 15F008FC1F for ; Wed, 19 Mar 2008 18:40:24 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 9211C1CC50; Wed, 19 Mar 2008 19:40:22 +0100 (CET) Date: Wed, 19 Mar 2008 19:40:22 +0100 From: Ed Schouten To: Bruce Evans Message-ID: <20080319184022.GD51074@hoeg.nl> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org> <20080315194809.GN10374@deviant.kiev.zoral.com.ua> <20080316133138.J41270@delplex.bde.org> <20080319124759.GB51074@hoeg.nl> <20080320022557.K10895@besplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="JwB53PgKC5A7+0Ej" Content-Disposition: inline In-Reply-To: <20080320022557.K10895@besplex.bde.org> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: Kostik Belousov , FreeBSD Arch Subject: Re: vgone() calling VOP_CLOSE() -> blocked 
threads? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 18:40:24 -0000 --JwB53PgKC5A7+0Ej Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello Bruce, * Bruce Evans wrote: > On Wed, 19 Mar 2008, Ed Schouten wrote: > >> I just changed my TTY code to perform some garbage collecting on TTY's. >> It now only performs a device cleanup when si_threadcount == 1 and >> TF_OPENED is unset. Unfortunately, I'm checking for these conditions in >> all the cdev ops, which is quite expensive. >> >> It does the trick, but if someone has a better idea, I'm willing to >> implement it. > > When does si_threadcount go to 0 -- can it be 1 due to something other > than a cdev op holding a reference? It would probably reach 0, but because the thread calling the close routine is also taken into account, it will be 1. > If revoke() is the only problem, and if non-cdev ops can hold a > reference, then it might work to acquire a reference at the time of > the revoke. Hold this reference in some process (could even be in > userland), and consider releasing it some time later (and later again > if the synchronization hasn't completed). While this reference is > held, si_refcount cannot go to 0, so it is only necessary to check > si_threadcount == 1 when considering releasing this reference. Because I'm not really sure about si_threadcount's locking and don't want to rely on undocumented tricks too much, I just introduced a t_threadcnt, which is adjusted when entering/leaving one of the cdev ops. This makes it a lot easier to lock as well. It should be waterproof w.r.t. referencing the TTY, but won't protect it against any open/close races yet. 
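[Editorial illustration, not part of Ed's message.] The t_threadcnt idea described above can be sketched roughly as follows. This is a user-space model under stated assumptions, not the actual TTY code: struct tty_model, its fields other than the name t_threadcnt, and the tty_model_* helpers are hypothetical, and a pthread mutex stands in for whatever per-TTY lock the real code uses. It only shows the enter/leave counting and the "last thread out of an unopened TTY does the cleanup" decision.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; this is NOT the real struct tty. */
struct tty_model {
	pthread_mutex_t	t_mtx;		/* stands in for the per-TTY lock */
	unsigned int	t_threadcnt;	/* threads currently inside a cdev op */
	bool		t_opened;	/* stands in for the TF_OPENED flag */
	bool		t_gone;		/* set once cleanup has been decided */
};

static void
tty_model_init(struct tty_model *tp)
{
	pthread_mutex_init(&tp->t_mtx, NULL);
	tp->t_threadcnt = 0;
	tp->t_opened = false;
	tp->t_gone = false;
}

/* Assumed to be called on entry to every cdev op. */
static void
tty_model_enter(struct tty_model *tp)
{
	pthread_mutex_lock(&tp->t_mtx);
	tp->t_threadcnt++;
	pthread_mutex_unlock(&tp->t_mtx);
}

/*
 * Assumed to be called on exit from every cdev op.  Returns true if the
 * caller was the last thread to leave a TTY that is not open, i.e. the
 * caller is the one that should perform the device cleanup.
 */
static bool
tty_model_leave(struct tty_model *tp)
{
	bool last;

	pthread_mutex_lock(&tp->t_mtx);
	assert(tp->t_threadcnt > 0);
	tp->t_threadcnt--;
	last = (tp->t_threadcnt == 0 && !tp->t_opened && !tp->t_gone);
	if (last)
		tp->t_gone = true;
	pthread_mutex_unlock(&tp->t_mtx);
	return (last);
}
```

Because the counter and the opened flag are read under the same lock, the cleanup decision is made exactly once. As noted above, a model like this does nothing about open/close races by itself.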
> New opens on the device probably need to be blocked while the state is > unsynchronized -- otherwise too many states are possible. Fortunately, the close() path doesn't block yet, but I was thinking about adding flags, which should be set when entering tricky parts of the TTY layer, such as line discipline changing and device opening and closing. For now, the current solution should suffice. I'm more worried about getting uart(4) to work right now. ;-) -- Ed Schouten WWW: http://g-rave.nl/ --JwB53PgKC5A7+0Ej Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkfhXhYACgkQ52SDGA2eCwVqyQCfaDvXW+AppzkoAYf0XPHVzCxH T7UAnA/PdEW5OcAfBauD/yEa+LhKVnwW =6kOC -----END PGP SIGNATURE----- --JwB53PgKC5A7+0Ej-- From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 19:29:42 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7841C106566B; Wed, 19 Mar 2008 19:29:42 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219]) by mx1.freebsd.org (Postfix) with ESMTP id 602C88FC15; Wed, 19 Mar 2008 19:29:41 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (unverified [66.23.211.162]) by speedfactory.net (SurgeMail 3.8s) with ESMTP id 236055968-1834499 for multiple; Wed, 19 Mar 2008 15:30:39 -0400 Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2JJTTcq005262; Wed, 19 Mar 2008 15:29:29 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-arch@freebsd.org Date: Wed, 19 Mar 2008 15:26:56 -0400 User-Agent: KMail/1.9.7 References: <20080307020626.G920@desktop> <20080318235125.G910@desktop> <20080319172344.GX67856@elvis.mu.org> In-Reply-To: <20080319172344.GX67856@elvis.mu.org> 
MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200803191526.56761.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Wed, 19 Mar 2008 15:29:30 -0400 (EDT) X-Virus-Scanned: ClamAV 0.91.2/6305/Wed Mar 19 03:32:53 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: Daniel Eischen , Alfred Perlstein , David Xu , arch@freebsd.org Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 19:29:42 -0000 On Wednesday 19 March 2008 01:23:44 pm Alfred Perlstein wrote: > * Jeff Roberson [080319 02:51] wrote: > > On Wed, 19 Mar 2008, David Xu wrote: > > > > >Daniel Eischen wrote: > > > > > >>I'm not sure if any of the above remove the priority from the API, > > >>but it would be nice to get rid of msleep totally and replace it > > >>with an equivalent cv_wait(). > > >> > > > > > >And create sleep queue in each cv to get rid of shared sleep queue > > >lock ? > > > > Some spinlock is required to interlock with the scheduler lock via > > thread_lock(). So I don't think you can get rid of that layer. You also > > wouldn't want to have the cost of a 'struct sleepqueue' everywhere you > > want a msleep/condvar. > > > > I personally don't see any real advantage to using condvar everywhere. > > The only thing you really get is protection against spurious wakeups. 
> > In theory can't you protect the waitq hung off of condvars with > the mutex/spinlock used for the condvar instead of a global > (hashed) lock on the global waitq? Right now we let people invoke cv_wakeup/signal w/o holding the lock.
I actually took the thread queue out of condvar's back when doing the original sleep queue stuff since it is cheaper space wise. Instead of each possible condvar having its own set of queue pointers you just have a set of queue pointers for each thread in the system. Similar to only have a turnstile per thread rather than per lock. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Mar 19 19:34:46 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1FA3106564A; Wed, 19 Mar 2008 19:34:46 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id C886B8FC25; Wed, 19 Mar 2008 19:34:46 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id D4CD31A4D7E; Wed, 19 Mar 2008 12:33:22 -0700 (PDT) Date: Wed, 19 Mar 2008 12:33:22 -0700 From: Alfred Perlstein To: John Baldwin Message-ID: <20080319193322.GC67856@elvis.mu.org> References: <20080307020626.G920@desktop> <20080318235125.G910@desktop> <20080319172344.GX67856@elvis.mu.org> <200803191526.56761.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200803191526.56761.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i Cc: Daniel Eischen , arch@freebsd.org, David Xu , freebsd-arch@freebsd.org Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 19:34:47 -0000 * John Baldwin [080319 12:28] wrote: > On Wednesday 19 March 2008 01:23:44 pm Alfred Perlstein wrote: > > * Jeff Roberson [080319 02:51] wrote: > > > On Wed, 19 Mar 2008, David Xu wrote: > > > > > > 
>Daniel Eischen wrote: > > > > > > > >>I'm not sure if any of the above remove the priority from the API, > > > >>but it would be nice to get rid of msleep totally and replace it > > > >>with an equivalent cv_wait(). > > > >> > > > > > > > >And create sleep queue in each cv to get rid of shared sleep queue > > > >lock ? > > > > > > Some spinlock is required to interlock with the scheduler lock via > > > thread_lock(). So I don't think you can get rid of that layer. You also > > > wouldn't want to have the cost of a 'struct sleepqueue' everywhere you > > > want a msleep/condvar. > > > > > > I personally don't see any real advantage to using condvar everywhere. > > > The only thing you really get is protection against spurious wakeups. > > > > In theory can't you protect the waitq hung off of condvars with > > the mutex/spinlock used for the condvar instead of a global > > (hashed) lock on the global waitq? > > Right now we let people invoke cv_wakeup/signal w/o holding the lock. I > actually took the thread queue out of condvar's back when doing the original > sleep queue stuff since it is cheaper space wise. Instead of each possible > condvar having its own set of queue pointers you just have a set of queue > pointers for each thread in the system. Similar to only have a turnstile per > thread rather than per lock. Ok, thank you, need to think about it. 
-- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Thu Mar 20 02:20:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D951D106566B; Thu, 20 Mar 2008 02:20:07 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 6F9B38FC13; Thu, 20 Mar 2008 02:20:07 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.2/8.14.2/NETPLEX) with ESMTP id m2K2JwO2029965; Wed, 19 Mar 2008 22:19:58 -0400 (EDT) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Wed, 19 Mar 2008 22:19:58 -0400 (EDT) Date: Wed, 19 Mar 2008 22:19:58 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: John Baldwin In-Reply-To: <200803191526.56761.jhb@freebsd.org> Message-ID: References: <20080307020626.G920@desktop> <20080318235125.G910@desktop> <20080319172344.GX67856@elvis.mu.org> <200803191526.56761.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Alfred Perlstein , David Xu , freebsd-arch@freebsd.org Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Mar 2008 02:20:08 -0000 On Wed, 19 Mar 2008, John Baldwin wrote: > On Wednesday 19 March 2008 01:23:44 pm Alfred Perlstein wrote: >> * Jeff Roberson [080319 02:51] wrote: >>> On Wed, 19 Mar 2008, David Xu wrote: >>> >>>> Daniel Eischen wrote: >>>> >>>>> I'm not 
sure if any of the above remove the priority from the API, >>>>> but it would be nice to get rid of msleep totally and replace it >>>>> with an equivalent cv_wait(). >>>>> >>>> >>>> And create sleep queue in each cv to get rid of shared sleep queue >>>> lock ? >>> >>> Some spinlock is required to interlock with the scheduler lock via >>> thread_lock(). So I don't think you can get rid of that layer. You also >>> wouldn't want to have the cost of a 'struct sleepqueue' everywhere you >>> want a msleep/condvar. >>> >>> I personally don't see any real advantage to using condvar everywhere. >>> The only thing you really get is protection against spurious wakeups. >> >> In theory can't you protect the waitq hung off of condvars with >> the mutex/spinlock used for the condvar instead of a global >> (hashed) lock on the global waitq? > > Right now we let people invoke cv_wakeup/signal w/o holding the lock. I > actually took the thread queue out of condvar's back when doing the original > sleep queue stuff since it is cheaper space wise. Instead of each possible > condvar having its own set of queue pointers you just have a set of queue > pointers for each thread in the system. Similar to only have a turnstile per > thread rather than per lock. In regards to why should we use cv_wait() instead of msleep()'s, the mutex/cv operations are more POSIX/pthreads-like with a simpler set of arguments. Solaris for example does not have msleep() nor anything like that I can see, it uses similar mutex/cv operations as part of its kernel ABI (DDI/DKI). I don't think there is a problem in allowing cv_signal() to be called without holding the lock, but it's nice that cv_wait() requires the lock - this is the same behavior as in pthreads and Solaris primitives. Perhaps there are no performance differences, but the cv/mutex primitives are a nice clean interface that most everyone understands. 
If you are going to write a professional OS from the ground up, I doubt you are going to have anything as convoluted as msleep() as part of your kernel API/ABI. -- DE From owner-freebsd-arch@FreeBSD.ORG Thu Mar 20 02:31:40 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1936106566B; Thu, 20 Mar 2008 02:31:40 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 91ED88FC14; Thu, 20 Mar 2008 02:31:40 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m2K2VVF1028455; Wed, 19 Mar 2008 22:31:36 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Wed, 19 Mar 2008 16:32:28 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Daniel Eischen In-Reply-To: Message-ID: <20080319162928.V910@desktop> References: <20080307020626.G920@desktop> <20080318235125.G910@desktop> <20080319172344.GX67856@elvis.mu.org> <200803191526.56761.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Alfred Perlstein , David Xu , freebsd-arch@freebsd.org Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Mar 2008 02:31:41 -0000 On Wed, 19 Mar 2008, Daniel Eischen wrote: > On Wed, 19 Mar 2008, John Baldwin wrote: > >> On Wednesday 19 March 2008 01:23:44 pm Alfred Perlstein wrote: >>> * Jeff Roberson [080319 02:51] wrote: >>>> On Wed, 19 Mar 2008, David Xu wrote: >>>> >>>>> 
Daniel Eischen wrote: >>>>> >>>>>> I'm not sure if any of the above remove the priority from the API, >>>>>> but it would be nice to get rid of msleep totally and replace it >>>>>> with an equivalent cv_wait(). >>>>>> >>>>> >>>>> And create sleep queue in each cv to get rid of shared sleep queue >>>>> lock ? >>>> >>>> Some spinlock is required to interlock with the scheduler lock via >>>> thread_lock(). So I don't think you can get rid of that layer. You also >>>> wouldn't want to have the cost of a 'struct sleepqueue' everywhere you >>>> want a msleep/condvar. >>>> >>>> I personally don't see any real advantage to using condvar everywhere. >>>> The only thing you really get is protection against spurious wakeups. >>> >>> In theory can't you protect the waitq hung off of condvars with >>> the mutex/spinlock used for the condvar instead of a global >>> (hashed) lock on the global waitq? >> >> Right now we let people invoke cv_wakeup/signal w/o holding the lock. I >> actually took the thread queue out of condvar's back when doing the >> original >> sleep queue stuff since it is cheaper space wise. Instead of each possible >> condvar having its own set of queue pointers you just have a set of queue >> pointers for each thread in the system. Similar to only have a turnstile >> per >> thread rather than per lock. > > In regards to why should we use cv_wait() instead of msleep()'s, the > mutex/cv operations are more POSIX/pthreads-like with a simpler > set of arguments. Solaris for example does not have msleep() nor > anything like that I can see, it uses similar mutex/cv operations > as part of its kernel ABI (DDI/DKI). I don't think there is a problem > in allowing cv_signal() to be called without holding the lock, but > it's nice that cv_wait() requires the lock - this is the same > behavior as in pthreads and Solaris primitives. > > Perhaps there are no performance differences, but the cv/mutex > primitives are a nice clean interface that most everyone > understands. 
If you are going to write a professional OS from > the ground up, I doubt you are going to have anything as convoluted > as msleep() as part of your kernel API/ABI. One real obstacle to converting all locations to cv_* is the lack of support for anything other than mtx def mutexes in the cv api. It also just doesn't seem like a good use of developer resources regardless of how you feel about msleep. Also, in regards to interlocking with the user supplied lock; the lock that we interlock with for scheduling purposes must be a spinlock. Therefore we can't use just any user supplied lock to protect the sleepq chain.
Thanks, Jeff > > -- > DE > From owner-freebsd-arch@FreeBSD.ORG Thu Mar 20 05:41:32 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E6A581065672 for ; Thu, 20 Mar 2008 05:41:32 +0000 (UTC) (envelope-from joseph.koshy@gmail.com) Received: from rv-out-0910.google.com (rv-out-0910.google.com [209.85.198.186]) by mx1.freebsd.org (Postfix) with ESMTP id B18388FC25 for ; Thu, 20 Mar 2008 05:41:32 +0000 (UTC) (envelope-from joseph.koshy@gmail.com) Received: by rv-out-0910.google.com with SMTP id g13so472678rvb.43 for ; Wed, 19 Mar 2008 22:41:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=8Mb7ZGnUoEtD6fufNouRpNAn8d+2hskPHvv/0OFzFcY=; b=Q8PQR7L0cvvGZ8nIIQEqKxF/Mr2KM0uJxrMZLQPmizsUsisjewPFnDaeITD/jxdLXUNMi9npcfxU+bp8+TtCM2C6JxdfCrSSqlLunCegRVvZA5bhw95lmRe+zrILU3u6nz0O9Cn+pa2XB1QACFdm59SFpnFBpmofve0NOfIhDRU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=eCKdJ9C+zv5KNQGYkXMz5BBJG2Mf3+nx9SF5LfXsIXtCC+mQTBqwwYAsJrNyNhN5TvW8NNqGWrUVVVqpuliHDjFvoe/RTCl3AIxY8vqsGRKu0IiRucfuXiLwhXll6Ltz3AZkr3x84Z6x3o3k7cF0726cP+GEw4UdLbl57GDbPG8= Received: by 10.141.20.7 with SMTP id x7mr565398rvi.255.1205991692113; Wed, 19 Mar 2008 22:41:32 -0700 (PDT) Received: by 10.141.85.7 with HTTP; Wed, 19 Mar 2008 22:41:32 -0700 (PDT) Message-ID: <84dead720803192241x1b8ee4c5y65cea8dcca79530f@mail.gmail.com> Date: Thu, 20 Mar 2008 11:11:32 +0530 From: "Joseph Koshy" To: "John Baldwin"
In-Reply-To: <200803170947.25205.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080313180805.GA83406@dragon.NUXI.org> <200803141431.53846.jhb@freebsd.org> <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com> <200803170947.25205.jhb@freebsd.org> Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Mar 2008 05:41:33 -0000 jk> HWPMC is very x86 centric, for obvious reasons. jhb> Considering other CPU architectures support various performance counters it jhb> really shouldn't be designed to be x86-centric even if it is currently only jhb> implemented for x86 CPUs. Of course. It isn't DESIGNED as x86-centric---I surveyed a number of non-x86 PMC implementations when designing the MI/MD interface inside of hwpmc(4) and when designing the end-user programming model. The "obviousness" of HWPMC's current x86-centricity arises from the fact that only x86 systems are affordable (or available even) for a hobbyist in my part of the world. > Userland cycles are "cheaper". :) Not so, they cost the same as kernel cycles in the final analysis :). > I think having both is fine and userland can choose which to use > (maxcpus is probably easier to impl but perhaps less efficient). Ok. jk> Looking around, there appear to be lots of nits that need correction. jk> For one, the kern.smp sysctl hierarchy is undocumented. jhb> Not entirely: jhb> sysctl -d kern.smp jhb> kern.smp: Kernel SMP jhb> kern.smp.maxcpus: Max number of CPUs that the system was compiled for. I stand (partially) corrected :).
Koshy From owner-freebsd-arch@FreeBSD.ORG Thu Mar 20 06:08:36 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D68061065675 for ; Thu, 20 Mar 2008 06:08:36 +0000 (UTC) (envelope-from joseph.koshy@gmail.com) Received: from rv-out-0910.google.com (rv-out-0910.google.com [209.85.198.189]) by mx1.freebsd.org (Postfix) with ESMTP id A2FD28FC22 for ; Thu, 20 Mar 2008 06:08:36 +0000 (UTC) (envelope-from joseph.koshy@gmail.com) Received: by rv-out-0910.google.com with SMTP id g13so478524rvb.43 for ; Wed, 19 Mar 2008 23:08:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=RRozQJ7SF+k7vEKt+CL/cWVzMWQ4yHEHZA+2AJhqwPY=; b=t6K7JW9zE/9UfI5eWtrr2/7aIe+gVPUE+Abc8NdnTt118baGBpxRQ5EZgWNtFnnDcIYom6/vpORJonk6QhAjyxFYqQIikekoxh3wNs91IelEIMOG0w8yne7n5vqH9Y4VhziB0jsXmMP4ILU03vtwU4dcstFinR5P/l3YbZaUtYc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=FCD8s62ouBHEdODvYMXLkyoLfHFyDID5YUNbodu7MNdDWLtimLxNbOdEeys6QA+U+KAr46qWmc7Q1dUGjwIsKsdZ2zv9XDaSHFWtUjQK6wX6slMojKaOg+uRh6ohhmZAzbV1a5H00+gsBSoUXQZCeXS/7wJRZVX/O0wtRp2upiY= Received: by 10.140.147.13 with SMTP id u13mr570013rvd.228.1205993315505; Wed, 19 Mar 2008 23:08:35 -0700 (PDT) Received: by 10.141.85.7 with HTTP; Wed, 19 Mar 2008 23:08:35 -0700 (PDT) Message-ID: <84dead720803192308l59b2fd02qa2a05729f8fe494e@mail.gmail.com> Date: Thu, 20 Mar 2008 11:38:35 +0530 From: "Joseph Koshy" To: "Brooks Davis" In-Reply-To: <20080317144251.GA38485@lor.one-eyed-alien.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit 
Content-Disposition: inline References: <20080313180805.GA83406@dragon.NUXI.org> <200803141431.53846.jhb@freebsd.org> <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com> <200803170947.25205.jhb@freebsd.org> <20080317144251.GA38485@lor.one-eyed-alien.net> Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Mar 2008 06:08:36 -0000 > We should take some care to make sure we don't over generalize. hwpmc(4) today handles Intel P4 counters; those are moderately complex in terms of architecture. > If nothing else, these counters could be even more useful on CPU-poor embedded > devices. Conventional PMCs should be supportable without much ado. Koshy From owner-freebsd-arch@FreeBSD.ORG Thu Mar 20 09:45:38 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 697FF1065674; Thu, 20 Mar 2008 09:45:38 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 337E58FC15; Thu, 20 Mar 2008 09:45:38 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 9B8EE46BAB; Thu, 20 Mar 2008 05:45:37 -0400 (EDT) Date: Thu, 20 Mar 2008 09:45:37 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Jeff Roberson In-Reply-To: <20080319162928.V910@desktop> Message-ID: <20080320094335.R25104@fledge.watson.org> References: <20080307020626.G920@desktop> <20080318235125.G910@desktop> <20080319172344.GX67856@elvis.mu.org> <200803191526.56761.jhb@freebsd.org> 
<20080319162928.V910@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@freebsd.org, Alfred Perlstein , David Xu , freebsd-arch@freebsd.org Subject: Re: Getting rid of the static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Mar 2008 09:45:38 -0000 On Wed, 19 Mar 2008, Jeff Roberson wrote: >> Perhaps there are no performance differences, but the cv/mutex primitives >> are a nice clean interface that most everyone understands. If you are >> going to write a professional OS from the ground up, I doubt you are going >> to have anything as convoluted as msleep() as part of your kernel API/ABI. > > One real obstacle to converting all locations to cv_* is the lack of support > for anything other than mtx def mutexes in the cv api. It also just doesn't > seem like a good use of developer resources regardless of how you feel about > msleep. I thought condvar was converted in 7.x to accepting a struct lock for precisely this reason? I assume (perhaps incorrectly) that it can't be used with spin mutexes, but thought, as a result, that we could now use it with other lock types, such as sx locks? 
Robert N M Watson Computer Laboratory University of Cambridge
From owner-freebsd-arch@FreeBSD.ORG Thu Mar 20 09:53:17 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ECEF4106566B; Thu, 20 Mar 2008 09:53:17 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id BABA88FC23; Thu, 20 Mar 2008 09:53:17 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m2K9rEdv069832; Thu, 20 Mar 2008 05:53:15 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Wed, 19 Mar 2008 23:54:12 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Robert Watson In-Reply-To: <20080320094335.R25104@fledge.watson.org> Message-ID: <20080319235358.Y910@desktop> References: <20080307020626.G920@desktop> <20080318235125.G910@desktop> <20080319172344.GX67856@elvis.mu.org> <200803191526.56761.jhb@freebsd.org> <20080319162928.V910@desktop> <20080320094335.R25104@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.org, Alfred Perlstein , David Xu , freebsd-arch@FreeBSD.org Subject: Re: Getting rid of the
static msleep priority boost X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Mar 2008 09:53:18 -0000 On Thu, 20 Mar 2008, Robert Watson wrote: > > On Wed, 19 Mar 2008, Jeff Roberson wrote: > >>> Perhaps there are no performance differences, but the cv/mutex primitives >>> are a nice clean interface that most everyone understands. If you are >>> going to write a professional OS from the ground up, I doubt you are going >>> to have anything as convoluted as msleep() as part of your kernel API/ABI. >> >> One real obstacle to converting all locations to cv_* is the lack of >> support for anything other than mtx def mutexes in the cv api. It also >> just doesn't seem like a good use of developer resources regardless of how >> you feel about msleep. > > I thought condvar was converted in 7.x to accepting a struct lock for > precisely this reason? I assume (perhaps incorrectly) that it can't be used > with spin mutexes, but thought, as a result, that we could now use it with > other lock types, such as sx locks? You are right. John did it at the same time. Good on em. 
> > Robert N M Watson > Computer Laboratory > University of Cambridge >
From owner-freebsd-arch@FreeBSD.ORG Sat Mar 22 02:54:08 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 00D87106566C for ; Sat, 22 Mar 2008 02:54:08 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from proxy.meer.net (proxy.meer.net [64.13.141.13]) by mx1.freebsd.org (Postfix) with ESMTP id CD6AD8FC16 for ; Sat, 22 Mar 2008 02:54:07 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from outbound0.mx.meer.net (outbound0.mx.meer.net [209.157.153.23]) by proxy.meer.net (8.14.2/8.14.2) with ESMTP id m2M0P2nL072803; Fri, 21 Mar 2008 17:25:02 -0700 (PDT) (envelope-from gnn@neville-neil.com) Received: from mail.meer.net (mail.meer.net [209.157.152.14]) by outbound0.mx.meer.net (8.12.10/8.12.6) with ESMTP id m2M0AGi9078374; Fri, 21 Mar 2008 16:10:22 -0800 (PST) (envelope-from gnn@neville-neil.com) Received: from mail2.meer.net (mail2.meer.net [64.13.141.16]) by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id m2M09cQg077660; Fri, 21 Mar 2008 17:09:38 -0700 (PDT) (envelope-from gnn@neville-neil.com) Received: from
minion.local.neville-neil.com (61.204.211.246.customerlink.pwd.ne.jp [61.204.211.246]) (authenticated bits=0) by mail2.meer.net (8.14.1/8.14.1) with ESMTP id m2M09bhl037110; Fri, 21 Mar 2008 17:09:38 -0700 (PDT) (envelope-from gnn@neville-neil.com) Date: Sat, 22 Mar 2008 09:09:37 +0900 Message-ID: From: gnn@freebsd.org To: Robert Watson In-Reply-To: <20080318182358.F34016@fledge.watson.org> References: <5753.1205785282@critter.freebsd.dk> <20080318182358.F34016@fledge.watson.org> User-Agent: Wanderlust/2.15.5 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.7 Emacs/22.1.50 (i386-apple-darwin8.10.1) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII X-Bayes-Prob: 0.5 (Score 0) X-Spam-Score: 0.70 () [Tag at 5.00] COMBINED_FROM,NO_REAL_NAME X-CanItPRO-Stream: default X-Canit-Stats-ID: 60370 - 853375b3dfd4 X-Scanned-By: CanIt (www . roaringpenguin . com) on 64.13.141.13 Cc: arch@freebsd.org, Poul-Henning Kamp Subject: Re: Power-Mgt (Was: Re: cvs commit: src/sys/i386/cpufreq est.c ) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Mar 2008 02:54:08 -0000 At Tue, 18 Mar 2008 18:26:11 +0000 (GMT), rwatson wrote: > > > I know we've talked about this, but I'll mention it for the benefits > of the mailing list: one of the things that makes performance an > "easy" target is that there are easy-to-gather metrics. Those > metrics may require knowledge of statistics and a lifetime of > experience to interpret correctly, but they are still numbers that > are easily generated and compared. To drive work in power > management, we would benefit from having similarly accessible > metrics. 
Are there any decent documents describing how to do power > use measurement, and are there any (relatively) accessible tools for > doing it with? For example, on notebooks, can we sample an ACPI > value before/after a benchmark, or do we really need to hook > something up to the power supply in order to get a useful number? > Queue did a series on this a while back, some decent articles and ideas in there: http://www.acmqueue.org/modules.php?name=Content&pa=list_pages_issues&issue_id=46 http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=513 Later, George From owner-freebsd-arch@FreeBSD.ORG Sat Mar 22 11:12:54 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A5CD21065677 for ; Sat, 22 Mar 2008 11:12:54 +0000 (UTC) (envelope-from rermilov@team.vega.ru) Received: from mail.vega.ru (infra.dev.vega.ru [90.156.167.14]) by mx1.freebsd.org (Postfix) with ESMTP id 5EA848FC15 for ; Sat, 22 Mar 2008 11:12:53 +0000 (UTC) (envelope-from rermilov@team.vega.ru) Received: from [87.242.97.68] (port=55671 helo=edoofus.dev.vega.ru) by mail.vega.ru with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68 (FreeBSD)) (envelope-from ) id 1Jd1KI-0001Qu-Mj for arch@FreeBSD.org; Sat, 22 Mar 2008 10:51:50 +0000 Received: from edoofus.dev.vega.ru (localhost [127.0.0.1]) by edoofus.dev.vega.ru (8.14.2/8.14.2) with ESMTP id m2MApjEB041826 for ; Sat, 22 Mar 2008 13:51:45 +0300 (MSK) (envelope-from rermilov@team.vega.ru) Received: (from ru@localhost) by edoofus.dev.vega.ru (8.14.2/8.14.2/Submit) id m2MApjDr041825 for arch@FreeBSD.org; Sat, 22 Mar 2008 13:51:45 +0300 (MSK) (envelope-from rermilov@team.vega.ru) X-Authentication-Warning: edoofus.dev.vega.ru: ru set sender to rermilov@team.vega.ru using -f Date: Sat, 22 Mar 2008 13:51:45 +0300 From: Ruslan Ermilov To: arch@FreeBSD.org Message-ID: <20080322105145.GA41672@team.vega.ru> MIME-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.17 (2007-11-01) Cc: Subject: Disposal of a misleading M_TRYWAIT X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Mar 2008 11:12:54 -0000 I'd like to remove the misleading uses of M_TRYWAIT throughout the tree and clean up some dead code that assumes its original behavior (that it could return NULL). Since the advent of MBUMA in FreeBSD (whatever), M_TRYWAIT has meant M_WAITOK. (The reason for M_TRYWAIT itself was that an original mbuf's M_WAIT could return NULL.) There is little or no sign that this will change, and there are lots of consumers that already pass M_WAITOK to mbuf allocator routines and rely on its invariants, so support for the concept of M_TRYWAIT has rotted and would have to be re-written anyway if reintroduced. http://people.freebsd.org/~ru/patches/M_TRYWAIT.patch Cheers, -- Ruslan Ermilov ru@FreeBSD.org FreeBSD committer From owner-freebsd-arch@FreeBSD.ORG Sat Mar 22 13:57:39 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7CA5106566B; Sat, 22 Mar 2008 13:57:39 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id B94018FC1A; Sat, 22 Mar 2008 13:57:39 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id F162146B4C; Sat, 22 Mar 2008 09:57:38 -0400 (EDT) Date: Sat, 22 Mar 2008 13:57:38 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ruslan Ermilov In-Reply-To: <20080322105145.GA41672@team.vega.ru> Message-ID: <20080322135637.Y6961@fledge.watson.org> 
References: <20080322105145.GA41672@team.vega.ru> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org Subject: Re: Disposal of a misleading M_TRYWAIT X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Mar 2008 13:57:39 -0000 On Sat, 22 Mar 2008, Ruslan Ermilov wrote: > I'd like to remove the misleading uses of M_TRYWAIT throughout the tree and > clean up some dead code that assumes its original behavior (that it could > return NULL). > > Since the advent of MBUMA in FreeBSD (whatever), M_TRYWAIT has meant > M_WAITOK. (The reason for M_TRYWAIT itself was that an original mbuf's > M_WAIT could return NULL.) > > There is little or no sign that this will change, and there are lots of > consumers that already pass M_WAITOK to mbuf allocator routines and rely on > its invariants, so support for the concept of M_TRYWAIT has rotted and would > have to be re-written anyway if reintroduced. > > http://people.freebsd.org/~ru/patches/M_TRYWAIT.patch This seems reasonable to me for exactly the reasons you state. We might simultaneously want to complete the M_DONTWAIT -> M_NOWAIT conversion. And you can then remove the XXX comment in mbuf.h about phasing out M_TRYWAIT and M_DONTWAIT. :-) Robert N M Watson Computer Laboratory University of Cambridge