From owner-freebsd-arch@FreeBSD.ORG Sun Jun 14 13:01:44 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D4211065689; Sun, 14 Jun 2009 13:01:44 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from kennaway-macbookpro.config (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C842B8FC0A; Sun, 14 Jun 2009 13:01:43 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <4A34F4B7.5050904@FreeBSD.org> Date: Sun, 14 Jun 2009 14:01:43 +0100 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: Attilio Rao References: <3bbf2fe10906081342i6ef418e0n75e22d0b9e2543b3@mail.gmail.com> In-Reply-To: <3bbf2fe10906081342i6ef418e0n75e22d0b9e2543b3@mail.gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-smp@freebsd.org, freebsd-arch@freebsd.org Subject: Re: [PATCH] Adaptive spinning for lockmgr X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Jun 2009 13:01:45 -0000 Attilio Rao wrote: > This patch enables adaptive spinning for lockmgr: > http://www.freebsd.org/~attilio/adaptive_lockmgr.diff > > and it should presumably improve performance on disks/vfs/buffer cache > based benchmarks, so, if you want to try out and report any benchmarks > result, I'd love to see it. > Please note that there are some parameters to tune: for example, you > would like to not enable adaptive spinning to default while you just > want that for a class of locks (and in that case you want to apply the > reversed logic for what is living now) or you want to use different > values for retries and loops. Interested developers can refer to such > 3 variables. 
> Peter Holm alredy tested that patch for about 24hours without any > regression to report. > > Also note that the patch is not 100% yet as long as it needs UPDATES > and manpages updates, but they will be added just in time before to > commit. > The modify is all there. I have a vague memory that we had tested a version of this in the past and found that it caused a performance loss in common cases? Many lockmgr callers are not amenable to adaptive spinning because they have to wait on slow I/O. Testing only with e.g. md backing might give results that are non-representative. Kris From owner-freebsd-arch@FreeBSD.ORG Sun Jun 14 14:23:13 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 12511106566B; Sun, 14 Jun 2009 14:23:13 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-fx0-f228.google.com (mail-fx0-f228.google.com [209.85.220.228]) by mx1.freebsd.org (Postfix) with ESMTP id 448BA8FC13; Sun, 14 Jun 2009 14:23:12 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by fxm28 with SMTP id 28so526281fxm.43 for ; Sun, 14 Jun 2009 07:23:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=xXp5J6IjS2+D4UXPIoz/HSnx8yNlhHTELJBoyZaM3Qw=; b=x3YJEHt3Zv61zY/Lm0927Y0GMdOq4KJcUBedp6EGo7Hr0Gvazd+56ofXDezARMaYnA phoACDgX33e5pFZK8gwC3gsP6alI3lVZvLqv6DZZW/UWRWn15TN0PONq6SqQ+j8PpklW ZRID312HYFpfzPlPhcHSG0lTbcJUrUC/KaC0Q= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=BkQFJl+84Z3417o4A2G4TJrke6jZB69n482YiAdtvL+1v3cn20kI/o0Pv9F/E5XOvY 
Ktd3nsfLaG24bjf3P5stLSFLu9KBAYaOCJWzPa2LLn23iARU2RZSsCC7U/9xBxsw+dbE xA06DWcgAC6TQCoXMEPZ7JwQF9Xt6XzqRB6qs= MIME-Version: 1.0 Sender: asmrookie@gmail.com Received: by 10.223.110.11 with SMTP id l11mr3668858fap.50.1244989391336; Sun, 14 Jun 2009 07:23:11 -0700 (PDT) In-Reply-To: <4A34F4B7.5050904@FreeBSD.org> References: <3bbf2fe10906081342i6ef418e0n75e22d0b9e2543b3@mail.gmail.com> <4A34F4B7.5050904@FreeBSD.org> Date: Sun, 14 Jun 2009 16:23:11 +0200 X-Google-Sender-Auth: 3dd40696b9bf17fa Message-ID: <3bbf2fe10906140723y2a99eb8an3488796ac6604134@mail.gmail.com> From: Attilio Rao To: Kris Kennaway Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-smp@freebsd.org, freebsd-arch@freebsd.org Subject: Re: [PATCH] Adaptive spinning for lockmgr X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Jun 2009 14:23:13 -0000 2009/6/14 Kris Kennaway : > Attilio Rao wrote: >> >> This patch enables adaptive spinning for lockmgr: >> http://www.freebsd.org/~attilio/adaptive_lockmgr.diff >> >> and it should presumably improve performance on disks/vfs/buffer cache >> based benchmarks, so, if you want to try out and report any benchmarks >> result, I'd love to see it. >> Please note that there are some parameters to tune: for example, you >> would like to not enable adaptive spinning to default while you just >> want that for a class of locks (and in that case you want to apply the >> reversed logic for what is living now) or you want to use different >> values for retries and loops. Interested developers can refer to such >> 3 variables. >> Peter Holm alredy tested that patch for about 24hours without any >> regression to report. 
>> >> Also note that the patch is not 100% yet as long as it needs UPDATES >> and manpages updates, but they will be added just in time before to >> commit. >> The modify is all there. > > I have a vague memory that we had tested a version of this in the past and > found that it caused a performance loss in common cases? Many lockmgr > callers are not amenable to adaptive spinning because they have to wait on > slow I/O. Testing only with e.g. md backing might give results that are > non-representative. I don't think I ever implemented adaptive spinning in lockmgr, so if somebody else did I don't know about it. That said, probably the best approach would be to disable it by default and use a LK_ADAPTIVESPIN flag on a per-instance basis. Such conditions, though, need to be explored a bit and I have no time to dedicate to this right now. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-arch@FreeBSD.ORG Sun Jun 14 15:00:42 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8926E1065672; Sun, 14 Jun 2009 15:00:42 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from kennaway-macbookpro.config (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id B5ED58FC13; Sun, 14 Jun 2009 15:00:41 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <4A351099.3020407@FreeBSD.org> Date: Sun, 14 Jun 2009 16:00:41 +0100 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: Attilio Rao References: <3bbf2fe10906081342i6ef418e0n75e22d0b9e2543b3@mail.gmail.com> <4A34F4B7.5050904@FreeBSD.org> <3bbf2fe10906140723y2a99eb8an3488796ac6604134@mail.gmail.com> In-Reply-To: <3bbf2fe10906140723y2a99eb8an3488796ac6604134@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: 
freebsd-smp@freebsd.org, freebsd-arch@freebsd.org Subject: Re: [PATCH] Adaptive spinning for lockmgr X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Jun 2009 15:00:43 -0000 Attilio Rao wrote: > 2009/6/14 Kris Kennaway : >> Attilio Rao wrote: >>> This patch enables adaptive spinning for lockmgr: >>> http://www.freebsd.org/~attilio/adaptive_lockmgr.diff >>> >>> and it should presumably improve performance on disks/vfs/buffer cache >>> based benchmarks, so, if you want to try out and report any benchmarks >>> result, I'd love to see it. >>> Please note that there are some parameters to tune: for example, you >>> would like to not enable adaptive spinning to default while you just >>> want that for a class of locks (and in that case you want to apply the >>> reversed logic for what is living now) or you want to use different >>> values for retries and loops. Interested developers can refer to such >>> 3 variables. >>> Peter Holm alredy tested that patch for about 24hours without any >>> regression to report. >>> >>> Also note that the patch is not 100% yet as long as it needs UPDATES >>> and manpages updates, but they will be added just in time before to >>> commit. >>> The modify is all there. >> I have a vague memory that we had tested a version of this in the past and >> found that it caused a performance loss in common cases? Many lockmgr >> callers are not amenable to adaptive spinning because they have to wait on >> slow I/O. Testing only with e.g. md backing might give results that are >> non-representative. > > I don't think I ever implemented adaptive spinning in lockmgr so if > somebody else did I don't know. Said that, probabilly the best > approach would be to disable it by default ad use a LK_ADAPTIVESPIN > flag on a per instance basis. 
> Such conditions, though, need to be explored a bit and I have no time > to dedicate to this right now. OK, I am mis-remembering then. Ideally it would be tested in several representative workloads to see where it helps. I can't promise whether I can do this though, for the same reason as you :( Kris From owner-freebsd-arch@FreeBSD.ORG Mon Jun 15 11:06:50 2009 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B094106566B for ; Mon, 15 Jun 2009 11:06:50 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0DE188FC1E for ; Mon, 15 Jun 2009 11:06:50 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n5FB6nft076827 for ; Mon, 15 Jun 2009 11:06:49 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n5FB6nH2076823 for freebsd-arch@FreeBSD.org; Mon, 15 Jun 2009 11:06:49 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 15 Jun 2009 11:06:49 GMT Message-Id: <200906151106.n5FB6nH2076823@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-arch@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2009 11:06:50 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). 
The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.

From owner-freebsd-arch@FreeBSD.ORG Mon Jun 15 21:53:21 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C92C10656BA; Mon, 15 Jun 2009 21:53:21 +0000 (UTC) (envelope-from Daan@vehosting.nl) Received: from VM01.VEHosting.nl (unknown [IPv6:2001:470:1f14:32d::1:140]) by mx1.freebsd.org (Postfix) with ESMTP id D08CF8FC1B; Mon, 15 Jun 2009 21:53:20 +0000 (UTC) (envelope-from Daan@vehosting.nl) Received: from [192.168.72.10] (124-54.bbned.dsl.internl.net [92.254.54.124]) (authenticated bits=0) by VM01.VEHosting.nl (8.14.3/8.13.8) with ESMTP id n5FLrHDI055760; Mon, 15 Jun 2009 23:53:17 +0200 (CEST) (envelope-from Daan@vehosting.nl) From: Daan Vreeken Organization: VEHosting.nl - Vitsch Electronics Hosting To: freebsd-arch@freebsd.org Date: Mon, 15 Jun 2009 23:52:47 +0200 User-Agent: KMail/1.9.10 References: <4A254B45.8050800@mavhome.dp.ua> <4A294DC3.5010008@mavhome.dp.ua> <200906051728.n55HSFf0076644@apollo.backplane.com> In-Reply-To: <200906051728.n55HSFf0076644@apollo.backplane.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-6" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200906152352.48231.Daan@vehosting.nl> x-ve-auth-version: mi-1.0.3 2008-05-30 - Copyright (c) 2008 - Daan Vreeken - VEHosting x-ve-auth: authenticated as 'pa4dan' on VM01.VEHosting.nl Cc: FreeBSD-Current , Alexander Motin Subject: Re: WIP: ATA to CAM integration X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2009 21:53:22 -0000 Hi, On Friday 05 June 2009 19:28:15 Matthew Dillon wrote: > :Latest AHCI specifications define feature named FIS Based Switching. It > :allows controller independently track state of every device beyond port > :multiplier. It should be quite easy to use it, but actually none of my > :controllers have that capability. > > Damn. The FBSS capability bit is not set on my (AMD) MCP77 based AHCI > SATA controller. That sucks. > > ahci0: ... > ahci0: AHCI 1.2 capabilities > 0xe3229f05, 6 port > > Do you know of any host controllers which support FBS ? Any of the > Intel parts or machines per-chance? ... According to the following link : http://www.siliconimage.com/products/product.aspx?pid=32 the SiI3132 supports FIS based switching. We use them in a storage server prototype. Regards, -- Daan Vreeken VEHosting http://VEHosting.nl tel: +31-(0)40-7113050 / +31-(0)6-46210825 KvK nr: 17174380 From owner-freebsd-arch@FreeBSD.ORG Mon Jun 15 22:09:52 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEAF510656DD; Mon, 15 Jun 2009 22:09:52 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id 8D33A8FC1E; Mon, 15 Jun 2009 22:09:52 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.2/8.14.1) with ESMTP id n5FM9pDN007071; Mon, 15 Jun 2009 15:09:51 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.2/8.13.4/Submit) id n5FM9psY007070; Mon, 15 Jun 2009 15:09:51 -0700 (PDT) Date: Mon, 15 Jun 2009 15:09:51 -0700 (PDT) From: Matthew Dillon Message-Id: 
<200906152209.n5FM9psY007070@apollo.backplane.com> To: Daan Vreeken References: <4A254B45.8050800@mavhome.dp.ua> <4A294DC3.5010008@mavhome.dp.ua> <200906051728.n55HSFf0076644@apollo.backplane.com> <200906152352.48231.Daan@vehosting.nl> Cc: Alexander Motin , FreeBSD-Current , freebsd-arch@freebsd.org Subject: Re: WIP: ATA to CAM integration X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2009 22:09:53 -0000 :Hi, : :According to the following link : : http://www.siliconimage.com/products/product.aspx?pid=32 : :the SiI3132 supports FIS based switching. We use them in a storage server :prototype. : :Regards, :-- :Daan Vreeken :VEHosting Yah, I have a bunch of those. They aren't AHCI parts though (as far as I know) so it doesn't help with the AHCI driver. They have their own custom driver. But thanks for mentioning it :-) (Someone tell me if I'm wrong there, I'm pretty sure all the Sili stuff uses a Sili-specific device driver). 
-Matt From owner-freebsd-arch@FreeBSD.ORG Mon Jun 15 22:57:48 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A8461065674; Mon, 15 Jun 2009 22:57:48 +0000 (UTC) (envelope-from oz@nixil.net) Received: from nixil.net (nixil.net [161.58.222.1]) by mx1.freebsd.org (Postfix) with ESMTP id 3C1DF8FC1D; Mon, 15 Jun 2009 22:57:47 +0000 (UTC) (envelope-from oz@nixil.net) Received: from demigorgon.corp.verio.net (fw.oremut02.us.wh.verio.net [198.65.168.24]) (authenticated bits=0) by nixil.net (8.13.6.20060614/8.13.6) with ESMTP id n5FMivUW070044 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Mon, 15 Jun 2009 16:45:10 -0600 (MDT) Message-ID: <4A36CEE9.9040101@nixil.net> Date: Mon, 15 Jun 2009 16:44:57 -0600 From: Phil Oleson User-Agent: Thunderbird 2.0.0.14 (X11/20080623) MIME-Version: 1.0 To: Matthew Dillon References: <4A254B45.8050800@mavhome.dp.ua> <4A294DC3.5010008@mavhome.dp.ua> <200906051728.n55HSFf0076644@apollo.backplane.com> <200906152352.48231.Daan@vehosting.nl> <200906152209.n5FM9psY007070@apollo.backplane.com> In-Reply-To: <200906152209.n5FM9psY007070@apollo.backplane.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (nixil.net [161.58.222.1]); Mon, 15 Jun 2009 16:45:11 -0600 (MDT) X-Virus-Scanned: ClamAV 0.94.2/9467/Mon Jun 15 02:11:58 2009 on nixil.net X-Virus-Status: Clean Cc: Daan Vreeken , Alexander Motin , FreeBSD-Current , freebsd-arch@freebsd.org Subject: Re: WIP: ATA to CAM integration X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2009 22:57:49 -0000 Matthew Dillon wrote: > :Hi, > : > :According 
to the following link : > : http://www.siliconimage.com/products/product.aspx?pid=32 > : > :the SiI3132 supports FIS based switching. We use them in a storage server > :prototype. > : > :Regards, > :-- > :Daan Vreeken > :VEHosting > > Yah, I have a bunch of those. They aren't AHCI parts though (as far > as I know) so it doesn't help with the AHCI driver. They have their > own custom driver. But thanks for mentioning it :-) > > (Someone tell me if I'm wrong there, I'm pretty sure all the Sili stuff > uses a Sili-specific device driver). meh.. found this via google: http://www.tomshardware.com/reviews/storage-accessories,1787-2.html The article claims it's AHCI compliant, though from a cursory glance the Addonics web page doesn't specifically say so here: http://www.addonics.com/products/host_controller/extpm.asp and the other form factors. http://www.addonics.com/products/pm/ -Phil. From owner-freebsd-arch@FreeBSD.ORG Mon Jun 15 23:10:40 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B464106566B; Mon, 15 Jun 2009 23:10:40 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id D69848FC17; Mon, 15 Jun 2009 23:10:39 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.3/8.14.1) with ESMTP id n5FN7V5c077571; Mon, 15 Jun 2009 17:07:31 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Mon, 15 Jun 2009 17:07:54 -0600 (MDT) Message-Id: <20090615.170754.1399854812.imp@bsdimp.com> To: mav@freebsd.org From: "M. 
Warner Losh" In-Reply-To: <4A2AF876.1030103@FreeBSD.org> References: <6657.1244328220@critter.freebsd.dk> <4A2AF876.1030103@FreeBSD.org> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: phk@phk.freebsd.dk, freebsd-current@freebsd.org, freebsd-arch@freebsd.org Subject: Re: WIP: ATA to CAM integration X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2009 23:10:40 -0000 In message: <4A2AF876.1030103@FreeBSD.org> Alexander Motin writes: : Poul-Henning Kamp wrote: : > In message <4A294DC3.5010008@mavhome.dp.ua>, Alexander Motin writes: : >> I think ATAPI disk device is theoretically possible, but I believe it : >> does not exist in practice, as industry does not need it. : > : > Maxtor ZIP ? : : Maybe, never had an ATA version. But it is more FDD than HDD. Also it : existed in Parallel Port and SCSI versions, so it could be done in ATAPI : way just for unification. There was an ATA/ATAPI version too. It attaches to afd. 
Warner From owner-freebsd-arch@FreeBSD.ORG Mon Jun 15 23:37:27 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B47110656CC; Mon, 15 Jun 2009 23:37:27 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id 0FE788FC0A; Mon, 15 Jun 2009 23:37:26 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.2/8.14.1) with ESMTP id n5FNbQrk008015; Mon, 15 Jun 2009 16:37:26 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.2/8.13.4/Submit) id n5FNbQrI008014; Mon, 15 Jun 2009 16:37:26 -0700 (PDT) Date: Mon, 15 Jun 2009 16:37:26 -0700 (PDT) From: Matthew Dillon Message-Id: <200906152337.n5FNbQrI008014@apollo.backplane.com> To: Phil Oleson References: <4A254B45.8050800@mavhome.dp.ua> <4A294DC3.5010008@mavhome.dp.ua> <200906051728.n55HSFf0076644@apollo.backplane.com> <200906152352.48231.Daan@vehosting.nl> <200906152209.n5FM9psY007070@apollo.backplane.com> <4A36CEE9.9040101@nixil.net> Cc: Daan Vreeken , Alexander Motin , FreeBSD-Current , freebsd-arch@freebsd.org Subject: Re: WIP: ATA to CAM integration X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2009 23:37:27 -0000 :meh.. found this via google: : :http://www.tomshardware.com/reviews/storage-accessories,1787-2.html : :The article claims it's AHCI compliant.. though the addonics web page :doesn't specifically says so from a cursory glance here: : :http://www.addonics.com/products/host_controller/extpm.asp : :and the other form factors. :http://www.addonics.com/products/pm/ : : -Phil. 
I think they mis-spoke. They are SATA-compliant and Port Multiplier compliant, and they use FIS-based packets, so they pretty much do away with all the ATA baggage, but they don't use the AHCI device interface so they won't probe as an AHCI driver. I can see why they do it that way, though. It looks like they hide most of the complexity behind the chipset, which is nice. AHCI exposes a lot of that complexity. It looks like a reasonable chipset. -Matt Matthew Dillon From owner-freebsd-arch@FreeBSD.ORG Tue Jun 16 00:12:47 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C8D61065670 for ; Tue, 16 Jun 2009 00:12:47 +0000 (UTC) (envelope-from james-freebsd-current@jrv.org) Received: from mail.jrv.org (adsl-70-243-84-13.dsl.austtx.swbell.net [70.243.84.13]) by mx1.freebsd.org (Postfix) with ESMTP id 384468FC19 for ; Tue, 16 Jun 2009 00:12:47 +0000 (UTC) (envelope-from james-freebsd-current@jrv.org) Received: from kremvax.housenet.jrv (kremvax.housenet.jrv [192.168.3.124]) by mail.jrv.org (8.14.3/8.14.3) with ESMTP id n5FNRg3E004454; Mon, 15 Jun 2009 18:27:42 -0500 (CDT) (envelope-from james-freebsd-current@jrv.org) Authentication-Results: mail.jrv.org; domainkeys=pass (testing) header.from=james-freebsd-current@jrv.org DomainKey-Signature: a=rsa-sha1; s=enigma; d=jrv.org; c=nofws; q=dns; h=message-id:date:from:user-agent:mime-version:to:cc:subject: references:in-reply-to:content-type:content-transfer-encoding; b=HBAFMsZwBQ1N6/h/9XsgAY/v8idJZ7w+xFY/gBm6DJfAqKXTxmhYV534xUCrMStZX UTH57puE0dI5aV5bCQ+bNJMkc4sAcZO03GJLWO0Hz0on90AKTiOZ01oAINGcE4Ui0IR iBk14pa6/y1Y8xkv4EAcvZMrlYaTSKkG/cYrIPM= Message-ID: <4A36D8D9.7080104@jrv.org> Date: Mon, 15 Jun 2009 18:27:21 -0500 From: "James R. 
Van Artsdalen" User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: Matthew Dillon References: <4A254B45.8050800@mavhome.dp.ua> <4A294DC3.5010008@mavhome.dp.ua> <200906051728.n55HSFf0076644@apollo.backplane.com> <200906152352.48231.Daan@vehosting.nl> <200906152209.n5FM9psY007070@apollo.backplane.com> In-Reply-To: <200906152209.n5FM9psY007070@apollo.backplane.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Daan Vreeken , Alexander Motin , FreeBSD-Current , freebsd-arch@freebsd.org Subject: Re: WIP: ATA to CAM integration X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jun 2009 00:12:47 -0000 Matthew Dillon wrote: > (Someone tell me if I'm wrong there, I'm pretty sure all the Sili stuff > uses a Sili-specific device driver). > Silicon Image publishes the 3132 datasheet. http://www.siimage.com/docs/SiI-DS-0138-D.pdf This chip is probably the one most commonly used in add-on cards due to low cost. 
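Editorial aside on the capability word quoted earlier in this thread (0xe3229f05, "6 port", FBSS not set): it can be decoded by hand. What follows is only a minimal sketch in C, not code from any driver. The macro and function names are invented for illustration, and the bit positions (NP in bits 4:0, FBSS at bit 16, SPM at bit 17) are assumed from the AHCI HBA capabilities (CAP) register layout, so check them against the specification before relying on them.

```c
#include <stdint.h>

/*
 * Hypothetical helper names; the bit positions are assumptions based on
 * the AHCI CAP (HBA capabilities) register layout.
 */
#define CAP_NP_MASK 0x0000001fu /* bits 4:0: number of ports, minus one */
#define CAP_FBSS    (1u << 16)  /* FIS-based switching supported */
#define CAP_SPM     (1u << 17)  /* port multiplier supported */

/* Number of ports implemented by the controller. */
static unsigned
cap_num_ports(uint32_t cap)
{
	return ((cap & CAP_NP_MASK) + 1);
}

/* Non-zero if the controller advertises FIS-based switching. */
static int
cap_has_fbss(uint32_t cap)
{
	return ((cap & CAP_FBSS) != 0);
}

/* Non-zero if the controller advertises port multiplier support. */
static int
cap_has_spm(uint32_t cap)
{
	return ((cap & CAP_SPM) != 0);
}
```

Under those assumptions, the value from the probe line gives cap_num_ports() == 6, which matches the "6 port" text, cap_has_spm() != 0, and cap_has_fbss() == 0, which agrees with the observation that the FBSS bit is clear on that controller.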
From owner-freebsd-arch@FreeBSD.ORG Tue Jun 16 05:52:48 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB440106564A; Tue, 16 Jun 2009 05:52:48 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from cmail.optima.ua (cmail.optima.ua [195.248.191.121]) by mx1.freebsd.org (Postfix) with ESMTP id 1E4948FC28; Tue, 16 Jun 2009 05:52:47 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from [212.86.226.226] (account mav@alkar.net HELO mavbook.mavhome.dp.ua) by cmail.optima.ua (CommuniGate Pro SMTP 5.2.9) with ESMTPSA id 245817994; Tue, 16 Jun 2009 08:52:44 +0300 Message-ID: <4A373318.9000603@FreeBSD.org> Date: Tue, 16 Jun 2009 08:52:24 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.21 (X11/20090405) MIME-Version: 1.0 To: Matthew Dillon References: <4A254B45.8050800@mavhome.dp.ua> <4A294DC3.5010008@mavhome.dp.ua> <200906051728.n55HSFf0076644@apollo.backplane.com> <200906152352.48231.Daan@vehosting.nl> <200906152209.n5FM9psY007070@apollo.backplane.com> <4A36CEE9.9040101@nixil.net> <200906152337.n5FNbQrI008014@apollo.backplane.com> In-Reply-To: <200906152337.n5FNbQrI008014@apollo.backplane.com> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: Daan Vreeken , FreeBSD-Current , Phil Oleson , freebsd-arch@freebsd.org Subject: Re: WIP: ATA to CAM integration X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jun 2009 05:52:49 -0000 Matthew Dillon wrote: > I think they mis-spoke. They are SATA-compliant and Port Multiplier > compliant, and they use FIS-based packets, so they pretty much do away > with all the ATA baggage, but they don't use the AHCI device interface > so they won't probe as an AHCI driver. 
> > I can see why they do it that way, though. It looks like they hide > most of the complexity behind the chipset, which is nice. AHCI > exposes a lot of that complexity. > > It looks like a reasonable chipset. Agree. It's functionally comparable to the latest AHCI specs, but looks more user-friendly. -- Alexander Motin From owner-freebsd-arch@FreeBSD.ORG Wed Jun 17 22:55:58 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5324106566C; Wed, 17 Jun 2009 22:55:58 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.183]) by mx1.freebsd.org (Postfix) with ESMTP id 7DED98FC18; Wed, 17 Jun 2009 22:55:58 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wa-out-1112.google.com with SMTP id m38so179522waf.27 for ; Wed, 17 Jun 2009 15:55:58 -0700 (PDT) Received: by 10.114.195.19 with SMTP id s19mr1002299waf.10.1245279358085; Wed, 17 Jun 2009 15:55:58 -0700 (PDT) Received: from ?10.0.1.198? (udp016664uds.hawaiiantel.net [72.235.41.117]) by mx.google.com with ESMTPS id l27sm2061528waf.55.2009.06.17.15.55.54 (version=SSLv3 cipher=RC4-MD5); Wed, 17 Jun 2009 15:55:55 -0700 (PDT) Date: Wed, 17 Jun 2009 12:55:52 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Peter Grehan In-Reply-To: <4A2F1148.9090706@freebsd.org> Message-ID: References: <20090609201127.GA50903@alchemy.franken.de> <4A2F1148.9090706@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Cc: arch@freebsd.org, Marius Strobl Subject: Re: Dynamic pcpu, arm, mips, powerpc, sun, etc. 
help needed X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jun 2009 22:55:59 -0000 On Tue, 9 Jun 2009, Peter Grehan wrote: >> As for sparc64 allocating the storage for the dynamic area >> from end probably isn't a good idea as the pmap code assumes >> that the range from KERNBASE to end is covered by the pages >> allocated by and locked into the TLB for the kernel by the >> loader > > Ditto for ppc. It's possible to get the additional space from within or > after return from pmap_bootstrap() (like thread0's kstack, or the msgbuf). http://people.freebsd.org/~jeff/dpcpu.diff I have updated this patch based on feedback relating to various architectures md code. I tried to model most architectures after the way msgbuf memory was taken. I have no capacity to test anything other than i386 and amd64. ARM is reported to work with one minor diff. Apparently sparc64 worked with the earlier diff but this should be cleaner. If anyone can report back on sparc64, mips, or powerpc, I'd appreciate it. Thanks, Jeff > > later, > > Peter. 
> From owner-freebsd-arch@FreeBSD.ORG Thu Jun 18 02:30:26 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D03CA1065673 for ; Thu, 18 Jun 2009 02:30:26 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from asmtpout020.mac.com (asmtpout020.mac.com [17.148.16.95]) by mx1.freebsd.org (Postfix) with ESMTP id 598428FC1A for ; Thu, 18 Jun 2009 02:30:26 +0000 (UTC) (envelope-from xcllnt@mac.com) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Received: from MacBook-Pro.lan.xcllnt.net (mail.xcllnt.net [75.101.29.67]) by asmtp020.mac.com (Sun Java(tm) System Messaging Server 6.3-8.01 (built Dec 16 2008; 32bit)) with ESMTPSA id <0KLE00CLGUUMRP40@asmtp020.mac.com>; Wed, 17 Jun 2009 18:30:23 -0700 (PDT) Message-id: <94B46331-19AB-4174-BEDA-8B4B0A525B45@mac.com> From: Marcel Moolenaar To: Jeff Roberson In-reply-to: Date: Wed, 17 Jun 2009 18:30:22 -0700 References: <20090609201127.GA50903@alchemy.franken.de> <4A2F1148.9090706@freebsd.org> X-Mailer: Apple Mail (2.935.3) Cc: arch@freebsd.org, Peter Grehan , Marius Strobl Subject: Re: Dynamic pcpu, arm, mips, powerpc, sun, etc. help needed X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2009 02:30:27 -0000 On Jun 17, 2009, at 3:55 PM, Jeff Roberson wrote: > > On Tue, 9 Jun 2009, Peter Grehan wrote: > >>> As for sparc64 allocating the storage for the dynamic area >>> from end probably isn't a good idea as the pmap code assumes >>> that the range from KERNBASE to end is covered by the pages >>> allocated by and locked into the TLB for the kernel by the >>> loader >> >> Ditto for ppc. 
It's possible to get the additional space from >> within or after return from pmap_bootstrap() (like thread0's >> kstack, or the msgbuf). > > http://people.freebsd.org/~jeff/dpcpu.diff > > I have updated this patch based on feedback relating to various > architectures md code. I tried to model most architectures after > the way msgbuf memory was taken. I have no capacity to test > anything other than i386 and amd64. ARM is reported to work with > one minor diff. Apparently sparc64 worked with the earlier diff but > this should be cleaner. If anyone can report back on sparc64, mips, > or powerpc, I'd appreciate it. Can you fix the ia64 diff by moving the following lines up as well: /* But if the bootstrap tells us otherwise, believe it! */ if (bootinfo.bi_kernend) kernend = round_page(bootinfo.bi_kernend); Otherwise we're using the wrong kernend value for dpcpu_init() and also override what dpcpu_init() did to kernend. Thanks, -- Marcel Moolenaar xcllnt@mac.com From owner-freebsd-arch@FreeBSD.ORG Thu Jun 18 03:04:04 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0339106564A; Thu, 18 Jun 2009 03:04:04 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 9B86C8FC0A; Thu, 18 Jun 2009 03:04:04 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.3/8.14.1) with ESMTP id n5I32vst017215; Wed, 17 Jun 2009 21:02:57 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Wed, 17 Jun 2009 21:03:18 -0600 (MDT) Message-Id: <20090617.210318.1878034641.imp@bsdimp.com> To: jroberson@jroberson.net From: "M. 
Warner Losh" In-Reply-To: References: <20090609201127.GA50903@alchemy.franken.de> <4A2F1148.9090706@freebsd.org> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, grehan@FreeBSD.org, marius@alchemy.franken.de Subject: Re: Dynamic pcpu, arm, mips, powerpc, sun, etc. help needed X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2009 03:04:05 -0000 In message: Jeff Roberson writes: : : On Tue, 9 Jun 2009, Peter Grehan wrote: : : >> As for sparc64 allocating the storage for the dynamic area : >> from end probably isn't a good idea as the pmap code assumes : >> that the range from KERNBASE to end is covered by the pages : >> allocated by and locked into the TLB for the kernel by the : >> loader : > : > Ditto for ppc. It's possible to get the additional space from within or : > after return from pmap_bootstrap() (like thread0's kstack, or the msgbuf). : : http://people.freebsd.org/~jeff/dpcpu.diff : : I have updated this patch based on feedback relating to various : architectures md code. I tried to model most architectures after the way : msgbuf memory was taken. I have no capacity to test anything other than : i386 and amd64. ARM is reported to work with one minor diff. Apparently : sparc64 worked with the earlier diff but this should be cleaner. If : anyone can report back on sparc64, mips, or powerpc, I'd appreciate it. 
I don't understand this part of the patch:

Index: mips/mips/mp_machdep.c
===================================================================
--- mips/mips/mp_machdep.c (revision 194275)
+++ mips/mips/mp_machdep.c (working copy)
@@ -224,12 +224,15 @@ static int
 smp_start_secondary(int cpuid)
 {
         struct pcpu *pcpu;
+        void *dpcpu;
         int i;
 
         if (bootverbose)
                 printf("smp_start_secondary: starting cpu %d\n", cpuid);
 
+        dpcpu = (void *)kmem_alloc(kernel_map, DPCPU_SIZE);
         pcpu_init(&__pcpu[cpuid], cpuid, sizeof(struct pcpu));
+        dpcpu_init(dpcpu, cpuid);
 
         if (bootverbose)
                 printf("smp_start_secondary: cpu %d started\n", cpuid);

So we're adding a dynamic per-cpu area, in addition to the fixed part?

Warner

From owner-freebsd-arch@FreeBSD.ORG Thu Jun 18 04:16:10 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 55C61106564A; Thu, 18 Jun 2009 04:16:10 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-px0-f203.google.com (mail-px0-f203.google.com [209.85.216.203]) by mx1.freebsd.org (Postfix) with ESMTP id 2AD958FC14; Thu, 18 Jun 2009 04:16:10 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by pxi41 with SMTP id 41so243698pxi.3 for ; Wed, 17 Jun 2009 21:16:09 -0700 (PDT) Received: by 10.115.108.1 with SMTP id k1mr1356584wam.190.1245298569848; Wed, 17 Jun 2009 21:16:09 -0700 (PDT) Received: from ?10.0.1.198? (udp016664uds.hawaiiantel.net [72.235.41.117]) by mx.google.com with ESMTPS id k14sm2568371waf.60.2009.06.17.21.16.07 (version=SSLv3 cipher=RC4-MD5); Wed, 17 Jun 2009 21:16:08 -0700 (PDT) Date: Wed, 17 Jun 2009 18:16:06 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: "M. 
Warner Losh" In-Reply-To: <20090617.210318.1878034641.imp@bsdimp.com> Message-ID: References: <20090609201127.GA50903@alchemy.franken.de> <4A2F1148.9090706@freebsd.org> <20090617.210318.1878034641.imp@bsdimp.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, grehan@FreeBSD.org, marius@alchemy.franken.de Subject: Re: Dynamic pcpu, arm, mips, powerpc, sun, etc. help needed X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2009 04:16:10 -0000 On Wed, 17 Jun 2009, M. Warner Losh wrote: > In message: > Jeff Roberson writes: > : > : On Tue, 9 Jun 2009, Peter Grehan wrote: > : > : >> As for sparc64 allocating the storage for the dynamic area > : >> from end probably isn't a good idea as the pmap code assumes > : >> that the range from KERNBASE to end is covered by the pages > : >> allocated by and locked into the TLB for the kernel by the > : >> loader > : > > : > Ditto for ppc. It's possible to get the additional space from within or > : > after return from pmap_bootstrap() (like thread0's kstack, or the msgbuf). > : > : http://people.freebsd.org/~jeff/dpcpu.diff > : > : I have updated this patch based on feedback relating to various > : architectures md code. I tried to model most architectures after the way > : msgbuf memory was taken. I have no capacity to test anything other than > : i386 and amd64. ARM is reported to work with one minor diff. Apparently > : sparc64 worked with the earlier diff but this should be cleaner. If > : anyone can report back on sparc64, mips, or powerpc, I'd appreciate it. 
> > > I don't understand this part of the patch: > > Index: mips/mips/mp_machdep.c > =================================================================== > --- mips/mips/mp_machdep.c (revision 194275) > +++ mips/mips/mp_machdep.c (working copy) > @@ -224,12 +224,15 @@ static int > smp_start_secondary(int cpuid) > { > struct pcpu *pcpu; > + void *dpcpu; > int i; > > if (bootverbose) > printf("smp_start_secondary: starting cpu %d\n", cpuid); > > + dpcpu = (void *)kmem_alloc(kernel_map, DPCPU_SIZE); > pcpu_init(&__pcpu[cpuid], cpuid, sizeof(struct pcpu)); > + dpcpu_init(dpcpu, cpuid); > > if (bootverbose) > printf("smp_start_secondary: cpu %d started\n", cpuid); > > So we're adding a dynamic per-cpu area, in addition to the fixed part?

Yes, the fixed part is for legacy and very frequently accessed items that need fixed addresses. The dynamic area is for convenience and is slightly more expensive to access. It also has addresses that are not resolved until link time.

The fixed area uses a static structure with a size that is known at compile time. The size of the dynamic part is only known at link time, so it must be allocated separately.
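The fixed/dynamic split described above can be sketched in userland C. This is only a rough model under stated assumptions, not the code from dpcpu.diff: every name below (struct pcpu_model, model_cpu_init(), model_counter(), MODEL_DPCPU_SIZE) is hypothetical, malloc() stands in for the kernel allocation, and an ordinary struct stands in for the template whose size the kernel only learns at link time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAXCPU 4  /* hypothetical CPU limit for the model */

/* Fixed part: a static structure whose size is known at compile time. */
struct pcpu_model {
        int   pc_cpuid;    /* frequently accessed item at a fixed offset */
        void *pc_dynamic;  /* base of this CPU's separately allocated area */
};

static struct pcpu_model model_pcpu[MODEL_MAXCPU];

/*
 * Stand-in for the dynamic part.  In the kernel its contents and size are
 * only known once everything is linked; a plain struct plays the role of
 * that link-time template here (an assumption made for illustration).
 */
struct dpcpu_model_template {
        long counter;
        int  flag;
};

static const struct dpcpu_model_template model_template = {
        .counter = 100,
        .flag = 1,
};

#define MODEL_DPCPU_SIZE sizeof(model_template)

/* Allocate the dynamic area separately and copy the template into it. */
static int
model_cpu_init(int cpuid)
{
        void *area;

        if (cpuid < 0 || cpuid >= MODEL_MAXCPU)
                return (-1);
        area = malloc(MODEL_DPCPU_SIZE);
        if (area == NULL)
                return (-1);
        memcpy(area, &model_template, MODEL_DPCPU_SIZE);
        model_pcpu[cpuid].pc_cpuid = cpuid;
        model_pcpu[cpuid].pc_dynamic = area;
        return (0);
}

/* Dynamic access: load the per-CPU base, then add the variable's offset. */
static long *
model_counter(int cpuid)
{
        char *base = model_pcpu[cpuid].pc_dynamic;

        return ((long *)(void *)(base +
            offsetof(struct dpcpu_model_template, counter)));
}
```

After model_cpu_init(0) and model_cpu_init(1), incrementing *model_counter(0) leaves CPU 1's copy at the template value. The extra pointer load in model_counter() is the model's analogue of the "slightly more expensive to access" remark.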
Jeff > > Warner > From owner-freebsd-arch@FreeBSD.ORG Thu Jun 18 04:34:22 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5DC3106566B for ; Thu, 18 Jun 2009 04:34:22 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outC.internet-mail-service.net (outc.internet-mail-service.net [216.240.47.226]) by mx1.freebsd.org (Postfix) with ESMTP id 935928FC12 for ; Thu, 18 Jun 2009 04:34:22 +0000 (UTC) (envelope-from julian@elischer.org) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id 3E1ACB755D; Wed, 17 Jun 2009 21:34:22 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 8009F2D6006; Wed, 17 Jun 2009 21:34:21 -0700 (PDT) Message-ID: <4A39C3CD.8020909@elischer.org> Date: Wed, 17 Jun 2009 21:34:21 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: Jeff Roberson References: <20090609201127.GA50903@alchemy.franken.de> <4A2F1148.9090706@freebsd.org> <20090617.210318.1878034641.imp@bsdimp.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, grehan@FreeBSD.org, marius@alchemy.franken.de Subject: Re: Dynamic pcpu, arm, mips, powerpc, sun, etc. help needed X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2009 04:34:22 -0000 Jeff Roberson wrote: > On Wed, 17 Jun 2009, M. 
Warner Losh wrote: > >> In message: >> Jeff Roberson writes: >> : >> : On Tue, 9 Jun 2009, Peter Grehan wrote: >> : >> : >> As for sparc64 allocating the storage for the dynamic area >> : >> from end probably isn't a good idea as the pmap code assumes >> : >> that the range from KERNBASE to end is covered by the pages >> : >> allocated by and locked into the TLB for the kernel by the >> : >> loader >> : > >> : > Ditto for ppc. It's possible to get the additional space from >> within or >> : > after return from pmap_bootstrap() (like thread0's kstack, or the >> msgbuf). >> : >> : http://people.freebsd.org/~jeff/dpcpu.diff >> : >> : I have updated this patch based on feedback relating to various >> : architectures md code. I tried to model most architectures after >> the way >> : msgbuf memory was taken. I have no capacity to test anything other >> than >> : i386 and amd64. ARM is reported to work with one minor diff. >> Apparently >> : sparc64 worked with the earlier diff but this should be cleaner. If >> : anyone can report back on sparc64, mips, or powerpc, I'd appreciate it. >> >> >> I don't understand this part of the patch: >> >> Index: mips/mips/mp_machdep.c >> =================================================================== >> --- mips/mips/mp_machdep.c (revision 194275) >> +++ mips/mips/mp_machdep.c (working copy) >> @@ -224,12 +224,15 @@ static int >> smp_start_secondary(int cpuid) >> { >> struct pcpu *pcpu; >> + void *dpcpu; >> int i; >> >> if (bootverbose) >> printf("smp_start_secondary: starting cpu %d\n", cpuid); >> >> + dpcpu = (void *)kmem_alloc(kernel_map, DPCPU_SIZE); >> pcpu_init(&__pcpu[cpuid], cpuid, sizeof(struct pcpu)); >> + dpcpu_init(dpcpu, cpuid); >> >> if (bootverbose) >> printf("smp_start_secondary: cpu %d started\n", cpuid); >> >> So were adding a dynamic per-cpu area, in addition to the fixed part? > > Yes, the fixed part is for legacy and very frequently accessed items > that need fixed addresses. 
The dynamic area is for convenience and is > slightly more expensive to access. It also has addresses that are not > resolved until link time. > > The fixed area uses a static structure with a size that is known at > compile time. The dynamic part is only known at link time and so must > be allocated separately.

The compilers know about TLS, and it wouldn't take much in the backend code to make the 'thread' keyword for TLS generate per-cpu data instead of per-thread data. Basically, the register settings for TLS would have to be replaced by per-cpu registers, but... wait, we do that. Since the per-thread registers in the kernel point to per-cpu data and are kept correct by the scheduler, shouldn't the TLS code "just work" if we put the correct data structures in the right places?

> > Jeff > >> >> Warner >>

From owner-freebsd-arch@FreeBSD.ORG Thu Jun 18 05:32:02 2009 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D629106566B; Thu, 18 Jun 2009 05:32:02 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-px0-f203.google.com (mail-px0-f203.google.com [209.85.216.203]) by mx1.freebsd.org (Postfix) with ESMTP id 6CB028FC13; Thu, 18 Jun 2009 05:32:02 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by mail-px0-f203.google.com with SMTP id 41so279098pxi.3 for ; Wed, 17 Jun 2009 22:32:02 -0700 (PDT) Received: by 10.114.76.10 with SMTP id y10mr1499881waa.83.1245303122036; Wed, 17 Jun 2009 22:32:02 -0700 (PDT) Received: from ?10.0.1.198? 
(udp016664uds.hawaiiantel.net [72.235.41.117]) by mx.google.com with ESMTPS id l38sm2673854waf.26.2009.06.17.22.31.59 (version=SSLv3 cipher=RC4-MD5); Wed, 17 Jun 2009 22:32:00 -0700 (PDT) Date: Wed, 17 Jun 2009 19:31:58 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Julian Elischer In-Reply-To: <4A39C3CD.8020909@elischer.org> Message-ID: References: <20090609201127.GA50903@alchemy.franken.de> <4A2F1148.9090706@freebsd.org> <20090617.210318.1878034641.imp@bsdimp.com> <4A39C3CD.8020909@elischer.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, grehan@FreeBSD.org, marius@alchemy.franken.de Subject: Re: Dynamic pcpu, arm, mips, powerpc, sun, etc. help needed X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2009 05:32:03 -0000 On Wed, 17 Jun 2009, Julian Elischer wrote: > Jeff Roberson wrote: >> On Wed, 17 Jun 2009, M. Warner Losh wrote: >> >>> In message: >>> Jeff Roberson writes: >>> : >>> : On Tue, 9 Jun 2009, Peter Grehan wrote: >>> : >>> : >> As for sparc64 allocating the storage for the dynamic area >>> : >> from end probably isn't a good idea as the pmap code assumes >>> : >> that the range from KERNBASE to end is covered by the pages >>> : >> allocated by and locked into the TLB for the kernel by the >>> : >> loader >>> : > >>> : > Ditto for ppc. It's possible to get the additional space from within >>> or >>> : > after return from pmap_bootstrap() (like thread0's kstack, or the >>> msgbuf). >>> : >>> : http://people.freebsd.org/~jeff/dpcpu.diff >>> : >>> : I have updated this patch based on feedback relating to various >>> : architectures md code. I tried to model most architectures after the >>> way >>> : msgbuf memory was taken. 
I have no capacity to test anything other than >>> : i386 and amd64. ARM is reported to work with one minor diff. >>> Apparently >>> : sparc64 worked with the earlier diff but this should be cleaner. If >>> : anyone can report back on sparc64, mips, or powerpc, I'd appreciate it. >>> >>> >>> I don't understand this part of the patch: >>> >>> Index: mips/mips/mp_machdep.c >>> =================================================================== >>> --- mips/mips/mp_machdep.c (revision 194275) >>> +++ mips/mips/mp_machdep.c (working copy) >>> @@ -224,12 +224,15 @@ static int >>> smp_start_secondary(int cpuid) >>> { >>> struct pcpu *pcpu; >>> + void *dpcpu; >>> int i; >>> >>> if (bootverbose) >>> printf("smp_start_secondary: starting cpu %d\n", cpuid); >>> >>> + dpcpu = (void *)kmem_alloc(kernel_map, DPCPU_SIZE); >>> pcpu_init(&__pcpu[cpuid], cpuid, sizeof(struct pcpu)); >>> + dpcpu_init(dpcpu, cpuid); >>> >>> if (bootverbose) >>> printf("smp_start_secondary: cpu %d started\n", cpuid); >>> >>> So were adding a dynamic per-cpu area, in addition to the fixed part? >> >> Yes, the fixed part is for legacy and very frequently accessed items that >> need fixed addresses. The dynamic area is for convenience and is slightly >> more expensive to access. It also has addresses that are not resolved >> until link time. >> >> The fixed area uses a static structure with a size that is known at compile >> time. The dynamic part is only known at link time and so must be allocated >> seperately. > > > the compilers know of TLS and it wouldn't take much in the backend > code to make the 'thread' keyworkd for TLS generate per-cpu data > instead of per-thread data.. basically the register settings for TLS > would have to be replaced by per cpu registers but .. wait we do > that.. > since the per-thread registers in the kernel point to per-cpu data > and are kept correct by the scheduler, shouldn't the TLS code "just > work" if we put the correct data structures in the right places? 
We discussed that at bsdcan and apparently it's not that simple. dfr seemed to think it would take quite some time to do the kernel linker support. There also may be issues because the compiler is free to cache thread local data but not per-cpu data so there may be a mismatch there. It would be nice ultimately to make this work but at that time DPCPU_ could just become a wrapper around __thread. Thanks, Jeff > >> >> Jeff >> >>> >>> Warner >>> >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Fri Jun 19 16:23:56 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA946106564A for ; Fri, 19 Jun 2009 16:23:56 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) by mx1.freebsd.org (Postfix) with ESMTP id 6AEA78FC08 for ; Fri, 19 Jun 2009 16:23:56 +0000 (UTC) (envelope-from jilles@stack.nl) Received: by mx1.stack.nl (Postfix, from userid 65534) id CDE0A359931; Fri, 19 Jun 2009 18:23:55 +0200 (CEST) X-Spam-DCC: EATSERVER: scanner01.stack.nl 1166; Body=1 Fuz1=1 Fuz2=1 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on scanner01.stack.nl X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.2.5 X-Spam-Relay-Country: _RELAYCOUNTRY_ Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id C09E535992A for ; Fri, 19 Jun 2009 18:23:53 +0200 (CEST) Received: by snail.stack.nl (Postfix, from userid 1677) id 7D6D3228CB; Fri, 19 Jun 2009 18:23:28 +0200 (CEST) Date: Fri, 19 Jun 2009 18:23:28 +0200 From: Jilles Tjoelker To: freebsd-arch@freebsd.org Message-ID: 
<20090619162328.GA79975@stack.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) Subject: deadlocks with intr NFS mounts and ^Z (or: PCATCH and sleepable locks) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jun 2009 16:23:56 -0000 I have been having trouble with deadlocks with NFS mounts for a while, and I have found at least one way it can deadlock. It seems an issue with the sleep/lock system. NFS sleeps while holding a lockmgr lock, waiting for a reply from the server. When the mount is set intr, this is an interruptible sleep, so that interrupting signals can abort the sleep. However, this also means that SIGSTOP etc will suspend the thread without waking it up first, so it will be suspended with a lock held. If it holds the wrong locks, it is possible that the shell will not be able to run, and the process cannot be continued in the normal manner. Due to some other things I do not understand, it is then possible that the process cannot be continued at all (SIGCONT seems ignored), but in simple cases SIGCONT works, and things go back to normal. In any case, this situation is undesirable, as even 'umount -f' doesn't work while the thread is suspended. Of course, this reasoning applies to any code that goes to sleep interruptibly while holding a lock (sx or lockmgr). Is this supposed to be possible (likely useful)? If so, a third type of sleep would be needed that is interrupted by signals but not suspended? If not, something should check that it doesn't happen and NFS intr mounts may need to check for signals periodically or find a way to avoid sleeping with locks held. The td_locks field is only accessible for the current thread, so it cannot be used to check if suspending is safe. 
Also, making SIGSTOP and the like interrupt/restart syscalls is not acceptable unless you find some way to do it such that userland won't notice. For example, a read of 10 megabytes from a regular file with that much available must not return less than 10 megabytes.

--
Jilles Tjoelker

From owner-freebsd-arch@FreeBSD.ORG Fri Jun 19 20:26:57 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D3BE106566C for ; Fri, 19 Jun 2009 20:26:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (skuns.zoral.com.ua [91.193.166.194]) by mx1.freebsd.org (Postfix) with ESMTP id F12FA8FC1A for ; Fri, 19 Jun 2009 20:26:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id n5JJksdb034544 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 19 Jun 2009 22:46:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3) with ESMTP id n5JJksXu004256; Fri, 19 Jun 2009 22:46:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3/Submit) id n5JJks78004255; Fri, 19 Jun 2009 22:46:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 19 Jun 2009 22:46:54 +0300 From: Kostik Belousov To: Jilles Tjoelker Message-ID: <20090619194654.GC2884@deviant.kiev.zoral.com.ua> References: <20090619162328.GA79975@stack.nl> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="TYecfFk8j8mZq+dy" Content-Disposition: inline In-Reply-To: 
<20090619162328.GA79975@stack.nl> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.1 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-arch@freebsd.org Subject: Re: deadlocks with intr NFS mounts and ^Z (or: PCATCH and sleepable locks) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jun 2009 20:26:57 -0000 --TYecfFk8j8mZq+dy Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jun 19, 2009 at 06:23:28PM +0200, Jilles Tjoelker wrote: > I have been having trouble with deadlocks with NFS mounts for a while, > and I have found at least one way it can deadlock. It seems an issue > with the sleep/lock system. > > NFS sleeps while holding a lockmgr lock, waiting for a reply from the > server. When the mount is set intr, this is an interruptible sleep, so > that interrupting signals can abort the sleep. However, this also means > that SIGSTOP etc will suspend the thread without waking it up first, so > it will be suspended with a lock held. > > If it holds the wrong locks, it is possible that the shell will not be > able to run, and the process cannot be continued in the normal manner. > > Due to some other things I do not understand, it is then possible that > the process cannot be continued at all (SIGCONT seems ignored), but in > simple cases SIGCONT works, and things go back to normal. > > In any case, this situation is undesirable, as even 'umount -f' doesn't > work while the thread is suspended. 
> > Of course, this reasoning applies to any code that goes to sleep > interruptibly while holding a lock (sx or lockmgr). Is this supposed to > be possible (likely useful)? If so, a third type of sleep would be > needed that is interrupted by signals but not suspended? If not, > something should check that it doesn't happen and NFS intr mounts may > need to check for signals periodically or find a way to avoid sleeping > with locks held. > > The td_locks field is only accessible for the current thread, so it > cannot be used to check if suspending is safe. > > Also, making SIGSTOP and the like interrupt/restart syscalls is not > acceptable unless you find some way to do it such that userland won't > notice. For example, a read of 10 megabytes from a regular file with > that much available must not return less than 10 megabytes.

See http://lists.freebsd.org/pipermail/freebsd-smp/2009-January/001611.html

--TYecfFk8j8mZq+dy Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEQEARECAAYFAko76y0ACgkQC3+MBN1Mb4jnEQCPVJSfYUaE6l5bmEO0blk+iatx AJ9yZ4iAzLs5vCCu4Ne2vUqEMggltw== =k0BL -----END PGP SIGNATURE----- --TYecfFk8j8mZq+dy--

From owner-freebsd-arch@FreeBSD.ORG Sat Jun 20 04:45:22 2009 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EFA971065670 for ; Sat, 20 Jun 2009 04:45:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx06.syd.optusnet.com.au (fallbackmx06.syd.optusnet.com.au [211.29.132.8]) by mx1.freebsd.org (Postfix) with ESMTP id DFFF08FC15 for ; Sat, 20 Jun 2009 04:45:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by fallbackmx06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n5K2lxc8009885 for ; Sat, 20 Jun 2009 12:47:59 +1000 
Received: from c122-106-159-184.carlnfd1.nsw.optusnet.com.au (c122-106-159-184.carlnfd1.nsw.optusnet.com.au [122.106.159.184]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n5K2lrM1015560 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Jun 2009 12:47:55 +1000 Date: Sat, 20 Jun 2009 12:47:53 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Kostik Belousov In-Reply-To: <20090619194654.GC2884@deviant.kiev.zoral.com.ua> Message-ID: <20090620121543.F29239@delplex.bde.org> References: <20090619162328.GA79975@stack.nl> <20090619194654.GC2884@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Jilles Tjoelker , freebsd-arch@FreeBSD.org Subject: Re: deadlocks with intr NFS mounts and ^Z (or: PCATCH and sleepable locks) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Jun 2009 04:45:23 -0000 On Fri, 19 Jun 2009, Kostik Belousov wrote: > On Fri, Jun 19, 2009 at 06:23:28PM +0200, Jilles Tjoelker wrote: >> I have been having trouble with deadlocks with NFS mounts for a while, >> and I have found at least one way it can deadlock. It seems an issue >> with the sleep/lock system. >> >> NFS sleeps while holding a lockmgr lock, waiting for a reply from the >> server. When the mount is set intr, this is an interruptible sleep, so >> that interrupting signals can abort the sleep. However, this also means >> that SIGSTOP etc will suspend the thread without waking it up first, so >> it will be suspended with a lock held. >> >> If it holds the wrong locks, it is possible that the shell will not be >> able to run, and the process cannot be continued in the normal manner. 
>> >> Due to some other things I do not understand, it is then possible that >> the process cannot be continued at all (SIGCONT seems ignored), but in >> simple cases SIGCONT works, and things go back to normal. >> ... >> Also, making SIGSTOP and the like interrupt/restart syscalls is not >> acceptable unless you find some way to do it such that userland won't >> notice. For example, a read of 10 megabytes from a regular file with >> that much available must not return less than 10 megabytes. > > See > http://lists.freebsd.org/pipermail/freebsd-smp/2009-January/001611.html

Have any fixes been applied?

I now remember seeing problems like the first set above on FreeBSD cluster machines (I don't encounter "intr" nfs mounts anywhere else; mount(8) still doesn't show the "intr" option so I assume that the "intr" specified in fstab is in use on the FreeBSD machines): normal resume after ^Z on a parallel build not working, sometimes hanging the whole file system but other times recoverable after re-logging in and sending suitable SIGCONTs manually.

These problems seemed to go away, but right now the following problem like the second set above occurs consistently (I first noticed this last week):

Script started on Sat Jun 20 02:32:51 2009
pts/0:bde@ref8-i386:~/sys7/i386/compile> sh zm
^Z
[1]+ Stopped sh zm
pts/0:bde@ref8-i386:~/sys7/i386/compile> %
sh zm
*** Stopped -- signal 18
*** Stopped -- signal 18
*** Stopped -- signal 18
*** Signal 1
*** Signal 1
*** Signal 1
`all' not remade because of errors.
linking kernel
^C
pts/0:bde@ref8-i386:~/sys7/i386/compile> exit
Script done on Sat Jun 20 02:34:41 2009

The shell script zm builds 6 kernels in parallel using make -k -j8 for each. Signal 18 is SIGTSTP. Receiving this is normal, but the shell shouldn't print any messages about it. Signal 1 is SIGHUP. This shouldn't occur. On another run, ISTR getting messages about i/o errors or unrestartable processes. 
Anyway, the messages about signals are associated with failing jobs
in the build.

ref7-i386 now behaves normally -- ^Z and resume just work; no messages
are printed and the build completes successfully after resuming.

Bruce

From owner-freebsd-arch@FreeBSD.ORG Sat Jun 20 16:15:49 2009
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 223901065693 for ; Sat, 20 Jun 2009 16:15:49 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (skuns.zoral.com.ua [91.193.166.194]) by mx1.freebsd.org (Postfix) with ESMTP id 7CC358FC0A for ; Sat, 20 Jun 2009 16:15:48 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id n5KGFhbx020113 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Jun 2009 19:15:43 +0300 (EEST) (envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3) with ESMTP id n5KGFhQv025134; Sat, 20 Jun 2009 19:15:43 +0300 (EEST) (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3/Submit) id n5KGFejn025133; Sat, 20 Jun 2009 19:15:40 +0300 (EEST) (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f
Date: Sat, 20 Jun 2009 19:15:40 +0300
From: Kostik Belousov
To: Jilles Tjoelker
Message-ID: <20090620161540.GF2884@deviant.kiev.zoral.com.ua>
References: <20090619162328.GA79975@stack.nl>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="l+goss899txtYvYf"
Content-Disposition: inline
In-Reply-To: <20090619162328.GA79975@stack.nl>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.1 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua
Cc: freebsd-arch@freebsd.org
Subject: Re: deadlocks with intr NFS mounts and ^Z (or: PCATCH and sleepable locks)
X-List-Received-Date: Sat, 20 Jun 2009 16:15:50 -0000

--l+goss899txtYvYf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 19, 2009 at 06:23:28PM +0200, Jilles Tjoelker wrote:
> I have been having trouble with deadlocks with NFS mounts for a while,
> and I have found at least one way it can deadlock. It seems an issue
> with the sleep/lock system.
>
> NFS sleeps while holding a lockmgr lock, waiting for a reply from the
> server. When the mount is set intr, this is an interruptible sleep, so
> that interrupting signals can abort the sleep. However, this also means
> that SIGSTOP etc will suspend the thread without waking it up first, so
> it will be suspended with a lock held.
>
> If it holds the wrong locks, it is possible that the shell will not be
> able to run, and the process cannot be continued in the normal manner.
>
> Due to some other things I do not understand, it is then possible that
> the process cannot be continued at all (SIGCONT seems ignored), but in
> simple cases SIGCONT works, and things go back to normal.
>
> In any case, this situation is undesirable, as even 'umount -f' doesn't
> work while the thread is suspended.
>
> Of course, this reasoning applies to any code that goes to sleep
> interruptibly while holding a lock (sx or lockmgr). Is this supposed to
> be possible (likely useful)? If so, a third type of sleep would be
> needed that is interrupted by signals but not suspended? If not,
> something should check that it doesn't happen and NFS intr mounts may
> need to check for signals periodically or find a way to avoid sleeping
> with locks held.
>
> The td_locks field is only accessible for the current thread, so it
> cannot be used to check if suspending is safe.
>
> Also, making SIGSTOP and the like interrupt/restart syscalls is not
> acceptable unless you find some way to do it such that userland won't
> notice. For example, a read of 10 megabytes from a regular file with
> that much available must not return less than 10 megabytes.

Note that NFS does check for the signals during i/o, so you may get
short reads anyway.

I do think that the right solution both there and with SINGLE_NO_EXIT
case for thread_single is to stop at the usermode boundary instead of
suspending a thread in the interruptible sleep state.

I set error code returned from interrupted msleep() to ERESTART,
that seems to be the right thing, at least to restart the i/o that
transferred no data upon receiving SIGSTOP.

My current patch is below. It contains some not strictly related
changes, e.g. for wakeup().

diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
index 5c1d553..28f4f4f 100644
--- a/sys/kern/kern_sig.c
+++ b/sys/kern/kern_sig.c
@@ -2310,18 +2310,22 @@ static void
 sig_suspend_threads(struct thread *td, struct proc *p, int sending)
 {
 	struct thread *td2;
+	int wakeup_swapper;
 
 	PROC_LOCK_ASSERT(p, MA_OWNED);
 	PROC_SLOCK_ASSERT(p, MA_OWNED);
 
+	wakeup_swapper = 0;
 	FOREACH_THREAD_IN_PROC(p, td2) {
 		thread_lock(td2);
 		td2->td_flags |= TDF_ASTPENDING | TDF_NEEDSUSPCHK;
 		if ((TD_IS_SLEEPING(td2) || TD_IS_SWAPPED(td2)) &&
-		    (td2->td_flags & TDF_SINTR) &&
-		    !TD_IS_SUSPENDED(td2)) {
-			thread_suspend_one(td2);
-		} else {
+		    (td2->td_flags & TDF_SINTR)) {
+			if (TD_IS_SUSPENDED(td2))
+				wakeup_swapper |= thread_unsuspend_one(td2);
+			if (TD_ON_SLEEPQ(td2) && (td2->td_flags & TDF_SINTR))
+				wakeup_swapper |= sleepq_abort(td2, ERESTART);
+		} else if (!TD_IS_SUSPENDED(td2)) {
 			if (sending || td != td2)
 				td2->td_flags |= TDF_ASTPENDING;
 #ifdef SMP
@@ -2331,6 +2335,8 @@ sig_suspend_threads(struct thread *td, struct proc *p, int sending)
 		}
 		thread_unlock(td2);
 	}
+	if (wakeup_swapper)
+		kick_proc0();
 }
 
 int
diff --git a/sys/kern/kern_synch.c b/sys/kern/kern_synch.c
index b91c1a5..d27d027 100644
--- a/sys/kern/kern_synch.c
+++ b/sys/kern/kern_synch.c
@@ -344,11 +344,16 @@ wakeup(void *ident)
 {
 	int wakeup_swapper;
 
+repeat:
 	sleepq_lock(ident);
 	wakeup_swapper = sleepq_broadcast(ident, SLEEPQ_SLEEP, 0, 0);
 	sleepq_release(ident);
-	if (wakeup_swapper)
-		kick_proc0();
+	if (wakeup_swapper) {
+		if (ident == &proc0)
+			goto repeat;
+		else
+			kick_proc0();
+	}
 }
 
 /*
@@ -361,11 +366,16 @@ wakeup_one(void *ident)
 {
 	int wakeup_swapper;
 
+repeat:
 	sleepq_lock(ident);
 	wakeup_swapper = sleepq_signal(ident, SLEEPQ_SLEEP, 0, 0);
 	sleepq_release(ident);
-	if (wakeup_swapper)
-		kick_proc0();
+	if (wakeup_swapper) {
+		if (ident == &proc0)
+			goto repeat;
+		else
+			kick_proc0();
+	}
 }
 
 static void
diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c
index bb8779b..800a1d1 100644
--- a/sys/kern/kern_thread.c
+++ b/sys/kern/kern_thread.c
@@ -504,6 +504,22 @@ thread_unlink(struct thread *td)
 	/* Must NOT clear links to proc! */
 }
 
+static int
+recalc_remaining(struct proc *p, int mode)
+{
+	int remaining;
+
+	if (mode == SINGLE_EXIT)
+		remaining = p->p_numthreads;
+	else if (mode == SINGLE_BOUNDARY)
+		remaining = p->p_numthreads - p->p_boundary_count;
+	else if (mode == SINGLE_NO_EXIT)
+		remaining = p->p_numthreads - p->p_suspcount;
+	else
+		panic("recalc_remaining: wrong mode %d", mode);
+	return (remaining);
+}
+
 /*
  * Enforce single-threading.
  *
@@ -551,12 +567,7 @@ thread_single(int mode)
 	p->p_flag |= P_STOPPED_SINGLE;
 	PROC_SLOCK(p);
 	p->p_singlethread = td;
-	if (mode == SINGLE_EXIT)
-		remaining = p->p_numthreads;
-	else if (mode == SINGLE_BOUNDARY)
-		remaining = p->p_numthreads - p->p_boundary_count;
-	else
-		remaining = p->p_numthreads - p->p_suspcount;
+	remaining = recalc_remaining(p, mode);
 	while (remaining != 1) {
 		if (P_SHOULDSTOP(p) != P_STOPPED_SINGLE)
 			goto stopme;
@@ -587,18 +598,17 @@ thread_single(int mode)
 						wakeup_swapper |=
 						    sleepq_abort(td2, ERESTART);
 					break;
+				case SINGLE_NO_EXIT:
+					if (TD_IS_SUSPENDED(td2) &&
+					    !(td2->td_flags & TDF_BOUNDARY))
+						wakeup_swapper |=
+						    thread_unsuspend_one(td2);
+					if (TD_ON_SLEEPQ(td2) &&
+					    (td2->td_flags & TDF_SINTR))
+						wakeup_swapper |=
+						    sleepq_abort(td2, ERESTART);
+					break;
 				default:
-					if (TD_IS_SUSPENDED(td2)) {
-						thread_unlock(td2);
-						continue;
-					}
-					/*
-					 * maybe other inhibited states too?
-					 */
-					if ((td2->td_flags & TDF_SINTR) &&
-					    (td2->td_inhibitors &
-					    (TDI_SLEEPING | TDI_SWAPPED)))
-						thread_suspend_one(td2);
 					break;
 				}
 			}
@@ -611,12 +621,7 @@ thread_single(int mode)
 		}
 		if (wakeup_swapper)
 			kick_proc0();
-		if (mode == SINGLE_EXIT)
-			remaining = p->p_numthreads;
-		else if (mode == SINGLE_BOUNDARY)
-			remaining = p->p_numthreads - p->p_boundary_count;
-		else
-			remaining = p->p_numthreads - p->p_suspcount;
+		remaining = recalc_remaining(p, mode);
 
 		/*
 		 * Maybe we suspended some threads.. was it enough?
@@ -630,12 +635,7 @@ stopme:
 		 * In the mean time we suspend as well.
 		 */
 		thread_suspend_switch(td);
-		if (mode == SINGLE_EXIT)
-			remaining = p->p_numthreads;
-		else if (mode == SINGLE_BOUNDARY)
-			remaining = p->p_numthreads - p->p_boundary_count;
-		else
-			remaining = p->p_numthreads - p->p_suspcount;
+		remaining = recalc_remaining(p, mode);
 	}
 	if (mode == SINGLE_EXIT) {
 		/*

--l+goss899txtYvYf
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iEYEARECAAYFAko9CywACgkQC3+MBN1Mb4gfMACg53hcePE93ReoIY5Fcaqgc2lI
M6EAoLeRDAJZeskPOIlNJ+4Hs6W9IsUo
=nuph
-----END PGP SIGNATURE-----

--l+goss899txtYvYf--

From owner-freebsd-arch@FreeBSD.ORG Sat Jun 20 16:50:57 2009
Return-Path:
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 96E871065676 for ; Sat, 20 Jun 2009 16:50:57 +0000 (UTC) (envelope-from kris@FreeBSD.org)
Received: from dhcp-172-28-77-157.eur.corp.google.com (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 803248FC12; Sat, 20 Jun 2009 16:50:56 +0000 (UTC) (envelope-from kris@FreeBSD.org)
Message-ID: <4A3D136F.1090106@FreeBSD.org>
Date: Sat, 20 Jun 2009 17:50:55 +0100
From: Kris Kennaway
User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302)
MIME-Version: 1.0
To: Bruce Evans
References:
 <20090619162328.GA79975@stack.nl> <20090619194654.GC2884@deviant.kiev.zoral.com.ua> <20090620121543.F29239@delplex.bde.org>
In-Reply-To: <20090620121543.F29239@delplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Kostik Belousov, Jilles Tjoelker, freebsd-arch@FreeBSD.org
Subject: Re: deadlocks with intr NFS mounts and ^Z (or: PCATCH and sleepable locks)
X-List-Received-Date: Sat, 20 Jun 2009 16:50:57 -0000

Bruce Evans wrote:
> These problems seemed to go away, but right now the following problem
> like the second set above occurs consistently (I first noticed this
> last week):
>
> Script started on Sat Jun 20 02:32:51 2009
> pts/0:bde@ref8-i386:~/sys7/i386/compile> sh zm
> ^Z
> [1]+  Stopped                 sh zm
> pts/0:bde@ref8-i386:~/sys7/i386/compile> %
> sh zm
> *** Stopped -- signal 18
> *** Stopped -- signal 18
> *** Stopped -- signal 18
> *** Signal 1
> *** Signal 1
> *** Signal 1
> `all' not remade because of errors.
> linking kernel
> ^C
> pts/0:bde@ref8-i386:~/sys7/i386/compile> exit
>
> Script done on Sat Jun 20 02:34:41 2009
>
> The shell script zm builds 6 kernels in parallel using make -k -j8 for
> each. Signal 18 is SIGTSTP. Receiving this is normal, but the shell
> shouldn't print any messages about it. Signal 1 is SIGHUP. This
> shouldn't occur. On another run, ISTR getting messages about i/o
> errors or unrestartable processes. Anyway, the messages about signals
> are associated with failing jobs in the build.

That's a long-standing bug that I don't think is limited to NFS. I
first started seeing it several years ago after some changes to
make(1), but I don't recall if phk disputed that they were to blame.

Kris

From owner-freebsd-arch@FreeBSD.ORG Sat Jun 20 20:33:30 2009
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 737E5106564A for ; Sat, 20 Jun 2009 20:33:30 +0000 (UTC) (envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) by mx1.freebsd.org (Postfix) with ESMTP id 2FA288FC0C for ; Sat, 20 Jun 2009 20:33:30 +0000 (UTC) (envelope-from jilles@stack.nl)
Received: by mx1.stack.nl (Postfix, from userid 65534) id 7EDF7359951; Sat, 20 Jun 2009 22:33:28 +0200 (CEST)
X-Spam-DCC: CTc-dcc2: scanner01.stack.nl 1031; Body=1 Fuz1=1 Fuz2=1
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on scanner01.stack.nl
X-Spam-Level:
X-Spam-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.2.5
X-Spam-Relay-Country: _RELAYCOUNTRY_
Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 1AD1235993F; Sat, 20 Jun 2009 22:33:26 +0200 (CEST)
Received: by snail.stack.nl (Postfix, from userid 1677) id C0456228CB; Sat, 20 Jun 2009 22:33:00 +0200 (CEST)
Date: Sat, 20 Jun 2009 22:33:00 +0200
From: Jilles Tjoelker
To: Kostik Belousov
Message-ID: <20090620203300.GA21763@stack.nl>
References: <20090619162328.GA79975@stack.nl> <20090620161540.GF2884@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090620161540.GF2884@deviant.kiev.zoral.com.ua>
User-Agent: Mutt/1.5.18 (2008-05-17)
Cc: freebsd-arch@freebsd.org
Subject: Re: deadlocks with intr NFS mounts and ^Z (or: PCATCH and sleepable locks)
X-List-Received-Date: Sat, 20 Jun 2009 20:33:30 -0000

On Sat, Jun 20, 2009 at 07:15:40PM +0300, Kostik Belousov wrote:
> On Fri, Jun 19, 2009 at 06:23:28PM +0200, Jilles Tjoelker wrote:
> > I have been having trouble with deadlocks with NFS mounts for a while,
> > and I have found at least one way it can deadlock. It seems an issue
> > with the sleep/lock system.

> > NFS sleeps while holding a lockmgr lock, waiting for a reply from the
> > server. When the mount is set intr, this is an interruptible sleep, so
> > that interrupting signals can abort the sleep. However, this also means
> > that SIGSTOP etc will suspend the thread without waking it up first, so
> > it will be suspended with a lock held.

> > If it holds the wrong locks, it is possible that the shell will not be
> > able to run, and the process cannot be continued in the normal manner.

> > Due to some other things I do not understand, it is then possible that
> > the process cannot be continued at all (SIGCONT seems ignored), but in
> > simple cases SIGCONT works, and things go back to normal.

> > In any case, this situation is undesirable, as even 'umount -f' doesn't
> > work while the thread is suspended.

> > Of course, this reasoning applies to any code that goes to sleep
> > interruptibly while holding a lock (sx or lockmgr). Is this supposed to
> > be possible (likely useful)? If so, a third type of sleep would be
> > needed that is interrupted by signals but not suspended? If not,
> > something should check that it doesn't happen and NFS intr mounts may
> > need to check for signals periodically or find a way to avoid sleeping
> > with locks held.

> > The td_locks field is only accessible for the current thread, so it
> > cannot be used to check if suspending is safe.

> > Also, making SIGSTOP and the like interrupt/restart syscalls is not
> > acceptable unless you find some way to do it such that userland won't
> > notice. For example, a read of 10 megabytes from a regular file with
> > that much available must not return less than 10 megabytes.

> Note that NFS does check for the signals during i/o, so you may get
> short reads anyway.

> I do think that the right solution both there and with SINGLE_NO_EXIT
> case for thread_single is to stop at the usermode boundary instead of
> suspending a thread in the interruptible sleep state.

> I set error code returned from interrupted msleep() to ERESTART,
> that seems to be the right thing, at least to restart the i/o that
> transferred no data upon receiving SIGSTOP.

Any such short read on a regular file is wrong. That that badness
already occurs in some cases is not an excuse to make it occur more
often. Particularly because process suspension is expected not to affect
the process and interrupting syscalls would change the behaviour of the
debugged program significantly, while the current interruptions only
occur with signals that likely terminate the process anyway (note that
intr mounts only check for SIGINT, SIGTERM, SIGHUP, SIGKILL, SIGSTOP and
SIGQUIT and appear to mask all others; I don't know why SIGTSTP gets
through -- possibly a thread/process difference).

No matter the SIGSTOP issue, a warning about the interruptions in the
mount_nfs(8) man page may be in order; the current language gives the
impression that intr is only a good thing, which is not the case. This
applies to all NFS versions.

A better way to deal with nonresponsive NFS servers that will not come
back would be forced unmount (does it always work, apart from the case
mentioned above? same for the experimental client?). SIGKILL (but not
any other signal, not even SIGSTOP) could also be allowed on processes
blocked on nointr mounts.
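
To make the cost of short reads concrete: if suspension were allowed to
cut a read(2) on a regular file short, every caller that wants the full
amount would need a retry loop roughly like the one below. This is only
a sketch; read_full() is a hypothetical helper name used for
illustration, not something from the patch or from libc.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch or of libc): keep calling
 * read(2) until len bytes have arrived, EOF is reached, or a real error
 * occurs. Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t
read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = read(fd, p + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted before any data; retry */
			return (-1);
		}
		if (n == 0)
			break;			/* EOF */
		done += (size_t)n;		/* short read; ask for the rest */
	}
	return ((ssize_t)done);
}
```

Programs that do not catch signals can currently call read(2) on a
regular file directly and skip this wrapper, which is the property
argued for above.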

Another point (mostly for socket operations and the like) is that the
current causes of interrupted system calls are under control of the
application: if you do not catch any signals, you will only get short
read/writes for reasons related to the underlying object; hence, it is
often not necessary to add (ugly) code to handle it: any unexpected
short read or write is a problem with the underlying object.

Another example which currently works and would be a shame to break:

% /usr/bin/time sleep 10
^Z
zsh: suspended  /usr/bin/time sleep 10
% fg
[1]  + continued  /usr/bin/time sleep 10
       10.00 real         0.00 user         0.00 sys
%

What's more, the fact that this works is thanks to the kernel. sleep(1)
just calls nanosleep(2), and because it doesn't catch any signals, that
suffices.

I do notice this is already broken for debuggers. Attaching gdb or truss
to a running sleep process immediately aborts the nanosleep with EINTR.

--
Jilles Tjoelker
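
For comparison, a program that had to cope with nanosleep(2) being cut
short (as already happens when a debugger attaches) would need to resume
the sleep itself. A minimal sketch, assuming a hypothetical helper
sleep_fully(); this is not what sleep(1) does, it only illustrates the
extra userland code that the current kernel behaviour makes unnecessary.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for the whole requested interval, resuming
 * with the remaining time whenever nanosleep(2) fails with EINTR.
 * Returns 0 on success, or -1 with errno set for any other error.
 */
static int
sleep_fully(const struct timespec *req)
{
	struct timespec left = *req;
	struct timespec rem;

	while (nanosleep(&left, &rem) == -1) {
		if (errno != EINTR)
			return (-1);
		left = rem;	/* interrupted: sleep for what is left */
	}
	return (0);
}
```

The loop relies on nanosleep(2) storing the unslept time in its second
argument when it is interrupted; any other error is passed back to the
caller unchanged.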