Subject: Re: New iSCSI stack.
From: Edward Tomasz Napierała
Date: Sun, 8 Sep 2013 12:29:53 +0200

On 6 Sep 2013, at 20:18, Alfred Perlstein wrote:

> On 9/5/13 3:27 AM, Edward Tomasz Napierała wrote:
>> Hello. At http://people.freebsd.org/~trasz/cfiscsi-20130904.diff you'll find
>> a patch which adds the new iSCSI initiator and target, against 10-CURRENT.
>> To use the new initiator, start with "man iscsictl". For the target - "man
>> ctld".
>>
>> All feedback is welcome. If nothing unexpected comes up, I'll commit it
>> in a few days from now. Note that it's still not optimized; at this point
>> I'm focusing more on reliability and interoperability.
>>
>> This work is being sponsored by FreeBSD Foundation.
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
> Edward, this is really exciting!
>
> Is there an easy way to use the userland iscsi configuration files?

Which iSCSI userland configuration files do you mean, ctl.conf(5)?
If you need the ability to parse and modify it from shell scripts, see the confctl utility (sysutils/confctl, https://github.com/trasz/confctl/).

> We would love to quickly backport and ship this with FreeNAS as an option for our users, having the config files be the same OR having a very good converter would really make that much easier for us.

Porting to 9 should be quite easy; there are Capsicum API differences, and you might also want to compare CTL between 10 and 9 to see if there are any changes which need to be merged. Reviewing the code for possible security issues would also be very welcome :-)

As for the config files, writing a converter should be quite easy. Which configuration files do you need to support, ctl.conf(5) and the istgt configuration?

From: Slawa Olhovchenkov
Date: Sun, 8 Sep 2013 14:47:11 +0400
Subject: Re: New iSCSI stack.

On Sun, Sep 08, 2013 at 12:29:53PM +0200, Edward Tomasz Napierała wrote:
> > We would love to quickly backport and ship this with FreeNAS as an option for our users, having the config files be the same OR having a very good converter would really make that much easier for us.
>
> Porting to 9 should be quite easy - there are Capsicum API differences;
> you might also want to compare CTL between 10 and 9 to see if there are
> any changes which need to be merged. Taking a look at the code searching
> for possible security issues would be also very welcome :-)
>
> As for the config files - writing a converter should be quite easy. Which
> configuration files you need to support, ctl.conf(5) and istgt configuration?

Could you write a utility to _generate_ ctl.conf from the runtime configuration? Currently, configuring directly with `ctladm create` is more predictable from scripts, but it uses an incompatible syntax and is not persistent.
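Slawa's request boils down to walking the running CTL configuration and printing it back out in ctl.conf(5) syntax. A minimal sketch of the output half follows, assuming the configuration has already been collected into the hypothetical `struct target_desc` and `struct lun_desc` types below. It does not query CTL or `ctladm`, and the keywords it emits (`target`, `portal-group`, `lun`, `path`, `size`) are an assumption that should be checked against the ctl.conf(5) manual page shipped with the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one exported LUN, as a generator might
 * collect it from the running CTL configuration. */
struct lun_desc {
	int		 id;
	const char	*path;		/* backing file or device */
	const char	*size;		/* e.g. "4G", or NULL to omit */
};

/* Hypothetical description of one iSCSI target. */
struct target_desc {
	const char		*name;		/* target IQN */
	const char		*portal_group;	/* NULL to omit */
	const struct lun_desc	*luns;
	size_t			 nluns;
};

/* Bounded printf-style append; *off is the current string length.
 * Returns -1 on error or if the output would not fit. */
static int
buf_append(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int r;

	if (*off >= len)
		return (-1);
	va_start(ap, fmt);
	r = vsnprintf(buf + *off, len - *off, fmt, ap);
	va_end(ap);
	if (r < 0 || (size_t)r >= len - *off)
		return (-1);
	*off += (size_t)r;
	return (0);
}

/* Render one target as a ctl.conf-style block into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int
emit_target(char *buf, size_t len, const struct target_desc *t)
{
	size_t off = 0, i;

	assert(t != NULL && t->name != NULL);
	if (buf_append(buf, len, &off, "target %s {\n", t->name) != 0)
		return (-1);
	if (t->portal_group != NULL && buf_append(buf, len, &off,
	    "\tportal-group %s\n", t->portal_group) != 0)
		return (-1);
	for (i = 0; i < t->nluns; i++) {
		const struct lun_desc *l = &t->luns[i];

		if (buf_append(buf, len, &off, "\tlun %d {\n\t\tpath %s\n",
		    l->id, l->path) != 0)
			return (-1);
		if (l->size != NULL &&
		    buf_append(buf, len, &off, "\t\tsize %s\n", l->size) != 0)
			return (-1);
		if (buf_append(buf, len, &off, "\t}\n") != 0)
			return (-1);
	}
	return (buf_append(buf, len, &off, "}\n"));
}
```

Writing into a caller-supplied buffer with a bounded append keeps the sketch free of allocation and turns truncation into an explicit error instead of a silently half-written config file.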

Date: Sun, 8 Sep 2013 09:32:06 -0400
Subject: Re: New iSCSI stack.
From: Outback Dingo

On Sun, Sep 8, 2013 at 6:29 AM, Edward Tomasz Napierała wrote:
> Porting to 9 should be quite easy - there are Capsicum API differences;
> you might also want to compare CTL between 10 and 9 to see if there are
> any changes which need to be merged. Taking a look at the code searching
> for possible security issues would be also very welcome :-)
>
> As for the config files - writing a converter should be quite easy. Which
> configuration files you need to support, ctl.conf(5) and istgt
> configuration?

I believe I was quite close to having it working with the last patch, but I could never get the ctl kernel module to function, and retracing my steps with this latest patch, I feel I'm a bit further away than before. "Quite easy to backport"... maybe for you, or others. But yes, I would also like to integrate the work into stable/9 in the lab for some benchmarks.

From: John Baldwin
Date: Mon, 9 Sep 2013 11:27:16 -0400
Subject: Re: COMPAT_32BIT oddness in rtld-elf (was: Re: /usr/lib/private)

On Saturday, September 07, 2013 7:48:13 am Dag-Erling Smørgrav wrote:
> I'm having trouble understanding this code in libexec/rtld-elf/rtld.c:
>
> static void *
> path_enumerate(const char *path, path_enum_proc callback, void *arg)
> {
> #ifdef COMPAT_32BIT
>     const char *trans;
> #endif
>     if (path == NULL)
>         return (NULL);
>
>     path += strspn(path, ":;");
>     while (*path != '\0') {
>         size_t len;
>         char  *res;
>
>         len = strcspn(path, ":;");
> #ifdef COMPAT_32BIT
>         trans = lm_findn(NULL, path, len);
>         if (trans)
>             res = callback(trans, strlen(trans), arg);
>         else
> #endif
>         res = callback(path, len, arg);
>
>         if (res != NULL)
>             return (res);
>
>         path += len;
>         path += strspn(path, ":;");
>     }
>
>     return (NULL);
> }
>
> This function is used to traverse paths, such as rtld's built-in search
> path, LD_LIBRARY_PATH, an Elf object's rpath, etc.
> As far as I can tell, the result of this is that *in the COMPAT_32BIT case
> only* it is possible to list one directory as replacing another in
> libmap.conf. In other words, we could have this in libmap32.conf:
>
> /lib                  /lib32
> /usr/lib              /usr/lib32
> /usr/lib/private      /usr/lib32/private
>
> instead of hardcoding a different standard search path in rtld.h.
>
> What I don't understand is why this functionality is only available in
> the COMPAT_32BIT case. It seems universally useful to me.

I think it would be fine to make it universally available. You should talk to
Doug Ambrisko. He has patches to extend libmap support to key off OS-version,
etc. and might already make the directory case universal.

-- 
John Baldwin
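The universal behaviour John agrees to can be sketched in user space as follows. This is not rtld code: `dir_maps[]` and `translate_dir()` are hypothetical stand-ins for the libmap.conf directory entries and `lm_findn()`, and `collect_dir()` is only an example callback. Apart from that, the loop mirrors the quoted `path_enumerate()`, with the `#ifdef COMPAT_32BIT` around the lookup removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the directory entries rtld would read from
 * libmap.conf and look up with lm_findn(). */
struct dir_map {
	const char	*from;
	const char	*to;
};

static const struct dir_map dir_maps[] = {
	{ "/lib",		"/lib32" },
	{ "/usr/lib",		"/usr/lib32" },
	{ "/usr/lib/private",	"/usr/lib32/private" },
};

typedef void *(*path_enum_proc)(const char *, size_t, void *);

/* Return the replacement for the len-byte directory at path, or NULL.
 * Only whole entries match; there is no prefix matching. */
static const char *
translate_dir(const char *path, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(dir_maps) / sizeof(dir_maps[0]); i++)
		if (strlen(dir_maps[i].from) == len &&
		    strncmp(dir_maps[i].from, path, len) == 0)
			return (dir_maps[i].to);
	return (NULL);
}

/* Same traversal as the quoted path_enumerate(), but the translation
 * lookup runs for every entry instead of only under COMPAT_32BIT. */
static void *
path_enumerate_translated(const char *path, path_enum_proc callback,
    void *arg)
{
	assert(callback != NULL);
	if (path == NULL)
		return (NULL);

	path += strspn(path, ":;");
	while (*path != '\0') {
		size_t len = strcspn(path, ":;");
		const char *trans = translate_dir(path, len);
		void *res;

		if (trans != NULL)
			res = callback(trans, strlen(trans), arg);
		else
			res = callback(path, len, arg);
		if (res != NULL)
			return (res);	/* callback found what it wanted */

		path += len;
		path += strspn(path, ":;");
	}
	return (NULL);
}

/* Example callback: record each visited directory as "a:b:c" and keep
 * going (returning NULL means "not found, continue"). */
struct collect {
	char	buf[256];
	size_t	off;
};

static void *
collect_dir(const char *dir, size_t len, void *arg)
{
	struct collect *c = arg;

	if (c->off + len + 2 <= sizeof(c->buf)) {
		if (c->off > 0)
			c->buf[c->off++] = ':';
		memcpy(c->buf + c->off, dir, len);
		c->off += len;
		c->buf[c->off] = '\0';
	}
	return (NULL);
}
```

With the table above, enumerating `/usr/lib:;/opt/lib::/lib` visits `/usr/lib32`, `/opt/lib` and `/lib32`. Because this sketch matches whole path entries only, `/usr/lib/private` is mapped through its own entry rather than through the `/usr/lib` one.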

From: Outback Dingo
Date: Wed, 11 Sep 2013 11:06:01 -0400
Subject: Re: New iSCSI stack.

On Sun, Sep 8, 2013 at 9:32 AM, Outback Dingo wrote:
> I was i belive quite close to having it working on the last patch, however
> could never seem to get the ctl kernel module to function,
> And feel im a bit further away with this latest patch retracing my steps,
> from previous... quite easy to backport.... maybe for you, or other
> but yes, I also would like to integrate the work to stable/9 in the lab
> for some benchmarks

Still trying to tackle this... any ideas? I think if I can get past the few errors I'm encountering, I can get a patch against stable/9 for others to test.

clang -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/home/dingo/devel/sys/GENERIC/opt_global.h -I. -I@ -I@/contrib/altq -fno-common -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/obj/usr/home/dingo/devel/sys/GENERIC -mno-aes -mno-avx -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -Qunused-arguments -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -c /usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_cam_sim.c
ctfconvert -L VERSION -g ctl_frontend_cam_sim.o
clang -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/home/dingo/devel/sys/GENERIC/opt_global.h -I. -I@ -I@/contrib/altq -fno-common -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/obj/usr/home/dingo/devel/sys/GENERIC -mno-aes -mno-avx -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -Qunused-arguments -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -c /usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_internal.c
ctfconvert -L VERSION -g ctl_frontend_internal.o
clang -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/home/dingo/devel/sys/GENERIC/opt_global.h -I. -I@ -I@/contrib/altq -fno-common -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/obj/usr/home/dingo/devel/sys/GENERIC -mno-aes -mno-avx -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -Qunused-arguments -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -c /usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:525:19: error: no member named 'lun_map_fn' in 'struct ctl_nexus'
        io->io_hdr.nexus.lun_map_fn = cfiscsi_map_lun;
        ~~~~~~~~~~~~~~~~ ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:526:19: error: no member named 'lun_map_arg' in 'struct ctl_nexus'
        io->io_hdr.nexus.lun_map_arg = cs;
        ~~~~~~~~~~~~~~~~ ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:587:19: error: no member named 'lun_map_fn' in 'struct ctl_nexus'
        io->io_hdr.nexus.lun_map_fn = cfiscsi_map_lun;
        ~~~~~~~~~~~~~~~~ ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:588:19: error: no member named 'lun_map_arg' in 'struct ctl_nexus'
        io->io_hdr.nexus.lun_map_arg = cs;
        ~~~~~~~~~~~~~~~~ ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:1230:6: error: no member named 'ioctl' in 'struct ctl_frontend'
        fe->ioctl = cfiscsi_ioctl;
        ~~  ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:1231:6: error: no member named 'devid' in 'struct ctl_frontend'
        fe->devid = cfiscsi_devid;
        ~~  ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:1955:25: error: use of undeclared identifier 'SCSI_PROTO_ISCSI'
        desc->proto_codeset = (SCSI_PROTO_ISCSI << 4) | SVPD_ID_CODESET_ASCII;
                               ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:1980:33: error: use of undeclared identifier 'SCSI_PROTO_ISCSI'
        desc1->proto_codeset = (SCSI_PROTO_ISCSI << 4) | SVPD_ID_CODESET_UTF8;
                                ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:1990:33: error: use of undeclared identifier 'SCSI_PROTO_ISCSI'
        desc2->proto_codeset = (SCSI_PROTO_ISCSI << 4) | SVPD_ID_CODESET_BINARY;
                                ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:1999:33: error: use of undeclared identifier 'SCSI_PROTO_ISCSI'
        desc3->proto_codeset = (SCSI_PROTO_ISCSI << 4) | SVPD_ID_CODESET_BINARY;
                                ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2182:48: error: no member named 'options' in 'struct ctl_be_lun'
            &control_softc->ctl_luns[lun_id]->be_lun->options, links) {
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^
@/sys/queue.h:272:28: note: expanded from macro 'STAILQ_FOREACH'
        for((var) = STAILQ_FIRST((head));                               \
                                  ^
@/sys/queue.h:269:30: note: expanded from macro 'STAILQ_FIRST'
#define STAILQ_FIRST(head)      ((head)->stqh_first)
                                  ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2181:2: error: incomplete definition of type 'struct ctl_be_lun_option'
        STAILQ_FOREACH(opt,
        ^~~~~~~~~~~~~~~~~~~
@/sys/queue.h:274:13: note: expanded from macro 'STAILQ_FOREACH'
            (var) = STAILQ_NEXT((var), field))
                    ^~~~~~~~~~~~~~~~~~~~~~~~~
@/sys/queue.h:318:39: note: expanded from macro 'STAILQ_NEXT'
#define STAILQ_NEXT(elm, field) ((elm)->field.stqe_next)
                                 ~~~~~^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2174:9: note: forward declaration of 'struct ctl_be_lun_option'
        struct ctl_be_lun_option *opt;
               ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2183:17: error: incomplete definition of type 'struct ctl_be_lun_option'
                if (strcmp(opt->name, "cfiscsi_target") == 0)
                           ~~~^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2174:9: note: forward declaration of 'struct ctl_be_lun_option'
        struct ctl_be_lun_option *opt;
               ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2184:16: error: incomplete definition of type 'struct ctl_be_lun_option'
                        target = opt->value;
                                 ~~~^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2174:9: note: forward declaration of 'struct ctl_be_lun_option'
        struct ctl_be_lun_option *opt;
               ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2185:22: error: incomplete definition of type 'struct ctl_be_lun_option'
                else if (strcmp(opt->name, "cfiscsi_target_alias") == 0)
                                ~~~^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2174:9: note: forward declaration of 'struct ctl_be_lun_option'
        struct ctl_be_lun_option *opt;
               ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2186:22: error: incomplete definition of type 'struct ctl_be_lun_option'
                        target_alias = opt->value;
                                       ~~~^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2174:9: note: forward declaration of 'struct ctl_be_lun_option'
        struct ctl_be_lun_option *opt;
               ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2187:22: error: incomplete definition of type 'struct ctl_be_lun_option'
                else if (strcmp(opt->name, "cfiscsi_lun") == 0)
                                ~~~^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2174:9: note: forward declaration of 'struct ctl_be_lun_option'
        struct ctl_be_lun_option *opt;
               ^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2188:13: error: incomplete definition of type 'struct ctl_be_lun_option'
                        lun = opt->value;
                              ~~~^
/usr/home/dingo/devel/sys/modules/ctl/../../cam/ctl/ctl_frontend_iscsi.c:2174:9: note: forward declaration of 'struct ctl_be_lun_option'
        struct ctl_be_lun_option *opt;
               ^
18 errors generated.
*** [ctl_frontend_iscsi.o] Error code 1

Stop in /usr/home/dingo/devel/sys/modules/ctl.
*** [all] Error code 1

Stop in /usr/home/dingo/devel/sys/modules.
*** [modules-all] Error code 1

Stop in /usr/obj/usr/home/dingo/devel/sys/GENERIC.
*** [buildkernel] Error code 1

Stop in /usr/home/dingo/devel.
*** [buildkernel] Error code 1

Stop in /usr/home/dingo/devel.

Date: Wed, 11 Sep 2013 13:09:51 -0400
Subject: Re: New iSCSI stack.
From: Outback Dingo

On Wed, Sep 11, 2013 at 11:06 AM, Outback Dingo wrote:
If you need >>> an ability to parse it and modify from a shell scripts, see confctl >>> utility >>> (sysutils/confctl, https://github.com/trasz/confctl/). >>> >>> > We would love to quickly backport and ship this with FreeNAS as an >>> option for our users, having the config files be the same OR having a very >>> good converter would really make that much easier for us. >>> >>> Porting to 9 should be quite easy - there are Capsicum API differences; >>> you might also want to compare CTL between 10 and 9 to see if there are >>> any changes which need to be merged. Taking a look at the code searching >>> for possible security issues would be also very welcome :-) >>> >>> As for the config files - writing a converter should be quite easy. >>> Which >>> configuration files you need to support, ctl.conf(5) and istgt >>> configuration? >>> >> >> I was i belive quite close to having it working on the last patch, >> however could never seem to get the ctl kernel module to function, >> And feel im a bit further away with this latest patch retracing my steps, >> from previous... quite easy to backport.... maybe for you, or other >> but yes, I also would like to integrate the work to stable/9 in the lab >> for some benchmarks >> >>> >>> > Still trying to tackle this...... any ideas?? I think if i can get past > the few errors im encountering i can get a patch against stable/9 for > others to test.... > Negate the last posted error, Ive worked past it..... I think if i can get > past this capsicum issue, ill have a kernel > > ===> iscsi (all) > clang -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE > -nostdinc -I/usr/home/dingo/devel/sys/modules/iscsi/../../ofed/include > -DHAVE_KERNEL_OPTION_HEADERS -include > /usr/obj/usr/home/dingo/devel/sys/GENERIC/opt_global.h -I.
-I@ -I@/contrib/altq > -fno-common -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer > -I/usr/obj/usr/home/dingo/devel/sys/GENERIC -mno-aes -mno-avx > -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float > -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector > -std=iso9899:1999 -Qunused-arguments -fstack-protector -Wall > -Wredundant-decls -Wnested-externs -Wstrict-prototypes > -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef > -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs > -fdiagnostics-show-option -Wno-error-tautological-compare > -Wno-error-empty-body -Wno-error-parentheses-equality -c > /usr/home/dingo/devel/sys/modules/iscsi/../../dev/iscsi//icl.c > /usr/home/dingo/devel/sys/modules/iscsi/../../dev/iscsi//icl.c:1098:26: > error: use of undeclared identifier 'CAP_SOCK_CLIENT' > cap_rights(&rights, CAP_SOCK_CLIENT), &fp); > ^ > 1 error generated. > *** [icl.o] Error code 1 > > Stop in /usr/home/dingo/devel/sys/modules/iscsi.
> > > _______________________________________________ >>> freebsd-current@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-current >>> To unsubscribe, send any mail to " >>> freebsd-current-unsubscribe@freebsd.org" >>> >> >> > From owner-freebsd-arch@FreeBSD.ORG Wed Sep 11 19:07:23 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A0234769; Wed, 11 Sep 2013 19:07:23 +0000 (UTC) (envelope-from outbackdingo@gmail.com) Received: from mail-pa0-x22b.google.com (mail-pa0-x22b.google.com [IPv6:2607:f8b0:400e:c03::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 64EEF20E1; Wed, 11 Sep 2013 19:07:23 +0000 (UTC) Received: by mail-pa0-f43.google.com with SMTP id hz10so25695pad.16 for ; Wed, 11 Sep 2013 12:07:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=OZLh5zAECRWx3wlJge1vOuuRPWo5vMsXvlLMa8hca7g=; b=pJVHH+0OJT4aFL4Ex9abRnVHK9KMJ71VD4poL+LuKXMFIfDV1RLTAlGSe7Zch5DPQQ ck5eyND++oGgyFxolvbVSo5JZ3GFLCi/kiWbJHYHi5f5ra41FTzQ8mJblVyKfMjlvBBl MZfjoXsOmc5+jW80GkiZu18CsJtRzqZktG9DdRmnVHfxzyLz1isuVSRIdxsKWQQIEFIw cbRZIXkmsWdi61hsQPk3BdkJ4APQMiBRRv/rbh1V6SuMFJ+vYc1fv/dz/f4TqiD2/UWn FvZb3NP5PagSgE3QhN3jY9l205vXyyBbWNpI4KQnS6lpjiLPyDL5HzHn3xl3bV7QOOGY ZnTw== MIME-Version: 1.0 X-Received: by 10.66.49.68 with SMTP id s4mr5374984pan.98.1378926443011; Wed, 11 Sep 2013 12:07:23 -0700 (PDT) Received: by 10.66.126.141 with HTTP; Wed, 11 Sep 2013 12:07:22 -0700 (PDT) In-Reply-To: References: <522A1C73.9030402@mu.org> Date: Wed, 11 Sep 2013 15:07:22 -0400 Message-ID: Subject: Re: New iSCSI stack. 
From: Outback Dingo To: =?ISO-8859-2?Q?Edward_Tomasz_Napiera=B3a?= Content-Type: text/plain; charset=ISO-8859-2 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-scsi@freebsd.org, "freebsd-arch@freebsd.org" , Alfred Perlstein , "freebsd-current@FreeBSD.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Sep 2013 19:07:23 -0000 On Wed, Sep 11, 2013 at 1:09 PM, Outback Dingo wrote: > > > > On Wed, Sep 11, 2013 at 11:06 AM, Outback Dingo wrote: > >> >> >> >> On Sun, Sep 8, 2013 at 9:32 AM, Outback Dingo wrote: >> >>> >>> >>> >>> On Sun, Sep 8, 2013 at 6:29 AM, Edward Tomasz Napierała < >>> trasz@freebsd.org> wrote: >>> >>>> Message written by Alfred Perlstein on 6 Sep >>>> 2013, at 20:18: >>>> > On 9/5/13 3:27 AM, Edward Tomasz Napierała wrote: >>>> >> Hello. At http://people.freebsd.org/~trasz/cfiscsi-20130904.diff you'll find >>>> >> a patch which adds the new iSCSI initiator and target, against >>>> 10-CURRENT. >>>> >> To use the new initiator, start with "man iscsictl". For the target >>>> - "man >>>> >> ctld". >>>> >> >>>> >> All feedback is welcome. If nothing unexpected comes up, I'll >>>> commit it >>>> >> in a few days from now. Note that it's still not optimized; at this >>>> point >>>> >> I'm focusing more on reliability and interoperability. >>>> >> >>>> >> This work is being sponsored by FreeBSD Foundation. >>>> >> >>>> >> _______________________________________________ >>>> >> freebsd-current@freebsd.org mailing list >>>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-current >>>> >> To unsubscribe, send any mail to " >>>> freebsd-current-unsubscribe@freebsd.org" >>>> >> >>>> > Edward, this is really exciting!
>>>> > >>>> > Is there an easy way to use the userland iscsi configuration files? >>>> >>>> Which iSCSI userland configuration files, the ctl.conf(5)? If you need >>>> an ability to parse it and modify from a shell scripts, see confctl >>>> utility >>>> (sysutils/confctl, https://github.com/trasz/confctl/). >>>> >>>> > We would love to quickly backport and ship this with FreeNAS as an >>>> option for our users, having the config files be the same OR having a very >>>> good converter would really make that much easier for us. >>>> >>>> Porting to 9 should be quite easy - there are Capsicum API differences; >>>> you might also want to compare CTL between 10 and 9 to see if there are >>>> any changes which need to be merged. Taking a look at the code >>>> searching >>>> for possible security issues would be also very welcome :-) >>>> >>>> As for the config files - writing a converter should be quite easy. >>>> Which >>>> configuration files you need to support, ctl.conf(5) and istgt >>>> configuration? >>>> >>> >>> I was i belive quite close to having it working on the last patch, >>> however could never seem to get the ctl kernel module to function, >>> And feel im a bit further away with this latest patch retracing my >>> steps, from previous... quite easy to backport.... maybe for you, or other >>> but yes, I also would like to integrate the work to stable/9 in the lab >>> for some benchmarks >>> >>>> >>>> >> Still trying to tackle this...... any ideas?? I think if i can get past >> the few errors im encountering i can get a patch against stable/9 for >> others to test.... >> Negate the last posted error, Ive worked past it.....
I think if i can >> get past this capsicum issue, ill have a kernel >> >> ===> iscsi (all) >> clang -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE >> -nostdinc -I/usr/home/dingo/devel/sys/modules/iscsi/../../ofed/include >> -DHAVE_KERNEL_OPTION_HEADERS -include >> /usr/obj/usr/home/dingo/devel/sys/GENERIC/opt_global.h -I. -I@ -I@/contrib/altq >> -fno-common -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer >> -I/usr/obj/usr/home/dingo/devel/sys/GENERIC -mno-aes -mno-avx >> -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float >> -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector >> -std=iso9899:1999 -Qunused-arguments -fstack-protector -Wall >> -Wredundant-decls -Wnested-externs -Wstrict-prototypes >> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef >> -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs >> -fdiagnostics-show-option -Wno-error-tautological-compare >> -Wno-error-empty-body -Wno-error-parentheses-equality -c >> /usr/home/dingo/devel/sys/modules/iscsi/../../dev/iscsi//icl.c >> /usr/home/dingo/devel/sys/modules/iscsi/../../dev/iscsi//icl.c:1098:26: >> error: use of undeclared identifier 'CAP_SOCK_CLIENT' >> cap_rights(&rights, CAP_SOCK_CLIENT), &fp); >> ^ >> 1 error generated. >> *** [icl.o] Error code 1 >> >> Stop in /usr/home/dingo/devel/sys/modules/iscsi. >> > and i guess, icl needs to be "upgraded" ?
KLD ctl.ko: depends on icl - not available or version mismatch > >> >> _______________________________________________ >>>> freebsd-current@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current >>>> To unsubscribe, send any mail to " >>>> freebsd-current-unsubscribe@freebsd.org" >>>> >>> >>> >> > From owner-freebsd-arch@FreeBSD.ORG Wed Sep 11 21:14:56 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 3B91C48E; Wed, 11 Sep 2013 21:14:56 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: from mail-ee0-x22c.google.com (mail-ee0-x22c.google.com [IPv6:2a00:1450:4013:c00::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 793B528CC; Wed, 11 Sep 2013 21:14:55 +0000 (UTC) Received: by mail-ee0-f44.google.com with SMTP id b47so4938098eek.31 for ; Wed, 11 Sep 2013 14:14:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=CCikg5X1w6YpZxj/uSGsRhd4DSihFJRRlLMwSscnOf0=; b=H1SNqR9xt/duZQC2+3PuW33RkG04pVX7zfR/JC0/aaKzDM0Zq5ere7eeippkxjrGbR y5Vxe0hlRnKk0f9JRg0xirBC//pOQXGW6a+Ok9FyBU2q5NYBNI8JVY8m6vNigmYcBns5 sMzw/aFFK3YJpQjoZGJpFIo5QXOlJabfjF7oThceYLReuhOBnheouH77R6N36s6RtxCE N8Nx66Nt7kwJdqI0tS2rMlNwMcc7IPk7Y9jsyS58lmGOpzD/JlDbsWBUcEaScMnYvEZa lCpUZ6g7G+uKqSvQvijnSRtptQBjB234lb95IPL4ha5NcwnIBXkP9ZZqXaP+q5u918rl icEA== X-Received: by 10.15.26.136 with SMTP id n8mr71960eeu.65.1378934093729; Wed, 11 Sep 2013 14:14:53 -0700 (PDT) Received: from [192.168.1.102] (aed83.neoplus.adsl.tpnet.pl. 
[83.25.107.83]) by mx.google.com with ESMTPSA id b45sm153975eef.4.1969.12.31.16.00.00 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 11 Sep 2013 14:14:53 -0700 (PDT) Sender: =?UTF-8?Q?Edward_Tomasz_Napiera=C5=82a?= Content-Type: text/plain; charset=iso-8859-2 Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: New iSCSI stack. From: =?iso-8859-2?Q?Edward_Tomasz_Napiera=B3a?= In-Reply-To: Date: Wed, 11 Sep 2013 23:14:51 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <522A1C73.9030402@mu.org> To: Outback Dingo X-Mailer: Apple Mail (2.1508) Cc: freebsd-scsi@freebsd.org, "freebsd-arch@freebsd.org" , Alfred Perlstein , "freebsd-current@FreeBSD.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Sep 2013 21:14:56 -0000 I'm working on last few minor nits to get this into the tree. Give me few days, I'll prepare a patch against 9-STABLE.
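The `KLD ctl.ko: depends on icl - not available or version mismatch` message quoted earlier in the thread comes from the kernel linker's dependency check. In FreeBSD a module announces itself with `MODULE_VERSION(name, version)` and a consumer declares `MODULE_DEPEND(consumer, dep, minver, prefver, maxver)`; loading fails unless a module named `dep` is present with a version in `[minver, maxver]`. The following is only a userspace sketch of that rule, assuming a simplified in-memory table; the `modver`/`moddep` types, the helper functions and any version numbers used with them are hypothetical and are not the kernel linker's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a module registered with MODULE_VERSION(). */
struct modver {
	const char	*name;
	int		 version;
};

/* Hypothetical stand-in for one MODULE_DEPEND() entry, tagged with the
 * file that declared it (e.g. "ctl.ko"). */
struct moddep {
	const char	*file;
	const char	*dep;
	int		 minver;
	int		 prefver;	/* preferred version; unused by this check */
	int		 maxver;
};

/* Met if a module with that name is present and its version is in range. */
static int
dep_satisfied(const struct modver *loaded, size_t nloaded,
    const struct moddep *d)
{
	assert(d->minver <= d->maxver);
	for (size_t i = 0; i < nloaded; i++) {
		if (strcmp(loaded[i].name, d->dep) == 0 &&
		    loaded[i].version >= d->minver &&
		    loaded[i].version <= d->maxver)
			return (1);
	}
	return (0);
}

/* Report each unmet dependency in the format seen in the thread and
 * return how many were unmet. */
static int
check_deps(const struct modver *loaded, size_t nloaded,
    const struct moddep *deps, size_t ndeps)
{
	int unmet = 0;

	for (size_t i = 0; i < ndeps; i++) {
		if (!dep_satisfied(loaded, nloaded, &deps[i])) {
			printf("KLD %s: depends on %s - not available or "
			    "version mismatch\n", deps[i].file, deps[i].dep);
			unmet++;
		}
	}
	return (unmet);
}
```

Both an absent `icl` entry and one whose version falls outside the declared range produce the same message, which is why the error text says "not available or version mismatch". In the case reported above, the build log suggests icl.c never compiled, so the "not available" branch is the more likely cause.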
From owner-freebsd-arch@FreeBSD.ORG Wed Sep 11 21:39:05 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 6A7ED266 for ; Wed, 11 Sep 2013 21:39:05 +0000 (UTC) (envelope-from dkandula@gmail.com) Received: from mail-wg0-x232.google.com (mail-wg0-x232.google.com [IPv6:2a00:1450:400c:c00::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0962C2A5F for ; Wed, 11 Sep 2013 21:39:04 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id j13so8493348wgh.29 for ; Wed, 11 Sep 2013 14:39:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=PtY6xIm59GVaVvvUXwEuG8d/LB1MzYwapg9H03jtcMc=; b=XIg/RuwkguFkTgkzblg4lBMipPe5S76hyKq+tTrOHpU5+MzvybhEtMpq6qhfmf/0kn sMNRqbq/dCT2fiyUmsk/TRe6bhHBN5qskjvXIftbz5UH37lXuD5NFI21MaIQeNH+8rKD 0Z0Ytz1GpFv5tm8MoE56VvwzPAezJftRmcWwibKA5uryXCvrTL7/ml42n4d+z5DK3Ky2 VbJfT4TME3kepU9OACCs4GnSb/9ZGYiv9AxQ32/nV4GTaEY3fr3hzM+CwcM3ICGH6ui1 LnzB6ifpo6BcIjlwGuAUAugXmFgLDOxxVOdcMxKzJNwXc+UMnqDfG9Dx/U6v/1n0ZLS6 vnrA== MIME-Version: 1.0 X-Received: by 10.180.13.174 with SMTP id i14mr19163241wic.49.1378935543574; Wed, 11 Sep 2013 14:39:03 -0700 (PDT) Received: by 10.194.38.167 with HTTP; Wed, 11 Sep 2013 14:39:03 -0700 (PDT) Date: Wed, 11 Sep 2013 17:39:03 -0400 Message-ID: Subject: Why do we need to acquire the current thread's lock before context switching? 
From: Dheeraj Kandula To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Sep 2013 21:39:05 -0000 Hey All, When the current thread is being context switched with a newly selected thread, why is the current thread's lock acquired before context switch – mi_switch() is invoked after thread_lock(td) is called. A thread at any time runs only on one of the cores of a CPU. Hence when it is being context switched it is added either to the real time runq or the timeshare runq or the idle runq with the lock still held or it is added to the sleep queue or the blocked queue. So this happens atomically even without the lock. Isn't it? Am I missing something here? I don't see any contention for the thread in order to demand a lock for the thread which will basically protect the contents of the thread structure for the thread.
Dheeraj From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 07:10:49 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id B110CD16 for ; Thu, 12 Sep 2013 07:10:49 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 9CA63259C for ; Thu, 12 Sep 2013 07:10:49 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id 7BC741A3D68; Thu, 12 Sep 2013 00:10:39 -0700 (PDT) Message-ID: <523168EE.4070508@mu.org> Date: Thu, 12 Sep 2013 00:10:38 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Dheeraj Kandula Subject: Re: Why do we need to acquire the current thread's lock before context switching? References: In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Sep 2013 07:10:49 -0000 On 9/11/13 2:39 PM, Dheeraj Kandula wrote: > Hey All, > > When the current thread is being context switched with a newly selected > thread, why is the current thread's lock acquired before context switch – > mi_switch() is invoked after thread_lock(td) is called. A thread at any > time runs only on one of the cores of a CPU. 
Hence when it is being context > switched it is added either to the real time runq or the timeshare runq or > the idle runq with the lock still held or it is added to the sleep queue or > the blocked queue. So this happens atomically even without the lock. Isn't > it? Am I missing something here? I don't see any contention for the thread > in order to demand a lock for the thread which will basically protect the > contents of the thread structure for the thread. > > Dheeraj > The thread lock also happens to protect various scheduler variables: struct mtx *volatile td_lock; /* replaces sched lock */ see sys/kern/sched_ule.c on how the thread lock td_lock is changed depending on what the thread is doing. -- Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 10:48:43 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id F05D9410 for ; Thu, 12 Sep 2013 10:48:43 +0000 (UTC) (envelope-from dkandula@gmail.com) Received: from mail-wg0-x22e.google.com (mail-wg0-x22e.google.com [IPv6:2a00:1450:400c:c00::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 8C73A280C for ; Thu, 12 Sep 2013 10:48:43 +0000 (UTC) Received: by mail-wg0-f46.google.com with SMTP id k14so9020242wgh.25 for ; Thu, 12 Sep 2013 03:48:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=COD0z2wNo/BAOP8SWzX0ACIJEl7R5g+DCQoA/kNjtw4=; b=tuM+4WGJ0/XIn05txvoLtNfCprHbD5zaIXbn6Jc/PjoZse3nGBCg4M4l7rHmSqf3fQ 6qRTpBS8mnCiXm+Aj1EPY2klqK6yIi9hVDlQ0au4J/RWk/AiY1h6WZKGRdKVd7bQ1/Sb rDuL/veER9XpNsVUyc4J26eSnUvYESoa46B8igEtfVMiv2vLocpgxDlM+JoGQXw4TIjx 
mWeOwpxu9DxPFS0LJkbeiFppaaGciuHeWKS0fKIfktfAB+l9OV9hXTpJf34iVD378Nbq LffS2gxpWiYZDS69skxS1wC/euPEad1ESsJtASMR9d6PRvUNYBJZU679SjgfsENHUqlz vhrA== MIME-Version: 1.0 X-Received: by 10.194.219.1 with SMTP id pk1mr5517985wjc.36.1378982922076; Thu, 12 Sep 2013 03:48:42 -0700 (PDT) Received: by 10.194.38.167 with HTTP; Thu, 12 Sep 2013 03:48:42 -0700 (PDT) In-Reply-To: <523168EE.4070508@mu.org> References: <523168EE.4070508@mu.org> Date: Thu, 12 Sep 2013 06:48:42 -0400 Message-ID: Subject: Re: Why do we need to acquire the current thread's lock before context switching? From: Dheeraj Kandula To: Alfred Perlstein Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Sep 2013 10:48:44 -0000 Thanks a lot Alfred for the clarification. So is the td_lock granular i.e. one separate lock for each thread but also used for protecting the scheduler variables or is it just one lock used by all threads and the scheduler as well. I will anyway go through the code that you suggested but just wanted to have a deeper understanding before I go about hunting in the code. Dheeraj On Thu, Sep 12, 2013 at 3:10 AM, Alfred Perlstein wrote: > On 9/11/13 2:39 PM, Dheeraj Kandula wrote: > >> Hey All, >> >> When the current thread is being context switched with a newly selected >> thread, why is the current thread's lock acquired before context switch – >> mi_switch() is invoked after thread_lock(td) is called. A thread at any >> time runs only on one of the cores of a CPU.
Hence when it is being >> context >> switched it is added either to the real time runq or the timeshare runq or >> the idle runq with the lock still held or it is added to the sleep queue >> or >> the blocked queue. So this happens atomically even without the lock. Isn't >> it? Am I missing something here? I don't see any contention for the thread >> in order to demand a lock for the thread which will basically protect the >> contents of the thread structure for the thread. >> >> Dheeraj >> >> > The thread lock also happens to protect various scheduler variables: > > struct mtx *volatile td_lock; /* replaces sched lock */ > > see sys/kern/sched_ule.c on how the thread lock td_lock is changed > depending on what the thread is doing. > > -- > Alfred Perlstein > > From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 11:04:21 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7A2A7855 for ; Thu, 12 Sep 2013 11:04:21 +0000 (UTC) (envelope-from onwahe@gmail.com) Received: from mail-qc0-x236.google.com (mail-qc0-x236.google.com [IPv6:2607:f8b0:400d:c01::236]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 389CF28E7 for ; Thu, 12 Sep 2013 11:04:21 +0000 (UTC) Received: by mail-qc0-f182.google.com with SMTP id n4so3749944qcx.13 for ; Thu, 12 Sep 2013 04:04:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=DOg9Prqu3CwrLJsBmsB/DhbumeXkP2+nTDLRAJyLtWk=; b=0b8EYGH16T/t8YIsn4a4h1YGBuIHPVQ6J6BMxug73pOi5OjdKMK9WtPVYPV2f+mHBo FoEe6iNMmCwg54pdfIg5Ir+R1Z5EgbrkJJv5lyvF8hOLfXbM6eV2UQ8Fi1V4a8l0Vcsc
xqexeP0l24s9h+EhINNWDTOSoP8+CS2SstMJc5BACizYxCqiaRHVZKASvHGowKmq0X9A AQlpEBq1uNPX6TfiUyxVcW+YxnkhGnyg637s8bmh+WgPZmkKwQlb9iwBQ+pTjUPCdgtS 8QwV0Uhseng9og8TubmqwK+3QBlOb3KRa08gU8xJ8y9d1TzehX2pMZ6BHUJmso3gY7og FU6A== MIME-Version: 1.0 X-Received: by 10.49.130.162 with SMTP id of2mr12526202qeb.37.1378983860268; Thu, 12 Sep 2013 04:04:20 -0700 (PDT) Received: by 10.140.90.7 with HTTP; Thu, 12 Sep 2013 04:04:20 -0700 (PDT) In-Reply-To: References: <523168EE.4070508@mu.org> Date: Thu, 12 Sep 2013 13:04:20 +0200 Message-ID: Subject: Re: Why do we need to acquire the current thread's lock before context switching? From: Svatopluk Kraus To: Dheeraj Kandula Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Alfred Perlstein , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Sep 2013 11:04:21 -0000 Think about td_lock like something what is lent by current thread owner. If a thread is running, it's owned by scheduler and td_lock points to scheduler lock. If a thread is sleeping, it's owned by sleeping queue and td_lock points to sleep queue lock. If a thread is contested, it's owned by turnstile queue and td_lock points to turnstile queue lock. And so on. This way an owner can work with owned threads safely without giant lock. The td_lock pointer is changed atomically, so it's safe. Svatopluk Kraus On Thu, Sep 12, 2013 at 12:48 PM, Dheeraj Kandula wrote: > Thanks a lot Alfred for the clarification. So is the td_lock granular i.e. > one separate lock for each thread but also used for protecting the > scheduler variables or is it just one lock used by all threads and the > scheduler as well.
I will anyway go through the code that you suggested but > just wanted to have a deeper understanding before I go about hunting in the > code. > > Dheeraj > > > On Thu, Sep 12, 2013 at 3:10 AM, Alfred Perlstein wrote: > > > On 9/11/13 2:39 PM, Dheeraj Kandula wrote: > > > >> Hey All, > >> > >> When the current thread is being context switched with a newly selected > >> thread, why is the current thread's lock acquired before context switch > – > >> mi_switch() is invoked after thread_lock(td) is called. A thread at any > >> time runs only on one of the cores of a CPU. Hence when it is being > >> context > >> switched it is added either to the real time runq or the timeshare runq > or > >> the idle runq with the lock still held or it is added to the sleep queue > >> or > >> the blocked queue. So this happens atomically even without the lock. > Isn't > >> it? Am I missing something here? I don't see any contention for the > thread > >> in order to demand a lock for the thread which will basically protect > the > >> contents of the thread structure for the thread. > >> > >> Dheeraj > >> > >> > > The thread lock also happens to protect various scheduler variables: > > > > struct mtx *volatile td_lock; /* replaces sched lock */ > > > > see sys/kern/sched_ule.c on how the thread lock td_lock is changed > > depending on what the thread is doing.
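Svatopluk's description of td_lock as a lock "lent" by the thread's current owner can be illustrated with a small userspace model. This sketch is loosely modelled on the retry loop in FreeBSD's `thread_lock()` and on `thread_lock_set()` in sys/kern/kern_mutex.c, as we understand them; the `sim_thread` type and `sim_*` functions are hypothetical, pthread mutexes stand in for kernel spin mutexes, and details such as `spinlock_enter()` and lock profiling are left out. The point it shows is that a locker must re-check `td_lock` after acquiring the lock it pointed at, because the previous holder may have handed the thread to another owner in the meantime.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/*
 * Simplified model of a kernel thread: td_lock points at whichever lock
 * the thread's current owner uses (a run queue lock, a sleep queue
 * chain lock, a turnstile lock, ...). Thread fields are protected by
 * that lock, whichever one it currently is.
 */
struct sim_thread {
	_Atomic(pthread_mutex_t *) td_lock;
	int td_state;		/* example field protected by *td_lock */
};

/*
 * Lock whatever currently protects td. The holder of the old lock may
 * redirect the pointer between our load and our lock, so re-check it
 * after locking and retry if it moved.
 */
static pthread_mutex_t *
sim_thread_lock(struct sim_thread *td)
{
	for (;;) {
		pthread_mutex_t *m = atomic_load(&td->td_lock);

		pthread_mutex_lock(m);
		if (m == atomic_load(&td->td_lock))
			return (m);	/* stable while we hold m */
		pthread_mutex_unlock(m);	/* td moved to another owner */
	}
}

static void
sim_thread_unlock(struct sim_thread *td)
{
	/* Only the holder may redirect td_lock, so this load is stable. */
	pthread_mutex_unlock(atomic_load(&td->td_lock));
}

/*
 * Hand td to a new owner. The caller holds td's current lock and
 * new_lock; afterwards it still holds new_lock, which now protects td,
 * and the old lock has been released.
 */
static void
sim_thread_lock_set(struct sim_thread *td, pthread_mutex_t *new_lock)
{
	pthread_mutex_t *old = atomic_load(&td->td_lock);

	assert(old != new_lock);
	atomic_store(&td->td_lock, new_lock);
	pthread_mutex_unlock(old);
}
```

In this model, moving a thread from a run queue to a sleep queue means taking the thread's current lock with `sim_thread_lock()`, taking the sleep queue lock, and calling `sim_thread_lock_set()`. Anyone spinning in `sim_thread_lock()` at that moment simply retries against the new pointer, which is why no single global lock is needed.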
> > > > -- > > Alfred Perlstein > > > > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 11:16:22 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id EE222E0C for ; Thu, 12 Sep 2013 11:16:22 +0000 (UTC) (envelope-from dkandula@gmail.com) Received: from mail-we0-x22d.google.com (mail-we0-x22d.google.com [IPv6:2a00:1450:400c:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 79E4329E1 for ; Thu, 12 Sep 2013 11:16:22 +0000 (UTC) Received: by mail-we0-f173.google.com with SMTP id w62so8122708wes.32 for ; Thu, 12 Sep 2013 04:16:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=mY+i6oKCk3zRjDb5yv3JQzgT326lc8BHRQnVx/xhDyY=; b=CtGu+qerNZwC2IuFOV6UuMmSQWV2pbjGQrBcTuD3tsV9lHvMFAKp0Qc66U3F2KD/5M JRFwRMADiQoaQQz/g9+WF6GjGrDcgAZeuzwqpbsg73Can2SDHWXXuDRgx5hjCK/sAlyS xHGJDN21I8aiigHi4rS6sPFUsJjdgA09uCmHZ79BxhwaJOwNy4vd6m3QlZrUFX97pxuh wynQspf5YPcVz4naxjUhvYZ786sIlcQNqTne3Zc4hfQOEf54ij+oXWX1RNBTxghDnVlw sGkA+1UF5lK358UMXc8ZAomTSrbFHyILnkz1++D41VQ23UrwZ8PfpDCyaNJaiso27B8w E/9w== MIME-Version: 1.0 X-Received: by 10.194.23.196 with SMTP id o4mr895225wjf.62.1378984580812; Thu, 12 Sep 2013 04:16:20 -0700 (PDT) Received: by 10.194.38.167 with HTTP; Thu, 12 Sep 2013 04:16:20 -0700 (PDT) In-Reply-To: References: <523168EE.4070508@mu.org> Date: Thu, 12 Sep 2013 07:16:20 -0400 Message-ID: Subject: Re: Why do we need to acquire the current thread's 
lock before context switching? From: Dheeraj Kandula To: Svatopluk Kraus Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Alfred Perlstein , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Sep 2013 11:16:23 -0000 Thanks a lot Svatopluk for the clarification. Right after I replied to Alfred's mail, I realized that it can't be thread specific lock as it should also protect the scheduler variables. So if I understand it right, even though it is a mutex, it can be unlocked by another thread which is usually not the case with regular mutexes as the thread that locks it must unlock it unlike a binary semaphore. Isn't it? Dheeraj On Thu, Sep 12, 2013 at 7:04 AM, Svatopluk Kraus wrote: > Think about td_lock like something what is lent by current thread owner. > If a thread is running, it's owned by scheduler and td_lock points > to scheduler lock. If a thread is sleeping, it's owned by sleeping queue > and td_lock points to sleep queue lock. If a thread is contested, it's > owned by turnstile queue and td_lock points to turnstile queue lock. And so > on. This way an owner can work with owned threads safely without giant > lock. The td_lock pointer is changed atomically, so it's safe. > > Svatopluk Kraus > > On Thu, Sep 12, 2013 at 12:48 PM, Dheeraj Kandula wrote: > >> Thanks a lot Alfred for the clarification. So is the td_lock granular i.e. >> one separate lock for each thread but also used for protecting the >> scheduler variables or is it just one lock used by all threads and the >> scheduler as well. I will anyway go through the code that you suggested >> but >> just wanted to have a deeper understanding before I go about hunting in >> the >> code.
>> >> Dheeraj >> >> >> On Thu, Sep 12, 2013 at 3:10 AM, Alfred Perlstein wrote: >> >> > On 9/11/13 2:39 PM, Dheeraj Kandula wrote: >> > >> >> Hey All, >> >> >> >> When the current thread is being context switched with a newly selected >> >> thread, why is the current thread's lock acquired before context >> switch – >> >> mi_switch() is invoked after thread_lock(td) is called. A thread at any >> >> time runs only on one of the cores of a CPU. Hence when it is being >> >> context >> >> switched it is added either to the real time runq or the timeshare >> runq or >> >> the idle runq with the lock still held or it is added to the sleep >> queue >> >> or >> >> the blocked queue. So this happens atomically even without the lock. >> Isn't >> >> it? Am I missing something here? I don't see any contention for the >> thread >> >> in order to demand a lock for the thread which will basically protect >> the >> >> contents of the thread structure for the thread. >> >> >> >> Dheeraj >> >> >> >> >> > The thread lock also happens to protect various scheduler variables: >> > >> > struct mtx *volatile td_lock; /* replaces sched lock */ >> > >> > see sys/kern/sched_ule.c on how the thread lock td_lock is changed >> > depending on what the thread is doing.
From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 12:30:43 2013
References: <523168EE.4070508@mu.org>
Date: Thu, 12 Sep 2013 14:30:41 +0200
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
From: Svatopluk Kraus
To: Dheeraj Kandula
Cc: Alfred Perlstein, freebsd-arch@freebsd.org

Yes, td_lock is a bit of magic and certainly not used like an ordinary
mutex. Moreover, some dirty (non-standard and hidden) things happen to it,
at least during a task switch.

Svatopluk Kraus
From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 15:59:57 2013
From: John Baldwin
To: freebsd-arch@freebsd.org
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
Date: Thu, 12 Sep 2013 08:24:52 -0400
Message-Id: <201309120824.52916.jhb@freebsd.org>
Cc: Alfred Perlstein, Svatopluk Kraus, Dheeraj Kandula

On Thursday, September 12, 2013 7:16:20 am Dheeraj Kandula wrote:
> Thanks a lot Svatopluk for the clarification. Right after I replied to
> Alfred's mail, I realized that it can't be thread specific lock as it
> should also protect the scheduler variables. So if I understand it right,
> even though it is a mutex, it can be unlocked by another thread which is
> usually not the case with regular mutexes as the thread that locks it must
> unlock it unlike a binary semaphore. Isn't it?

It's less complicated than that. :) It is a mutex, but to expand on what
Svatopluk said with an example: take a thread that is asleep on a sleep
queue. td_lock points to the relevant SC_LOCK() for the sleep queue chain
in that case, so any other thread that wants to examine that thread's
state ends up locking the sleep queue while it examines that thread. In
particular, the thread that is doing a wakeup() can resume all of the
sleeping threads for a wait channel by holding the one SC_LOCK() for that
wait channel since that will be td_lock for all those threads.

In general, mutexes are only unlocked by the thread that locks them,
and the td_lock of the old thread is unlocked during sched_switch().
However, the old thread has to grab td_lock of the new thread during
sched_switch() and then hand it off to the new thread when it resumes.
This is why sched_throw() and sched_switch() in ULE directly assign
'mtx_lock' of the run queue lock before calling cpu_throw() or
cpu_switch(). That gives the effect that the new thread resumes while
holding the lock pointed to by its td_lock.
-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 17:57:15 2013
References: <523168EE.4070508@mu.org>
Date: Thu, 12 Sep 2013 13:57:13 -0400
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
From: Dheeraj Kandula
To: Svatopluk Kraus
Cc: Alfred Perlstein, freebsd-arch@freebsd.org

Great. Now I understand it better. Thanks for the reply Svatopluk.

Dheeraj
From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 18:00:31 2013
In-Reply-To: <201309120824.52916.jhb@freebsd.org>
References: <201309120824.52916.jhb@freebsd.org>
Date: Thu, 12 Sep 2013 14:00:29 -0400
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
From: Dheeraj Kandula
To: John Baldwin
Cc: Alfred Perlstein, Svatopluk Kraus, freebsd-arch@freebsd.org

Thanks John for the detailed clarification. Wow, that is a lot of
information. I will digest it and will email you any further questions
that I may have.

Dheeraj
From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 20:00:58 2013
References: <201309120824.52916.jhb@freebsd.org>
Date: Thu, 12 Sep 2013 16:00:56 -0400
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
From: Dheeraj Kandula
To: John Baldwin
Cc: Alfred Perlstein, Svatopluk Kraus, freebsd-arch@freebsd.org

Hey John,

I think I get it now clearly. The td_lock of each thread actually points
to the lock of the thread queue it is on, i.e. the run queue, which may be
the real-time runq, the timeshare runq or the idle runq. For sleep, the
td_lock points to the blocked_lock, which is a global lock protecting the
sleep queue, I think.

Before cpu_switch() is invoked, the old thread's td_lock is released as
shown below (the code is from sched_switch() in sched_ule.c):

    lock_profile_release_lock(&TDQ_LOCKPTR(tdq)->lock_object);
    TDQ_LOCKPTR(tdq)->mtx_lock = (uintptr_t)newtd;

Later, after cpu_switch() is done,

    lock_profile_obtain_lock_success(&TDQ_LOCKPTR(tdq)->lock_object,
        0, 0, __FILE__, __LINE__);

is executed, which locks the lock of the thread queue on the current CPU,
which can be a different CPU. I assume the new thread's td_lock points to
the current CPU's thread queue.

Now it is clear that the mutex is unlocked by the same thread that locks
it. Hope my understanding is correct.

Dheeraj
>> > >> >> Hence when it is being context switched it is added either to the real
>> > >> >> time runq or the timeshare runq or the idle runq with the lock still held
>> > >> >> or it is added to the sleep queue or the blocked queue. So this happens
>> > >> >> atomically even without the lock. Isn't it? Am I missing something here?
>> > >> >> I don't see any contention for the thread in order to demand a lock for
>> > >> >> the thread which will basically protect the contents of the thread
>> > >> >> structure for the thread.
>> > >> >>
>> > >> >> Dheeraj
>> > >> >>
>> > >> >
>> > >> > The thread lock also happens to protect various scheduler variables:
>> > >> >
>> > >> > struct mtx *volatile td_lock; /* replaces sched lock */
>> > >> >
>> > >> > see sys/kern/sched_ule.c on how the thread lock td_lock is changed
>> > >> > depending on what the thread is doing.
>> > >> >
>> > >> > --
>> > >> > Alfred Perlstein
>> > >> >
>> > >> _______________________________________________
>> > >> freebsd-arch@freebsd.org mailing list
>> > >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> > >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>> > >>
>> > >
>> >
>>
>> --
>> John Baldwin
>>
>

From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 20:21:39 2013
From: Alfred Perlstein
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
Date: Thu, 12 Sep 2013 13:21:35 -0700
To: John Baldwin
Cc: Dheeraj Kandula, Svatopluk Kraus, "freebsd-arch@freebsd.org"

Both these explanations are so great. Is there any way we can add this to proc.h or maybe document somewhere and then link to it from proc.h?

Sent from my iPhone

On Sep 12, 2013, at 5:24 AM, John Baldwin wrote:

> On Thursday, September 12, 2013 7:16:20 am Dheeraj Kandula wrote:
>> Thanks a lot Svatopluk for the clarification. Right after I replied to
>> Alfred's mail, I realized that it can't be thread specific lock as it
>> should also protect the scheduler variables. So if I understand it right,
>> even though it is a mutex, it can be unlocked by another thread which is
>> usually not the case with regular mutexes as the thread that locks it must
>> unlock it unlike a binary semaphore. Isn't it?
>
> It's less complicated than that. :) It is a mutex, but to expand on what
> Svatopluk said with an example: take a thread that is asleep on a sleep
> queue.
> td_lock points to the relevant SC_LOCK() for the sleep queue chain
> in that case, so any other thread that wants to examine that thread's
> state ends up locking the sleep queue while it examines that thread. In
> particular, the thread that is doing a wakeup() can resume all of the
> sleeping threads for a wait channel by holding the one SC_LOCK() for that
> wait channel since that will be td_lock for all those threads.
>
> In general mutexes are only unlocked by the thread that locks them,
> and the td_lock of the old thread is unlocked during sched_switch().
> However, the old thread has to grab td_lock of the new thread during
> sched_switch() and then hand it off to the new thread when it resumes.
> This is why sched_throw() and sched_switch() in ULE directly assign
> 'mtx_lock' of the run queue lock before calling cpu_throw() or
> cpu_switch(). That gives the effect that the new thread resumes while
> holding the lock pointed to by its td_lock.
>
>> Dheeraj
>>
>>
>> On Thu, Sep 12, 2013 at 7:04 AM, Svatopluk Kraus wrote:
>>
>>> [...]
>
> --
> John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 20:27:47 2013
Date: Thu, 12 Sep 2013 16:27:44 -0400
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
From: Dheeraj Kandula
To: Alfred Perlstein
Cc: Svatopluk Kraus, "freebsd-arch@freebsd.org"

Hey Alfred,

I can create a diff to add the comments to the file proc.h and commit it if that works.

Dheeraj

On Thu, Sep 12, 2013 at 4:21 PM, Alfred Perlstein wrote:

> Both these explanations are so great. Is there any way we can add this to
> proc.h or maybe document somewhere and then link to it from proc.h?
>
> Sent from my iPhone
>
> [...]

From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 20:44:48 2013
Date: Thu, 12 Sep 2013 16:44:46 -0400
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
From: Dheeraj Kandula
To: Alfred Perlstein
Cc: Svatopluk Kraus, "freebsd-arch@freebsd.org"

# svn diff
Index: sys/sys/proc.h
===================================================================
--- sys/sys/proc.h	(revision 255488)
+++ sys/sys/proc.h	(working copy)
@@ -197,12 +197,44 @@
 };
 
 /*
+ * Comments by: Svatopluk Kraus & John Baldwin
+ *
+ * Svatopluk Kraus' comment:
+ * Think about td_lock like something what is lent by current thread owner. If
+ * a thread is running, it's owned by scheduler and td_lock points
+ * to scheduler lock. If a thread is sleeping, it's owned by sleeping queue
+ * and td_lock points to sleep queue lock. If a thread is contested, it's
+ * owned by turnstile queue and td_lock points to turnstile queue lock. And so
+ * on. This way an owner can work with owned threads safely without giant
+ * lock. The td_lock pointer is changed atomically, so it's safe.
+ *
+ * John Baldwin's comment:
+ * For example: take a thread that is asleep on a sleep
+ * queue. td_lock points to the relevant SC_LOCK() for the sleep queue chain
+ * in that case, so any other thread that wants to examine that thread's
+ * state ends up locking the sleep queue while it examines that thread. In
+ * particular, the thread that is doing a wakeup() can resume all of the
+ * sleeping threads for a wait channel by holding the one SC_LOCK() for that
+ * wait channel since that will be td_lock for all those threads.
+ *
+ * In general mutexes are only unlocked by the thread that locks them,
+ * and the td_lock of the old thread is unlocked during sched_switch().
+ * However, the old thread has to grab td_lock of the new thread during
+ * sched_switch() and then hand it off to the new thread when it resumes.
+ * This is why sched_throw() and sched_switch() in ULE directly assign
+ * 'mtx_lock' of the run queue lock before calling cpu_throw() or
+ * cpu_switch(). That gives the effect that the new thread resumes while
+ * holding the lock pointed to by its td_lock.
+ */
+/*
  * Kernel runnable context (thread).
  * This is what is put to sleep and reactivated.
  * Thread context. Processes may have multiple threads.
  */
 struct thread {
-	struct mtx *volatile td_lock;	/* replaces sched lock */
+	struct mtx *volatile td_lock;	/* replaces sched lock. Look at the comment
+					 * above for further details.
+					 */
 	struct proc	*td_proc;	/* (*) Associated process. */
 	TAILQ_ENTRY(thread) td_plist;	/* (*) All threads in this proc. */
 	TAILQ_ENTRY(thread) td_runq;	/* (t) Run queue. */

On Thu, Sep 12, 2013 at 4:21 PM, Alfred Perlstein wrote:

> Both these explanations are so great. Is there any way we can add this to
> proc.h or maybe document somewhere and then link to it from proc.h?
>
> Sent from my iPhone
>
> [...]

From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 21:10:27 2013
From: John Baldwin
To: Dheeraj Kandula
Subject: Re: Why do we need to acquire the current thread's lock before context switching?
Date: Thu, 12 Sep 2013 17:10:01 -0400
Cc: Alfred Perlstein, Svatopluk Kraus, freebsd-arch@freebsd.org

On Thursday, September 12, 2013 4:00:56 pm Dheeraj Kandula wrote:
> Hey John,
> I think I get it now clearly.
>
> The td_lock of each thread actually points to the Thread Queue's lock on
> which it is present. i.e. run queue which may either be the real time runq,
> timeshare runq or the idle runq. For sleep the td_lock points to the
> blocked_lock which is a global lock protecting the sleep queue I think.
>
> Before cpu_switch() is invoked, the old thread's td_lock is released as
> shown below: the code is from sched_switch of sched_ule.c
>
> 	lock_profile_release_lock(&TDQ_LOCKPTR(tdq)->lock_object);
> 	TDQ_LOCKPTR(tdq)->mtx_lock = (uintptr_t)newtd;

This is not releasing the td_lock of the old thread. TDQ_LOCK() is
td_lock of the new thread that is about to run, and the assignment
to mtx_lock is transferring ownership of TDQ_LOCK() from the old thread
to the new thread.
The lock_profile calls fake an unlock / lock so the
profiling code doesn't get confused by the handoff, but TDQ_LOCK() is
not unlocked here.

The old thread's td_lock is left alone if it is already TDQ_LOCK()
(notice that in this snippet, the various branches assert this to be true):

	/*
	 * The lock pointer in an idle thread should never change. Reset it
	 * to CAN_RUN as well.
	 */
	if (TD_IS_IDLETHREAD(td)) {
		MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
		TD_SET_CAN_RUN(td);
	} else if (TD_IS_RUNNING(td)) {
		MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
		...

However, in the other cases, this code is called:

	} else {
		/* This thread must be going to sleep. */
		TDQ_LOCK(tdq);
		mtx = thread_lock_block(td);
		tdq_load_rem(tdq, td);
	}

thread_lock_block() changes td_lock in the old thread to point to a
dummy lock called "blocked_lock" and then unlocks the old thread's
td_lock. However, it returns the previous value of td_lock as 'mtx'.
That is then passed to cpu_switch(). cpu_switch() restores td_lock
in the old thread to 'mtx'.

The effect of this is to temporarily assign td_lock of the old thread
to blocked_lock while the thread finishes switching out, but to be able
to release the old thread's associated td_lock in C rather than having
to do it from cpu_switch(). The 'blocked_lock' is a special lock that
is always locked and never unlocked. That causes any other threads
that are trying to lock the old thread to spin until the old thread
is finished switching out even after the old thread's td_lock has
been released (since the new thread will spin on blocked_lock until
cpu_switch() restores td_lock).

> Later after cpu_switch is done,
>
> 	lock_profile_obtain_lock_success(&TDQ_LOCKPTR(tdq)->lock_object,
> 	    0, 0, __FILE__, __LINE__);
>
> is executed which locks the lock of the thread queue on the current
> CPU which can be on a different CPU. I assume the new thread's td_lock
> points to the current CPU's thread queue.
As with the first hunk, this is not actually locking the lock, just
pacifying the profiling code. The new thread's td_lock does indeed
point to the current CPU's thread queue. This is known to be true
because the new thread was chosen from the current CPU's thread
queue.

> Now it is clear that the mutex is unlocked by the same thread that locks it.

Except not in this one case. :) In the case of a context switch, the
old thread locks TDQ_LOCK(), sets td_lock to blocked_lock and drops
its old td_lock, and changes the internals of TDQ_LOCK() so that it is
now owned by the new thread. cpu_switch() is called which then restores
td_lock in the old thread and switches the MD context (stack, registers,
etc.). The new thread returns on its own stack, and it now owns the
TDQ_LOCK() that was locked by the old thread. However, since the
ownership was transferred, it can unlock TDQ_LOCK().

Note that if a running thread is preempted, it doesn't do the blocked_lock
business; instead, it just transfers ownership of the TDQ_LOCK() it
already holds to the new thread and never explicitly unlocks that lock.
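The going-to-sleep handoff described above can be illustrated with a minimal user-space sketch. It is a toy model under stated assumptions, not FreeBSD kernel code: struct mtx, struct thread, toy_lock(), toy_unlock(), toy_thread_lock_block(), toy_switch() and run_switch_model() are hypothetical stand-ins, a mutex is reduced to one C11 atomic owner word, the "always locked" blocked_lock is modeled with an arbitrary dummy owner value, and the machine-dependent part of cpu_switch() is elided. The old thread takes the run queue lock, parks its td_lock on blocked_lock while dropping its sleep queue lock, rewrites the run queue lock's owner word so the new thread owns it, and finally restores its own td_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Toy user-space model of the ULE context-switch lock handoff.
 * Hypothetical types and helpers: they only mirror names used in this
 * thread and are not the kernel's definitions.
 */
struct mtx {
	_Atomic uintptr_t mtx_lock;	/* MTX_UNOWNED or the owning thread */
};

struct thread {
	struct mtx *_Atomic td_lock;	/* lock currently protecting us */
};

#define	MTX_UNOWNED	((uintptr_t)0)

/* Permanently "owned" by a dummy owner, so anyone spinning on it waits. */
static struct mtx blocked_lock = { (uintptr_t)1 };

static void
toy_lock(struct mtx *m, struct thread *td)
{
	uintptr_t expect = MTX_UNOWNED;

	/* Spin until the owner word goes from unowned to td. */
	while (!atomic_compare_exchange_weak(&m->mtx_lock, &expect,
	    (uintptr_t)td))
		expect = MTX_UNOWNED;
}

static void
toy_unlock(struct mtx *m, struct thread *td)
{
	/* The usual rule: only the recorded owner may release. */
	assert(atomic_load(&m->mtx_lock) == (uintptr_t)td);
	atomic_store(&m->mtx_lock, MTX_UNOWNED);
}

/* Like thread_lock_block(): park td on blocked_lock, drop its old lock. */
static struct mtx *
toy_thread_lock_block(struct thread *td)
{
	struct mtx *old = atomic_load(&td->td_lock);

	atomic_store(&td->td_lock, &blocked_lock);
	toy_unlock(old, td);		/* td is the running thread here */
	return (old);
}

/* td (holding its td_lock) goes to sleep and switches to newtd. */
static void
toy_switch(struct thread *td, struct thread *newtd, struct mtx *tdq_lock)
{
	struct mtx *saved;

	toy_lock(tdq_lock, td);			/* TDQ_LOCK(tdq) */
	saved = toy_thread_lock_block(td);
	/* Handoff: rewrite the owner word instead of unlock + relock. */
	atomic_store(&tdq_lock->mtx_lock, (uintptr_t)newtd);
	/* cpu_switch(): context switch elided; un-park the old thread. */
	atomic_store(&td->td_lock, saved);
}

/* Returns 0 if every ownership check in the scenario holds. */
int
run_switch_model(void)
{
	struct mtx sleepq_chain = { MTX_UNOWNED }, tdq_lock = { MTX_UNOWNED };
	struct thread oldtd = { &sleepq_chain };	/* going to sleep */
	struct thread newtd = { &tdq_lock };	/* on this CPU's run queue */

	toy_lock(&sleepq_chain, &oldtd);	/* thread_lock(oldtd) */
	toy_switch(&oldtd, &newtd, &tdq_lock);

	/* Old thread points at its sleepq lock again; that lock is free. */
	if (atomic_load(&oldtd.td_lock) != &sleepq_chain ||
	    atomic_load(&sleepq_chain.mtx_lock) != MTX_UNOWNED)
		return (1);
	/* New thread owns its td_lock without ever having locked it... */
	if (atomic_load(&tdq_lock.mtx_lock) != (uintptr_t)&newtd)
		return (2);
	/* ...so it can legitimately release it. */
	toy_unlock(atomic_load(&newtd.td_lock), &newtd);
	return (0);
}
```

Called from a main(), run_switch_model() should return 0. The final toy_unlock() passes its owner check only because ownership of the run queue lock was handed over, not released and re-acquired. The preemption case in the last paragraph would skip toy_thread_lock_block() and perform only the owner-word rewrite.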
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Thu Sep 12 21:29:05 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7F09912D; Thu, 12 Sep 2013 21:29:05 +0000 (UTC) (envelope-from dkandula@gmail.com) Received: from mail-we0-x22a.google.com (mail-we0-x22a.google.com [IPv6:2a00:1450:400c:c03::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DA5A928B4; Thu, 12 Sep 2013 21:29:04 +0000 (UTC) Received: by mail-we0-f170.google.com with SMTP id w62so374638wes.15 for ; Thu, 12 Sep 2013 14:29:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=+DGrusQIXx8Lk8PudgSXWs6/GOtCb7M4dFuUeS9PKhw=; b=zulVg3s153eCf0OazioARZyzdvOwb8R495oBo0FijRlOtzfrB4JtXCGNP2gj/UowPr yPBZOHG982vDahFBeKoLlkUGhDx00YPPuuEmsFapw93hVjtfilDAHwexyHisEm3nXfwh kS0IEIxOsTBNn37Ek+8X+DnGNPWq08NYrKLTK1+CTsofaAVhPF7Qg9qCPxSmj0f2k5+i yQTNYab7a8VVsxd50mMJK+8fPzMLddmbR49mVzU01DwHGOS7UQst477b3BKokfEyvUCL 905aySae4nZ1KGusuow805rovSi48DPr2uXfugFqsBP8jEqF2NntnxmnoHEhfUi/y5yw TV2Q== MIME-Version: 1.0 X-Received: by 10.180.211.111 with SMTP id nb15mr7474838wic.55.1379021343262; Thu, 12 Sep 2013 14:29:03 -0700 (PDT) Received: by 10.194.38.167 with HTTP; Thu, 12 Sep 2013 14:29:03 -0700 (PDT) In-Reply-To: <201309121710.01307.jhb@freebsd.org> References: <201309121710.01307.jhb@freebsd.org> Date: Thu, 12 Sep 2013 17:29:03 -0400 Message-ID: Subject: Re: Why do we need to acquire the current thread's lock before context switching? 
From: Dheeraj Kandula To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Alfred Perlstein , Svatopluk Kraus , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Sep 2013 21:29:05 -0000 Thanks John for the excellent explanation. That helps a lot. Dheeraj On Thu, Sep 12, 2013 at 5:10 PM, John Baldwin wrote: > On Thursday, September 12, 2013 4:00:56 pm Dheeraj Kandula wrote: > > Hey John, > > I think I get it now clearly. > > > > The td_lock of each thread actually points to the Thread Queue's lock on > > which it is present. i.e. run queue which may either be the real time > runq, > > timeshare runq or the idle runq. For sleep the td_lock points to the > > blocked_lock which is a global lock protecting the sleep queue I think. > > > > Before cpu_switch() is invoked, the old thread's td_lock is released as > > shown below: the code is from sched_switch of sched_ule.c > > > > lock_profile_release_lock< > http://nxr.netbsd.org/source/s?defs=lock_profile_release_lock&project=src-freebsd > > > > (&TDQ_LOCKPTR< > http://nxr.netbsd.org/source/xref/src-freebsd/sys/kern/sched_ule.c#TDQ_LOCKPTR > > > > (tdq)->lock_object< > http://nxr.netbsd.org/source/s?defs=lock_object&project=src-freebsd> > > ); > > > > TDQ_LOCKPTR < > http://nxr.netbsd.org/source/xref/src-freebsd/sys/kern/sched_ule.c#TDQ_LOCKPTR > >(tdq)->mtx_lock > > = > > (uintptr_t < > http://nxr.netbsd.org/source/s?defs=uintptr_t&project=src-freebsd>)newtd > > < > http://nxr.netbsd.org/source/xref/src-freebsd/sys/kern/sched_ule.c#newtd>; > > This is not releasing the td_lock of the old thread. 
TDQ_LOCK() is > td_lock of the new thread that is about to run, and the assignment > to mtx_lock is transferring ownership of TDQ_LOCK() from the old thread > to the new thread. The lock_profile calls fake an unlock / lock so the > profiling code doesn't get confused by the handoff, but TDQ_LOCK() is > not unlocked here. > > The old thread's td_lock is left alone if it is already TDQ_LOCK() > (notice that in this snippet, the various branches assert this to be true): > > /* > * The lock pointer in an idle thread should never change. Reset it > * to CAN_RUN as well. > */ > if (TD_IS_IDLETHREAD(td)) { > MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); > TD_SET_CAN_RUN(td); > } else if (TD_IS_RUNNING(td)) { > MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); > ... > > However, in the other cases, this code is called: > > } else { > /* This thread must be going to sleep. */ > TDQ_LOCK(tdq); > mtx = thread_lock_block(td); > tdq_load_rem(tdq, td); > } > > thread_lock_block() changes td_lock in the old thread to point to a > dummy lock called "blocked_lock" and then unlocks the old thread's > td_lock. However, it returns the previous value of td_lock as 'mtx'. > That is then passed to cpu_switch(). cpu_switch() restores td_lock > in the old thread to 'mtx'. > > The effect of this is to temporarily assign td_lock of the old thread > to blocked_lock while the thread finishes switching out, but to be able > to release the old thread's associated td_lock in C rather than having > to do it from cpu_switch(). The 'blocked_lock' is a special lock that > is always locked and never unlocked. That causes any other threads > that are trying to lock the old thread to spin until the old thread > is finished switching out even after the old thread's td_lock has > been released (since such threads will spin on blocked_lock until > cpu_switch() restores td_lock).
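For illustration only, a minimal userland sketch of the blocked_lock hand-off described above. It assumes pthread mutexes in place of kernel spin mutexes and C11 atomics in place of the kernel's release-store; sk_thread, sk_thread_lock(), sk_thread_lock_block() and sk_thread_lock_unblock() are hypothetical stand-ins for struct thread, thread_lock(), thread_lock_block() and the td_lock store done at the end of cpu_switch(), not the FreeBSD API. Note that it is any other thread calling sk_thread_lock() on the switching thread that spins on the sentinel.

```c
/*
 * Hypothetical userland sketch of the blocked_lock hand-off.
 * pthread mutexes stand in for kernel spin mutexes.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct sk_thread {
	_Atomic(pthread_mutex_t *) td_lock;	/* lock lent by the current owner */
};

/* Sentinel: only its address is used; nobody ever locks it. */
static pthread_mutex_t blocked_lock = PTHREAD_MUTEX_INITIALIZER;

/* Acquire the lock td_lock points to, spinning while td is mid-switch. */
static pthread_mutex_t *
sk_thread_lock(struct sk_thread *td)
{
	for (;;) {
		pthread_mutex_t *m = atomic_load(&td->td_lock);

		if (m == &blocked_lock)
			continue;		/* switch in progress: spin */
		pthread_mutex_lock(m);
		if (m == atomic_load(&td->td_lock))
			return (m);		/* td_lock did not move under us */
		pthread_mutex_unlock(m);	/* it moved: retry */
	}
}

/* Like thread_lock_block(): park td on blocked_lock, release the old lock. */
static pthread_mutex_t *
sk_thread_lock_block(struct sk_thread *td)
{
	pthread_mutex_t *old = atomic_load(&td->td_lock);

	atomic_store(&td->td_lock, &blocked_lock);
	pthread_mutex_unlock(old);	/* caller held it via sk_thread_lock() */
	return (old);
}

/* Like the last step of cpu_switch(): point td_lock back at 'mtx'. */
static void
sk_thread_lock_unblock(struct sk_thread *td, pthread_mutex_t *mtx)
{
	atomic_store(&td->td_lock, mtx);
}
```

The sentinel is what lets the old lock be released early, in C: while td_lock is &blocked_lock, sk_thread_lock() refuses to touch the half-switched thread.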
> > > Later after cpu_switch is done, > > > > lock_profile_obtain_lock_success(&TDQ_LOCKPTR(tdq)->lock_object, 0, 0, __FILE__, __LINE__); > > > > is executed which locks the lock of the thread queue on the current > > CPU which can be on a different CPU. I assume the new thread's td_lock > > points to the current CPU's thread queue. > > As with the first hunk, this is not actually locking the lock, just > pacifying the profiling code. The new thread's td_lock does indeed > point to the current CPU's thread queue. This is known to be true > because the new thread was chosen from the current CPU's thread > queue. > > > Now it is clear that the mutex is unlocked by the same thread that locks it. > > Except not in this one case. :) In the case of a context switch, the > old thread locks TDQ_LOCK(), sets td_lock to blocked_lock and drops > its old td_lock, and changes the internals of TDQ_LOCK() so that it is > now owned by the new thread. cpu_switch() is called which then restores > td_lock in the old thread, switches the MD context (stack and registers, > etc.). The new thread returns on its own stack, and it now owns the > TDQ_LOCK() that was locked by the old thread. However, since the > ownership was transferred, it can unlock TDQ_LOCK(). > > Note that if a running thread is preempted, it doesn't do the blocked_lock > business; instead it just transfers ownership of the TDQ_LOCK() it > already holds to the new thread and never explicitly unlocks that lock.
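A toy model of the ownership transfer, assuming a mutex whose mtx_lock word simply records the owner's identifier (small integers stand in here for the struct thread pointers the kernel stores). sk_mtx and its helpers are hypothetical, and they ignore the contested and recursion flag bits a real struct mtx keeps in mtx_lock:

```c
/*
 * Hypothetical owner-recording mutex: mtx_lock holds the owner id
 * (0 = unowned), so storing another id hands the lock over without
 * an unlock/lock pair.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct sk_mtx {
	_Atomic uintptr_t mtx_lock;	/* owner id, 0 when unowned */
};

static void
sk_mtx_lock(struct sk_mtx *m, uintptr_t self)
{
	uintptr_t unowned = 0;

	/* Spin until the owner field swings from 0 to 'self'. */
	while (!atomic_compare_exchange_weak(&m->mtx_lock, &unowned, self))
		unowned = 0;
}

/* Only the recorded owner may unlock; a foreign unlock returns false. */
static bool
sk_mtx_unlock(struct sk_mtx *m, uintptr_t self)
{
	uintptr_t expected = self;

	return (atomic_compare_exchange_strong(&m->mtx_lock, &expected, 0));
}

/* The sched_switch() trick: the old owner writes the new owner's id. */
static void
sk_mtx_handoff(struct sk_mtx *m, uintptr_t oldtd, uintptr_t newtd)
{
	assert(atomic_load(&m->mtx_lock) == oldtd);	/* must own it */
	atomic_store(&m->mtx_lock, newtd);
}
```

On preemption this single store is the whole story: the old thread already holds the run-queue lock, so it only rewrites the owner field, and the resumed thread unlocks as the legitimate owner.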
> > -- > John Baldwin > From owner-freebsd-arch@FreeBSD.ORG Fri Sep 13 04:10:05 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 59E323E2 for ; Fri, 13 Sep 2013 04:10:05 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1E2D123FD for ; Fri, 13 Sep 2013 04:10:04 +0000 (UTC) Received: from Julian-MBP3.local (ppp121-45-245-177.lns20.per2.internode.on.net [121.45.245.177]) (authenticated bits=0) by vps1.elischer.org (8.14.6/8.14.6) with ESMTP id r8D4A0XR085727 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 12 Sep 2013 21:10:02 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <52329012.2050408@freebsd.org> Date: Fri, 13 Sep 2013 12:09:54 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Dheeraj Kandula Subject: Re: Why do we need to acquire the current thread's lock before context switching? 
References: <201309120824.52916.jhb@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Cc: "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Sep 2013 04:10:05 -0000 On 9/13/13 4:44 AM, Dheeraj Kandula wrote: > # svn diff > Index: sys/sys/proc.h > =================================================================== > --- sys/sys/proc.h (revision 255488) > +++ sys/sys/proc.h (working copy) > @@ -197,12 +197,44 @@ > }; > > /* > + * Comments by: Svatopluk Kraus & John Baldwin > + * > + * Svatopluk Kraus' comment: > + * Think about td_lock like something what is lent by current thread > owner. If > + * a thread is running, it's owned by scheduler and td_lock points > + * to scheduler lock. If a thread is sleeping, it's owned by sleeping queue > + * and td_lock points to sleep queue lock. If a thread is contested, it's > + * owned by turnstile queue and td_lock points to turnstile queue lock. > And so > + * on. This way an owner can work with owned threads safely without giant > + * lock. The td_lock pointer is changed atomically, so it's safe. > + * > + * John Baldwin's comment: > + * For example: take a thread that is asleep on a sleep > + * queue. td_lock points to the relevant SC_LOCK() for the sleep queue > chain > + * in that case, so any other thread that wants to examine that thread's > + * state ends up locking the sleep queue while it examines that thread. In > + * particular, the thread that is doing a wakeup() can resume all of the > + * sleeping threads for a wait channel by holding the one SC_LOCK() for > that > + * wait channel since that will be td_lock for all those threads. 
> + * > + * In general mutexes are only unlocked by the thread that locks them, > + * and the td_lock of the old thread is unlocked during sched_switch(). > + * However, the old thread has to grab td_lock of the new thread during > + * sched_switch() and then hand it off to the new thread when it resumes. > + * This is why sched_throw() and sched_switch() in ULE directly assign > + * 'mtx_lock' of the run queue lock before calling cpu_throw() or > + * cpu_switch(). That gives the effect that the new thread resumes while > + * holding the lock pinted to by its td_lock. > + */ > +/* > * Kernel runnable context (thread). > * This is what is put to sleep and reactivated. > * Thread context. Processes may have multiple threads. > */ > struct thread { > - struct mtx *volatile td_lock; /* replaces sched lock */ > + struct mtx *volatile td_lock; /* replaces sched lock. Look at the comment > + * above for further details. > + */ > struct proc *td_proc; /* (*) Associated process. */ > TAILQ_ENTRY(thread) td_plist; /* (*) All threads in this proc. */ > TAILQ_ENTRY(thread) td_runq; /* (t) Run queue. */ > > > > On Thu, Sep 12, 2013 at 4:21 PM, Alfred Perlstein wrote: > >> Both these explanations are so great. Is there any way we can add this to >> proc.h or maybe document somewhere and then link to it from proc.h? >> >> Sent from my iPhone >> >> On Sep 12, 2013, at 5:24 AM, John Baldwin wrote: >> >>> On Thursday, September 12, 2013 7:16:20 am Dheeraj Kandula wrote: >>>> Thanks a lot Svatopluk for the clarification. Right after I replied to >>>> Alfred's mail, I realized that it can't be thread specific lock as it >>>> should also protect the scheduler variables. So if I understand it >> right, >>>> even though it is a mutex, it can be unlocked by another thread which is >>>> usually not the case with regular mutexes as the thread that locks it >> must >>>> unlock it unlike a binary semaphore. Isn't it? >>> It's less complicated than that. 
:) It is a mutex, but to expand on what >>> Svatopluk said with an example: take a thread that is asleep on a sleep >>> queue. td_lock points to the relevant SC_LOCK() for the sleep queue >> chain >>> in that case, so any other thread that wants to examine that thread's >>> state ends up locking the sleep queue while it examines that thread. In >>> particular, the thread that is doing a wakeup() can resume all of the >>> sleeping threads for a wait channel by holding the one SC_LOCK() for that >>> wait channel since that will be td_lock for all those threads. >>> >>> In general mutexes are only unlocked by the thread that locks them, >>> and the td_lock of the old thread is unlocked during sched_switch(). >>> However, the old thread has to grab td_lock of the new thread during >>> sched_switch() and then hand it off to the new thread when it resumes. >>> This is why sched_throw() and sched_switch() in ULE directly assign >>> 'mtx_lock' of the run queue lock before calling cpu_throw() or >>> cpu_switch(). That gives the effect that the new thread resumes while >>> holding the lock pinted to by its td_lock. ^^ typo.. fix before commit >>> >>>> Dheeraj >>>> >>>> >>>> On Thu, Sep 12, 2013 at 7:04 AM, Svatopluk Kraus >> wrote: >>>>> Think about td_lock like something what is lent by current thread >> owner. >>>>> If a thread is running, it's owned by scheduler and td_lock points >>>>> to scheduler lock. If a thread is sleeping, it's owned by sleeping >> queue >>>>> and td_lock points to sleep queue lock. If a thread is contested, it's >>>>> owned by turnstile queue and td_lock points to turnstile queue lock. >> And so >>>>> on. This way an owner can work with owned threads safely without giant >>>>> lock. The td_lock pointer is changed atomically, so it's safe. >>>>> >>>>> Svatopluk Kraus >>>>> >>>>> On Thu, Sep 12, 2013 at 12:48 PM, Dheeraj Kandula >> wrote: >>>>>> Thanks a lot Alfred for the clarification. So is the td_lock granular >> i.e. 
>>>>>> one separate lock for each thread but also used for protecting the >>>>>> scheduler variables or is it just one lock used by all threads and the >>>>>> scheduler as well. I will anyway go through the code that you >> suggested >>>>>> but >>>>>> just wanted to have a deeper understanding before I go about hunting >> in >>>>>> the >>>>>> code. >>>>>> >>>>>> Dheeraj >>>>>> >>>>>> >>>>>> On Thu, Sep 12, 2013 at 3:10 AM, Alfred Perlstein >> wrote: >>>>>>> On 9/11/13 2:39 PM, Dheeraj Kandula wrote: >>>>>>> >>>>>>>> Hey All, >>>>>>>> >>>>>>>> When the current thread is being context switched with a newly >> selected >>>>>>>> thread, why is the current thread's lock acquired before context >>>>>> switch – >>>>>>>> mi_switch() is invoked after thread_lock(td) is called. A thread at >> any >>>>>>>> time runs only on one of the cores of a CPU. Hence when it is being >>>>>>>> context >>>>>>>> switched it is added either to the real time runq or the timeshare >>>>>> runq or >>>>>>>> the idle runq with the lock still held or it is added to the sleep >>>>>> queue >>>>>>>> or >>>>>>>> the blocked queue. So this happens atomically even without the lock. >>>>>> Isn't >>>>>>>> it? Am I missing something here? I don't see any contention for the >>>>>> thread >>>>>>>> in order to demand a lock for the thread which will basically >> protect >>>>>> the >>>>>>>> contents of the thread structure for the thread. >>>>>>>> >>>>>>>> Dheeraj >>>>>>> The thread lock also happens to protect various scheduler variables: >>>>>>> >>>>>>> struct mtx *volatile td_lock; /* replaces sched lock */ >>>>>>> >>>>>>> see sys/kern/sched_ule.c on how the thread lock td_lock is changed >>>>>>> depending on what the thread is doing. 
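One way to picture the "lent lock" idea, as a hypothetical userland sketch: pthread mutexes play the run-queue lock and a sleep-queue chain lock, and sk_thread_lock_set() follows the pattern of the kernel's thread_lock_set() (caller holds both locks; repoint td_lock, then drop the old one). sk_sleep() and sk_wakeup() are invented simplifications loosely modeled on what the sleep-queue code does, with no retry loop and a fixed lock order (chain lock before run-queue lock):

```c
/*
 * Hypothetical sketch: td_lock points at the lock of whatever container
 * currently owns the thread.  pthread mutexes stand in for spin mutexes.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct sk_thread {
	_Atomic(pthread_mutex_t *) td_lock;	/* lock lent by current owner */
	const void *wchan;			/* wait channel, NULL if runnable */
};

/*
 * Modeled on thread_lock_set(): the caller holds both td's current lock
 * and 'new'.  Repoint td_lock first, then drop the old lock, so td is
 * never left unprotected.
 */
static void
sk_thread_lock_set(struct sk_thread *td, pthread_mutex_t *new)
{
	pthread_mutex_t *old = atomic_load(&td->td_lock);

	atomic_store(&td->td_lock, new);
	pthread_mutex_unlock(old);
}

/* Put a runnable td to sleep on 'chan'; lock order: chain, then run queue. */
static void
sk_sleep(struct sk_thread *td, const void *chan, pthread_mutex_t *sc_lock)
{
	pthread_mutex_lock(sc_lock);
	/* Simplified: assumes the run-queue td_lock does not move here. */
	pthread_mutex_lock(atomic_load(&td->td_lock));
	td->wchan = chan;
	sk_thread_lock_set(td, sc_lock);	/* drops the run-queue lock */
	pthread_mutex_unlock(sc_lock);
}

/*
 * Wake every thread sleeping on 'chan'.  One chain-lock acquisition
 * covers all of them because it is td_lock for each sleeper.
 */
static int
sk_wakeup(struct sk_thread *tds, size_t n, const void *chan,
    pthread_mutex_t *sc_lock, pthread_mutex_t *runq_lock)
{
	int woken = 0;

	pthread_mutex_lock(sc_lock);
	pthread_mutex_lock(runq_lock);	/* hold old and new lock while moving */
	for (size_t i = 0; i < n; i++) {
		if (tds[i].wchan != chan)
			continue;
		assert(atomic_load(&tds[i].td_lock) == sc_lock);
		tds[i].wchan = NULL;
		atomic_store(&tds[i].td_lock, runq_lock);
		woken++;
	}
	pthread_mutex_unlock(runq_lock);
	pthread_mutex_unlock(sc_lock);
	return (woken);
}
```

Because every sleeper on the channel shares one td_lock, the single chain-lock acquisition in sk_wakeup() protects all of them at once, which is the SC_LOCK() point made in this thread.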
>>>>>>> >>>>>>> -- >>>>>>> Alfred Perlstein >>>>>> _______________________________________________ >>>>>> freebsd-arch@freebsd.org mailing list >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>>>>> To unsubscribe, send any mail to " >> freebsd-arch-unsubscribe@freebsd.org" >>>> _______________________________________________ >>>> freebsd-arch@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >>> -- >>> John Baldwin >>> _______________________________________________ >>> freebsd-arch@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >>> > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > > > From owner-freebsd-arch@FreeBSD.ORG Fri Sep 13 13:56:02 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A81F9DBB; Fri, 13 Sep 2013 13:56:02 +0000 (UTC) (envelope-from dkandula@gmail.com) Received: from mail-wi0-x22a.google.com (mail-wi0-x22a.google.com [IPv6:2a00:1450:400c:c05::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DDB8A2467; Fri, 13 Sep 2013 13:56:01 +0000 (UTC) Received: by mail-wi0-f170.google.com with SMTP id cb5so1047049wib.3 for ; Fri, 13 Sep 2013 06:56:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; 
bh=V/6w+2A3kj0W8nsj+pRc88Zeqq3yqpepweCx6Lf1MLQ=; b=QX66MR3mFRM1BJeIwB/D+0fFrwSSsyDJpJeXYxnC/DZoQc4YN0AMLcOSAa+kMtHLfe pnPm8o7A5XFQueY+OjoP00pZgnFlSLpX2Kw3WZDfyWAPMbWRJ/fe8B2VvICFYeHphb/7 fXANvIHl9441C/KTY66C1XcSbBMGmoHEvUJWG/tXHJmKaC2arVo1i2DMs8vc/mfYldGQ LQOC86HckB9HEAwKppaTCBWPWsJ/3x4H6uCNikhxUgdiHifrgJ1StOZPtsL5DsFrFglF 8r6HkkUmMhYgwCyuXxaBOE2We9cTjIMDm7j3FP4KeKMJujHV+ayZNpNAjisU14tG1tju ShWw== MIME-Version: 1.0 X-Received: by 10.180.13.174 with SMTP id i14mr2659371wic.49.1379080560197; Fri, 13 Sep 2013 06:56:00 -0700 (PDT) Received: by 10.194.38.167 with HTTP; Fri, 13 Sep 2013 06:56:00 -0700 (PDT) In-Reply-To: <52329012.2050408@freebsd.org> References: <201309120824.52916.jhb@freebsd.org> <52329012.2050408@freebsd.org> Date: Fri, 13 Sep 2013 09:56:00 -0400 Message-ID: Subject: Re: Why do we need to acquire the current thread's lock before context switching? From: Dheeraj Kandula To: Julian Elischer Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Sep 2013 13:56:02 -0000 Please find below the updated diff with the typo fixed. # svn diff Index: sys/sys/proc.h =================================================================== --- sys/sys/proc.h (revision 255514) +++ sys/sys/proc.h (working copy) @@ -197,12 +197,44 @@ }; /* + * Comments by: Svatopluk Kraus & John Baldwin + * + * Svatopluk Kraus' comment: + * Think about td_lock like something what is lent by current thread owner. If + * a thread is running, it's owned by scheduler and td_lock points + * to scheduler lock.
If a thread is sleeping, it's owned by sleeping queue + * and td_lock points to sleep queue lock. If a thread is contested, it's + * owned by turnstile queue and td_lock points to turnstile queue lock. And so + * on. This way an owner can work with owned threads safely without giant + * lock. The td_lock pointer is changed atomically, so it's safe. + * + * John Baldwin's comment: + * For example: take a thread that is asleep on a sleep + * queue. td_lock points to the relevant SC_LOCK() for the sleep queue chain + * in that case, so any other thread that wants to examine that thread's + * state ends up locking the sleep queue while it examines that thread. In + * particular, the thread that is doing a wakeup() can resume all of the + * sleeping threads for a wait channel by holding the one SC_LOCK() for that + * wait channel since that will be td_lock for all those threads. + * + * In general mutexes are only unlocked by the thread that locks them, + * and the td_lock of the old thread is unlocked during sched_switch(). + * However, the old thread has to grab td_lock of the new thread during + * sched_switch() and then hand it off to the new thread when it resumes. + * This is why sched_throw() and sched_switch() in ULE directly assign + * 'mtx_lock' of the run queue lock before calling cpu_throw() or + * cpu_switch(). That gives the effect that the new thread resumes while + * holding the lock pointed to by its td_lock. + */ +/* * Kernel runnable context (thread). * This is what is put to sleep and reactivated. * Thread context. Processes may have multiple threads. */ struct thread { - struct mtx *volatile td_lock; /* replaces sched lock */ + struct mtx *volatile td_lock; /* replaces sched lock. Look at the comment + * above for further details. + */ struct proc *td_proc; /* (*) Associated process. */ TAILQ_ENTRY(thread) td_plist; /* (*) All threads in this proc. */ TAILQ_ENTRY(thread) td_runq; /* (t) Run queue.
*/ On Fri, Sep 13, 2013 at 12:09 AM, Julian Elischer wrote: > On 9/13/13 4:44 AM, Dheeraj Kandula wrote: > >> # svn diff >> Index: sys/sys/proc.h >> =================================================================== >> --- sys/sys/proc.h (revision 255488) >> +++ sys/sys/proc.h (working copy) >> @@ -197,12 +197,44 @@ >> }; >> >> /* >> + * Comments by: Svatopluk Kraus & John Baldwin >> + * >> + * Svatopluk Kraus' comment: >> + * Think about td_lock like something what is lent by current thread >> owner. If >> + * a thread is running, it's owned by scheduler and td_lock points >> + * to scheduler lock. If a thread is sleeping, it's owned by sleeping >> queue >> + * and td_lock points to sleep queue lock. If a thread is contested, it's >> + * owned by turnstile queue and td_lock points to turnstile queue lock. >> And so >> + * on. This way an owner can work with owned threads safely without giant >> + * lock. The td_lock pointer is changed atomically, so it's safe. >> + * >> + * John Baldwin's comment: >> + * For example: take a thread that is asleep on a sleep >> + * queue. td_lock points to the relevant SC_LOCK() for the sleep queue >> chain >> + * in that case, so any other thread that wants to examine that thread's >> + * state ends up locking the sleep queue while it examines that thread. >> In >> + * particular, the thread that is doing a wakeup() can resume all of the >> + * sleeping threads for a wait channel by holding the one SC_LOCK() for >> that >> + * wait channel since that will be td_lock for all those threads. >> + * >> + * In general mutexes are only unlocked by the thread that locks them, >> + * and the td_lock of the old thread is unlocked during sched_switch().
>> + * However, the old thread has to grab td_lock of the new thread during >> + * sched_switch() and then hand it off to the new thread when it resumes. >> + * This is why sched_throw() and sched_switch() in ULE directly assign >> + * 'mtx_lock' of the run queue lock before calling cpu_throw() or >> + * cpu_switch(). That gives the effect that the new thread resumes while >> + * holding the lock pinted to by its td_lock. >> + */ >> +/* >> * Kernel runnable context (thread). >> * This is what is put to sleep and reactivated. >> * Thread context. Processes may have multiple threads. >> */ >> struct thread { >> - struct mtx *volatile td_lock; /* replaces sched lock */ >> + struct mtx *volatile td_lock; /* replaces sched lock. Look at the >> comment >> + * above for further details. >> + */ >> struct proc *td_proc; /* (*) Associated process. */ >> TAILQ_ENTRY(thread) td_plist; /* (*) All threads in this proc. */ >> TAILQ_ENTRY(thread) td_runq; /* (t) Run queue. */ >> >> >> >> On Thu, Sep 12, 2013 at 4:21 PM, Alfred Perlstein wrote: >> >> Both these explanations are so great. Is there any way we can add this to >>> proc.h or maybe document somewhere and then link to it from proc.h? >>> >>> Sent from my iPhone >>> >>> On Sep 12, 2013, at 5:24 AM, John Baldwin wrote: >>> >>> On Thursday, September 12, 2013 7:16:20 am Dheeraj Kandula wrote: >>>> >>>>> Thanks a lot Svatopluk for the clarification. Right after I replied to >>>>> Alfred's mail, I realized that it can't be thread specific lock as it >>>>> should also protect the scheduler variables. So if I understand it >>>>> >>>> right, >>> >>>> even though it is a mutex, it can be unlocked by another thread which is >>>>> usually not the case with regular mutexes as the thread that locks it >>>>> >>>> must >>> >>>> unlock it unlike a binary semaphore. Isn't it? >>>>> >>>> It's less complicated than that.
:) It is a mutex, but to expand on >>>> what >>>> Svatopluk said with an example: take a thread that is asleep on a sleep >>>> queue. td_lock points to the relevant SC_LOCK() for the sleep queue >>>> >>> chain >>> >>>> in that case, so any other thread that wants to examine that thread's >>>> state ends up locking the sleep queue while it examines that thread. In >>>> particular, the thread that is doing a wakeup() can resume all of the >>>> sleeping threads for a wait channel by holding the one SC_LOCK() for >>>> that >>>> wait channel since that will be td_lock for all those threads. >>>> >>>> In general mutexes are only unlocked by the thread that locks them, >>>> and the td_lock of the old thread is unlocked during sched_switch(). >>>> However, the old thread has to grab td_lock of the new thread during >>>> sched_switch() and then hand it off to the new thread when it resumes. >>>> This is why sched_throw() and sched_switch() in ULE directly assign >>>> 'mtx_lock' of the run queue lock before calling cpu_throw() or >>>> cpu_switch(). That gives the effect that the new thread resumes while >>>> holding the lock pinted to by its td_lock. >>>> >>> ^^ typo.. fix before commit > > >>>> Dheeraj >>>>> >>>>> >>>>> On Thu, Sep 12, 2013 at 7:04 AM, Svatopluk Kraus >>>>> >>>> wrote: >>> >>>> Think about td_lock like something what is lent by current thread >>>>>> >>>>> owner. >>> >>>> If a thread is running, it's owned by scheduler and td_lock points >>>>>> to scheduler lock. If a thread is sleeping, it's owned by sleeping >>>>>> >>>>> queue >>> >>>> and td_lock points to sleep queue lock. If a thread is contested, it's >>>>>> owned by turnstile queue and td_lock points to turnstile queue lock. >>>>>> >>>>> And so >>> >>>> on. This way an owner can work with owned threads safely without giant >>>>>> lock. The td_lock pointer is changed atomically, so it's safe.
>>>>>> >>>>>> Svatopluk Kraus >>>>>> >>>>>> On Thu, Sep 12, 2013 at 12:48 PM, Dheeraj Kandula >>>>> >>>>> wrote: >>>> >>>>> Thanks a lot Alfred for the clarification. So is the td_lock granular >>>>>>> >>>>>> i.e. >>> >>>> one separate lock for each thread but also used for protecting the >>>>>>> scheduler variables or is it just one lock used by all threads and >>>>>>> the >>>>>>> scheduler as well. I will anyway go through the code that you >>>>>>> >>>>>> suggested >>> >>>> but >>>>>>> just wanted to have a deeper understanding before I go about hunting >>>>>>> >>>>>> in >>> >>>> the >>>>>>> code. >>>>>>> >>>>>>> Dheeraj >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 12, 2013 at 3:10 AM, Alfred Perlstein >>>>>>> >>>>>> wrote: >>> >>>> On 9/11/13 2:39 PM, Dheeraj Kandula wrote: >>>>>>>> >>>>>>>> Hey All, >>>>>>>>> >>>>>>>>> When the current thread is being context switched with a newly >>>>>>>>> >>>>>>>> selected >>> >>>> thread, why is the current thread's lock acquired before context >>>>>>>>> >>>>>>>> switch – >>>>>>> >>>>>>>> mi_switch() is invoked after thread_lock(td) is called. A thread at >>>>>>>>> >>>>>>>> any >>> >>>> time runs only on one of the cores of a CPU. Hence when it is being >>>>>>>>> context >>>>>>>>> switched it is added either to the real time runq or the timeshare >>>>>>>>> >>>>>>>> runq or >>>>>>> >>>>>>>> the idle runq with the lock still held or it is added to the sleep >>>>>>>>> >>>>>>>> queue >>>>>>> >>>>>>>> or >>>>>>>>> the blocked queue. So this happens atomically even without the >>>>>>>>> lock. >>>>>>>>> >>>>>>>> Isn't >>>>>>> >>>>>>>> it? Am I missing something here? I don't see any contention for the >>>>>>>>> >>>>>>>> thread >>>>>>> >>>>>>>> in order to demand a lock for the thread which will basically >>>>>>>>> >>>>>>>> protect >>> >>>> the >>>>>>> >>>>>>>> contents of the thread structure for the thread.
>>>>>>>>> >>>>>>>>> Dheeraj >>>>>>>>> >>>>>>>> The thread lock also happens to protect various scheduler variables: >>>>>>>> >>>>>>>> struct mtx *volatile td_lock; /* replaces sched lock */ >>>>>>>> >>>>>>>> see sys/kern/sched_ule.c on how the thread lock td_lock is changed >>>>>>>> depending on what the thread is doing. >>>>>>>> >>>>>>>> -- >>>>>>>> Alfred Perlstein >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> freebsd-arch@freebsd.org mailing list >>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>>>>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >>> >>>> _______________________________________________ >>>>> freebsd-arch@freebsd.org mailing list >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >>>>> >>>> -- >>>> John Baldwin >>>> _______________________________________________ >>>> freebsd-arch@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >>>> >>>> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >> >> >> >> > From owner-freebsd-arch@FreeBSD.ORG Fri Sep 13 15:08:25 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id E57EE4D9; Fri, 13 Sep 2013 15:08:24 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate
requested) by mx1.freebsd.org (Postfix) with ESMTPS id ADAE52829; Fri, 13 Sep 2013 15:08:24 +0000 (UTC) Received: from [209.249.190.124] (port=63489 helo=gnnmac.hudson-trading.com) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80.1) (envelope-from ) id 1VKUys-0006KF-N7; Fri, 13 Sep 2013 11:08:23 -0400 Content-Type: multipart/signed; boundary="Apple-Mail=_C7AE7CBE-E315-44DA-B15B-4A00DFC704F3"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Network stack changes From: George Neville-Neil In-Reply-To: Date: Fri, 13 Sep 2013 11:08:27 -0400 Message-Id: <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com> References: <521E41CB.30700@yandex-team.ru> To: Adrian Chadd X-Mailer: Apple Mail (2.1508) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com X-Get-Message-Sender-Via: vps.hungerhost.com: authenticated_id: gnn@neville-neil.com Cc: "Alexander V. Chernikov" , Luigi Rizzo , Andre Oppermann , "freebsd-hackers@freebsd.org" , "freebsd-arch@freebsd.org" , "Andrey V. Elsukov" , Gleb Smirnoff , FreeBSD Net X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Sep 2013 15:08:25 -0000 --Apple-Mail=_C7AE7CBE-E315-44DA-B15B-4A00DFC704F3 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On Aug 29, 2013, at 7:49 , Adrian Chadd wrote: > Hi, > > There's a lot of good stuff to review here, thanks! > > Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to keep > locking things like that on a per-packet basis.
We should be able to do > this in a cleaner way - we can defer RX into a CPU pinned taskqueue and > convert the interrupt handler to a fast handler that just schedules that > taskqueue. We can ignore the ithread entirely here. > > What do you think? > > Totally pie in the sky handwaving at this point: > > * create an array of mbuf pointers for completed mbufs; > * populate the mbuf array; > * pass the array up to ether_demux(). > > For vlan handling, it may end up populating its own list of mbufs to push > up to ether_demux(). So maybe we should extend the API to have a bitmap of > packets to actually handle from the array, so we can pass up a larger array > of mbufs, note which ones are for the destination and then the upcall can > mark which frames it has consumed. > > I specifically wonder how much work/benefit we may see by doing: > > * batching packets into lists so various steps can batch process things > rather than run to completion; > * batching the processing of a list of frames under a single lock instance > - eg, if the forwarding code could do the forwarding lookup for 'n' packets > under a single lock, then pass that list of frames up to inet_pfil_hook() > to do the work under one lock, etc, etc. > > Here, the processing would look less like "grab lock and process to > completion" and more like "mark and sweep" - ie, we have a list of frames > that we mark as needing processing and mark as having been processed at > each layer, so we know where to next dispatch them. > One quick note here. Every time you increase batching you may increase bandwidth but you will also increase per packet latency for the last packet in a batch. That is fine so long as we remember that and that this is a tuning knob to balance the two.
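To make the trade-off concrete, a back-of-the-envelope model under invented parameters: a batch of n packets pays a fixed cost L once (lock acquisition, lookup setup and the like) plus c per packet. The average per-packet cost L/n + c falls as n grows, while the time until the last packet of the batch is done, L + n*c, grows linearly. The function names are hypothetical and the numbers illustrative, not measurements:

```c
/*
 * Toy cost model for batching (hypothetical parameters):
 *   lock_ns    - fixed cost paid once per batch
 *   per_pkt_ns - cost paid for each packet in the batch
 */
#include <assert.h>

/* Average CPU cost per packet: the fixed cost is amortized over n. */
static double
sk_cost_per_pkt(double lock_ns, double per_pkt_ns, unsigned n)
{
	return (lock_ns / n + per_pkt_ns);
}

/* Completion time of the last packet in the batch: linear in n. */
static double
sk_last_pkt_latency(double lock_ns, double per_pkt_ns, unsigned n)
{
	return (lock_ns + n * per_pkt_ns);
}
```

With L = 100 ns and c = 10 ns, going from n = 1 to n = 10 cuts the per-packet cost from 110 ns to 20 ns but raises the last packet's completion time from 110 ns to 200 ns, before counting any time spent waiting for the batch to fill.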
> I still have some tool coding to do with PMC before I even think about > tinkering with this as I'd like to measure stuff like per-packet latency as > well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P / > lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.) > This would be very useful in identifying the actual hot spots, and would be helpful to anyone who can generate a decent stream of packets with, say, an IXIA. Best, George --Apple-Mail=_C7AE7CBE-E315-44DA-B15B-4A00DFC704F3 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - http://gpgtools.org iEYEARECAAYFAlIzKmsACgkQYdh2wUQKM9Lk2QCeLeRhFPb5zHPhQ4hHJ+H/JXWv OR0AoMDJ9iHjwtGg4DblcC0ZSmxt/noE =gAUE -----END PGP SIGNATURE----- --Apple-Mail=_C7AE7CBE-E315-44DA-B15B-4A00DFC704F3-- From owner-freebsd-arch@FreeBSD.ORG Fri Sep 13 22:44:58 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id BA466CF7; Fri, 13 Sep 2013 22:44:58 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id F0B502CB0; Fri, 13 Sep 2013 22:44:57 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAMaUM1KDaFve/2dsb2JhbABbhBGDKr1RgTN0giUBAQQBDhVCFBsYAgINGQJZBhOHfQanYJFpgSmOFDQHgmmBNQOpboNAIIFu X-IronPort-AV: E=Sophos;i="4.90,901,1371096000"; d="scan'208";a="51784977" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 13 Sep 2013 18:43:23 -0400 Received: from
zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 11462B3F45; Fri, 13 Sep 2013 18:43:23 -0400 (EDT) Date: Fri, 13 Sep 2013 18:43:23 -0400 (EDT) From: Rick Macklem To: George Neville-Neil Message-ID: <221093226.23439826.1379112203059.JavaMail.root@uoguelph.ca> In-Reply-To: <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com> Subject: Re: Network stack changes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) X-Mailman-Approved-At: Fri, 13 Sep 2013 23:03:21 +0000 Cc: "Alexander V. Chernikov" , Luigi Rizzo , Andre Oppermann , freebsd-hackers@freebsd.org, FreeBSD Net , Adrian Chadd , "Andrey V. Elsukov" , Gleb Smirnoff , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Sep 2013 22:44:58 -0000 George Neville-Neil wrote: > > On Aug 29, 2013, at 7:49 , Adrian Chadd wrote: > > > Hi, > > > > There's a lot of good stuff to review here, thanks! > > > > Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless > > to keep > > locking things like that on a per-packet basis. We should be able > > to do > > this in a cleaner way - we can defer RX into a CPU pinned taskqueue > > and > > convert the interrupt handler to a fast handler that just schedules > > that > > taskqueue. We can ignore the ithread entirely here. > > > > What do you think? > > > > Totally pie in the sky handwaving at this point: > > > > * create an array of mbuf pointers for completed mbufs; > > * populate the mbuf array; > > * pass the array up to ether_demux(). > > > > For vlan handling, it may end up populating its own list of mbufs > > to push > > up to ether_demux(). 
So maybe we should extend the API to have a > > bitmap of > > packets to actually handle from the array, so we can pass up a > > larger array > > of mbufs, note which ones are for the destination and then the > > upcall can > > mark which frames its consumed. > > > > I specifically wonder how much work/benefit we may see by doing: > > > > * batching packets into lists so various steps can batch process > > things > > rather than run to completion; > > * batching the processing of a list of frames under a single lock > > instance > > - eg, if the forwarding code could do the forwarding lookup for 'n' > > packets > > under a single lock, then pass that list of frames up to > > inet_pfil_hook() > > to do the work under one lock, etc, etc. > > > > Here, the processing would look less like "grab lock and process to > > completion" and more like "mark and sweep" - ie, we have a list of > > frames > > that we mark as needing processing and mark as having been > > processed at > > each layer, so we know where to next dispatch them. > > > > One quick note here. Every time you increase batching you may > increase bandwidth > but you will also increase per packet latency for the last packet in > a batch. > That is fine so long as we remember that and that this is a tuning > knob > to balance the two. > And any time you increase latency, that will have a negative impact on NFS performance. NFS RPCs are usually small messages (except Write requests and Read replies) and the RTT for these (mostly small, bidirectional) messages can have a significant impact on NFS perf. rick > > I still have some tool coding to do with PMC before I even think > > about > > tinkering with this as I'd like to measure stuff like per-packet > > latency as > > well as top-level processing overhead (ie, > > CPU_CLK_UNHALTED.THREAD_P / > > lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, > > etc.) 
> > > > This would be very useful in identifying the actual hot spots, and > would be helpful > to anyone who can generate a decent stream of packets with, say, an > IXIA. > > Best, > George

From owner-freebsd-arch@FreeBSD.ORG Fri Sep 13 23:50:56 2013
Date: Fri, 13 Sep 2013 19:50:54 -0400
From: "Sam Fourman Jr."
To: Rick Macklem
Cc: "Alexander V. Chernikov", Luigi Rizzo, Andre Oppermann, freebsd-hackers@freebsd.org, George Neville-Neil, freebsd-arch@freebsd.org, Adrian Chadd, "Andrey V. Elsukov", Gleb Smirnoff, FreeBSD Net
Subject: Re: Network stack changes
In-Reply-To: <221093226.23439826.1379112203059.JavaMail.root@uoguelph.ca>
References: <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com> <221093226.23439826.1379112203059.JavaMail.root@uoguelph.ca>

> > And any time you increase latency, that will have a negative impact on > NFS performance. NFS RPCs are usually small messages (except Write requests > and Read replies) and the RTT for these (mostly small, bidirectional) > messages can have a significant impact on NFS perf. > > rick >

This may be a bit off topic, but not by much... With all of the new TCP congestion control algorithms (http://freebsdfoundation.blogspot.com/2011/03/summary-of-five-new-tcp-congestion.html), I have wondered which algorithm is best suited for NFS over gigabit Ethernet, say FreeBSD to FreeBSD. Furthermore, would an NFS-optimized TCP algorithm be useful?

Sam Fourman Jr.
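Rick's point about RTT (quoted above) can be put in numbers with Little's law: if a client keeps a fixed number of RPCs in flight, it can complete at most outstanding / RTT of them per second, so any delay added to every round trip lowers that ceiling directly. A minimal sketch; the function names and the figures in the note after it are illustrative assumptions, not NFS measurements.

```c
#include <assert.h>

/*
 * Upper bound on RPCs completed per second when `outstanding` requests
 * are kept in flight and each round trip takes `rtt_us` microseconds
 * (Little's law: throughput = concurrency / latency).
 */
static double
max_rpc_rate(int outstanding, double rtt_us)
{
	assert(outstanding > 0 && rtt_us > 0.0);
	return outstanding * 1e6 / rtt_us;
}

/* Same bound after every round trip picks up `added_us` of extra delay. */
static double
rpc_rate_with_delay(int outstanding, double rtt_us, double added_us)
{
	assert(added_us >= 0.0);
	return max_rpc_rate(outstanding, rtt_us + added_us);
}
```

For example, one outstanding RPC over a 100 us round trip is capped at 10,000 ops/s; if batching adds 50 us to each round trip, the cap falls to roughly 6,667 ops/s. Bulk transfers with many bytes in flight hide such delays much better than small, synchronous RPCs do.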
From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 01:39:10 2013
Date: Fri, 13 Sep 2013 21:38:50 -0400 (EDT)
From: Rick Macklem
To: "Sam Fourman Jr."
Cc: "Alexander V. Chernikov", Luigi Rizzo, Andre Oppermann, freebsd-hackers@freebsd.org, George Neville-Neil, freebsd-arch@freebsd.org, Adrian Chadd, "Andrey V. Elsukov", Gleb Smirnoff, FreeBSD Net
Subject: Re: Network stack changes
Message-ID: <1087948919.23486338.1379122730412.JavaMail.root@uoguelph.ca>

Sam Fourman Jr. wrote: > > > > > And any time you increase latency, that will have a negative impact > > on > > NFS performance. NFS RPCs are usually small messages (except Write > > requests > > and Read replies) and the RTT for these (mostly small, > > bidirectional) > > messages can have a significant impact on NFS perf. > > > > rick > > > > > this may be a bit off topic but not much... I have wondered with all > of the > new > tcp algorithms > http://freebsdfoundation.blogspot.com/2011/03/summary-of-five-new-tcp-congestion.html > > what algorithm is best suited for NFS over gigabit Ethernet, say > FreeBSD to > FreeBSD. > and further more would a NFS optimized tcp algorithm be useful? >

I have no idea what effect they might have. NFS traffic is quite different than streaming or bulk data transfer. I think this might make a nice research project for someone.

rick

> Sam Fourman Jr.
From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 08:59:14 2013
Date: Sat, 14 Sep 2013 08:44:00 +0000
From: Anuranjan Shukla
To: "freebsd-net@freebsd.org"
Cc: Marcel Moolenaar, "freebsd-arch@freebsd.org"
Subject: IFNAMSIZ/IF_NAMESIZE change proposal
Message-ID: <9527D72E-5871-4C5E-B2AB-A3BECA4925D4@juniper.net>

Hi,

At Juniper Networks, interface names needed to be longer than FreeBSD allows. We're trying to reduce our local changes to FreeBSD to make upgrading to newer FreeBSD releases easier, and to support the modularization of the network stack we proposed earlier. I'm sending this out to propose changing IFNAMSIZ from 16 to 60 (the size we use) in FreeBSD. We don't see any downside to doing this (other than increasing the size of the ifreq structure, for one), and allowing longer interface names can be handy for vendors. I'd like to hear if there's a strong objection to this. If not, we'd like to get this into the FreeBSD codebase. Any thoughts/objections are highly appreciated.
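IFNAMSIZ is visible outside the kernel: userland sizes its own buffers with it or with its POSIX alias IF_NAMESIZE from <net/if.h>, and that size counts the terminating NUL. A small sketch using the standard if_nameindex(3) call to see how much of that limit the interfaces on a given host actually use; the helper name `longest_ifname` is ours, not part of any API.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>	/* some older BSD <net/if.h> versions need these first */
#include <net/if.h>	/* if_nameindex(3), if_freenameindex(3), IF_NAMESIZE */

/*
 * Length of the longest interface name on this host, or -1 if the list
 * is unavailable. Every name must be strictly shorter than IF_NAMESIZE,
 * since the buffer also holds the NUL; code that copies names into
 * char[IF_NAMESIZE] buffers bakes that limit into the binary.
 */
static int
longest_ifname(void)
{
	struct if_nameindex *list, *p;
	int longest = 0;

	list = if_nameindex();
	if (list == NULL)
		return (-1);
	/* The array ends with an entry whose if_index is 0. */
	for (p = list; p->if_index != 0; p++) {
		int len = (int)strlen(p->if_name);

		assert(len < IF_NAMESIZE);
		if (len > longest)
			longest = len;
	}
	if_freenameindex(list);
	return (longest);
}
```

With IF_NAMESIZE at 16, names are limited to 15 characters. Binaries compiled against that value keep 16-byte buffers, which is part of what a change to the constant has to be weighed against.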
Thanks,
Anu

From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 03:20:11 2013
Date: Fri, 13 Sep 2013 20:20:08 -0700
From: Adrian Chadd
To: Rick Macklem
Cc: "Alexander V. Chernikov", Luigi Rizzo, Andre Oppermann, "freebsd-hackers@freebsd.org", George Neville-Neil, FreeBSD Net, "Andrey V. Elsukov", Gleb Smirnoff, "freebsd-arch@freebsd.org"
Subject: Re: Network stack changes
In-Reply-To: <221093226.23439826.1379112203059.JavaMail.root@uoguelph.ca>
References: <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com> <221093226.23439826.1379112203059.JavaMail.root@uoguelph.ca>

On 13 September 2013 15:43, Rick Macklem wrote: > And any time you increase latency, that will have a negative impact on > NFS performance. NFS RPCs are usually small messages (except Write requests > and Read replies) and the RTT for these (mostly small, bidirectional) > messages can have a significant impact on NFS perf. >

Hi, going to main memory several times each time we process a frame is expensive. If batching gives us better behaviour through more efficient cache usage, it may not actually _have_ a delay. But that requires a whole lot of design stars to align. And I'm still knee-deep elsewhere, so I haven't really finished getting up to speed with what everyone else has done / said about it.
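Adrian's argument can be illustrated with a toy cost model in which every number is hypothetical: n frames are already queued, each frame costs c units of work, and every pass through a stage pays a fixed overhead f (cold cache lines, a lock round trip). Run-to-completion pays f once per frame, while batching pays it once per batch. The function names are invented for this sketch.

```c
#include <assert.h>

/*
 * Toy cost model in arbitrary time units: `f` is a fixed per-pass
 * overhead, `c` the per-frame work, frames are numbered 1..n and are
 * all already queued at time 0. Each function returns a completion time.
 */

/* Run to completion: every frame pays the fixed overhead itself. */
static double
done_run_to_completion(int i, double f, double c)
{
	assert(i >= 1);
	return i * (f + c);
}

/* Batched, each frame handed on as soon as it has been processed. */
static double
done_batched_streaming(int i, double f, double c)
{
	assert(i >= 1);
	return f + i * c;
}

/* Batched, results released only after the whole batch of n is done. */
static double
done_batched_at_end(int n, double f, double c)
{
	assert(n >= 1);
	return f + n * c;
}
```

With f = 3, c = 1 and a batch of 8, the last frame finishes at 32 run-to-completion versus 11 batched. If the batch's results are only released at the end, though, the first frame also finishes at 11 instead of 4. The latency cost raised earlier in the thread is therefore real, and which frames pay it depends on when results are handed on.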
-adrian

From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 16:44:15 2013
Date: Sat, 14 Sep 2013 10:21:25 -0600
From: Warner Losh
To: Anuranjan Shukla
Cc: "freebsd-net@freebsd.org", Marcel Moolenaar, "freebsd-arch@freebsd.org"
Subject: Re: IFNAMSIZ/IF_NAMESIZE change proposal
Message-Id: <19C0CA7F-2857-4533-B5E7-29E1085DE072@bsdimp.com>
In-Reply-To: <9527D72E-5871-4C5E-B2AB-A3BECA4925D4@juniper.net>
References: <9527D72E-5871-4C5E-B2AB-A3BECA4925D4@juniper.net>

On Sep 14, 2013, at 2:44 AM, Anuranjan Shukla wrote: > At Juniper Networks, interface name size was needed to be longer than what FreeBSD has. We're trying to reduce our local changes to FreeBSD to allow us an easier time upgrading to newer FreeBSD releases, and support the modularization of the network stack we'd proposed earlier. I'm sending this out to propose changing IFNAMSIZ from 16 to 60 (this is the size we use) in FreeBSD. We don't see any downside (other than increasing the ifreq structure size for one) to doing this, as allowing longer interface names can be handy for vendors. I'd like to hear if there's a strong objection to this. If not, we'd like to get this into to the FreeBSD codebase. Any thoughts/objections highly appreciated.

56 or 64 would be better for alignment, wouldn't it?
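Warner's alignment question in concrete terms: if the name array is followed by a member that needs 8-byte alignment (on LP64, a pointer such as the `ifru_data` member in ifreq's union), a 60-byte name leaves 4 bytes of padding, whereas 56 and 64 need none. The structs below are simplified stand-ins, not the real `struct ifreq`, and the member is forced to 8-byte alignment with C11 `_Alignas` so the result does not depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_ALIGN 8	/* assumed alignment of the member after the name */

/* Simplified stand-ins: a name array followed by an aligned member. */
struct req56 { char name[56]; _Alignas(DATA_ALIGN) uint64_t data; };
struct req60 { char name[60]; _Alignas(DATA_ALIGN) uint64_t data; };
struct req64 { char name[64]; _Alignas(DATA_ALIGN) uint64_t data; };

/* Padding needed after a name of `len` bytes: round up to DATA_ALIGN. */
static size_t
pad_after_name(size_t len)
{
	assert(len > 0);
	return ((len + DATA_ALIGN - 1) / DATA_ALIGN) * DATA_ALIGN - len;
}

/* The compiler's layout agrees with the arithmetic above. */
_Static_assert(offsetof(struct req56, data) == 56, "56 is already aligned");
_Static_assert(offsetof(struct req60, data) == 64, "60 is padded to 64");
_Static_assert(offsetof(struct req64, data) == 64, "64 is already aligned");
```

On such a layout a 60-byte name occupies exactly as much space as a 64-byte one while offering four fewer usable bytes.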
Warner

From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 17:30:58 2013
Date: Sat, 14 Sep 2013 17:30:45 +0000
From: Marcel Moolenaar
To: Warner Losh, Anuranjan Shukla
Cc: "freebsd-net@freebsd.org", "freebsd-arch@freebsd.org"
Subject: Re: IFNAMSIZ/IF_NAMESIZE change proposal
In-Reply-To: <19C0CA7F-2857-4533-B5E7-29E1085DE072@bsdimp.com>

On 9/14/13 9:21 AM, "Warner Losh" wrote: > >On Sep 14, 2013, at 2:44 AM, Anuranjan Shukla wrote: >> At Juniper Networks, interface name size was needed to be longer than >>what FreeBSD has. We're trying to reduce our local changes to FreeBSD to >>allow us an easier time upgrading to newer FreeBSD releases, and support >>the modularization of the network stack we'd proposed earlier. I'm >>sending this out to propose changing IFNAMSIZ from 16 to 60 (this is >>the size we use) in FreeBSD. We don't see any downside (other than >>increasing the ifreq structure size for one) to doing this, as allowing >>longer interface names can be handy for vendors. I'd like to hear if >>there's a strong objection to this. If not, we'd like to get this into >>to the FreeBSD codebase. Any thoughts/objections highly appreciated. > >56 or 64 would be better for alignment, wouldn't it?

Yes, but then we need to change Junos' definition to match FreeBSD's and we're not sure yet if that's at all possible. Hence the suggestion to use what we have at Juniper. If a "nicer" length is preferred, then we'll see about making that happen.
Thoughts?

--
Marcel Moolenaar
marcelm@juniper.net

From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 14:22:44 2013
Date: Sat, 14 Sep 2013 16:28:02 +0200
From: Luigi Rizzo
To: George Neville-Neil
Cc: "Alexander V. Chernikov", Adrian Chadd, Andre Oppermann, "freebsd-hackers@freebsd.org", FreeBSD Net, Luigi Rizzo, "Andrey V. Elsukov", Gleb Smirnoff, "freebsd-arch@freebsd.org"
Subject: Re: Network stack changes
Message-ID: <20130914142802.GC71010@onelab2.iet.unipi.it>
References: <521E41CB.30700@yandex-team.ru> <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com>
In-Reply-To: <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com>

On Fri, Sep 13, 2013 at 11:08:27AM -0400, George Neville-Neil wrote: > > On Aug 29, 2013, at 7:49 , Adrian Chadd wrote: ... > > I still have some tool coding to do with PMC before I even think about > > tinkering with this as I'd like to measure stuff like per-packet latency as > > well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P / > > lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.) > > > > This would be very useful in identifying the actual hot spots, and would be helpful > to anyone who can generate a decent stream of packets with, say, an IXIA.

IXIA? For the timescales we need to address, we don't need an IXIA; a netmap sender is more than enough.

cheers
luigi

From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 14:29:23 2013
Date: Sat, 14 Sep 2013 16:25:26 +0200
From: Luigi Rizzo
To: George Neville-Neil
Cc: "Alexander V. Chernikov", Adrian Chadd, Andre Oppermann, "freebsd-hackers@freebsd.org", "freebsd-arch@freebsd.org", Luigi Rizzo, "Andrey V. Elsukov", Gleb Smirnoff, FreeBSD Net
Subject: Re: Network stack changes
Message-ID: <20130914142526.GB71010@onelab2.iet.unipi.it>
References: <521E41CB.30700@yandex-team.ru> <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com>
In-Reply-To: <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com>

On Fri, Sep 13, 2013 at 11:08:27AM -0400, George Neville-Neil wrote: > > On Aug 29, 2013, at 7:49 , Adrian Chadd wrote: ... > One quick note here. Every time you increase batching you may increase bandwidth > but you will also increase per packet latency for the last packet in a batch.

The ones who suffer are the first ones, because their processing is somewhat delayed to 1) let the input batch build up, and 2) complete processing of the batch before pushing results to the next stage. However, one should never wait for an input batch to grow; you process whatever your source gives you (one or more packets) by the time you are ready (and if you are slow/overloaded, of course you will get a large backlog at once). Either way, there is no reason to create additional delay on input.
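Luigi's rule (never wait for the input batch to grow; take whatever the source has by the time you are ready) can be sketched as a consumer that drains up to a burst limit from a ring and returns immediately with whatever it found, even a single packet. The ring layout, `RING_SIZE` and `MAX_BURST` are hypothetical, and this single-threaded mock omits the memory barriers or locking a real producer/consumer ring would need.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 256	/* hypothetical ring size, power of two */
#define MAX_BURST 32	/* hypothetical cap on one batch */

/* Single-threaded mock of an RX ring: head = next to consume, tail = next free. */
struct ring {
	int slots[RING_SIZE];
	unsigned head, tail;
};

static unsigned
ring_avail(const struct ring *r)
{
	return r->tail - r->head;	/* unsigned wraparound is well defined */
}

static void
ring_push(struct ring *r, int v)
{
	assert(ring_avail(r) < RING_SIZE);
	r->slots[r->tail++ % RING_SIZE] = v;
}

/*
 * Take whatever is queued right now, up to MAX_BURST, without waiting for
 * more to arrive. Returns the batch size (0 if the ring is empty).
 */
static size_t
ring_drain(struct ring *r, int *out)
{
	size_t n = ring_avail(r);

	if (n > MAX_BURST)
		n = MAX_BURST;
	for (size_t i = 0; i < n; i++)
		out[i] = r->slots[r->head++ % RING_SIZE];
	return n;
}
```

Under light load each call returns one or a few packets, so nothing is held back to fill a batch; under overload the backlog arrives in full bursts of MAX_BURST, which is where batching pays off.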
cheers luigi From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 18:50:08 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id B36D1F45; Sat, 14 Sep 2013 18:50:08 +0000 (UTC) (envelope-from cochard@gmail.com) Received: from mail-ve0-x231.google.com (mail-ve0-x231.google.com [IPv6:2607:f8b0:400c:c01::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E08292476; Sat, 14 Sep 2013 18:50:07 +0000 (UTC) Received: by mail-ve0-f177.google.com with SMTP id db12so1900995veb.22 for ; Sat, 14 Sep 2013 11:50:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=/cuWua3KDkh6N+MoMwXcUoLKAbOeUwYx5AACQMHPtbc=; b=O1UlmyHnbTj+sgU6LCj9npvH4o6X88SBN3g84lWL/bVNWoLgp0H9Fv/TNJC0W+7M+R nEWGG/cFPTqz/WmR6+4t4pPZiEieyii4a8v3QixD/nIi42yANuRCy+jJjOQtJo77MBk3 tEfzncV5pH6zoBQmWpYQ+/UxZhJC4rcvku6++QPybEZZLnQAuddvfOoo3qFrEy4Bm+vo vLYCyTwqwd3SgENVZFUjENeCEk+AkPjypH4FmcXs6lKeVQTP8vaCE5FXbhI5Ifb1Pwc2 YzCv1t1m+KVkgRWn3MBK0Xcco/y2Evhg0/Fv5StOGd5NkzCH/j0GA858i7AQ9R7DywfI Dn/A== X-Received: by 10.220.13.20 with SMTP id z20mr18354266vcz.0.1379184607005; Sat, 14 Sep 2013 11:50:07 -0700 (PDT) MIME-Version: 1.0 Sender: cochard@gmail.com Received: by 10.58.221.9 with HTTP; Sat, 14 Sep 2013 11:49:46 -0700 (PDT) In-Reply-To: <20130914142802.GC71010@onelab2.iet.unipi.it> References: <521E41CB.30700@yandex-team.ru> <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com> <20130914142802.GC71010@onelab2.iet.unipi.it> From: Olivier Cochard-Labbé Date: Sat, 14 Sep 2013 20:49:46 +0200 X-Google-Sender-Auth: jGoYAycMs8RBkfo6ZVCAv397HvM Message-ID: Subject: Re: Network stack changes
To: Luigi Rizzo Content-Type: text/plain; charset=ISO-8859-1 X-Mailman-Approved-At: Sat, 14 Sep 2013 19:58:40 +0000 Cc: "Alexander V. Chernikov" , Adrian Chadd , Andre Oppermann , "freebsd-hackers@freebsd.org" , George Neville-Neil , "freebsd-arch@freebsd.org" , Luigi Rizzo , "Andrey V. Elsukov" , Gleb Smirnoff , FreeBSD Net X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Sep 2013 18:50:08 -0000 On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo wrote: > > IXIA ? For the timescales we need to address we don't need an IXIA, > a netmap sender is more than enough > The great netmap generates only one IP flow (same src/dst IP and same src/dst port). This doesn't allow testing a multi-queue NIC (or an SMP packet filter) in a simple lab like this: netmap sender => FreeBSD router => netmap receiver Regards, Olivier From owner-freebsd-arch@FreeBSD.ORG Sat Sep 14 19:24:05 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 9DA1161C; Sat, 14 Sep 2013 19:24:05 +0000 (UTC) (envelope-from rizzo.unipi@gmail.com) Received: from mail-la0-x232.google.com (mail-la0-x232.google.com [IPv6:2a00:1450:4010:c03::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D8E8725FD; Sat, 14 Sep 2013 19:24:03 +0000 (UTC) Received: by mail-la0-f50.google.com with SMTP id lv10so1997565lab.37 for ; Sat, 14 Sep 2013 12:24:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type;
bh=5s/IVe0iDOLPQnHwtjKoQD9xkSMh7tizoAp5hr4lTvI=; b=GS901BZYZ5iM987h7sm7k/dxsSdQ2yBeR78dv6Lge9kVf1trBrPOztm95RxpEIGYFr XecerDCxWGsHvmRYSV8wCiEv9Ts1aWSsBGkK0d+CCqZw4b2RQ7HXz21lRWBK5BMfypkf sobNwbGh6oWQjxLiUEMK8/MZLb46xOlrZ0Mhs96D/aOKuLw17tiODK3RxRQsMPyKQ4Qm qxGT7/KH8CXkuL4SAHq5yhFsSoLIcqw16qNDwayLDIjcRv0XG3b31wdiSEPCv3uIKa1k 1EJH4qLw6aA/XhEsNU1OstWpBQPRQSFmkMDyqmyp9QbzneSzrWjI7A9VmiM08re6nYhG m9FA== MIME-Version: 1.0 X-Received: by 10.112.64.36 with SMTP id l4mr17268604lbs.15.1379186641610; Sat, 14 Sep 2013 12:24:01 -0700 (PDT) Sender: rizzo.unipi@gmail.com Received: by 10.114.200.165 with HTTP; Sat, 14 Sep 2013 12:24:01 -0700 (PDT) In-Reply-To: References: <521E41CB.30700@yandex-team.ru> <6BDA4619-783C-433E-9819-A7EAA0BD3299@neville-neil.com> <20130914142802.GC71010@onelab2.iet.unipi.it> Date: Sat, 14 Sep 2013 21:24:01 +0200 X-Google-Sender-Auth: As4PNyDtyJpFBvlZh6F6Ginwbys Message-ID: Subject: Re: Network stack changes From: Luigi Rizzo To: Olivier Cochard-Labbé X-Mailman-Approved-At: Sat, 14 Sep 2013 19:58:54 +0000 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "Alexander V. Chernikov" , Adrian Chadd , Andre Oppermann , "freebsd-hackers@freebsd.org" , George Neville-Neil , "freebsd-arch@freebsd.org" , Luigi Rizzo , "Andrey V. Elsukov" , Gleb Smirnoff , FreeBSD Net X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Sep 2013 19:24:05 -0000 On Saturday, September 14, 2013, Olivier Cochard-Labbé wrote: > On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo wrote: >> >> IXIA ? For the timescales we need to address we don't need an IXIA, >> a netmap sender is more than enough >> > > The great netmap generates only one IP flow (same src/dst IP and same > src/dst port).
True, the sample app generates only one flow, but it is trivial to modify it to generate multiple flows. My point was that we can generate high-rate traffic, as long as we tolerate 0.1-1 us of jitter. Beyond that, you do need some IXIA-like solution. Cheers Luigi > This doesn't allow testing a multi-queue NIC (or an SMP packet filter) in a > simple lab like this: > netmap sender => FreeBSD router => netmap receiver > > Regards, > > Olivier > -- -----------------------------------------+------------------------------- Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione http://www.iet.unipi.it/~luigi/ . Universita` di Pisa TEL +39-050-2211611 . via Diotisalvi 2 Mobile +39-338-6809875 . 56122 PISA (Italy) -----------------------------------------+-------------------------------