From owner-freebsd-arch@FreeBSD.ORG  Sun Feb  5 12:02:35 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D9E98106566B
	for <arch@freebsd.org>; Sun,  5 Feb 2012 12:02:35 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id 1CC9D8FC1B
	for <arch@freebsd.org>; Sun,  5 Feb 2012 12:02:34 +0000 (UTC)
Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149])
	by mail.dawidek.net (Postfix) with ESMTPSA id 9D12D676;
	Sun,  5 Feb 2012 12:45:00 +0100 (CET)
Date: Sun, 5 Feb 2012 12:43:46 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <20120205114345.GE30033@garage.freebsd.pl>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="idY8LE8SD6/8DnRI"
Content-Disposition: inline
In-Reply-To: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
X-OS: FreeBSD 10.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Feb 2012 12:02:36 -0000


--idY8LE8SD6/8DnRI
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Feb 03, 2012 at 09:37:19PM +0200, Konstantin Belousov wrote:
> FreeBSD I/O infrastructure has well known issue with deadlock caused
> by vnode lock order reversal when buffers supplied to read(2) or
> write(2) syscalls are backed by mmaped file.
>=20
> I previously published the patches to convert i/o path to use VMIO,
> based on the Jeff Roberson proposal, see
> http://wiki.freebsd.org/VM6. As a side effect, the VM6 fixed the
> deadlock. Since that work is very intrusive and did not got any
> follow-up, it get stalled.
>=20
> Below is very lightweight patch which only goal is to fix deadlock in
> the least intrusive way. This is possible after FreeBSD got the
> vm_fault_quick_hold_pages(9) and vm_fault_disable_pagefaults(9) KPIs.
> http://people.freebsd.org/~kib/misc/vm1.3.patch

+struct rl_q_entry *
+rlqentry_alloc()

Missing 'void'.

+static int
+rangelock_incompatible(const struct rl_q_entry *e1,
+    const struct rl_q_entry *e2)
+{
+
+	if ((e1->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ &&
+	    (e2->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ)
+		return (0);
+#define	IN_RANGE(a, e) (a >=3D e->rl_q_start && a < e->rl_q_end)
+	if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) ||
+	    IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1))
+		return (1);
+#undef	IN_RANGE
+	return (0);
+}

The checks here are a bit wrong and imcomplete.

This is correct:

+	if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) ||

This is not:

+	    IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1))

After simplifying one of those checks we get:

	if (end1 >=3D start2 && end1 < end2)

This will be true if end1 =3D=3D start2, but in this case it should be
false.

There are also some checks missing. If the first range covers entire
second range or vice-versa you will return that the locks are
compatible, eg.

	start1-----start2-----end2-----end1

The following check should cover all the cases:

	if (e1->rl_q_start < e2->rl_q_end &&
	    e1->rl_q_end > e2->rl_q_start) {
		return (1);
	}

+static void
+rangelock_calc_block(struct rangelock *lock)
+{
+	struct rl_q_entry *entry, *entry1, *whead;
+
+	if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) &&
+	    lock->rl_currdep !=3D NULL)
+		lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link);
+	for (entry =3D lock->rl_currdep; entry;
+	     entry =3D TAILQ_NEXT(entry, rl_q_link)) {
+		TAILQ_FOREACH(entry1, &lock->rl_waiters, rl_q_link) {
+			if (rangelock_incompatible(entry, entry1))
+				goto out;
+			if (entry1 =3D=3D entry)
+				break;
+		}
+	}
+out:
+	lock->rl_currdep =3D entry;
+	TAILQ_FOREACH(whead, &lock->rl_waiters, rl_q_link) {
+		if (whead =3D=3D lock->rl_currdep)
+			break;
+		if (!(whead->rl_q_flags & RL_LOCK_GRANTED)) {
+			whead->rl_q_flags |=3D RL_LOCK_GRANTED;
+			wakeup(whead);
+		}
+	}
+}

This function doesn't look at it is going to scale. Searching for
incompatible locks looks very expensive.

This function seems to be responsible for waking up waiters. In case of
range locking this might be tricky, as unlocking read lock on the given
range doesn't mean the range can be exclusively locked by a waiter, as
part of this range might be read-locked multiple times or different
range might be read locked, but which is also covered by the waiting
writer.

Could you elaborate a bit on how this all works together? Does the
function look through all the waiters and for each waiter goes though
all the active locks?

Do you protect somehow against starving writers?

+	KASSERT(entry->rl_q_flags & RL_LOCK_GRANTED, ("XXX"));
+	KASSERT(entry->rl_q_start =3D=3D base, ("XXX"));
+	KASSERT(entry->rl_q_end >=3D base + len, ("XXX"));

I expect those XXX will be replaced with real messages in the final
version? We could really use simple ASSERT() without obligatory message
in the kernel...

+void *
+rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct mtx=
 *ilk)
+{
+	struct rl_q_entry *entry;
+	struct thread *td;
+
+	td =3D curthread;
+	if (td->td_rlqe !=3D NULL) {
+		entry =3D td->td_rlqe;
+		td->td_rlqe =3D NULL;
+	} else
+		entry =3D rlqentry_alloc();
+	entry->rl_q_flags =3D RL_LOCK_READ;
+	entry->rl_q_start =3D base;
+	entry->rl_q_end =3D base + len;
+	return (rangelock_enqueue(lock, entry, ilk));
+}
+
+void *
+rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct mtx=
 *ilk)
+{
+	struct rl_q_entry *entry;
+	struct thread *td;
+
+	td =3D curthread;
+	if (td->td_rlqe !=3D NULL) {
+		entry =3D td->td_rlqe;
+		td->td_rlqe =3D NULL;
+	} else
+		entry =3D rlqentry_alloc();
+	entry->rl_q_flags =3D RL_LOCK_WRITE;
+	entry->rl_q_start =3D base;
+	entry->rl_q_end =3D base + len;
+	return (rangelock_enqueue(lock, entry, ilk));
+}

Those functions are nearly identical, maybe they can be converted to
something like this:

static void *
rangelock_lock(struct rangelock *lock, off_t base, size_t len, struct mtx *=
ilk,
    int flags)
{
	struct rl_q_entry *entry;
	struct thread *td;

	td =3D curthread;
	if (td->td_rlqe !=3D NULL) {
		entry =3D td->td_rlqe;
		td->td_rlqe =3D NULL;
	} else
		entry =3D rlqentry_alloc();
	entry->rl_q_flags =3D RL_LOCK_READ;
	entry->rl_q_start =3D base;
	entry->rl_q_end =3D base + len;
	return (rangelock_enqueue(lock, entry, ilk));
}

void *
rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct mtx =
*ilk)
{

	return (range_lock(lock, base, len, ilk, RL_LOCK_READ);
}

void *
rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct mtx =
*ilk)
{

	return (range_lock(lock, base, len, ilk, RL_LOCK_WRITE);
}

Or even merge rangelock_lock() with rangelock_enqueue()?

@@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags)
=20
 	td->td_sleepqueue =3D sleepq_alloc();
 	td->td_turnstile =3D turnstile_alloc();
+	td->td_rlqe =3D rlqentry_alloc();

What is the purpose of td_rlqe field? From what I see thread can lock
more than one range. Could you elaborate on that as well?

Do you assert somehow that the same thread is trying to write-lock the
range it already read-locked?

--- a/sys/kern/subr_syscall.c
+++ b/sys/kern/subr_syscall.c
@@ -177,6 +177,12 @@ syscallret(struct thread *td, int error, struct syscal=
l_args *sa __unused)
 	KASSERT(td->td_locks =3D=3D 0,
 	    ("System call %s returning with %d locks held",
 	     syscallname(p, sa->code), td->td_locks));
+	KASSERT((td->td_pflags & TDP_NOFAULTING) =3D=3D 0,
+	    ("System call %s returning with pagefaults disabled",
+	     syscallname(p, sa->code)));
+	KASSERT((td->td_pflags & TDP_NOSLEEPING) =3D=3D 0,
+	    ("System call %s returning with sleep disabled",
+	     syscallname(p, sa->code)));

We can commit those separately, right?

 #ifdef MAC
 	if ((ioflg & IO_NOMACCHECK) =3D=3D 0) {
-		if (rw =3D=3D UIO_READ)
-			error =3D mac_vnode_check_read(active_cred, file_cred,
-			    vp);
-		else
+		if (rw =3D=3D UIO_WRITE)
 			error =3D mac_vnode_check_write(active_cred, file_cred,
 			    vp);
 	}

I don't see mac_vnode_check_read() call being moved anywhere.
Was that intended?

+ * The vn_io_fault() is a wrapper around vn_read() and vn_write() to
+ * prevent the following deadlock:
+ *
+ * Assume that the thread A reads from the vnode vp1 info userspace

s/info/into/

+ * buffer buf1 backed by the pages of vnode vp2.  If a page in buf1 is
+ * currently not resident, then system ends up with the call chain
+ *   vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
+ *     vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
+ * which establishes lock order vp1->vn_lock, then vp2->vn_lock.
+ * If, at the same time, thread B reads from vnode vp2 into buffer buf2
+ * backed by the pages of vnode vp1, and some page in buf2 is not
+ * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
+ *
+ * To prevent the lock order reversal and deadlock, vn_io_fault() does
+ * not allow page faults to happen during VOP_READ or VOP_WRITE().

s/VOP_READ/VOP_READ()/

+		prot =3D VM_PROT_READ;
+		if ((fp->f_flag & O_APPEND) || !(flags & FOF_OFFSET))
+			/* For appenders, punt and lock the whole range. */
+			rl_cookie =3D vn_rangelock_wlock(vp, 0, (size_t)-1);

On 32bit archs size_t is still 32bit, AFAIR and file size can be bigger
than that. I think we should also use off_t for length (GEOM does that).

+		len =3D uio->uio_iov->iov_len;
+		addr =3D (vm_offset_t)uio->uio_iov->iov_base;
+		end =3D round_page(addr + len);
+		cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE);
+		if (cnt > io_hold_cnt) {
+			len =3D io_hold_cnt * PAGE_SIZE;
+again:
+			end =3D round_page(addr + len);
+			cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE);
+			if (cnt > io_hold_cnt) {
+				len -=3D PAGE_SIZE;
+				goto again;
+			}
+		}

This is a bit hard to understand immediately. Maybe something like the
following is a bit more readable (I assume this goto could happen only
once):

		len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE);
		addr =3D (vm_offset_t)uio->uio_iov->iov_base;
		end =3D round_page(addr + len);
		if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) {
			len -=3D PAGE_SIZE;
			end =3D round_page(addr + len);
		}

+vn_truncate(struct file *fp, off_t length, struct ucred *active_cred,
+    struct thread *td)
[...]
+	/*
+	 * Lock the range where the shortening take place. Increase of
+	 * file size does not need rangelock, but it is faster to lock
+	 * the range then call VOP_GETATTR to get the current size and
+	 * deal with races.
+	 */
+	rl_cookie =3D vn_rangelock_wlock(vp, length, -1);

Hmm. In case of increasing file size don't we need the range lock
because we might be racing with write append?

BTW. Comments in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c
might be a useful read.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

--idY8LE8SD6/8DnRI
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iEYEARECAAYFAk8ua3EACgkQForvXbEpPzRszwCg2nscTg3qYkCHMvXP+s5yj8Y1
slQAoJUnLamwebH26vFztl4OUHbXimS4
=KK/7
-----END PGP SIGNATURE-----

--idY8LE8SD6/8DnRI--

From owner-freebsd-arch@FreeBSD.ORG  Sun Feb  5 16:47:36 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D78961065673;
	Sun,  5 Feb 2012 16:47:36 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id EE7778FC0C;
	Sun,  5 Feb 2012 16:47:35 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q15GlUSq039851
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 5 Feb 2012 18:47:30 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q15GlT5O052800; Sun, 5 Feb 2012 18:47:29 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q15GlTrw052799; 
	Sun, 5 Feb 2012 18:47:29 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Sun, 5 Feb 2012 18:47:29 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Pawel Jakub Dawidek <pjd@freebsd.org>
Message-ID: <20120205164729.GH3283@deviant.kiev.zoral.com.ua>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
	<20120205114345.GE30033@garage.freebsd.pl>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="OtP5XuPTaeI2U+qV"
Content-Disposition: inline
In-Reply-To: <20120205114345.GE30033@garage.freebsd.pl>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Feb 2012 16:47:36 -0000


--OtP5XuPTaeI2U+qV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Feb 05, 2012 at 12:43:46PM +0100, Pawel Jakub Dawidek wrote:
> On Fri, Feb 03, 2012 at 09:37:19PM +0200, Konstantin Belousov wrote:
> > FreeBSD I/O infrastructure has well known issue with deadlock caused
> > by vnode lock order reversal when buffers supplied to read(2) or
> > write(2) syscalls are backed by mmaped file.
> >=20
> > I previously published the patches to convert i/o path to use VMIO,
> > based on the Jeff Roberson proposal, see
> > http://wiki.freebsd.org/VM6. As a side effect, the VM6 fixed the
> > deadlock. Since that work is very intrusive and did not got any
> > follow-up, it get stalled.
> >=20
> > Below is very lightweight patch which only goal is to fix deadlock in
> > the least intrusive way. This is possible after FreeBSD got the
> > vm_fault_quick_hold_pages(9) and vm_fault_disable_pagefaults(9) KPIs.
> > http://people.freebsd.org/~kib/misc/vm1.3.patch
>=20
> +struct rl_q_entry *
> +rlqentry_alloc()
>=20
> Missing 'void'.
Yes, it is style inconsistency. Fixed.

>=20
> +static int
> +rangelock_incompatible(const struct rl_q_entry *e1,
> +    const struct rl_q_entry *e2)
> +{
> +
> +	if ((e1->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ &&
> +	    (e2->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ)
> +		return (0);
> +#define	IN_RANGE(a, e) (a >=3D e->rl_q_start && a < e->rl_q_end)
> +	if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) ||
> +	    IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1))
> +		return (1);
> +#undef	IN_RANGE
> +	return (0);
> +}
>=20
> The checks here are a bit wrong and imcomplete.
>=20
> This is correct:
>=20
> +	if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) ||
>=20
> This is not:
>=20
> +	    IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1))
>=20
> After simplifying one of those checks we get:
>=20
> 	if (end1 >=3D start2 && end1 < end2)
>=20
> This will be true if end1 =3D=3D start2, but in this case it should be
> false.
Agreed, my tests are too strict there. But note that the only consequence
is that some lock requests are postponed.

>=20
> There are also some checks missing. If the first range covers entire
> second range or vice-versa you will return that the locks are
> compatible, eg.
>=20
> 	start1-----start2-----end2-----end1
Isn't IN_RANGE(e2->rl_q_start, e1) cover this case ?

In essence, the checks I wrote verify whether any one region border lies
inside other region.
>=20
> The following check should cover all the cases:
>=20
> 	if (e1->rl_q_start < e2->rl_q_end &&
> 	    e1->rl_q_end > e2->rl_q_start) {
> 		return (1);
> 	}
Your check is much more elegant, I changed the code as you suggested.

>=20
> +static void
> +rangelock_calc_block(struct rangelock *lock)
> +{
> +	struct rl_q_entry *entry, *entry1, *whead;
> +
> +	if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) &&
> +	    lock->rl_currdep !=3D NULL)
> +		lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link);
> +	for (entry =3D lock->rl_currdep; entry;
> +	     entry =3D TAILQ_NEXT(entry, rl_q_link)) {
> +		TAILQ_FOREACH(entry1, &lock->rl_waiters, rl_q_link) {
> +			if (rangelock_incompatible(entry, entry1))
> +				goto out;
> +			if (entry1 =3D=3D entry)
> +				break;
> +		}
> +	}
> +out:
> +	lock->rl_currdep =3D entry;
> +	TAILQ_FOREACH(whead, &lock->rl_waiters, rl_q_link) {
> +		if (whead =3D=3D lock->rl_currdep)
> +			break;
> +		if (!(whead->rl_q_flags & RL_LOCK_GRANTED)) {
> +			whead->rl_q_flags |=3D RL_LOCK_GRANTED;
> +			wakeup(whead);
> +		}
> +	}
> +}
>=20
> This function doesn't look at it is going to scale. Searching for
> incompatible locks looks very expensive.
Well, it is not. I am aware of the tree structure use for the range lock,
but this probably have to wait some time.

>=20
> This function seems to be responsible for waking up waiters. In case of
> range locking this might be tricky, as unlocking read lock on the given
> range doesn't mean the range can be exclusively locked by a waiter, as
> part of this range might be read-locked multiple times or different
> range might be read locked, but which is also covered by the waiting
> writer.
>=20
> Could you elaborate a bit on how this all works together? Does the
> function look through all the waiters and for each waiter goes though
> all the active locks?
>=20
> Do you protect somehow against starving writers?
Due to different review, I already added the comments to the rangelock
implementation.

Basically, the rangelocks in kern_rangelock.c are fully fair. The locks
are granted in the order of waiters arrival (the order is defined as
the order of acquring interlock, in the latest version it is a sleepchannel
spinlock).

The free of some range causes rescan of the queue of waiters starting
from rl_currdep. The rl_currdep stores the first blocked waiter.
If next waiter is compatible with all granted regions, it is moved
before rl_currdep and woken up.
>=20
> +	KASSERT(entry->rl_q_flags & RL_LOCK_GRANTED, ("XXX"));
> +	KASSERT(entry->rl_q_start =3D=3D base, ("XXX"));
> +	KASSERT(entry->rl_q_end >=3D base + len, ("XXX"));
>=20
> I expect those XXX will be replaced with real messages in the final
> version? We could really use simple ASSERT() without obligatory message
> in the kernel...
>=20
We have MPASS. These XXX are fixed.

> +void *
> +rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct m=
tx *ilk)
> +{
> +	struct rl_q_entry *entry;
> +	struct thread *td;
> +
> +	td =3D curthread;
> +	if (td->td_rlqe !=3D NULL) {
> +		entry =3D td->td_rlqe;
> +		td->td_rlqe =3D NULL;
> +	} else
> +		entry =3D rlqentry_alloc();
> +	entry->rl_q_flags =3D RL_LOCK_READ;
> +	entry->rl_q_start =3D base;
> +	entry->rl_q_end =3D base + len;
> +	return (rangelock_enqueue(lock, entry, ilk));
> +}
> +
> +void *
> +rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct m=
tx *ilk)
> +{
> +	struct rl_q_entry *entry;
> +	struct thread *td;
> +
> +	td =3D curthread;
> +	if (td->td_rlqe !=3D NULL) {
> +		entry =3D td->td_rlqe;
> +		td->td_rlqe =3D NULL;
> +	} else
> +		entry =3D rlqentry_alloc();
> +	entry->rl_q_flags =3D RL_LOCK_WRITE;
> +	entry->rl_q_start =3D base;
> +	entry->rl_q_end =3D base + len;
> +	return (rangelock_enqueue(lock, entry, ilk));
> +}
>=20
> Those functions are nearly identical, maybe they can be converted to
> something like this:
>=20
> static void *
> rangelock_lock(struct rangelock *lock, off_t base, size_t len, struct mtx=
 *ilk,
>     int flags)
> {
> 	struct rl_q_entry *entry;
> 	struct thread *td;
>=20
> 	td =3D curthread;
> 	if (td->td_rlqe !=3D NULL) {
> 		entry =3D td->td_rlqe;
> 		td->td_rlqe =3D NULL;
> 	} else
> 		entry =3D rlqentry_alloc();
> 	entry->rl_q_flags =3D RL_LOCK_READ;
> 	entry->rl_q_start =3D base;
> 	entry->rl_q_end =3D base + len;
> 	return (rangelock_enqueue(lock, entry, ilk));
> }
>=20
> void *
> rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct mt=
x *ilk)
> {
>=20
> 	return (range_lock(lock, base, len, ilk, RL_LOCK_READ);
> }
>=20
> void *
> rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct mt=
x *ilk)
> {
>=20
> 	return (range_lock(lock, base, len, ilk, RL_LOCK_WRITE);
> }
>=20
> Or even merge rangelock_lock() with rangelock_enqueue()?
Done exactly this.

>=20
> @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags)
> =20
>  	td->td_sleepqueue =3D sleepq_alloc();
>  	td->td_turnstile =3D turnstile_alloc();
> +	td->td_rlqe =3D rlqentry_alloc();
>=20
> What is the purpose of td_rlqe field? From what I see thread can lock
> more than one range. Could you elaborate on that as well?
td_rlqe is a cached rangelock entry used for the typical case of the
thread not doing recursive i/o. In this case, it is possible to avoid
memory allocation on the i/o hot path by using entry allocated during
thread creation. Since thread creation already allocates many chunks
of memory, and typical thread performs much more i/o then it suffers
creation, this shorten the i/o calculation path.

If tq_rlqe is already consumed, new entry is allocated at the lock time.
>=20
> Do you assert somehow that the same thread is trying to write-lock the
> range it already read-locked?
No. With the latest patch it is indeed can be done, because I started to
record range owners. For now, I added a comment with TODO mentioning the
possible assert.

>=20
> --- a/sys/kern/subr_syscall.c
> +++ b/sys/kern/subr_syscall.c
> @@ -177,6 +177,12 @@ syscallret(struct thread *td, int error, struct sysc=
all_args *sa __unused)
>  	KASSERT(td->td_locks =3D=3D 0,
>  	    ("System call %s returning with %d locks held",
>  	     syscallname(p, sa->code), td->td_locks));
> +	KASSERT((td->td_pflags & TDP_NOFAULTING) =3D=3D 0,
> +	    ("System call %s returning with pagefaults disabled",
> +	     syscallname(p, sa->code)));
> +	KASSERT((td->td_pflags & TDP_NOSLEEPING) =3D=3D 0,
> +	    ("System call %s returning with sleep disabled",
> +	     syscallname(p, sa->code)));
>=20
> We can commit those separately, right?
Definitely. Rangelocks will be committed separately too. The vfs_vnops.c
change is self-contained.

>=20
>  #ifdef MAC
>  	if ((ioflg & IO_NOMACCHECK) =3D=3D 0) {
> -		if (rw =3D=3D UIO_READ)
> -			error =3D mac_vnode_check_read(active_cred, file_cred,
> -			    vp);
> -		else
> +		if (rw =3D=3D UIO_WRITE)
>  			error =3D mac_vnode_check_write(active_cred, file_cred,
>  			    vp);
>  	}
>=20
> I don't see mac_vnode_check_read() call being moved anywhere.
> Was that intended?
No, thank you for pointing this. mdf already noted this.

>=20
> + * The vn_io_fault() is a wrapper around vn_read() and vn_write() to
> + * prevent the following deadlock:
> + *
> + * Assume that the thread A reads from the vnode vp1 info userspace
>=20
> s/info/into/
>=20
> + * buffer buf1 backed by the pages of vnode vp2.  If a page in buf1 is
> + * currently not resident, then system ends up with the call chain
> + *   vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
> + *     vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
> + * which establishes lock order vp1->vn_lock, then vp2->vn_lock.
> + * If, at the same time, thread B reads from vnode vp2 into buffer buf2
> + * backed by the pages of vnode vp1, and some page in buf2 is not
> + * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
> + *
> + * To prevent the lock order reversal and deadlock, vn_io_fault() does
> + * not allow page faults to happen during VOP_READ or VOP_WRITE().
>=20
> s/VOP_READ/VOP_READ()/
>=20
> +		prot =3D VM_PROT_READ;
> +		if ((fp->f_flag & O_APPEND) || !(flags & FOF_OFFSET))
> +			/* For appenders, punt and lock the whole range. */
> +			rl_cookie =3D vn_rangelock_wlock(vp, 0, (size_t)-1);
>=20
> On 32bit archs size_t is still 32bit, AFAIR and file size can be bigger
> than that. I think we should also use off_t for length (GEOM does that).
Agreed, but I moved to specifying (start,end) instead of (start,len) alread=
y.

>=20
> +		len =3D uio->uio_iov->iov_len;
> +		addr =3D (vm_offset_t)uio->uio_iov->iov_base;
> +		end =3D round_page(addr + len);
> +		cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE);
> +		if (cnt > io_hold_cnt) {
> +			len =3D io_hold_cnt * PAGE_SIZE;
> +again:
> +			end =3D round_page(addr + len);
> +			cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE);
> +			if (cnt > io_hold_cnt) {
> +				len -=3D PAGE_SIZE;
> +				goto again;
> +			}
> +		}
>=20
> This is a bit hard to understand immediately. Maybe something like the
> following is a bit more readable (I assume this goto could happen only
> once):
>=20
> 		len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE);
> 		addr =3D (vm_offset_t)uio->uio_iov->iov_base;
> 		end =3D round_page(addr + len);
> 		if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) {
> 			len -=3D PAGE_SIZE;
> 			end =3D round_page(addr + len);
> 		}
I think it can happen twice, if both start and len are perfectly mis-aligne=
d.

>=20
> +vn_truncate(struct file *fp, off_t length, struct ucred *active_cred,
> +    struct thread *td)
> [...]
> +	/*
> +	 * Lock the range where the shortening take place. Increase of
> +	 * file size does not need rangelock, but it is faster to lock
> +	 * the range then call VOP_GETATTR to get the current size and
> +	 * deal with races.
> +	 */
> +	rl_cookie =3D vn_rangelock_wlock(vp, length, -1);
>=20
> Hmm. In case of increasing file size don't we need the range lock
> because we might be racing with write append?
We need to exclude the split i/o from being performed on the disjoint
ranges of the file if truncation happens in between split.

I changed the rangelock obtained in vn_truncate to get the whole range.
The bug appeared because I imported the rangelock from the bigger
patch that indeed handled the expansions.

Thank you for pointing this out.
>=20
> BTW. Comments in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock=
.c
> might be a useful read.

Thank you for the review.

http://people.freebsd.org/~kib/misc/vm1.8.patch

--OtP5XuPTaeI2U+qV
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk8usqEACgkQC3+MBN1Mb4j4PgCeN08ELUSUm8La1mEHoyIMXbKS
6mEAnRO62Lib+nyRGb2lxQ5zqz732xmS
=6nrX
-----END PGP SIGNATURE-----

--OtP5XuPTaeI2U+qV--

From owner-freebsd-arch@FreeBSD.ORG  Sun Feb  5 19:22:03 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 83FA4106566B
	for <arch@freebsd.org>; Sun,  5 Feb 2012 19:22:03 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id 27E898FC0A
	for <arch@freebsd.org>; Sun,  5 Feb 2012 19:22:02 +0000 (UTC)
Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149])
	by mail.dawidek.net (Postfix) with ESMTPSA id 354897A4;
	Sun,  5 Feb 2012 20:22:01 +0100 (CET)
Date: Sun, 5 Feb 2012 20:20:45 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <20120205192045.GH30033@garage.freebsd.pl>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
	<20120205114345.GE30033@garage.freebsd.pl>
	<20120205164729.GH3283@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="IbVRjBtIbJdbeK1C"
Content-Disposition: inline
In-Reply-To: <20120205164729.GH3283@deviant.kiev.zoral.com.ua>
X-OS: FreeBSD 10.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Feb 2012 19:22:03 -0000


--IbVRjBtIbJdbeK1C
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Feb 05, 2012 at 06:47:29PM +0200, Konstantin Belousov wrote:
> On Sun, Feb 05, 2012 at 12:43:46PM +0100, Pawel Jakub Dawidek wrote:
> > @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags)
> > =20
> >  	td->td_sleepqueue =3D sleepq_alloc();
> >  	td->td_turnstile =3D turnstile_alloc();
> > +	td->td_rlqe =3D rlqentry_alloc();
> >=20
> > What is the purpose of td_rlqe field? From what I see thread can lock
> > more than one range. Could you elaborate on that as well?
> td_rlqe is a cached rangelock entry used for the typical case of the
> thread not doing recursive i/o. In this case, it is possible to avoid
> memory allocation on the i/o hot path by using entry allocated during
> thread creation. Since thread creation already allocates many chunks
> of memory, and typical thread performs much more i/o then it suffers
> creation, this shorten the i/o calculation path.

I see. I'd be in favour of dropping entry allocation on thread creation.
This would make the allocation lazy and it will be done on the first I/O
only. What do you think? Thread creation should be fast, so if we can
avoid adding up, I would go that route.

> > This is a bit hard to understand immediately. Maybe something like the
> > following is a bit more readable (I assume this goto could happen only
> > once):
> >=20
> > 		len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE);
> > 		addr =3D (vm_offset_t)uio->uio_iov->iov_base;
> > 		end =3D round_page(addr + len);
> > 		if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) {
> > 			len -=3D PAGE_SIZE;
> > 			end =3D round_page(addr + len);
> > 		}
> I think it can happen twice, if both start and len are perfectly mis-alig=
ned.

I see. That make sense. It would be nice to add comment there, as it
wasn't immediately obvious (at least for me) what's going on.

Another way to simplify this piece of code would be to either make ma[]
array of 'io_hold_cnt + 2' elements or set 'len' to '(io_hold_cnt - 2) *
PAGE_SIZE'.  Short comment describing why we add or subtract 2 would be
of course welcome.

Just curious, why io_hold_cnt is 'static const int' instead of define?

> http://people.freebsd.org/~kib/misc/vm1.8.patch

The new patch looks good to me.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

--IbVRjBtIbJdbeK1C
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iEYEARECAAYFAk8u1o0ACgkQForvXbEpPzT+QwCgjkC5udmjk3NQxu/ktUgozX2o
DMEAmQGsHDmZR0yBw563Ot6O8ivKYwSB
=k7WD
-----END PGP SIGNATURE-----

--IbVRjBtIbJdbeK1C--

From owner-freebsd-arch@FreeBSD.ORG  Sun Feb  5 22:13:05 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DDCC1106566B;
	Sun,  5 Feb 2012 22:13:04 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 6892D8FC0A;
	Sun,  5 Feb 2012 22:13:03 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q15MCxwk064757
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 6 Feb 2012 00:12:59 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q15MCxUa054439; Mon, 6 Feb 2012 00:12:59 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q15MCxQ3054438; 
	Mon, 6 Feb 2012 00:12:59 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Mon, 6 Feb 2012 00:12:59 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Pawel Jakub Dawidek <pjd@freebsd.org>
Message-ID: <20120205221259.GL3283@deviant.kiev.zoral.com.ua>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
	<20120205114345.GE30033@garage.freebsd.pl>
	<20120205164729.GH3283@deviant.kiev.zoral.com.ua>
	<20120205192045.GH30033@garage.freebsd.pl>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="rYkcoROieYsUh8uu"
Content-Disposition: inline
In-Reply-To: <20120205192045.GH30033@garage.freebsd.pl>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Feb 2012 22:13:05 -0000


--rYkcoROieYsUh8uu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Feb 05, 2012 at 08:20:45PM +0100, Pawel Jakub Dawidek wrote:
> On Sun, Feb 05, 2012 at 06:47:29PM +0200, Konstantin Belousov wrote:
> > On Sun, Feb 05, 2012 at 12:43:46PM +0100, Pawel Jakub Dawidek wrote:
> > > @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags)
> > > =20
> > >  	td->td_sleepqueue =3D sleepq_alloc();
> > >  	td->td_turnstile =3D turnstile_alloc();
> > > +	td->td_rlqe =3D rlqentry_alloc();
> > >=20
> > > What is the purpose of td_rlqe field? From what I see thread can lock
> > > more than one range. Could you elaborate on that as well?
> > td_rlqe is a cached rangelock entry used for the typical case of the
> > thread not doing recursive i/o. In this case, it is possible to avoid
> > memory allocation on the i/o hot path by using entry allocated during
> > thread creation. Since thread creation already allocates many chunks
> > of memory, and typical thread performs much more i/o then it suffers
> > creation, this shorten the i/o calculation path.
>=20
> I see. I'd be in favour of dropping entry allocation on thread creation.
> This would make the allocation lazy and it will be done on the first I/O
> only. What do you think? Thread creation should be fast, so if we can
> avoid adding up, I would go that route.
I think this is negligible change in the speed of much rare operation
(thread creation) comparing with a hit on the first time i/o, but I just
did it in a way you suggested.

>=20
> > > This is a bit hard to understand immediately. Maybe something like the
> > > following is a bit more readable (I assume this goto could happen only
> > > once):
> > >=20
> > > 		len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE);
> > > 		addr =3D (vm_offset_t)uio->uio_iov->iov_base;
> > > 		end =3D round_page(addr + len);
> > > 		if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) {
> > > 			len -=3D PAGE_SIZE;
> > > 			end =3D round_page(addr + len);
> > > 		}
> > I think it can happen twice, if both start and len are perfectly mis-al=
igned.
>=20
> I see. That make sense. It would be nice to add comment there, as it
> wasn't immediately obvious (at least for me) what's going on.
>=20
> Another way to simplify this piece of code would be to either make ma[]
> array of 'io_hold_cnt + 2' elements or set 'len' to '(io_hold_cnt - 2) *
> PAGE_SIZE'.  Short comment describing why we add or subtract 2 would be
> of course welcome.
I am not sure that +2 would be improvement in readability, but indeed it
removes some code. I implemented this and added a comment.

>=20
> Just curious, why io_hold_cnt is 'static const int' instead of define?
I want to change it into a sysctl or tunable one day.

http://people.freebsd.org/~kib/misc/vm1.9.patch

--rYkcoROieYsUh8uu
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk8u/usACgkQC3+MBN1Mb4ghkwCgiGZND59lcqCwmTNWk2ztHADq
w1wAnjUFE/i1HhxtS2Z7DxSQJOhec4hN
=FtVf
-----END PGP SIGNATURE-----

--rYkcoROieYsUh8uu--

From owner-freebsd-arch@FreeBSD.ORG  Mon Feb  6 08:59:28 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 46AE2106566C
	for <arch@freebsd.org>; Mon,  6 Feb 2012 08:59:28 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id E952A8FC16
	for <arch@freebsd.org>; Mon,  6 Feb 2012 08:59:27 +0000 (UTC)
Received: from localhost (58.wheelsystems.com [83.12.187.58])
	by mail.dawidek.net (Postfix) with ESMTPSA id B9A03938;
	Mon,  6 Feb 2012 09:59:26 +0100 (CET)
Date: Mon, 6 Feb 2012 09:58:14 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <20120206085814.GB1324@garage.freebsd.pl>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
	<20120205114345.GE30033@garage.freebsd.pl>
	<20120205164729.GH3283@deviant.kiev.zoral.com.ua>
	<20120205192045.GH30033@garage.freebsd.pl>
	<20120205221259.GL3283@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="qlTNgmc+xy1dBmNv"
Content-Disposition: inline
In-Reply-To: <20120205221259.GL3283@deviant.kiev.zoral.com.ua>
X-OS: FreeBSD 10.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 06 Feb 2012 08:59:28 -0000


--qlTNgmc+xy1dBmNv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Feb 06, 2012 at 12:12:59AM +0200, Konstantin Belousov wrote:
> http://people.freebsd.org/~kib/misc/vm1.9.patch

--- a/sys/fs/nfsclient/nfs_clbio.c
+++ b/sys/fs/nfsclient/nfs_clbio.c
@@ -820,7 +820,10 @@ do_sync:
 			t_uio->uio_segflg =3D UIO_SYSSPACE;
 			t_uio->uio_rw =3D UIO_WRITE;
 			t_uio->uio_td =3D td;
-			bcopy(uiop->uio_iov->iov_base, t_iov->iov_base, size);
+			error =3D copyin(uiop->uio_iov->iov_base,
+			    t_iov->iov_base, size);
+			if (error !=3D 0)
+				goto err_free;

Good catch. It makes me wonder if uiop will always point at userland
buffer. What if we eg. AUDIT subsystem writing from within the kernel to
a file stored on NFS file system?

+	if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) &&
+	    lock->rl_currdep !=3D NULL)
+		lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link);
+	for (entry =3D lock->rl_currdep; entry;

Minor style inconsistency. Two lines earlier you compare pointer with
NULL, which is nice, but here you treat pointer as boolean.

+void
+rangelock_unlock(struct rangelock *lock, void *cookie)
+{
+	struct rl_q_entry *entry;
+
+	MPASS(lock !=3D NULL && cookie !=3D NULL);
+
+	entry =3D cookie;
+	sleepq_lock(lock);
+	rangelock_unlock_locked(lock, entry);
+}

You could drop using addtional 'entry' variable and just pass 'cookie'
directly. Although current code is self-explaining what 'cookie'
actually is, so I kinda like it as-is.

+/*
+ * Add the lock request to the queue of the pending requests for
+ * rangelock.  Sleeps until the request can be granted.

s/request/lock/ ?

@@ -216,6 +218,7 @@ thread_fini(void *mem, int size)
=20
 	td =3D (struct thread *)mem;
 	EVENTHANDLER_INVOKE(thread_fini, td);
+	rlqentry_free(td->td_rlqe);

Not sure if it was intended or not, but td_rlqe can be NULL now, so it
might be worth checking that. It is not a big deal, as uma_zfree() can hand=
le
NULL pointers though.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

--qlTNgmc+xy1dBmNv
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iEYEARECAAYFAk8vliYACgkQForvXbEpPzTNGgCggyZ6rgH24Bq8z9gzYFDpf+gO
PR8An3JeYbNqfilOEamCbLxTRA/D/Z3n
=YHIV
-----END PGP SIGNATURE-----

--qlTNgmc+xy1dBmNv--

From owner-freebsd-arch@FreeBSD.ORG  Mon Feb  6 09:54:24 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B0ED5106566B;
	Mon,  6 Feb 2012 09:54:24 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 3F97C8FC12;
	Mon,  6 Feb 2012 09:54:23 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q169sGSe024524
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 6 Feb 2012 11:54:16 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q169sFHJ058022; Mon, 6 Feb 2012 11:54:15 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q169sFD0058021; 
	Mon, 6 Feb 2012 11:54:15 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Mon, 6 Feb 2012 11:54:15 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Pawel Jakub Dawidek <pjd@freebsd.org>
Message-ID: <20120206095415.GM3283@deviant.kiev.zoral.com.ua>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
	<20120205114345.GE30033@garage.freebsd.pl>
	<20120205164729.GH3283@deviant.kiev.zoral.com.ua>
	<20120205192045.GH30033@garage.freebsd.pl>
	<20120205221259.GL3283@deviant.kiev.zoral.com.ua>
	<20120206085814.GB1324@garage.freebsd.pl>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="T4pX28sySFCT/oEO"
Content-Disposition: inline
In-Reply-To: <20120206085814.GB1324@garage.freebsd.pl>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 06 Feb 2012 09:54:24 -0000


--T4pX28sySFCT/oEO
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Feb 06, 2012 at 09:58:14AM +0100, Pawel Jakub Dawidek wrote:
> On Mon, Feb 06, 2012 at 12:12:59AM +0200, Konstantin Belousov wrote:
> > http://people.freebsd.org/~kib/misc/vm1.9.patch
>=20
> --- a/sys/fs/nfsclient/nfs_clbio.c
> +++ b/sys/fs/nfsclient/nfs_clbio.c
> @@ -820,7 +820,10 @@ do_sync:
>  			t_uio->uio_segflg =3D UIO_SYSSPACE;
>  			t_uio->uio_rw =3D UIO_WRITE;
>  			t_uio->uio_td =3D td;
> -			bcopy(uiop->uio_iov->iov_base, t_iov->iov_base, size);
> +			error =3D copyin(uiop->uio_iov->iov_base,
> +			    t_iov->iov_base, size);
> +			if (error !=3D 0)
> +				goto err_free;
>=20
> Good catch. It makes me wonder if uiop will always point at userland
> buffer. What if we eg. AUDIT subsystem writing from within the kernel to
> a file stored on NFS file system?
Hm, I committed to the wrong branch.

Anyway, speaking about this (unrelated) change, I thought that directio
path cannot happen for UIO_SYSSPACE. But since both you and Rick think
that handling UIO_SYSSPACE there makes sense, so be it.

>=20
> +	if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) &&
> +	    lock->rl_currdep !=3D NULL)
> +		lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link);
> +	for (entry =3D lock->rl_currdep; entry;
>=20
> Minor style inconsistency. Two lines earlier you compare pointer with
> NULL, which is nice, but here you treat pointer as boolean.
Fixed.

>=20
> +void
> +rangelock_unlock(struct rangelock *lock, void *cookie)
> +{
> +	struct rl_q_entry *entry;
> +
> +	MPASS(lock !=3D NULL && cookie !=3D NULL);
> +
> +	entry =3D cookie;
> +	sleepq_lock(lock);
> +	rangelock_unlock_locked(lock, entry);
> +}
>=20
> You could drop using addtional 'entry' variable and just pass 'cookie'
> directly. Although current code is self-explaining what 'cookie'
> actually is, so I kinda like it as-is.
Dropped entry.

>=20
> +/*
> + * Add the lock request to the queue of the pending requests for
> + * rangelock.  Sleeps until the request can be granted.
>=20
> s/request/lock/ ?
No, I think request is more precise there.

>=20
> @@ -216,6 +218,7 @@ thread_fini(void *mem, int size)
> =20
>  	td =3D (struct thread *)mem;
>  	EVENTHANDLER_INVOKE(thread_fini, td);
> +	rlqentry_free(td->td_rlqe);
>=20
> Not sure if it was intended or not, but td_rlqe can be NULL now, so it
> might be worth checking that. It is not a big deal, as uma_zfree() can ha=
ndle
> NULL pointers though.
Yes, it is intended. As you noted yourself, uma_zfree() handles NULL.
For typical thread, td_rlqe is not NULL, so it is slightly better to not
check for NULL.

--T4pX28sySFCT/oEO
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk8vo0cACgkQC3+MBN1Mb4h1DQCfefzc3n6Z3mj0Ia2dp8hk4RMR
2YwAoOWHv8a6dXehFKAGpB7glFSIV2oI
=JAkA
-----END PGP SIGNATURE-----

--T4pX28sySFCT/oEO--

From owner-freebsd-arch@FreeBSD.ORG  Mon Feb  6 10:01:26 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6DDD5106564A
	for <arch@freebsd.org>; Mon,  6 Feb 2012 10:01:26 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id 1FFFB8FC15
	for <arch@freebsd.org>; Mon,  6 Feb 2012 10:01:25 +0000 (UTC)
Received: from localhost (58.wheelsystems.com [83.12.187.58])
	by mail.dawidek.net (Postfix) with ESMTPSA id C15F595B;
	Mon,  6 Feb 2012 11:01:23 +0100 (CET)
Date: Mon, 6 Feb 2012 11:00:11 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <20120206100009.GC1324@garage.freebsd.pl>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
	<20120205114345.GE30033@garage.freebsd.pl>
	<20120205164729.GH3283@deviant.kiev.zoral.com.ua>
	<20120205192045.GH30033@garage.freebsd.pl>
	<20120205221259.GL3283@deviant.kiev.zoral.com.ua>
	<20120206085814.GB1324@garage.freebsd.pl>
	<20120206095415.GM3283@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="Clx92ZfkiYIKRjnr"
Content-Disposition: inline
In-Reply-To: <20120206095415.GM3283@deviant.kiev.zoral.com.ua>
X-OS: FreeBSD 10.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 06 Feb 2012 10:01:26 -0000


--Clx92ZfkiYIKRjnr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Kostik, I have no objections to the patch at this point. Thanks.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

--Clx92ZfkiYIKRjnr
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iEYEARECAAYFAk8vpKkACgkQForvXbEpPzSItACgqLf+N3mOQ9+74z8u96YbjRl4
c/EAoOnNmoiYImbUsO8thwL4P/lLYm1/
=9bL2
-----END PGP SIGNATURE-----

--Clx92ZfkiYIKRjnr--

From owner-freebsd-arch@FreeBSD.ORG  Mon Feb  6 18:46:21 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0E397106564A
	for <freebsd-arch@freebsd.org>; Mon,  6 Feb 2012 18:46:21 +0000 (UTC)
	(envelope-from tijl@coosemans.org)
Received: from mailrelay011.isp.belgacom.be (mailrelay011.isp.belgacom.be
	[195.238.6.178])
	by mx1.freebsd.org (Postfix) with ESMTP id 88B978FC14
	for <freebsd-arch@freebsd.org>; Mon,  6 Feb 2012 18:46:19 +0000 (UTC)
X-Belgacom-Dynamic: yes
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AvwEAFQYME9bsaFX/2dsb2JhbABDr0OBBoJPI4EaiDa4aYsuASoGAQsBCAUDAwkIF4J6DixtBA4CCgIhgxwEnlsDiSY
Received: from 87.161-177-91.adsl-dyn.isp.belgacom.be (HELO
	kalimero.tijl.coosemans.org) ([91.177.161.87])
	by relay.skynet.be with ESMTP; 06 Feb 2012 19:16:50 +0100
Received: from kalimero.tijl.coosemans.org (kalimero.tijl.coosemans.org
	[127.0.0.1])
	by kalimero.tijl.coosemans.org (8.14.5/8.14.5) with ESMTP id
	q16IGmH3004224; Mon, 6 Feb 2012 19:16:48 +0100 (CET)
	(envelope-from tijl@coosemans.org)
From: Tijl Coosemans <tijl@coosemans.org>
Date: Mon, 6 Feb 2012 19:16:41 +0100
User-Agent: KMail/1.13.7 (FreeBSD/10.0-CURRENT; KDE/4.7.3; i386; ; )
MIME-Version: 1.0
To: freebsd-arch@freebsd.org
Content-Type: multipart/signed; boundary="nextPart1888453.YQFfBMEuoy";
	protocol="application/pgp-signature"; micalg=pgp-sha256
Content-Transfer-Encoding: 7bit
Message-Id: <201202061916.45998.tijl@coosemans.org>
Cc: Damjan Jovanovic <damjan.jov@gmail.com>
Subject: amd64 cc -m32 support and headers project branch
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 06 Feb 2012 18:46:21 -0000

--nextPart1888453.YQFfBMEuoy
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

After a hiatus of almost a year I'd like to continue with merging i386
and amd64 headers to get "cc -m32" working on amd64. Previously I tried
to fix any problems with the headers first before merging, but this turned
out to be a long tedious process. So, because lack of -m32 support is
blocking other development I'd like to just merge a minimal set of
headers without any macro/type definition changes or other
reorganisations. The result will be similar to existing powerpc and mips
headers which already support both 32 and 64 bit.

When that's done I can create a development branch to work on the
problems that have come up. Possible goals for such a branch are:

- Fix inconsistencies such as types defined in sys/ and their limits
  in machine/. Also, sometimes the limits don't have the correct type.
- For types/limits defined under machine/ there is a lot of duplication
  between architectures. Try to move some to sys/.
- Try to make headers compiler agnostic; limit use of __attribute__,
  __builtin_* to a single file.
- Maybe remove support for gcc 2.*. The oldest version in ports is 3.4
  and I suspect some headers don't compile anymore with gcc 2.*.
- Add support for new compiler attributes/built-ins.
- Add missing POSIX prototypes, maybe commented out with /* UNIMPL ... */
  or similar so they can be grepped.
- The gcc ports currently patch some system headers where they think
  something is non-standard. These patched headers override the system
  headers which means you have to rebuild these ports whenever you do
  installworld to make sure they contain the latest changes. Some headers
  are trivial to fix, others less so.


If there are no objections, I'd like to start with the -m32 work in
a week or so.

--nextPart1888453.YQFfBMEuoy
Content-Type: application/pgp-signature; name=signature.asc 
Content-Description: This is a digitally signed message part.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iF4EABEIAAYFAk8wGQ0ACgkQfoCS2CCgtis7CQD7BNh94As/PtYliomKKL5RGARb
5y0MBTnSTZpnwASNNn0A/04yU/w6v+E8Yz1fNlVtIQcXfyLvcIV1/ZsO+yqjnJ20
=4vYu
-----END PGP SIGNATURE-----

--nextPart1888453.YQFfBMEuoy--

From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  7 22:16:17 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D2D61106566B;
	Tue,  7 Feb 2012 22:16:17 +0000 (UTC)
	(envelope-from giovanni.trematerra@gmail.com)
Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com
	[209.85.216.182])
	by mx1.freebsd.org (Postfix) with ESMTP id D2CAC8FC15;
	Tue,  7 Feb 2012 22:16:16 +0000 (UTC)
Received: by qcmt40 with SMTP id t40so6039370qcm.13
	for <multiple recipients>; Tue, 07 Feb 2012 14:16:16 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:date:x-google-sender-auth:message-id:subject
	:from:to:cc:content-type;
	bh=FFPCWrIaj0YRxC1BRifQ7l0AkpOyjxuTDc2yKTaqUzo=;
	b=wWNCvAs0LG4fW8PTmrlAeKh1MbVB0K8JkaJJJciBUrt6aMkiIb3QfntedhK+donCCD
	/wfFiP+LdimZrDU5gKcBcRqM48W+2huZrabemo6y/Ea1OumXm3bF/oYqVoI4rNED3U1g
	Kcb8Pu6cSM9TUmXyfbBJ1NVlCdhzus3Z6Ph9Y=
MIME-Version: 1.0
Received: by 10.224.95.196 with SMTP id e4mr3376446qan.63.1328652975939; Tue,
	07 Feb 2012 14:16:15 -0800 (PST)
Sender: giovanni.trematerra@gmail.com
Received: by 10.229.81.200 with HTTP; Tue, 7 Feb 2012 14:16:15 -0800 (PST)
Date: Tue, 7 Feb 2012 23:16:15 +0100
X-Google-Sender-Auth: xDlBgRLJUCOckRZDgSJUsU5828A
Message-ID: <CACfq090+EDVjj_rRNyafiod_BBzGUzmQ9CjApZLcXWR-Ax=0uQ@mail.gmail.com>
From: Giovanni Trematerra <gianni@freebsd.org>
To: freebsd-arch@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Cc: flo@freebsd.org, Kip Macy <kmacy@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, bde@freebsd.org, jilles@freebsd.org
Subject: [LAST ROUND] fifo/pipe merge code
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Feb 2012 22:16:18 -0000

Hi All,
Here a commit candidate patch of the experimental one posted on Jan 8
http://www.trematerra.net/patches/pipefifo_merge.4.3.patch

The patch aims to
- not introduce any performance regressions in pipe code
- speed up performance for fifos
- not introduce any regressions
- fifos and pipes have to have the same behavior as before the patch.
- share as much code as possible between pipes and fifos.

The patch is greatly improved from its first experimental form.
Now it's more clean and compact and integrates all the good suggestions
from previous reviewers. Here some of the major differences:
- no more different UMA zones to allocate things.
- no more pipeinfo structure.
- fix a bug at fifo_close.
- fix a different behavior for fifos during truncate (thanks to jilles).

The patch was reviewed in a previous form by jhb@, bde@, jilles@,
attilio@(only style bug)
This version of the patch was reviewed privately by jilles@,
attilio@(only style bug)

TEST
================================
the patch passes all the regression tests in
tools/regression/fifo
tools/regression/pipe
tools/regression/poll /* same behavior as a stock kernel */
tools/regression/bin/sh /* same behavior as a stock kernel */

pho@ stress tested the patch for 19 hours without any fatal error or panic.

BENCHMARK
=================================
PIPE
the benchmark was performed by compiling a GENERIC kernel.
There's a lot of pipe code involving during a compilation.
Such a test was used to discovery a performance regression in r226042.

x build_stock.txt
+ build_patched.txt
+------------------------------------------------------------------------------+
|     +                       *                       x                       +|
||____________________________M_|_____A_______A_______M_____|______________|   |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3           350           351           351     350.66667    0.57735027
+   3           349           352           350     350.33333     1.5275252
No difference proven at 95.0% confidence

So no performance regression at all into pipes code.

FIFO
To benchmark fifo it was sent 1000 chunks of different size through a fifo.
(128, 512, 4096 bytes)

x fifo_stock_128.txt
+ fifo_patched_128.txt
+------------------------------------------------------------------------------+
| +                                                                            |
| +     +                                              x     x                x|
||M_A___|                                           |________M__A____________| |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3           835           839           836     836.66667      2.081666
+   3           826           827           826     826.33333    0.57735027
Difference at 95.0% confidence
	-10.3333 +/- 3.46228
	-1.23506% +/- 0.413818%
	(Student's t, pooled s = 1.52753)

x fifo_stock_512.txt
+ fifo_patched_512.txt
+------------------------------------------------------------------------------+
|+++                                                                         xx|
||A|                                                                         A||
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3          1667          1671          1669          1669             2
+   3          1413          1419          1415     1415.6667     3.0550505
Difference at 95.0% confidence
	-253.333 +/- 5.85232
	-15.1787% +/- 0.350648%
	(Student's t, pooled s = 2.58199)

x fifo_stock_4096.txt
+ fifo_patched_4096.txt
+------------------------------------------------------------------------------+
|+                                                                             |
|+                                                              x              |
|+                                                              x             x|
|A                                                           |__M____A_______| |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3         43682         45237         43694     44204.333     894.33569
+   3         36571         36606         36600     36592.333     18.717194
Difference at 95.0% confidence
	-7612 +/- 1433.69
	-17.22% +/- 3.24332%
	(Student's t, pooled s = 632.529)

if someone would like to reproduce the test can download a copy of the sources a
http://www.trematerra.net/patches/fifoperf/

the code is from zhaoshuai@.
Makefile,
builds benchmark.c

benchmark.c
set up a fifo, fork and send a number of messages of a size depending from
the input parameters.

runme.sh
run the test launching benchmark binary to send different number of messages of
different sizes.

Note that when the chunk is small there isn't so much difference with
the patch but
no regression in any case, when the chunk gets bigger there is a
performance boost.

Many thanks to all the people that spent their time to review the patch
A special thanks to pho@ to have wasted his time testing the patch in
its various forms.

Reviews and possibly merge into the tree are welcomed.

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG  Wed Feb  8 05:49:46 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CA77B1065674;
	Wed,  8 Feb 2012 05:49:46 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au
	[211.29.132.189])
	by mx1.freebsd.org (Postfix) with ESMTP id 385BB8FC0A;
	Wed,  8 Feb 2012 05:49:45 +0000 (UTC)
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q185nfS8015540
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 8 Feb 2012 16:49:43 +1100
Date: Wed, 8 Feb 2012 16:49:41 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Giovanni Trematerra <gianni@FreeBSD.org>
In-Reply-To: <CACfq090+EDVjj_rRNyafiod_BBzGUzmQ9CjApZLcXWR-Ax=0uQ@mail.gmail.com>
Message-ID: <20120208151251.J1853@besplex.bde.org>
References: <CACfq090+EDVjj_rRNyafiod_BBzGUzmQ9CjApZLcXWR-Ax=0uQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Wed, 08 Feb 2012 06:20:03 +0000
Cc: flo@FreeBSD.org, Kip Macy <kmacy@FreeBSD.org>,
	John Baldwin <jhb@FreeBSD.org>, Peter Holm <pho@FreeBSD.org>,
	Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, bde@FreeBSD.org,
	freebsd-arch@FreeBSD.org, jilles@FreeBSD.org
Subject: Re: [LAST ROUND] fifo/pipe merge code
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Feb 2012 05:49:46 -0000

On Tue, 7 Feb 2012, Giovanni Trematerra wrote:

> Hi All,
> Here a commit candidate patch of the experimental one posted on Jan 8
> http://www.trematerra.net/patches/pipefifo_merge.4.3.patch
>
> The patch aims to
> - not introduce any performance regressions in pipe code
> - speed up performance for fifos
> - not introduce any regressions
> - fifos and pipes have to have the same behavior as before the patch.
> - share as much code as possible between pipes and fifos.
>
> The patch is greatly improved from its first experimental form.
> Now it's more clean and compact and integrates all the good suggestions
> from previous reviewers. Here some of the major differences:
> - no more different UMA zones to allocate things.
> - no more pipeinfo structure.
> - fix a bug at fifo_close.
> - fix a different behavior for fifos during truncate (thanks to jilles).
>
> The patch was reviewed in a previous form by jhb@, bde@, jilles@,
> attilio@(only style bug)
> This version of the patch was reviewed privately by jilles@,
> attilio@(only style bug)

It looks quite good now.  I see only a few minor style bugs:

% diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c
% index 94a2713..0d35877 100644
% --- a/sys/fs/fifofs/fifo_vnops.c
% +++ b/sys/fs/fifofs/fifo_vnops.c
% ...
% +int
% +fifo_iseof(struct file *fp)
% ...
% +	KASSERT(fp->f_vnode != NULL, ("fifo_iseof: no vnode info."));
% +	KASSERT(fp->f_vnode->v_fifoinfo != NULL, ("fifo_iseof: no fifoinfo."));

Error messages should not be terminated by a ".".  All other KASSERT()s in
this file and in sys_pipe.c follow this rule.  But 1 old one in sys_pipe.c is
terminated by a "!", and some in sys_pipe.c don't follow the rule that
error messages should not be capitalized.

I wonder if you can assert that some suitable lock is held here.

% +	fip = fp->f_vnode->v_fifoinfo;
% +	return (fp->f_seqcount == fip->fi_wgen);
%  }
% ...
% diff --git a/sys/kern/sys_pipe.c b/sys/kern/sys_pipe.c
% index 9edcb74..5892e9f 100644
% --- a/sys/kern/sys_pipe.c
% +++ b/sys/kern/sys_pipe.c
% @@ -1,4 +1,5 @@
%  /*-
% + * Copyright (c) 2012 Giovanni Trematerra
%   * Copyright (c) 1996 John S. Dyson
%   * All rights reserved.
%   *

Copyrights should be in historical order, as in fifo_vnops.c.

% @@ -135,6 +138,10 @@ __FBSDID("$FreeBSD$");
%   */
%  /* #define PIPE_NODIRECT */
% 
% +#define PIPE_PEER(pipe)												\
% +		(((pipe)->pipe_state & PIPE_NAMED) ?						\
% +		    (pipe) : ((pipe)->pipe_peer))
% +

This has strange indentation for the macro body, and weird indentation
for the backslashes to not line them up at all.  With more normal
indentation, there is no need to split the macro body and thus not
enough backslashes to need lining up:

%%%
#define PIPE_PEER(pipe)	\
 	(((pipe)->pipe_state & PIPE_NAMED) ? (pipe) : ((pipe)->pipe_peer))
%%%

% ...
% @@ -175,6 +184,11 @@ static struct filterops pipe_wfiltops = {
%  	.f_detach = filt_pipedetach,
%  	.f_event = filt_pipewrite
%  };
% +static struct filterops pipe_nfiltops = {
% +	.f_isfd = 1,
% +	.f_detach = filt_pipedetach_notsup,
% +	.f_event = filt_pipenotsup
% +};

This  should probably be sorted before the others (order nrw, not rwn).

% 
%  /*
%   * Default pipe buffer size(s), this can be kind-of large now because pipe
% @@ -220,6 +234,7 @@ static void pipe_clone_write_buffer(struct pipe *wpipe);
%  static int pipespace(struct pipe *cpipe, int size);
%  static int pipespace_new(struct pipe *cpipe, int size);
% 
% +static int	pipe_paircreate(struct thread *td, struct pipepair **p_pp);
%  static int	pipe_zone_ctor(void *mem, int size, void *arg, int flags);
%  static int	pipe_zone_init(void *mem, int size, int flags);
%  static void	pipe_zone_fini(void *mem, int size);

This seems to be unsorted.  I see no reason to sort it with the pipe_zone
constriuctors and desstructors, especially if you don't name it pipe_zone_*.
The declarations of lots of other constructors are unsorted into the main
list.

Old style bugs: the pipe_zone functions shouldn't be in a separate section
and that section shouldn't have a different indentation style (although
that style is closer to KNF), especially since these functions are more
systematically named than the others (the zone_ in their prefix keeps them
together when they are sorted alphabetically).

% @@ -1284,6 +1348,17 @@ pipe_ioctl(fp, cmd, data, active_cred, td)
%  		break;
% 
%  	case FIONREAD:
% +

Extra blank line.

% +		/*
% +		 * FIONREAD will return 0 for non-readable descriptors, and
% +		 * the results of FIONREAD on the read pipe for readable
% +		 * descriptors.
% +		 */
% +		if (!(fp->f_flag & FREAD)) {
% +			*(int *)data = 0;
% +			PIPE_UNLOCK(mpipe);
% +			return (0);
% +		}
%  		if (mpipe->pipe_state & PIPE_DIRECTW)
%  			*(int *)data = mpipe->pipe_map.cnt;
%  		else

Urk, now I see many major style bugs:

% @@ -1333,47 +1408,55 @@ pipe_poll(fp, events, active_cred, td)
%  	int error;
%  #endif
% 
% -	wpipe = rpipe->pipe_peer;
% +	wpipe = PIPE_PEER(rpipe);
%  	PIPE_LOCK(rpipe);
%  #ifdef MAC
%  	error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
%  	if (error)
%  		goto locked_error;
%  #endif
% -	if (events & (POLLIN | POLLRDNORM))
% -		if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% -		    (rpipe->pipe_buffer.cnt > 0))
% -			revents |= events & (POLLIN | POLLRDNORM);
% -
% -	if (events & (POLLOUT | POLLWRNORM))
% -		if (wpipe->pipe_present != PIPE_ACTIVE ||
% -		    (wpipe->pipe_state & PIPE_EOF) ||
% -		    (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
% -		     ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF ||
% -			 wpipe->pipe_buffer.size == 0)))
% -			revents |= events & (POLLOUT | POLLWRNORM);
% -
% -	if ((events & POLLINIGNEOF) == 0) {
% -		if (rpipe->pipe_state & PIPE_EOF) {
% -			revents |= (events & (POLLIN | POLLRDNORM));
% -			if (wpipe->pipe_present != PIPE_ACTIVE ||
% -			    (wpipe->pipe_state & PIPE_EOF))
% -				revents |= POLLHUP;
% +	if (fp->f_flag & FREAD) {
% +		if (events & (POLLIN | POLLRDNORM))
% +			if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% +			    (rpipe->pipe_buffer.cnt > 0))
% +				revents |= events & (POLLIN | POLLRDNORM);
% +

This part of the patch is still unreadable since it indents everything to
add FREAD and FWRITE checks.  I don't like those anyway (they should
probably produce POLLERR POLLNVAL instead of ignoring the error -- see
previous mail), but they are needed for now for compatibility in the
FIFO case (hopefully they don't break compatibility for the plain pipe
case).  To avoid churning this code for them now and later when the
behaviour is fixed, they should be added in a non-invasive way like the
old fifo code did.  That involved using && instead of a nested if.  In
the old code, everything was under the combined && if.  Now there are
3 nested ifs.  This logic change seems to be non-null and give bugs (1).

% +		if (rpipe->pipe_state & PIPE_NAMED)
% +			if (fifo_iseof(fp))

Here the (unrelated) nested if is just a style bug.  It gives verboseness
and extra indentation.

% +				events |= POLLINIGNEOF;
% +

(1) In the old code, POLLINIGNEOF was only set here if we are polling for
some input condition (and this is valid).  Now it is set if the file is
just open for input.  Either way could be right, but this is a change.  If
we are not polling for input, we don't care about it, but POSIX requires
POLLHUP to be set in some cases, and this affects whether we return or
block, and POLLINIGNEOF interacts with this subtly.  There may also be
a problem not going through here to set POLLINIGNEOF in the !FREAD case.

% +		if ((events & POLLINIGNEOF) == 0) {
% +			if (rpipe->pipe_state & PIPE_EOF) {
% +				revents |= (events & (POLLIN | POLLRDNORM));
% +				if (wpipe->pipe_present != PIPE_ACTIVE ||
% +				    (wpipe->pipe_state & PIPE_EOF))
% +					revents |= POLLHUP;
% +			}
%  		}

This is the old sys_pipe.c logic, indented under the FREAD condition.
It is also move up to be inside the FREAD block.  To avoid the
re-indentation, repeat the FREAD test in and && condition.  After
avoiding all re-indentations by repeating FREAD and FWRITE tests,
you don't need to re-order either.

%  	}

This is not consistently verbose in adding extra blank lines.

% +	if (fp->f_flag & FWRITE)
% +		if (events & (POLLOUT | POLLWRNORM))
% +			if (wpipe->pipe_present != PIPE_ACTIVE ||
% +			    (wpipe->pipe_state & PIPE_EOF) || 
% +		        (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
% +		        ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >=
% +				PIPE_BUF || wpipe->pipe_buffer.size == 0)))
% +				revents |= events & (POLLOUT | POLLWRNORM);
%

I don't like excessive braces, but this statement needs unnecessary ones.

It should also be written using &&, not using a nested if.  Since there
is nothing like POLLINIGNEOF to complicate things, only 1 if statement
is needed for FWRITE (until we get to slrecording).

%  	if (revents == 0) {
% -		if (events & (POLLIN | POLLRDNORM)) {
% -			selrecord(td, &rpipe->pipe_sel);
% -			if (SEL_WAITING(&rpipe->pipe_sel))
% -				rpipe->pipe_state |= PIPE_SEL;
% -		}
% +		if (fp->f_flag & FREAD)
% +			if (events & (POLLIN | POLLRDNORM)) {

Use && here.  It already has sufficient braces, at least after this change.

% +				selrecord(td, &rpipe->pipe_sel);
% +				if (SEL_WAITING(&rpipe->pipe_sel))
% +					rpipe->pipe_state |= PIPE_SEL;
% +			}
% 
% -		if (events & (POLLOUT | POLLWRNORM)) {
% -			selrecord(td, &wpipe->pipe_sel);
% -			if (SEL_WAITING(&wpipe->pipe_sel))
% -				wpipe->pipe_state |= PIPE_SEL;
% -		}
% +		if (fp->f_flag & FWRITE)
% +			if (events & (POLLOUT | POLLWRNORM)) {

Use && here, as above.

% +				selrecord(td, &wpipe->pipe_sel);
% +				if (SEL_WAITING(&wpipe->pipe_sel))
% +					wpipe->pipe_state |= PIPE_SEL;
% +			}
%  	}
%  #ifdef MAC
%  locked_error:
% ...
% diff --git a/sys/sys/pipe.h b/sys/sys/pipe.h
% index 20b1092..479223f 100644
% --- a/sys/sys/pipe.h
% +++ b/sys/sys/pipe.h
% ...
% @@ -138,5 +139,9 @@ struct pipepair {
%  #define PIPE_UNLOCK(pipe)	mtx_unlock(PIPE_MTX(pipe))
%  #define PIPE_LOCK_ASSERT(pipe, type)  mtx_assert(PIPE_MTX(pipe), (type))
% 
% +extern struct fileops pipeops;

This should be declared in the existing extern section for data at the
top of the file.  That section has just one declaration.

Old bug: that section has a bogus comment.  It says /* See sys_pipe.c
for info on what these limits mean. */" (actually a block comment to
be verbose about this unimportant and misdescribed point), but there
are no limits here; there is just 1 variable declaration.  There are
many more variables and limits, but the others are all in sys_pipe.c.
This one (maxpipekva) unfortunately has to be declared here so that
subr_param.c can initialize it, but this is an implementation detail
unrelated to the meaning of this variable.  maxpipekva is also
instantiated in a wrong place (in subr_param.c instead of in
sys_pipe.c).

% +
% +int	pipe_fifo_ctor(struct pipe **ppipe, struct thread *td);
% +void	pipe_dtor(struct pipe *dpipe);

These are sorted backwards.  They aren't really even in logical order
(ctor before dtor, which is also alphabetical), because the ctor is
for fifos only but the dtor is generic.

% 
%  #endif /* !_SYS_PIPE_H_ */

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Wed Feb  8 14:51:48 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D50E31065672
	for <freebsd-arch@FreeBSD.org>; Wed,  8 Feb 2012 14:51:48 +0000 (UTC)
	(envelope-from theraven@FreeBSD.org)
Received: from theravensnest.org (theravensnest.org [109.169.23.128])
	by mx1.freebsd.org (Postfix) with ESMTP id 7C6F18FC0A
	for <freebsd-arch@FreeBSD.org>; Wed,  8 Feb 2012 14:51:48 +0000 (UTC)
Received: from [192.168.0.2] (cpc1-cwma8-2-0-cust257.7-3.cable.virginmedia.com
	[82.20.153.2]) (authenticated bits=0)
	by theravensnest.org (8.14.4/8.14.4) with ESMTP id q18EEHfJ009567
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO)
	for <freebsd-arch@freebsd.org>; Wed, 8 Feb 2012 14:14:18 GMT
	(envelope-from theraven@FreeBSD.org)
From: David Chisnall <theraven@FreeBSD.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Wed, 8 Feb 2012 14:14:12 +0000
Message-Id: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Mime-Version: 1.0 (Apple Message framework v1257)
X-Mailer: Apple Mail (2.1257)
Cc: 
Subject: NO_TLS flag for public headers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Feb 2012 14:51:48 -0000

Hi,

The xlocale changes in libc mean that a lot of things that were =
previously global state are now per-thread.  This includes some things =
that were used in ctype.h and are inlined.  We're now losing the benefit =
of inlining them because we now have to call into libc to get the =
per-thread values.  I'd like to use TLS for caching these things, but we =
currently have no equivalent of the NO_TLS flag for public headers.  I'd =
like to have a __NO_TLS that can be set in cdefs.h for architectures =
that don't have TLS support.

Any comments?  Objections?  Anyone want to propose a better place for it =
(or, indeed, add it, since I'm not entirely sure which architectures do =
lack TLS support...)

David=

From owner-freebsd-arch@FreeBSD.ORG  Wed Feb  8 21:14:50 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8BBF01065672;
	Wed,  8 Feb 2012 21:14:50 +0000 (UTC)
	(envelope-from giovanni.trematerra@gmail.com)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id A3A038FC1C;
	Wed,  8 Feb 2012 21:14:49 +0000 (UTC)
Received: by qaea17 with SMTP id a17so790369qae.13
	for <multiple recipients>; Wed, 08 Feb 2012 13:14:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=e2avok8QoI+HF9tv/XTduTAPp5OGNQCPUYWWvaQFVB4=;
	b=mCgbdBJQHvKzWCHaKzSniEGBhNRWLyAz6el/ClooJySqDuM6TEicfKlWZK8fP3bTK2
	bOtx0cnh2/Lvjis7xHJps+xIHd/P27VYNxE9mruCCVJ3XwRnnQLqu2eULzk2ByZDODMn
	9M0VqPjRZkVFGHSNSR44p+NBGJsFFBRIZm3WQ=
MIME-Version: 1.0
Received: by 10.224.179.74 with SMTP id bp10mr3985016qab.89.1328735688924;
	Wed, 08 Feb 2012 13:14:48 -0800 (PST)
Sender: giovanni.trematerra@gmail.com
Received: by 10.229.81.200 with HTTP; Wed, 8 Feb 2012 13:14:48 -0800 (PST)
In-Reply-To: <CACfq091eHpvCbwuU-WOxiyt92YiTbRbWnPmNGGMDidzG=uq4uw@mail.gmail.com>
References: <CACfq090+EDVjj_rRNyafiod_BBzGUzmQ9CjApZLcXWR-Ax=0uQ@mail.gmail.com>
	<20120208151251.J1853@besplex.bde.org>
	<CACfq091eHpvCbwuU-WOxiyt92YiTbRbWnPmNGGMDidzG=uq4uw@mail.gmail.com>
Date: Wed, 8 Feb 2012 22:14:48 +0100
X-Google-Sender-Auth: UzE94AUxdhaqicn8iEnIyNsNrjA
Message-ID: <CACfq092eJHEfR3uKmCobxpS0WiM87ZH9isBMFaW18AC_-Xg6HA@mail.gmail.com>
From: Giovanni Trematerra <gianni@freebsd.org>
To: freebsd-arch@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Kip Macy <kmacy@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, bde@freebsd.org, jilles@freebsd.org
Subject: Fwd: [LAST ROUND] fifo/pipe merge code
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Feb 2012 21:14:50 -0000

On Wed, Feb 8, 2012 at 6:49 AM, Bruce Evans <brde@optusnet.com.au> wrote:
> On Tue, 7 Feb 2012, Giovanni Trematerra wrote:
>
>
> It looks quite good now. =A0I see only a few minor style bugs:

Hi Bruce,
Thank you to have found time to review the patch again.

I updated the patch and I hope I fixed all the minor style bugs
that you pointed out.

here the new one:
http://www.trematerra.net/patches/pipefifo_merge.4.4.patch

>
> % diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c
> % index 94a2713..0d35877 100644
> % --- a/sys/fs/fifofs/fifo_vnops.c
> % +++ b/sys/fs/fifofs/fifo_vnops.c
> % ...
> % +int
> % +fifo_iseof(struct file *fp)
> % ...
> % + =A0 =A0 KASSERT(fp->f_vnode !=3D NULL, ("fifo_iseof: no vnode info.")=
);
> % + =A0 =A0 KASSERT(fp->f_vnode->v_fifoinfo !=3D NULL, ("fifo_iseof: no
> fifoinfo."));
>
>
> I wonder if you can assert that some suitable lock is held here.
>
> % + =A0 =A0 fip =3D fp->f_vnode->v_fifoinfo;
> % + =A0 =A0 return (fp->f_seqcount =3D=3D fip->fi_wgen);
> % =A0}
> % ...

Now I changed fifo_open to remove global fifo_mtx mutex, using PIPE_MTX,
and assert to be held in fifo_iseof.
I stress tested the patch with WITNESS and INVARIANT on and got no a panic
and no LOR.

>
> % @@ -1284,6 +1348,17 @@ pipe_ioctl(fp, cmd, data, active_cred, td)
> % =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
> % % =A0 =A0 case FIONREAD:
> % +
> % + =A0 =A0 =A0 =A0 =A0 =A0 /*
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0* FIONREAD will return 0 for non-readable =
descriptors, and
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0* the results of FIONREAD on the read pipe=
 for readable
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0* descriptors.
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0*/
> % + =A0 =A0 =A0 =A0 =A0 =A0 if (!(fp->f_flag & FREAD)) {
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *(int *)data =3D 0;
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 PIPE_UNLOCK(mpipe);
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (0);
> % + =A0 =A0 =A0 =A0 =A0 =A0 }
> % =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (mpipe->pipe_state & PIPE_DIRECTW)
> % =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *(int *)data =3D mpipe->pip=
e_map.cnt;
> % =A0 =A0 =A0 =A0 =A0 =A0 =A0 else

I completely removed that comment now. The code is clear enough.

>
> Urk, now I see many major style bugs:
>
> % @@ -1333,47 +1408,55 @@ pipe_poll(fp, events, active_cred, td)
> % =A0 =A0 =A0 int error;
> % =A0#endif
> % % - =A0 wpipe =3D rpipe->pipe_peer;
> % + =A0 =A0 wpipe =3D PIPE_PEER(rpipe);
> % =A0 =A0 =A0 PIPE_LOCK(rpipe);
> % =A0#ifdef MAC
> % =A0 =A0 =A0 error =3D mac_pipe_check_poll(active_cred, rpipe->pipe_pair=
);
> % =A0 =A0 =A0 if (error)
> % =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto locked_error;
> % =A0#endif
> % - =A0 =A0 if (events & (POLLIN | POLLRDNORM))
> % - =A0 =A0 =A0 =A0 =A0 =A0 if ((rpipe->pipe_state & PIPE_DIRECTW) ||
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (rpipe->pipe_buffer.cnt > 0))
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D events & (POLLIN=
 | POLLRDNORM);
> % -
> % - =A0 =A0 if (events & (POLLOUT | POLLWRNORM))
> % - =A0 =A0 =A0 =A0 =A0 =A0 if (wpipe->pipe_present !=3D PIPE_ACTIVE ||
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (wpipe->pipe_state & PIPE_EOF) ||
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (((wpipe->pipe_state & PIPE_DIRECTW) =
=3D=3D 0) &&
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0((wpipe->pipe_buffer.size - wpipe-=
>pipe_buffer.cnt) >=3D
> PIPE_BUF ||
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0wpipe->pipe_buffer.size =
=3D=3D 0)))
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D events & (POLLOU=
T | POLLWRNORM);
> % -
> % - =A0 =A0 if ((events & POLLINIGNEOF) =3D=3D 0) {
> % - =A0 =A0 =A0 =A0 =A0 =A0 if (rpipe->pipe_state & PIPE_EOF) {
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D (events & (POLLI=
N | POLLRDNORM));
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (wpipe->pipe_present !=3D =
PIPE_ACTIVE ||
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (wpipe->pipe_state & =
PIPE_EOF))
> % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D =
POLLHUP;
> % + =A0 =A0 if (fp->f_flag & FREAD) {
> % + =A0 =A0 =A0 =A0 =A0 =A0 if (events & (POLLIN | POLLRDNORM))
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if ((rpipe->pipe_state & PIPE=
_DIRECTW) ||
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (rpipe->pipe_buffer.c=
nt > 0))
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D =
events & (POLLIN | POLLRDNORM);
> % +
>
> This part of the patch is still unreadable since it indents everything to
> add FREAD and FWRITE checks. =A0I don't like those anyway (they should
> probably produce POLLERR POLLNVAL instead of ignoring the error -- see
> previous mail), but they are needed for now for compatibility in the
> FIFO case (hopefully they don't break compatibility for the plain pipe
> case).

I tried to make this part of the patch more readable, I hope I did.
Talking about returning POLLERR/POLLNVAL as I said previously,
I like to work on that and on making pipe/fifo implementation more POSIX
compliant just after the patch will be merge the tree.


> To avoid churning this code for them now and later when the
> behaviour is fixed, they should be added in a non-invasive way like the
> old fifo code did. =A0That involved using && instead of a nested if. =A0I=
n
> the old code, everything was under the combined && if. =A0Now there are
> 3 nested ifs. =A0This logic change seems to be non-null and give bugs (1)=
.
>
> % + =A0 =A0 =A0 =A0 =A0 =A0 if (rpipe->pipe_state & PIPE_NAMED)
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (fifo_iseof(fp))
>
> Here the (unrelated) nested if is just a style bug. =A0It gives verbosene=
ss
> and extra indentation.
>
> % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 events |=3D P=
OLLINIGNEOF;
> % +
>
> (1) In the old code, POLLINIGNEOF was only set here if we are polling for
> some input condition (and this is valid). =A0Now it is set if the file is
> just open for input. =A0Either way could be right, but this is a change. =
=A0If
> we are not polling for input, we don't care about it, but POSIX requires
> POLLHUP to be set in some cases, and this affects whether we return or
> block, and POLLINIGNEOF interacts with this subtly. =A0There may also be
> a problem not going through here to set POLLINIGNEOF in the !FREAD case.

Fixed. Thank you.

While I hope I didn't introduced new bugs, do you agree that at this point
the patch is in such a shape that makes it committable?

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG  Wed Feb  8 21:51:24 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6D3F01065673
	for <freebsd-arch@freebsd.org>; Wed,  8 Feb 2012 21:51:24 +0000 (UTC)
	(envelope-from marius@alchemy.franken.de)
Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214])
	by mx1.freebsd.org (Postfix) with ESMTP id 06B1B8FC1E
	for <freebsd-arch@freebsd.org>; Wed,  8 Feb 2012 21:51:23 +0000 (UTC)
Received: from alchemy.franken.de (localhost [127.0.0.1])
	by alchemy.franken.de (8.14.4/8.14.4/ALCHEMY.FRANKEN.DE) with ESMTP id
	q18LpMQq028781; Wed, 8 Feb 2012 22:51:22 +0100 (CET)
	(envelope-from marius@alchemy.franken.de)
Received: (from marius@localhost)
	by alchemy.franken.de (8.14.4/8.14.4/Submit) id q18LpMrw028780;
	Wed, 8 Feb 2012 22:51:22 +0100 (CET) (envelope-from marius)
Date: Wed, 8 Feb 2012 22:51:22 +0100
From: Marius Strobl <marius@alchemy.franken.de>
To: David Chisnall <theraven@freebsd.org>
Message-ID: <20120208215122.GA28769@alchemy.franken.de>
References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-arch@freebsd.org
Subject: Re: NO_TLS flag for public headers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Feb 2012 21:51:24 -0000

On Wed, Feb 08, 2012 at 02:14:12PM +0000, David Chisnall wrote:
> Hi,
> 
> The xlocale changes in libc mean that a lot of things that were previously global state are now per-thread.  This includes some things that were used in ctype.h and are inlined.  We're now losing the benefit of inlining them because we now have to call into libc to get the per-thread values.  I'd like to use TLS for caching these things, but we currently have no equivalent of the NO_TLS flag for public headers.  I'd like to have a __NO_TLS that can be set in cdefs.h for architectures that don't have TLS support.
> 
> Any comments?  Objections?  Anyone want to propose a better place for it (or, indeed, add it, since I'm not entirely sure which architectures do lack TLS support...)
> 

See lib/libc/stdlib/malloc.c, arm and mips currently are the only
supported FreeBSD architectures that have no support for TLS.

Marius


From owner-freebsd-arch@FreeBSD.ORG  Fri Feb 10 10:57:28 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1552C1065670;
	Fri, 10 Feb 2012 10:57:28 +0000 (UTC) (envelope-from ed@hoeg.nl)
Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa])
	by mx1.freebsd.org (Postfix) with ESMTP id A07C68FC1C;
	Fri, 10 Feb 2012 10:57:27 +0000 (UTC)
Received: by mx0.hoeg.nl (Postfix, from userid 1000)
	id D48202A28CCE; Fri, 10 Feb 2012 11:57:26 +0100 (CET)
Date: Fri, 10 Feb 2012 11:57:26 +0100
From: Ed Schouten <ed@80386.nl>
To: Marius Strobl <marius@alchemy.franken.de>
Message-ID: <20120210105726.GO1860@hoeg.nl>
References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org>
	<20120208215122.GA28769@alchemy.franken.de>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="TiGNc3ldf20CNpkJ"
Content-Disposition: inline
In-Reply-To: <20120208215122.GA28769@alchemy.franken.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: David Chisnall <theraven@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: NO_TLS flag for public headers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Feb 2012 10:57:28 -0000


--TiGNc3ldf20CNpkJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hi Marius,

* Marius Strobl <marius@alchemy.franken.de>, 20120208 22:51:
> See lib/libc/stdlib/malloc.c, arm and mips currently are the only
> supported FreeBSD architectures that have no support for TLS.

Just out of curiosity, what is needed to make TLS work? Is it just the
lack of support by our toolchain or is there also stuff on our side that
needs to be done?

--=20
 Ed Schouten <ed@80386.nl>
 WWW: http://80386.nl/

--TiGNc3ldf20CNpkJ
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iQIcBAEBAgAGBQJPNPgWAAoJEG5e2P40kaK7r1cQAI4cak0Q33LdRJXiMzmgx9iC
7CYBf+MNqPcw7ZtrYcVO7xfISvBXWw10h2mZ4K8w8DEIarIM4bOOO9mpGtQ7PSg1
qCF9W46rgimNTeE2RpQT3tlCnCo8/KB31DWXrG8gnS5kGfdSv3MFDg+TqQLGxL5F
+ot0Dsz2E2iI1MObRn1VudaXYJZmFVOiattZko87yXUD3Go9PXmd9hxs6SHZ+YyR
OKqp+H1djas+OacKtR4ktz2OI8B5Tj684CnnUDsFY3aTopYKEndYTTFXIEZqs60W
iOIZiK8EsRxorpwszf5NvPPb9GQ2FFSh1ego+9nwQb4GiPPG4l5Mcy87aQx+azfd
CWK4PAnGROgTm6y5kdO+IweftGZPR0Oq+1TThOV15I8AtQLa8gruYFVEsoZilmSc
ZLrVVJXfDMRH1wJV/wft8NSio51Q/vr5TDCtoo+VfFIVjF73zTGtzVGjO0vFR6q8
MBWxgGpOVTv80jI5NWESnk9y4jRIm0DRSUoEOI+EJP+HqE/lsZGWnvY7xxgwyOUl
LfpSIuY/0jGEt9fiC2mVFcGQT081u9bnAw2Tp1Eue+E3w2r3exXeSB48iVb1fESY
O560D9Cy8BnEp9B+Tnx/MUh1LjsMHWDV2WSW6KEvEY0tVNR8LjE4sCz/uKkpK+Ln
x2q93Fq08BTfdeoF05bC
=qdkC
-----END PGP SIGNATURE-----

--TiGNc3ldf20CNpkJ--

From owner-freebsd-arch@FreeBSD.ORG  Fri Feb 10 11:42:55 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8B523106566C
	for <freebsd-arch@freebsd.org>; Fri, 10 Feb 2012 11:42:55 +0000 (UTC)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: from lor.one-eyed-alien.net (lor.one-eyed-alien.net [69.66.77.232])
	by mx1.freebsd.org (Postfix) with ESMTP id 4FDD18FC12
	for <freebsd-arch@freebsd.org>; Fri, 10 Feb 2012 11:42:54 +0000 (UTC)
Received: from lor.one-eyed-alien.net (localhost [127.0.0.1])
	by lor.one-eyed-alien.net (8.14.4/8.14.4) with ESMTP id q1AB2i6Q002411; 
	Fri, 10 Feb 2012 05:02:44 -0600 (CST)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: (from brooks@localhost)
	by lor.one-eyed-alien.net (8.14.4/8.14.4/Submit) id q1AB2h2u002410;
	Fri, 10 Feb 2012 05:02:43 -0600 (CST) (envelope-from brooks)
Date: Fri, 10 Feb 2012 05:02:43 -0600
From: Brooks Davis <brooks@freebsd.org>
To: Aleksandr Rybalko <ray@ddteam.net>
Message-ID: <20120210110243.GA2012@lor.one-eyed-alien.net>
References: <20120131213857.86c81626.ray@ddteam.net>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="xHFwDpU9dbj6ez1V"
Content-Disposition: inline
In-Reply-To: <20120131213857.86c81626.ray@ddteam.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-arch@freebsd.org
Subject: Re: dynamic attach of hinted devices
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Feb 2012 11:42:55 -0000


--xHFwDpU9dbj6ez1V
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 31, 2012 at 09:38:57PM +0200, Aleksandr Rybalko wrote:
> Hi FreeBSD hackers,
>=20
> at first I want to say this: :)
> WARNING: FOLLOWING DEVCTL PATCH MAY EASILY PANIC YOUR SYSTEM, PLEASE
> DO NOT TRY IT ON PRODUCTION SERVERS AND TRY IT WITH FILESYSTEMS MOUNTED
> AS READONLY :)))))
>=20
> So I introduce two patches first one [1] used to migrate from
> static_hints or hints in the static_kenv to dynamic hints.
>=20
> sysctl kern.hintmode=3D2 will copy hints from static hints or from static
> kenv and put it into dynamic kenv. Those will allow to manipulate hints
> values and attach hinted devices with devctl tool.
>=20
> Second [2] allow attach/detach devices with userland tool devctl.
>=20
> devctl tool allow add and initialize new devices which is not able to
> be autoenumerating, such a hinted devices.
>=20
> Both designed to have ability update EEPROM items in runtime, since
> some device can't work in mode when it accessible like a EEPROM chip.
>=20
> Example:
> # sysctl kern.hintmode=3D2
> # kenv hint.mx25l.0.at=3D"spibus0"
> # kenv hint.mx25l.0.cs=3D0
> # kenv hint.mx25l.0.chipname=3D"at25128"
> # devctl hinted spibus 0 mx25l 0
> mx25l0: <M25Pxx Flash Family> at cs 0 mode 0 on spibus0
> mx25l0: at25128, sector 64 bytes, 256 sectors
> GEOM: new disk flash/spi0
>=20
> Someone may found it also useful for testing device attach/detach code
> (memory leaks, resource allocation, etc).
>=20
> So, say me please your opinion.

I skimmed over the patch and the concept looks good to me.  I'm probably
not the right person to review this proposal in detail, but perhaps John
Baldwin (cc'd) could do it or suggest someone.

-- Brooks

--xHFwDpU9dbj6ez1V
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iD8DBQFPNPlSXY6L6fI4GtQRAvOlAJ9RPlUxogB3GHyiQaOSfxocY8flBQCglN3k
e7C3GpmwOACndMYqUW5O8Cg=
=IsAC
-----END PGP SIGNATURE-----

--xHFwDpU9dbj6ez1V--

From owner-freebsd-arch@FreeBSD.ORG  Fri Feb 10 16:52:44 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2E88A1065675;
	Fri, 10 Feb 2012 16:52:44 +0000 (UTC) (envelope-from imp@bsdimp.com)
Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85])
	by mx1.freebsd.org (Postfix) with ESMTP id BB91B8FC12;
	Fri, 10 Feb 2012 16:52:43 +0000 (UTC)
Received: from [10.30.101.53] ([209.117.142.2]) (authenticated bits=0)
	by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q1AGjrAQ045973
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO);
	Fri, 10 Feb 2012 09:45:56 -0700 (MST) (envelope-from imp@bsdimp.com)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <20120210110243.GA2012@lor.one-eyed-alien.net>
Date: Fri, 10 Feb 2012 09:45:48 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <4996D1CF-25E5-4422-996C-DF23B5C10732@bsdimp.com>
References: <20120131213857.86c81626.ray@ddteam.net>
	<20120210110243.GA2012@lor.one-eyed-alien.net>
To: Brooks Davis <brooks@FreeBSD.org>
X-Mailer: Apple Mail (2.1084)
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(harmony.bsdimp.com [10.0.0.6]);
	Fri, 10 Feb 2012 09:45:57 -0700 (MST)
Cc: Aleksandr Rybalko <ray@ddteam.net>, freebsd-arch@FreeBSD.org
Subject: Re: dynamic attach of hinted devices
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Feb 2012 16:52:44 -0000


On Feb 10, 2012, at 4:02 AM, Brooks Davis wrote:

> On Tue, Jan 31, 2012 at 09:38:57PM +0200, Aleksandr Rybalko wrote:
>> Hi FreeBSD hackers,
>>=20
>> at first I want to say this: :)
>> WARNING: FOLLOWING DEVCTL PATCH MAY EASILY PANIC YOUR SYSTEM, PLEASE
>> DO NOT TRY IT ON PRODUCTION SERVERS AND TRY IT WITH FILESYSTEMS =
MOUNTED
>> AS READONLY :)))))
>>=20
>> So I introduce two patches first one [1] used to migrate from
>> static_hints or hints in the static_kenv to dynamic hints.
>>=20
>> sysctl kern.hintmode=3D2 will copy hints from static hints or from =
static
>> kenv and put it into dynamic kenv. Those will allow to manipulate =
hints
>> values and attach hinted devices with devctl tool.
>>=20
>> Second [2] allow attach/detach devices with userland tool devctl.
>>=20
>> devctl tool allow add and initialize new devices which is not able to
>> be autoenumerating, such a hinted devices.
>>=20
>> Both designed to have ability update EEPROM items in runtime, since
>> some device can't work in mode when it accessible like a EEPROM chip.
>>=20
>> Example:
>> # sysctl kern.hintmode=3D2
>> # kenv hint.mx25l.0.at=3D"spibus0"
>> # kenv hint.mx25l.0.cs=3D0
>> # kenv hint.mx25l.0.chipname=3D"at25128"
>> # devctl hinted spibus 0 mx25l 0
>> mx25l0: <M25Pxx Flash Family> at cs 0 mode 0 on spibus0
>> mx25l0: at25128, sector 64 bytes, 256 sectors
>> GEOM: new disk flash/spi0
>>=20
>> Someone may found it also useful for testing device attach/detach =
code
>> (memory leaks, resource allocation, etc).
>>=20
>> So, say me please your opinion.
>=20
> I skimmed over the patch and the concept looks good to me.  I'm =
probably
> not the right person to review this proposal in detail, but perhaps =
John
> Baldwin (cc'd) could do it or suggest someone.

I've looked at it and it looks OK, but I had lots of feedback I didn't =
have time to write up.  I like the concept, but have lots of suggestions =
for improvement since I'd like to see this more generically done.

Warner=

From owner-freebsd-arch@FreeBSD.ORG  Fri Feb 10 16:52:54 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6B9DD106566C;
	Fri, 10 Feb 2012 16:52:54 +0000 (UTC) (envelope-from imp@bsdimp.com)
Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85])
	by mx1.freebsd.org (Postfix) with ESMTP id 23F2E8FC19;
	Fri, 10 Feb 2012 16:52:54 +0000 (UTC)
Received: from [10.30.101.53] ([209.117.142.2]) (authenticated bits=0)
	by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q1AGiOm9045940
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO);
	Fri, 10 Feb 2012 09:44:26 -0700 (MST) (envelope-from imp@bsdimp.com)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <20120210105726.GO1860@hoeg.nl>
Date: Fri, 10 Feb 2012 09:44:16 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <D5C04950-6F94-49F6-8613-A8B9684A63A3@bsdimp.com>
References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org>
	<20120208215122.GA28769@alchemy.franken.de>
	<20120210105726.GO1860@hoeg.nl>
To: Ed Schouten <ed@80386.nl>
X-Mailer: Apple Mail (2.1084)
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(harmony.bsdimp.com [10.0.0.6]);
	Fri, 10 Feb 2012 09:44:26 -0700 (MST)
Cc: freebsd-arch@FreeBSD.org, David Chisnall <theraven@FreeBSD.org>,
	Marius Strobl <marius@alchemy.franken.de>
Subject: Re: NO_TLS flag for public headers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Feb 2012 16:52:54 -0000


On Feb 10, 2012, at 3:57 AM, Ed Schouten wrote:

> Hi Marius,
>=20
> * Marius Strobl <marius@alchemy.franken.de>, 20120208 22:51:
>> See lib/libc/stdlib/malloc.c, arm and mips currently are the only
>> supported FreeBSD architectures that have no support for TLS.
>=20
> Just out of curiosity, what is needed to make TLS work? Is it just the
> lack of support by our toolchain or is there also stuff on our side =
that
> needs to be done?

On MIPS it is partially a toolchain issue and partially a kernel issue.  =
I believe the kernel issue has been fixed recently...  I'm not sure =
what's left on the toolchain side.

Warner


From owner-freebsd-arch@FreeBSD.ORG  Fri Feb 10 17:36:48 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F15C21065670
	for <freebsd-arch@freebsd.org>; Fri, 10 Feb 2012 17:36:48 +0000 (UTC)
	(envelope-from gonzo@hq.bluezbox.com)
Received: from hq.bluezbox.com (hq.bluezbox.com [70.38.37.145])
	by mx1.freebsd.org (Postfix) with ESMTP id ACE168FC0A
	for <freebsd-arch@freebsd.org>; Fri, 10 Feb 2012 17:36:48 +0000 (UTC)
Received: from [24.87.53.93] (helo=[192.168.1.116])
	by hq.bluezbox.com with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.73 (FreeBSD)) (envelope-from <gonzo@hq.bluezbox.com>)
	id 1RvuOj-000MDI-2H; Fri, 10 Feb 2012 09:36:40 -0800
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Oleksandr Tymoshenko <gonzo@bluezbox.com>
In-Reply-To: <20120210105726.GO1860@hoeg.nl>
Date: Fri, 10 Feb 2012 09:36:40 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <94EAEADD-440A-4148-B9CC-B7BED1BCF206@bluezbox.com>
References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org>
	<20120208215122.GA28769@alchemy.franken.de>
	<20120210105726.GO1860@hoeg.nl>
To: Ed Schouten <ed@80386.nl>
X-Mailer: Apple Mail (2.1084)
Sender: gonzo@hq.bluezbox.com
X-Spam-Level: ---
X-Spam-Report: Spam detection software, running on the system "hq.bluezbox.com",
	has
	identified this incoming email as possible spam. The original message
	has been attached to this so you can view it (if it isn't spam) or
	label similar future email.  If you have any questions, see
	The administrator of that system for details.
	Content preview:  On 2012-02-10, at 2:57 AM,
	Ed Schouten wrote: > Hi Marius, 
	> > * Marius Strobl <marius@alchemy.franken.de>,
	20120208 22:51: >> See lib/libc/stdlib/malloc.c, 
	arm and mips currently are the only >> supported FreeBSD architectures
	that have no support for TLS. > > Just out of curiosity,
	what is needed to make
	TLS work? Is it just the > lack of support by our toolchain or is there
	also stuff on our side that > needs to be done? [...] 
	Content analysis details:   (-3.8 points, 5.0 required)
	pts rule name              description
	---- ----------------------
	--------------------------------------------------
	-1.8 ALL_TRUSTED Passed through trusted hosts only via SMTP
	-2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
	[score: 0.0000]
	0.6 AWL AWL: From: address is in the auto white-list
Cc: freebsd-arch@freebsd.org, David Chisnall <theraven@freebsd.org>,
	Marius Strobl <marius@alchemy.franken.de>
Subject: Re: NO_TLS flag for public headers
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Feb 2012 17:36:49 -0000


On 2012-02-10, at 2:57 AM, Ed Schouten wrote:

> Hi Marius,
>=20
> * Marius Strobl <marius@alchemy.franken.de>, 20120208 22:51:
>> See lib/libc/stdlib/malloc.c, arm and mips currently are the only
>> supported FreeBSD architectures that have no support for TLS.
>=20
> Just out of curiosity, what is needed to make TLS work? Is it just the
> lack of support by our toolchain or is there also stuff on our side =
that
> needs to be done?

There are several places that should be fixed and in sync in order to =
get TLS=20
working for architecture: rtld-elf, libthr, cpu_set_user_tls in kernel =
and
thread pointer handling in toolchain. Look at my commits as of Feb 9 - =
they cover
all of these but toolchain part.

TLS on MIPS is fixed for the simplest test case. TLS data in dynamic =
library=20
might need some more work for rtld, but not much. I'll take a look at =
ARM when I have=20
working root over NFS on my ARM board. That is if nobody step up and fix =
it soon :)=20=

From owner-freebsd-arch@FreeBSD.ORG  Sat Feb 11 05:23:23 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 085AF1065672;
	Sat, 11 Feb 2012 05:23:23 +0000 (UTC)
	(envelope-from adrian.chadd@gmail.com)
Received: from mail-wi0-f182.google.com (mail-wi0-f182.google.com
	[209.85.212.182])
	by mx1.freebsd.org (Postfix) with ESMTP id E7D938FC08;
	Sat, 11 Feb 2012 05:23:21 +0000 (UTC)
Received: by wibhn14 with SMTP id hn14so3674870wib.13
	for <multiple recipients>; Fri, 10 Feb 2012 21:23:21 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=fcSuGSJmK4p7jpl/F6OtDjJAHb04Ra3FiSf5KkS9jQY=;
	b=OLFsO0agE45F2uNtjeJHStn0oBtR6C9trDV4MYgV2rcS9hXwQQcqMwvxHN1XZnH1Qj
	t8bf78x+qQSa6wmXWwKX3X+9zSmtK2uzeiKfcfFGOH/6ooOG1l/vWZarQqvuXxM1f3N8
	NjLGxvodNkhDtkUr2A2Qf+TFy5Hv+sKU8Unys=
MIME-Version: 1.0
Received: by 10.180.78.6 with SMTP id x6mr7056903wiw.18.1328937801101; Fri, 10
	Feb 2012 21:23:21 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.216.175.136 with HTTP; Fri, 10 Feb 2012 21:23:21 -0800 (PST)
In-Reply-To: <AF2CF7A4-27B8-4181-96F5-7998B126CD1C@lassitu.de>
References: <47ABA638-7E08-4350-A03C-3D4A23BF2D7E@lassitu.de>
	<1763C3FF-1EA0-4DC0-891D-63816EBF4A04@lassitu.de>
	<20120106182756.GA88161@alchemy.franken.de>
	<95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de>
	<20120106214741.GB88161@alchemy.franken.de>
	<F60B2B70-049F-4497-BBA8-3C421088C1EA@lassitu.de>
	<20120108130039.GG88161@alchemy.franken.de>
	<23477898-8D85-498C-8E30-192810BD68A8@lassitu.de>
	<20120111193738.GB44286@alchemy.franken.de>
	<66DDA0A2-F878-43FF-8824-54868F493B18@lassitu.de>
	<20120125221753.GA17821@alchemy.franken.de>
	<AF2CF7A4-27B8-4181-96F5-7998B126CD1C@lassitu.de>
Date: Fri, 10 Feb 2012 21:23:21 -0800
X-Google-Sender-Auth: -7V-EWjq2a_5Qz89jtAqighUzrA
Message-ID: <CAJ-VmomcgC6V-sY7jN+h6T7uPfVesPBV+KPu2TVD4YDKrdk4LQ@mail.gmail.com>
From: Adrian Chadd <adrian@freebsd.org>
To: Stefan Bethke <stb@lassitu.de>, Aleksandr Rybalko <ray@freebsd.org>,
	"M. Warner Losh" <imp@bsdimp.com>, Juli Mallett <jmallett@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: FreeBSD-arch <freebsd-arch@freebsd.org>,
	Marius Strobl <marius@alchemy.franken.de>
Subject: Re: Extending sys/dev/mii
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 11 Feb 2012 05:23:23 -0000

Hi all,

So where'd we get to with this?

I'd like to finalise a unified proposal for this.

I still like the idea of tidying up the miibus/mdiobus stuff (ie,
miibus really is an mdiobus for speaking to things, along with some
methods to do MII stuff like media change, media set, MAC PLL/type
set, etc) but I agree with ray@ that things begin to look a _lot_ more
complicated when we start trying to handle alternate methods of how
switches are connected.

So I'd like to not lose interest on this.

If we can't come to some kind of consensus, I'll just commit ray@'s
work (and cop the flak) then work with stb@ to tidy up the newbus bits
to hopefully be better for the long term.

In summary - I'm fed up that we're this close to having _something_
that looks like a workable switch API and working code but it's not in
the tree. So I'm happy stirring up trouble and copping the flak from
things if it means it Just Gets Done.


Adrian

From owner-freebsd-arch@FreeBSD.ORG  Sat Feb 11 11:17:37 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ED96D1065672;
	Sat, 11 Feb 2012 11:17:37 +0000 (UTC)
	(envelope-from marius@alchemy.franken.de)
Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214])
	by mx1.freebsd.org (Postfix) with ESMTP id 64E348FC15;
	Sat, 11 Feb 2012 11:17:37 +0000 (UTC)
Received: from alchemy.franken.de (localhost [127.0.0.1])
	by alchemy.franken.de (8.14.4/8.14.4/ALCHEMY.FRANKEN.DE) with ESMTP id
	q1BBHV6T050544; Sat, 11 Feb 2012 12:17:31 +0100 (CET)
	(envelope-from marius@alchemy.franken.de)
Received: (from marius@localhost)
	by alchemy.franken.de (8.14.4/8.14.4/Submit) id q1BBHVmK050543;
	Sat, 11 Feb 2012 12:17:31 +0100 (CET) (envelope-from marius)
Date: Sat, 11 Feb 2012 12:17:31 +0100
From: Marius Strobl <marius@alchemy.franken.de>
To: Adrian Chadd <adrian@freebsd.org>
Message-ID: <20120211111731.GE39861@alchemy.franken.de>
References: <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de>
	<20120106214741.GB88161@alchemy.franken.de>
	<F60B2B70-049F-4497-BBA8-3C421088C1EA@lassitu.de>
	<20120108130039.GG88161@alchemy.franken.de>
	<23477898-8D85-498C-8E30-192810BD68A8@lassitu.de>
	<20120111193738.GB44286@alchemy.franken.de>
	<66DDA0A2-F878-43FF-8824-54868F493B18@lassitu.de>
	<20120125221753.GA17821@alchemy.franken.de>
	<AF2CF7A4-27B8-4181-96F5-7998B126CD1C@lassitu.de>
	<CAJ-VmomcgC6V-sY7jN+h6T7uPfVesPBV+KPu2TVD4YDKrdk4LQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAJ-VmomcgC6V-sY7jN+h6T7uPfVesPBV+KPu2TVD4YDKrdk4LQ@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
Cc: Juli Mallett <jmallett@freebsd.org>, Aleksandr Rybalko <ray@freebsd.org>,
	Stefan Bethke <stb@lassitu.de>, FreeBSD-arch <freebsd-arch@freebsd.org>
Subject: Re: Extending sys/dev/mii
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 11 Feb 2012 11:17:38 -0000

On Fri, Feb 10, 2012 at 09:23:21PM -0800, Adrian Chadd wrote:
> Hi all,
> 
> So where'd we get to with this?
> 
> I'd like to finalise a unified proposal for this.
> 
> I still like the idea of tidying up the miibus/mdiobus stuff (ie,
> miibus really is an mdiobus for speaking to things, along with some
> methods to do MII stuff like media change, media set, MAC PLL/type
> set, etc) but I agree with ray@ that things begin to look a _lot_ more
> complicated when we start trying to handle alternate methods of how
> switches are connected.
> 
> So I'd like to not lose interest on this.
> 
> If we can't come to some kind of consensus, I'll just commit ray@'s
> work (and cop the flak) then work with stb@ to tidy up the newbus bits
> to hopefully be better for the long term.
> 
> In summary - I'm fed up that we're this close to having _something_
> that looks like a workable switch API and working code but it's not in
> the tree. So I'm happy stirring up trouble and copping the flak from
> things if it means it Just Gets Done.
> 

I haven't seen ray@'s work (where is it?) but the general approach
sounds backwards to me. As you say the whole picture of how switches
are connected in reality is complicated. Therefore I'd like to see a
proposal for a framework first that can handle the various ways of
taking to a switch (GPIO, I2C, MDIO etc or combinations thereof)
separately from a MAC (as there may be no MAC associated with the
switch to begin with). The stuff proposed so far (again, I haven't
looked at ray@'s current work) only dealt with the more or less
low hanging fruit in that picture with discussions how we may hack
some more scenarios into working. Getting _something_ in at this
stage just for the sake that there's something in the tree really
asks for one of two typical things happening:
o it sticks as-is forever as nobody really wants to do the work
  to get it right and in the end the supposedly generic framework
  is ignored and people implement their own local stuff to get what
  they need, or
o there actually are some brave souls working on getting it right
  over time but requiring API and ABI breakages over and over again
  to get there.

Marius


From owner-freebsd-arch@FreeBSD.ORG  Sat Feb 11 12:46:25 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0BABE106566C;
	Sat, 11 Feb 2012 12:46:25 +0000 (UTC) (envelope-from ray@ddteam.net)
Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 1276F8FC0A;
	Sat, 11 Feb 2012 12:46:23 +0000 (UTC)
Received: by eekb47 with SMTP id b47so1427427eek.13
	for <multiple recipients>; Sat, 11 Feb 2012 04:46:23 -0800 (PST)
Received: by 10.213.31.208 with SMTP id z16mr945102ebc.47.1328964382860;
	Sat, 11 Feb 2012 04:46:22 -0800 (PST)
Received: from rnote.ddteam.net (85-184-133-95.pool.ukrtel.net.
	[95.133.184.85])
	by mx.google.com with ESMTPS id v51sm34903556eef.2.2012.02.11.04.46.18
	(version=SSLv3 cipher=OTHER); Sat, 11 Feb 2012 04:46:21 -0800 (PST)
Date: Sat, 11 Feb 2012 14:45:44 +0200
From: Aleksandr Rybalko <ray@ddteam.net>
To: Marius Strobl <marius@alchemy.franken.de>
Message-Id: <20120211144544.c91701d9.ray@ddteam.net>
In-Reply-To: <20120211111731.GE39861@alchemy.franken.de>
References: <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de>
	<20120106214741.GB88161@alchemy.franken.de>
	<F60B2B70-049F-4497-BBA8-3C421088C1EA@lassitu.de>
	<20120108130039.GG88161@alchemy.franken.de>
	<23477898-8D85-498C-8E30-192810BD68A8@lassitu.de>
	<20120111193738.GB44286@alchemy.franken.de>
	<66DDA0A2-F878-43FF-8824-54868F493B18@lassitu.de>
	<20120125221753.GA17821@alchemy.franken.de>
	<AF2CF7A4-27B8-4181-96F5-7998B126CD1C@lassitu.de>
	<CAJ-VmomcgC6V-sY7jN+h6T7uPfVesPBV+KPu2TVD4YDKrdk4LQ@mail.gmail.com>
	<20120211111731.GE39861@alchemy.franken.de>
X-Mailer: Sylpheed 3.1.2 (GTK+ 2.24.5; amd64-portbld-freebsd9.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Gm-Message-State: ALoCoQlOyQVFBf+IcEOh8w4ZoETADHmn4Rda+ARIIVlwQyVt4LzmQFtQRcniQMVJbe8XmdJjtYiS
Cc: Juli Mallett <jmallett@freebsd.org>, Adrian Chadd <adrian@freebsd.org>,
	FreeBSD-arch <freebsd-arch@freebsd.org>,
	Aleksandr Rybalko <ray@freebsd.org>, Stefan Bethke <stb@lassitu.de>
Subject: Re: Extending sys/dev/mii
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 11 Feb 2012 12:46:25 -0000

Good day hackers!

On Sat, 11 Feb 2012 12:17:31 +0100
Marius Strobl <marius@alchemy.franken.de> wrote:

> On Fri, Feb 10, 2012 at 09:23:21PM -0800, Adrian Chadd wrote:
> > Hi all,
> > 
> > So where'd we get to with this?
> > 
> > I'd like to finalise a unified proposal for this.
> > 
> > I still like the idea of tidying up the miibus/mdiobus stuff (ie,
> > miibus really is an mdiobus for speaking to things, along with some
> > methods to do MII stuff like media change, media set, MAC PLL/type
> > set, etc) but I agree with ray@ that things begin to look a _lot_
> > more complicated when we start trying to handle alternate methods
> > of how switches are connected.
> > 
> > So I'd like to not lose interest on this.
> > 
> > If we can't come to some kind of consensus, I'll just commit ray@'s
> > work (and cop the flak) then work with stb@ to tidy up the newbus
> > bits to hopefully be better for the long term.
> > 
> > In summary - I'm fed up that we're this close to having _something_
> > that looks like a workable switch API and working code but it's not
> > in the tree. So I'm happy stirring up trouble and copping the flak
> > from things if it means it Just Gets Done.
> > 
> 
> I haven't seen ray@'s work (where is it?)
There is thread about it:
http://lists.freebsd.org/pipermail/freebsd-net/2012-January/031132.html

> but the general approach
> sounds backwards to me. As you say the whole picture of how switches
> are connected in reality is complicated. Therefore I'd like to see a
> proposal for a framework first that can handle the various ways of
> taking to a switch (GPIO, I2C, MDIO etc or combinations thereof)

Currently framework handle OBIO(memory attached) and MDIO, but designed
to handle various bus attachment. And as i see GPIO variant must use
some bus attached to GPIO (f.e. gpiospi, gpioiic), the switch framework
attach to it. When i design framework core, i was keep in mind that:
1. some switches can be controlled with many ways (BCM53115 can be
controlled by MDIO and SPI).
2. MAC driver must not be modified, as long as possible. if_arge was
modified only to clear special bits like phymask from it.

> separately from a MAC (as there may be no MAC associated with the
> switch to begin with). The stuff proposed so far (again, I haven't
> looked at ray@'s current work) only dealt with the more or less
> low hanging fruit in that picture with discussions how we may hack
> some more scenarios into working. Getting _something_ in at this
> stage just for the sake that there's something in the tree really
> asks for one of two typical things happening:
> o it sticks as-is forever as nobody really wants to do the work
>   to get it right and in the end the supposedly generic framework
>   is ignored and people implement their own local stuff to get what
>   they need, or
> o there actually are some brave souls working on getting it right
>   over time but requiring API and ABI breakages over and over again
>   to get there.

I think second is near to be true :), but last time i add methods to
manipulate Port based VLANs on BCM53115, and 1) i'm not add anything to
ioctl definitions (switch_ioctl.h), 2) most of newbus methods have
defaults, so adding new feature to some driver, will not break other
drivers.

Updated work accessible here:
http://zrouter.org/hg/FreeBSD/head/file/default/head/sys/dev/switch/

> 
> Marius
> 
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to
> "freebsd-arch-unsubscribe@freebsd.org"


-- 
Aleksandr Rybalko <ray@ddteam.net>