From owner-freebsd-arch@FreeBSD.ORG Sun Feb 5 12:02:35 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D9E98106566B for ; Sun, 5 Feb 2012 12:02:35 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 1CC9D8FC1B for ; Sun, 5 Feb 2012 12:02:34 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 9D12D676; Sun, 5 Feb 2012 12:45:00 +0100 (CET) Date: Sun, 5 Feb 2012 12:43:46 +0100 From: Pawel Jakub Dawidek To: Konstantin Belousov Message-ID: <20120205114345.GE30033@garage.freebsd.pl> References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="idY8LE8SD6/8DnRI" Content-Disposition: inline In-Reply-To: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@freebsd.org Subject: Re: Prefaulting for i/o buffers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Feb 2012 12:02:36 -0000 --idY8LE8SD6/8DnRI Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 03, 2012 at 09:37:19PM +0200, Konstantin Belousov wrote: > FreeBSD I/O infrastructure has well known issue with deadlock caused > by vnode lock order reversal when buffers supplied to read(2) or > write(2) syscalls are backed by mmaped file. >=20 > I previously published the patches to convert i/o path to use VMIO, > based on the Jeff Roberson proposal, see > http://wiki.freebsd.org/VM6. As a side effect, the VM6 fixed the > deadlock. Since that work is very intrusive and did not got any > follow-up, it get stalled. >=20 > Below is very lightweight patch which only goal is to fix deadlock in > the least intrusive way. This is possible after FreeBSD got the > vm_fault_quick_hold_pages(9) and vm_fault_disable_pagefaults(9) KPIs. > http://people.freebsd.org/~kib/misc/vm1.3.patch +struct rl_q_entry * +rlqentry_alloc() Missing 'void'. +static int +rangelock_incompatible(const struct rl_q_entry *e1, + const struct rl_q_entry *e2) +{ + + if ((e1->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ && + (e2->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ) + return (0); +#define IN_RANGE(a, e) (a >=3D e->rl_q_start && a < e->rl_q_end) + if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) || + IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1)) + return (1); +#undef IN_RANGE + return (0); +} The checks here are a bit wrong and imcomplete. This is correct: + if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) || This is not: + IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1)) After simplifying one of those checks we get: if (end1 >=3D start2 && end1 < end2) This will be true if end1 =3D=3D start2, but in this case it should be false. There are also some checks missing. If the first range covers entire second range or vice-versa you will return that the locks are compatible, eg. start1-----start2-----end2-----end1 The following check should cover all the cases: if (e1->rl_q_start < e2->rl_q_end && e1->rl_q_end > e2->rl_q_start) { return (1); } +static void +rangelock_calc_block(struct rangelock *lock) +{ + struct rl_q_entry *entry, *entry1, *whead; + + if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) && + lock->rl_currdep !=3D NULL) + lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link); + for (entry =3D lock->rl_currdep; entry; + entry =3D TAILQ_NEXT(entry, rl_q_link)) { + TAILQ_FOREACH(entry1, &lock->rl_waiters, rl_q_link) { + if (rangelock_incompatible(entry, entry1)) + goto out; + if (entry1 =3D=3D entry) + break; + } + } +out: + lock->rl_currdep =3D entry; + TAILQ_FOREACH(whead, &lock->rl_waiters, rl_q_link) { + if (whead =3D=3D lock->rl_currdep) + break; + if (!(whead->rl_q_flags & RL_LOCK_GRANTED)) { + whead->rl_q_flags |=3D RL_LOCK_GRANTED; + wakeup(whead); + } + } +} This function doesn't look at it is going to scale. Searching for incompatible locks looks very expensive. This function seems to be responsible for waking up waiters. In case of range locking this might be tricky, as unlocking read lock on the given range doesn't mean the range can be exclusively locked by a waiter, as part of this range might be read-locked multiple times or different range might be read locked, but which is also covered by the waiting writer. Could you elaborate a bit on how this all works together? Does the function look through all the waiters and for each waiter goes though all the active locks? Do you protect somehow against starving writers? + KASSERT(entry->rl_q_flags & RL_LOCK_GRANTED, ("XXX")); + KASSERT(entry->rl_q_start =3D=3D base, ("XXX")); + KASSERT(entry->rl_q_end >=3D base + len, ("XXX")); I expect those XXX will be replaced with real messages in the final version? We could really use simple ASSERT() without obligatory message in the kernel... +void * +rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct mtx= *ilk) +{ + struct rl_q_entry *entry; + struct thread *td; + + td =3D curthread; + if (td->td_rlqe !=3D NULL) { + entry =3D td->td_rlqe; + td->td_rlqe =3D NULL; + } else + entry =3D rlqentry_alloc(); + entry->rl_q_flags =3D RL_LOCK_READ; + entry->rl_q_start =3D base; + entry->rl_q_end =3D base + len; + return (rangelock_enqueue(lock, entry, ilk)); +} + +void * +rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct mtx= *ilk) +{ + struct rl_q_entry *entry; + struct thread *td; + + td =3D curthread; + if (td->td_rlqe !=3D NULL) { + entry =3D td->td_rlqe; + td->td_rlqe =3D NULL; + } else + entry =3D rlqentry_alloc(); + entry->rl_q_flags =3D RL_LOCK_WRITE; + entry->rl_q_start =3D base; + entry->rl_q_end =3D base + len; + return (rangelock_enqueue(lock, entry, ilk)); +} Those functions are nearly identical, maybe they can be converted to something like this: static void * rangelock_lock(struct rangelock *lock, off_t base, size_t len, struct mtx *= ilk, int flags) { struct rl_q_entry *entry; struct thread *td; td =3D curthread; if (td->td_rlqe !=3D NULL) { entry =3D td->td_rlqe; td->td_rlqe =3D NULL; } else entry =3D rlqentry_alloc(); entry->rl_q_flags =3D RL_LOCK_READ; entry->rl_q_start =3D base; entry->rl_q_end =3D base + len; return (rangelock_enqueue(lock, entry, ilk)); } void * rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct mtx = *ilk) { return (range_lock(lock, base, len, ilk, RL_LOCK_READ); } void * rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct mtx = *ilk) { return (range_lock(lock, base, len, ilk, RL_LOCK_WRITE); } Or even merge rangelock_lock() with rangelock_enqueue()? @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags) =20 td->td_sleepqueue =3D sleepq_alloc(); td->td_turnstile =3D turnstile_alloc(); + td->td_rlqe =3D rlqentry_alloc(); What is the purpose of td_rlqe field? From what I see thread can lock more than one range. Could you elaborate on that as well? Do you assert somehow that the same thread is trying to write-lock the range it already read-locked? --- a/sys/kern/subr_syscall.c +++ b/sys/kern/subr_syscall.c @@ -177,6 +177,12 @@ syscallret(struct thread *td, int error, struct syscal= l_args *sa __unused) KASSERT(td->td_locks =3D=3D 0, ("System call %s returning with %d locks held", syscallname(p, sa->code), td->td_locks)); + KASSERT((td->td_pflags & TDP_NOFAULTING) =3D=3D 0, + ("System call %s returning with pagefaults disabled", + syscallname(p, sa->code))); + KASSERT((td->td_pflags & TDP_NOSLEEPING) =3D=3D 0, + ("System call %s returning with sleep disabled", + syscallname(p, sa->code))); We can commit those separately, right? #ifdef MAC if ((ioflg & IO_NOMACCHECK) =3D=3D 0) { - if (rw =3D=3D UIO_READ) - error =3D mac_vnode_check_read(active_cred, file_cred, - vp); - else + if (rw =3D=3D UIO_WRITE) error =3D mac_vnode_check_write(active_cred, file_cred, vp); } I don't see mac_vnode_check_read() call being moved anywhere. Was that intended? + * The vn_io_fault() is a wrapper around vn_read() and vn_write() to + * prevent the following deadlock: + * + * Assume that the thread A reads from the vnode vp1 info userspace s/info/into/ + * buffer buf1 backed by the pages of vnode vp2. If a page in buf1 is + * currently not resident, then system ends up with the call chain + * vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] -> + * vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2) + * which establishes lock order vp1->vn_lock, then vp2->vn_lock. + * If, at the same time, thread B reads from vnode vp2 into buffer buf2 + * backed by the pages of vnode vp1, and some page in buf2 is not + * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock. + * + * To prevent the lock order reversal and deadlock, vn_io_fault() does + * not allow page faults to happen during VOP_READ or VOP_WRITE(). s/VOP_READ/VOP_READ()/ + prot =3D VM_PROT_READ; + if ((fp->f_flag & O_APPEND) || !(flags & FOF_OFFSET)) + /* For appenders, punt and lock the whole range. */ + rl_cookie =3D vn_rangelock_wlock(vp, 0, (size_t)-1); On 32bit archs size_t is still 32bit, AFAIR and file size can be bigger than that. I think we should also use off_t for length (GEOM does that). + len =3D uio->uio_iov->iov_len; + addr =3D (vm_offset_t)uio->uio_iov->iov_base; + end =3D round_page(addr + len); + cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE); + if (cnt > io_hold_cnt) { + len =3D io_hold_cnt * PAGE_SIZE; +again: + end =3D round_page(addr + len); + cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE); + if (cnt > io_hold_cnt) { + len -=3D PAGE_SIZE; + goto again; + } + } This is a bit hard to understand immediately. Maybe something like the following is a bit more readable (I assume this goto could happen only once): len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE); addr =3D (vm_offset_t)uio->uio_iov->iov_base; end =3D round_page(addr + len); if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) { len -=3D PAGE_SIZE; end =3D round_page(addr + len); } +vn_truncate(struct file *fp, off_t length, struct ucred *active_cred, + struct thread *td) [...] + /* + * Lock the range where the shortening take place. Increase of + * file size does not need rangelock, but it is faster to lock + * the range then call VOP_GETATTR to get the current size and + * deal with races. + */ + rl_cookie =3D vn_rangelock_wlock(vp, length, -1); Hmm. In case of increasing file size don't we need the range lock because we might be racing with write append? BTW. Comments in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c might be a useful read. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl --idY8LE8SD6/8DnRI Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8ua3EACgkQForvXbEpPzRszwCg2nscTg3qYkCHMvXP+s5yj8Y1 slQAoJUnLamwebH26vFztl4OUHbXimS4 =KK/7 -----END PGP SIGNATURE----- --idY8LE8SD6/8DnRI-- From owner-freebsd-arch@FreeBSD.ORG Sun Feb 5 16:47:36 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D78961065673; Sun, 5 Feb 2012 16:47:36 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id EE7778FC0C; Sun, 5 Feb 2012 16:47:35 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q15GlUSq039851 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 5 Feb 2012 18:47:30 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q15GlT5O052800; Sun, 5 Feb 2012 18:47:29 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q15GlTrw052799; Sun, 5 Feb 2012 18:47:29 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 5 Feb 2012 18:47:29 +0200 From: Konstantin Belousov To: Pawel Jakub Dawidek Message-ID: <20120205164729.GH3283@deviant.kiev.zoral.com.ua> References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <20120205114345.GE30033@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="OtP5XuPTaeI2U+qV" Content-Disposition: inline In-Reply-To: <20120205114345.GE30033@garage.freebsd.pl> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: arch@freebsd.org Subject: Re: Prefaulting for i/o buffers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Feb 2012 16:47:36 -0000 --OtP5XuPTaeI2U+qV Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Feb 05, 2012 at 12:43:46PM +0100, Pawel Jakub Dawidek wrote: > On Fri, Feb 03, 2012 at 09:37:19PM +0200, Konstantin Belousov wrote: > > FreeBSD I/O infrastructure has well known issue with deadlock caused > > by vnode lock order reversal when buffers supplied to read(2) or > > write(2) syscalls are backed by mmaped file. > >=20 > > I previously published the patches to convert i/o path to use VMIO, > > based on the Jeff Roberson proposal, see > > http://wiki.freebsd.org/VM6. As a side effect, the VM6 fixed the > > deadlock. Since that work is very intrusive and did not got any > > follow-up, it get stalled. > >=20 > > Below is very lightweight patch which only goal is to fix deadlock in > > the least intrusive way. This is possible after FreeBSD got the > > vm_fault_quick_hold_pages(9) and vm_fault_disable_pagefaults(9) KPIs. > > http://people.freebsd.org/~kib/misc/vm1.3.patch >=20 > +struct rl_q_entry * > +rlqentry_alloc() >=20 > Missing 'void'. Yes, it is style inconsistency. Fixed. >=20 > +static int > +rangelock_incompatible(const struct rl_q_entry *e1, > + const struct rl_q_entry *e2) > +{ > + > + if ((e1->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ && > + (e2->rl_q_flags & RL_LOCK_TYPE_MASK) =3D=3D RL_LOCK_READ) > + return (0); > +#define IN_RANGE(a, e) (a >=3D e->rl_q_start && a < e->rl_q_end) > + if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) || > + IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1)) > + return (1); > +#undef IN_RANGE > + return (0); > +} >=20 > The checks here are a bit wrong and imcomplete. >=20 > This is correct: >=20 > + if (IN_RANGE(e1->rl_q_start, e2) || IN_RANGE(e2->rl_q_start, e1) || >=20 > This is not: >=20 > + IN_RANGE(e1->rl_q_end, e2) || IN_RANGE(e2->rl_q_end, e1)) >=20 > After simplifying one of those checks we get: >=20 > if (end1 >=3D start2 && end1 < end2) >=20 > This will be true if end1 =3D=3D start2, but in this case it should be > false. Agreed, my tests are too strict there. But note that the only consequence is that some lock requests are postponed. >=20 > There are also some checks missing. If the first range covers entire > second range or vice-versa you will return that the locks are > compatible, eg. >=20 > start1-----start2-----end2-----end1 Isn't IN_RANGE(e2->rl_q_start, e1) cover this case ? In essence, the checks I wrote verify whether any one region border lies inside other region. >=20 > The following check should cover all the cases: >=20 > if (e1->rl_q_start < e2->rl_q_end && > e1->rl_q_end > e2->rl_q_start) { > return (1); > } Your check is much more elegant, I changed the code as you suggested. >=20 > +static void > +rangelock_calc_block(struct rangelock *lock) > +{ > + struct rl_q_entry *entry, *entry1, *whead; > + > + if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) && > + lock->rl_currdep !=3D NULL) > + lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link); > + for (entry =3D lock->rl_currdep; entry; > + entry =3D TAILQ_NEXT(entry, rl_q_link)) { > + TAILQ_FOREACH(entry1, &lock->rl_waiters, rl_q_link) { > + if (rangelock_incompatible(entry, entry1)) > + goto out; > + if (entry1 =3D=3D entry) > + break; > + } > + } > +out: > + lock->rl_currdep =3D entry; > + TAILQ_FOREACH(whead, &lock->rl_waiters, rl_q_link) { > + if (whead =3D=3D lock->rl_currdep) > + break; > + if (!(whead->rl_q_flags & RL_LOCK_GRANTED)) { > + whead->rl_q_flags |=3D RL_LOCK_GRANTED; > + wakeup(whead); > + } > + } > +} >=20 > This function doesn't look at it is going to scale. Searching for > incompatible locks looks very expensive. Well, it is not. I am aware of the tree structure use for the range lock, but this probably have to wait some time. >=20 > This function seems to be responsible for waking up waiters. In case of > range locking this might be tricky, as unlocking read lock on the given > range doesn't mean the range can be exclusively locked by a waiter, as > part of this range might be read-locked multiple times or different > range might be read locked, but which is also covered by the waiting > writer. >=20 > Could you elaborate a bit on how this all works together? Does the > function look through all the waiters and for each waiter goes though > all the active locks? >=20 > Do you protect somehow against starving writers? Due to different review, I already added the comments to the rangelock implementation. Basically, the rangelocks in kern_rangelock.c are fully fair. The locks are granted in the order of waiters arrival (the order is defined as the order of acquring interlock, in the latest version it is a sleepchannel spinlock). The free of some range causes rescan of the queue of waiters starting from rl_currdep. The rl_currdep stores the first blocked waiter. If next waiter is compatible with all granted regions, it is moved before rl_currdep and woken up. >=20 > + KASSERT(entry->rl_q_flags & RL_LOCK_GRANTED, ("XXX")); > + KASSERT(entry->rl_q_start =3D=3D base, ("XXX")); > + KASSERT(entry->rl_q_end >=3D base + len, ("XXX")); >=20 > I expect those XXX will be replaced with real messages in the final > version? We could really use simple ASSERT() without obligatory message > in the kernel... >=20 We have MPASS. These XXX are fixed. > +void * > +rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct m= tx *ilk) > +{ > + struct rl_q_entry *entry; > + struct thread *td; > + > + td =3D curthread; > + if (td->td_rlqe !=3D NULL) { > + entry =3D td->td_rlqe; > + td->td_rlqe =3D NULL; > + } else > + entry =3D rlqentry_alloc(); > + entry->rl_q_flags =3D RL_LOCK_READ; > + entry->rl_q_start =3D base; > + entry->rl_q_end =3D base + len; > + return (rangelock_enqueue(lock, entry, ilk)); > +} > + > +void * > +rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct m= tx *ilk) > +{ > + struct rl_q_entry *entry; > + struct thread *td; > + > + td =3D curthread; > + if (td->td_rlqe !=3D NULL) { > + entry =3D td->td_rlqe; > + td->td_rlqe =3D NULL; > + } else > + entry =3D rlqentry_alloc(); > + entry->rl_q_flags =3D RL_LOCK_WRITE; > + entry->rl_q_start =3D base; > + entry->rl_q_end =3D base + len; > + return (rangelock_enqueue(lock, entry, ilk)); > +} >=20 > Those functions are nearly identical, maybe they can be converted to > something like this: >=20 > static void * > rangelock_lock(struct rangelock *lock, off_t base, size_t len, struct mtx= *ilk, > int flags) > { > struct rl_q_entry *entry; > struct thread *td; >=20 > td =3D curthread; > if (td->td_rlqe !=3D NULL) { > entry =3D td->td_rlqe; > td->td_rlqe =3D NULL; > } else > entry =3D rlqentry_alloc(); > entry->rl_q_flags =3D RL_LOCK_READ; > entry->rl_q_start =3D base; > entry->rl_q_end =3D base + len; > return (rangelock_enqueue(lock, entry, ilk)); > } >=20 > void * > rangelock_rlock(struct rangelock *lock, off_t base, size_t len, struct mt= x *ilk) > { >=20 > return (range_lock(lock, base, len, ilk, RL_LOCK_READ); > } >=20 > void * > rangelock_wlock(struct rangelock *lock, off_t base, size_t len, struct mt= x *ilk) > { >=20 > return (range_lock(lock, base, len, ilk, RL_LOCK_WRITE); > } >=20 > Or even merge rangelock_lock() with rangelock_enqueue()? Done exactly this. >=20 > @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags) > =20 > td->td_sleepqueue =3D sleepq_alloc(); > td->td_turnstile =3D turnstile_alloc(); > + td->td_rlqe =3D rlqentry_alloc(); >=20 > What is the purpose of td_rlqe field? From what I see thread can lock > more than one range. Could you elaborate on that as well? td_rlqe is a cached rangelock entry used for the typical case of the thread not doing recursive i/o. In this case, it is possible to avoid memory allocation on the i/o hot path by using entry allocated during thread creation. Since thread creation already allocates many chunks of memory, and typical thread performs much more i/o then it suffers creation, this shorten the i/o calculation path. If tq_rlqe is already consumed, new entry is allocated at the lock time. >=20 > Do you assert somehow that the same thread is trying to write-lock the > range it already read-locked? No. With the latest patch it is indeed can be done, because I started to record range owners. For now, I added a comment with TODO mentioning the possible assert. >=20 > --- a/sys/kern/subr_syscall.c > +++ b/sys/kern/subr_syscall.c > @@ -177,6 +177,12 @@ syscallret(struct thread *td, int error, struct sysc= all_args *sa __unused) > KASSERT(td->td_locks =3D=3D 0, > ("System call %s returning with %d locks held", > syscallname(p, sa->code), td->td_locks)); > + KASSERT((td->td_pflags & TDP_NOFAULTING) =3D=3D 0, > + ("System call %s returning with pagefaults disabled", > + syscallname(p, sa->code))); > + KASSERT((td->td_pflags & TDP_NOSLEEPING) =3D=3D 0, > + ("System call %s returning with sleep disabled", > + syscallname(p, sa->code))); >=20 > We can commit those separately, right? Definitely. Rangelocks will be committed separately too. The vfs_vnops.c change is self-contained. >=20 > #ifdef MAC > if ((ioflg & IO_NOMACCHECK) =3D=3D 0) { > - if (rw =3D=3D UIO_READ) > - error =3D mac_vnode_check_read(active_cred, file_cred, > - vp); > - else > + if (rw =3D=3D UIO_WRITE) > error =3D mac_vnode_check_write(active_cred, file_cred, > vp); > } >=20 > I don't see mac_vnode_check_read() call being moved anywhere. > Was that intended? No, thank you for pointing this. mdf already noted this. >=20 > + * The vn_io_fault() is a wrapper around vn_read() and vn_write() to > + * prevent the following deadlock: > + * > + * Assume that the thread A reads from the vnode vp1 info userspace >=20 > s/info/into/ >=20 > + * buffer buf1 backed by the pages of vnode vp2. If a page in buf1 is > + * currently not resident, then system ends up with the call chain > + * vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] -> > + * vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2) > + * which establishes lock order vp1->vn_lock, then vp2->vn_lock. > + * If, at the same time, thread B reads from vnode vp2 into buffer buf2 > + * backed by the pages of vnode vp1, and some page in buf2 is not > + * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock. > + * > + * To prevent the lock order reversal and deadlock, vn_io_fault() does > + * not allow page faults to happen during VOP_READ or VOP_WRITE(). >=20 > s/VOP_READ/VOP_READ()/ >=20 > + prot =3D VM_PROT_READ; > + if ((fp->f_flag & O_APPEND) || !(flags & FOF_OFFSET)) > + /* For appenders, punt and lock the whole range. */ > + rl_cookie =3D vn_rangelock_wlock(vp, 0, (size_t)-1); >=20 > On 32bit archs size_t is still 32bit, AFAIR and file size can be bigger > than that. I think we should also use off_t for length (GEOM does that). Agreed, but I moved to specifying (start,end) instead of (start,len) alread= y. >=20 > + len =3D uio->uio_iov->iov_len; > + addr =3D (vm_offset_t)uio->uio_iov->iov_base; > + end =3D round_page(addr + len); > + cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE); > + if (cnt > io_hold_cnt) { > + len =3D io_hold_cnt * PAGE_SIZE; > +again: > + end =3D round_page(addr + len); > + cnt =3D howmany(end - trunc_page(addr), PAGE_SIZE); > + if (cnt > io_hold_cnt) { > + len -=3D PAGE_SIZE; > + goto again; > + } > + } >=20 > This is a bit hard to understand immediately. Maybe something like the > following is a bit more readable (I assume this goto could happen only > once): >=20 > len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE); > addr =3D (vm_offset_t)uio->uio_iov->iov_base; > end =3D round_page(addr + len); > if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) { > len -=3D PAGE_SIZE; > end =3D round_page(addr + len); > } I think it can happen twice, if both start and len are perfectly mis-aligne= d. >=20 > +vn_truncate(struct file *fp, off_t length, struct ucred *active_cred, > + struct thread *td) > [...] > + /* > + * Lock the range where the shortening take place. Increase of > + * file size does not need rangelock, but it is faster to lock > + * the range then call VOP_GETATTR to get the current size and > + * deal with races. > + */ > + rl_cookie =3D vn_rangelock_wlock(vp, length, -1); >=20 > Hmm. In case of increasing file size don't we need the range lock > because we might be racing with write append? We need to exclude the split i/o from being performed on the disjoint ranges of the file if truncation happens in between split. I changed the rangelock obtained in vn_truncate to get the whole range. The bug appeared because I imported the rangelock from the bigger patch that indeed handled the expansions. Thank you for pointing this out. >=20 > BTW. Comments in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock= .c > might be a useful read. Thank you for the review. http://people.freebsd.org/~kib/misc/vm1.8.patch --OtP5XuPTaeI2U+qV Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk8usqEACgkQC3+MBN1Mb4j4PgCeN08ELUSUm8La1mEHoyIMXbKS 6mEAnRO62Lib+nyRGb2lxQ5zqz732xmS =6nrX -----END PGP SIGNATURE----- --OtP5XuPTaeI2U+qV-- From owner-freebsd-arch@FreeBSD.ORG Sun Feb 5 19:22:03 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 83FA4106566B for ; Sun, 5 Feb 2012 19:22:03 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 27E898FC0A for ; Sun, 5 Feb 2012 19:22:02 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 354897A4; Sun, 5 Feb 2012 20:22:01 +0100 (CET) Date: Sun, 5 Feb 2012 20:20:45 +0100 From: Pawel Jakub Dawidek To: Konstantin Belousov Message-ID: <20120205192045.GH30033@garage.freebsd.pl> References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <20120205114345.GE30033@garage.freebsd.pl> <20120205164729.GH3283@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="IbVRjBtIbJdbeK1C" Content-Disposition: inline In-Reply-To: <20120205164729.GH3283@deviant.kiev.zoral.com.ua> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@freebsd.org Subject: Re: Prefaulting for i/o buffers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Feb 2012 19:22:03 -0000 --IbVRjBtIbJdbeK1C Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Feb 05, 2012 at 06:47:29PM +0200, Konstantin Belousov wrote: > On Sun, Feb 05, 2012 at 12:43:46PM +0100, Pawel Jakub Dawidek wrote: > > @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags) > > =20 > > td->td_sleepqueue =3D sleepq_alloc(); > > td->td_turnstile =3D turnstile_alloc(); > > + td->td_rlqe =3D rlqentry_alloc(); > >=20 > > What is the purpose of td_rlqe field? From what I see thread can lock > > more than one range. Could you elaborate on that as well? > td_rlqe is a cached rangelock entry used for the typical case of the > thread not doing recursive i/o. In this case, it is possible to avoid > memory allocation on the i/o hot path by using entry allocated during > thread creation. Since thread creation already allocates many chunks > of memory, and typical thread performs much more i/o then it suffers > creation, this shorten the i/o calculation path. I see. I'd be in favour of dropping entry allocation on thread creation. This would make the allocation lazy and it will be done on the first I/O only. What do you think? Thread creation should be fast, so if we can avoid adding up, I would go that route. > > This is a bit hard to understand immediately. Maybe something like the > > following is a bit more readable (I assume this goto could happen only > > once): > >=20 > > len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE); > > addr =3D (vm_offset_t)uio->uio_iov->iov_base; > > end =3D round_page(addr + len); > > if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) { > > len -=3D PAGE_SIZE; > > end =3D round_page(addr + len); > > } > I think it can happen twice, if both start and len are perfectly mis-alig= ned. I see. That make sense. It would be nice to add comment there, as it wasn't immediately obvious (at least for me) what's going on. Another way to simplify this piece of code would be to either make ma[] array of 'io_hold_cnt + 2' elements or set 'len' to '(io_hold_cnt - 2) * PAGE_SIZE'. Short comment describing why we add or subtract 2 would be of course welcome. Just curious, why io_hold_cnt is 'static const int' instead of define? > http://people.freebsd.org/~kib/misc/vm1.8.patch The new patch looks good to me. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl --IbVRjBtIbJdbeK1C Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8u1o0ACgkQForvXbEpPzT+QwCgjkC5udmjk3NQxu/ktUgozX2o DMEAmQGsHDmZR0yBw563Ot6O8ivKYwSB =k7WD -----END PGP SIGNATURE----- --IbVRjBtIbJdbeK1C-- From owner-freebsd-arch@FreeBSD.ORG Sun Feb 5 22:13:05 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DDCC1106566B; Sun, 5 Feb 2012 22:13:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 6892D8FC0A; Sun, 5 Feb 2012 22:13:03 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q15MCxwk064757 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 6 Feb 2012 00:12:59 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q15MCxUa054439; Mon, 6 Feb 2012 00:12:59 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q15MCxQ3054438; Mon, 6 Feb 2012 00:12:59 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 6 Feb 2012 00:12:59 +0200 From: Konstantin Belousov To: Pawel Jakub Dawidek Message-ID: <20120205221259.GL3283@deviant.kiev.zoral.com.ua> References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <20120205114345.GE30033@garage.freebsd.pl> <20120205164729.GH3283@deviant.kiev.zoral.com.ua> <20120205192045.GH30033@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rYkcoROieYsUh8uu" Content-Disposition: inline In-Reply-To: <20120205192045.GH30033@garage.freebsd.pl> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: arch@freebsd.org Subject: Re: Prefaulting for i/o buffers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Feb 2012 22:13:05 -0000 --rYkcoROieYsUh8uu Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Feb 05, 2012 at 08:20:45PM +0100, Pawel Jakub Dawidek wrote: > On Sun, Feb 05, 2012 at 06:47:29PM +0200, Konstantin Belousov wrote: > > On Sun, Feb 05, 2012 at 12:43:46PM +0100, Pawel Jakub Dawidek wrote: > > > @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags) > > > =20 > > > td->td_sleepqueue =3D sleepq_alloc(); > > > td->td_turnstile =3D turnstile_alloc(); > > > + td->td_rlqe =3D rlqentry_alloc(); > > >=20 > > > What is the purpose of td_rlqe field? From what I see thread can lock > > > more than one range. Could you elaborate on that as well? > > td_rlqe is a cached rangelock entry used for the typical case of the > > thread not doing recursive i/o. In this case, it is possible to avoid > > memory allocation on the i/o hot path by using entry allocated during > > thread creation. Since thread creation already allocates many chunks > > of memory, and typical thread performs much more i/o then it suffers > > creation, this shorten the i/o calculation path. >=20 > I see. I'd be in favour of dropping entry allocation on thread creation. > This would make the allocation lazy and it will be done on the first I/O > only. What do you think? Thread creation should be fast, so if we can > avoid adding up, I would go that route. I think this is negligible change in the speed of much rare operation (thread creation) comparing with a hit on the first time i/o, but I just did it in a way you suggested. >=20 > > > This is a bit hard to understand immediately. Maybe something like the > > > following is a bit more readable (I assume this goto could happen only > > > once): > > >=20 > > > len =3D MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE); > > > addr =3D (vm_offset_t)uio->uio_iov->iov_base; > > > end =3D round_page(addr + len); > > > if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) { > > > len -=3D PAGE_SIZE; > > > end =3D round_page(addr + len); > > > } > > I think it can happen twice, if both start and len are perfectly mis-al= igned. >=20 > I see. That make sense. It would be nice to add comment there, as it > wasn't immediately obvious (at least for me) what's going on. >=20 > Another way to simplify this piece of code would be to either make ma[] > array of 'io_hold_cnt + 2' elements or set 'len' to '(io_hold_cnt - 2) * > PAGE_SIZE'. Short comment describing why we add or subtract 2 would be > of course welcome. I am not sure that +2 would be improvement in readability, but indeed it removes some code. I implemented this and added a comment. >=20 > Just curious, why io_hold_cnt is 'static const int' instead of define? I want to change it into a sysctl or tunable one day. http://people.freebsd.org/~kib/misc/vm1.9.patch --rYkcoROieYsUh8uu Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk8u/usACgkQC3+MBN1Mb4ghkwCgiGZND59lcqCwmTNWk2ztHADq w1wAnjUFE/i1HhxtS2Z7DxSQJOhec4hN =FtVf -----END PGP SIGNATURE----- --rYkcoROieYsUh8uu-- From owner-freebsd-arch@FreeBSD.ORG Mon Feb 6 08:59:28 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 46AE2106566C for ; Mon, 6 Feb 2012 08:59:28 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id E952A8FC16 for ; Mon, 6 Feb 2012 08:59:27 +0000 (UTC) Received: from localhost (58.wheelsystems.com [83.12.187.58]) by mail.dawidek.net (Postfix) with ESMTPSA id B9A03938; Mon, 6 Feb 2012 09:59:26 +0100 (CET) Date: Mon, 6 Feb 2012 09:58:14 +0100 From: Pawel Jakub Dawidek To: Konstantin Belousov Message-ID: <20120206085814.GB1324@garage.freebsd.pl> References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <20120205114345.GE30033@garage.freebsd.pl> <20120205164729.GH3283@deviant.kiev.zoral.com.ua> <20120205192045.GH30033@garage.freebsd.pl> <20120205221259.GL3283@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qlTNgmc+xy1dBmNv" Content-Disposition: inline In-Reply-To: <20120205221259.GL3283@deviant.kiev.zoral.com.ua> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@freebsd.org Subject: Re: Prefaulting for i/o buffers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Feb 2012 08:59:28 -0000 --qlTNgmc+xy1dBmNv Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Feb 06, 2012 at 12:12:59AM +0200, Konstantin Belousov wrote: > http://people.freebsd.org/~kib/misc/vm1.9.patch --- a/sys/fs/nfsclient/nfs_clbio.c +++ b/sys/fs/nfsclient/nfs_clbio.c @@ -820,7 +820,10 @@ do_sync: t_uio->uio_segflg =3D UIO_SYSSPACE; t_uio->uio_rw =3D UIO_WRITE; t_uio->uio_td =3D td; - bcopy(uiop->uio_iov->iov_base, t_iov->iov_base, size); + error =3D copyin(uiop->uio_iov->iov_base, + t_iov->iov_base, size); + if (error !=3D 0) + goto err_free; Good catch. It makes me wonder if uiop will always point at userland buffer. What if we eg. AUDIT subsystem writing from within the kernel to a file stored on NFS file system? + if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) && + lock->rl_currdep !=3D NULL) + lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link); + for (entry =3D lock->rl_currdep; entry; Minor style inconsistency. Two lines earlier you compare pointer with NULL, which is nice, but here you treat pointer as boolean. +void +rangelock_unlock(struct rangelock *lock, void *cookie) +{ + struct rl_q_entry *entry; + + MPASS(lock !=3D NULL && cookie !=3D NULL); + + entry =3D cookie; + sleepq_lock(lock); + rangelock_unlock_locked(lock, entry); +} You could drop using addtional 'entry' variable and just pass 'cookie' directly. Although current code is self-explaining what 'cookie' actually is, so I kinda like it as-is. +/* + * Add the lock request to the queue of the pending requests for + * rangelock. Sleeps until the request can be granted. s/request/lock/ ? @@ -216,6 +218,7 @@ thread_fini(void *mem, int size) =20 td =3D (struct thread *)mem; EVENTHANDLER_INVOKE(thread_fini, td); + rlqentry_free(td->td_rlqe); Not sure if it was intended or not, but td_rlqe can be NULL now, so it might be worth checking that. It is not a big deal, as uma_zfree() can hand= le NULL pointers though. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl --qlTNgmc+xy1dBmNv Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8vliYACgkQForvXbEpPzTNGgCggyZ6rgH24Bq8z9gzYFDpf+gO PR8An3JeYbNqfilOEamCbLxTRA/D/Z3n =YHIV -----END PGP SIGNATURE----- --qlTNgmc+xy1dBmNv-- From owner-freebsd-arch@FreeBSD.ORG Mon Feb 6 09:54:24 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B0ED5106566B; Mon, 6 Feb 2012 09:54:24 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 3F97C8FC12; Mon, 6 Feb 2012 09:54:23 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q169sGSe024524 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 6 Feb 2012 11:54:16 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q169sFHJ058022; Mon, 6 Feb 2012 11:54:15 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q169sFD0058021; Mon, 6 Feb 2012 11:54:15 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 6 Feb 2012 11:54:15 +0200 From: Konstantin Belousov To: Pawel Jakub Dawidek Message-ID: <20120206095415.GM3283@deviant.kiev.zoral.com.ua> References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <20120205114345.GE30033@garage.freebsd.pl> <20120205164729.GH3283@deviant.kiev.zoral.com.ua> <20120205192045.GH30033@garage.freebsd.pl> <20120205221259.GL3283@deviant.kiev.zoral.com.ua> <20120206085814.GB1324@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="T4pX28sySFCT/oEO" Content-Disposition: inline In-Reply-To: <20120206085814.GB1324@garage.freebsd.pl> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: arch@freebsd.org Subject: Re: Prefaulting for i/o buffers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Feb 2012 09:54:24 -0000 --T4pX28sySFCT/oEO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Feb 06, 2012 at 09:58:14AM +0100, Pawel Jakub Dawidek wrote: > On Mon, Feb 06, 2012 at 12:12:59AM +0200, Konstantin Belousov wrote: > > http://people.freebsd.org/~kib/misc/vm1.9.patch >=20 > --- a/sys/fs/nfsclient/nfs_clbio.c > +++ b/sys/fs/nfsclient/nfs_clbio.c > @@ -820,7 +820,10 @@ do_sync: > t_uio->uio_segflg =3D UIO_SYSSPACE; > t_uio->uio_rw =3D UIO_WRITE; > t_uio->uio_td =3D td; > - bcopy(uiop->uio_iov->iov_base, t_iov->iov_base, size); > + error =3D copyin(uiop->uio_iov->iov_base, > + t_iov->iov_base, size); > + if (error !=3D 0) > + goto err_free; >=20 > Good catch. It makes me wonder if uiop will always point at userland > buffer. What if we eg. AUDIT subsystem writing from within the kernel to > a file stored on NFS file system? Hm, I committed to the wrong branch. Anyway, speaking about this (unrelated) change, I thought that directio path cannot happen for UIO_SYSSPACE. But since both you and Rick think that handling UIO_SYSSPACE there makes sense, so be it. >=20 > + if (lock->rl_currdep =3D=3D TAILQ_FIRST(&lock->rl_waiters) && > + lock->rl_currdep !=3D NULL) > + lock->rl_currdep =3D TAILQ_NEXT(lock->rl_currdep, rl_q_link); > + for (entry =3D lock->rl_currdep; entry; >=20 > Minor style inconsistency. Two lines earlier you compare pointer with > NULL, which is nice, but here you treat pointer as boolean. Fixed. >=20 > +void > +rangelock_unlock(struct rangelock *lock, void *cookie) > +{ > + struct rl_q_entry *entry; > + > + MPASS(lock !=3D NULL && cookie !=3D NULL); > + > + entry =3D cookie; > + sleepq_lock(lock); > + rangelock_unlock_locked(lock, entry); > +} >=20 > You could drop using addtional 'entry' variable and just pass 'cookie' > directly. Although current code is self-explaining what 'cookie' > actually is, so I kinda like it as-is. Dropped entry. >=20 > +/* > + * Add the lock request to the queue of the pending requests for > + * rangelock. Sleeps until the request can be granted. >=20 > s/request/lock/ ? No, I think request is more precise there. >=20 > @@ -216,6 +218,7 @@ thread_fini(void *mem, int size) > =20 > td =3D (struct thread *)mem; > EVENTHANDLER_INVOKE(thread_fini, td); > + rlqentry_free(td->td_rlqe); >=20 > Not sure if it was intended or not, but td_rlqe can be NULL now, so it > might be worth checking that. It is not a big deal, as uma_zfree() can ha= ndle > NULL pointers though. Yes, it is intended. As you noted yourself, uma_zfree() handles NULL. For typical thread, td_rlqe is not NULL, so it is slightly better to not check for NULL. --T4pX28sySFCT/oEO Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk8vo0cACgkQC3+MBN1Mb4h1DQCfefzc3n6Z3mj0Ia2dp8hk4RMR 2YwAoOWHv8a6dXehFKAGpB7glFSIV2oI =JAkA -----END PGP SIGNATURE----- --T4pX28sySFCT/oEO-- From owner-freebsd-arch@FreeBSD.ORG Mon Feb 6 10:01:26 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6DDD5106564A for ; Mon, 6 Feb 2012 10:01:26 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 1FFFB8FC15 for ; Mon, 6 Feb 2012 10:01:25 +0000 (UTC) Received: from localhost (58.wheelsystems.com [83.12.187.58]) by mail.dawidek.net (Postfix) with ESMTPSA id C15F595B; Mon, 6 Feb 2012 11:01:23 +0100 (CET) Date: Mon, 6 Feb 2012 11:00:11 +0100 From: Pawel Jakub Dawidek To: Konstantin Belousov Message-ID: <20120206100009.GC1324@garage.freebsd.pl> References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <20120205114345.GE30033@garage.freebsd.pl> <20120205164729.GH3283@deviant.kiev.zoral.com.ua> <20120205192045.GH30033@garage.freebsd.pl> <20120205221259.GL3283@deviant.kiev.zoral.com.ua> <20120206085814.GB1324@garage.freebsd.pl> <20120206095415.GM3283@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Clx92ZfkiYIKRjnr" Content-Disposition: inline In-Reply-To: <20120206095415.GM3283@deviant.kiev.zoral.com.ua> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@freebsd.org Subject: Re: Prefaulting for i/o buffers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Feb 2012 10:01:26 -0000 --Clx92ZfkiYIKRjnr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Kostik, I have no objections to the patch at this point. Thanks. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl --Clx92ZfkiYIKRjnr Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8vpKkACgkQForvXbEpPzSItACgqLf+N3mOQ9+74z8u96YbjRl4 c/EAoOnNmoiYImbUsO8thwL4P/lLYm1/ =9bL2 -----END PGP SIGNATURE----- --Clx92ZfkiYIKRjnr-- From owner-freebsd-arch@FreeBSD.ORG Mon Feb 6 18:46:21 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0E397106564A for ; Mon, 6 Feb 2012 18:46:21 +0000 (UTC) (envelope-from tijl@coosemans.org) Received: from mailrelay011.isp.belgacom.be (mailrelay011.isp.belgacom.be [195.238.6.178]) by mx1.freebsd.org (Postfix) with ESMTP id 88B978FC14 for ; Mon, 6 Feb 2012 18:46:19 +0000 (UTC) X-Belgacom-Dynamic: yes X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvwEAFQYME9bsaFX/2dsb2JhbABDr0OBBoJPI4EaiDa4aYsuASoGAQsBCAUDAwkIF4J6DixtBA4CCgIhgxwEnlsDiSY Received: from 87.161-177-91.adsl-dyn.isp.belgacom.be (HELO kalimero.tijl.coosemans.org) ([91.177.161.87]) by relay.skynet.be with ESMTP; 06 Feb 2012 19:16:50 +0100 Received: from kalimero.tijl.coosemans.org (kalimero.tijl.coosemans.org [127.0.0.1]) by kalimero.tijl.coosemans.org (8.14.5/8.14.5) with ESMTP id q16IGmH3004224; Mon, 6 Feb 2012 19:16:48 +0100 (CET) (envelope-from tijl@coosemans.org) From: Tijl Coosemans Date: Mon, 6 Feb 2012 19:16:41 +0100 User-Agent: KMail/1.13.7 (FreeBSD/10.0-CURRENT; KDE/4.7.3; i386; ; ) MIME-Version: 1.0 To: freebsd-arch@freebsd.org Content-Type: multipart/signed; boundary="nextPart1888453.YQFfBMEuoy"; protocol="application/pgp-signature"; micalg=pgp-sha256 Content-Transfer-Encoding: 7bit Message-Id: <201202061916.45998.tijl@coosemans.org> Cc: Damjan Jovanovic Subject: amd64 cc -m32 support and headers project branch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Feb 2012 18:46:21 -0000 --nextPart1888453.YQFfBMEuoy Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi, After a hiatus of almost a year I'd like to continue with merging i386 and amd64 headers to get "cc -m32" working on amd64. Previously I tried to fix any problems with the headers first before merging, but this turned out to be a long tedious process. So, because lack of -m32 support is blocking other development I'd like to just merge a minimal set of headers without any macro/type definition changes or other reorganisations. The result will be similar to existing powerpc and mips headers which already support both 32 and 64 bit. When that's done I can create a development branch to work on the problems that have come up. Possible goals for such a branch are: - Fix inconsistencies such as types defined in sys/ and their limits in machine/. Also, sometimes the limits don't have the correct type. - For types/limits defined under machine/ there is a lot of duplication between architectures. Try to move some to sys/. - Try to make headers compiler agnostic; limit use of __attribute__, __builtin_* to a single file. - Maybe remove support for gcc 2.*. The oldest version in ports is 3.4 and I suspect some headers don't compile anymore with gcc 2.*. - Add support for new compiler attributes/built-ins. - Add missing POSIX prototypes, maybe commented out with /* UNIMPL ... */ or similar so they can be grepped. - The gcc ports currently patch some system headers where they think something is non-standard. These patched headers override the system headers which means you have to rebuild these ports whenever you do installworld to make sure they contain the latest changes. Some headers are trivial to fix, others less so. If there are no objections, I'd like to start with the -m32 work in a week or so. --nextPart1888453.YQFfBMEuoy Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iF4EABEIAAYFAk8wGQ0ACgkQfoCS2CCgtis7CQD7BNh94As/PtYliomKKL5RGARb 5y0MBTnSTZpnwASNNn0A/04yU/w6v+E8Yz1fNlVtIQcXfyLvcIV1/ZsO+yqjnJ20 =4vYu -----END PGP SIGNATURE----- --nextPart1888453.YQFfBMEuoy-- From owner-freebsd-arch@FreeBSD.ORG Tue Feb 7 22:16:17 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2D61106566B; Tue, 7 Feb 2012 22:16:17 +0000 (UTC) (envelope-from giovanni.trematerra@gmail.com) Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id D2CAC8FC15; Tue, 7 Feb 2012 22:16:16 +0000 (UTC) Received: by qcmt40 with SMTP id t40so6039370qcm.13 for ; Tue, 07 Feb 2012 14:16:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; bh=FFPCWrIaj0YRxC1BRifQ7l0AkpOyjxuTDc2yKTaqUzo=; b=wWNCvAs0LG4fW8PTmrlAeKh1MbVB0K8JkaJJJciBUrt6aMkiIb3QfntedhK+donCCD /wfFiP+LdimZrDU5gKcBcRqM48W+2huZrabemo6y/Ea1OumXm3bF/oYqVoI4rNED3U1g Kcb8Pu6cSM9TUmXyfbBJ1NVlCdhzus3Z6Ph9Y= MIME-Version: 1.0 Received: by 10.224.95.196 with SMTP id e4mr3376446qan.63.1328652975939; Tue, 07 Feb 2012 14:16:15 -0800 (PST) Sender: giovanni.trematerra@gmail.com Received: by 10.229.81.200 with HTTP; Tue, 7 Feb 2012 14:16:15 -0800 (PST) Date: Tue, 7 Feb 2012 23:16:15 +0100 X-Google-Sender-Auth: xDlBgRLJUCOckRZDgSJUsU5828A Message-ID: From: Giovanni Trematerra To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: flo@freebsd.org, Kip Macy , Attilio Rao , Konstantin Belousov , bde@freebsd.org, jilles@freebsd.org Subject: [LAST ROUND] fifo/pipe merge code X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Feb 2012 22:16:18 -0000 Hi All, Here a commit candidate patch of the experimental one posted on Jan 8 http://www.trematerra.net/patches/pipefifo_merge.4.3.patch The patch aims to - not introduce any performance regressions in pipe code - speed up performance for fifos - not introduce any regressions - fifos and pipes have to have the same behavior as before the patch. - share as much code as possible between pipes and fifos. The patch is greatly improved from its first experimental form. Now it's more clean and compact and integrates all the good suggestions from previous reviewers. Here some of the major differences: - no more different UMA zones to allocate things. - no more pipeinfo structure. - fix a bug at fifo_close. - fix a different behavior for fifos during truncate (thanks to jilles). The patch was reviewed in a previous form by jhb@, bde@, jilles@, attilio@(only style bug) This version of the patch was reviewed privately by jilles@, attilio@(only style bug) TEST ================================ the patch passes all the regression tests in tools/regression/fifo tools/regression/pipe tools/regression/poll /* same behavior as a stock kernel */ tools/regression/bin/sh /* same behavior as a stock kernel */ pho@ stress tested the patch for 19 hours without any fatal error or panic. BENCHMARK ================================= PIPE the benchmark was performed by compiling a GENERIC kernel. There's a lot of pipe code involving during a compilation. Such a test was used to discovery a performance regression in r226042. x build_stock.txt + build_patched.txt +------------------------------------------------------------------------------+ | + * x +| ||____________________________M_|_____A_______A_______M_____|______________| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 3 350 351 351 350.66667 0.57735027 + 3 349 352 350 350.33333 1.5275252 No difference proven at 95.0% confidence So no performance regression at all into pipes code. FIFO To benchmark fifo it was sent 1000 chunks of different size through a fifo. (128, 512, 4096 bytes) x fifo_stock_128.txt + fifo_patched_128.txt +------------------------------------------------------------------------------+ | + | | + + x x x| ||M_A___| |________M__A____________| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 3 835 839 836 836.66667 2.081666 + 3 826 827 826 826.33333 0.57735027 Difference at 95.0% confidence -10.3333 +/- 3.46228 -1.23506% +/- 0.413818% (Student's t, pooled s = 1.52753) x fifo_stock_512.txt + fifo_patched_512.txt +------------------------------------------------------------------------------+ |+++ xx| ||A| A|| +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 3 1667 1671 1669 1669 2 + 3 1413 1419 1415 1415.6667 3.0550505 Difference at 95.0% confidence -253.333 +/- 5.85232 -15.1787% +/- 0.350648% (Student's t, pooled s = 2.58199) x fifo_stock_4096.txt + fifo_patched_4096.txt +------------------------------------------------------------------------------+ |+ | |+ x | |+ x x| |A |__M____A_______| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 3 43682 45237 43694 44204.333 894.33569 + 3 36571 36606 36600 36592.333 18.717194 Difference at 95.0% confidence -7612 +/- 1433.69 -17.22% +/- 3.24332% (Student's t, pooled s = 632.529) if someone would like to reproduce the test can download a copy of the sources a http://www.trematerra.net/patches/fifoperf/ the code is from zhaoshuai@. Makefile, builds benchmark.c benchmark.c set up a fifo, fork and send a number of messages of a size depending from the input parameters. runme.sh run the test launching benchmark binary to send different number of messages of different sizes. Note that when the chunk is small there isn't so much difference with the patch but no regression in any case, when the chunk gets bigger there is a performance boost. Many thanks to all the people that spent their time to review the patch A special thanks to pho@ to have wasted his time testing the patch in its various forms. Reviews and possibly merge into the tree are welcomed. -- Gianni From owner-freebsd-arch@FreeBSD.ORG Wed Feb 8 05:49:46 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CA77B1065674; Wed, 8 Feb 2012 05:49:46 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by mx1.freebsd.org (Postfix) with ESMTP id 385BB8FC0A; Wed, 8 Feb 2012 05:49:45 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q185nfS8015540 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 8 Feb 2012 16:49:43 +1100 Date: Wed, 8 Feb 2012 16:49:41 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Giovanni Trematerra In-Reply-To: Message-ID: <20120208151251.J1853@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Wed, 08 Feb 2012 06:20:03 +0000 Cc: flo@FreeBSD.org, Kip Macy , John Baldwin , Peter Holm , Attilio Rao , Konstantin Belousov , bde@FreeBSD.org, freebsd-arch@FreeBSD.org, jilles@FreeBSD.org Subject: Re: [LAST ROUND] fifo/pipe merge code X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Feb 2012 05:49:46 -0000 On Tue, 7 Feb 2012, Giovanni Trematerra wrote: > Hi All, > Here a commit candidate patch of the experimental one posted on Jan 8 > http://www.trematerra.net/patches/pipefifo_merge.4.3.patch > > The patch aims to > - not introduce any performance regressions in pipe code > - speed up performance for fifos > - not introduce any regressions > - fifos and pipes have to have the same behavior as before the patch. > - share as much code as possible between pipes and fifos. > > The patch is greatly improved from its first experimental form. > Now it's more clean and compact and integrates all the good suggestions > from previous reviewers. Here some of the major differences: > - no more different UMA zones to allocate things. > - no more pipeinfo structure. > - fix a bug at fifo_close. > - fix a different behavior for fifos during truncate (thanks to jilles). > > The patch was reviewed in a previous form by jhb@, bde@, jilles@, > attilio@(only style bug) > This version of the patch was reviewed privately by jilles@, > attilio@(only style bug) It looks quite good now. I see only a few minor style bugs: % diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c % index 94a2713..0d35877 100644 % --- a/sys/fs/fifofs/fifo_vnops.c % +++ b/sys/fs/fifofs/fifo_vnops.c % ... % +int % +fifo_iseof(struct file *fp) % ... % + KASSERT(fp->f_vnode != NULL, ("fifo_iseof: no vnode info.")); % + KASSERT(fp->f_vnode->v_fifoinfo != NULL, ("fifo_iseof: no fifoinfo.")); Error messages should not be terminated by a ".". All other KASSERT()s in this file and in sys_pipe.c follow this rule. But 1 old one in sys_pipe.c is terminated by a "!", and some in sys_pipe.c don't follow the rule that error messages should not be capitalized. I wonder if you can assert that some suitable lock is held here. % + fip = fp->f_vnode->v_fifoinfo; % + return (fp->f_seqcount == fip->fi_wgen); % } % ... % diff --git a/sys/kern/sys_pipe.c b/sys/kern/sys_pipe.c % index 9edcb74..5892e9f 100644 % --- a/sys/kern/sys_pipe.c % +++ b/sys/kern/sys_pipe.c % @@ -1,4 +1,5 @@ % /*- % + * Copyright (c) 2012 Giovanni Trematerra % * Copyright (c) 1996 John S. Dyson % * All rights reserved. % * Copyrights should be in historical order, as in fifo_vnops.c. % @@ -135,6 +138,10 @@ __FBSDID("$FreeBSD$"); % */ % /* #define PIPE_NODIRECT */ % % +#define PIPE_PEER(pipe) \ % + (((pipe)->pipe_state & PIPE_NAMED) ? \ % + (pipe) : ((pipe)->pipe_peer)) % + This has strange indentation for the macro body, and weird indentation for the backslashes to not line them up at all. With more normal indentation, there is no need to split the macro body and thus not enough backslashes to need lining up: %%% #define PIPE_PEER(pipe) \ (((pipe)->pipe_state & PIPE_NAMED) ? (pipe) : ((pipe)->pipe_peer)) %%% % ... % @@ -175,6 +184,11 @@ static struct filterops pipe_wfiltops = { % .f_detach = filt_pipedetach, % .f_event = filt_pipewrite % }; % +static struct filterops pipe_nfiltops = { % + .f_isfd = 1, % + .f_detach = filt_pipedetach_notsup, % + .f_event = filt_pipenotsup % +}; This should probably be sorted before the others (order nrw, not rwn). % % /* % * Default pipe buffer size(s), this can be kind-of large now because pipe % @@ -220,6 +234,7 @@ static void pipe_clone_write_buffer(struct pipe *wpipe); % static int pipespace(struct pipe *cpipe, int size); % static int pipespace_new(struct pipe *cpipe, int size); % % +static int pipe_paircreate(struct thread *td, struct pipepair **p_pp); % static int pipe_zone_ctor(void *mem, int size, void *arg, int flags); % static int pipe_zone_init(void *mem, int size, int flags); % static void pipe_zone_fini(void *mem, int size); This seems to be unsorted. I see no reason to sort it with the pipe_zone constriuctors and desstructors, especially if you don't name it pipe_zone_*. The declarations of lots of other constructors are unsorted into the main list. Old style bugs: the pipe_zone functions shouldn't be in a separate section and that section shouldn't have a different indentation style (although that style is closer to KNF), especially since these functions are more systematically named than the others (the zone_ in their prefix keeps them together when they are sorted alphabetically). % @@ -1284,6 +1348,17 @@ pipe_ioctl(fp, cmd, data, active_cred, td) % break; % % case FIONREAD: % + Extra blank line. % + /* % + * FIONREAD will return 0 for non-readable descriptors, and % + * the results of FIONREAD on the read pipe for readable % + * descriptors. % + */ % + if (!(fp->f_flag & FREAD)) { % + *(int *)data = 0; % + PIPE_UNLOCK(mpipe); % + return (0); % + } % if (mpipe->pipe_state & PIPE_DIRECTW) % *(int *)data = mpipe->pipe_map.cnt; % else Urk, now I see many major style bugs: % @@ -1333,47 +1408,55 @@ pipe_poll(fp, events, active_cred, td) % int error; % #endif % % - wpipe = rpipe->pipe_peer; % + wpipe = PIPE_PEER(rpipe); % PIPE_LOCK(rpipe); % #ifdef MAC % error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair); % if (error) % goto locked_error; % #endif % - if (events & (POLLIN | POLLRDNORM)) % - if ((rpipe->pipe_state & PIPE_DIRECTW) || % - (rpipe->pipe_buffer.cnt > 0)) % - revents |= events & (POLLIN | POLLRDNORM); % - % - if (events & (POLLOUT | POLLWRNORM)) % - if (wpipe->pipe_present != PIPE_ACTIVE || % - (wpipe->pipe_state & PIPE_EOF) || % - (((wpipe->pipe_state & PIPE_DIRECTW) == 0) && % - ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF || % - wpipe->pipe_buffer.size == 0))) % - revents |= events & (POLLOUT | POLLWRNORM); % - % - if ((events & POLLINIGNEOF) == 0) { % - if (rpipe->pipe_state & PIPE_EOF) { % - revents |= (events & (POLLIN | POLLRDNORM)); % - if (wpipe->pipe_present != PIPE_ACTIVE || % - (wpipe->pipe_state & PIPE_EOF)) % - revents |= POLLHUP; % + if (fp->f_flag & FREAD) { % + if (events & (POLLIN | POLLRDNORM)) % + if ((rpipe->pipe_state & PIPE_DIRECTW) || % + (rpipe->pipe_buffer.cnt > 0)) % + revents |= events & (POLLIN | POLLRDNORM); % + This part of the patch is still unreadable since it indents everything to add FREAD and FWRITE checks. I don't like those anyway (they should probably produce POLLERR POLLNVAL instead of ignoring the error -- see previous mail), but they are needed for now for compatibility in the FIFO case (hopefully they don't break compatibility for the plain pipe case). To avoid churning this code for them now and later when the behaviour is fixed, they should be added in a non-invasive way like the old fifo code did. That involved using && instead of a nested if. In the old code, everything was under the combined && if. Now there are 3 nested ifs. This logic change seems to be non-null and give bugs (1). % + if (rpipe->pipe_state & PIPE_NAMED) % + if (fifo_iseof(fp)) Here the (unrelated) nested if is just a style bug. It gives verboseness and extra indentation. % + events |= POLLINIGNEOF; % + (1) In the old code, POLLINIGNEOF was only set here if we are polling for some input condition (and this is valid). Now it is set if the file is just open for input. Either way could be right, but this is a change. If we are not polling for input, we don't care about it, but POSIX requires POLLHUP to be set in some cases, and this affects whether we return or block, and POLLINIGNEOF interacts with this subtly. There may also be a problem not going through here to set POLLINIGNEOF in the !FREAD case. % + if ((events & POLLINIGNEOF) == 0) { % + if (rpipe->pipe_state & PIPE_EOF) { % + revents |= (events & (POLLIN | POLLRDNORM)); % + if (wpipe->pipe_present != PIPE_ACTIVE || % + (wpipe->pipe_state & PIPE_EOF)) % + revents |= POLLHUP; % + } % } This is the old sys_pipe.c logic, indented under the FREAD condition. It is also move up to be inside the FREAD block. To avoid the re-indentation, repeat the FREAD test in and && condition. After avoiding all re-indentations by repeating FREAD and FWRITE tests, you don't need to re-order either. % } This is not consistently verbose in adding extra blank lines. % + if (fp->f_flag & FWRITE) % + if (events & (POLLOUT | POLLWRNORM)) % + if (wpipe->pipe_present != PIPE_ACTIVE || % + (wpipe->pipe_state & PIPE_EOF) || % + (((wpipe->pipe_state & PIPE_DIRECTW) == 0) && % + ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= % + PIPE_BUF || wpipe->pipe_buffer.size == 0))) % + revents |= events & (POLLOUT | POLLWRNORM); % I don't like excessive braces, but this statement needs unnecessary ones. It should also be written using &&, not using a nested if. Since there is nothing like POLLINIGNEOF to complicate things, only 1 if statement is needed for FWRITE (until we get to slrecording). % if (revents == 0) { % - if (events & (POLLIN | POLLRDNORM)) { % - selrecord(td, &rpipe->pipe_sel); % - if (SEL_WAITING(&rpipe->pipe_sel)) % - rpipe->pipe_state |= PIPE_SEL; % - } % + if (fp->f_flag & FREAD) % + if (events & (POLLIN | POLLRDNORM)) { Use && here. It already has sufficient braces, at least after this change. % + selrecord(td, &rpipe->pipe_sel); % + if (SEL_WAITING(&rpipe->pipe_sel)) % + rpipe->pipe_state |= PIPE_SEL; % + } % % - if (events & (POLLOUT | POLLWRNORM)) { % - selrecord(td, &wpipe->pipe_sel); % - if (SEL_WAITING(&wpipe->pipe_sel)) % - wpipe->pipe_state |= PIPE_SEL; % - } % + if (fp->f_flag & FWRITE) % + if (events & (POLLOUT | POLLWRNORM)) { Use && here, as above. % + selrecord(td, &wpipe->pipe_sel); % + if (SEL_WAITING(&wpipe->pipe_sel)) % + wpipe->pipe_state |= PIPE_SEL; % + } % } % #ifdef MAC % locked_error: % ... % diff --git a/sys/sys/pipe.h b/sys/sys/pipe.h % index 20b1092..479223f 100644 % --- a/sys/sys/pipe.h % +++ b/sys/sys/pipe.h % ... % @@ -138,5 +139,9 @@ struct pipepair { % #define PIPE_UNLOCK(pipe) mtx_unlock(PIPE_MTX(pipe)) % #define PIPE_LOCK_ASSERT(pipe, type) mtx_assert(PIPE_MTX(pipe), (type)) % % +extern struct fileops pipeops; This should be declared in the existing extern section for data at the top of the file. That section has just one declaration. Old bug: that section has a bogus comment. It says /* See sys_pipe.c for info on what these limits mean. */" (actually a block comment to be verbose about this unimportant and misdescribed point), but there are no limits here; there is just 1 variable declaration. There are many more variables and limits, but the others are all in sys_pipe.c. This one (maxpipekva) unfortunately has to be declared here so that subr_param.c can initialize it, but this is an implementation detail unrelated to the meaning of this variable. maxpipekva is also instantiated in a wrong place (in subr_param.c instead of in sys_pipe.c). % + % +int pipe_fifo_ctor(struct pipe **ppipe, struct thread *td); % +void pipe_dtor(struct pipe *dpipe); These are sorted backwards. They aren't really even in logical order (ctor before dtor, which is also alphabetical), because the ctor is for fifos only but the dtor is generic. % % #endif /* !_SYS_PIPE_H_ */ Bruce From owner-freebsd-arch@FreeBSD.ORG Wed Feb 8 14:51:48 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D50E31065672 for ; Wed, 8 Feb 2012 14:51:48 +0000 (UTC) (envelope-from theraven@FreeBSD.org) Received: from theravensnest.org (theravensnest.org [109.169.23.128]) by mx1.freebsd.org (Postfix) with ESMTP id 7C6F18FC0A for ; Wed, 8 Feb 2012 14:51:48 +0000 (UTC) Received: from [192.168.0.2] (cpc1-cwma8-2-0-cust257.7-3.cable.virginmedia.com [82.20.153.2]) (authenticated bits=0) by theravensnest.org (8.14.4/8.14.4) with ESMTP id q18EEHfJ009567 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO) for ; Wed, 8 Feb 2012 14:14:18 GMT (envelope-from theraven@FreeBSD.org) From: David Chisnall Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Wed, 8 Feb 2012 14:14:12 +0000 Message-Id: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org> To: freebsd-arch@FreeBSD.org Mime-Version: 1.0 (Apple Message framework v1257) X-Mailer: Apple Mail (2.1257) Cc: Subject: NO_TLS flag for public headers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Feb 2012 14:51:48 -0000 Hi, The xlocale changes in libc mean that a lot of things that were = previously global state are now per-thread. This includes some things = that were used in ctype.h and are inlined. We're now losing the benefit = of inlining them because we now have to call into libc to get the = per-thread values. I'd like to use TLS for caching these things, but we = currently have no equivalent of the NO_TLS flag for public headers. I'd = like to have a __NO_TLS that can be set in cdefs.h for architectures = that don't have TLS support. Any comments? Objections? Anyone want to propose a better place for it = (or, indeed, add it, since I'm not entirely sure which architectures do = lack TLS support...) David= From owner-freebsd-arch@FreeBSD.ORG Wed Feb 8 21:14:50 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BBF01065672; Wed, 8 Feb 2012 21:14:50 +0000 (UTC) (envelope-from giovanni.trematerra@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id A3A038FC1C; Wed, 8 Feb 2012 21:14:49 +0000 (UTC) Received: by qaea17 with SMTP id a17so790369qae.13 for ; Wed, 08 Feb 2012 13:14:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=e2avok8QoI+HF9tv/XTduTAPp5OGNQCPUYWWvaQFVB4=; b=mCgbdBJQHvKzWCHaKzSniEGBhNRWLyAz6el/ClooJySqDuM6TEicfKlWZK8fP3bTK2 bOtx0cnh2/Lvjis7xHJps+xIHd/P27VYNxE9mruCCVJ3XwRnnQLqu2eULzk2ByZDODMn 9M0VqPjRZkVFGHSNSR44p+NBGJsFFBRIZm3WQ= MIME-Version: 1.0 Received: by 10.224.179.74 with SMTP id bp10mr3985016qab.89.1328735688924; Wed, 08 Feb 2012 13:14:48 -0800 (PST) Sender: giovanni.trematerra@gmail.com Received: by 10.229.81.200 with HTTP; Wed, 8 Feb 2012 13:14:48 -0800 (PST) In-Reply-To: References: <20120208151251.J1853@besplex.bde.org> Date: Wed, 8 Feb 2012 22:14:48 +0100 X-Google-Sender-Auth: UzE94AUxdhaqicn8iEnIyNsNrjA Message-ID: From: Giovanni Trematerra To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Kip Macy , Attilio Rao , Konstantin Belousov , bde@freebsd.org, jilles@freebsd.org Subject: Fwd: [LAST ROUND] fifo/pipe merge code X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Feb 2012 21:14:50 -0000 On Wed, Feb 8, 2012 at 6:49 AM, Bruce Evans wrote: > On Tue, 7 Feb 2012, Giovanni Trematerra wrote: > > > It looks quite good now. =A0I see only a few minor style bugs: Hi Bruce, Thank you to have found time to review the patch again. I updated the patch and I hope I fixed all the minor style bugs that you pointed out. here the new one: http://www.trematerra.net/patches/pipefifo_merge.4.4.patch > > % diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c > % index 94a2713..0d35877 100644 > % --- a/sys/fs/fifofs/fifo_vnops.c > % +++ b/sys/fs/fifofs/fifo_vnops.c > % ... > % +int > % +fifo_iseof(struct file *fp) > % ... > % + =A0 =A0 KASSERT(fp->f_vnode !=3D NULL, ("fifo_iseof: no vnode info.")= ); > % + =A0 =A0 KASSERT(fp->f_vnode->v_fifoinfo !=3D NULL, ("fifo_iseof: no > fifoinfo.")); > > > I wonder if you can assert that some suitable lock is held here. > > % + =A0 =A0 fip =3D fp->f_vnode->v_fifoinfo; > % + =A0 =A0 return (fp->f_seqcount =3D=3D fip->fi_wgen); > % =A0} > % ... Now I changed fifo_open to remove global fifo_mtx mutex, using PIPE_MTX, and assert to be held in fifo_iseof. I stress tested the patch with WITNESS and INVARIANT on and got no a panic and no LOR. > > % @@ -1284,6 +1348,17 @@ pipe_ioctl(fp, cmd, data, active_cred, td) > % =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > % % =A0 =A0 case FIONREAD: > % + > % + =A0 =A0 =A0 =A0 =A0 =A0 /* > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0* FIONREAD will return 0 for non-readable = descriptors, and > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0* the results of FIONREAD on the read pipe= for readable > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0* descriptors. > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0*/ > % + =A0 =A0 =A0 =A0 =A0 =A0 if (!(fp->f_flag & FREAD)) { > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *(int *)data =3D 0; > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 PIPE_UNLOCK(mpipe); > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (0); > % + =A0 =A0 =A0 =A0 =A0 =A0 } > % =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (mpipe->pipe_state & PIPE_DIRECTW) > % =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *(int *)data =3D mpipe->pip= e_map.cnt; > % =A0 =A0 =A0 =A0 =A0 =A0 =A0 else I completely removed that comment now. The code is clear enough. > > Urk, now I see many major style bugs: > > % @@ -1333,47 +1408,55 @@ pipe_poll(fp, events, active_cred, td) > % =A0 =A0 =A0 int error; > % =A0#endif > % % - =A0 wpipe =3D rpipe->pipe_peer; > % + =A0 =A0 wpipe =3D PIPE_PEER(rpipe); > % =A0 =A0 =A0 PIPE_LOCK(rpipe); > % =A0#ifdef MAC > % =A0 =A0 =A0 error =3D mac_pipe_check_poll(active_cred, rpipe->pipe_pair= ); > % =A0 =A0 =A0 if (error) > % =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto locked_error; > % =A0#endif > % - =A0 =A0 if (events & (POLLIN | POLLRDNORM)) > % - =A0 =A0 =A0 =A0 =A0 =A0 if ((rpipe->pipe_state & PIPE_DIRECTW) || > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (rpipe->pipe_buffer.cnt > 0)) > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D events & (POLLIN= | POLLRDNORM); > % - > % - =A0 =A0 if (events & (POLLOUT | POLLWRNORM)) > % - =A0 =A0 =A0 =A0 =A0 =A0 if (wpipe->pipe_present !=3D PIPE_ACTIVE || > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (wpipe->pipe_state & PIPE_EOF) || > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (((wpipe->pipe_state & PIPE_DIRECTW) = =3D=3D 0) && > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0((wpipe->pipe_buffer.size - wpipe-= >pipe_buffer.cnt) >=3D > PIPE_BUF || > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0wpipe->pipe_buffer.size = =3D=3D 0))) > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D events & (POLLOU= T | POLLWRNORM); > % - > % - =A0 =A0 if ((events & POLLINIGNEOF) =3D=3D 0) { > % - =A0 =A0 =A0 =A0 =A0 =A0 if (rpipe->pipe_state & PIPE_EOF) { > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D (events & (POLLI= N | POLLRDNORM)); > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (wpipe->pipe_present !=3D = PIPE_ACTIVE || > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (wpipe->pipe_state & = PIPE_EOF)) > % - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D = POLLHUP; > % + =A0 =A0 if (fp->f_flag & FREAD) { > % + =A0 =A0 =A0 =A0 =A0 =A0 if (events & (POLLIN | POLLRDNORM)) > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if ((rpipe->pipe_state & PIPE= _DIRECTW) || > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (rpipe->pipe_buffer.c= nt > 0)) > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 revents |=3D = events & (POLLIN | POLLRDNORM); > % + > > This part of the patch is still unreadable since it indents everything to > add FREAD and FWRITE checks. =A0I don't like those anyway (they should > probably produce POLLERR POLLNVAL instead of ignoring the error -- see > previous mail), but they are needed for now for compatibility in the > FIFO case (hopefully they don't break compatibility for the plain pipe > case). I tried to make this part of the patch more readable, I hope I did. Talking about returning POLLERR/POLLNVAL as I said previously, I like to work on that and on making pipe/fifo implementation more POSIX compliant just after the patch will be merge the tree. > To avoid churning this code for them now and later when the > behaviour is fixed, they should be added in a non-invasive way like the > old fifo code did. =A0That involved using && instead of a nested if. =A0I= n > the old code, everything was under the combined && if. =A0Now there are > 3 nested ifs. =A0This logic change seems to be non-null and give bugs (1)= . > > % + =A0 =A0 =A0 =A0 =A0 =A0 if (rpipe->pipe_state & PIPE_NAMED) > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (fifo_iseof(fp)) > > Here the (unrelated) nested if is just a style bug. =A0It gives verbosene= ss > and extra indentation. > > % + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 events |=3D P= OLLINIGNEOF; > % + > > (1) In the old code, POLLINIGNEOF was only set here if we are polling for > some input condition (and this is valid). =A0Now it is set if the file is > just open for input. =A0Either way could be right, but this is a change. = =A0If > we are not polling for input, we don't care about it, but POSIX requires > POLLHUP to be set in some cases, and this affects whether we return or > block, and POLLINIGNEOF interacts with this subtly. =A0There may also be > a problem not going through here to set POLLINIGNEOF in the !FREAD case. Fixed. Thank you. While I hope I didn't introduced new bugs, do you agree that at this point the patch is in such a shape that makes it committable? -- Gianni From owner-freebsd-arch@FreeBSD.ORG Wed Feb 8 21:51:24 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D3F01065673 for ; Wed, 8 Feb 2012 21:51:24 +0000 (UTC) (envelope-from marius@alchemy.franken.de) Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214]) by mx1.freebsd.org (Postfix) with ESMTP id 06B1B8FC1E for ; Wed, 8 Feb 2012 21:51:23 +0000 (UTC) Received: from alchemy.franken.de (localhost [127.0.0.1]) by alchemy.franken.de (8.14.4/8.14.4/ALCHEMY.FRANKEN.DE) with ESMTP id q18LpMQq028781; Wed, 8 Feb 2012 22:51:22 +0100 (CET) (envelope-from marius@alchemy.franken.de) Received: (from marius@localhost) by alchemy.franken.de (8.14.4/8.14.4/Submit) id q18LpMrw028780; Wed, 8 Feb 2012 22:51:22 +0100 (CET) (envelope-from marius) Date: Wed, 8 Feb 2012 22:51:22 +0100 From: Marius Strobl To: David Chisnall Message-ID: <20120208215122.GA28769@alchemy.franken.de> References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org> User-Agent: Mutt/1.4.2.3i Cc: freebsd-arch@freebsd.org Subject: Re: NO_TLS flag for public headers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Feb 2012 21:51:24 -0000 On Wed, Feb 08, 2012 at 02:14:12PM +0000, David Chisnall wrote: > Hi, > > The xlocale changes in libc mean that a lot of things that were previously global state are now per-thread. This includes some things that were used in ctype.h and are inlined. We're now losing the benefit of inlining them because we now have to call into libc to get the per-thread values. I'd like to use TLS for caching these things, but we currently have no equivalent of the NO_TLS flag for public headers. I'd like to have a __NO_TLS that can be set in cdefs.h for architectures that don't have TLS support. > > Any comments? Objections? Anyone want to propose a better place for it (or, indeed, add it, since I'm not entirely sure which architectures do lack TLS support...) > See lib/libc/stdlib/malloc.c, arm and mips currently are the only supported FreeBSD architectures that have no support for TLS. Marius From owner-freebsd-arch@FreeBSD.ORG Fri Feb 10 10:57:28 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1552C1065670; Fri, 10 Feb 2012 10:57:28 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id A07C68FC1C; Fri, 10 Feb 2012 10:57:27 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id D48202A28CCE; Fri, 10 Feb 2012 11:57:26 +0100 (CET) Date: Fri, 10 Feb 2012 11:57:26 +0100 From: Ed Schouten To: Marius Strobl Message-ID: <20120210105726.GO1860@hoeg.nl> References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org> <20120208215122.GA28769@alchemy.franken.de> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="TiGNc3ldf20CNpkJ" Content-Disposition: inline In-Reply-To: <20120208215122.GA28769@alchemy.franken.de> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: David Chisnall , freebsd-arch@freebsd.org Subject: Re: NO_TLS flag for public headers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Feb 2012 10:57:28 -0000 --TiGNc3ldf20CNpkJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi Marius, * Marius Strobl , 20120208 22:51: > See lib/libc/stdlib/malloc.c, arm and mips currently are the only > supported FreeBSD architectures that have no support for TLS. Just out of curiosity, what is needed to make TLS work? Is it just the lack of support by our toolchain or is there also stuff on our side that needs to be done? --=20 Ed Schouten WWW: http://80386.nl/ --TiGNc3ldf20CNpkJ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iQIcBAEBAgAGBQJPNPgWAAoJEG5e2P40kaK7r1cQAI4cak0Q33LdRJXiMzmgx9iC 7CYBf+MNqPcw7ZtrYcVO7xfISvBXWw10h2mZ4K8w8DEIarIM4bOOO9mpGtQ7PSg1 qCF9W46rgimNTeE2RpQT3tlCnCo8/KB31DWXrG8gnS5kGfdSv3MFDg+TqQLGxL5F +ot0Dsz2E2iI1MObRn1VudaXYJZmFVOiattZko87yXUD3Go9PXmd9hxs6SHZ+YyR OKqp+H1djas+OacKtR4ktz2OI8B5Tj684CnnUDsFY3aTopYKEndYTTFXIEZqs60W iOIZiK8EsRxorpwszf5NvPPb9GQ2FFSh1ego+9nwQb4GiPPG4l5Mcy87aQx+azfd CWK4PAnGROgTm6y5kdO+IweftGZPR0Oq+1TThOV15I8AtQLa8gruYFVEsoZilmSc ZLrVVJXfDMRH1wJV/wft8NSio51Q/vr5TDCtoo+VfFIVjF73zTGtzVGjO0vFR6q8 MBWxgGpOVTv80jI5NWESnk9y4jRIm0DRSUoEOI+EJP+HqE/lsZGWnvY7xxgwyOUl LfpSIuY/0jGEt9fiC2mVFcGQT081u9bnAw2Tp1Eue+E3w2r3exXeSB48iVb1fESY O560D9Cy8BnEp9B+Tnx/MUh1LjsMHWDV2WSW6KEvEY0tVNR8LjE4sCz/uKkpK+Ln x2q93Fq08BTfdeoF05bC =qdkC -----END PGP SIGNATURE----- --TiGNc3ldf20CNpkJ-- From owner-freebsd-arch@FreeBSD.ORG Fri Feb 10 11:42:55 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8B523106566C for ; Fri, 10 Feb 2012 11:42:55 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (lor.one-eyed-alien.net [69.66.77.232]) by mx1.freebsd.org (Postfix) with ESMTP id 4FDD18FC12 for ; Fri, 10 Feb 2012 11:42:54 +0000 (UTC) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.4/8.14.4) with ESMTP id q1AB2i6Q002411; Fri, 10 Feb 2012 05:02:44 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.4/8.14.4/Submit) id q1AB2h2u002410; Fri, 10 Feb 2012 05:02:43 -0600 (CST) (envelope-from brooks) Date: Fri, 10 Feb 2012 05:02:43 -0600 From: Brooks Davis To: Aleksandr Rybalko Message-ID: <20120210110243.GA2012@lor.one-eyed-alien.net> References: <20120131213857.86c81626.ray@ddteam.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="xHFwDpU9dbj6ez1V" Content-Disposition: inline In-Reply-To: <20120131213857.86c81626.ray@ddteam.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-arch@freebsd.org Subject: Re: dynamic attach of hinted devices X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Feb 2012 11:42:55 -0000 --xHFwDpU9dbj6ez1V Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jan 31, 2012 at 09:38:57PM +0200, Aleksandr Rybalko wrote: > Hi FreeBSD hackers, >=20 > at first I want to say this: :) > WARNING: FOLLOWING DEVCTL PATCH MAY EASILY PANIC YOUR SYSTEM, PLEASE > DO NOT TRY IT ON PRODUCTION SERVERS AND TRY IT WITH FILESYSTEMS MOUNTED > AS READONLY :))))) >=20 > So I introduce two patches first one [1] used to migrate from > static_hints or hints in the static_kenv to dynamic hints. >=20 > sysctl kern.hintmode=3D2 will copy hints from static hints or from static > kenv and put it into dynamic kenv. Those will allow to manipulate hints > values and attach hinted devices with devctl tool. >=20 > Second [2] allow attach/detach devices with userland tool devctl. >=20 > devctl tool allow add and initialize new devices which is not able to > be autoenumerating, such a hinted devices. >=20 > Both designed to have ability update EEPROM items in runtime, since > some device can't work in mode when it accessible like a EEPROM chip. >=20 > Example: > # sysctl kern.hintmode=3D2 > # kenv hint.mx25l.0.at=3D"spibus0" > # kenv hint.mx25l.0.cs=3D0 > # kenv hint.mx25l.0.chipname=3D"at25128" > # devctl hinted spibus 0 mx25l 0 > mx25l0: at cs 0 mode 0 on spibus0 > mx25l0: at25128, sector 64 bytes, 256 sectors > GEOM: new disk flash/spi0 >=20 > Someone may found it also useful for testing device attach/detach code > (memory leaks, resource allocation, etc). >=20 > So, say me please your opinion. I skimmed over the patch and the concept looks good to me. I'm probably not the right person to review this proposal in detail, but perhaps John Baldwin (cc'd) could do it or suggest someone. -- Brooks --xHFwDpU9dbj6ez1V Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iD8DBQFPNPlSXY6L6fI4GtQRAvOlAJ9RPlUxogB3GHyiQaOSfxocY8flBQCglN3k e7C3GpmwOACndMYqUW5O8Cg= =IsAC -----END PGP SIGNATURE----- --xHFwDpU9dbj6ez1V-- From owner-freebsd-arch@FreeBSD.ORG Fri Feb 10 16:52:44 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2E88A1065675; Fri, 10 Feb 2012 16:52:44 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id BB91B8FC12; Fri, 10 Feb 2012 16:52:43 +0000 (UTC) Received: from [10.30.101.53] ([209.117.142.2]) (authenticated bits=0) by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q1AGjrAQ045973 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO); Fri, 10 Feb 2012 09:45:56 -0700 (MST) (envelope-from imp@bsdimp.com) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <20120210110243.GA2012@lor.one-eyed-alien.net> Date: Fri, 10 Feb 2012 09:45:48 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <4996D1CF-25E5-4422-996C-DF23B5C10732@bsdimp.com> References: <20120131213857.86c81626.ray@ddteam.net> <20120210110243.GA2012@lor.one-eyed-alien.net> To: Brooks Davis X-Mailer: Apple Mail (2.1084) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (harmony.bsdimp.com [10.0.0.6]); Fri, 10 Feb 2012 09:45:57 -0700 (MST) Cc: Aleksandr Rybalko , freebsd-arch@FreeBSD.org Subject: Re: dynamic attach of hinted devices X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Feb 2012 16:52:44 -0000 On Feb 10, 2012, at 4:02 AM, Brooks Davis wrote: > On Tue, Jan 31, 2012 at 09:38:57PM +0200, Aleksandr Rybalko wrote: >> Hi FreeBSD hackers, >>=20 >> at first I want to say this: :) >> WARNING: FOLLOWING DEVCTL PATCH MAY EASILY PANIC YOUR SYSTEM, PLEASE >> DO NOT TRY IT ON PRODUCTION SERVERS AND TRY IT WITH FILESYSTEMS = MOUNTED >> AS READONLY :))))) >>=20 >> So I introduce two patches first one [1] used to migrate from >> static_hints or hints in the static_kenv to dynamic hints. >>=20 >> sysctl kern.hintmode=3D2 will copy hints from static hints or from = static >> kenv and put it into dynamic kenv. Those will allow to manipulate = hints >> values and attach hinted devices with devctl tool. >>=20 >> Second [2] allow attach/detach devices with userland tool devctl. >>=20 >> devctl tool allow add and initialize new devices which is not able to >> be autoenumerating, such a hinted devices. >>=20 >> Both designed to have ability update EEPROM items in runtime, since >> some device can't work in mode when it accessible like a EEPROM chip. >>=20 >> Example: >> # sysctl kern.hintmode=3D2 >> # kenv hint.mx25l.0.at=3D"spibus0" >> # kenv hint.mx25l.0.cs=3D0 >> # kenv hint.mx25l.0.chipname=3D"at25128" >> # devctl hinted spibus 0 mx25l 0 >> mx25l0: at cs 0 mode 0 on spibus0 >> mx25l0: at25128, sector 64 bytes, 256 sectors >> GEOM: new disk flash/spi0 >>=20 >> Someone may found it also useful for testing device attach/detach = code >> (memory leaks, resource allocation, etc). >>=20 >> So, say me please your opinion. >=20 > I skimmed over the patch and the concept looks good to me. I'm = probably > not the right person to review this proposal in detail, but perhaps = John > Baldwin (cc'd) could do it or suggest someone. I've looked at it and it looks OK, but I had lots of feedback I didn't = have time to write up. I like the concept, but have lots of suggestions = for improvement since I'd like to see this more generically done. Warner= From owner-freebsd-arch@FreeBSD.ORG Fri Feb 10 16:52:54 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6B9DD106566C; Fri, 10 Feb 2012 16:52:54 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 23F2E8FC19; Fri, 10 Feb 2012 16:52:54 +0000 (UTC) Received: from [10.30.101.53] ([209.117.142.2]) (authenticated bits=0) by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q1AGiOm9045940 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO); Fri, 10 Feb 2012 09:44:26 -0700 (MST) (envelope-from imp@bsdimp.com) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <20120210105726.GO1860@hoeg.nl> Date: Fri, 10 Feb 2012 09:44:16 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org> <20120208215122.GA28769@alchemy.franken.de> <20120210105726.GO1860@hoeg.nl> To: Ed Schouten X-Mailer: Apple Mail (2.1084) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (harmony.bsdimp.com [10.0.0.6]); Fri, 10 Feb 2012 09:44:26 -0700 (MST) Cc: freebsd-arch@FreeBSD.org, David Chisnall , Marius Strobl Subject: Re: NO_TLS flag for public headers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Feb 2012 16:52:54 -0000 On Feb 10, 2012, at 3:57 AM, Ed Schouten wrote: > Hi Marius, >=20 > * Marius Strobl , 20120208 22:51: >> See lib/libc/stdlib/malloc.c, arm and mips currently are the only >> supported FreeBSD architectures that have no support for TLS. >=20 > Just out of curiosity, what is needed to make TLS work? Is it just the > lack of support by our toolchain or is there also stuff on our side = that > needs to be done? On MIPS it is partially a toolchain issue and partially a kernel issue. = I believe the kernel issue has been fixed recently... I'm not sure = what's left on the toolchain side. Warner From owner-freebsd-arch@FreeBSD.ORG Fri Feb 10 17:36:48 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F15C21065670 for ; Fri, 10 Feb 2012 17:36:48 +0000 (UTC) (envelope-from gonzo@hq.bluezbox.com) Received: from hq.bluezbox.com (hq.bluezbox.com [70.38.37.145]) by mx1.freebsd.org (Postfix) with ESMTP id ACE168FC0A for ; Fri, 10 Feb 2012 17:36:48 +0000 (UTC) Received: from [24.87.53.93] (helo=[192.168.1.116]) by hq.bluezbox.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.73 (FreeBSD)) (envelope-from ) id 1RvuOj-000MDI-2H; Fri, 10 Feb 2012 09:36:40 -0800 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Oleksandr Tymoshenko In-Reply-To: <20120210105726.GO1860@hoeg.nl> Date: Fri, 10 Feb 2012 09:36:40 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <94EAEADD-440A-4148-B9CC-B7BED1BCF206@bluezbox.com> References: <58175263-109E-4FF0-BB29-E0331C01DCD5@FreeBSD.org> <20120208215122.GA28769@alchemy.franken.de> <20120210105726.GO1860@hoeg.nl> To: Ed Schouten X-Mailer: Apple Mail (2.1084) Sender: gonzo@hq.bluezbox.com X-Spam-Level: --- X-Spam-Report: Spam detection software, running on the system "hq.bluezbox.com", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see The administrator of that system for details. Content preview: On 2012-02-10, at 2:57 AM, Ed Schouten wrote: > Hi Marius, > > * Marius Strobl , 20120208 22:51: >> See lib/libc/stdlib/malloc.c, arm and mips currently are the only >> supported FreeBSD architectures that have no support for TLS. > > Just out of curiosity, what is needed to make TLS work? Is it just the > lack of support by our toolchain or is there also stuff on our side that > needs to be done? [...] Content analysis details: (-3.8 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.8 ALL_TRUSTED Passed through trusted hosts only via SMTP -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] 0.6 AWL AWL: From: address is in the auto white-list Cc: freebsd-arch@freebsd.org, David Chisnall , Marius Strobl Subject: Re: NO_TLS flag for public headers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Feb 2012 17:36:49 -0000 On 2012-02-10, at 2:57 AM, Ed Schouten wrote: > Hi Marius, >=20 > * Marius Strobl , 20120208 22:51: >> See lib/libc/stdlib/malloc.c, arm and mips currently are the only >> supported FreeBSD architectures that have no support for TLS. >=20 > Just out of curiosity, what is needed to make TLS work? Is it just the > lack of support by our toolchain or is there also stuff on our side = that > needs to be done? There are several places that should be fixed and in sync in order to = get TLS=20 working for architecture: rtld-elf, libthr, cpu_set_user_tls in kernel = and thread pointer handling in toolchain. Look at my commits as of Feb 9 - = they cover all of these but toolchain part. TLS on MIPS is fixed for the simplest test case. TLS data in dynamic = library=20 might need some more work for rtld, but not much. I'll take a look at = ARM when I have=20 working root over NFS on my ARM board. That is if nobody step up and fix = it soon :)=20= From owner-freebsd-arch@FreeBSD.ORG Sat Feb 11 05:23:23 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 085AF1065672; Sat, 11 Feb 2012 05:23:23 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-wi0-f182.google.com (mail-wi0-f182.google.com [209.85.212.182]) by mx1.freebsd.org (Postfix) with ESMTP id E7D938FC08; Sat, 11 Feb 2012 05:23:21 +0000 (UTC) Received: by wibhn14 with SMTP id hn14so3674870wib.13 for ; Fri, 10 Feb 2012 21:23:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=fcSuGSJmK4p7jpl/F6OtDjJAHb04Ra3FiSf5KkS9jQY=; b=OLFsO0agE45F2uNtjeJHStn0oBtR6C9trDV4MYgV2rcS9hXwQQcqMwvxHN1XZnH1Qj t8bf78x+qQSa6wmXWwKX3X+9zSmtK2uzeiKfcfFGOH/6ooOG1l/vWZarQqvuXxM1f3N8 NjLGxvodNkhDtkUr2A2Qf+TFy5Hv+sKU8Unys= MIME-Version: 1.0 Received: by 10.180.78.6 with SMTP id x6mr7056903wiw.18.1328937801101; Fri, 10 Feb 2012 21:23:21 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.216.175.136 with HTTP; Fri, 10 Feb 2012 21:23:21 -0800 (PST) In-Reply-To: References: <47ABA638-7E08-4350-A03C-3D4A23BF2D7E@lassitu.de> <1763C3FF-1EA0-4DC0-891D-63816EBF4A04@lassitu.de> <20120106182756.GA88161@alchemy.franken.de> <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de> <20120106214741.GB88161@alchemy.franken.de> <20120108130039.GG88161@alchemy.franken.de> <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de> <20120111193738.GB44286@alchemy.franken.de> <66DDA0A2-F878-43FF-8824-54868F493B18@lassitu.de> <20120125221753.GA17821@alchemy.franken.de> Date: Fri, 10 Feb 2012 21:23:21 -0800 X-Google-Sender-Auth: -7V-EWjq2a_5Qz89jtAqighUzrA Message-ID: From: Adrian Chadd To: Stefan Bethke , Aleksandr Rybalko , "M. Warner Losh" , Juli Mallett Content-Type: text/plain; charset=ISO-8859-1 Cc: FreeBSD-arch , Marius Strobl Subject: Re: Extending sys/dev/mii X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Feb 2012 05:23:23 -0000 Hi all, So where'd we get to with this? I'd like to finalise a unified proposal for this. I still like the idea of tidying up the miibus/mdiobus stuff (ie, miibus really is an mdiobus for speaking to things, along with some methods to do MII stuff like media change, media set, MAC PLL/type set, etc) but I agree with ray@ that things begin to look a _lot_ more complicated when we start trying to handle alternate methods of how switches are connected. So I'd like to not lose interest on this. If we can't come to some kind of consensus, I'll just commit ray@'s work (and cop the flak) then work with stb@ to tidy up the newbus bits to hopefully be better for the long term. In summary - I'm fed up that we're this close to having _something_ that looks like a workable switch API and working code but it's not in the tree. So I'm happy stirring up trouble and copping the flak from things if it means it Just Gets Done. Adrian From owner-freebsd-arch@FreeBSD.ORG Sat Feb 11 11:17:37 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ED96D1065672; Sat, 11 Feb 2012 11:17:37 +0000 (UTC) (envelope-from marius@alchemy.franken.de) Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214]) by mx1.freebsd.org (Postfix) with ESMTP id 64E348FC15; Sat, 11 Feb 2012 11:17:37 +0000 (UTC) Received: from alchemy.franken.de (localhost [127.0.0.1]) by alchemy.franken.de (8.14.4/8.14.4/ALCHEMY.FRANKEN.DE) with ESMTP id q1BBHV6T050544; Sat, 11 Feb 2012 12:17:31 +0100 (CET) (envelope-from marius@alchemy.franken.de) Received: (from marius@localhost) by alchemy.franken.de (8.14.4/8.14.4/Submit) id q1BBHVmK050543; Sat, 11 Feb 2012 12:17:31 +0100 (CET) (envelope-from marius) Date: Sat, 11 Feb 2012 12:17:31 +0100 From: Marius Strobl To: Adrian Chadd Message-ID: <20120211111731.GE39861@alchemy.franken.de> References: <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de> <20120106214741.GB88161@alchemy.franken.de> <20120108130039.GG88161@alchemy.franken.de> <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de> <20120111193738.GB44286@alchemy.franken.de> <66DDA0A2-F878-43FF-8824-54868F493B18@lassitu.de> <20120125221753.GA17821@alchemy.franken.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Cc: Juli Mallett , Aleksandr Rybalko , Stefan Bethke , FreeBSD-arch Subject: Re: Extending sys/dev/mii X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Feb 2012 11:17:38 -0000 On Fri, Feb 10, 2012 at 09:23:21PM -0800, Adrian Chadd wrote: > Hi all, > > So where'd we get to with this? > > I'd like to finalise a unified proposal for this. > > I still like the idea of tidying up the miibus/mdiobus stuff (ie, > miibus really is an mdiobus for speaking to things, along with some > methods to do MII stuff like media change, media set, MAC PLL/type > set, etc) but I agree with ray@ that things begin to look a _lot_ more > complicated when we start trying to handle alternate methods of how > switches are connected. > > So I'd like to not lose interest on this. > > If we can't come to some kind of consensus, I'll just commit ray@'s > work (and cop the flak) then work with stb@ to tidy up the newbus bits > to hopefully be better for the long term. > > In summary - I'm fed up that we're this close to having _something_ > that looks like a workable switch API and working code but it's not in > the tree. So I'm happy stirring up trouble and copping the flak from > things if it means it Just Gets Done. > I haven't seen ray@'s work (where is it?) but the general approach sounds backwards to me. As you say the whole picture of how switches are connected in reality is complicated. Therefore I'd like to see a proposal for a framework first that can handle the various ways of taking to a switch (GPIO, I2C, MDIO etc or combinations thereof) separately from a MAC (as there may be no MAC associated with the switch to begin with). The stuff proposed so far (again, I haven't looked at ray@'s current work) only dealt with the more or less low hanging fruit in that picture with discussions how we may hack some more scenarios into working. Getting _something_ in at this stage just for the sake that there's something in the tree really asks for one of two typical things happening: o it sticks as-is forever as nobody really wants to do the work to get it right and in the end the supposedly generic framework is ignored and people implement their own local stuff to get what they need, or o there actually are some brave souls working on getting it right over time but requiring API and ABI breakages over and over again to get there. Marius From owner-freebsd-arch@FreeBSD.ORG Sat Feb 11 12:46:25 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0BABE106566C; Sat, 11 Feb 2012 12:46:25 +0000 (UTC) (envelope-from ray@ddteam.net) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1276F8FC0A; Sat, 11 Feb 2012 12:46:23 +0000 (UTC) Received: by eekb47 with SMTP id b47so1427427eek.13 for ; Sat, 11 Feb 2012 04:46:23 -0800 (PST) Received: by 10.213.31.208 with SMTP id z16mr945102ebc.47.1328964382860; Sat, 11 Feb 2012 04:46:22 -0800 (PST) Received: from rnote.ddteam.net (85-184-133-95.pool.ukrtel.net. [95.133.184.85]) by mx.google.com with ESMTPS id v51sm34903556eef.2.2012.02.11.04.46.18 (version=SSLv3 cipher=OTHER); Sat, 11 Feb 2012 04:46:21 -0800 (PST) Date: Sat, 11 Feb 2012 14:45:44 +0200 From: Aleksandr Rybalko To: Marius Strobl Message-Id: <20120211144544.c91701d9.ray@ddteam.net> In-Reply-To: <20120211111731.GE39861@alchemy.franken.de> References: <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de> <20120106214741.GB88161@alchemy.franken.de> <20120108130039.GG88161@alchemy.franken.de> <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de> <20120111193738.GB44286@alchemy.franken.de> <66DDA0A2-F878-43FF-8824-54868F493B18@lassitu.de> <20120125221753.GA17821@alchemy.franken.de> <20120211111731.GE39861@alchemy.franken.de> X-Mailer: Sylpheed 3.1.2 (GTK+ 2.24.5; amd64-portbld-freebsd9.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Gm-Message-State: ALoCoQlOyQVFBf+IcEOh8w4ZoETADHmn4Rda+ARIIVlwQyVt4LzmQFtQRcniQMVJbe8XmdJjtYiS Cc: Juli Mallett , Adrian Chadd , FreeBSD-arch , Aleksandr Rybalko , Stefan Bethke Subject: Re: Extending sys/dev/mii X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Feb 2012 12:46:25 -0000 Good day hackers! On Sat, 11 Feb 2012 12:17:31 +0100 Marius Strobl wrote: > On Fri, Feb 10, 2012 at 09:23:21PM -0800, Adrian Chadd wrote: > > Hi all, > > > > So where'd we get to with this? > > > > I'd like to finalise a unified proposal for this. > > > > I still like the idea of tidying up the miibus/mdiobus stuff (ie, > > miibus really is an mdiobus for speaking to things, along with some > > methods to do MII stuff like media change, media set, MAC PLL/type > > set, etc) but I agree with ray@ that things begin to look a _lot_ > > more complicated when we start trying to handle alternate methods > > of how switches are connected. > > > > So I'd like to not lose interest on this. > > > > If we can't come to some kind of consensus, I'll just commit ray@'s > > work (and cop the flak) then work with stb@ to tidy up the newbus > > bits to hopefully be better for the long term. > > > > In summary - I'm fed up that we're this close to having _something_ > > that looks like a workable switch API and working code but it's not > > in the tree. So I'm happy stirring up trouble and copping the flak > > from things if it means it Just Gets Done. > > > > I haven't seen ray@'s work (where is it?) There is thread about it: http://lists.freebsd.org/pipermail/freebsd-net/2012-January/031132.html > but the general approach > sounds backwards to me. As you say the whole picture of how switches > are connected in reality is complicated. Therefore I'd like to see a > proposal for a framework first that can handle the various ways of > taking to a switch (GPIO, I2C, MDIO etc or combinations thereof) Currently framework handle OBIO(memory attached) and MDIO, but designed to handle various bus attachment. And as i see GPIO variant must use some bus attached to GPIO (f.e. gpiospi, gpioiic), the switch framework attach to it. When i design framework core, i was keep in mind that: 1. some switches can be controlled with many ways (BCM53115 can be controlled by MDIO and SPI). 2. MAC driver must not be modified, as long as possible. if_arge was modified only to clear special bits like phymask from it. > separately from a MAC (as there may be no MAC associated with the > switch to begin with). The stuff proposed so far (again, I haven't > looked at ray@'s current work) only dealt with the more or less > low hanging fruit in that picture with discussions how we may hack > some more scenarios into working. Getting _something_ in at this > stage just for the sake that there's something in the tree really > asks for one of two typical things happening: > o it sticks as-is forever as nobody really wants to do the work > to get it right and in the end the supposedly generic framework > is ignored and people implement their own local stuff to get what > they need, or > o there actually are some brave souls working on getting it right > over time but requiring API and ABI breakages over and over again > to get there. I think second is near to be true :), but last time i add methods to manipulate Port based VLANs on BCM53115, and 1) i'm not add anything to ioctl definitions (switch_ioctl.h), 2) most of newbus methods have defaults, so adding new feature to some driver, will not break other drivers. Updated work accessible here: http://zrouter.org/hg/FreeBSD/head/file/default/head/sys/dev/switch/ > > Marius > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to > "freebsd-arch-unsubscribe@freebsd.org" -- Aleksandr Rybalko