From owner-freebsd-arch@FreeBSD.ORG Sun Aug 31 16:50:25 2014
Date: Sun, 31 Aug 2014 20:50:22 +0400
From: Gleb Smirnoff
To: arch@freebsd.org
Subject: Re: [CFT/review] new sendfile(2)
Message-ID: <20140831165022.GE7693@FreeBSD.org>
References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com>
In-Reply-To: <20140729232404.GF43962@funkthat.com>

John-Mark,

On Tue, Jul 29, 2014 at 04:24:04PM -0700, John-Mark Gurney wrote:
J> Gleb Smirnoff wrote this message on Thu, May 29, 2014 at 14:20 +0400:
J> > One of the approaches we are experimenting with is a new sendfile(2)
J> > implementation that doesn't block on the I/O done from the file
J> > descriptor.
J>
J> I know this is a reply to an old message, but...

I am also sorry for the late reply to a late reply :)

J> How is this different from:
J> SF_NODISKIO. This flag causes any sendfile() call which would
J> block on disk I/O to instead return EBUSY. Busy servers may
J> benefit by transferring requests that would block to a separate I/O
J> worker thread.

It is very different. The new sendfile(2) simply doesn't block and
returns success :) The I/O completes outside of syscall context.

J> > 1) Split of socket buffer sb_cc field into sb_acc and sb_ccc. Where
J> > sb_acc stands for "available character count" and sb_ccc is "claimed
J> > character count". This allows us to write data that is not ready yet
J> > to a socket. The data sits in the socket, consumes its space, and
J> > keeps its order relative to earlier and later writes to the socket.
J> > But it can be sent only after it is marked as ready. This change is
J> > spread across many files.
J>
J> This change really should be split out and possibly committed separately
J> after a review by the proper people...

Of course. It actually makes 80% of the volume of the patch.

--
Totus tuus, Glebius.
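The accounting behind the sb_acc/sb_ccc split can be modelled in
userspace. What follows is a toy sketch, loosely following the sballoc()
and sbready() functions from the patch quoted later in this thread. The
toy_* names, the simplified structures, the flag values and the
one-mbuf-at-a-time toy_sbready() are assumptions made for illustration;
there is no locking and nothing here is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; the real kernel values may differ. */
#define M_NOTREADY 0x1u /* data not filled in yet (I/O still pending) */
#define M_BLOCKED  0x2u /* queued behind an earlier not-ready mbuf */

struct toy_mbuf {
    struct toy_mbuf *m_next;
    unsigned m_len;
    unsigned m_flags;
};

struct toy_sockbuf {
    struct toy_mbuf *sb_mb;    /* head of the chain */
    struct toy_mbuf *sb_tail;  /* last mbuf in the chain */
    struct toy_mbuf *sb_fnrdy; /* first not-ready mbuf, or NULL */
    unsigned sb_acc;           /* available character count */
    unsigned sb_ccc;           /* claimed character count */
};

/*
 * Append m to the buffer. The bytes are always claimed (sb_ccc), but
 * they count as available (sb_acc) only if m is ready and no earlier
 * mbuf is still pending.
 */
static void
toy_sbappend(struct toy_sockbuf *sb, struct toy_mbuf *m)
{
    m->m_next = NULL;
    if (sb->sb_tail != NULL)
        sb->sb_tail->m_next = m;
    else
        sb->sb_mb = m;
    sb->sb_tail = m;

    sb->sb_ccc += m->m_len;
    if (sb->sb_fnrdy == NULL) {
        if (m->m_flags & M_NOTREADY)
            sb->sb_fnrdy = m;
        else
            sb->sb_acc += m->m_len;
    } else
        m->m_flags |= M_BLOCKED;
}

/*
 * Mark one mbuf as ready. Returns 1 if this made data available for
 * sending, 0 if the mbuf is still blocked behind earlier pending data.
 */
static int
toy_sbready(struct toy_sockbuf *sb, struct toy_mbuf *m)
{
    if (sb->sb_fnrdy != m) {
        m->m_flags &= ~M_NOTREADY; /* ready, but stays M_BLOCKED */
        return (0);
    }

    m->m_flags &= ~(M_NOTREADY | M_BLOCKED);
    sb->sb_acc += m->m_len;

    /* Release every ready mbuf that was waiting behind this one. */
    for (m = m->m_next; m != NULL && !(m->m_flags & M_NOTREADY);
        m = m->m_next) {
        m->m_flags &= ~M_BLOCKED;
        sb->sb_acc += m->m_len;
    }
    sb->sb_fnrdy = m;
    return (1);
}
```

In this model, appending a ready 100-byte mbuf, a pending 4096-byte
mbuf and then a ready 50-byte mbuf leaves sb_ccc at 4246 but sb_acc at
100. Only when the pending mbuf is marked ready does sb_acc catch up
with sb_ccc, which is what keeps the byte stream in order.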
From owner-freebsd-arch@FreeBSD.ORG Sun Aug 31 16:48:29 2014
Date: Sun, 31 Aug 2014 20:48:20 +0400
From: Gleb Smirnoff
To: arch@FreeBSD.org
Cc: alc@FreeBSD.org
Subject: Re: [CFT/review] new sendfile(2)
Message-ID: <20140831164820.GD7693@FreeBSD.org>
References: <20140529102054.GX50679@FreeBSD.org>
In-Reply-To: <20140529102054.GX50679@FreeBSD.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="hTiIB9CRvBOLTyqY"

--hTiIB9CRvBOLTyqY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi!

Just a followup with a fresh version of the patch. For details see below.

On Thu, May 29, 2014 at 02:20:54PM +0400, Gleb Smirnoff wrote:
T> Hello!
T>
T> At Netflix and Nginx we are experimenting with improving FreeBSD
T> wrt sending large amounts of static data via HTTP.
T>
T> One of the approaches we are experimenting with is a new sendfile(2)
T> implementation that doesn't block on the I/O done from the file
T> descriptor.
T>
T> The problem with classic sendfile(2) is that if the request
T> length is large enough, and file data is not cached in VM, then
T> the sendfile(2) syscall does not return until it fills the socket
T> buffer with data. On the modern Internet socket buffers can be up to
T> 1 Mb, thus the time taken by the syscall rises by an order of
T> magnitude. All that time, the nginx worker is blocked in the syscall
T> and doesn't process data from other clients. The best current
T> practice to mitigate that is known as "sendfile(2) + aio_read(2)".
T> This is a special mode of nginx operation on FreeBSD. The sendfile(2)
T> call is issued with the SF_NODISKIO flag, which forbids the syscall
T> to perform disk I/O, so it sends only data that is cached by VM. If
T> sendfile(2) reports that I/O needs to be done (but is forbidden), then
T> nginx does an aio_read() of a chunk of the file. The data read
T> is cached by VM as a side effect. Then sendfile() is called again.
T>
T> Now for the new sendfile. The core idea is that sendfile()
T> schedules the I/O, but doesn't wait for it to complete. It
T> returns immediately to the process, and I/O completion is
T> processed in kernel context. Unlike aio(4), no additional
T> threads in kernel are created. The new sendfile is a drop-in
T> replacement for the old one. Applications (like nginx) don't
T> need a recompile or a configuration change. The SF_NODISKIO flag
T> is ignored.
T>
T> The patch for review is available at:
T>
T> https://phabric.freebsd.org/D102
T>
T> And for those who prefer email attachments, it is also attached.
T> The patch contains 3 logically separate changes:
T>
T> 1) Split of socket buffer sb_cc field into sb_acc and sb_ccc. Where
T> sb_acc stands for "available character count" and sb_ccc is "claimed
T> character count". This allows us to write data that is not ready yet
T> to a socket. The data sits in the socket, consumes its space, and
T> keeps its order relative to earlier and later writes to the socket.
T> But it can be sent only after it is marked as ready. This change is
T> spread across many files.
T>
T> 2) A new vnode operation: VOP_GETPAGES_ASYNC(). This one lives in sys/vm.
T>
T> 3) Actual implementation of new sendfile(2). This one lives in
T> kern/uipc_syscalls.c
T>
T> At Netflix, we already see improvements with the new sendfile(2).
T> We can send more data utilizing the same amount of CPU, and we can
T> push closer to 0% idle, without experiencing short lags.
T>
T> However, we have a somewhat modified VM subsystem, which behaves
T> optimally for our task, but suboptimally for an average FreeBSD
T> system. I'd like someone from the community to try the new
T> sendfile(2) on another setup and see how it works for you.
T>
T> To be an early tester you need to check out the projects/sendfile
T> branch and build a kernel from it. The world from head/ would
T> run fine with it.
T>
T> svn co http://svn.freebsd.org/base/projects/sendfile
T> cd sendfile
T> ... build kernel ...
T>
T> Limitations:
T> - No testing was done on serving files on NFS.
T> - No testing was done on serving files on ZFS.
T>
T> --
T> Totus tuus, Glebius.
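The "sendfile(2) + aio_read(2)" practice described in the quoted
message comes down to a retry decision after each sendfile() call. Below
is a minimal sketch of that decision. It assumes the errno values
documented in the FreeBSD sendfile(2) manual page: EBUSY when
SF_NODISKIO prevented the required disk I/O, and EAGAIN when a
non-blocking socket's buffer filled up. The enum and the function name
are hypothetical, and this is not nginx's actual code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names: what the event loop should do next. */
enum sf_next {
    NEXT_DONE,        /* whole request was handed to the socket */
    NEXT_WAIT_SOCKET, /* socket buffer full: wait for writability, retry */
    NEXT_READ_FILE,   /* data not cached: read the chunk first, then retry */
    NEXT_FAIL         /* any other error: give up on this request */
};

/*
 * Classify the outcome of one sendfile() call issued with SF_NODISKIO.
 * rc is the return value of sendfile(), err is the errno observed
 * right after it.
 */
static enum sf_next
sf_classify(int rc, int err)
{
    if (rc == 0)
        return (NEXT_DONE);

    switch (err) {
    case EAGAIN:
        return (NEXT_WAIT_SOCKET);
    case EBUSY:
        /* The classic scheme issues an aio_read() here. */
        return (NEXT_READ_FILE);
    default:
        return (NEXT_FAIL);
    }
}
```

Since the quoted message says the new sendfile(2) ignores SF_NODISKIO,
a caller would presumably never reach the NEXT_READ_FILE branch with
it, and the aio_read() round trip would go away.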
T> Index: sys/dev/ti/if_ti.c T> =================================================================== T> --- sys/dev/ti/if_ti.c (.../head) (revision 266804) T> +++ sys/dev/ti/if_ti.c (.../projects/sendfile) (revision 266807) T> @@ -1629,7 +1629,7 @@ ti_newbuf_jumbo(struct ti_softc *sc, int idx, stru T> m[i]->m_data = (void *)sf_buf_kva(sf[i]); T> m[i]->m_len = PAGE_SIZE; T> MEXTADD(m[i], sf_buf_kva(sf[i]), PAGE_SIZE, T> - sf_buf_mext, (void*)sf_buf_kva(sf[i]), sf[i], T> + sf_mext_free, (void*)sf_buf_kva(sf[i]), sf[i], T> 0, EXT_DISPOSABLE); T> m[i]->m_next = m[i+1]; T> } T> @@ -1694,7 +1694,7 @@ nobufs: T> if (m[i]) T> m_freem(m[i]); T> if (sf[i]) T> - sf_buf_mext((void *)sf_buf_kva(sf[i]), sf[i]); T> + sf_mext_free((void *)sf_buf_kva(sf[i]), sf[i]); T> } T> return (ENOBUFS); T> } T> Index: sys/dev/cxgbe/tom/t4_cpl_io.c T> =================================================================== T> --- sys/dev/cxgbe/tom/t4_cpl_io.c (.../head) (revision 266804) T> +++ sys/dev/cxgbe/tom/t4_cpl_io.c (.../projects/sendfile) (revision 266807) T> @@ -338,11 +338,11 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp) T> INP_WLOCK_ASSERT(inp); T> T> SOCKBUF_LOCK(sb); T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> - toep->sb_cc = sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> + toep->sb_cc = sbused(sb); T> credits = toep->rx_credits; T> SOCKBUF_UNLOCK(sb); T> T> @@ -863,15 +863,15 @@ do_peer_close(struct sge_iq *iq, const struct rss_ T> tp->rcv_nxt = be32toh(cpl->rcv_nxt); T> toep->ddp_flags &= ~(DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE); T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += 
toep->sb_cc - sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> #ifdef USE_DDP_RX_FLOW_CONTROL T> toep->rx_credits -= m->m_len; /* adjust for F_RX_FC_DDP */ T> #endif T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> } T> socantrcvmore_locked(so); /* unlocks the sockbuf */ T> T> @@ -1281,12 +1281,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_hea T> } T> } T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> sorwakeup_locked(so); T> SOCKBUF_UNLOCK_ASSERT(sb); T> T> Index: sys/dev/cxgbe/tom/t4_ddp.c T> =================================================================== T> --- sys/dev/cxgbe/tom/t4_ddp.c (.../head) (revision 266804) T> +++ sys/dev/cxgbe/tom/t4_ddp.c (.../projects/sendfile) (revision 266807) T> @@ -224,15 +224,15 @@ insert_ddp_data(struct toepcb *toep, uint32_t n) T> tp->rcv_wnd -= n; T> #endif T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> #ifdef USE_DDP_RX_FLOW_CONTROL T> toep->rx_credits -= n; /* adjust for F_RX_FC_DDP */ T> #endif T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> } T> T> /* SET_TCB_FIELD sent as a ULP 
command looks like this */ T> @@ -459,15 +459,15 @@ handle_ddp_data(struct toepcb *toep, __be32 ddp_re T> else T> discourage_ddp(toep); T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> #ifdef USE_DDP_RX_FLOW_CONTROL T> toep->rx_credits -= len; /* adjust for F_RX_FC_DDP */ T> #endif T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> wakeup: T> KASSERT(toep->ddp_flags & db_flag, T> ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x, report 0x%x", T> @@ -897,7 +897,7 @@ handle_ddp(struct socket *so, struct uio *uio, int T> #endif T> T> /* XXX: too eager to disable DDP, could handle NBIO better than this. */ T> - if (sb->sb_cc >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || T> + if (sbused(sb) >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || T> uio->uio_resid > MAX_DDP_BUFFER_SIZE || uio->uio_iovcnt > 1 || T> so->so_state & SS_NBIO || flags & (MSG_DONTWAIT | MSG_NBIO) || T> error || so->so_error || sb->sb_state & SBS_CANTRCVMORE) T> @@ -935,7 +935,7 @@ handle_ddp(struct socket *so, struct uio *uio, int T> * payload. T> */ T> ddp_flags = select_ddp_flags(so, flags, db_idx); T> - wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sb->sb_cc, ddp_flags); T> + wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sbused(sb), ddp_flags); T> if (wr == NULL) { T> /* T> * Just unhold the pages. The DDP buffer's software state is T> @@ -960,8 +960,9 @@ handle_ddp(struct socket *so, struct uio *uio, int T> */ T> rc = sbwait(sb); T> while (toep->ddp_flags & buf_flag) { T> + /* XXXGL: shouldn't here be sbwait() call? 
*/ T> sb->sb_flags |= SB_WAIT; T> - msleep(&sb->sb_cc, &sb->sb_mtx, PSOCK , "sbwait", 0); T> + msleep(&sb->sb_acc, &sb->sb_mtx, PSOCK , "sbwait", 0); T> } T> unwire_ddp_buffer(db); T> return (rc); T> @@ -1123,8 +1124,8 @@ restart: T> T> /* uio should be just as it was at entry */ T> KASSERT(oresid == uio->uio_resid, T> - ("%s: oresid = %d, uio_resid = %zd, sb_cc = %d", T> - __func__, oresid, uio->uio_resid, sb->sb_cc)); T> + ("%s: oresid = %d, uio_resid = %zd, sbused = %d", T> + __func__, oresid, uio->uio_resid, sbused(sb))); T> T> error = handle_ddp(so, uio, flags, 0); T> ddp_handled = 1; T> @@ -1134,7 +1135,7 @@ restart: T> T> /* Abort if socket has reported problems. */ T> if (so->so_error) { T> - if (sb->sb_cc > 0) T> + if (sbused(sb)) T> goto deliver; T> if (oresid > uio->uio_resid) T> goto out; T> @@ -1146,7 +1147,7 @@ restart: T> T> /* Door is closed. Deliver what is left, if any. */ T> if (sb->sb_state & SBS_CANTRCVMORE) { T> - if (sb->sb_cc > 0) T> + if (sbused(sb)) T> goto deliver; T> else T> goto out; T> @@ -1153,7 +1154,7 @@ restart: T> } T> T> /* Socket buffer is empty and we shall not block. */ T> - if (sb->sb_cc == 0 && T> + if (sbused(sb) == 0 && T> ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { T> error = EAGAIN; T> goto out; T> @@ -1160,18 +1161,18 @@ restart: T> } T> T> /* Socket buffer got some data that we shall deliver now. */ T> - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && T> + if (sbused(sb) && !(flags & MSG_WAITALL) && T> ((sb->sb_flags & SS_NBIO) || T> (flags & (MSG_DONTWAIT|MSG_NBIO)) || T> - sb->sb_cc >= sb->sb_lowat || T> - sb->sb_cc >= uio->uio_resid || T> - sb->sb_cc >= sb->sb_hiwat) ) { T> + sbused(sb) >= sb->sb_lowat || T> + sbused(sb) >= uio->uio_resid || T> + sbused(sb) >= sb->sb_hiwat) ) { T> goto deliver; T> } T> T> /* On MSG_WAITALL we must wait until all data or error arrives. 
*/ T> if ((flags & MSG_WAITALL) && T> - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat)) T> + (sbused(sb) >= uio->uio_resid || sbused(sb) >= sb->sb_lowat)) T> goto deliver; T> T> /* T> @@ -1190,7 +1191,7 @@ restart: T> T> deliver: T> SOCKBUF_LOCK_ASSERT(&so->so_rcv); T> - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); T> + KASSERT(sbused(sb) > 0, ("%s: sockbuf empty", __func__)); T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); T> T> if (sb->sb_flags & SB_DDP_INDICATE && !ddp_handled) T> @@ -1201,7 +1202,7 @@ deliver: T> uio->uio_td->td_ru.ru_msgrcv++; T> T> /* Fill uio until full or current end of socket buffer is reached. */ T> - len = min(uio->uio_resid, sb->sb_cc); T> + len = min(uio->uio_resid, sbused(sb)); T> if (mp0 != NULL) { T> /* Dequeue as many mbufs as possible. */ T> if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { T> Index: sys/dev/cxgbe/iw_cxgbe/cm.c T> =================================================================== T> --- sys/dev/cxgbe/iw_cxgbe/cm.c (.../head) (revision 266804) T> +++ sys/dev/cxgbe/iw_cxgbe/cm.c (.../projects/sendfile) (revision 266807) T> @@ -585,8 +585,8 @@ process_data(struct c4iw_ep *ep) T> { T> struct sockaddr_in *local, *remote; T> T> - CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sb_cc %d", __func__, T> - ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc); T> + CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__, T> + ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv)); T> T> switch (state_read(&ep->com)) { T> case MPA_REQ_SENT: T> @@ -602,11 +602,11 @@ process_data(struct c4iw_ep *ep) T> process_mpa_request(ep); T> break; T> default: T> - if (ep->com.so->so_rcv.sb_cc) T> - log(LOG_ERR, "%s: Unexpected streaming data. " T> - "ep %p, state %d, so %p, so_state 0x%x, sb_cc %u\n", T> + if (sbused(&ep->com.so->so_rcv)) T> + log(LOG_ERR, "%s: Unexpected streaming data. 
ep %p, " T> + "state %d, so %p, so_state 0x%x, sbused %u\n", T> __func__, ep, state_read(&ep->com), ep->com.so, T> - ep->com.so->so_state, ep->com.so->so_rcv.sb_cc); T> + ep->com.so->so_state, sbused(&ep->com.so->so_rcv)); T> break; T> } T> } T> Index: sys/dev/iscsi/icl.c T> =================================================================== T> --- sys/dev/iscsi/icl.c (.../head) (revision 266804) T> +++ sys/dev/iscsi/icl.c (.../projects/sendfile) (revision 266807) T> @@ -758,7 +758,7 @@ icl_receive_thread(void *arg) T> * is enough data received to read the PDU. T> */ T> SOCKBUF_LOCK(&so->so_rcv); T> - available = so->so_rcv.sb_cc; T> + available = sbavail(&so->so_rcv); T> if (available < ic->ic_receive_len) { T> so->so_rcv.sb_lowat = ic->ic_receive_len; T> cv_wait(&ic->ic_receive_cv, &so->so_rcv.sb_mtx); T> Index: sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c T> =================================================================== T> --- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../head) (revision 266804) T> +++ sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../projects/sendfile) (revision 266807) T> @@ -445,8 +445,8 @@ t3_push_frames(struct socket *so, int req_completi T> * Autosize the send buffer. 
T> */ T> if (snd->sb_flags & SB_AUTOSIZE && VNET(tcp_do_autosndbuf)) { T> - if (snd->sb_cc >= (snd->sb_hiwat / 8 * 7) && T> - snd->sb_cc < VNET(tcp_autosndbuf_max)) { T> + if (sbused(snd) >= (snd->sb_hiwat / 8 * 7) && T> + sbused(snd) < VNET(tcp_autosndbuf_max)) { T> if (!sbreserve_locked(snd, min(snd->sb_hiwat + T> VNET(tcp_autosndbuf_inc), VNET(tcp_autosndbuf_max)), T> so, curthread)) T> @@ -597,10 +597,10 @@ t3_rcvd(struct toedev *tod, struct tcpcb *tp) T> INP_WLOCK_ASSERT(inp); T> T> SOCKBUF_LOCK(so_rcv); T> - KASSERT(toep->tp_enqueued >= so_rcv->sb_cc, T> - ("%s: so_rcv->sb_cc > enqueued", __func__)); T> - toep->tp_rx_credits += toep->tp_enqueued - so_rcv->sb_cc; T> - toep->tp_enqueued = so_rcv->sb_cc; T> + KASSERT(toep->tp_enqueued >= sbused(so_rcv), T> + ("%s: sbused(so_rcv) > enqueued", __func__)); T> + toep->tp_rx_credits += toep->tp_enqueued - sbused(so_rcv); T> + toep->tp_enqueued = sbused(so_rcv); T> SOCKBUF_UNLOCK(so_rcv); T> T> must_send = toep->tp_rx_credits + 16384 >= tp->rcv_wnd; T> @@ -1199,7 +1199,7 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *r T> } T> T> toep->tp_enqueued += m->m_pkthdr.len; T> - sbappendstream_locked(so_rcv, m); T> + sbappendstream_locked(so_rcv, m, 0); T> sorwakeup_locked(so); T> SOCKBUF_UNLOCK_ASSERT(so_rcv); T> T> @@ -1768,7 +1768,7 @@ wr_ack(struct toepcb *toep, struct mbuf *m) T> so_sowwakeup_locked(so); T> } T> T> - if (snd->sb_sndptroff < snd->sb_cc) T> + if (snd->sb_sndptroff < sbused(snd)) T> t3_push_frames(so, 0); T> T> out_free: T> Index: sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c T> =================================================================== T> --- sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../head) (revision 266804) T> +++ sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../projects/sendfile) (revision 266807) T> @@ -1515,11 +1515,11 @@ process_data(struct iwch_ep *ep) T> process_mpa_request(ep); T> break; T> default: T> - if (ep->com.so->so_rcv.sb_cc) T> + if (sbavail(&ep->com.so->so_rcv)) T> printf("%s Unexpected 
streaming data." T> " ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n", T> __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state, T> - ep->com.so->so_rcv.sb_cc, ep->com.so->so_rcv.sb_mb); T> + sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb); T> break; T> } T> return; T> Index: sys/kern/uipc_debug.c T> =================================================================== T> --- sys/kern/uipc_debug.c (.../head) (revision 266804) T> +++ sys/kern/uipc_debug.c (.../projects/sendfile) (revision 266807) T> @@ -403,7 +403,8 @@ db_print_sockbuf(struct sockbuf *sb, const char *s T> db_printf("sb_sndptroff: %u\n", sb->sb_sndptroff); T> T> db_print_indent(indent); T> - db_printf("sb_cc: %u ", sb->sb_cc); T> + db_printf("sb_acc: %u ", sb->sb_acc); T> + db_printf("sb_ccc: %u ", sb->sb_ccc); T> db_printf("sb_hiwat: %u ", sb->sb_hiwat); T> db_printf("sb_mbcnt: %u ", sb->sb_mbcnt); T> db_printf("sb_mbmax: %u\n", sb->sb_mbmax); T> Index: sys/kern/uipc_mbuf.c T> =================================================================== T> --- sys/kern/uipc_mbuf.c (.../head) (revision 266804) T> +++ sys/kern/uipc_mbuf.c (.../projects/sendfile) (revision 266807) T> @@ -389,7 +389,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m) T> * cleaned too. T> */ T> void T> -m_demote(struct mbuf *m0, int all) T> +m_demote(struct mbuf *m0, int all, int flags) T> { T> struct mbuf *m; T> T> @@ -405,7 +405,7 @@ void T> m_freem(m->m_nextpkt); T> m->m_nextpkt = NULL; T> } T> - m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE); T> + m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags); T> } T> } T> T> Index: sys/kern/sys_socket.c T> =================================================================== T> --- sys/kern/sys_socket.c (.../head) (revision 266804) T> +++ sys/kern/sys_socket.c (.../projects/sendfile) (revision 266807) T> @@ -167,20 +167,17 @@ soo_ioctl(struct file *fp, u_long cmd, void *data, T> T> case FIONREAD: T> /* Unlocked read. 
*/ T> - *(int *)data = so->so_rcv.sb_cc; T> + *(int *)data = sbavail(&so->so_rcv); T> break; T> T> case FIONWRITE: T> /* Unlocked read. */ T> - *(int *)data = so->so_snd.sb_cc; T> + *(int *)data = sbavail(&so->so_snd); T> break; T> T> case FIONSPACE: T> - if ((so->so_snd.sb_hiwat < so->so_snd.sb_cc) || T> - (so->so_snd.sb_mbmax < so->so_snd.sb_mbcnt)) T> - *(int *)data = 0; T> - else T> - *(int *)data = sbspace(&so->so_snd); T> + /* Unlocked read. */ T> + *(int *)data = sbspace(&so->so_snd); T> break; T> T> case FIOSETOWN: T> @@ -246,6 +243,7 @@ soo_stat(struct file *fp, struct stat *ub, struct T> struct thread *td) T> { T> struct socket *so = fp->f_data; T> + struct sockbuf *sb; T> #ifdef MAC T> int error; T> #endif T> @@ -261,15 +259,18 @@ soo_stat(struct file *fp, struct stat *ub, struct T> * If SBS_CANTRCVMORE is set, but there's still data left in the T> * receive buffer, the socket is still readable. T> */ T> - SOCKBUF_LOCK(&so->so_rcv); T> - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 || T> - so->so_rcv.sb_cc != 0) T> + sb = &so->so_rcv; T> + SOCKBUF_LOCK(sb); T> + if ((sb->sb_state & SBS_CANTRCVMORE) == 0 || sbavail(sb)) T> ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH; T> - ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl; T> - SOCKBUF_UNLOCK(&so->so_rcv); T> - /* Unlocked read. 
*/ T> - if ((so->so_snd.sb_state & SBS_CANTSENDMORE) == 0) T> + ub->st_size = sbavail(sb) - sb->sb_ctl; T> + SOCKBUF_UNLOCK(sb); T> + T> + sb = &so->so_snd; T> + SOCKBUF_LOCK(sb); T> + if ((sb->sb_state & SBS_CANTSENDMORE) == 0) T> ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH; T> + SOCKBUF_UNLOCK(sb); T> ub->st_uid = so->so_cred->cr_uid; T> ub->st_gid = so->so_cred->cr_gid; T> return (*so->so_proto->pr_usrreqs->pru_sense)(so, ub); T> Index: sys/kern/uipc_usrreq.c T> =================================================================== T> --- sys/kern/uipc_usrreq.c (.../head) (revision 266804) T> +++ sys/kern/uipc_usrreq.c (.../projects/sendfile) (revision 266807) T> @@ -790,11 +790,10 @@ uipc_rcvd(struct socket *so, int flags) T> u_int mbcnt, sbcc; T> T> unp = sotounpcb(so); T> - KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL")); T> + KASSERT(unp != NULL, ("%s: unp == NULL", __func__)); T> + KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_SEQPACKET, T> + ("%s: socktype %d", __func__, so->so_type)); T> T> - if (so->so_type != SOCK_STREAM && so->so_type != SOCK_SEQPACKET) T> - panic("uipc_rcvd socktype %d", so->so_type); T> - T> /* T> * Adjust backpressure on sender and wakeup any waiting to write. T> * T> @@ -807,7 +806,7 @@ uipc_rcvd(struct socket *so, int flags) T> */ T> SOCKBUF_LOCK(&so->so_rcv); T> mbcnt = so->so_rcv.sb_mbcnt; T> - sbcc = so->so_rcv.sb_cc; T> + sbcc = sbavail(&so->so_rcv); T> SOCKBUF_UNLOCK(&so->so_rcv); T> /* T> * There is a benign race condition at this point. 
If we're planning to T> @@ -843,7 +842,10 @@ uipc_send(struct socket *so, int flags, struct mbu T> int error = 0; T> T> unp = sotounpcb(so); T> - KASSERT(unp != NULL, ("uipc_send: unp == NULL")); T> + KASSERT(unp != NULL, ("%s: unp == NULL", __func__)); T> + KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_DGRAM || T> + so->so_type == SOCK_SEQPACKET, T> + ("%s: socktype %d", __func__, so->so_type)); T> T> if (flags & PRUS_OOB) { T> error = EOPNOTSUPP; T> @@ -994,7 +996,7 @@ uipc_send(struct socket *so, int flags, struct mbu T> } T> T> mbcnt = so2->so_rcv.sb_mbcnt; T> - sbcc = so2->so_rcv.sb_cc; T> + sbcc = sbavail(&so2->so_rcv); T> sorwakeup_locked(so2); T> T> /* T> @@ -1011,9 +1013,6 @@ uipc_send(struct socket *so, int flags, struct mbu T> UNP_PCB_UNLOCK(unp2); T> m = NULL; T> break; T> - T> - default: T> - panic("uipc_send unknown socktype"); T> } T> T> /* T> Index: sys/kern/vfs_default.c T> =================================================================== T> --- sys/kern/vfs_default.c (.../head) (revision 266804) T> +++ sys/kern/vfs_default.c (.../projects/sendfile) (revision 266807) T> @@ -111,6 +111,7 @@ struct vop_vector default_vnodeops = { T> .vop_close = VOP_NULL, T> .vop_fsync = VOP_NULL, T> .vop_getpages = vop_stdgetpages, T> + .vop_getpages_async = vop_stdgetpages_async, T> .vop_getwritemount = vop_stdgetwritemount, T> .vop_inactive = VOP_NULL, T> .vop_ioctl = VOP_ENOTTY, T> @@ -726,10 +727,19 @@ vop_stdgetpages(ap) T> { T> T> return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, T> - ap->a_count, ap->a_reqpage); T> + ap->a_count, ap->a_reqpage, NULL, NULL); T> } T> T> +/* XXX Needs good comment and a manpage. 
*/ T> int T> +vop_stdgetpages_async(struct vop_getpages_async_args *ap) T> +{ T> + T> + return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, T> + ap->a_count, ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg); T> +} T> + T> +int T> vop_stdkqfilter(struct vop_kqfilter_args *ap) T> { T> return vfs_kqfilter(ap); T> Index: sys/kern/uipc_socket.c T> =================================================================== T> --- sys/kern/uipc_socket.c (.../head) (revision 266804) T> +++ sys/kern/uipc_socket.c (.../projects/sendfile) (revision 266807) T> @@ -1459,12 +1459,12 @@ restart: T> * 2. MSG_DONTWAIT is not set T> */ T> if (m == NULL || (((flags & MSG_DONTWAIT) == 0 && T> - so->so_rcv.sb_cc < uio->uio_resid) && T> - so->so_rcv.sb_cc < so->so_rcv.sb_lowat && T> + sbavail(&so->so_rcv) < uio->uio_resid) && T> + sbavail(&so->so_rcv) < so->so_rcv.sb_lowat && T> m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) { T> - KASSERT(m != NULL || !so->so_rcv.sb_cc, T> - ("receive: m == %p so->so_rcv.sb_cc == %u", T> - m, so->so_rcv.sb_cc)); T> + KASSERT(m != NULL || !sbavail(&so->so_rcv), T> + ("receive: m == %p sbavail == %u", T> + m, sbavail(&so->so_rcv))); T> if (so->so_error) { T> if (m != NULL) T> goto dontblock; T> @@ -1746,9 +1746,7 @@ dontblock: T> SOCKBUF_LOCK(&so->so_rcv); T> } T> } T> - m->m_data += len; T> - m->m_len -= len; T> - so->so_rcv.sb_cc -= len; T> + sbmtrim(&so->so_rcv, m, len); T> } T> } T> SOCKBUF_LOCK_ASSERT(&so->so_rcv); T> @@ -1913,7 +1911,7 @@ restart: T> T> /* Abort if socket has reported problems. */ T> if (so->so_error) { T> - if (sb->sb_cc > 0) T> + if (sbavail(sb) > 0) T> goto deliver; T> if (oresid > uio->uio_resid) T> goto out; T> @@ -1925,7 +1923,7 @@ restart: T> T> /* Door is closed. Deliver what is left, if any. 
*/
T> if (sb->sb_state & SBS_CANTRCVMORE) {
T> - if (sb->sb_cc > 0)
T> + if (sbavail(sb) > 0)
T> goto deliver;
T> else
T> goto out;
T> @@ -1932,7 +1930,7 @@ restart:
T> }
T>
T> /* Socket buffer is empty and we shall not block. */
T> - if (sb->sb_cc == 0 &&
T> + if (sbavail(sb) == 0 &&
T> ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T> error = EAGAIN;
T> goto out;
T> @@ -1939,18 +1937,18 @@ restart:
T> }
T>
T> /* Socket buffer got some data that we shall deliver now. */
T> - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> + if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) &&
T> ((sb->sb_flags & SS_NBIO) ||
T> (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> - sb->sb_cc >= sb->sb_lowat ||
T> - sb->sb_cc >= uio->uio_resid ||
T> - sb->sb_cc >= sb->sb_hiwat) ) {
T> + sbavail(sb) >= sb->sb_lowat ||
T> + sbavail(sb) >= uio->uio_resid ||
T> + sbavail(sb) >= sb->sb_hiwat) ) {
T> goto deliver;
T> }
T>
T> /* On MSG_WAITALL we must wait until all data or error arrives. */
T> if ((flags & MSG_WAITALL) &&
T> - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_hiwat))
T> + (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat))
T> goto deliver;
T>
T> /*
T> @@ -1964,7 +1962,7 @@ restart:
T>
T> deliver:
T> SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> + KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__));
T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>
T> /* Statistics. */
T> @@ -1972,7 +1970,7 @@ deliver:
T> uio->uio_td->td_ru.ru_msgrcv++;
T>
T> /* Fill uio until full or current end of socket buffer is reached. */
T> - len = min(uio->uio_resid, sb->sb_cc);
T> + len = min(uio->uio_resid, sbavail(sb));
T> if (mp0 != NULL) {
T> /* Dequeue as many mbufs as possible. */
T> if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
T> @@ -1983,6 +1981,8 @@ deliver:
T> for (m = sb->sb_mb;
T> m != NULL && m->m_len <= len;
T> m = m->m_next) {
T> + KASSERT(!(m->m_flags & M_NOTAVAIL),
T> + ("%s: m %p not available", __func__, m));
T> len -= m->m_len;
T> uio->uio_resid -= m->m_len;
T> sbfree(sb, m);
T> @@ -2107,9 +2107,9 @@ soreceive_dgram(struct socket *so, struct sockaddr
T> */
T> SOCKBUF_LOCK(&so->so_rcv);
T> while ((m = so->so_rcv.sb_mb) == NULL) {
T> - KASSERT(so->so_rcv.sb_cc == 0,
T> - ("soreceive_dgram: sb_mb NULL but sb_cc %u",
T> - so->so_rcv.sb_cc));
T> + KASSERT(sbavail(&so->so_rcv) == 0,
T> + ("soreceive_dgram: sb_mb NULL but sbavail %u",
T> + sbavail(&so->so_rcv)));
T> if (so->so_error) {
T> error = so->so_error;
T> so->so_error = 0;
T> @@ -3157,7 +3157,7 @@ filt_soread(struct knote *kn, long hint)
T> so = kn->kn_fp->f_data;
T> SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T>
T> - kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
T> + kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl;
T> if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
T> kn->kn_flags |= EV_EOF;
T> kn->kn_fflags = so->so_error;
T> @@ -3167,7 +3167,7 @@ filt_soread(struct knote *kn, long hint)
T> else if (kn->kn_sfflags & NOTE_LOWAT)
T> return (kn->kn_data >= kn->kn_sdata);
T> else
T> - return (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat);
T> + return (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat);
T> }
T>
T> static void
T> @@ -3350,7 +3350,7 @@ soisdisconnected(struct socket *so)
T> sorwakeup_locked(so);
T> SOCKBUF_LOCK(&so->so_snd);
T> so->so_snd.sb_state |= SBS_CANTSENDMORE;
T> - sbdrop_locked(&so->so_snd, so->so_snd.sb_cc);
T> + sbdrop_locked(&so->so_snd, sbused(&so->so_snd));
T> sowwakeup_locked(so);
T> wakeup(&so->so_timeo);
T> }
T> Index: sys/kern/vnode_if.src
T> ===================================================================
T> --- sys/kern/vnode_if.src (.../head) (revision 266804)
T> +++ sys/kern/vnode_if.src (.../projects/sendfile) (revision 266807)
T> @@ -477,6 +477,19 @@ vop_getpages {
T> };
T>
T>
T> +%% getpages_async vp L L L
T> +
T> +vop_getpages_async {
T> + IN struct vnode *vp;
T> + IN vm_page_t *m;
T> + IN int count;
T> + IN int reqpage;
T> + IN vm_ooffset_t offset;
T> + IN void (*vop_getpages_iodone)(void *);
T> + IN void *arg;
T> +};
T> +
T> +
T> %% putpages vp L L L
T>
T> vop_putpages {
T> Index: sys/kern/uipc_sockbuf.c
T> ===================================================================
T> --- sys/kern/uipc_sockbuf.c (.../head) (revision 266804)
T> +++ sys/kern/uipc_sockbuf.c (.../projects/sendfile) (revision 266807)
T> @@ -68,7 +68,152 @@ static u_long sb_efficiency = 8; /* parameter for
T> static struct mbuf *sbcut_internal(struct sockbuf *sb, int len);
T> static void sbflush_internal(struct sockbuf *sb);
T>
T> +static void
T> +sb_shift_nrdy(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> + SOCKBUF_LOCK_ASSERT(sb);
T> + KASSERT(m->m_flags & M_NOTREADY, ("%s: m %p !M_NOTREADY", __func__, m));
T> +
T> + m = m->m_next;
T> + while (m != NULL && !(m->m_flags & M_NOTREADY)) {
T> + m->m_flags &= ~M_BLOCKED;
T> + sb->sb_acc += m->m_len;
T> + m = m->m_next;
T> + }
T> +
T> + sb->sb_fnrdy = m;
T> +}
T> +
T> +int
T> +sbready(struct sockbuf *sb, struct mbuf *m, int count)
T> +{
T> + u_int blocker;
T> +
T> + SOCKBUF_LOCK(sb);
T> +
T> + if (sb->sb_state & SBS_CANTSENDMORE) {
T> + SOCKBUF_UNLOCK(sb);
T> + return (ENOTCONN);
T> + }
T> +
T> + KASSERT(sb->sb_fnrdy != NULL, ("%s: sb %p NULL fnrdy", __func__, sb));
T> +
T> + blocker = (sb->sb_fnrdy == m) ? M_BLOCKED : 0;
T> +
T> + for (int i = 0; i < count; i++, m = m->m_next) {
T> + KASSERT(m->m_flags & M_NOTREADY,
T> + ("%s: m %p !M_NOTREADY", __func__, m));
T> + m->m_flags &= ~(M_NOTREADY | blocker);
T> + if (blocker)
T> + sb->sb_acc += m->m_len;
T> + }
T> +
T> + if (!blocker) {
T> + SOCKBUF_UNLOCK(sb);
T> + return (EWOULDBLOCK);
T> + }
T> +
T> + /* This one was blocking all the queue. */
T> + for (; m && (m->m_flags & M_NOTREADY) == 0; m = m->m_next) {
T> + KASSERT(m->m_flags & M_BLOCKED,
T> + ("%s: m %p !M_BLOCKED", __func__, m));
T> + m->m_flags &= ~M_BLOCKED;
T> + sb->sb_acc += m->m_len;
T> + }
T> +
T> + sb->sb_fnrdy = m;
T> +
T> + SOCKBUF_UNLOCK(sb);
T> +
T> + return (0);
T> +}
T> +
T> /*
T> + * Adjust sockbuf state reflecting allocation of m.
T> + */
T> +void
T> +sballoc(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> + SOCKBUF_LOCK_ASSERT(sb);
T> +
T> + sb->sb_ccc += m->m_len;
T> +
T> + if (sb->sb_fnrdy == NULL) {
T> + if (m->m_flags & M_NOTREADY)
T> + sb->sb_fnrdy = m;
T> + else
T> + sb->sb_acc += m->m_len;
T> + } else
T> + m->m_flags |= M_BLOCKED;
T> +
T> + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> + sb->sb_ctl += m->m_len;
T> +
T> + sb->sb_mbcnt += MSIZE;
T> + sb->sb_mcnt += 1;
T> +
T> + if (m->m_flags & M_EXT) {
T> + sb->sb_mbcnt += m->m_ext.ext_size;
T> + sb->sb_ccnt += 1;
T> + }
T> +}
T> +
T> +/*
T> + * Adjust sockbuf state reflecting freeing of m.
T> + */
T> +void
T> +sbfree(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> +#if 0 /* XXX: not yet: soclose() call path comes here w/o lock. */
T> + SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +
T> + sb->sb_ccc -= m->m_len;
T> +
T> + if (!(m->m_flags & M_NOTAVAIL))
T> + sb->sb_acc -= m->m_len;
T> +
T> + if (sb->sb_fnrdy == m)
T> + sb_shift_nrdy(sb, m);
T> +
T> + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> + sb->sb_ctl -= m->m_len;
T> +
T> + sb->sb_mbcnt -= MSIZE;
T> + sb->sb_mcnt -= 1;
T> + if (m->m_flags & M_EXT) {
T> + sb->sb_mbcnt -= m->m_ext.ext_size;
T> + sb->sb_ccnt -= 1;
T> + }
T> +
T> + if (sb->sb_sndptr == m) {
T> + sb->sb_sndptr = NULL;
T> + sb->sb_sndptroff = 0;
T> + }
T> + if (sb->sb_sndptroff != 0)
T> + sb->sb_sndptroff -= m->m_len;
T> +}
T> +
T> +/*
T> + * Trim some amount of data from (first?) mbuf in buffer.
T> + */
T> +void
T> +sbmtrim(struct sockbuf *sb, struct mbuf *m, int len)
T> +{
T> +
T> + SOCKBUF_LOCK_ASSERT(sb);
T> + KASSERT(len < m->m_len, ("%s: m %p len %d", __func__, m, len));
T> +
T> + m->m_data += len;
T> + m->m_len -= len;
T> + sb->sb_acc -= len;
T> + sb->sb_ccc -= len;
T> +}
T> +
T> +/*
T> * Socantsendmore indicates that no more data will be sent on the socket; it
T> * would normally be applied to a socket when the user informs the system
T> * that no more data is to be sent, by the protocol code (in case
T> @@ -127,7 +272,7 @@ sbwait(struct sockbuf *sb)
T> SOCKBUF_LOCK_ASSERT(sb);
T>
T> sb->sb_flags |= SB_WAIT;
T> - return (msleep_sbt(&sb->sb_cc, &sb->sb_mtx,
T> + return (msleep_sbt(&sb->sb_acc, &sb->sb_mtx,
T> (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait",
T> sb->sb_timeo, 0, 0));
T> }
T> @@ -184,7 +329,7 @@ sowakeup(struct socket *so, struct sockbuf *sb)
T> sb->sb_flags &= ~SB_SEL;
T> if (sb->sb_flags & SB_WAIT) {
T> sb->sb_flags &= ~SB_WAIT;
T> - wakeup(&sb->sb_cc);
T> + wakeup(&sb->sb_acc);
T> }
T> KNOTE_LOCKED(&sb->sb_sel.si_note, 0);
T> if (sb->sb_upcall != NULL) {
T> @@ -519,7 +664,7 @@ sbappend(struct sockbuf *sb, struct mbuf *m)
T> * that is, a stream protocol (such as TCP).
T> */
T> void
T> -sbappendstream_locked(struct sockbuf *sb, struct mbuf *m)
T> +sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags)
T> {
T> SOCKBUF_LOCK_ASSERT(sb);
T>
T> @@ -529,8 +674,8 @@ void
T> SBLASTMBUFCHK(sb);
T>
T> /* Remove all packet headers and mbuf tags to get a pure data chain. */
T> - m_demote(m, 1);
T> -
T> + m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0);
T> +
T> sbcompress(sb, m, sb->sb_mbtail);
T>
T> sb->sb_lastrecord = sb->sb_mb;
T> @@ -543,38 +688,59 @@ void
T> * that is, a stream protocol (such as TCP).
T> */
T> void
T> -sbappendstream(struct sockbuf *sb, struct mbuf *m)
T> +sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags)
T> {
T>
T> SOCKBUF_LOCK(sb);
T> - sbappendstream_locked(sb, m);
T> + sbappendstream_locked(sb, m, flags);
T> SOCKBUF_UNLOCK(sb);
T> }
T>
T> #ifdef SOCKBUF_DEBUG
T> void
T> -sbcheck(struct sockbuf *sb)
T> +sbcheck(struct sockbuf *sb, const char *file, int line)
T> {
T> - struct mbuf *m;
T> - struct mbuf *n = 0;
T> - u_long len = 0, mbcnt = 0;
T> + struct mbuf *m, *n, *fnrdy;
T> + u_long acc, ccc, mbcnt;
T>
T> SOCKBUF_LOCK_ASSERT(sb);
T>
T> + acc = ccc = mbcnt = 0;
T> + fnrdy = NULL;
T> +
T> for (m = sb->sb_mb; m; m = n) {
T> n = m->m_nextpkt;
T> for (; m; m = m->m_next) {
T> - len += m->m_len;
T> + if ((m->m_flags & M_NOTREADY) && fnrdy == NULL) {
T> + if (m != sb->sb_fnrdy) {
T> + printf("sb %p: fnrdy %p != m %p\n",
T> + sb, sb->sb_fnrdy, m);
T> + goto fail;
T> + }
T> + fnrdy = m;
T> + }
T> + if (fnrdy) {
T> + if (!(m->m_flags & M_NOTAVAIL)) {
T> + printf("sb %p: fnrdy %p, m %p is avail\n",
T> + sb, sb->sb_fnrdy, m);
T> + goto fail;
T> + }
T> + } else
T> + acc += m->m_len;
T> + ccc += m->m_len;
T> mbcnt += MSIZE;
T> if (m->m_flags & M_EXT) /*XXX*/ /* pretty sure this is bogus */
T> mbcnt += m->m_ext.ext_size;
T> }
T> }
T> - if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) {
T> - printf("cc %ld != %u || mbcnt %ld != %u\n", len, sb->sb_cc,
T> - mbcnt, sb->sb_mbcnt);
T> - panic("sbcheck");
T> + if (acc != sb->sb_acc || ccc != sb->sb_ccc || mbcnt != sb->sb_mbcnt) {
T> + printf("acc %ld/%u ccc %ld/%u mbcnt %ld/%u\n",
T> + acc, sb->sb_acc, ccc, sb->sb_ccc, mbcnt, sb->sb_mbcnt);
T> + goto fail;
T> }
T> + return;
T> +fail:
T> + panic("%s from %s:%u", __func__, file, line);
T> }
T> #endif
T>
T> @@ -800,6 +966,7 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
T> if (n && (n->m_flags & M_EOR) == 0 &&
T> M_WRITABLE(n) &&
T> ((sb->sb_flags & SB_NOCOALESCE) == 0) &&
T> + !(m->m_flags & M_NOTREADY) &&
T> m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */
T> m->m_len <= M_TRAILINGSPACE(n) &&
T> n->m_type == m->m_type) {
T> @@ -806,7 +973,9 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
T> bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len,
T> (unsigned)m->m_len);
T> n->m_len += m->m_len;
T> - sb->sb_cc += m->m_len;
T> + sb->sb_ccc += m->m_len;
T> + if (sb->sb_fnrdy == NULL)
T> + sb->sb_acc += m->m_len;
T> if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> /* XXX: Probably don't need.*/
T> sb->sb_ctl += m->m_len;
T> @@ -843,13 +1012,13 @@ sbflush_internal(struct sockbuf *sb)
T> * Don't call sbcut(sb, 0) if the leading mbuf is non-empty:
T> * we would loop forever. Panic instead.
T> */
T> - if (!sb->sb_cc && (sb->sb_mb == NULL || sb->sb_mb->m_len))
T> + if (sb->sb_ccc == 0 && (sb->sb_mb == NULL || sb->sb_mb->m_len))
T> break;
T> - m_freem(sbcut_internal(sb, (int)sb->sb_cc));
T> + m_freem(sbcut_internal(sb, (int)sb->sb_ccc));
T> }
T> - if (sb->sb_cc || sb->sb_mb || sb->sb_mbcnt)
T> - panic("sbflush_internal: cc %u || mb %p || mbcnt %u",
T> - sb->sb_cc, (void *)sb->sb_mb, sb->sb_mbcnt);
T> + KASSERT(sb->sb_ccc == 0 && sb->sb_mb == 0 && sb->sb_mbcnt == 0,
T> + ("%s: ccc %u mb %p mbcnt %u", __func__,
T> + sb->sb_ccc, (void *)sb->sb_mb, sb->sb_mbcnt));
T> }
T>
T> void
T> @@ -891,7 +1060,9 @@ sbcut_internal(struct sockbuf *sb, int len)
T> if (m->m_len > len) {
T> m->m_len -= len;
T> m->m_data += len;
T> - sb->sb_cc -= len;
T> + sb->sb_ccc -= len;
T> + if (!(m->m_flags & M_NOTAVAIL))
T> + sb->sb_acc -= len;
T> if (sb->sb_sndptroff != 0)
T> sb->sb_sndptroff -= len;
T> if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> @@ -977,8 +1148,8 @@ sbsndptr(struct sockbuf *sb, u_int off, u_int len,
T> struct mbuf *m, *ret;
T>
T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__));
T> - KASSERT(off + len <= sb->sb_cc, ("%s: beyond sb", __func__));
T> - KASSERT(sb->sb_sndptroff <= sb->sb_cc, ("%s: sndptroff broken", __func__));
T> + KASSERT(off + len <= sb->sb_acc, ("%s: beyond sb", __func__));
T> + KASSERT(sb->sb_sndptroff <= sb->sb_acc, ("%s: sndptroff broken", __func__));
T>
T> /*
T> * Is off below stored offset? Happens on retransmits.
T> @@ -1091,7 +1262,7 @@ void
T> sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb)
T> {
T>
T> - xsb->sb_cc = sb->sb_cc;
T> + xsb->sb_cc = sb->sb_ccc;
T> xsb->sb_hiwat = sb->sb_hiwat;
T> xsb->sb_mbcnt = sb->sb_mbcnt;
T> xsb->sb_mcnt = sb->sb_mcnt;
T> Index: sys/kern/uipc_syscalls.c
T> ===================================================================
T> --- sys/kern/uipc_syscalls.c (.../head) (revision 266804)
T> +++ sys/kern/uipc_syscalls.c (.../projects/sendfile) (revision 266807)
T> @@ -132,9 +132,10 @@ static int filt_sfsync(struct knote *kn, long hint
T> */
T> static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0,
T> "sendfile(2) tunables");
T> -static int sfreadahead = 1;
T> +
T> +static int sfreadahead = 0;
T> SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW,
T> - &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks");
T> + &sfreadahead, 0, "Read this more pages than socket buffer can accept");
T>
T> #ifdef SFSYNC_DEBUG
T> static int sf_sync_debug = 0;
T> @@ -1988,7 +1989,7 @@ filt_sfsync(struct knote *kn, long hint)
T> * Detach mapped page and release resources back to the system.
T> */
T> int
T> -sf_buf_mext(struct mbuf *mb, void *addr, void *args)
T> +sf_mext_free(struct mbuf *mb, void *addr, void *args)
T> {
T> vm_page_t m;
T> struct sendfile_sync *sfs;
T> @@ -2009,13 +2010,42 @@ int
T> sfs = addr;
T> sf_sync_deref(sfs);
T> }
T> - /*
T> - * sfs may be invalid at this point, don't use it!
T> - */
T> return (EXT_FREE_OK);
T> }
T>
T> /*
T> + * Same as above, but forces the page to be detached from the object
T> + * and go into free pool.
T> + */
T> +static int
T> +sf_mext_free_nocache(struct mbuf *mb, void *addr, void *args)
T> +{
T> + vm_page_t m;
T> + struct sendfile_sync *sfs;
T> +
T> + m = sf_buf_page(args);
T> + sf_buf_free(args);
T> + vm_page_lock(m);
T> + vm_page_unwire(m, 0);
T> + if (m->wire_count == 0) {
T> + vm_object_t obj;
T> +
T> + if ((obj = m->object) == NULL)
T> + vm_page_free(m);
T> + else if (!vm_page_xbusied(m) && VM_OBJECT_TRYWLOCK(obj)) {
T> + vm_page_free(m);
T> + VM_OBJECT_WUNLOCK(obj);
T> + }
T> + }
T> + vm_page_unlock(m);
T> + if (addr != NULL) {
T> + sfs = addr;
T> + sf_sync_deref(sfs);
T> + }
T> + return (EXT_FREE_OK);
T> +}
T> +
T> +/*
T> * Called to remove a reference to a sf_sync object.
T> *
T> * This is generally done during the mbuf free path to signify
T> @@ -2608,106 +2638,181 @@ freebsd4_sendfile(struct thread *td, struct freebs
T> }
T> #endif /* COMPAT_FREEBSD4 */
T>
T> + /*
T> + * How much data to put into page i of n.
T> + * Only first and last pages are special.
T> + */
T> +static inline off_t
T> +xfsize(int i, int n, off_t off, off_t len)
T> +{
T> +
T> + if (i == 0)
T> + return (omin(PAGE_SIZE - (off & PAGE_MASK), len));
T> +
T> + if (i == n - 1 && ((off + len) & PAGE_MASK) > 0)
T> + return ((off + len) & PAGE_MASK);
T> +
T> + return (PAGE_SIZE);
T> +}
T> +
T> +/*
T> + * Offset within object for i page.
T> + */
T> +static inline vm_offset_t
T> +vmoff(int i, off_t off)
T> +{
T> +
T> + if (i == 0)
T> + return ((vm_offset_t)off);
T> +
T> + return (trunc_page(off + i * PAGE_SIZE));
T> +}
T> +
T> +/*
T> + * Pretend as if we don't have enough space, subtract xfsize() of
T> + * all pages that failed.
T> + */
T> +static inline void
T> +fixspace(int old, int new, off_t off, int *space)
T> +{
T> +
T> + KASSERT(old > new, ("%s: old %d new %d", __func__, old, new));
T> +
T> + /* Subtract last one. */
T> + *space -= xfsize(old - 1, old, off, *space);
T> + old--;
T> +
T> + if (new == old)
T> + /* There was only one page. */
T> + return;
T> +
T> + /* Subtract first one. */
T> + if (new == 0) {
T> + *space -= xfsize(0, old, off, *space);
T> + new++;
T> + }
T> +
T> + /* Rest of pages are full sized. */
T> + *space -= (old - new) * PAGE_SIZE;
T> +
T> + KASSERT(*space >= 0, ("%s: space went backwards", __func__));
T> +}
T> +
T> +struct sf_io {
T> + u_int nios;
T> + int npages;
T> + struct file *sock_fp;
T> + struct mbuf *m;
T> + vm_page_t pa[];
T> +};
T> +
T> +static void
T> +sf_io_done(void *arg)
T> +{
T> + struct sf_io *sfio = arg;
T> + struct socket *so;
T> +
T> + if (!refcount_release(&sfio->nios))
T> + return;
T> +
T> + so = sfio->sock_fp->f_data;
T> +
T> + if (sbready(&so->so_snd, sfio->m, sfio->npages) == 0) {
T> + struct mbuf *m;
T> +
T> + m = m_get(M_NOWAIT, MT_DATA);
T> + if (m == NULL) {
T> + panic("XXXGL");
T> + }
T> + m->m_len = 0;
T> + CURVNET_SET(so->so_vnet);
T> + /* XXXGL: curthread */
T> + (void )(so->so_proto->pr_usrreqs->pru_send)
T> + (so, 0, m, NULL, NULL, curthread);
T> + CURVNET_RESTORE();
T> + }
T> +
T> + /* XXXGL: curthread */
T> + fdrop(sfio->sock_fp, curthread);
T> + free(sfio, M_TEMP);
T> +}
T> +
T> static int
T> -sendfile_readpage(vm_object_t obj, struct vnode *vp, int nd,
T> - off_t off, int xfsize, int bsize, struct thread *td, vm_page_t *res)
T> +sendfile_swapin(vm_object_t obj, struct sf_io *sfio, off_t off, off_t len,
T> + int npages, int rhpages)
T> {
T> - vm_page_t m;
T> - vm_pindex_t pindex;
T> - ssize_t resid;
T> - int error, readahead, rv;
T> + vm_page_t *pa = sfio->pa;
T> + int nios;
T>
T> - pindex = OFF_TO_IDX(off);
T> + nios = 0;
T> VM_OBJECT_WLOCK(obj);
T> - m = vm_page_grab(obj, pindex, (vp != NULL ? VM_ALLOC_NOBUSY |
T> - VM_ALLOC_IGN_SBUSY : 0) | VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
T> + for (int i = 0; i < npages; i++)
T> + pa[i] = vm_page_grab(obj, OFF_TO_IDX(vmoff(i, off)),
T> + VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
T>
T> - /*
T> - * Check if page is valid for what we need, otherwise initiate I/O.
T> - *
T> - * The non-zero nd argument prevents disk I/O, instead we
T> - * return the caller what he specified in nd. In particular,
T> - * if we already turned some pages into mbufs, nd == EAGAIN
T> - * and the main function send them the pages before we come
T> - * here again and block.
T> - */
T> - if (m->valid != 0 && vm_page_is_valid(m, off & PAGE_MASK, xfsize)) {
T> - if (vp == NULL)
T> - vm_page_xunbusy(m);
T> - VM_OBJECT_WUNLOCK(obj);
T> - *res = m;
T> - return (0);
T> - } else if (nd != 0) {
T> - if (vp == NULL)
T> - vm_page_xunbusy(m);
T> - error = nd;
T> - goto free_page;
T> - }
T> + for (int i = 0; i < npages;) {
T> + int j, a, count, rv;
T>
T> - /*
T> - * Get the page from backing store.
T> - */
T> - error = 0;
T> - if (vp != NULL) {
T> - VM_OBJECT_WUNLOCK(obj);
T> - readahead = sfreadahead * MAXBSIZE;
T> + if (vm_page_is_valid(pa[i], vmoff(i, off) & PAGE_MASK,
T> + xfsize(i, npages, off, len))) {
T> + vm_page_xunbusy(pa[i]);
T> + i++;
T> + continue;
T> + }
T>
T> - /*
T> - * Use vn_rdwr() instead of the pager interface for
T> - * the vnode, to allow the read-ahead.
T> - *
T> - * XXXMAC: Because we don't have fp->f_cred here, we
T> - * pass in NOCRED. This is probably wrong, but is
T> - * consistent with our original implementation.
T> - */
T> - error = vn_rdwr(UIO_READ, vp, NULL, readahead, trunc_page(off),
T> - UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((readahead /
T> - bsize) << IO_SEQSHIFT), td->td_ucred, NOCRED, &resid, td);
T> - SFSTAT_INC(sf_iocnt);
T> - VM_OBJECT_WLOCK(obj);
T> - } else {
T> - if (vm_pager_has_page(obj, pindex, NULL, NULL)) {
T> - rv = vm_pager_get_pages(obj, &m, 1, 0);
T> - SFSTAT_INC(sf_iocnt);
T> - m = vm_page_lookup(obj, pindex);
T> - if (m == NULL)
T> - error = EIO;
T> - else if (rv != VM_PAGER_OK) {
T> - vm_page_lock(m);
T> - vm_page_free(m);
T> - vm_page_unlock(m);
T> - m = NULL;
T> - error = EIO;
T> + for (j = i + 1; j < npages; j++)
T> + if (vm_page_is_valid(pa[j], vmoff(j, off) & PAGE_MASK,
T> + xfsize(j, npages, off, len)))
T> + break;
T> +
T> + while (!vm_pager_has_page(obj, OFF_TO_IDX(vmoff(i, off)),
T> + NULL, &a) && i < j) {
T> + pmap_zero_page(pa[i]);
T> + pa[i]->valid = VM_PAGE_BITS_ALL;
T> + pa[i]->dirty = 0;
T> + vm_page_xunbusy(pa[i]);
T> + i++;
T> + }
T> + if (i == j)
T> + continue;
T> +
T> + count = min(a + 1, npages + rhpages - i);
T> + for (j = npages; j < i + count; j++) {
T> + pa[j] = vm_page_grab(obj, OFF_TO_IDX(vmoff(j, off)),
T> + VM_ALLOC_NORMAL | VM_ALLOC_NOWAIT);
T> + if (pa[j] == NULL) {
T> + count = j - i;
T> + break;
T> }
T> - } else {
T> - pmap_zero_page(m);
T> - m->valid = VM_PAGE_BITS_ALL;
T> - m->dirty = 0;
T> + if (pa[j]->valid) {
T> + vm_page_xunbusy(pa[j]);
T> + count = j - i;
T> + break;
T> + }
T> }
T> - if (m != NULL)
T> - vm_page_xunbusy(m);
T> +
T> + refcount_acquire(&sfio->nios);
T> + rv = vm_pager_get_pages_async(obj, pa + i, count, 0,
T> + &sf_io_done, sfio);
T> +
T> + KASSERT(rv == VM_PAGER_OK, ("%s: pager fail obj %p page %p",
T> + __func__, obj, pa[i]));
T> +
T> + SFSTAT_INC(sf_iocnt);
T> + nios++;
T> +
T> + for (j = i; j < i + count && j < npages; j++)
T> + KASSERT(pa[j] == vm_page_lookup(obj,
T> + OFF_TO_IDX(vmoff(j, off))),
T> + ("pa[j] %p lookup %p\n", pa[j],
T> + vm_page_lookup(obj, OFF_TO_IDX(vmoff(j, off)))));
T> +
T> + i += count;
T> }
T> - if (error == 0) {
T> - *res = m;
T> - } else if (m != NULL) {
T> -free_page:
T> - vm_page_lock(m);
T> - vm_page_unwire(m, 0);
T>
T> - /*
T> - * See if anyone else might know about this page. If
T> - * not and it is not valid, then free it.
T> - */
T> - if (m->wire_count == 0 && m->valid == 0 && !vm_page_busied(m))
T> - vm_page_free(m);
T> - vm_page_unlock(m);
T> - }
T> - KASSERT(error != 0 || (m->wire_count > 0 &&
T> - vm_page_is_valid(m, off & PAGE_MASK, xfsize)),
T> - ("wrong page state m %p off %#jx xfsize %d", m, (uintmax_t)off,
T> - xfsize));
T> VM_OBJECT_WUNLOCK(obj);
T> - return (error);
T> +
T> + return (nios);
T> }
T>
T> static int
T> @@ -2814,41 +2919,26 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T> struct vnode *vp;
T> struct vm_object *obj;
T> struct socket *so;
T> - struct mbuf *m;
T> + struct mbuf *m, *mh, *mhtail;
T> struct sf_buf *sf;
T> - struct vm_page *pg;
T> struct shmfd *shmfd;
T> struct vattr va;
T> - off_t off, xfsize, fsbytes, sbytes, rem, obj_size;
T> - int error, bsize, nd, hdrlen, mnw;
T> + off_t off, sbytes, rem, obj_size;
T> + int error, serror, bsize, hdrlen;
T>
T> - pg = NULL;
T> obj = NULL;
T> so = NULL;
T> - m = NULL;
T> - fsbytes = sbytes = 0;
T> - hdrlen = mnw = 0;
T> - rem = nbytes;
T> - obj_size = 0;
T> + m = mh = NULL;
T> + sbytes = 0;
T>
T> error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize);
T> if (error != 0)
T> return (error);
T> - if (rem == 0)
T> - rem = obj_size;
T>
T> error = kern_sendfile_getsock(td, sockfd, &sock_fp, &so);
T> if (error != 0)
T> goto out;
T>
T> - /*
T> - * Do not wait on memory allocations but return ENOMEM for
T> - * caller to retry later.
T> - * XXX: Experimental.
T> - */
T> - if (flags & SF_MNOWAIT)
T> - mnw = 1;
T> -
T> #ifdef MAC
T> error = mac_socket_check_send(td->td_ucred, so);
T> if (error != 0)
T> @@ -2856,31 +2946,27 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T> #endif
T>
T> /* If headers are specified copy them into mbufs. */
T> - if (hdr_uio != NULL) {
T> + if (hdr_uio != NULL && hdr_uio->uio_resid > 0) {
T> hdr_uio->uio_td = td;
T> hdr_uio->uio_rw = UIO_WRITE;
T> - if (hdr_uio->uio_resid > 0) {
T> - /*
T> - * In FBSD < 5.0 the nbytes to send also included
T> - * the header. If compat is specified subtract the
T> - * header size from nbytes.
T> - */
T> - if (kflags & SFK_COMPAT) {
T> - if (nbytes > hdr_uio->uio_resid)
T> - nbytes -= hdr_uio->uio_resid;
T> - else
T> - nbytes = 0;
T> - }
T> - m = m_uiotombuf(hdr_uio, (mnw ? M_NOWAIT : M_WAITOK),
T> - 0, 0, 0);
T> - if (m == NULL) {
T> - error = mnw ? EAGAIN : ENOBUFS;
T> - goto out;
T> - }
T> - hdrlen = m_length(m, NULL);
T> + /*
T> + * In FBSD < 5.0 the nbytes to send also included
T> + * the header. If compat is specified subtract the
T> + * header size from nbytes.
T> + */
T> + if (kflags & SFK_COMPAT) {
T> + if (nbytes > hdr_uio->uio_resid)
T> + nbytes -= hdr_uio->uio_resid;
T> + else
T> + nbytes = 0;
T> }
T> - }
T> + mh = m_uiotombuf(hdr_uio, M_WAITOK, 0, 0, 0);
T> + hdrlen = m_length(mh, &mhtail);
T> + } else
T> + hdrlen = 0;
T>
T> + rem = nbytes ? omin(nbytes, obj_size - offset) : obj_size - offset;
T> +
T> /*
T> * Protect against multiple writers to the socket.
T> *
T> @@ -2900,21 +2986,13 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T> * The outer loop checks the state and available space of the socket
T> * and takes care of the overall progress.
T> */
T> - for (off = offset; ; ) {
T> + for (off = offset; rem > 0; ) {
T> + struct sf_io *sfio;
T> + vm_page_t *pa;
T> struct mbuf *mtail;
T> - int loopbytes;
T> - int space;
T> - int done;
T> + int nios, space, npages, rhpages;
T>
T> - if ((nbytes != 0 && nbytes == fsbytes) ||
T> - (nbytes == 0 && obj_size == fsbytes))
T> - break;
T> -
T> mtail = NULL;
T> - loopbytes = 0;
T> - space = 0;
T> - done = 0;
T> -
T> /*
T> * Check the socket state for ongoing connection,
T> * no errors and space in socket buffer.
T> @@ -2990,53 +3068,44 @@ retry_space:
T> VOP_UNLOCK(vp, 0);
T> goto done;
T> }
T> - obj_size = va.va_size;
T> + if (va.va_size != obj_size) {
T> + if (nbytes == 0)
T> + rem += va.va_size - obj_size;
T> + else if (offset + nbytes > va.va_size)
T> + rem -= (offset + nbytes - va.va_size);
T> + obj_size = va.va_size;
T> + }
T> }
T>
T> + if (space > rem)
T> + space = rem;
T> +
T> + if (off & PAGE_MASK)
T> + npages = 1 + howmany(space -
T> + (PAGE_SIZE - (off & PAGE_MASK)), PAGE_SIZE);
T> + else
T> + npages = howmany(space, PAGE_SIZE);
T> +
T> + rhpages = SF_READAHEAD(flags) ?
T> + SF_READAHEAD(flags) : sfreadahead;
T> + rhpages = min(howmany(obj_size - (off & ~PAGE_MASK) -
T> + (npages * PAGE_SIZE), PAGE_SIZE), rhpages);
T> +
T> + sfio = malloc(sizeof(struct sf_io) +
T> + (rhpages + npages) * sizeof(vm_page_t), M_TEMP, M_WAITOK);
T> + refcount_init(&sfio->nios, 1);
T> +
T> + nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages);
T> +
T> /*
T> * Loop and construct maximum sized mbuf chain to be bulk
T> * dumped into socket buffer.
T> */
T> - while (space > loopbytes) {
T> - vm_offset_t pgoff;
T> + pa = sfio->pa;
T> + for (int i = 0; i < npages; i++) {
T> struct mbuf *m0;
T>
T> /*
T> - * Calculate the amount to transfer.
T> - * Not to exceed a page, the EOF,
T> - * or the passed in nbytes.
T> - */
T> - pgoff = (vm_offset_t)(off & PAGE_MASK);
T> - rem = obj_size - offset;
T> - if (nbytes != 0)
T> - rem = omin(rem, nbytes);
T> - rem -= fsbytes + loopbytes;
T> - xfsize = omin(PAGE_SIZE - pgoff, rem);
T> - xfsize = omin(space - loopbytes, xfsize);
T> - if (xfsize <= 0) {
T> - done = 1; /* all data sent */
T> - break;
T> - }
T> -
T> - /*
T> - * Attempt to look up the page. Allocate
T> - * if not found or wait and loop if busy.
T> - */
T> - if (m != NULL)
T> - nd = EAGAIN; /* send what we already got */
T> - else if ((flags & SF_NODISKIO) != 0)
T> - nd = EBUSY;
T> - else
T> - nd = 0;
T> - error = sendfile_readpage(obj, vp, nd, off,
T> - xfsize, bsize, td, &pg);
T> - if (error != 0) {
T> - if (error == EAGAIN)
T> - error = 0; /* not a real error */
T> - break;
T> - }
T> -
T> - /*
T> * Get a sendfile buf. When allocating the
T> * first buffer for mbuf chain, we usually
T> * wait as long as necessary, but this wait
T> @@ -3045,17 +3114,18 @@ retry_space:
T> * threads might exhaust the buffers and then
T> * deadlock.
T> */
T> - sf = sf_buf_alloc(pg, (mnw || m != NULL) ? SFB_NOWAIT :
T> - SFB_CATCH);
T> + sf = sf_buf_alloc(pa[i],
T> + m != NULL ? SFB_NOWAIT : SFB_CATCH);
T> if (sf == NULL) {
T> SFSTAT_INC(sf_allocfail);
T> - vm_page_lock(pg);
T> - vm_page_unwire(pg, 0);
T> - KASSERT(pg->object != NULL,
T> - ("%s: object disappeared", __func__));
T> - vm_page_unlock(pg);
T> + for (int j = i; j < npages; j++) {
T> + vm_page_lock(pa[j]);
T> + vm_page_unwire(pa[j], 0);
T> + vm_page_unlock(pa[j]);
T> + }
T> if (m == NULL)
T> - error = (mnw ? EAGAIN : EINTR);
T> + error = ENOBUFS;
T> + fixspace(npages, i, off, &space);
T> break;
T> }
T>
T> @@ -3063,36 +3133,26 @@ retry_space:
T> * Get an mbuf and set it up as having
T> * external storage.
T> */
T> - m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA);
T> - if (m0 == NULL) {
T> - error = (mnw ? EAGAIN : ENOBUFS);
T> - (void)sf_buf_mext(NULL, NULL, sf);
T> - break;
T> - }
T> - if (m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE,
T> - sf_buf_mext, sfs, sf, M_RDONLY, EXT_SFBUF,
T> - (mnw ? M_NOWAIT : M_WAITOK)) != 0) {
T> - error = (mnw ? EAGAIN : ENOBUFS);
T> - (void)sf_buf_mext(NULL, NULL, sf);
T> - m_freem(m0);
T> - break;
T> - }
T> - m0->m_data = (char *)sf_buf_kva(sf) + pgoff;
T> - m0->m_len = xfsize;
T> + m0 = m_get(M_WAITOK, MT_DATA);
T> + (void )m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE,
T> + (flags & SF_NOCACHE) ? sf_mext_free_nocache :
T> + sf_mext_free, sfs, sf, M_RDONLY, EXT_SFBUF,
T> + M_WAITOK);
T> + m0->m_data = (char *)sf_buf_kva(sf) +
T> + (vmoff(i, off) & PAGE_MASK);
T> + m0->m_len = xfsize(i, npages, off, space);
T> + m0->m_flags |= M_NOTREADY;
T>
T> + if (i == 0)
T> + sfio->m = m0;
T> +
T> /* Append to mbuf chain. */
T> if (mtail != NULL)
T> mtail->m_next = m0;
T> - else if (m != NULL)
T> - m_last(m)->m_next = m0;
T> else
T> m = m0;
T> mtail = m0;
T>
T> - /* Keep track of bits processed. */
T> - loopbytes += xfsize;
T> - off += xfsize;
T> -
T> /*
T> * XXX eventually this should be a sfsync
T> * method call!
T> @@ -3104,47 +3164,51 @@ retry_space:
T> if (vp != NULL)
T> VOP_UNLOCK(vp, 0);
T>
T> + /* Keep track of bytes processed. */
T> + off += space;
T> + rem -= space;
T> +
T> + /* Prepend header, if any. */
T> + if (hdrlen) {
T> + mhtail->m_next = m;
T> + m = mh;
T> + mh = NULL;
T> + }
T> +
T> + if (error) {
T> + free(sfio, M_TEMP);
T> + goto done;
T> + }
T> +
T> /* Add the buffer chain to the socket buffer. */
T> - if (m != NULL) {
T> - int mlen, err;
T> + KASSERT(m_length(m, NULL) == space + hdrlen,
T> + ("%s: mlen %u space %d hdrlen %d",
T> + __func__, m_length(m, NULL), space, hdrlen));
T>
T> - mlen = m_length(m, NULL);
T> - SOCKBUF_LOCK(&so->so_snd);
T> - if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
T> - error = EPIPE;
T> - SOCKBUF_UNLOCK(&so->so_snd);
T> - goto done;
T> - }
T> - SOCKBUF_UNLOCK(&so->so_snd);
T> - CURVNET_SET(so->so_vnet);
T> - /* Avoid error aliasing. */
T> - err = (*so->so_proto->pr_usrreqs->pru_send)
T> - (so, 0, m, NULL, NULL, td);
T> - CURVNET_RESTORE();
T> - if (err == 0) {
T> - /*
T> - * We need two counters to get the
T> - * file offset and nbytes to send
T> - * right:
T> - * - sbytes contains the total amount
T> - * of bytes sent, including headers.
T> - * - fsbytes contains the total amount
T> - * of bytes sent from the file.
T> - */
T> - sbytes += mlen;
T> - fsbytes += mlen;
T> - if (hdrlen) {
T> - fsbytes -= hdrlen;
T> - hdrlen = 0;
T> - }
T> - } else if (error == 0)
T> - error = err;
T> - m = NULL; /* pru_send always consumes */
T> + CURVNET_SET(so->so_vnet);
T> + if (nios == 0) {
T> + free(sfio, M_TEMP);
T> + serror = (*so->so_proto->pr_usrreqs->pru_send)
T> + (so, 0, m, NULL, NULL, td);
T> + } else {
T> + sfio->sock_fp = sock_fp;
T> + sfio->npages = npages;
T> + fhold(sock_fp);
T> + serror = (*so->so_proto->pr_usrreqs->pru_send)
T> + (so, PRUS_NOTREADY, m, NULL, NULL, td);
T> + sf_io_done(sfio);
T> }
T> + CURVNET_RESTORE();
T>
T> - /* Quit outer loop on error or when we're done. */
T> - if (done)
T> - break;
T> + if (serror == 0) {
T> + sbytes += space + hdrlen;
T> + if (hdrlen)
T> + hdrlen = 0;
T> + } else if (error == 0)
T> + error = serror;
T> + m = NULL; /* pru_send always consumes */
T> +
T> + /* Quit outer loop on error. */
T> if (error != 0)
T> goto done;
T> }
T> @@ -3179,6 +3243,8 @@ out:
T> fdrop(sock_fp, td);
T> if (m)
T> m_freem(m);
T> + if (mh)
T> + m_freem(mh);
T>
T> if (error == ERESTART)
T> error = EINTR;
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../head) (revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../projects/sendfile) (revision 266807)
T> @@ -1127,9 +1127,8 @@ ng_btsocket_l2cap_process_l2ca_write_rsp(struct ng
T> /*
T> * Check if we have more data to send
T> */
T> -
T> sbdroprecord(&pcb->so->so_snd);
T> - if (pcb->so->so_snd.sb_cc > 0) {
T> + if (sbavail(&pcb->so->so_snd) > 0) {
T> if (ng_btsocket_l2cap_send2(pcb) == 0)
T> ng_btsocket_l2cap_timeout(pcb);
T> else
T> @@ -2510,7 +2509,7 @@ ng_btsocket_l2cap_send2(ng_btsocket_l2cap_pcb_p pc
T>
T> mtx_assert(&pcb->pcb_mtx, MA_OWNED);
T>
T> - if (pcb->so->so_snd.sb_cc == 0)
T> + if (sbavail(&pcb->so->so_snd) == 0)
T> return (EINVAL); /* XXX */
T>
T> m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../head) (revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../projects/sendfile) (revision 266807)
T> @@ -3274,7 +3274,7 @@ ng_btsocket_rfcomm_pcb_send(ng_btsocket_rfcomm_pcb
T> }
T>
T> for (error = 0, sent = 0; sent < limit; sent ++) {
T> - length = min(pcb->mtu, pcb->so->so_snd.sb_cc);
T> + length = min(pcb->mtu, sbavail(&pcb->so->so_snd));
T> if (length == 0)
T> break;
T>
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_sco.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../head) (revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../projects/sendfile) (revision 266807)
T> @@ -906,7 +906,7 @@ ng_btsocket_sco_default_msg_input(struct ng_mesg *
T> sbdroprecord(&pcb->so->so_snd);
T>
T> /* Send more if we have any */
T> - if (pcb->so->so_snd.sb_cc > 0)
T> + if (sbavail(&pcb->so->so_snd) > 0)
T> if (ng_btsocket_sco_send2(pcb) == 0)
T> ng_btsocket_sco_timeout(pcb);
T>
T> @@ -1744,7 +1744,7 @@ ng_btsocket_sco_send2(ng_btsocket_sco_pcb_p pcb)
T> mtx_assert(&pcb->pcb_mtx, MA_OWNED);
T>
T> while (pcb->rt->pending < pcb->rt->num_pkts &&
T> - pcb->so->so_snd.sb_cc > 0) {
T> + sbavail(&pcb->so->so_snd) > 0) {
T> /* Get a copy of the first packet on send queue */
T> m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
T> if (m == NULL) {
T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c
T> ===================================================================
T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../head) (revision 266804)
T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../projects/sendfile) (revision 266807)
T> @@ -746,7 +746,7 @@ sdp_start_disconnect(struct sdp_sock *ssk)
T> ("sdp_start_disconnect: sdp_drop() returned NULL"));
T> } else {
T> soisdisconnecting(so);
T> - unread = so->so_rcv.sb_cc;
T> + unread = sbused(&so->so_rcv);
T> sbflush(&so->so_rcv);
T> sdp_usrclosed(ssk);
T> if (!(ssk->flags & SDP_DROPPED)) {
T> @@ -888,7 +888,7 @@ sdp_append(struct sdp_sock *ssk, struct sockbuf *s
T> m_adj(mb, SDP_HEAD_SIZE);
T> n->m_pkthdr.len += mb->m_pkthdr.len;
T> n->m_flags |= mb->m_flags & (M_PUSH | M_URG);
T> - m_demote(mb, 1);
T> + m_demote(mb, 1, 0);
T> sbcompress(sb, mb, sb->sb_mbtail);
T> return;
T> }
T> @@ -1258,7 +1258,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
T> /* We will never ever get anything unless we are connected. */
T> if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
T> /* When disconnecting there may be still some data left. */
T> - if (sb->sb_cc > 0)
T> + if (sbavail(sb))
T> goto deliver;
T> if (!(so->so_state & SS_ISDISCONNECTED))
T> error = ENOTCONN;
T> @@ -1266,7 +1266,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
T> }
T>
T> /* Socket buffer is empty and we shall not block. */
T> - if (sb->sb_cc == 0 &&
T> + if (sbavail(sb) == 0 &&
T> ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T> error = EAGAIN;
T> goto out;
T> @@ -1277,7 +1277,7 @@ restart:
T>
T> /* Abort if socket has reported problems. */
T> if (so->so_error) {
T> - if (sb->sb_cc > 0)
T> + if (sbavail(sb))
T> goto deliver;
T> if (oresid > uio->uio_resid)
T> goto out;
T> @@ -1289,7 +1289,7 @@ restart:
T>
T> /* Door is closed. Deliver what is left, if any. */
T> if (sb->sb_state & SBS_CANTRCVMORE) {
T> - if (sb->sb_cc > 0)
T> + if (sbavail(sb))
T> goto deliver;
T> else
T> goto out;
T> @@ -1296,18 +1296,18 @@ restart:
T> }
T>
T> /* Socket buffer got some data that we shall deliver now. */
T> - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> + if (sbavail(sb) && !(flags & MSG_WAITALL) &&
T> ((so->so_state & SS_NBIO) ||
T> (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> - sb->sb_cc >= sb->sb_lowat ||
T> - sb->sb_cc >= uio->uio_resid ||
T> - sb->sb_cc >= sb->sb_hiwat) ) {
T> + sbavail(sb) >= sb->sb_lowat ||
T> + sbavail(sb) >= uio->uio_resid ||
T> + sbavail(sb) >= sb->sb_hiwat) ) {
T> goto deliver;
T> }
T>
T> /* On MSG_WAITALL we must wait until all data or error arrives. */
T> if ((flags & MSG_WAITALL) &&
T> - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat))
T> + (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_lowat))
T> goto deliver;
T>
T> /*
T> @@ -1321,7 +1321,7 @@ restart:
T>
T> deliver:
T> SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> + KASSERT(sbavail(sb), ("%s: sockbuf empty", __func__));
T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>
T> /* Statistics. */
T> @@ -1329,7 +1329,7 @@ deliver:
T> uio->uio_td->td_ru.ru_msgrcv++;
T>
T> /* Fill uio until full or current end of socket buffer is reached. */
T> - len = min(uio->uio_resid, sb->sb_cc);
T> + len = min(uio->uio_resid, sbavail(sb));
T> if (mp0 != NULL) {
T> /* Dequeue as many mbufs as possible.
*/ T> if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { T> @@ -1509,7 +1509,7 @@ sdp_urg(struct sdp_sock *ssk, struct mbuf *mb) T> if (so == NULL) T> return; T> T> - so->so_oobmark = so->so_rcv.sb_cc + mb->m_pkthdr.len - 1; T> + so->so_oobmark = sbused(&so->so_rcv) + mb->m_pkthdr.len - 1; T> sohasoutofband(so); T> ssk->oobflags &= ~(SDP_HAVEOOB | SDP_HADOOB); T> if (!(so->so_options & SO_OOBINLINE)) { T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c T> =================================================================== T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../head) (revision 266804) T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../projects/sendfile) (revision 266807) T> @@ -183,7 +183,7 @@ sdp_post_recvs_needed(struct sdp_sock *ssk) T> * Compute bytes in the receive queue and socket buffer. T> */ T> bytes_in_process = (posted - SDP_MIN_TX_CREDITS) * buffer_size; T> - bytes_in_process += ssk->socket->so_rcv.sb_cc; T> + bytes_in_process += sbused(&ssk->socket->so_rcv); T> T> return bytes_in_process < max_bytes; T> } T> Index: sys/sys/socket.h T> =================================================================== T> --- sys/sys/socket.h (.../head) (revision 266804) T> +++ sys/sys/socket.h (.../projects/sendfile) (revision 266807) T> @@ -602,12 +602,15 @@ struct sf_hdtr_all { T> * Sendfile-specific flag(s) T> */ T> #define SF_NODISKIO 0x00000001 T> -#define SF_MNOWAIT 0x00000002 T> +#define SF_MNOWAIT 0x00000002 /* unused since 11.0 */ T> #define SF_SYNC 0x00000004 T> #define SF_KQUEUE 0x00000008 T> +#define SF_NOCACHE 0x00000010 T> +#define SF_FLAGS(rh, flags) (((rh) << 16) | (flags)) T> T> #ifdef _KERNEL T> #define SFK_COMPAT 0x00000001 T> +#define SF_READAHEAD(flags) ((flags) >> 16) T> #endif /* _KERNEL */ T> #endif /* __BSD_VISIBLE */ T> T> Index: sys/sys/sockbuf.h T> =================================================================== T> --- sys/sys/sockbuf.h (.../head) (revision 266804) T> +++ sys/sys/sockbuf.h 
(.../projects/sendfile) (revision 266807) T> @@ -89,8 +89,13 @@ struct sockbuf { T> struct mbuf *sb_lastrecord; /* (c/d) first mbuf of last T> * record in socket buffer */ T> struct mbuf *sb_sndptr; /* (c/d) pointer into mbuf chain */ T> + struct mbuf *sb_fnrdy; /* (c/d) pointer to first not ready buffer */ T> +#if 0 T> + struct mbuf *sb_lnrdy; /* (c/d) pointer to last not ready buffer */ T> +#endif T> u_int sb_sndptroff; /* (c/d) byte offset of ptr into chain */ T> - u_int sb_cc; /* (c/d) actual chars in buffer */ T> + u_int sb_acc; /* (c/d) available chars in buffer */ T> + u_int sb_ccc; /* (c/d) claimed chars in buffer */ T> u_int sb_hiwat; /* (c/d) max actual char count */ T> u_int sb_mbcnt; /* (c/d) chars of mbufs used */ T> u_int sb_mcnt; /* (c/d) number of mbufs in buffer */ T> @@ -120,10 +125,17 @@ struct sockbuf { T> #define SOCKBUF_LOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED) T> #define SOCKBUF_UNLOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_NOTOWNED) T> T> +/* T> + * Socket buffer private mbuf(9) flags. 
T> + */ T> +#define M_NOTREADY M_PROTO1 /* m_data not populated yet */ T> +#define M_BLOCKED M_PROTO2 /* M_NOTREADY in front of m */ T> +#define M_NOTAVAIL (M_NOTREADY | M_BLOCKED) T> + T> void sbappend(struct sockbuf *sb, struct mbuf *m); T> void sbappend_locked(struct sockbuf *sb, struct mbuf *m); T> -void sbappendstream(struct sockbuf *sb, struct mbuf *m); T> -void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m); T> +void sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags); T> +void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags); T> int sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa, T> struct mbuf *m0, struct mbuf *control); T> int sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa, T> @@ -136,7 +148,6 @@ int sbappendcontrol_locked(struct sockbuf *sb, str T> struct mbuf *control); T> void sbappendrecord(struct sockbuf *sb, struct mbuf *m0); T> void sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0); T> -void sbcheck(struct sockbuf *sb); T> void sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n); T> struct mbuf * T> sbcreatecontrol(caddr_t p, int size, int type, int level); T> @@ -162,59 +173,54 @@ void sbtoxsockbuf(struct sockbuf *sb, struct xsock T> int sbwait(struct sockbuf *sb); T> int sblock(struct sockbuf *sb, int flags); T> void sbunlock(struct sockbuf *sb); T> +void sballoc(struct sockbuf *, struct mbuf *); T> +void sbfree(struct sockbuf *, struct mbuf *); T> +void sbmtrim(struct sockbuf *, struct mbuf *, int); T> +int sbready(struct sockbuf *, struct mbuf *, int); T> T> +static inline u_int T> +sbavail(struct sockbuf *sb) T> +{ T> + T> +#if 0 T> + SOCKBUF_LOCK_ASSERT(sb); T> +#endif T> + return (sb->sb_acc); T> +} T> + T> +static inline u_int T> +sbused(struct sockbuf *sb) T> +{ T> + T> +#if 0 T> + SOCKBUF_LOCK_ASSERT(sb); T> +#endif T> + return (sb->sb_ccc); T> +} T> + T> /* T> * How much space is there in a socket buffer (so->so_snd or so->so_rcv)? 
T> * This is problematical if the fields are unsigned, as the space might T> - * still be negative (cc > hiwat or mbcnt > mbmax). Should detect T> - * overflow and return 0. Should use "lmin" but it doesn't exist now. T> + * still be negative (ccc > hiwat or mbcnt > mbmax). T> */ T> -static __inline T> -long T> +static inline long T> sbspace(struct sockbuf *sb) T> { T> - long bleft; T> - long mleft; T> + long bleft, mleft; T> T> +#if 0 T> + SOCKBUF_LOCK_ASSERT(sb); T> +#endif T> + T> if (sb->sb_flags & SB_STOP) T> return(0); T> - bleft = sb->sb_hiwat - sb->sb_cc; T> + T> + bleft = sb->sb_hiwat - sb->sb_ccc; T> mleft = sb->sb_mbmax - sb->sb_mbcnt; T> - return((bleft < mleft) ? bleft : mleft); T> -} T> T> -/* adjust counters in sb reflecting allocation of m */ T> -#define sballoc(sb, m) { \ T> - (sb)->sb_cc += (m)->m_len; \ T> - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ T> - (sb)->sb_ctl += (m)->m_len; \ T> - (sb)->sb_mbcnt += MSIZE; \ T> - (sb)->sb_mcnt += 1; \ T> - if ((m)->m_flags & M_EXT) { \ T> - (sb)->sb_mbcnt += (m)->m_ext.ext_size; \ T> - (sb)->sb_ccnt += 1; \ T> - } \ T> + return ((bleft < mleft) ? 
bleft : mleft); T> } T> T> -/* adjust counters in sb reflecting freeing of m */ T> -#define sbfree(sb, m) { \ T> - (sb)->sb_cc -= (m)->m_len; \ T> - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ T> - (sb)->sb_ctl -= (m)->m_len; \ T> - (sb)->sb_mbcnt -= MSIZE; \ T> - (sb)->sb_mcnt -= 1; \ T> - if ((m)->m_flags & M_EXT) { \ T> - (sb)->sb_mbcnt -= (m)->m_ext.ext_size; \ T> - (sb)->sb_ccnt -= 1; \ T> - } \ T> - if ((sb)->sb_sndptr == (m)) { \ T> - (sb)->sb_sndptr = NULL; \ T> - (sb)->sb_sndptroff = 0; \ T> - } \ T> - if ((sb)->sb_sndptroff != 0) \ T> - (sb)->sb_sndptroff -= (m)->m_len; \ T> -} T> - T> #define SB_EMPTY_FIXUP(sb) do { \ T> if ((sb)->sb_mb == NULL) { \ T> (sb)->sb_mbtail = NULL; \ T> @@ -224,13 +230,15 @@ sbspace(struct sockbuf *sb) T> T> #ifdef SOCKBUF_DEBUG T> void sblastrecordchk(struct sockbuf *, const char *, int); T> +void sblastmbufchk(struct sockbuf *, const char *, int); T> +void sbcheck(struct sockbuf *, const char *, int); T> #define SBLASTRECORDCHK(sb) sblastrecordchk((sb), __FILE__, __LINE__) T> - T> -void sblastmbufchk(struct sockbuf *, const char *, int); T> #define SBLASTMBUFCHK(sb) sblastmbufchk((sb), __FILE__, __LINE__) T> +#define SBCHECK(sb) sbcheck((sb), __FILE__, __LINE__) T> #else T> -#define SBLASTRECORDCHK(sb) /* nothing */ T> -#define SBLASTMBUFCHK(sb) /* nothing */ T> +#define SBLASTRECORDCHK(sb) do {} while (0) T> +#define SBLASTMBUFCHK(sb) do {} while (0) T> +#define SBCHECK(sb) do {} while (0) T> #endif /* SOCKBUF_DEBUG */ T> T> #endif /* _KERNEL */ T> Index: sys/sys/protosw.h T> =================================================================== T> --- sys/sys/protosw.h (.../head) (revision 266804) T> +++ sys/sys/protosw.h (.../projects/sendfile) (revision 266807) T> @@ -209,6 +209,7 @@ struct pr_usrreqs { T> #define PRUS_OOB 0x1 T> #define PRUS_EOF 0x2 T> #define PRUS_MORETOCOME 0x4 T> +#define PRUS_NOTREADY 0x8 T> int (*pru_sense)(struct socket *so, struct stat *sb); T> int (*pru_shutdown)(struct socket *so); 
T> int (*pru_flush)(struct socket *so, int direction); T> Index: sys/sys/sf_buf.h T> =================================================================== T> --- sys/sys/sf_buf.h (.../head) (revision 266804) T> +++ sys/sys/sf_buf.h (.../projects/sendfile) (revision 266807) T> @@ -52,7 +52,7 @@ struct sfstat { /* sendfile statistics */ T> #include T> #include T> #include T> -struct mbuf; /* for sf_buf_mext() */ T> +struct mbuf; /* for sf_mext_free() */ T> T> extern counter_u64_t sfstat[sizeof(struct sfstat) / sizeof(uint64_t)]; T> #define SFSTAT_ADD(name, val) \ T> @@ -61,6 +61,6 @@ extern counter_u64_t sfstat[sizeof(struct sfstat) T> #define SFSTAT_INC(name) SFSTAT_ADD(name, 1) T> #endif /* _KERNEL */ T> T> -int sf_buf_mext(struct mbuf *mb, void *addr, void *args); T> +int sf_mext_free(struct mbuf *mb, void *addr, void *args); T> T> #endif /* !_SYS_SF_BUF_H_ */ T> Index: sys/sys/vnode.h T> =================================================================== T> --- sys/sys/vnode.h (.../head) (revision 266804) T> +++ sys/sys/vnode.h (.../projects/sendfile) (revision 266807) T> @@ -719,6 +719,7 @@ int vop_stdbmap(struct vop_bmap_args *); T> int vop_stdfsync(struct vop_fsync_args *); T> int vop_stdgetwritemount(struct vop_getwritemount_args *); T> int vop_stdgetpages(struct vop_getpages_args *); T> +int vop_stdgetpages_async(struct vop_getpages_async_args *); T> int vop_stdinactive(struct vop_inactive_args *); T> int vop_stdislocked(struct vop_islocked_args *); T> int vop_stdkqfilter(struct vop_kqfilter_args *); T> Index: sys/sys/socketvar.h T> =================================================================== T> --- sys/sys/socketvar.h (.../head) (revision 266804) T> +++ sys/sys/socketvar.h (.../projects/sendfile) (revision 266807) T> @@ -205,7 +205,7 @@ struct xsocket { T> T> /* can we read something from so? 
*/ T> #define soreadabledata(so) \ T> - ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \ T> + (sbavail(&(so)->so_rcv) >= (so)->so_rcv.sb_lowat || \ T> !TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error) T> #define soreadable(so) \ T> (soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE)) T> Index: sys/sys/mbuf.h T> =================================================================== T> --- sys/sys/mbuf.h (.../head) (revision 266804) T> +++ sys/sys/mbuf.h (.../projects/sendfile) (revision 266807) T> @@ -922,7 +922,7 @@ struct mbuf *m_copypacket(struct mbuf *, int); T> void m_copy_pkthdr(struct mbuf *, struct mbuf *); T> struct mbuf *m_copyup(struct mbuf *, int, int); T> struct mbuf *m_defrag(struct mbuf *, int); T> -void m_demote(struct mbuf *, int); T> +void m_demote(struct mbuf *, int, int); T> struct mbuf *m_devget(char *, int, int, struct ifnet *, T> void (*)(char *, caddr_t, u_int)); T> struct mbuf *m_dup(struct mbuf *, int); T> Index: sys/vm/vnode_pager.h T> =================================================================== T> --- sys/vm/vnode_pager.h (.../head) (revision 266804) T> +++ sys/vm/vnode_pager.h (.../projects/sendfile) (revision 266807) T> @@ -41,7 +41,7 @@ T> #ifdef _KERNEL T> T> int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, T> - int count, int reqpage); T> + int count, int reqpage, void (*iodone)(void *), void *arg); T> int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m, T> int count, boolean_t sync, T> int *rtvals); T> Index: sys/vm/vm_pager.h T> =================================================================== T> --- sys/vm/vm_pager.h (.../head) (revision 266804) T> +++ sys/vm/vm_pager.h (.../projects/sendfile) (revision 266807) T> @@ -51,18 +51,21 @@ typedef vm_object_t pgo_alloc_t(void *, vm_ooffset T> struct ucred *); T> typedef void pgo_dealloc_t(vm_object_t); T> typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int); T> +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int, 
T> + void(*)(void *), void *); T> typedef void pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *); T> typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *); T> typedef void pgo_pageunswapped_t(vm_page_t); T> T> struct pagerops { T> - pgo_init_t *pgo_init; /* Initialize pager. */ T> - pgo_alloc_t *pgo_alloc; /* Allocate pager. */ T> - pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ T> - pgo_getpages_t *pgo_getpages; /* Get (read) page. */ T> - pgo_putpages_t *pgo_putpages; /* Put (write) page. */ T> - pgo_haspage_t *pgo_haspage; /* Does pager have page? */ T> - pgo_pageunswapped_t *pgo_pageunswapped; T> + pgo_init_t *pgo_init; /* Initialize pager. */ T> + pgo_alloc_t *pgo_alloc; /* Allocate pager. */ T> + pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ T> + pgo_getpages_t *pgo_getpages; /* Get (read) page. */ T> + pgo_getpages_async_t *pgo_getpages_async; /* Get page asyncly. */ T> + pgo_putpages_t *pgo_putpages; /* Put (write) page. */ T> + pgo_haspage_t *pgo_haspage; /* Query page. 
*/ T> + pgo_pageunswapped_t *pgo_pageunswapped; T> }; T> T> extern struct pagerops defaultpagerops; T> @@ -103,6 +106,8 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v T> void vm_pager_bufferinit(void); T> void vm_pager_deallocate(vm_object_t); T> static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int); T> +static __inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int, T> + int, void(*)(void *), void *); T> static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *); T> void vm_pager_init(void); T> vm_object_t vm_pager_object_lookup(struct pagerlst *, void *); T> @@ -131,6 +136,27 @@ vm_pager_get_pages( T> return (r); T> } T> T> +static __inline int T> +vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count, T> + int reqpage, void (*iodone)(void *), void *arg) T> +{ T> + int r; T> + T> + VM_OBJECT_ASSERT_WLOCKED(object); T> + T> + if (*pagertab[object->type]->pgo_getpages_async == NULL) { T> + /* Emulate async operation. */ T> + r = vm_pager_get_pages(object, m, count, reqpage); T> + VM_OBJECT_WUNLOCK(object); T> + (iodone)(arg); T> + VM_OBJECT_WLOCK(object); T> + } else T> + r = (*pagertab[object->type]->pgo_getpages_async)(object, m, T> + count, reqpage, iodone, arg); T> + T> + return (r); T> +} T> + T> static __inline void T> vm_pager_put_pages( T> vm_object_t object, T> Index: sys/vm/vm_page.c T> =================================================================== T> --- sys/vm/vm_page.c (.../head) (revision 266804) T> +++ sys/vm/vm_page.c (.../projects/sendfile) (revision 266807) T> @@ -2689,6 +2689,8 @@ retrylookup: T> sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ? 
T> vm_page_xbusied(m) : vm_page_busied(m); T> if (sleep) { T> + if (allocflags & VM_ALLOC_NOWAIT) T> + return (NULL); T> /* T> * Reference the page before unlocking and T> * sleeping so that the page daemon is less T> @@ -2716,6 +2718,8 @@ retrylookup: T> } T> m = vm_page_alloc(object, pindex, allocflags & ~VM_ALLOC_IGN_SBUSY); T> if (m == NULL) { T> + if (allocflags & VM_ALLOC_NOWAIT) T> + return (NULL); T> VM_OBJECT_WUNLOCK(object); T> VM_WAIT; T> VM_OBJECT_WLOCK(object); T> Index: sys/vm/vm_page.h T> =================================================================== T> --- sys/vm/vm_page.h (.../head) (revision 266804) T> +++ sys/vm/vm_page.h (.../projects/sendfile) (revision 266807) T> @@ -390,6 +390,7 @@ vm_page_t PHYS_TO_VM_PAGE(vm_paddr_t pa); T> #define VM_ALLOC_IGN_SBUSY 0x1000 /* vm_page_grab() only */ T> #define VM_ALLOC_NODUMP 0x2000 /* don't include in dump */ T> #define VM_ALLOC_SBUSY 0x4000 /* Shared busy the page */ T> +#define VM_ALLOC_NOWAIT 0x8000 /* Return NULL instead of sleeping */ T> T> #define VM_ALLOC_COUNT_SHIFT 16 T> #define VM_ALLOC_COUNT(count) ((count) << VM_ALLOC_COUNT_SHIFT) T> Index: sys/vm/vnode_pager.c T> =================================================================== T> --- sys/vm/vnode_pager.c (.../head) (revision 266804) T> +++ sys/vm/vnode_pager.c (.../projects/sendfile) (revision 266807) T> @@ -83,6 +83,8 @@ static int vnode_pager_input_smlfs(vm_object_t obj T> static int vnode_pager_input_old(vm_object_t object, vm_page_t m); T> static void vnode_pager_dealloc(vm_object_t); T> static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int); T> +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int, T> + void(*)(void *), void *); T> static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); T> static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *); T> static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t, T> @@ -92,6 +94,7 @@ struct 
pagerops vnodepagerops = { T> .pgo_alloc = vnode_pager_alloc, T> .pgo_dealloc = vnode_pager_dealloc, T> .pgo_getpages = vnode_pager_getpages, T> + .pgo_getpages_async = vnode_pager_getpages_async, T> .pgo_putpages = vnode_pager_putpages, T> .pgo_haspage = vnode_pager_haspage, T> }; T> @@ -664,6 +667,40 @@ vnode_pager_getpages(vm_object_t object, vm_page_t T> return rtval; T> } T> T> +static int T> +vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count, T> + int reqpage, void (*iodone)(void *), void *arg) T> +{ T> + int rtval; T> + struct vnode *vp; T> + int bytes = count * PAGE_SIZE; T> + T> + vp = object->handle; T> + VM_OBJECT_WUNLOCK(object); T> + rtval = VOP_GETPAGES_ASYNC(vp, m, bytes, reqpage, 0, iodone, arg); T> + KASSERT(rtval != EOPNOTSUPP, T> + ("vnode_pager: FS getpages_async not implemented\n")); T> + VM_OBJECT_WLOCK(object); T> + return rtval; T> +} T> + T> +struct getpages_softc { T> + vm_page_t *m; T> + struct buf *bp; T> + vm_object_t object; T> + vm_offset_t kva; T> + off_t foff; T> + int size; T> + int count; T> + int unmapped; T> + int reqpage; T> + void (*iodone)(void *); T> + void *arg; T> +}; T> + T> +int vnode_pager_generic_getpages_done(struct getpages_softc *); T> +void vnode_pager_generic_getpages_done_async(struct buf *); T> + T> /* T> * This is now called from local media FS's to operate against their T> * own vnodes if they fail to implement VOP_GETPAGES. 
T> @@ -670,11 +707,11 @@ vnode_pager_getpages(vm_object_t object, vm_page_t T> */ T> int T> vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount, T> - int reqpage) T> + int reqpage, void (*iodone)(void *), void *arg) T> { T> vm_object_t object; T> vm_offset_t kva; T> - off_t foff, tfoff, nextoff; T> + off_t foff; T> int i, j, size, bsize, first; T> daddr_t firstaddr, reqblock; T> struct bufobj *bo; T> @@ -684,6 +721,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> struct mount *mp; T> int count; T> int error; T> + int unmapped; T> T> object = vp->v_object; T> count = bytecount / PAGE_SIZE; T> @@ -891,8 +929,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> * requires mapped buffers. T> */ T> mp = vp->v_mount; T> - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 && T> - unmapped_buf_allowed) { T> + unmapped = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS)); T> + if (unmapped && unmapped_buf_allowed) { T> bp->b_data = unmapped_buf; T> bp->b_kvabase = unmapped_buf; T> bp->b_offset = 0; T> @@ -905,7 +943,6 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> T> /* build a minimal buffer header */ T> bp->b_iocmd = BIO_READ; T> - bp->b_iodone = bdone; T> KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred")); T> KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred")); T> bp->b_rcred = crhold(curthread->td_ucred); T> @@ -923,10 +960,88 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> T> /* do the input */ T> bp->b_iooffset = dbtob(bp->b_blkno); T> - bstrategy(bp); T> T> - bwait(bp, PVM, "vnread"); T> + if (iodone) { /* async */ T> + struct getpages_softc *sc; T> T> + sc = malloc(sizeof(*sc), M_TEMP, M_WAITOK); T> + T> + sc->m = m; T> + sc->bp = bp; T> + sc->object = object; T> + sc->foff = foff; T> + sc->size = size; T> + sc->count = count; T> + sc->unmapped = unmapped; T> + sc->reqpage = reqpage; T> + sc->kva = kva; T> + T> + sc->iodone = iodone; T> + sc->arg = arg; T> + T> + bp->b_iodone = 
vnode_pager_generic_getpages_done_async; T> + bp->b_caller1 = sc; T> + BUF_KERNPROC(bp); T> + bstrategy(bp); T> + /* Good bye! */ T> + } else { T> + struct getpages_softc sc; T> + T> + sc.m = m; T> + sc.bp = bp; T> + sc.object = object; T> + sc.foff = foff; T> + sc.size = size; T> + sc.count = count; T> + sc.unmapped = unmapped; T> + sc.reqpage = reqpage; T> + sc.kva = kva; T> + T> + bp->b_iodone = bdone; T> + bstrategy(bp); T> + bwait(bp, PVM, "vnread"); T> + error = vnode_pager_generic_getpages_done(&sc); T> + } T> + T> + return (error ? VM_PAGER_ERROR : VM_PAGER_OK); T> +} T> + T> +void T> +vnode_pager_generic_getpages_done_async(struct buf *bp) T> +{ T> + struct getpages_softc *sc = bp->b_caller1; T> + int error; T> + T> + error = vnode_pager_generic_getpages_done(sc); T> + T> + vm_page_xunbusy(sc->m[sc->reqpage]); T> + T> + sc->iodone(sc->arg); T> + T> + free(sc, M_TEMP); T> +} T> + T> +int T> +vnode_pager_generic_getpages_done(struct getpages_softc *sc) T> +{ T> + vm_object_t object; T> + vm_offset_t kva; T> + vm_page_t *m; T> + struct buf *bp; T> + off_t foff, tfoff, nextoff; T> + int i, size, count, unmapped, reqpage; T> + int error = 0; T> + T> + m = sc->m; T> + bp = sc->bp; T> + object = sc->object; T> + foff = sc->foff; T> + size = sc->size; T> + count = sc->count; T> + unmapped = sc->unmapped; T> + reqpage = sc->reqpage; T> + kva = sc->kva; T> + T> if ((bp->b_ioflags & BIO_ERROR) != 0) T> error = EIO; T> T> @@ -939,7 +1054,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> } T> if ((bp->b_flags & B_UNMAPPED) == 0) T> pmap_qremove(kva, count); T> - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) { T> + if (unmapped) { T> bp->b_data = (caddr_t)kva; T> bp->b_kvabase = (caddr_t)kva; T> bp->b_flags &= ~B_UNMAPPED; T> @@ -995,7 +1110,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> if (error) { T> printf("vnode_pager_getpages: I/O read error\n"); T> } T> - return (error ? 
VM_PAGER_ERROR : VM_PAGER_OK); T> + T> + return (error); T> } T> T> /* T> Index: sys/rpc/clnt_vc.c T> =================================================================== T> --- sys/rpc/clnt_vc.c (.../head) (revision 266804) T> +++ sys/rpc/clnt_vc.c (.../projects/sendfile) (revision 266807) T> @@ -860,7 +860,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int T> * error condition T> */ T> do_read = FALSE; T> - if (so->so_rcv.sb_cc >= sizeof(uint32_t) T> + if (sbavail(&so->so_rcv) >= sizeof(uint32_t) T> || (so->so_rcv.sb_state & SBS_CANTRCVMORE) T> || so->so_error) T> do_read = TRUE; T> @@ -913,7 +913,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int T> * buffered. T> */ T> do_read = FALSE; T> - if (so->so_rcv.sb_cc >= ct->ct_record_resid T> + if (sbavail(&so->so_rcv) >= ct->ct_record_resid T> || (so->so_rcv.sb_state & SBS_CANTRCVMORE) T> || so->so_error) T> do_read = TRUE; T> Index: sys/rpc/svc_vc.c T> =================================================================== T> --- sys/rpc/svc_vc.c (.../head) (revision 266804) T> +++ sys/rpc/svc_vc.c (.../projects/sendfile) (revision 266807) T> @@ -546,7 +546,7 @@ svc_vc_ack(SVCXPRT *xprt, uint32_t *ack) T> { T> T> *ack = atomic_load_acq_32(&xprt->xp_snt_cnt); T> - *ack -= xprt->xp_socket->so_snd.sb_cc; T> + *ack -= sbused(&xprt->xp_socket->so_snd); T> return (TRUE); T> } T> T> Index: sys/ufs/ffs/ffs_vnops.c T> =================================================================== T> --- sys/ufs/ffs/ffs_vnops.c (.../head) (revision 266804) T> +++ sys/ufs/ffs/ffs_vnops.c (.../projects/sendfile) (revision 266807) T> @@ -105,6 +105,7 @@ extern int ffs_rawread(struct vnode *vp, struct ui T> static vop_fsync_t ffs_fsync; T> static vop_lock1_t ffs_lock; T> static vop_getpages_t ffs_getpages; T> +static vop_getpages_async_t ffs_getpages_async; T> static vop_read_t ffs_read; T> static vop_write_t ffs_write; T> static int ffs_extread(struct vnode *vp, struct uio *uio, int ioflag); T> @@ -125,6 +126,7 @@ struct vop_vector 
ffs_vnodeops1 = { T> .vop_default = &ufs_vnodeops, T> .vop_fsync = ffs_fsync, T> .vop_getpages = ffs_getpages, T> + .vop_getpages_async = ffs_getpages_async, T> .vop_lock1 = ffs_lock, T> .vop_read = ffs_read, T> .vop_reallocblks = ffs_reallocblks, T> @@ -847,18 +849,16 @@ ffs_write(ap) T> } T> T> /* T> - * get page routine T> + * Get page routines. T> */ T> static int T> -ffs_getpages(ap) T> - struct vop_getpages_args *ap; T> +ffs_getpages_checkvalid(vm_page_t *m, int count, int reqpage) T> { T> - int i; T> vm_page_t mreq; T> int pcount; T> T> - pcount = round_page(ap->a_count) / PAGE_SIZE; T> - mreq = ap->a_m[ap->a_reqpage]; T> + pcount = round_page(count) / PAGE_SIZE; T> + mreq = m[reqpage]; T> T> /* T> * if ANY DEV_BSIZE blocks are valid on a large filesystem block, T> @@ -870,24 +870,48 @@ static int T> if (mreq->valid) { T> if (mreq->valid != VM_PAGE_BITS_ALL) T> vm_page_zero_invalid(mreq, TRUE); T> - for (i = 0; i < pcount; i++) { T> - if (i != ap->a_reqpage) { T> - vm_page_lock(ap->a_m[i]); T> - vm_page_free(ap->a_m[i]); T> - vm_page_unlock(ap->a_m[i]); T> + for (int i = 0; i < pcount; i++) { T> + if (i != reqpage) { T> + vm_page_lock(m[i]); T> + vm_page_free(m[i]); T> + vm_page_unlock(m[i]); T> } T> } T> VM_OBJECT_WUNLOCK(mreq->object); T> - return VM_PAGER_OK; T> + return (VM_PAGER_OK); T> } T> VM_OBJECT_WUNLOCK(mreq->object); T> T> - return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, T> - ap->a_count, T> - ap->a_reqpage); T> + return (-1); T> } T> T> +static int T> +ffs_getpages(struct vop_getpages_args *ap) T> +{ T> + int rv; T> T> + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); T> + if (rv == VM_PAGER_OK) T> + return (rv); T> + T> + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, T> + ap->a_reqpage, NULL, NULL)); T> +} T> + T> +static int T> +ffs_getpages_async(struct vop_getpages_async_args *ap) T> +{ T> + int rv; T> + T> + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); T> + if (rv == 
VM_PAGER_OK) { T> + (ap->a_vop_getpages_iodone)(ap->a_arg); T> + return (rv); T> + } T> + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, T> + ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg)); T> +} T> + T> /* T> * Extended attribute area reading. T> */ T> Index: sys/tools/vnode_if.awk T> =================================================================== T> --- sys/tools/vnode_if.awk (.../head) (revision 266804) T> +++ sys/tools/vnode_if.awk (.../projects/sendfile) (revision 266807) T> @@ -254,16 +254,26 @@ while ((getline < srcfile) > 0) { T> if (sub(/;$/, "") < 1) T> die("Missing end-of-line ; in \"%s\".", $0); T> T> - # pick off variable name T> - if ((argp = match($0, /[A-Za-z0-9_]+$/)) < 1) T> - die("Missing var name \"a_foo\" in \"%s\".", $0); T> - args[numargs] = substr($0, argp); T> - $0 = substr($0, 1, argp - 1); T> - T> - # what is left must be type T> - # remove trailing space (if any) T> - sub(/ $/, ""); T> - types[numargs] = $0; T> + # pick off argument name T> + if ((argp = match($0, /[A-Za-z0-9_]+$/)) > 0) { T> + args[numargs] = substr($0, argp); T> + $0 = substr($0, 1, argp - 1); T> + sub(/ $/, ""); T> + delete fargs[numargs]; T> + types[numargs] = $0; T> + } else { # try to parse a function pointer argument T> + if ((argp = match($0, T> + /\(\*[A-Za-z0-9_]+\)\([A-Za-z0-9_*, ]+\)$/)) < 1) T> + die("Missing var name \"a_foo\" in \"%s\".", T> + $0); T> + args[numargs] = substr($0, argp + 2); T> + sub(/\).+/, "", args[numargs]); T> + fargs[numargs] = substr($0, argp); T> + sub(/^\([^)]+\)/, "", fargs[numargs]); T> + $0 = substr($0, 1, argp - 1); T> + sub(/ $/, ""); T> + types[numargs] = $0; T> + } T> } T> if (numargs > 4) T> ctrargs = 4; T> @@ -286,8 +296,13 @@ while ((getline < srcfile) > 0) { T> if (hfile) { T> # Print out the vop_F_args structure. 
T> printh("struct "name"_args {\n\tstruct vop_generic_args a_gen;"); T> - for (i = 0; i < numargs; ++i) T> - printh("\t" t_spc(types[i]) "a_" args[i] ";"); T> + for (i = 0; i < numargs; ++i) { T> + if (fargs[i]) { T> + printh("\t" t_spc(types[i]) "(*a_" args[i] \ T> + ")" fargs[i] ";"); T> + } else T> + printh("\t" t_spc(types[i]) "a_" args[i] ";"); T> + } T> printh("};"); T> printh(""); T> T> @@ -301,8 +316,14 @@ while ((getline < srcfile) > 0) { T> printh(""); T> printh("static __inline int " uname "("); T> for (i = 0; i < numargs; ++i) { T> - printh("\t" t_spc(types[i]) args[i] \ T> - (i < numargs - 1 ? "," : ")")); T> + if (fargs[i]) { T> + printh("\t" t_spc(types[i]) "(*" args[i] \ T> + ")" fargs[i] \ T> + (i < numargs - 1 ? "," : ")")); T> + } else { T> + printh("\t" t_spc(types[i]) args[i] \ T> + (i < numargs - 1 ? "," : ")")); T> + } T> } T> printh("{"); T> printh("\tstruct " name "_args a;"); T> Index: sys/netinet/tcp_reass.c T> =================================================================== T> --- sys/netinet/tcp_reass.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_reass.c (.../projects/sendfile) (revision 266807) T> @@ -248,7 +248,7 @@ present: T> m_freem(mq); T> else { T> mq->m_nextpkt = NULL; T> - sbappendstream_locked(&so->so_rcv, mq); T> + sbappendstream_locked(&so->so_rcv, mq, 0); T> wakeup = 1; T> } T> } T> Index: sys/netinet/accf_http.c T> =================================================================== T> --- sys/netinet/accf_http.c (.../head) (revision 266804) T> +++ sys/netinet/accf_http.c (.../projects/sendfile) (revision 266807) T> @@ -92,7 +92,7 @@ sbfull(struct sockbuf *sb) T> "mbcnt(%ld) >= mbmax(%ld): %d", T> sb->sb_cc, sb->sb_hiwat, sb->sb_cc >= sb->sb_hiwat, T> sb->sb_mbcnt, sb->sb_mbmax, sb->sb_mbcnt >= sb->sb_mbmax); T> - return (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax); T> + return (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax); T> } T> T> /* T> @@ -162,13 +162,14 @@ static int T> 
sohashttpget(struct socket *so, void *arg, int waitflag) T> { T> T> - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && !sbfull(&so->so_rcv)) { T> + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && T> + !sbfull(&so->so_rcv)) { T> struct mbuf *m; T> char *cmp; T> int cmplen, cc; T> T> m = so->so_rcv.sb_mb; T> - cc = so->so_rcv.sb_cc - 1; T> + cc = sbavail(&so->so_rcv) - 1; T> if (cc < 1) T> return (SU_OK); T> switch (*mtod(m, char *)) { T> @@ -215,7 +216,7 @@ soparsehttpvers(struct socket *so, void *arg, int T> goto fallout; T> T> m = so->so_rcv.sb_mb; T> - cc = so->so_rcv.sb_cc; T> + cc = sbavail(&so->so_rcv); T> inspaces = spaces = 0; T> for (m = so->so_rcv.sb_mb; m; m = n) { T> n = m->m_nextpkt; T> @@ -304,7 +305,7 @@ soishttpconnected(struct socket *so, void *arg, in T> * have NCHRS left T> */ T> copied = 0; T> - ccleft = so->so_rcv.sb_cc; T> + ccleft = sbavail(&so->so_rcv); T> if (ccleft < NCHRS) T> goto readmore; T> a = b = c = '\0'; T> Index: sys/netinet/sctp_os_bsd.h T> =================================================================== T> --- sys/netinet/sctp_os_bsd.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_os_bsd.h (.../projects/sendfile) (revision 266807) T> @@ -405,7 +405,7 @@ typedef struct callout sctp_os_timer_t; T> #define SCTP_SOWAKEUP(so) wakeup(&(so)->so_timeo) T> /* clear the socket buffer state */ T> #define SCTP_SB_CLEAR(sb) \ T> - (sb).sb_cc = 0; \ T> + (sb).sb_ccc = 0; \ T> (sb).sb_mb = NULL; \ T> (sb).sb_mbcnt = 0; T> T> Index: sys/netinet/tcp_output.c T> =================================================================== T> --- sys/netinet/tcp_output.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_output.c (.../projects/sendfile) (revision 266807) T> @@ -322,7 +322,7 @@ after_sack_rexmit: T> * to send then the probe will be the FIN T> * itself. 
T> */ T> - if (off < so->so_snd.sb_cc) T> + if (off < sbavail(&so->so_snd)) T> flags &= ~TH_FIN; T> sendwin = 1; T> } else { T> @@ -348,7 +348,8 @@ after_sack_rexmit: T> */ T> if (sack_rxmit == 0) { T> if (sack_bytes_rxmt == 0) T> - len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off); T> + len = ((long)ulmin(sbavail(&so->so_snd), sendwin) - T> + off); T> else { T> long cwin; T> T> @@ -357,8 +358,8 @@ after_sack_rexmit: T> * sending new data, having retransmitted all the T> * data possible in the scoreboard. T> */ T> - len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd) T> - - off); T> + len = ((long)ulmin(sbavail(&so->so_snd), tp->snd_wnd) - T> + off); T> /* T> * Don't remove this (len > 0) check ! T> * We explicitly check for len > 0 here (although it T> @@ -457,12 +458,15 @@ after_sack_rexmit: T> * TODO: Shrink send buffer during idle periods together T> * with congestion window. Requires another timer. Has to T> * wait for upcoming tcp timer rewrite. T> + * T> + * XXXGL: should there be used sbused() or sbavail()? 
T> */ T> if (V_tcp_do_autosndbuf && so->so_snd.sb_flags & SB_AUTOSIZE) { T> if ((tp->snd_wnd / 4 * 5) >= so->so_snd.sb_hiwat && T> - so->so_snd.sb_cc >= (so->so_snd.sb_hiwat / 8 * 7) && T> - so->so_snd.sb_cc < V_tcp_autosndbuf_max && T> - sendwin >= (so->so_snd.sb_cc - (tp->snd_nxt - tp->snd_una))) { T> + sbused(&so->so_snd) >= (so->so_snd.sb_hiwat / 8 * 7) && T> + sbused(&so->so_snd) < V_tcp_autosndbuf_max && T> + sendwin >= (sbused(&so->so_snd) - T> + (tp->snd_nxt - tp->snd_una))) { T> if (!sbreserve_locked(&so->so_snd, T> min(so->so_snd.sb_hiwat + V_tcp_autosndbuf_inc, T> V_tcp_autosndbuf_max), so, curthread)) T> @@ -499,10 +503,11 @@ after_sack_rexmit: T> tso = 1; T> T> if (sack_rxmit) { T> - if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc)) T> + if (SEQ_LT(p->rxmit + len, tp->snd_una + sbavail(&so->so_snd))) T> flags &= ~TH_FIN; T> } else { T> - if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc)) T> + if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + T> + sbavail(&so->so_snd))) T> flags &= ~TH_FIN; T> } T> T> @@ -532,7 +537,7 @@ after_sack_rexmit: T> */ T> if (!(tp->t_flags & TF_MORETOCOME) && /* normal case */ T> (idle || (tp->t_flags & TF_NODELAY)) && T> - len + off >= so->so_snd.sb_cc && T> + len + off >= sbavail(&so->so_snd) && T> (tp->t_flags & TF_NOPUSH) == 0) { T> goto send; T> } T> @@ -660,7 +665,7 @@ dontupdate: T> * if window is nonzero, transmit what we can, T> * otherwise force out a byte. T> */ T> - if (so->so_snd.sb_cc && !tcp_timer_active(tp, TT_REXMT) && T> + if (sbavail(&so->so_snd) && !tcp_timer_active(tp, TT_REXMT) && T> !tcp_timer_active(tp, TT_PERSIST)) { T> tp->t_rxtshift = 0; T> tcp_setpersist(tp); T> @@ -786,7 +791,7 @@ send: T> * fractional unless the send sockbuf can T> * be emptied. 
T> */ T> - if (sendalot && off + len < so->so_snd.sb_cc) { T> + if (sendalot && off + len < sbavail(&so->so_snd)) { T> len -= len % (tp->t_maxopd - optlen); T> sendalot = 1; T> } T> @@ -889,7 +894,7 @@ send: T> * give data to the user when a buffer fills or T> * a PUSH comes in.) T> */ T> - if (off + len == so->so_snd.sb_cc) T> + if (off + len == sbavail(&so->so_snd)) T> flags |= TH_PUSH; T> SOCKBUF_UNLOCK(&so->so_snd); T> } else { T> Index: sys/netinet/siftr.c T> =================================================================== T> --- sys/netinet/siftr.c (.../head) (revision 266804) T> +++ sys/netinet/siftr.c (.../projects/sendfile) (revision 266807) T> @@ -781,9 +781,9 @@ siftr_siftdata(struct pkt_node *pn, struct inpcb * T> pn->flags = tp->t_flags; T> pn->rxt_length = tp->t_rxtcur; T> pn->snd_buf_hiwater = inp->inp_socket->so_snd.sb_hiwat; T> - pn->snd_buf_cc = inp->inp_socket->so_snd.sb_cc; T> + pn->snd_buf_cc = sbused(&inp->inp_socket->so_snd); T> pn->rcv_buf_hiwater = inp->inp_socket->so_rcv.sb_hiwat; T> - pn->rcv_buf_cc = inp->inp_socket->so_rcv.sb_cc; T> + pn->rcv_buf_cc = sbused(&inp->inp_socket->so_rcv); T> pn->sent_inflight_bytes = tp->snd_max - tp->snd_una; T> pn->t_segqlen = tp->t_segqlen; T> T> Index: sys/netinet/sctp_indata.c T> =================================================================== T> --- sys/netinet/sctp_indata.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_indata.c (.../projects/sendfile) (revision 266807) T> @@ -70,7 +70,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_ T> T> /* T> * This is really set wrong with respect to a 1-2-m socket. Since T> - * the sb_cc is the count that everyone as put up. When we re-write T> + * the sb_ccc is the count that everyone as put up. When we re-write T> * sctp_soreceive then we will fix this so that ONLY this T> * associations data is taken into account. 
T> */ T> @@ -77,7 +77,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_ T> if (stcb->sctp_socket == NULL) T> return (calc); T> T> - if (stcb->asoc.sb_cc == 0 && T> + if (stcb->asoc.sb_ccc == 0 && T> asoc->size_on_reasm_queue == 0 && T> asoc->size_on_all_streams == 0) { T> /* Full rwnd granted */ T> @@ -1358,7 +1358,7 @@ sctp_process_a_data_chunk(struct sctp_tcb *stcb, s T> * When we have NO room in the rwnd we check to make sure T> * the reader is doing its job... T> */ T> - if (stcb->sctp_socket->so_rcv.sb_cc) { T> + if (stcb->sctp_socket->so_rcv.sb_ccc) { T> /* some to read, wake-up */ T> #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING) T> struct socket *so; T> Index: sys/netinet/sctp_pcb.c T> =================================================================== T> --- sys/netinet/sctp_pcb.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_pcb.c (.../projects/sendfile) (revision 266807) T> @@ -3328,7 +3328,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi T> if ((asoc->asoc.size_on_reasm_queue > 0) || T> (asoc->asoc.control_pdapi) || T> (asoc->asoc.size_on_all_streams > 0) || T> - (so && (so->so_rcv.sb_cc > 0))) { T> + (so && (so->so_rcv.sb_ccc > 0))) { T> /* Left with Data unread */ T> struct mbuf *op_err; T> T> @@ -3556,7 +3556,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi T> TAILQ_REMOVE(&inp->read_queue, sq, next); T> sctp_free_remote_addr(sq->whoFrom); T> if (so) T> - so->so_rcv.sb_cc -= sq->length; T> + so->so_rcv.sb_ccc -= sq->length; T> if (sq->data) { T> sctp_m_freem(sq->data); T> sq->data = NULL; T> @@ -4775,7 +4775,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, struct sct T> inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED; T> if (so) { T> SOCK_LOCK(so); T> - if (so->so_rcv.sb_cc == 0) { T> + if (so->so_rcv.sb_ccc == 0) { T> so->so_state &= ~(SS_ISCONNECTING | T> SS_ISDISCONNECTING | T> SS_ISCONFIRMING | T> Index: sys/netinet/sctp_pcb.h T> =================================================================== T> --- 
sys/netinet/sctp_pcb.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_pcb.h (.../projects/sendfile) (revision 266807) T> @@ -369,7 +369,7 @@ struct sctp_inpcb { T> } ip_inp; T> T> T> - /* Socket buffer lock protects read_queue and of course sb_cc */ T> + /* Socket buffer lock protects read_queue and of course sb_ccc */ T> struct sctp_readhead read_queue; T> T> LIST_ENTRY(sctp_inpcb) sctp_list; /* lists all endpoints */ T> Index: sys/netinet/sctp_usrreq.c T> =================================================================== T> --- sys/netinet/sctp_usrreq.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_usrreq.c (.../projects/sendfile) (revision 266807) T> @@ -586,7 +586,7 @@ sctp_must_try_again: T> if (((flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) && T> (atomic_cmpset_int(&inp->sctp_flags, flags, (flags | SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_CLOSE_IP)))) { T> if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) || T> - (so->so_rcv.sb_cc > 0)) { T> + (so->so_rcv.sb_ccc > 0)) { T> #ifdef SCTP_LOG_CLOSING T> sctp_log_closing(inp, NULL, 13); T> #endif T> @@ -751,7 +751,7 @@ sctp_disconnect(struct socket *so) T> } T> if (((so->so_options & SO_LINGER) && T> (so->so_linger == 0)) || T> - (so->so_rcv.sb_cc > 0)) { T> + (so->so_rcv.sb_ccc > 0)) { T> if (SCTP_GET_STATE(asoc) != T> SCTP_STATE_COOKIE_WAIT) { T> /* Left with Data unread */ T> @@ -916,7 +916,7 @@ sctp_flush(struct socket *so, int how) T> inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_CANT_READ; T> SCTP_INP_READ_UNLOCK(inp); T> SCTP_INP_WUNLOCK(inp); T> - so->so_rcv.sb_cc = 0; T> + so->so_rcv.sb_ccc = 0; T> so->so_rcv.sb_mbcnt = 0; T> so->so_rcv.sb_mb = NULL; T> } T> @@ -925,7 +925,7 @@ sctp_flush(struct socket *so, int how) T> * First make sure the sb will be happy, we don't use these T> * except maybe the count T> */ T> - so->so_snd.sb_cc = 0; T> + so->so_snd.sb_ccc = 0; T> so->so_snd.sb_mbcnt = 0; T> so->so_snd.sb_mb = NULL; T> T> Index: sys/netinet/sctp_structs.h T> 
=================================================================== T> --- sys/netinet/sctp_structs.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_structs.h (.../projects/sendfile) (revision 266807) T> @@ -982,7 +982,7 @@ struct sctp_association { T> T> uint32_t total_output_queue_size; T> T> - uint32_t sb_cc; /* shadow of sb_cc */ T> + uint32_t sb_ccc; /* shadow of sb_ccc */ T> uint32_t sb_send_resv; /* amount reserved on a send */ T> uint32_t my_rwnd_control_len; /* shadow of sb_mbcnt used for rwnd T> * control */ T> Index: sys/netinet/tcp_input.c T> =================================================================== T> --- sys/netinet/tcp_input.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_input.c (.../projects/sendfile) (revision 266807) T> @@ -1729,7 +1729,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, T> tcp_timer_activate(tp, TT_REXMT, T> tp->t_rxtcur); T> sowwakeup(so); T> - if (so->so_snd.sb_cc) T> + if (sbavail(&so->so_snd)) T> (void) tcp_output(tp); T> goto check_delack; T> } T> @@ -1837,7 +1837,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, T> newsize, so, NULL)) T> so->so_rcv.sb_flags &= ~SB_AUTOSIZE; T> m_adj(m, drop_hdrlen); /* delayed header drop */ T> - sbappendstream_locked(&so->so_rcv, m); T> + sbappendstream_locked(&so->so_rcv, m, 0); T> } T> /* NB: sorwakeup_locked() does an implicit unlock. */ T> sorwakeup_locked(so); T> @@ -2541,7 +2541,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, T> * Otherwise we would send pure ACKs. 
T> */ T> SOCKBUF_LOCK(&so->so_snd); T> - avail = so->so_snd.sb_cc - T> + avail = sbavail(&so->so_snd) - T> (tp->snd_nxt - tp->snd_una); T> SOCKBUF_UNLOCK(&so->so_snd); T> if (avail > 0) T> @@ -2676,10 +2676,10 @@ process_ACK: T> cc_ack_received(tp, th, CC_ACK); T> T> SOCKBUF_LOCK(&so->so_snd); T> - if (acked > so->so_snd.sb_cc) { T> - tp->snd_wnd -= so->so_snd.sb_cc; T> + if (acked > sbavail(&so->so_snd)) { T> + tp->snd_wnd -= sbavail(&so->so_snd); T> mfree = sbcut_locked(&so->so_snd, T> - (int)so->so_snd.sb_cc); T> + (int)sbavail(&so->so_snd)); T> ourfinisacked = 1; T> } else { T> mfree = sbcut_locked(&so->so_snd, acked); T> @@ -2805,7 +2805,7 @@ step6: T> * actually wanting to send this much urgent data. T> */ T> SOCKBUF_LOCK(&so->so_rcv); T> - if (th->th_urp + so->so_rcv.sb_cc > sb_max) { T> + if (th->th_urp + sbavail(&so->so_rcv) > sb_max) { T> th->th_urp = 0; /* XXX */ T> thflags &= ~TH_URG; /* XXX */ T> SOCKBUF_UNLOCK(&so->so_rcv); /* XXX */ T> @@ -2827,7 +2827,7 @@ step6: T> */ T> if (SEQ_GT(th->th_seq+th->th_urp, tp->rcv_up)) { T> tp->rcv_up = th->th_seq + th->th_urp; T> - so->so_oobmark = so->so_rcv.sb_cc + T> + so->so_oobmark = sbavail(&so->so_rcv) + T> (tp->rcv_up - tp->rcv_nxt) - 1; T> if (so->so_oobmark == 0) T> so->so_rcv.sb_state |= SBS_RCVATMARK; T> @@ -2897,7 +2897,7 @@ dodata: /* XXX */ T> if (so->so_rcv.sb_state & SBS_CANTRCVMORE) T> m_freem(m); T> else T> - sbappendstream_locked(&so->so_rcv, m); T> + sbappendstream_locked(&so->so_rcv, m, 0); T> /* NB: sorwakeup_locked() does an implicit unlock. 
*/ T> sorwakeup_locked(so); T> } else { T> Index: sys/netinet/sctp_input.c T> =================================================================== T> --- sys/netinet/sctp_input.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_input.c (.../projects/sendfile) (revision 266807) T> @@ -1042,7 +1042,7 @@ sctp_handle_shutdown_ack(struct sctp_shutdown_ack_ T> if (stcb->sctp_socket) { T> if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) { T> - stcb->sctp_socket->so_snd.sb_cc = 0; T> + stcb->sctp_socket->so_snd.sb_ccc = 0; T> } T> sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED); T> } T> Index: sys/netinet/sctp_var.h T> =================================================================== T> --- sys/netinet/sctp_var.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_var.h (.../projects/sendfile) (revision 266807) T> @@ -82,9 +82,9 @@ extern struct pr_usrreqs sctp_usrreqs; T> T> #define sctp_maxspace(sb) (max((sb)->sb_hiwat,SCTP_MINIMAL_RWND)) T> T> -#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_cc) ? (sctp_maxspace(sb) - (asoc)->sb_cc) : 0)) T> +#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_ccc) ? (sctp_maxspace(sb) - (asoc)->sb_ccc) : 0)) T> T> -#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_cc) ? (sctp_maxspace(sb) - (sb)->sb_cc) : 0)) T> +#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_ccc) ? (sctp_maxspace(sb) - (sb)->sb_ccc) : 0)) T> T> #define sctp_sbspace_sub(a,b) ((a > b) ? 
(a - b) : 0) T> T> @@ -195,10 +195,10 @@ extern struct pr_usrreqs sctp_usrreqs; T> } T> T> #define sctp_sbfree(ctl, stcb, sb, m) { \ T> - SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_cc, SCTP_BUF_LEN((m))); \ T> + SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_ccc, SCTP_BUF_LEN((m))); \ T> SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_mbcnt, MSIZE); \ T> if (((ctl)->do_not_ref_stcb == 0) && stcb) {\ T> - SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_cc, SCTP_BUF_LEN((m))); \ T> + SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_ccc, SCTP_BUF_LEN((m))); \ T> SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \ T> } \ T> if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \ T> @@ -207,10 +207,10 @@ extern struct pr_usrreqs sctp_usrreqs; T> } T> T> #define sctp_sballoc(stcb, sb, m) { \ T> - atomic_add_int(&(sb)->sb_cc,SCTP_BUF_LEN((m))); \ T> + atomic_add_int(&(sb)->sb_ccc,SCTP_BUF_LEN((m))); \ T> atomic_add_int(&(sb)->sb_mbcnt, MSIZE); \ T> if (stcb) { \ T> - atomic_add_int(&(stcb)->asoc.sb_cc,SCTP_BUF_LEN((m))); \ T> + atomic_add_int(&(stcb)->asoc.sb_ccc,SCTP_BUF_LEN((m))); \ T> atomic_add_int(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \ T> } \ T> if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \ T> Index: sys/netinet/sctp_output.c T> =================================================================== T> --- sys/netinet/sctp_output.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_output.c (.../projects/sendfile) (revision 266807) T> @@ -7104,7 +7104,7 @@ one_more_time: T> if ((stcb->sctp_socket != NULL) && \ T> ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { T> - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc, sp->length); T> + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc, sp->length); T> } T> if (sp->data) { T> sctp_m_freem(sp->data); T> @@ -11382,7 +11382,7 @@ jump_out: T> drp->current_onq = htonl(asoc->size_on_reasm_queue + T> 
asoc->size_on_all_streams + T> asoc->my_rwnd_control_len + T> - stcb->sctp_socket->so_rcv.sb_cc); T> + stcb->sctp_socket->so_rcv.sb_ccc); T> } else { T> /*- T> * If my rwnd is 0, possibly from mbuf depletion as well as T> Index: sys/netinet/tcp_usrreq.c T> =================================================================== T> --- sys/netinet/tcp_usrreq.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_usrreq.c (.../projects/sendfile) (revision 266807) T> @@ -826,7 +826,7 @@ tcp_usr_send(struct socket *so, int flags, struct T> m_freem(control); /* empty control, just free it */ T> } T> if (!(flags & PRUS_OOB)) { T> - sbappendstream(&so->so_snd, m); T> + sbappendstream(&so->so_snd, m, flags); T> if (nam && tp->t_state < TCPS_SYN_SENT) { T> /* T> * Do implied connect if not yet connected, T> @@ -858,7 +858,8 @@ tcp_usr_send(struct socket *so, int flags, struct T> socantsendmore(so); T> tcp_usrclosed(tp); T> } T> - if (!(inp->inp_flags & INP_DROPPED)) { T> + if (!(inp->inp_flags & INP_DROPPED) && T> + !(flags & PRUS_NOTREADY)) { T> if (flags & PRUS_MORETOCOME) T> tp->t_flags |= TF_MORETOCOME; T> error = tcp_output(tp); T> @@ -884,7 +885,7 @@ tcp_usr_send(struct socket *so, int flags, struct T> * of data past the urgent section. T> * Otherwise, snd_up should be one lower. 
T> */ T> - sbappendstream_locked(&so->so_snd, m); T> + sbappendstream_locked(&so->so_snd, m, flags); T> SOCKBUF_UNLOCK(&so->so_snd); T> if (nam && tp->t_state < TCPS_SYN_SENT) { T> /* T> @@ -908,10 +909,12 @@ tcp_usr_send(struct socket *so, int flags, struct T> tp->snd_wnd = TTCP_CLIENT_SND_WND; T> tcp_mss(tp, -1); T> } T> - tp->snd_up = tp->snd_una + so->so_snd.sb_cc; T> - tp->t_flags |= TF_FORCEDATA; T> - error = tcp_output(tp); T> - tp->t_flags &= ~TF_FORCEDATA; T> + tp->snd_up = tp->snd_una + sbavail(&so->so_snd); T> + if (!(flags & PRUS_NOTREADY)) { T> + tp->t_flags |= TF_FORCEDATA; T> + error = tcp_output(tp); T> + tp->t_flags &= ~TF_FORCEDATA; T> + } T> } T> out: T> TCPDEBUG2((flags & PRUS_OOB) ? PRU_SENDOOB : T> Index: sys/netinet/accf_dns.c T> =================================================================== T> --- sys/netinet/accf_dns.c (.../head) (revision 266804) T> +++ sys/netinet/accf_dns.c (.../projects/sendfile) (revision 266807) T> @@ -75,7 +75,7 @@ sohasdns(struct socket *so, void *arg, int waitfla T> struct sockbuf *sb = &so->so_rcv; T> T> /* If the socket is full, we're ready. */ T> - if (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax) T> + if (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax) T> goto ready; T> T> /* Check to see if we have a request. 
*/ T> @@ -115,7 +115,7 @@ skippacket(struct sockbuf *sb) { T> unsigned long packlen; T> struct packet q, *p = &q; T> T> - if (sb->sb_cc < 2) T> + if (sbavail(sb) < 2) T> return DNS_WAIT; T> T> q.m = sb->sb_mb; T> @@ -122,7 +122,7 @@ skippacket(struct sockbuf *sb) { T> q.n = q.m->m_nextpkt; T> q.moff = 0; T> q.offset = 0; T> - q.len = sb->sb_cc; T> + q.len = sbavail(sb); T> T> GET16(p, packlen); T> if (packlen + 2 > q.len) T> Index: sys/netinet/sctputil.c T> =================================================================== T> --- sys/netinet/sctputil.c (.../head) (revision 266804) T> +++ sys/netinet/sctputil.c (.../projects/sendfile) (revision 266807) T> @@ -67,9 +67,9 @@ sctp_sblog(struct sockbuf *sb, struct sctp_tcb *st T> struct sctp_cwnd_log sctp_clog; T> T> sctp_clog.x.sb.stcb = stcb; T> - sctp_clog.x.sb.so_sbcc = sb->sb_cc; T> + sctp_clog.x.sb.so_sbcc = sb->sb_ccc; T> if (stcb) T> - sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_cc; T> + sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_ccc; T> else T> sctp_clog.x.sb.stcb_sbcc = 0; T> sctp_clog.x.sb.incr = incr; T> @@ -4356,7 +4356,7 @@ sctp_add_to_readq(struct sctp_inpcb *inp, T> { T> /* T> * Here we must place the control on the end of the socket read T> - * queue AND increment sb_cc so that select will work properly on T> + * queue AND increment sb_ccc so that select will work properly on T> * read. T> */ T> struct mbuf *m, *prev = NULL; T> @@ -4482,7 +4482,7 @@ sctp_append_to_readq(struct sctp_inpcb *inp, T> * the reassembly queue. T> * T> * If PDAPI this means we need to add m to the end of the data. T> - * Increase the length in the control AND increment the sb_cc. T> + * Increase the length in the control AND increment the sb_ccc. T> * Otherwise sb is NULL and all we need to do is put it at the end T> * of the mbuf chain. 
T> */ T> @@ -4694,10 +4694,10 @@ sctp_free_bufspace(struct sctp_tcb *stcb, struct s T> T> if (stcb->sctp_socket && (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) || T> ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)))) { T> - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { T> - stcb->sctp_socket->so_snd.sb_cc -= tp1->book_size; T> + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { T> + stcb->sctp_socket->so_snd.sb_ccc -= tp1->book_size; T> } else { T> - stcb->sctp_socket->so_snd.sb_cc = 0; T> + stcb->sctp_socket->so_snd.sb_ccc = 0; T> T> } T> } T> @@ -5232,11 +5232,11 @@ sctp_sorecvmsg(struct socket *so, T> in_eeor_mode = sctp_is_feature_on(inp, SCTP_PCB_FLAGS_EXPLICIT_EOR); T> if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { T> sctp_misc_ints(SCTP_SORECV_ENTER, T> - rwnd_req, in_eeor_mode, so->so_rcv.sb_cc, uio->uio_resid); T> + rwnd_req, in_eeor_mode, so->so_rcv.sb_ccc, uio->uio_resid); T> } T> if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { T> sctp_misc_ints(SCTP_SORECV_ENTERPL, T> - rwnd_req, block_allowed, so->so_rcv.sb_cc, uio->uio_resid); T> + rwnd_req, block_allowed, so->so_rcv.sb_ccc, uio->uio_resid); T> } T> error = sblock(&so->so_rcv, (block_allowed ? 
SBL_WAIT : 0)); T> if (error) { T> @@ -5255,7 +5255,7 @@ restart_nosblocks: T> (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) { T> goto out; T> } T> - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_cc == 0)) { T> + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_ccc == 0)) { T> if (so->so_error) { T> error = so->so_error; T> if ((in_flags & MSG_PEEK) == 0) T> @@ -5262,7 +5262,7 @@ restart_nosblocks: T> so->so_error = 0; T> goto out; T> } else { T> - if (so->so_rcv.sb_cc == 0) { T> + if (so->so_rcv.sb_ccc == 0) { T> /* indicate EOF */ T> error = 0; T> goto out; T> @@ -5269,9 +5269,9 @@ restart_nosblocks: T> } T> } T> } T> - if ((so->so_rcv.sb_cc <= held_length) && block_allowed) { T> + if ((so->so_rcv.sb_ccc <= held_length) && block_allowed) { T> /* we need to wait for data */ T> - if ((so->so_rcv.sb_cc == 0) && T> + if ((so->so_rcv.sb_ccc == 0) && T> ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || T> (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { T> if ((inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0) { T> @@ -5307,7 +5307,7 @@ restart_nosblocks: T> } T> held_length = 0; T> goto restart_nosblocks; T> - } else if (so->so_rcv.sb_cc == 0) { T> + } else if (so->so_rcv.sb_ccc == 0) { T> if (so->so_error) { T> error = so->so_error; T> if ((in_flags & MSG_PEEK) == 0) T> @@ -5364,11 +5364,11 @@ restart_nosblocks: T> SCTP_INP_READ_LOCK(inp); T> } T> control = TAILQ_FIRST(&inp->read_queue); T> - if ((control == NULL) && (so->so_rcv.sb_cc != 0)) { T> + if ((control == NULL) && (so->so_rcv.sb_ccc != 0)) { T> #ifdef INVARIANTS T> panic("Huh, its non zero and nothing on control?"); T> #endif T> - so->so_rcv.sb_cc = 0; T> + so->so_rcv.sb_ccc = 0; T> } T> SCTP_INP_READ_UNLOCK(inp); T> hold_rlock = 0; T> @@ -5489,11 +5489,11 @@ restart_nosblocks: T> } T> /* T> * if we reach here, not suitable replacement is available T> - * fragment interleave is NOT on. So stuff the sb_cc T> + * fragment interleave is NOT on. 
So stuff the sb_ccc T> * into the our held count, and its time to sleep again. T> */ T> - held_length = so->so_rcv.sb_cc; T> - control->held_length = so->so_rcv.sb_cc; T> + held_length = so->so_rcv.sb_ccc; T> + control->held_length = so->so_rcv.sb_ccc; T> goto restart; T> } T> /* Clear the held length since there is something to read */ T> @@ -5790,10 +5790,10 @@ get_more_data: T> if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_SB_LOGGING_ENABLE) { T> sctp_sblog(&so->so_rcv, control->do_not_ref_stcb ? NULL : stcb, SCTP_LOG_SBFREE, cp_len); T> } T> - atomic_subtract_int(&so->so_rcv.sb_cc, cp_len); T> + atomic_subtract_int(&so->so_rcv.sb_ccc, cp_len); T> if ((control->do_not_ref_stcb == 0) && T> stcb) { T> - atomic_subtract_int(&stcb->asoc.sb_cc, cp_len); T> + atomic_subtract_int(&stcb->asoc.sb_ccc, cp_len); T> } T> copied_so_far += cp_len; T> freed_so_far += cp_len; T> @@ -5938,7 +5938,7 @@ wait_some_more: T> (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE))) { T> goto release; T> } T> - if (so->so_rcv.sb_cc <= control->held_length) { T> + if (so->so_rcv.sb_ccc <= control->held_length) { T> error = sbwait(&so->so_rcv); T> if (error) { T> goto release; T> @@ -5965,8 +5965,8 @@ wait_some_more: T> } T> goto done_with_control; T> } T> - if (so->so_rcv.sb_cc > held_length) { T> - control->held_length = so->so_rcv.sb_cc; T> + if (so->so_rcv.sb_ccc > held_length) { T> + control->held_length = so->so_rcv.sb_ccc; T> held_length = 0; T> } T> goto wait_some_more; T> @@ -6113,13 +6113,13 @@ out: T> freed_so_far, T> ((uio) ? (slen - uio->uio_resid) : slen), T> stcb->asoc.my_rwnd, T> - so->so_rcv.sb_cc); T> + so->so_rcv.sb_ccc); T> } else { T> sctp_misc_ints(SCTP_SORECV_DONE, T> freed_so_far, T> ((uio) ? 
(slen - uio->uio_resid) : slen), T> 0, T> - so->so_rcv.sb_cc); T> + so->so_rcv.sb_ccc); T> } T> } T> stage_left: T> Index: sys/netinet/sctputil.h T> =================================================================== T> --- sys/netinet/sctputil.h (.../head) (revision 266804) T> +++ sys/netinet/sctputil.h (.../projects/sendfile) (revision 266807) T> @@ -284,10 +284,10 @@ do { \ T> } \ T> if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ T> - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { \ T> - atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_cc), tp1->book_size); \ T> + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { \ T> + atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_ccc), tp1->book_size); \ T> } else { \ T> - stcb->sctp_socket->so_snd.sb_cc = 0; \ T> + stcb->sctp_socket->so_snd.sb_ccc = 0; \ T> } \ T> } \ T> } \ T> @@ -305,10 +305,10 @@ do { \ T> } \ T> if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ T> - if (stcb->sctp_socket->so_snd.sb_cc >= sp->length) { \ T> - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc,sp->length); \ T> + if (stcb->sctp_socket->so_snd.sb_ccc >= sp->length) { \ T> + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc,sp->length); \ T> } else { \ T> - stcb->sctp_socket->so_snd.sb_cc = 0; \ T> + stcb->sctp_socket->so_snd.sb_ccc = 0; \ T> } \ T> } \ T> } \ T> @@ -320,7 +320,7 @@ do { \ T> if ((stcb->sctp_socket != NULL) && \ T> ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ T> - atomic_add_int(&stcb->sctp_socket->so_snd.sb_cc,sz); \ T> + atomic_add_int(&stcb->sctp_socket->so_snd.sb_ccc,sz); \ T> } \ T> } while (0) T> T> Index: usr.bin/bluetooth/btsockstat/btsockstat.c T> 
=================================================================== T> --- usr.bin/bluetooth/btsockstat/btsockstat.c (.../head) (revision 266804) T> +++ usr.bin/bluetooth/btsockstat/btsockstat.c (.../projects/sendfile) (revision 266807) T> @@ -255,8 +255,8 @@ hcirawpr(kvm_t *kvmd, u_long addr) T> (unsigned long) pcb.so, T> (unsigned long) this, T> pcb.flags, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> pcb.addr.hci_node); T> } T> } /* hcirawpr */ T> @@ -303,8 +303,8 @@ l2caprawpr(kvm_t *kvmd, u_long addr) T> "%-8lx %-8lx %6d %6d %-17.17s\n", T> (unsigned long) pcb.so, T> (unsigned long) this, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> bdaddrpr(&pcb.src, NULL, 0)); T> } T> } /* l2caprawpr */ T> @@ -361,8 +361,8 @@ l2cappr(kvm_t *kvmd, u_long addr) T> fprintf(stdout, T> "%-8lx %6d %6d %-17.17s/%-5d %-17.17s %-5d %s\n", T> (unsigned long) this, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> bdaddrpr(&pcb.src, local, sizeof(local)), T> pcb.psm, T> bdaddrpr(&pcb.dst, remote, sizeof(remote)), T> @@ -467,8 +467,8 @@ rfcommpr(kvm_t *kvmd, u_long addr) T> fprintf(stdout, T> "%-8lx %6d %6d %-17.17s %-17.17s %-4d %-4d %s\n", T> (unsigned long) this, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> bdaddrpr(&pcb.src, local, sizeof(local)), T> bdaddrpr(&pcb.dst, remote, sizeof(remote)), T> pcb.channel, T> Index: usr.bin/systat/netstat.c T> =================================================================== T> --- usr.bin/systat/netstat.c (.../head) (revision 266804) T> +++ usr.bin/systat/netstat.c (.../projects/sendfile) (revision 266807) T> @@ -333,8 +333,8 @@ enter_kvm(struct inpcb *inp, struct socket *so, in T> struct netinfo *p; T> T> if ((p = enter(inp, state, proto)) != NULL) { T> - p->ni_rcvcc = so->so_rcv.sb_cc; T> - p->ni_sndcc = so->so_snd.sb_cc; T> + p->ni_rcvcc = so->so_rcv.sb_ccc; 
T> + p->ni_sndcc = so->so_snd.sb_ccc; T> } T> } T> T> Index: usr.bin/netstat/netgraph.c T> =================================================================== T> --- usr.bin/netstat/netgraph.c (.../head) (revision 266804) T> +++ usr.bin/netstat/netgraph.c (.../projects/sendfile) (revision 266807) T> @@ -119,7 +119,7 @@ netgraphprotopr(u_long off, const char *name, int T> if (Aflag) T> printf("%8lx ", (u_long) this); T> printf("%-5.5s %6u %6u ", T> - name, sockb.so_rcv.sb_cc, sockb.so_snd.sb_cc); T> + name, sockb.so_rcv.sb_ccc, sockb.so_snd.sb_ccc); T> T> /* Get info on associated node */ T> if (ngpcb.node_id == 0 || csock == -1) T> Index: usr.bin/netstat/unix.c T> =================================================================== T> --- usr.bin/netstat/unix.c (.../head) (revision 266804) T> +++ usr.bin/netstat/unix.c (.../projects/sendfile) (revision 266807) T> @@ -287,7 +287,8 @@ unixdomainpr(struct xunpcb *xunp, struct xsocket * T> } else { T> printf("%8lx %-6.6s %6u %6u %8lx %8lx %8lx %8lx", T> (long)so->so_pcb, socktype[so->so_type], so->so_rcv.sb_cc, T> - so->so_snd.sb_cc, (long)unp->unp_vnode, (long)unp->unp_conn, T> + so->so_snd.sb_cc, (long)unp->unp_vnode, T> + (long)unp->unp_conn, T> (long)LIST_FIRST(&unp->unp_refs), T> (long)LIST_NEXT(unp, unp_reflink)); T> } T> Index: usr.bin/netstat/inet.c T> =================================================================== T> --- usr.bin/netstat/inet.c (.../head) (revision 266804) T> +++ usr.bin/netstat/inet.c (.../projects/sendfile) (revision 266807) T> @@ -137,7 +137,7 @@ pcblist_sysctl(int proto, const char *name, char * T> static void T> sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) T> { T> - xsb->sb_cc = sb->sb_cc; T> + xsb->sb_cc = sb->sb_ccc; T> xsb->sb_hiwat = sb->sb_hiwat; T> xsb->sb_mbcnt = sb->sb_mbcnt; T> xsb->sb_mcnt = sb->sb_mcnt; T> @@ -479,7 +479,8 @@ protopr(u_long off, const char *name, int af1, int T> printf("%6u %6u %6u ", tp->t_sndrexmitpack, T> tp->t_rcvoopack, tp->t_sndzerowin); T> } 
else { T> - printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc); T> + printf("%6u %6u ", T> + so->so_rcv.sb_cc, so->so_snd.sb_cc); T> } T> if (numeric_port) { T> if (inp->inp_vflag & INP_IPV4) { T> _______________________________________________ T> freebsd-arch@freebsd.org mailing list T> http://lists.freebsd.org/mailman/listinfo/freebsd-arch T> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -- Totus tuus, Glebius. --hTiIB9CRvBOLTyqY Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="sendfile.diff" Index: sys/sys/vnode.h =================================================================== --- sys/sys/vnode.h (.../head) (revision 270879) +++ sys/sys/vnode.h (.../projects/sendfile) (revision 270881) @@ -727,6 +727,7 @@ int vop_stdbmap(struct vop_bmap_args *); int vop_stdfsync(struct vop_fsync_args *); int vop_stdgetwritemount(struct vop_getwritemount_args *); int vop_stdgetpages(struct vop_getpages_args *); +int vop_stdgetpages_async(struct vop_getpages_async_args *); int vop_stdinactive(struct vop_inactive_args *); int vop_stdislocked(struct vop_islocked_args *); int vop_stdkqfilter(struct vop_kqfilter_args *); Index: sys/sys/socket.h =================================================================== --- sys/sys/socket.h (.../head) (revision 270879) +++ sys/sys/socket.h (.../projects/sendfile) (revision 270881) @@ -602,12 +602,15 @@ struct sf_hdtr_all { * Sendfile-specific flag(s) */ #define SF_NODISKIO 0x00000001 -#define SF_MNOWAIT 0x00000002 +#define SF_MNOWAIT 0x00000002 /* unused since 11.0 */ #define SF_SYNC 0x00000004 #define SF_KQUEUE 0x00000008 +#define SF_NOCACHE 0x00000010 +#define SF_FLAGS(rh, flags) (((rh) << 16) | (flags)) #ifdef _KERNEL #define SFK_COMPAT 0x00000001 +#define SF_READAHEAD(flags) ((flags) >> 16) #endif /* _KERNEL */ #endif /* __BSD_VISIBLE */ Index: sys/sys/sockbuf.h =================================================================== --- sys/sys/sockbuf.h (.../head) 
(revision 270879) +++ sys/sys/sockbuf.h (.../projects/sendfile) (revision 270881) @@ -89,8 +89,13 @@ struct sockbuf { struct mbuf *sb_lastrecord; /* (c/d) first mbuf of last * record in socket buffer */ struct mbuf *sb_sndptr; /* (c/d) pointer into mbuf chain */ + struct mbuf *sb_fnrdy; /* (c/d) pointer to first not ready buffer */ +#if 0 + struct mbuf *sb_lnrdy; /* (c/d) pointer to last not ready buffer */ +#endif u_int sb_sndptroff; /* (c/d) byte offset of ptr into chain */ - u_int sb_cc; /* (c/d) actual chars in buffer */ + u_int sb_acc; /* (c/d) available chars in buffer */ + u_int sb_ccc; /* (c/d) claimed chars in buffer */ u_int sb_hiwat; /* (c/d) max actual char count */ u_int sb_mbcnt; /* (c/d) chars of mbufs used */ u_int sb_mcnt; /* (c/d) number of mbufs in buffer */ @@ -120,10 +125,17 @@ struct sockbuf { #define SOCKBUF_LOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED) #define SOCKBUF_UNLOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_NOTOWNED) +/* + * Socket buffer private mbuf(9) flags. 
+ */ +#define M_NOTREADY M_PROTO1 /* m_data not populated yet */ +#define M_BLOCKED M_PROTO2 /* M_NOTREADY in front of m */ +#define M_NOTAVAIL (M_NOTREADY | M_BLOCKED) + void sbappend(struct sockbuf *sb, struct mbuf *m); void sbappend_locked(struct sockbuf *sb, struct mbuf *m); -void sbappendstream(struct sockbuf *sb, struct mbuf *m); -void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m); +void sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags); +void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags); int sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa, struct mbuf *m0, struct mbuf *control); int sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa, @@ -136,7 +148,6 @@ int sbappendcontrol_locked(struct sockbuf *sb, str struct mbuf *control); void sbappendrecord(struct sockbuf *sb, struct mbuf *m0); void sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0); -void sbcheck(struct sockbuf *sb); void sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n); struct mbuf * sbcreatecontrol(caddr_t p, int size, int type, int level); @@ -162,59 +173,54 @@ void sbtoxsockbuf(struct sockbuf *sb, struct xsock int sbwait(struct sockbuf *sb); int sblock(struct sockbuf *sb, int flags); void sbunlock(struct sockbuf *sb); +void sballoc(struct sockbuf *, struct mbuf *); +void sbfree(struct sockbuf *, struct mbuf *); +void sbmtrim(struct sockbuf *, struct mbuf *, int); +int sbready(struct sockbuf *, struct mbuf *, int); +static inline u_int +sbavail(struct sockbuf *sb) +{ + +#if 0 + SOCKBUF_LOCK_ASSERT(sb); +#endif + return (sb->sb_acc); +} + +static inline u_int +sbused(struct sockbuf *sb) +{ + +#if 0 + SOCKBUF_LOCK_ASSERT(sb); +#endif + return (sb->sb_ccc); +} + /* * How much space is there in a socket buffer (so->so_snd or so->so_rcv)? * This is problematical if the fields are unsigned, as the space might - * still be negative (cc > hiwat or mbcnt > mbmax). Should detect - * overflow and return 0. 
Should use "lmin" but it doesn't exist now. + * still be negative (ccc > hiwat or mbcnt > mbmax). */ -static __inline -long +static inline long sbspace(struct sockbuf *sb) { - long bleft; - long mleft; + long bleft, mleft; +#if 0 + SOCKBUF_LOCK_ASSERT(sb); +#endif + if (sb->sb_flags & SB_STOP) return(0); - bleft = sb->sb_hiwat - sb->sb_cc; + + bleft = sb->sb_hiwat - sb->sb_ccc; mleft = sb->sb_mbmax - sb->sb_mbcnt; - return((bleft < mleft) ? bleft : mleft); -} -/* adjust counters in sb reflecting allocation of m */ -#define sballoc(sb, m) { \ - (sb)->sb_cc += (m)->m_len; \ - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ - (sb)->sb_ctl += (m)->m_len; \ - (sb)->sb_mbcnt += MSIZE; \ - (sb)->sb_mcnt += 1; \ - if ((m)->m_flags & M_EXT) { \ - (sb)->sb_mbcnt += (m)->m_ext.ext_size; \ - (sb)->sb_ccnt += 1; \ - } \ + return ((bleft < mleft) ? bleft : mleft); } -/* adjust counters in sb reflecting freeing of m */ -#define sbfree(sb, m) { \ - (sb)->sb_cc -= (m)->m_len; \ - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ - (sb)->sb_ctl -= (m)->m_len; \ - (sb)->sb_mbcnt -= MSIZE; \ - (sb)->sb_mcnt -= 1; \ - if ((m)->m_flags & M_EXT) { \ - (sb)->sb_mbcnt -= (m)->m_ext.ext_size; \ - (sb)->sb_ccnt -= 1; \ - } \ - if ((sb)->sb_sndptr == (m)) { \ - (sb)->sb_sndptr = NULL; \ - (sb)->sb_sndptroff = 0; \ - } \ - if ((sb)->sb_sndptroff != 0) \ - (sb)->sb_sndptroff -= (m)->m_len; \ -} - #define SB_EMPTY_FIXUP(sb) do { \ if ((sb)->sb_mb == NULL) { \ (sb)->sb_mbtail = NULL; \ @@ -224,13 +230,15 @@ sbspace(struct sockbuf *sb) #ifdef SOCKBUF_DEBUG void sblastrecordchk(struct sockbuf *, const char *, int); +void sblastmbufchk(struct sockbuf *, const char *, int); +void sbcheck(struct sockbuf *, const char *, int); #define SBLASTRECORDCHK(sb) sblastrecordchk((sb), __FILE__, __LINE__) - -void sblastmbufchk(struct sockbuf *, const char *, int); #define SBLASTMBUFCHK(sb) sblastmbufchk((sb), __FILE__, __LINE__) +#define SBCHECK(sb) sbcheck((sb), __FILE__, __LINE__) #else 
-#define SBLASTRECORDCHK(sb) /* nothing */ -#define SBLASTMBUFCHK(sb) /* nothing */ +#define SBLASTRECORDCHK(sb) do {} while (0) +#define SBLASTMBUFCHK(sb) do {} while (0) +#define SBCHECK(sb) do {} while (0) #endif /* SOCKBUF_DEBUG */ #endif /* _KERNEL */ Index: sys/sys/protosw.h =================================================================== --- sys/sys/protosw.h (.../head) (revision 270879) +++ sys/sys/protosw.h (.../projects/sendfile) (revision 270881) @@ -208,6 +208,8 @@ struct pr_usrreqs { #define PRUS_OOB 0x1 #define PRUS_EOF 0x2 #define PRUS_MORETOCOME 0x4 +#define PRUS_NOTREADY 0x8 + int (*pru_ready)(struct socket *so, struct mbuf *m, int count); int (*pru_sense)(struct socket *so, struct stat *sb); int (*pru_shutdown)(struct socket *so); int (*pru_flush)(struct socket *so, int direction); @@ -251,6 +253,7 @@ int pru_rcvd_notsupp(struct socket *so, int flags) int pru_rcvoob_notsupp(struct socket *so, struct mbuf *m, int flags); int pru_send_notsupp(struct socket *so, int flags, struct mbuf *m, struct sockaddr *addr, struct mbuf *control, struct thread *td); +int pru_ready_notsupp(struct socket *so, struct mbuf *m, int count); int pru_sense_null(struct socket *so, struct stat *sb); int pru_shutdown_notsupp(struct socket *so); int pru_sockaddr_notsupp(struct socket *so, struct sockaddr **nam); Index: sys/sys/mbuf.h =================================================================== --- sys/sys/mbuf.h (.../head) (revision 270879) +++ sys/sys/mbuf.h (.../projects/sendfile) (revision 270881) @@ -330,12 +330,13 @@ struct mbuf { * External mbuf storage buffer types. 
*/ #define EXT_CLUSTER 1 /* mbuf cluster */ -#define EXT_SFBUF 2 /* sendfile(2)'s sf_bufs */ +#define EXT_SFBUF 2 /* sendfile(2)'s sf_buf */ #define EXT_JUMBOP 3 /* jumbo cluster 4096 bytes */ #define EXT_JUMBO9 4 /* jumbo cluster 9216 bytes */ #define EXT_JUMBO16 5 /* jumbo cluster 16184 bytes */ #define EXT_PACKET 6 /* mbuf+cluster from packet zone */ #define EXT_MBUF 7 /* external mbuf reference (M_IOVEC) */ +#define EXT_SFBUF_NOCACHE 8 /* sendfile(2)'s sf_buf not to be cached */ #define EXT_VENDOR1 224 /* for vendor-internal use */ #define EXT_VENDOR2 225 /* for vendor-internal use */ @@ -384,6 +385,7 @@ struct mbuf { */ void sf_ext_ref(void *, void *); void sf_ext_free(void *, void *); +void sf_ext_free_nocache(void *, void *); /* * Flags indicating checksum, segmentation and other offload work to be @@ -929,7 +931,7 @@ struct mbuf *m_copypacket(struct mbuf *, int); void m_copy_pkthdr(struct mbuf *, struct mbuf *); struct mbuf *m_copyup(struct mbuf *, int, int); struct mbuf *m_defrag(struct mbuf *, int); -void m_demote(struct mbuf *, int); +void m_demote(struct mbuf *, int, int); struct mbuf *m_devget(char *, int, int, struct ifnet *, void (*)(char *, caddr_t, u_int)); struct mbuf *m_dup(struct mbuf *, int); Index: sys/sys/socketvar.h =================================================================== --- sys/sys/socketvar.h (.../head) (revision 270879) +++ sys/sys/socketvar.h (.../projects/sendfile) (revision 270881) @@ -207,7 +207,7 @@ struct xsocket { /* can we read something from so? 
*/ #define soreadabledata(so) \ - ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \ + (sbavail(&(so)->so_rcv) >= (so)->so_rcv.sb_lowat || \ !TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error) #define soreadable(so) \ (soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE)) Index: sys/rpc/svc_vc.c =================================================================== --- sys/rpc/svc_vc.c (.../head) (revision 270879) +++ sys/rpc/svc_vc.c (.../projects/sendfile) (revision 270881) @@ -546,7 +546,7 @@ svc_vc_ack(SVCXPRT *xprt, uint32_t *ack) { *ack = atomic_load_acq_32(&xprt->xp_snt_cnt); - *ack -= xprt->xp_socket->so_snd.sb_cc; + *ack -= sbused(&xprt->xp_socket->so_snd); return (TRUE); } Index: sys/rpc/clnt_vc.c =================================================================== --- sys/rpc/clnt_vc.c (.../head) (revision 270879) +++ sys/rpc/clnt_vc.c (.../projects/sendfile) (revision 270881) @@ -860,7 +860,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int * error condition */ do_read = FALSE; - if (so->so_rcv.sb_cc >= sizeof(uint32_t) + if (sbavail(&so->so_rcv) >= sizeof(uint32_t) || (so->so_rcv.sb_state & SBS_CANTRCVMORE) || so->so_error) do_read = TRUE; @@ -913,7 +913,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int * buffered. 
*/ do_read = FALSE; - if (so->so_rcv.sb_cc >= ct->ct_record_resid + if (sbavail(&so->so_rcv) >= ct->ct_record_resid || (so->so_rcv.sb_state & SBS_CANTRCVMORE) || so->so_error) do_read = TRUE; Index: sys/ufs/ffs/ffs_vnops.c =================================================================== --- sys/ufs/ffs/ffs_vnops.c (.../head) (revision 270879) +++ sys/ufs/ffs/ffs_vnops.c (.../projects/sendfile) (revision 270881) @@ -105,6 +105,7 @@ extern int ffs_rawread(struct vnode *vp, struct ui static vop_fsync_t ffs_fsync; static vop_lock1_t ffs_lock; static vop_getpages_t ffs_getpages; +static vop_getpages_async_t ffs_getpages_async; static vop_read_t ffs_read; static vop_write_t ffs_write; static int ffs_extread(struct vnode *vp, struct uio *uio, int ioflag); @@ -125,6 +126,7 @@ struct vop_vector ffs_vnodeops1 = { .vop_default = &ufs_vnodeops, .vop_fsync = ffs_fsync, .vop_getpages = ffs_getpages, + .vop_getpages_async = ffs_getpages_async, .vop_lock1 = ffs_lock, .vop_read = ffs_read, .vop_reallocblks = ffs_reallocblks, @@ -847,18 +849,16 @@ ffs_write(ap) } /* - * get page routine + * Get page routines. 
*/ static int -ffs_getpages(ap) - struct vop_getpages_args *ap; +ffs_getpages_checkvalid(vm_page_t *m, int count, int reqpage) { - int i; vm_page_t mreq; int pcount; - pcount = round_page(ap->a_count) / PAGE_SIZE; - mreq = ap->a_m[ap->a_reqpage]; + pcount = round_page(count) / PAGE_SIZE; + mreq = m[reqpage]; /* * if ANY DEV_BSIZE blocks are valid on a large filesystem block, @@ -870,24 +870,48 @@ static int if (mreq->valid) { if (mreq->valid != VM_PAGE_BITS_ALL) vm_page_zero_invalid(mreq, TRUE); - for (i = 0; i < pcount; i++) { - if (i != ap->a_reqpage) { - vm_page_lock(ap->a_m[i]); - vm_page_free(ap->a_m[i]); - vm_page_unlock(ap->a_m[i]); + for (int i = 0; i < pcount; i++) { + if (i != reqpage) { + vm_page_lock(m[i]); + vm_page_free(m[i]); + vm_page_unlock(m[i]); } } VM_OBJECT_WUNLOCK(mreq->object); - return VM_PAGER_OK; + return (VM_PAGER_OK); } VM_OBJECT_WUNLOCK(mreq->object); - return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, - ap->a_count, - ap->a_reqpage); + return (-1); } +static int +ffs_getpages(struct vop_getpages_args *ap) +{ + int rv; + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); + if (rv == VM_PAGER_OK) + return (rv); + + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + ap->a_reqpage, NULL, NULL)); +} + +static int +ffs_getpages_async(struct vop_getpages_async_args *ap) +{ + int rv; + + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); + if (rv == VM_PAGER_OK) { + (ap->a_vop_getpages_iodone)(ap->a_arg); + return (rv); + } + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg)); +} + /* * Extended attribute area reading. 
*/ Index: sys/kern/uipc_domain.c =================================================================== --- sys/kern/uipc_domain.c (.../head) (revision 270879) +++ sys/kern/uipc_domain.c (.../projects/sendfile) (revision 270881) @@ -152,6 +152,7 @@ protosw_init(struct protosw *pr) DEFAULT(pu->pru_sosend, sosend_generic); DEFAULT(pu->pru_soreceive, soreceive_generic); DEFAULT(pu->pru_sopoll, sopoll_generic); + DEFAULT(pu->pru_ready, pru_ready_notsupp); #undef DEFAULT if (pr->pr_init) (*pr->pr_init)(); Index: sys/kern/vnode_if.src =================================================================== --- sys/kern/vnode_if.src (.../head) (revision 270879) +++ sys/kern/vnode_if.src (.../projects/sendfile) (revision 270881) @@ -477,6 +477,19 @@ vop_getpages { }; +%% getpages_async vp L L L + +vop_getpages_async { + IN struct vnode *vp; + IN vm_page_t *m; + IN int count; + IN int reqpage; + IN vm_ooffset_t offset; + IN void (*vop_getpages_iodone)(void *); + IN void *arg; +}; + + %% putpages vp L L L vop_putpages { Index: sys/kern/uipc_sockbuf.c =================================================================== --- sys/kern/uipc_sockbuf.c (.../head) (revision 270879) +++ sys/kern/uipc_sockbuf.c (.../projects/sendfile) (revision 270881) @@ -68,7 +68,145 @@ static u_long sb_efficiency = 8; /* parameter for static struct mbuf *sbcut_internal(struct sockbuf *sb, int len); static void sbflush_internal(struct sockbuf *sb); +static void +sb_shift_nrdy(struct sockbuf *sb, struct mbuf *m) +{ + +#if 0 /* XXX: not yet: soclose() call path comes here w/o lock. 
*/ + SOCKBUF_LOCK_ASSERT(sb); +#endif + KASSERT(m->m_flags & M_NOTREADY, ("%s: m %p !M_NOTREADY", __func__, m)); + + m = m->m_next; + while (m != NULL && !(m->m_flags & M_NOTREADY)) { + m->m_flags &= ~M_BLOCKED; + sb->sb_acc += m->m_len; + m = m->m_next; + } + + sb->sb_fnrdy = m; +} + +int +sbready(struct sockbuf *sb, struct mbuf *m, int count) +{ + u_int blocker; + + SOCKBUF_LOCK_ASSERT(sb); + + KASSERT(sb->sb_fnrdy != NULL, ("%s: sb %p NULL fnrdy", __func__, sb)); + + blocker = (sb->sb_fnrdy == m) ? M_BLOCKED : 0; + + for (int i = 0; i < count; i++, m = m->m_next) { + KASSERT(m->m_flags & M_NOTREADY, + ("%s: m %p !M_NOTREADY", __func__, m)); + m->m_flags &= ~(M_NOTREADY | blocker); + if (blocker) + sb->sb_acc += m->m_len; + } + + if (!blocker) + return (EINPROGRESS); + + /* This one was blocking all the queue. */ + for (; m && (m->m_flags & M_NOTREADY) == 0; m = m->m_next) { + KASSERT(m->m_flags & M_BLOCKED, + ("%s: m %p !M_BLOCKED", __func__, m)); + m->m_flags &= ~M_BLOCKED; + sb->sb_acc += m->m_len; + } + + sb->sb_fnrdy = m; + + return (0); +} + /* + * Adjust sockbuf state reflecting allocation of m. + */ +void +sballoc(struct sockbuf *sb, struct mbuf *m) +{ + + SOCKBUF_LOCK_ASSERT(sb); + + sb->sb_ccc += m->m_len; + + if (sb->sb_fnrdy == NULL) { + if (m->m_flags & M_NOTREADY) + sb->sb_fnrdy = m; + else + sb->sb_acc += m->m_len; + } else + m->m_flags |= M_BLOCKED; + + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) + sb->sb_ctl += m->m_len; + + sb->sb_mbcnt += MSIZE; + sb->sb_mcnt += 1; + + if (m->m_flags & M_EXT) { + sb->sb_mbcnt += m->m_ext.ext_size; + sb->sb_ccnt += 1; + } +} + +/* + * Adjust sockbuf state reflecting freeing of m. + */ +void +sbfree(struct sockbuf *sb, struct mbuf *m) +{ + +#if 0 /* XXX: not yet: soclose() call path comes here w/o lock. 
*/ + SOCKBUF_LOCK_ASSERT(sb); +#endif + + sb->sb_ccc -= m->m_len; + + if (!(m->m_flags & M_NOTAVAIL)) + sb->sb_acc -= m->m_len; + + if (sb->sb_fnrdy == m) + sb_shift_nrdy(sb, m); + + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) + sb->sb_ctl -= m->m_len; + + sb->sb_mbcnt -= MSIZE; + sb->sb_mcnt -= 1; + if (m->m_flags & M_EXT) { + sb->sb_mbcnt -= m->m_ext.ext_size; + sb->sb_ccnt -= 1; + } + + if (sb->sb_sndptr == m) { + sb->sb_sndptr = NULL; + sb->sb_sndptroff = 0; + } + if (sb->sb_sndptroff != 0) + sb->sb_sndptroff -= m->m_len; +} + +/* + * Trim some amount of data from (first?) mbuf in buffer. + */ +void +sbmtrim(struct sockbuf *sb, struct mbuf *m, int len) +{ + + SOCKBUF_LOCK_ASSERT(sb); + KASSERT(len < m->m_len, ("%s: m %p len %d", __func__, m, len)); + + m->m_data += len; + m->m_len -= len; + sb->sb_acc -= len; + sb->sb_ccc -= len; +} + +/* * Socantsendmore indicates that no more data will be sent on the socket; it * would normally be applied to a socket when the user informs the system * that no more data is to be sent, by the protocol code (in case @@ -127,7 +265,7 @@ sbwait(struct sockbuf *sb) SOCKBUF_LOCK_ASSERT(sb); sb->sb_flags |= SB_WAIT; - return (msleep_sbt(&sb->sb_cc, &sb->sb_mtx, + return (msleep_sbt(&sb->sb_acc, &sb->sb_mtx, (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait", sb->sb_timeo, 0, 0)); } @@ -184,7 +322,7 @@ sowakeup(struct socket *so, struct sockbuf *sb) sb->sb_flags &= ~SB_SEL; if (sb->sb_flags & SB_WAIT) { sb->sb_flags &= ~SB_WAIT; - wakeup(&sb->sb_cc); + wakeup(&sb->sb_acc); } KNOTE_LOCKED(&sb->sb_sel.si_note, 0); if (sb->sb_upcall != NULL) { @@ -519,7 +657,7 @@ sbappend(struct sockbuf *sb, struct mbuf *m) * that is, a stream protocol (such as TCP). 
*/ void -sbappendstream_locked(struct sockbuf *sb, struct mbuf *m) +sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags) { SOCKBUF_LOCK_ASSERT(sb); @@ -529,8 +667,8 @@ void SBLASTMBUFCHK(sb); /* Remove all packet headers and mbuf tags to get a pure data chain. */ - m_demote(m, 1); - + m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0); + sbcompress(sb, m, sb->sb_mbtail); sb->sb_lastrecord = sb->sb_mb; @@ -543,38 +681,59 @@ void * that is, a stream protocol (such as TCP). */ void -sbappendstream(struct sockbuf *sb, struct mbuf *m) +sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags) { SOCKBUF_LOCK(sb); - sbappendstream_locked(sb, m); + sbappendstream_locked(sb, m, flags); SOCKBUF_UNLOCK(sb); } #ifdef SOCKBUF_DEBUG void -sbcheck(struct sockbuf *sb) +sbcheck(struct sockbuf *sb, const char *file, int line) { - struct mbuf *m; - struct mbuf *n = 0; - u_long len = 0, mbcnt = 0; + struct mbuf *m, *n, *fnrdy; + u_long acc, ccc, mbcnt; SOCKBUF_LOCK_ASSERT(sb); + acc = ccc = mbcnt = 0; + fnrdy = NULL; + for (m = sb->sb_mb; m; m = n) { n = m->m_nextpkt; for (; m; m = m->m_next) { - len += m->m_len; + if ((m->m_flags & M_NOTREADY) && fnrdy == NULL) { + if (m != sb->sb_fnrdy) { + printf("sb %p: fnrdy %p != m %p\n", + sb, sb->sb_fnrdy, m); + goto fail; + } + fnrdy = m; + } + if (fnrdy) { + if (!(m->m_flags & M_NOTAVAIL)) { + printf("sb %p: fnrdy %p, m %p is avail\n", + sb, sb->sb_fnrdy, m); + goto fail; + } + } else + acc += m->m_len; + ccc += m->m_len; mbcnt += MSIZE; if (m->m_flags & M_EXT) /*XXX*/ /* pretty sure this is bogus */ mbcnt += m->m_ext.ext_size; } } - if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) { - printf("cc %ld != %u || mbcnt %ld != %u\n", len, sb->sb_cc, - mbcnt, sb->sb_mbcnt); - panic("sbcheck"); + if (acc != sb->sb_acc || ccc != sb->sb_ccc || mbcnt != sb->sb_mbcnt) { + printf("acc %ld/%u ccc %ld/%u mbcnt %ld/%u\n", + acc, sb->sb_acc, ccc, sb->sb_ccc, mbcnt, sb->sb_mbcnt); + goto fail; } + return; +fail: + panic("%s from %s:%u", 
__func__, file, line); } #endif @@ -800,6 +959,7 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str if (n && (n->m_flags & M_EOR) == 0 && M_WRITABLE(n) && ((sb->sb_flags & SB_NOCOALESCE) == 0) && + !(m->m_flags & M_NOTREADY) && m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */ m->m_len <= M_TRAILINGSPACE(n) && n->m_type == m->m_type) { @@ -806,7 +966,9 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len, (unsigned)m->m_len); n->m_len += m->m_len; - sb->sb_cc += m->m_len; + sb->sb_ccc += m->m_len; + if (sb->sb_fnrdy == NULL) + sb->sb_acc += m->m_len; if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) /* XXX: Probably don't need.*/ sb->sb_ctl += m->m_len; @@ -843,13 +1005,13 @@ sbflush_internal(struct sockbuf *sb) * Don't call sbcut(sb, 0) if the leading mbuf is non-empty: * we would loop forever. Panic instead. */ - if (!sb->sb_cc && (sb->sb_mb == NULL || sb->sb_mb->m_len)) + if (sb->sb_ccc == 0 && (sb->sb_mb == NULL || sb->sb_mb->m_len)) break; - m_freem(sbcut_internal(sb, (int)sb->sb_cc)); + m_freem(sbcut_internal(sb, (int)sb->sb_ccc)); } - if (sb->sb_cc || sb->sb_mb || sb->sb_mbcnt) - panic("sbflush_internal: cc %u || mb %p || mbcnt %u", - sb->sb_cc, (void *)sb->sb_mb, sb->sb_mbcnt); + KASSERT(sb->sb_ccc == 0 && sb->sb_mb == 0 && sb->sb_mbcnt == 0, + ("%s: ccc %u mb %p mbcnt %u", __func__, + sb->sb_ccc, (void *)sb->sb_mb, sb->sb_mbcnt)); } void @@ -891,7 +1053,9 @@ sbcut_internal(struct sockbuf *sb, int len) if (m->m_len > len) { m->m_len -= len; m->m_data += len; - sb->sb_cc -= len; + sb->sb_ccc -= len; + if (!(m->m_flags & M_NOTAVAIL)) + sb->sb_acc -= len; if (sb->sb_sndptroff != 0) sb->sb_sndptroff -= len; if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) @@ -977,8 +1141,8 @@ sbsndptr(struct sockbuf *sb, u_int off, u_int len, struct mbuf *m, *ret; KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__)); - KASSERT(off + len <= sb->sb_cc, ("%s: beyond sb", __func__)); - 
KASSERT(sb->sb_sndptroff <= sb->sb_cc, ("%s: sndptroff broken", __func__)); + KASSERT(off + len <= sb->sb_acc, ("%s: beyond sb", __func__)); + KASSERT(sb->sb_sndptroff <= sb->sb_acc, ("%s: sndptroff broken", __func__)); /* * Is off below stored offset? Happens on retransmits. @@ -1096,7 +1260,7 @@ void sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) { - xsb->sb_cc = sb->sb_cc; + xsb->sb_cc = sb->sb_ccc; xsb->sb_hiwat = sb->sb_hiwat; xsb->sb_mbcnt = sb->sb_mbcnt; xsb->sb_mcnt = sb->sb_mcnt; Index: sys/kern/uipc_syscalls.c =================================================================== --- sys/kern/uipc_syscalls.c (.../head) (revision 270879) +++ sys/kern/uipc_syscalls.c (.../projects/sendfile) (revision 270881) @@ -132,9 +132,10 @@ static int filt_sfsync(struct knote *kn, long hint */ static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0, "sendfile(2) tunables"); -static int sfreadahead = 1; + +static int sfreadahead = 0; SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW, - &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks"); + &sfreadahead, 0, "Read this more pages than socket buffer can accept"); #ifdef SFSYNC_DEBUG static int sf_sync_debug = 0; @@ -2035,6 +2036,37 @@ sf_ext_free(void *arg1, void *arg2) } /* + * Same as above, but forces the page to be detached from the object + * and go into free pool. + */ +void +sf_ext_free_nocache(void *arg1, void *arg2) +{ + struct sf_buf *sf = arg1; + struct sendfile_sync *sfs = arg2; + vm_page_t pg = sf_buf_page(sf); + + sf_buf_free(sf); + + vm_page_lock(pg); + vm_page_unwire(pg, 0); + if (pg->wire_count == 0) { + vm_object_t obj; + + if ((obj = pg->object) == NULL) + vm_page_free(pg); + else if (!vm_page_xbusied(pg) && VM_OBJECT_TRYWLOCK(obj)) { + vm_page_free(pg); + VM_OBJECT_WUNLOCK(obj); + } + } + vm_page_unlock(pg); + + if (sfs != NULL) + sf_sync_deref(sfs); +} + +/* * Called to remove a reference to a sf_sync object. 
* * This is generally done during the mbuf free path to signify @@ -2627,106 +2659,168 @@ freebsd4_sendfile(struct thread *td, struct freebs } #endif /* COMPAT_FREEBSD4 */ + /* + * How much data to put into page i of n. + * Only first and last pages are special. + */ +static inline off_t +xfsize(int i, int n, off_t off, off_t len) +{ + + if (i == 0) + return (omin(PAGE_SIZE - (off & PAGE_MASK), len)); + + if (i == n - 1 && ((off + len) & PAGE_MASK) > 0) + return ((off + len) & PAGE_MASK); + + return (PAGE_SIZE); +} + +/* + * Offset within object for i page. + */ +static inline vm_offset_t +vmoff(int i, off_t off) +{ + + if (i == 0) + return ((vm_offset_t)off); + + return (trunc_page(off + i * PAGE_SIZE)); +} + +/* + * Pretend as if we don't have enough space, subtract xfsize() of + * all pages that failed. + */ +static inline void +fixspace(int old, int new, off_t off, int *space) +{ + + KASSERT(old > new, ("%s: old %d new %d", __func__, old, new)); + + /* Subtract last one. */ + *space -= xfsize(old - 1, old, off, *space); + old--; + + if (new == old) + /* There was only one page. */ + return; + + /* Subtract first one. */ + if (new == 0) { + *space -= xfsize(0, old, off, *space); + new++; + } + + /* Rest of pages are full sized. 
*/ + *space -= (old - new) * PAGE_SIZE; + + KASSERT(*space >= 0, ("%s: space went backwards", __func__)); +} + +struct sf_io { + u_int nios; + int npages; + struct file *sock_fp; + struct mbuf *m; + vm_page_t pa[]; +}; + +static void +sf_io_done(void *arg) +{ + struct sf_io *sfio = arg; + struct socket *so; + + if (!refcount_release(&sfio->nios)) + return; + + so = sfio->sock_fp->f_data; + + (void)(so->so_proto->pr_usrreqs->pru_ready)(so, sfio->m, sfio->npages); + + /* XXXGL: curthread */ + fdrop(sfio->sock_fp, curthread); + free(sfio, M_TEMP); +} + static int -sendfile_readpage(vm_object_t obj, struct vnode *vp, int nd, - off_t off, int xfsize, int bsize, struct thread *td, vm_page_t *res) +sendfile_swapin(vm_object_t obj, struct sf_io *sfio, off_t off, off_t len, + int npages, int rhpages) { - vm_page_t m; - vm_pindex_t pindex; - ssize_t resid; - int error, readahead, rv; + vm_page_t *pa = sfio->pa; + int nios; - pindex = OFF_TO_IDX(off); + nios = 0; VM_OBJECT_WLOCK(obj); - m = vm_page_grab(obj, pindex, (vp != NULL ? VM_ALLOC_NOBUSY | - VM_ALLOC_IGN_SBUSY : 0) | VM_ALLOC_WIRED | VM_ALLOC_NORMAL); + for (int i = 0; i < npages; i++) + pa[i] = vm_page_grab(obj, OFF_TO_IDX(vmoff(i, off)), + VM_ALLOC_WIRED | VM_ALLOC_NORMAL); - /* - * Check if page is valid for what we need, otherwise initiate I/O. - * - * The non-zero nd argument prevents disk I/O, instead we - * return the caller what he specified in nd. In particular, - * if we already turned some pages into mbufs, nd == EAGAIN - * and the main function send them the pages before we come - * here again and block. - */ - if (m->valid != 0 && vm_page_is_valid(m, off & PAGE_MASK, xfsize)) { - if (vp == NULL) - vm_page_xunbusy(m); - VM_OBJECT_WUNLOCK(obj); - *res = m; - return (0); - } else if (nd != 0) { - if (vp == NULL) - vm_page_xunbusy(m); - error = nd; - goto free_page; - } + for (int i = 0; i < npages;) { + int j, a, count, rv; - /* - * Get the page from backing store. 
-	 */
-	error = 0;
-	if (vp != NULL) {
-		VM_OBJECT_WUNLOCK(obj);
-		readahead = sfreadahead * MAXBSIZE;
+		if (vm_page_is_valid(pa[i], vmoff(i, off) & PAGE_MASK,
+		    xfsize(i, npages, off, len))) {
+			vm_page_xunbusy(pa[i]);
+			i++;
+			continue;
+		}
-		/*
-		 * Use vn_rdwr() instead of the pager interface for
-		 * the vnode, to allow the read-ahead.
-		 *
-		 * XXXMAC: Because we don't have fp->f_cred here, we
-		 * pass in NOCRED. This is probably wrong, but is
-		 * consistent with our original implementation.
-		 */
-		error = vn_rdwr(UIO_READ, vp, NULL, readahead, trunc_page(off),
-		    UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((readahead /
-		    bsize) << IO_SEQSHIFT), td->td_ucred, NOCRED, &resid, td);
-		SFSTAT_INC(sf_iocnt);
-		VM_OBJECT_WLOCK(obj);
-	} else {
-		if (vm_pager_has_page(obj, pindex, NULL, NULL)) {
-			rv = vm_pager_get_pages(obj, &m, 1, 0);
-			SFSTAT_INC(sf_iocnt);
-			m = vm_page_lookup(obj, pindex);
-			if (m == NULL)
-				error = EIO;
-			else if (rv != VM_PAGER_OK) {
-				vm_page_lock(m);
-				vm_page_free(m);
-				vm_page_unlock(m);
-				m = NULL;
-				error = EIO;
+		for (j = i + 1; j < npages; j++)
+			if (vm_page_is_valid(pa[j], vmoff(j, off) & PAGE_MASK,
+			    xfsize(j, npages, off, len)))
+				break;
+
+		while (!vm_pager_has_page(obj, OFF_TO_IDX(vmoff(i, off)),
+		    NULL, &a) && i < j) {
+			pmap_zero_page(pa[i]);
+			pa[i]->valid = VM_PAGE_BITS_ALL;
+			pa[i]->dirty = 0;
+			vm_page_xunbusy(pa[i]);
+			i++;
+		}
+		if (i == j)
+			continue;
+
+		count = min(a + 1, npages + rhpages - i);
+		for (j = npages; j < i + count; j++) {
+			pa[j] = vm_page_grab(obj, OFF_TO_IDX(vmoff(j, off)),
+			    VM_ALLOC_NORMAL | VM_ALLOC_NOWAIT);
+			if (pa[j] == NULL) {
+				count = j - i;
+				break;
 			}
-		} else {
-			pmap_zero_page(m);
-			m->valid = VM_PAGE_BITS_ALL;
-			m->dirty = 0;
+			if (pa[j]->valid) {
+				vm_page_xunbusy(pa[j]);
+				count = j - i;
+				break;
+			}
 		}
-		if (m != NULL)
-			vm_page_xunbusy(m);
+
+		refcount_acquire(&sfio->nios);
+		rv = vm_pager_get_pages_async(obj, pa + i, count, 0,
+		    &sf_io_done, sfio);
+
+		KASSERT(rv == VM_PAGER_OK, ("%s: pager fail obj %p page %p",
+		    __func__, obj, pa[i]));
+
+		SFSTAT_INC(sf_iocnt);
+		nios++;
+
+		for (j = i; j < i + count && j < npages; j++)
+			KASSERT(pa[j] == vm_page_lookup(obj,
+			    OFF_TO_IDX(vmoff(j, off))),
+			    ("pa[j] %p lookup %p\n", pa[j],
+			    vm_page_lookup(obj, OFF_TO_IDX(vmoff(j, off)))));
+
+		i += count;
 	}
-	if (error == 0) {
-		*res = m;
-	} else if (m != NULL) {
-free_page:
-		vm_page_lock(m);
-		vm_page_unwire(m, PQ_INACTIVE);
-		/*
-		 * See if anyone else might know about this page. If
-		 * not and it is not valid, then free it.
-		 */
-		if (m->wire_count == 0 && m->valid == 0 && !vm_page_busied(m))
-			vm_page_free(m);
-		vm_page_unlock(m);
-	}
-	KASSERT(error != 0 || (m->wire_count > 0 &&
-	    vm_page_is_valid(m, off & PAGE_MASK, xfsize)),
-	    ("wrong page state m %p off %#jx xfsize %d", m, (uintmax_t)off,
-	    xfsize));
 	VM_OBJECT_WUNLOCK(obj);
-	return (error);
+
+	return (nios);
 }
 static int
@@ -2833,41 +2927,26 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
 	struct vnode *vp;
 	struct vm_object *obj;
 	struct socket *so;
-	struct mbuf *m;
+	struct mbuf *m, *mh, *mhtail;
 	struct sf_buf *sf;
-	struct vm_page *pg;
 	struct shmfd *shmfd;
 	struct vattr va;
-	off_t off, xfsize, fsbytes, sbytes, rem, obj_size;
-	int error, bsize, nd, hdrlen, mnw;
+	off_t off, sbytes, rem, obj_size;
+	int error, serror, bsize, hdrlen;
-	pg = NULL;
 	obj = NULL;
 	so = NULL;
-	m = NULL;
-	fsbytes = sbytes = 0;
-	hdrlen = mnw = 0;
-	rem = nbytes;
-	obj_size = 0;
+	m = mh = NULL;
+	sbytes = 0;
 	error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize);
 	if (error != 0)
 		return (error);
-	if (rem == 0)
-		rem = obj_size;
 	error = kern_sendfile_getsock(td, sockfd, &sock_fp, &so);
 	if (error != 0)
 		goto out;
-	/*
-	 * Do not wait on memory allocations but return ENOMEM for
-	 * caller to retry later.
-	 * XXX: Experimental.
-	 */
-	if (flags & SF_MNOWAIT)
-		mnw = 1;
-
 #ifdef MAC
 	error = mac_socket_check_send(td->td_ucred, so);
 	if (error != 0)
@@ -2875,31 +2954,27 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
 #endif
 	/* If headers are specified copy them into mbufs. */
-	if (hdr_uio != NULL) {
+	if (hdr_uio != NULL && hdr_uio->uio_resid > 0) {
 		hdr_uio->uio_td = td;
 		hdr_uio->uio_rw = UIO_WRITE;
-		if (hdr_uio->uio_resid > 0) {
-			/*
-			 * In FBSD < 5.0 the nbytes to send also included
-			 * the header. If compat is specified subtract the
-			 * header size from nbytes.
-			 */
-			if (kflags & SFK_COMPAT) {
-				if (nbytes > hdr_uio->uio_resid)
-					nbytes -= hdr_uio->uio_resid;
-				else
-					nbytes = 0;
-			}
-			m = m_uiotombuf(hdr_uio, (mnw ? M_NOWAIT : M_WAITOK),
-			    0, 0, 0);
-			if (m == NULL) {
-				error = mnw ? EAGAIN : ENOBUFS;
-				goto out;
-			}
-			hdrlen = m_length(m, NULL);
+		/*
+		 * In FBSD < 5.0 the nbytes to send also included
+		 * the header. If compat is specified subtract the
+		 * header size from nbytes.
+		 */
+		if (kflags & SFK_COMPAT) {
+			if (nbytes > hdr_uio->uio_resid)
+				nbytes -= hdr_uio->uio_resid;
+			else
+				nbytes = 0;
 		}
-	}
+		mh = m_uiotombuf(hdr_uio, M_WAITOK, 0, 0, 0);
+		hdrlen = m_length(mh, &mhtail);
+	} else
+		hdrlen = 0;
+	rem = nbytes ? omin(nbytes, obj_size - offset) : obj_size - offset;
+
 	/*
 	 * Protect against multiple writers to the socket.
 	 *
@@ -2919,21 +2994,13 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
 	 * The outer loop checks the state and available space of the socket
 	 * and takes care of the overall progress.
 	 */
-	for (off = offset; ; ) {
+	for (off = offset; rem > 0; ) {
+		struct sf_io *sfio;
+		vm_page_t *pa;
 		struct mbuf *mtail;
-		int loopbytes;
-		int space;
-		int done;
+		int nios, space, npages, rhpages;
-		if ((nbytes != 0 && nbytes == fsbytes) ||
-		    (nbytes == 0 && obj_size == fsbytes))
-			break;
-
 		mtail = NULL;
-		loopbytes = 0;
-		space = 0;
-		done = 0;
-
 		/*
 		 * Check the socket state for ongoing connection,
 		 * no errors and space in socket buffer.
@@ -3009,53 +3076,44 @@ retry_space:
 				VOP_UNLOCK(vp, 0);
 				goto done;
 			}
-			obj_size = va.va_size;
+			if (va.va_size != obj_size) {
+				if (nbytes == 0)
+					rem += va.va_size - obj_size;
+				else if (offset + nbytes > va.va_size)
+					rem -= (offset + nbytes - va.va_size);
+				obj_size = va.va_size;
+			}
 		}
+		if (space > rem)
+			space = rem;
+
+		if (off & PAGE_MASK)
+			npages = 1 + howmany(space -
+			    (PAGE_SIZE - (off & PAGE_MASK)), PAGE_SIZE);
+		else
+			npages = howmany(space, PAGE_SIZE);
+
+		rhpages = SF_READAHEAD(flags) ?
+		    SF_READAHEAD(flags) : sfreadahead;
+		rhpages = min(howmany(obj_size - (off & ~PAGE_MASK) -
+		    (npages * PAGE_SIZE), PAGE_SIZE), rhpages);
+
+		sfio = malloc(sizeof(struct sf_io) +
+		    (rhpages + npages) * sizeof(vm_page_t), M_TEMP, M_WAITOK);
+		refcount_init(&sfio->nios, 1);
+
+		nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages);
+
 		/*
 		 * Loop and construct maximum sized mbuf chain to be bulk
 		 * dumped into socket buffer.
 		 */
-		while (space > loopbytes) {
-			vm_offset_t pgoff;
+		pa = sfio->pa;
+		for (int i = 0; i < npages; i++) {
 			struct mbuf *m0;
 			/*
-			 * Calculate the amount to transfer.
-			 * Not to exceed a page, the EOF,
-			 * or the passed in nbytes.
-			 */
-			pgoff = (vm_offset_t)(off & PAGE_MASK);
-			rem = obj_size - offset;
-			if (nbytes != 0)
-				rem = omin(rem, nbytes);
-			rem -= fsbytes + loopbytes;
-			xfsize = omin(PAGE_SIZE - pgoff, rem);
-			xfsize = omin(space - loopbytes, xfsize);
-			if (xfsize <= 0) {
-				done = 1; /* all data sent */
-				break;
-			}
-
-			/*
-			 * Attempt to look up the page. Allocate
-			 * if not found or wait and loop if busy.
-			 */
-			if (m != NULL)
-				nd = EAGAIN; /* send what we already got */
-			else if ((flags & SF_NODISKIO) != 0)
-				nd = EBUSY;
-			else
-				nd = 0;
-			error = sendfile_readpage(obj, vp, nd, off,
-			    xfsize, bsize, td, &pg);
-			if (error != 0) {
-				if (error == EAGAIN)
-					error = 0; /* not a real error */
-				break;
-			}
-
-			/*
 			 * Get a sendfile buf. When allocating the
 			 * first buffer for mbuf chain, we usually
 			 * wait as long as necessary, but this wait
@@ -3064,56 +3122,60 @@ retry_space:
 			 * threads might exhaust the buffers and then
 			 * deadlock.
 			 */
-			sf = sf_buf_alloc(pg, (mnw || m != NULL) ? SFB_NOWAIT :
-			    SFB_CATCH);
+			sf = sf_buf_alloc(pa[i],
+			    m != NULL ? SFB_NOWAIT : SFB_CATCH);
 			if (sf == NULL) {
 				SFSTAT_INC(sf_allocfail);
-				vm_page_lock(pg);
-				vm_page_unwire(pg, PQ_INACTIVE);
-				KASSERT(pg->object != NULL,
-				    ("%s: object disappeared", __func__));
-				vm_page_unlock(pg);
+				for (int j = i; j < npages; j++) {
+					vm_page_lock(pa[j]);
+					vm_page_unwire(pa[j], PQ_INACTIVE);
+					vm_page_unlock(pa[j]);
+				}
 				if (m == NULL)
-					error = (mnw ? EAGAIN : EINTR);
+					error = ENOBUFS;
+				fixspace(npages, i, off, &space);
 				break;
 			}
 			/*
-			 * Get an mbuf and set it up as having
-			 * external storage.
+			 * Get an mbuf and set it up.
+			 *
+			 * SF_NOCACHE sets the page as being freed upon send.
+			 * However, we ignore it for the last page in 'space',
+			 * if the page is truncated, and we got more data to
+			 * send (rem > space), or if we have readahead
+			 * configured (rhpages > 0).
 			 */
-			m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA);
-			if (m0 == NULL) {
-				error = (mnw ? EAGAIN : ENOBUFS);
-				sf_ext_free(sf, NULL);
-				break;
-			}
-			/*
-			 * Attach EXT_SFBUF external storage.
-			 */
-			m0->m_ext.ext_buf = (caddr_t )sf_buf_kva(sf);
+			m0 = m_get(M_WAITOK, MT_DATA);
+			m0->m_ext.ext_buf = (char *)sf_buf_kva(sf);
 			m0->m_ext.ext_size = PAGE_SIZE;
 			m0->m_ext.ext_arg1 = sf;
 			m0->m_ext.ext_arg2 = sfs;
-			m0->m_ext.ext_type = EXT_SFBUF;
+			if ((flags & SF_NOCACHE) == 0 ||
+			    (i == npages - 1 &&
+			    ((off + space) & PAGE_MASK) &&
+			    (rem > space || rhpages > 0)))
+				m0->m_ext.ext_type = EXT_SFBUF;
+			else
+				m0->m_ext.ext_type = EXT_SFBUF_NOCACHE;
 			m0->m_ext.ext_flags = 0;
-			m0->m_flags |= (M_EXT|M_RDONLY);
-			m0->m_data = (char *)sf_buf_kva(sf) + pgoff;
-			m0->m_len = xfsize;
+			m0->m_flags |= (M_EXT | M_RDONLY);
+			if (nios)
+				m0->m_flags |= M_NOTREADY;
+			m0->m_data = (char *)sf_buf_kva(sf) +
+			    (vmoff(i, off) & PAGE_MASK);
+			m0->m_len = xfsize(i, npages, off, space);
+			if (i == 0)
+				sfio->m = m0;
+
 			/* Append to mbuf chain. */
 			if (mtail != NULL)
 				mtail->m_next = m0;
-			else if (m != NULL)
-				m_last(m)->m_next = m0;
 			else
 				m = m0;
 			mtail = m0;
-			/* Keep track of bits processed. */
-			loopbytes += xfsize;
-			off += xfsize;
-
 			/*
 			 * XXX eventually this should be a sfsync
 			 * method call!
@@ -3125,47 +3187,51 @@ retry_space:
 		if (vp != NULL)
 			VOP_UNLOCK(vp, 0);
+		/* Keep track of bytes processed. */
+		off += space;
+		rem -= space;
+
+		/* Prepend header, if any. */
+		if (hdrlen) {
+			mhtail->m_next = m;
+			m = mh;
+			mh = NULL;
+		}
+
+		if (error) {
+			free(sfio, M_TEMP);
+			goto done;
+		}
+
 		/* Add the buffer chain to the socket buffer. */
-		if (m != NULL) {
-			int mlen, err;
+		KASSERT(m_length(m, NULL) == space + hdrlen,
+		    ("%s: mlen %u space %d hdrlen %d",
+		    __func__, m_length(m, NULL), space, hdrlen));
-			mlen = m_length(m, NULL);
-			SOCKBUF_LOCK(&so->so_snd);
-			if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
-				error = EPIPE;
-				SOCKBUF_UNLOCK(&so->so_snd);
-				goto done;
-			}
-			SOCKBUF_UNLOCK(&so->so_snd);
-			CURVNET_SET(so->so_vnet);
-			/* Avoid error aliasing. */
-			err = (*so->so_proto->pr_usrreqs->pru_send)
-			    (so, 0, m, NULL, NULL, td);
-			CURVNET_RESTORE();
-			if (err == 0) {
-				/*
-				 * We need two counters to get the
-				 * file offset and nbytes to send
-				 * right:
-				 * - sbytes contains the total amount
-				 * of bytes sent, including headers.
-				 * - fsbytes contains the total amount
-				 * of bytes sent from the file.
-				 */
-				sbytes += mlen;
-				fsbytes += mlen;
-				if (hdrlen) {
-					fsbytes -= hdrlen;
-					hdrlen = 0;
-				}
-			} else if (error == 0)
-				error = err;
-			m = NULL; /* pru_send always consumes */
+		CURVNET_SET(so->so_vnet);
+		if (nios == 0) {
+			free(sfio, M_TEMP);
+			serror = (*so->so_proto->pr_usrreqs->pru_send)
+			    (so, 0, m, NULL, NULL, td);
+		} else {
+			sfio->sock_fp = sock_fp;
+			sfio->npages = npages;
+			fhold(sock_fp);
+			serror = (*so->so_proto->pr_usrreqs->pru_send)
+			    (so, PRUS_NOTREADY, m, NULL, NULL, td);
+			sf_io_done(sfio);
 		}
+		CURVNET_RESTORE();
-		/* Quit outer loop on error or when we're done. */
-		if (done)
-			break;
+		if (serror == 0) {
+			sbytes += space + hdrlen;
+			if (hdrlen)
+				hdrlen = 0;
+		} else if (error == 0)
+			error = serror;
+		m = NULL; /* pru_send always consumes */
+
+		/* Quit outer loop on error. */
 		if (error != 0)
 			goto done;
 	}
@@ -3200,6 +3266,8 @@ out:
 		fdrop(sock_fp, td);
 	if (m)
 		m_freem(m);
+	if (mh)
+		m_freem(mh);
 	if (error == ERESTART)
 		error = EINTR;
Index: sys/kern/uipc_debug.c
===================================================================
--- sys/kern/uipc_debug.c (.../head) (revision 270879)
+++ sys/kern/uipc_debug.c (.../projects/sendfile) (revision 270881)
@@ -403,7 +403,8 @@ db_print_sockbuf(struct sockbuf *sb, const char *s
 	db_printf("sb_sndptroff: %u\n", sb->sb_sndptroff);
 	db_print_indent(indent);
-	db_printf("sb_cc: %u ", sb->sb_cc);
+	db_printf("sb_acc: %u ", sb->sb_acc);
+	db_printf("sb_ccc: %u ", sb->sb_ccc);
 	db_printf("sb_hiwat: %u ", sb->sb_hiwat);
 	db_printf("sb_mbcnt: %u ", sb->sb_mbcnt);
 	db_printf("sb_mbmax: %u\n", sb->sb_mbmax);
Index: sys/kern/uipc_mbuf.c
===================================================================
--- sys/kern/uipc_mbuf.c (.../head) (revision 270879)
+++ sys/kern/uipc_mbuf.c (.../projects/sendfile) (revision 270881)
@@ -300,6 +300,9 @@ mb_free_ext(struct mbuf *m)
 		case EXT_SFBUF:
 			sf_ext_free(m->m_ext.ext_arg1, m->m_ext.ext_arg2);
 			break;
+		case EXT_SFBUF_NOCACHE:
+			sf_ext_free_nocache(m->m_ext.ext_arg1, m->m_ext.ext_arg2);
+			break;
 		default:
 			KASSERT(m->m_ext.ext_cnt != NULL,
 			    ("%s: no refcounting pointer on %p", __func__, m));
@@ -366,6 +369,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m)
 	switch (m->m_ext.ext_type) {
 	case EXT_SFBUF:
+	case EXT_SFBUF_NOCACHE:
 		sf_ext_ref(m->m_ext.ext_arg1, m->m_ext.ext_arg2);
 		break;
 	default:
@@ -388,7 +392,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m)
  * cleaned too.
  */
 void
-m_demote(struct mbuf *m0, int all)
+m_demote(struct mbuf *m0, int all, int flags)
 {
 	struct mbuf *m;
@@ -404,7 +408,7 @@ void
 			m_freem(m->m_nextpkt);
 			m->m_nextpkt = NULL;
 		}
-		m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE);
+		m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags);
 	}
 }
Index: sys/kern/sys_socket.c
===================================================================
--- sys/kern/sys_socket.c (.../head) (revision 270879)
+++ sys/kern/sys_socket.c (.../projects/sendfile) (revision 270881)
@@ -165,20 +165,17 @@ soo_ioctl(struct file *fp, u_long cmd, void *data,
 	case FIONREAD:
 		/* Unlocked read. */
-		*(int *)data = so->so_rcv.sb_cc;
+		*(int *)data = sbavail(&so->so_rcv);
 		break;
 	case FIONWRITE:
 		/* Unlocked read. */
-		*(int *)data = so->so_snd.sb_cc;
+		*(int *)data = sbavail(&so->so_snd);
 		break;
 	case FIONSPACE:
-		if ((so->so_snd.sb_hiwat < so->so_snd.sb_cc) ||
-		    (so->so_snd.sb_mbmax < so->so_snd.sb_mbcnt))
-			*(int *)data = 0;
-		else
-			*(int *)data = sbspace(&so->so_snd);
+		/* Unlocked read. */
+		*(int *)data = sbspace(&so->so_snd);
 		break;
 	case FIOSETOWN:
@@ -244,6 +241,7 @@ soo_stat(struct file *fp, struct stat *ub, struct
     struct thread *td)
 {
 	struct socket *so = fp->f_data;
+	struct sockbuf *sb;
 #ifdef MAC
 	int error;
 #endif
@@ -259,15 +257,18 @@ soo_stat(struct file *fp, struct stat *ub, struct
 	 * If SBS_CANTRCVMORE is set, but there's still data left in the
 	 * receive buffer, the socket is still readable.
 	 */
-	SOCKBUF_LOCK(&so->so_rcv);
-	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 ||
-	    so->so_rcv.sb_cc != 0)
+	sb = &so->so_rcv;
+	SOCKBUF_LOCK(sb);
+	if ((sb->sb_state & SBS_CANTRCVMORE) == 0 || sbavail(sb))
 		ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH;
-	ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
-	SOCKBUF_UNLOCK(&so->so_rcv);
-	/* Unlocked read. */
-	if ((so->so_snd.sb_state & SBS_CANTSENDMORE) == 0)
+	ub->st_size = sbavail(sb) - sb->sb_ctl;
+	SOCKBUF_UNLOCK(sb);
+
+	sb = &so->so_snd;
+	SOCKBUF_LOCK(sb);
+	if ((sb->sb_state & SBS_CANTSENDMORE) == 0)
 		ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH;
+	SOCKBUF_UNLOCK(sb);
 	ub->st_uid = so->so_cred->cr_uid;
 	ub->st_gid = so->so_cred->cr_gid;
 	return (*so->so_proto->pr_usrreqs->pru_sense)(so, ub);
Index: sys/kern/uipc_usrreq.c
===================================================================
--- sys/kern/uipc_usrreq.c (.../head) (revision 270879)
+++ sys/kern/uipc_usrreq.c (.../projects/sendfile) (revision 270881)
@@ -793,11 +793,10 @@ uipc_rcvd(struct socket *so, int flags)
 	u_int mbcnt, sbcc;
 	unp = sotounpcb(so);
-	KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL"));
+	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
+	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_SEQPACKET,
+	    ("%s: socktype %d", __func__, so->so_type));
-	if (so->so_type != SOCK_STREAM && so->so_type != SOCK_SEQPACKET)
-		panic("uipc_rcvd socktype %d", so->so_type);
-
 	/*
 	 * Adjust backpressure on sender and wakeup any waiting to write.
 	 *
@@ -810,7 +809,7 @@ uipc_rcvd(struct socket *so, int flags)
 	 */
 	SOCKBUF_LOCK(&so->so_rcv);
 	mbcnt = so->so_rcv.sb_mbcnt;
-	sbcc = so->so_rcv.sb_cc;
+	sbcc = sbavail(&so->so_rcv);
 	SOCKBUF_UNLOCK(&so->so_rcv);
 	/*
 	 * There is a benign race condition at this point. If we're planning to
@@ -846,7 +845,10 @@ uipc_send(struct socket *so, int flags, struct mbu
 	int error = 0;
 	unp = sotounpcb(so);
-	KASSERT(unp != NULL, ("uipc_send: unp == NULL"));
+	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
+	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_DGRAM ||
+	    so->so_type == SOCK_SEQPACKET,
+	    ("%s: socktype %d", __func__, so->so_type));
 	if (flags & PRUS_OOB) {
 		error = EOPNOTSUPP;
@@ -997,8 +999,11 @@ uipc_send(struct socket *so, int flags, struct mbu
 		}
 		mbcnt = so2->so_rcv.sb_mbcnt;
-		sbcc = so2->so_rcv.sb_cc;
-		sorwakeup_locked(so2);
+		sbcc = sbavail(&so2->so_rcv);
+		if (sbcc)
+			sorwakeup_locked(so2);
+		else
+			SOCKBUF_UNLOCK(&so2->so_rcv);
 		/*
 		 * The PCB lock on unp2 protects the SB_STOP flag. Without it,
@@ -1014,9 +1019,6 @@ uipc_send(struct socket *so, int flags, struct mbu
 		UNP_PCB_UNLOCK(unp2);
 		m = NULL;
 		break;
-
-	default:
-		panic("uipc_send unknown socktype");
 	}
 	/*
@@ -1046,6 +1048,35 @@ release:
 }
 static int
+uipc_ready(struct socket *so, struct mbuf *m, int count)
+{
+	struct unpcb *unp, *unp2;
+	struct socket *so2;
+	int error;
+
+	unp = sotounpcb(so);
+
+	UNP_LINK_RLOCK();
+	unp2 = unp->unp_conn;
+	UNP_PCB_LOCK(unp2);
+	so2 = unp2->unp_socket;
+
+	SOCKBUF_LOCK(&so2->so_rcv);
+	if (so2->so_rcv.sb_state & SBS_CANTRCVMORE) {
+		SOCKBUF_UNLOCK(&so2->so_rcv);
+		error = ENOTCONN;
+	} else if ((error = sbready(&so2->so_rcv, m, count)) == 0)
+		sorwakeup_locked(so2);
+	else
+		SOCKBUF_UNLOCK(&so2->so_rcv);
+
+	UNP_PCB_UNLOCK(unp2);
+	UNP_LINK_RUNLOCK();
+
+	return (error);
+}
+
+static int
 uipc_sense(struct socket *so, struct stat *sb)
 {
 	struct unpcb *unp;
@@ -1115,6 +1146,7 @@ static struct pr_usrreqs uipc_usrreqs_dgram = {
 	.pru_peeraddr = uipc_peeraddr,
 	.pru_rcvd = uipc_rcvd,
 	.pru_send = uipc_send,
+	.pru_ready = uipc_ready,
 	.pru_sense = uipc_sense,
 	.pru_shutdown = uipc_shutdown,
 	.pru_sockaddr = uipc_sockaddr,
@@ -1137,6 +1169,7 @@ static struct pr_usrreqs uipc_usrreqs_seqpacket =
 	.pru_peeraddr = uipc_peeraddr,
 	.pru_rcvd = uipc_rcvd,
 	.pru_send = uipc_send,
+	.pru_ready = uipc_ready,
 	.pru_sense = uipc_sense,
 	.pru_shutdown = uipc_shutdown,
 	.pru_sockaddr = uipc_sockaddr,
@@ -1159,6 +1192,7 @@ static struct pr_usrreqs uipc_usrreqs_stream = {
 	.pru_peeraddr = uipc_peeraddr,
 	.pru_rcvd = uipc_rcvd,
 	.pru_send = uipc_send,
+	.pru_ready = uipc_ready,
 	.pru_sense = uipc_sense,
 	.pru_shutdown = uipc_shutdown,
 	.pru_sockaddr = uipc_sockaddr,
Index: sys/kern/vfs_default.c
===================================================================
--- sys/kern/vfs_default.c (.../head) (revision 270879)
+++ sys/kern/vfs_default.c (.../projects/sendfile) (revision 270881)
@@ -111,6 +111,7 @@ struct vop_vector default_vnodeops = {
 	.vop_close = VOP_NULL,
 	.vop_fsync = VOP_NULL,
 	.vop_getpages = vop_stdgetpages,
+	.vop_getpages_async = vop_stdgetpages_async,
 	.vop_getwritemount = vop_stdgetwritemount,
 	.vop_inactive = VOP_NULL,
 	.vop_ioctl = VOP_ENOTTY,
@@ -726,10 +727,19 @@ vop_stdgetpages(ap)
 {
 	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
-	    ap->a_count, ap->a_reqpage);
+	    ap->a_count, ap->a_reqpage, NULL, NULL);
 }
+/* XXX Needs good comment and a manpage. */
 int
+vop_stdgetpages_async(struct vop_getpages_async_args *ap)
+{
+
+	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
+	    ap->a_count, ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg);
+}
+
+int
 vop_stdkqfilter(struct vop_kqfilter_args *ap)
 {
 	return vfs_kqfilter(ap);
Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c (.../head) (revision 270879)
+++ sys/kern/uipc_socket.c (.../projects/sendfile) (revision 270881)
@@ -1526,12 +1526,12 @@ restart:
 	 * 2. MSG_DONTWAIT is not set
 	 */
 	if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
-	    so->so_rcv.sb_cc < uio->uio_resid) &&
-	    so->so_rcv.sb_cc < so->so_rcv.sb_lowat &&
+	    sbavail(&so->so_rcv) < uio->uio_resid) &&
+	    sbavail(&so->so_rcv) < so->so_rcv.sb_lowat &&
 	    m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
-		KASSERT(m != NULL || !so->so_rcv.sb_cc,
-		    ("receive: m == %p so->so_rcv.sb_cc == %u",
-		    m, so->so_rcv.sb_cc));
+		KASSERT(m != NULL || !sbavail(&so->so_rcv),
+		    ("receive: m == %p sbavail == %u",
+		    m, sbavail(&so->so_rcv)));
 		if (so->so_error) {
 			if (m != NULL)
 				goto dontblock;
@@ -1710,7 +1710,8 @@ dontblock:
 	 */
 	moff = 0;
 	offset = 0;
-	while (m != NULL && uio->uio_resid > 0 && error == 0) {
+	while (m != NULL && !(m->m_flags & M_NOTAVAIL) && uio->uio_resid > 0
+	    && error == 0) {
 		/*
 		 * If the type of mbuf has changed since the last mbuf
 		 * examined ('type'), end the receive operation.
@@ -1813,9 +1814,7 @@ dontblock:
 						SOCKBUF_LOCK(&so->so_rcv);
 					}
 				}
-				m->m_data += len;
-				m->m_len -= len;
-				so->so_rcv.sb_cc -= len;
+				sbmtrim(&so->so_rcv, m, len);
 			}
 		}
 		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
@@ -1980,7 +1979,7 @@ restart:
 	/* Abort if socket has reported problems. */
 	if (so->so_error) {
-		if (sb->sb_cc > 0)
+		if (sbavail(sb) > 0)
 			goto deliver;
 		if (oresid > uio->uio_resid)
 			goto out;
@@ -1992,7 +1991,7 @@ restart:
 	/* Door is closed. Deliver what is left, if any. */
 	if (sb->sb_state & SBS_CANTRCVMORE) {
-		if (sb->sb_cc > 0)
+		if (sbavail(sb) > 0)
 			goto deliver;
 		else
 			goto out;
@@ -1999,7 +1998,7 @@ restart:
 	}
 	/* Socket buffer is empty and we shall not block. */
-	if (sb->sb_cc == 0 &&
+	if (sbavail(sb) == 0 &&
 	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
 		error = EAGAIN;
 		goto out;
@@ -2006,18 +2005,18 @@ restart:
 	}
 	/* Socket buffer got some data that we shall deliver now. */
-	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
+	if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) &&
 	    ((sb->sb_flags & SS_NBIO) ||
 	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
-	     sb->sb_cc >= sb->sb_lowat ||
-	     sb->sb_cc >= uio->uio_resid ||
-	     sb->sb_cc >= sb->sb_hiwat) ) {
+	     sbavail(sb) >= sb->sb_lowat ||
+	     sbavail(sb) >= uio->uio_resid ||
+	     sbavail(sb) >= sb->sb_hiwat) ) {
 		goto deliver;
 	}
 	/* On MSG_WAITALL we must wait until all data or error arrives. */
 	if ((flags & MSG_WAITALL) &&
-	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_hiwat))
+	    (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat))
 		goto deliver;
 	/*
@@ -2031,7 +2030,7 @@ restart:
 deliver:
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
-	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
+	KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__));
 	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
 	/* Statistics. */
@@ -2039,7 +2038,7 @@ deliver:
 		uio->uio_td->td_ru.ru_msgrcv++;
 	/* Fill uio until full or current end of socket buffer is reached. */
-	len = min(uio->uio_resid, sb->sb_cc);
+	len = min(uio->uio_resid, sbavail(sb));
 	if (mp0 != NULL) {
 		/* Dequeue as many mbufs as possible. */
 		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
@@ -2050,6 +2049,8 @@ deliver:
 			for (m = sb->sb_mb;
 			     m != NULL && m->m_len <= len;
 			     m = m->m_next) {
+				KASSERT(!(m->m_flags & M_NOTAVAIL),
+				    ("%s: m %p not available", __func__, m));
 				len -= m->m_len;
 				uio->uio_resid -= m->m_len;
 				sbfree(sb, m);
@@ -2174,9 +2175,9 @@ soreceive_dgram(struct socket *so, struct sockaddr
 	 */
 	SOCKBUF_LOCK(&so->so_rcv);
 	while ((m = so->so_rcv.sb_mb) == NULL) {
-		KASSERT(so->so_rcv.sb_cc == 0,
-		    ("soreceive_dgram: sb_mb NULL but sb_cc %u",
-		    so->so_rcv.sb_cc));
+		KASSERT(sbavail(&so->so_rcv) == 0,
+		    ("soreceive_dgram: sb_mb NULL but sbavail %u",
+		    sbavail(&so->so_rcv)));
 		if (so->so_error) {
 			error = so->so_error;
 			so->so_error = 0;
@@ -3178,6 +3179,13 @@ pru_send_notsupp(struct socket *so, int flags, str
 	return EOPNOTSUPP;
 }
+int
+pru_ready_notsupp(struct socket *so, struct mbuf *m, int count)
+{
+
+	return (EOPNOTSUPP);
+}
+
 /*
  * This isn't really a ``null'' operation, but it's the default one and
  * doesn't do anything destructive.
@@ -3249,7 +3257,7 @@ filt_soread(struct knote *kn, long hint)
 	so = kn->kn_fp->f_data;
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
-	kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
+	kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl;
 	if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 		kn->kn_flags |= EV_EOF;
 		kn->kn_fflags = so->so_error;
@@ -3261,7 +3269,7 @@ filt_soread(struct knote *kn, long hint)
 		if (kn->kn_data >= kn->kn_sdata)
 			return 1;
 	} else {
-		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat)
+		if (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat)
 			return 1;
 	}
@@ -3456,7 +3464,7 @@ soisdisconnected(struct socket *so)
 	sorwakeup_locked(so);
 	SOCKBUF_LOCK(&so->so_snd);
 	so->so_snd.sb_state |= SBS_CANTSENDMORE;
-	sbdrop_locked(&so->so_snd, so->so_snd.sb_cc);
+	sbdrop_locked(&so->so_snd, sbused(&so->so_snd));
 	sowwakeup_locked(so);
 	wakeup(&so->so_timeo);
 }
Index: sys/tools/vnode_if.awk
===================================================================
--- sys/tools/vnode_if.awk (.../head) (revision 270879)
+++ sys/tools/vnode_if.awk (.../projects/sendfile) (revision 270881)
@@ -254,16 +254,26 @@ while ((getline < srcfile) > 0) {
 		if (sub(/;$/, "") < 1)
 			die("Missing end-of-line ; in \"%s\".", $0);
-		# pick off variable name
-		if ((argp = match($0, /[A-Za-z0-9_]+$/)) < 1)
-			die("Missing var name \"a_foo\" in \"%s\".", $0);
-		args[numargs] = substr($0, argp);
-		$0 = substr($0, 1, argp - 1);
-
-		# what is left must be type
-		# remove trailing space (if any)
-		sub(/ $/, "");
-		types[numargs] = $0;
+		# pick off argument name
+		if ((argp = match($0, /[A-Za-z0-9_]+$/)) > 0) {
+			args[numargs] = substr($0, argp);
+			$0 = substr($0, 1, argp - 1);
+			sub(/ $/, "");
+			delete fargs[numargs];
+			types[numargs] = $0;
+		} else { # try to parse a function pointer argument
+			if ((argp = match($0,
+			    /\(\*[A-Za-z0-9_]+\)\([A-Za-z0-9_*, ]+\)$/)) < 1)
+				die("Missing var name \"a_foo\" in \"%s\".",
+				    $0);
+			args[numargs] = substr($0, argp + 2);
+			sub(/\).+/, "", args[numargs]);
+			fargs[numargs] = substr($0, argp);
+			sub(/^\([^)]+\)/, "", fargs[numargs]);
+			$0 = substr($0, 1, argp - 1);
+			sub(/ $/, "");
+			types[numargs] = $0;
+		}
 	}
 	if (numargs > 4)
 		ctrargs = 4;
@@ -286,8 +296,13 @@ while ((getline < srcfile) > 0) {
 	if (hfile) {
 		# Print out the vop_F_args structure.
 		printh("struct "name"_args {\n\tstruct vop_generic_args a_gen;");
-		for (i = 0; i < numargs; ++i)
-			printh("\t" t_spc(types[i]) "a_" args[i] ";");
+		for (i = 0; i < numargs; ++i) {
+			if (fargs[i]) {
+				printh("\t" t_spc(types[i]) "(*a_" args[i] \
+				    ")" fargs[i] ";");
+			} else
+				printh("\t" t_spc(types[i]) "a_" args[i] ";");
+		}
 		printh("};");
 		printh("");
@@ -301,8 +316,14 @@ while ((getline < srcfile) > 0) {
 		printh("");
 		printh("static __inline int " uname "(");
 		for (i = 0; i < numargs; ++i) {
-			printh("\t" t_spc(types[i]) args[i] \
-			    (i < numargs - 1 ? "," : ")"));
+			if (fargs[i]) {
+				printh("\t" t_spc(types[i]) "(*" args[i] \
+				    ")" fargs[i] \
+				    (i < numargs - 1 ? "," : ")"));
+			} else {
+				printh("\t" t_spc(types[i]) args[i] \
+				    (i < numargs - 1 ? "," : ")"));
+			}
 		}
 		printh("{");
 		printh("\tstruct " name "_args a;");
Index: sys/netinet/sctp_var.h
===================================================================
--- sys/netinet/sctp_var.h (.../head) (revision 270879)
+++ sys/netinet/sctp_var.h (.../projects/sendfile) (revision 270881)
@@ -82,9 +82,9 @@ extern struct pr_usrreqs sctp_usrreqs;
 #define sctp_maxspace(sb) (max((sb)->sb_hiwat,SCTP_MINIMAL_RWND))
-#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_cc) ? (sctp_maxspace(sb) - (asoc)->sb_cc) : 0))
+#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_ccc) ? (sctp_maxspace(sb) - (asoc)->sb_ccc) : 0))
-#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_cc) ? (sctp_maxspace(sb) - (sb)->sb_cc) : 0))
+#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_ccc) ? (sctp_maxspace(sb) - (sb)->sb_ccc) : 0))
 #define sctp_sbspace_sub(a,b) ((a > b) ? (a - b) : 0)
@@ -195,10 +195,10 @@ extern struct pr_usrreqs sctp_usrreqs;
 }
 #define sctp_sbfree(ctl, stcb, sb, m) { \
-	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_cc, SCTP_BUF_LEN((m))); \
+	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_ccc, SCTP_BUF_LEN((m))); \
 	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_mbcnt, MSIZE); \
 	if (((ctl)->do_not_ref_stcb == 0) && stcb) {\
-		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_cc, SCTP_BUF_LEN((m))); \
+		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_ccc, SCTP_BUF_LEN((m))); \
 		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
 	} \
 	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
@@ -207,10 +207,10 @@ extern struct pr_usrreqs sctp_usrreqs;
 }
 #define sctp_sballoc(stcb, sb, m) { \
-	atomic_add_int(&(sb)->sb_cc,SCTP_BUF_LEN((m))); \
+	atomic_add_int(&(sb)->sb_ccc,SCTP_BUF_LEN((m))); \
 	atomic_add_int(&(sb)->sb_mbcnt, MSIZE); \
 	if (stcb) { \
-		atomic_add_int(&(stcb)->asoc.sb_cc,SCTP_BUF_LEN((m))); \
+		atomic_add_int(&(stcb)->asoc.sb_ccc,SCTP_BUF_LEN((m))); \
 		atomic_add_int(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
 	} \
 	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
Index: sys/netinet/tcp_usrreq.c
===================================================================
--- sys/netinet/tcp_usrreq.c (.../head) (revision 270879)
+++ sys/netinet/tcp_usrreq.c (.../projects/sendfile) (revision 270881)
@@ -826,7 +826,7 @@ tcp_usr_send(struct socket *so, int flags, struct
 		m_freem(control); /* empty control, just free it */
 	}
 	if (!(flags & PRUS_OOB)) {
-		sbappendstream(&so->so_snd, m);
+		sbappendstream(&so->so_snd, m, flags);
 		if (nam && tp->t_state < TCPS_SYN_SENT) {
 			/*
 			 * Do implied connect if not yet connected,
@@ -858,7 +858,8 @@ tcp_usr_send(struct socket *so, int flags, struct
 			socantsendmore(so);
 			tcp_usrclosed(tp);
 		}
-		if (!(inp->inp_flags & INP_DROPPED)) {
+		if (!(inp->inp_flags & INP_DROPPED) &&
+		    !(flags & PRUS_NOTREADY)) {
 			if (flags & PRUS_MORETOCOME)
 				tp->t_flags |= TF_MORETOCOME;
 			error = tcp_output(tp);
@@ -884,7 +885,7 @@ tcp_usr_send(struct socket *so, int flags, struct
 		 * of data past the urgent section.
 		 * Otherwise, snd_up should be one lower.
 		 */
-		sbappendstream_locked(&so->so_snd, m);
+		sbappendstream_locked(&so->so_snd, m, flags);
 		SOCKBUF_UNLOCK(&so->so_snd);
 		if (nam && tp->t_state < TCPS_SYN_SENT) {
 			/*
@@ -908,10 +909,12 @@ tcp_usr_send(struct socket *so, int flags, struct
 			tp->snd_wnd = TTCP_CLIENT_SND_WND;
 			tcp_mss(tp, -1);
 		}
-		tp->snd_up = tp->snd_una + so->so_snd.sb_cc;
-		tp->t_flags |= TF_FORCEDATA;
-		error = tcp_output(tp);
-		tp->t_flags &= ~TF_FORCEDATA;
+		tp->snd_up = tp->snd_una + sbavail(&so->so_snd);
+		if (!(flags & PRUS_NOTREADY)) {
+			tp->t_flags |= TF_FORCEDATA;
+			error = tcp_output(tp);
+			tp->t_flags &= ~TF_FORCEDATA;
+		}
 	}
 out:
 	TCPDEBUG2((flags & PRUS_OOB) ? PRU_SENDOOB :
@@ -922,6 +925,38 @@ out:
 	return (error);
 }
+static int
+tcp_usr_ready(struct socket *so, struct mbuf *m, int count)
+{
+	struct inpcb *inp;
+	struct tcpcb *tp;
+	int error;
+
+	inp = sotoinpcb(so);
+	INP_WLOCK(inp);
+	if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
+		INP_WUNLOCK(inp);
+		return (ECONNRESET);
+	}
+	tp = intotcpcb(inp);
+
+	SOCKBUF_LOCK(&so->so_snd);
+	if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
+		SOCKBUF_UNLOCK(&so->so_snd);
+		error = ENOTCONN;
+	} else if (sbready(&so->so_snd, m, count) == 0) {
+		SOCKBUF_UNLOCK(&so->so_snd);
+		error = tcp_output(tp);
+	} else {
+		SOCKBUF_UNLOCK(&so->so_snd);
+		error = EINPROGRESS;
+	}
+
+	INP_WUNLOCK(inp);
+
+	return (error);
+}
+
 /*
  * Abort the TCP. Drop the connection abruptly.
  */
@@ -1056,6 +1091,7 @@ struct pr_usrreqs tcp_usrreqs = {
 	.pru_rcvd = tcp_usr_rcvd,
 	.pru_rcvoob = tcp_usr_rcvoob,
 	.pru_send = tcp_usr_send,
+	.pru_ready = tcp_usr_ready,
 	.pru_shutdown = tcp_usr_shutdown,
 	.pru_sockaddr = in_getsockaddr,
 	.pru_sosetlabel = in_pcbsosetlabel,
Index: sys/netinet/siftr.c
===================================================================
--- sys/netinet/siftr.c (.../head) (revision 270879)
+++ sys/netinet/siftr.c (.../projects/sendfile) (revision 270881)
@@ -781,9 +781,9 @@ siftr_siftdata(struct pkt_node *pn, struct inpcb *
 	pn->flags = tp->t_flags;
 	pn->rxt_length = tp->t_rxtcur;
 	pn->snd_buf_hiwater = inp->inp_socket->so_snd.sb_hiwat;
-	pn->snd_buf_cc = inp->inp_socket->so_snd.sb_cc;
+	pn->snd_buf_cc = sbused(&inp->inp_socket->so_snd);
 	pn->rcv_buf_hiwater = inp->inp_socket->so_rcv.sb_hiwat;
-	pn->rcv_buf_cc = inp->inp_socket->so_rcv.sb_cc;
+	pn->rcv_buf_cc = sbused(&inp->inp_socket->so_rcv);
 	pn->sent_inflight_bytes = tp->snd_max - tp->snd_una;
 	pn->t_segqlen = tp->t_segqlen;
Index: sys/netinet/sctp_os_bsd.h
===================================================================
--- sys/netinet/sctp_os_bsd.h (.../head) (revision 270879)
+++ sys/netinet/sctp_os_bsd.h (.../projects/sendfile) (revision 270881)
@@ -405,7 +405,7 @@ typedef struct callout sctp_os_timer_t;
 #define SCTP_SOWAKEUP(so) wakeup(&(so)->so_timeo)
 /* clear the socket buffer state */
 #define SCTP_SB_CLEAR(sb) \
-	(sb).sb_cc = 0; \
+	(sb).sb_ccc = 0; \
 	(sb).sb_mb = NULL; \
 	(sb).sb_mbcnt = 0;
Index: sys/netinet/tcp_reass.c
===================================================================
--- sys/netinet/tcp_reass.c (.../head) (revision 270879)
+++ sys/netinet/tcp_reass.c (.../projects/sendfile) (revision 270881)
@@ -248,7 +248,7 @@ present:
 			m_freem(mq);
 		else {
 			mq->m_nextpkt = NULL;
-			sbappendstream_locked(&so->so_rcv, mq);
+			sbappendstream_locked(&so->so_rcv, mq, 0);
 			wakeup = 1;
 		}
 	}
Index: sys/netinet/sctp_indata.c
===================================================================
--- sys/netinet/sctp_indata.c (.../head) (revision 270879)
+++ sys/netinet/sctp_indata.c (.../projects/sendfile) (revision 270881)
@@ -70,7 +70,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
 	/*
 	 * This is really set wrong with respect to a 1-2-m socket. Since
-	 * the sb_cc is the count that everyone as put up. When we re-write
+	 * the sb_ccc is the count that everyone as put up. When we re-write
 	 * sctp_soreceive then we will fix this so that ONLY this
 	 * associations data is taken into account.
 	 */
@@ -77,7 +77,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
 	if (stcb->sctp_socket == NULL)
 		return (calc);
-	if (stcb->asoc.sb_cc == 0 &&
+	if (stcb->asoc.sb_ccc == 0 &&
 	    asoc->size_on_reasm_queue == 0 &&
 	    asoc->size_on_all_streams == 0) {
 		/* Full rwnd granted */
@@ -1363,7 +1363,7 @@ sctp_process_a_data_chunk(struct sctp_tcb *stcb, s
 		 * When we have NO room in the rwnd we check to make sure
 		 * the reader is doing its job...
 		 */
-		if (stcb->sctp_socket->so_rcv.sb_cc) {
+		if (stcb->sctp_socket->so_rcv.sb_ccc) {
 			/* some to read, wake-up */
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			struct socket *so;
Index: sys/netinet/accf_http.c
===================================================================
--- sys/netinet/accf_http.c (.../head) (revision 270879)
+++ sys/netinet/accf_http.c (.../projects/sendfile) (revision 270881)
@@ -92,7 +92,7 @@ sbfull(struct sockbuf *sb)
 	    "mbcnt(%ld) >= mbmax(%ld): %d",
 	    sb->sb_cc, sb->sb_hiwat, sb->sb_cc >= sb->sb_hiwat,
 	    sb->sb_mbcnt, sb->sb_mbmax, sb->sb_mbcnt >= sb->sb_mbmax);
-	return (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
+	return (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
 }
 /*
@@ -162,13 +162,14 @@ static int
 sohashttpget(struct socket *so, void *arg, int waitflag)
 {
-	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && !sbfull(&so->so_rcv)) {
+	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 &&
+	    !sbfull(&so->so_rcv)) {
 		struct mbuf *m;
 		char *cmp;
 		int cmplen, cc;
 		m = so->so_rcv.sb_mb;
-		cc = so->so_rcv.sb_cc - 1;
+		cc = sbavail(&so->so_rcv) - 1;
 		if (cc < 1)
 			return (SU_OK);
 		switch (*mtod(m, char *)) {
@@ -215,7 +216,7 @@ soparsehttpvers(struct socket *so, void *arg, int
 		goto fallout;
 	m = so->so_rcv.sb_mb;
-	cc = so->so_rcv.sb_cc;
+	cc = sbavail(&so->so_rcv);
 	inspaces = spaces = 0;
 	for (m = so->so_rcv.sb_mb; m; m = n) {
 		n = m->m_nextpkt;
@@ -304,7 +305,7 @@ soishttpconnected(struct socket *so, void *arg, in
 	 * have NCHRS left
 	 */
 	copied = 0;
-	ccleft = so->so_rcv.sb_cc;
+	ccleft = sbavail(&so->so_rcv);
 	if (ccleft < NCHRS)
 		goto readmore;
 	a = b = c = '\0';
Index: sys/netinet/accf_dns.c
===================================================================
--- sys/netinet/accf_dns.c (.../head) (revision 270879)
+++ sys/netinet/accf_dns.c (.../projects/sendfile) (revision 270881)
@@ -75,7 +75,7 @@ sohasdns(struct socket *so, void *arg, int waitfla
 	struct sockbuf *sb = &so->so_rcv;
 	/* If the socket is full, we're ready. */
-	if (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
+	if (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
 		goto ready;
 	/* Check to see if we have a request. */
@@ -115,7 +115,7 @@ skippacket(struct sockbuf *sb) {
 	unsigned long packlen;
 	struct packet q, *p = &q;
-	if (sb->sb_cc < 2)
+	if (sbavail(sb) < 2)
 		return DNS_WAIT;
 	q.m = sb->sb_mb;
@@ -122,7 +122,7 @@ skippacket(struct sockbuf *sb) {
 	q.n = q.m->m_nextpkt;
 	q.moff = 0;
 	q.offset = 0;
-	q.len = sb->sb_cc;
+	q.len = sbavail(sb);
 	GET16(p, packlen);
 	if (packlen + 2 > q.len)
Index: sys/netinet/sctp_structs.h
===================================================================
--- sys/netinet/sctp_structs.h (.../head) (revision 270879)
+++ sys/netinet/sctp_structs.h (.../projects/sendfile) (revision 270881)
@@ -990,7 +990,7 @@ struct sctp_association {
 	uint32_t total_output_queue_size;
-	uint32_t sb_cc; /* shadow of sb_cc */
+	uint32_t sb_ccc; /* shadow of sb_ccc */
 	uint32_t sb_send_resv; /* amount reserved on a send */
 	uint32_t my_rwnd_control_len; /* shadow of sb_mbcnt used for rwnd
 					 * control */
Index: sys/netinet/tcp_output.c
===================================================================
--- sys/netinet/tcp_output.c (.../head) (revision 270879)
+++ sys/netinet/tcp_output.c (.../projects/sendfile) (revision 270881)
@@ -322,7 +322,7 @@ after_sack_rexmit:
 			 * to send then the probe will be the FIN
 			 * itself.
 			 */
-			if (off < so->so_snd.sb_cc)
+			if (off < sbavail(&so->so_snd))
 				flags &= ~TH_FIN;
 			sendwin = 1;
 		} else {
@@ -348,7 +348,8 @@ after_sack_rexmit:
 	 */
 	if (sack_rxmit == 0) {
 		if (sack_bytes_rxmt == 0)
-			len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off);
+			len = ((long)ulmin(sbavail(&so->so_snd), sendwin) -
+			    off);
 		else {
 			long cwin;
@@ -357,8 +358,8 @@ after_sack_rexmit:
 			 * sending new data, having retransmitted all the
 			 * data possible in the scoreboard.
*/ - len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd) - - off); + len = ((long)ulmin(sbavail(&so->so_snd), tp->snd_wnd) - + off); /* * Don't remove this (len > 0) check ! * We explicitly check for len > 0 here (although it @@ -457,12 +458,15 @@ after_sack_rexmit: * TODO: Shrink send buffer during idle periods together * with congestion window. Requires another timer. Has to * wait for upcoming tcp timer rewrite. + * + * XXXGL: should there be used sbused() or sbavail()? */ if (V_tcp_do_autosndbuf && so->so_snd.sb_flags & SB_AUTOSIZE) { if ((tp->snd_wnd / 4 * 5) >= so->so_snd.sb_hiwat && - so->so_snd.sb_cc >= (so->so_snd.sb_hiwat / 8 * 7) && - so->so_snd.sb_cc < V_tcp_autosndbuf_max && - sendwin >= (so->so_snd.sb_cc - (tp->snd_nxt - tp->snd_una))) { + sbused(&so->so_snd) >= (so->so_snd.sb_hiwat / 8 * 7) && + sbused(&so->so_snd) < V_tcp_autosndbuf_max && + sendwin >= (sbused(&so->so_snd) - + (tp->snd_nxt - tp->snd_una))) { if (!sbreserve_locked(&so->so_snd, min(so->so_snd.sb_hiwat + V_tcp_autosndbuf_inc, V_tcp_autosndbuf_max), so, curthread)) @@ -499,10 +503,11 @@ after_sack_rexmit: tso = 1; if (sack_rxmit) { - if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc)) + if (SEQ_LT(p->rxmit + len, tp->snd_una + sbavail(&so->so_snd))) flags &= ~TH_FIN; } else { - if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc)) + if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + + sbavail(&so->so_snd))) flags &= ~TH_FIN; } @@ -532,7 +537,7 @@ after_sack_rexmit: */ if (!(tp->t_flags & TF_MORETOCOME) && /* normal case */ (idle || (tp->t_flags & TF_NODELAY)) && - len + off >= so->so_snd.sb_cc && + len + off >= sbavail(&so->so_snd) && (tp->t_flags & TF_NOPUSH) == 0) { goto send; } @@ -660,7 +665,7 @@ dontupdate: * if window is nonzero, transmit what we can, * otherwise force out a byte. 
*/ - if (so->so_snd.sb_cc && !tcp_timer_active(tp, TT_REXMT) && + if (sbavail(&so->so_snd) && !tcp_timer_active(tp, TT_REXMT) && !tcp_timer_active(tp, TT_PERSIST)) { tp->t_rxtshift = 0; tcp_setpersist(tp); @@ -786,7 +791,7 @@ send: * fractional unless the send sockbuf can * be emptied. */ - if (sendalot && off + len < so->so_snd.sb_cc) { + if (sendalot && off + len < sbavail(&so->so_snd)) { len -= len % (tp->t_maxopd - optlen); sendalot = 1; } @@ -889,7 +894,7 @@ send: * give data to the user when a buffer fills or * a PUSH comes in.) */ - if (off + len == so->so_snd.sb_cc) + if (off + len == sbavail(&so->so_snd)) flags |= TH_PUSH; SOCKBUF_UNLOCK(&so->so_snd); } else { Index: sys/netinet/sctputil.c =================================================================== --- sys/netinet/sctputil.c (.../head) (revision 270879) +++ sys/netinet/sctputil.c (.../projects/sendfile) (revision 270881) @@ -67,9 +67,9 @@ sctp_sblog(struct sockbuf *sb, struct sctp_tcb *st struct sctp_cwnd_log sctp_clog; sctp_clog.x.sb.stcb = stcb; - sctp_clog.x.sb.so_sbcc = sb->sb_cc; + sctp_clog.x.sb.so_sbcc = sb->sb_ccc; if (stcb) - sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_cc; + sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_ccc; else sctp_clog.x.sb.stcb_sbcc = 0; sctp_clog.x.sb.incr = incr; @@ -4363,7 +4363,7 @@ sctp_add_to_readq(struct sctp_inpcb *inp, { /* * Here we must place the control on the end of the socket read - * queue AND increment sb_cc so that select will work properly on + * queue AND increment sb_ccc so that select will work properly on * read. */ struct mbuf *m, *prev = NULL; @@ -4489,7 +4489,7 @@ sctp_append_to_readq(struct sctp_inpcb *inp, * the reassembly queue. * * If PDAPI this means we need to add m to the end of the data. - * Increase the length in the control AND increment the sb_cc. + * Increase the length in the control AND increment the sb_ccc. * Otherwise sb is NULL and all we need to do is put it at the end * of the mbuf chain. 
*/ @@ -4701,10 +4701,10 @@ sctp_free_bufspace(struct sctp_tcb *stcb, struct s if (stcb->sctp_socket && (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) || ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)))) { - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { - stcb->sctp_socket->so_snd.sb_cc -= tp1->book_size; + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { + stcb->sctp_socket->so_snd.sb_ccc -= tp1->book_size; } else { - stcb->sctp_socket->so_snd.sb_cc = 0; + stcb->sctp_socket->so_snd.sb_ccc = 0; } } @@ -5254,11 +5254,11 @@ sctp_sorecvmsg(struct socket *so, in_eeor_mode = sctp_is_feature_on(inp, SCTP_PCB_FLAGS_EXPLICIT_EOR); if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { sctp_misc_ints(SCTP_SORECV_ENTER, - rwnd_req, in_eeor_mode, so->so_rcv.sb_cc, uio->uio_resid); + rwnd_req, in_eeor_mode, so->so_rcv.sb_ccc, uio->uio_resid); } if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { sctp_misc_ints(SCTP_SORECV_ENTERPL, - rwnd_req, block_allowed, so->so_rcv.sb_cc, uio->uio_resid); + rwnd_req, block_allowed, so->so_rcv.sb_ccc, uio->uio_resid); } error = sblock(&so->so_rcv, (block_allowed ? 
SBL_WAIT : 0)); if (error) { @@ -5277,7 +5277,7 @@ restart_nosblocks: (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) { goto out; } - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_cc == 0)) { + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_ccc == 0)) { if (so->so_error) { error = so->so_error; if ((in_flags & MSG_PEEK) == 0) @@ -5284,7 +5284,7 @@ restart_nosblocks: so->so_error = 0; goto out; } else { - if (so->so_rcv.sb_cc == 0) { + if (so->so_rcv.sb_ccc == 0) { /* indicate EOF */ error = 0; goto out; @@ -5291,9 +5291,9 @@ restart_nosblocks: } } } - if ((so->so_rcv.sb_cc <= held_length) && block_allowed) { + if ((so->so_rcv.sb_ccc <= held_length) && block_allowed) { /* we need to wait for data */ - if ((so->so_rcv.sb_cc == 0) && + if ((so->so_rcv.sb_ccc == 0) && ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { if ((inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0) { @@ -5329,7 +5329,7 @@ restart_nosblocks: } held_length = 0; goto restart_nosblocks; - } else if (so->so_rcv.sb_cc == 0) { + } else if (so->so_rcv.sb_ccc == 0) { if (so->so_error) { error = so->so_error; if ((in_flags & MSG_PEEK) == 0) @@ -5386,11 +5386,11 @@ restart_nosblocks: SCTP_INP_READ_LOCK(inp); } control = TAILQ_FIRST(&inp->read_queue); - if ((control == NULL) && (so->so_rcv.sb_cc != 0)) { + if ((control == NULL) && (so->so_rcv.sb_ccc != 0)) { #ifdef INVARIANTS panic("Huh, its non zero and nothing on control?"); #endif - so->so_rcv.sb_cc = 0; + so->so_rcv.sb_ccc = 0; } SCTP_INP_READ_UNLOCK(inp); hold_rlock = 0; @@ -5511,11 +5511,11 @@ restart_nosblocks: } /* * if we reach here, not suitable replacement is available - * fragment interleave is NOT on. So stuff the sb_cc + * fragment interleave is NOT on. So stuff the sb_ccc * into the our held count, and its time to sleep again. 
*/ - held_length = so->so_rcv.sb_cc; - control->held_length = so->so_rcv.sb_cc; + held_length = so->so_rcv.sb_ccc; + control->held_length = so->so_rcv.sb_ccc; goto restart; } /* Clear the held length since there is something to read */ @@ -5812,10 +5812,10 @@ get_more_data: if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_SB_LOGGING_ENABLE) { sctp_sblog(&so->so_rcv, control->do_not_ref_stcb ? NULL : stcb, SCTP_LOG_SBFREE, cp_len); } - atomic_subtract_int(&so->so_rcv.sb_cc, cp_len); + atomic_subtract_int(&so->so_rcv.sb_ccc, cp_len); if ((control->do_not_ref_stcb == 0) && stcb) { - atomic_subtract_int(&stcb->asoc.sb_cc, cp_len); + atomic_subtract_int(&stcb->asoc.sb_ccc, cp_len); } copied_so_far += cp_len; freed_so_far += cp_len; @@ -5960,7 +5960,7 @@ wait_some_more: (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE))) { goto release; } - if (so->so_rcv.sb_cc <= control->held_length) { + if (so->so_rcv.sb_ccc <= control->held_length) { error = sbwait(&so->so_rcv); if (error) { goto release; @@ -5987,8 +5987,8 @@ wait_some_more: } goto done_with_control; } - if (so->so_rcv.sb_cc > held_length) { - control->held_length = so->so_rcv.sb_cc; + if (so->so_rcv.sb_ccc > held_length) { + control->held_length = so->so_rcv.sb_ccc; held_length = 0; } goto wait_some_more; @@ -6135,13 +6135,13 @@ out: freed_so_far, ((uio) ? (slen - uio->uio_resid) : slen), stcb->asoc.my_rwnd, - so->so_rcv.sb_cc); + so->so_rcv.sb_ccc); } else { sctp_misc_ints(SCTP_SORECV_DONE, freed_so_far, ((uio) ? 
(slen - uio->uio_resid) : slen), 0, - so->so_rcv.sb_cc); + so->so_rcv.sb_ccc); } } stage_left: Index: sys/netinet/sctp_usrreq.c =================================================================== --- sys/netinet/sctp_usrreq.c (.../head) (revision 270879) +++ sys/netinet/sctp_usrreq.c (.../projects/sendfile) (revision 270881) @@ -586,7 +586,7 @@ sctp_must_try_again: if (((flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) && (atomic_cmpset_int(&inp->sctp_flags, flags, (flags | SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_CLOSE_IP)))) { if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) || - (so->so_rcv.sb_cc > 0)) { + (so->so_rcv.sb_ccc > 0)) { #ifdef SCTP_LOG_CLOSING sctp_log_closing(inp, NULL, 13); #endif @@ -751,7 +751,7 @@ sctp_disconnect(struct socket *so) } if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) || - (so->so_rcv.sb_cc > 0)) { + (so->so_rcv.sb_ccc > 0)) { if (SCTP_GET_STATE(asoc) != SCTP_STATE_COOKIE_WAIT) { /* Left with Data unread */ @@ -916,7 +916,7 @@ sctp_flush(struct socket *so, int how) inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_CANT_READ; SCTP_INP_READ_UNLOCK(inp); SCTP_INP_WUNLOCK(inp); - so->so_rcv.sb_cc = 0; + so->so_rcv.sb_ccc = 0; so->so_rcv.sb_mbcnt = 0; so->so_rcv.sb_mb = NULL; } @@ -925,7 +925,7 @@ sctp_flush(struct socket *so, int how) * First make sure the sb will be happy, we don't use these * except maybe the count */ - so->so_snd.sb_cc = 0; + so->so_snd.sb_ccc = 0; so->so_snd.sb_mbcnt = 0; so->so_snd.sb_mb = NULL; Index: sys/netinet/sctputil.h =================================================================== --- sys/netinet/sctputil.h (.../head) (revision 270879) +++ sys/netinet/sctputil.h (.../projects/sendfile) (revision 270881) @@ -286,10 +286,10 @@ do { \ } \ if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { \ - 
atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_cc), tp1->book_size); \ + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { \ + atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_ccc), tp1->book_size); \ } else { \ - stcb->sctp_socket->so_snd.sb_cc = 0; \ + stcb->sctp_socket->so_snd.sb_ccc = 0; \ } \ } \ } \ @@ -307,10 +307,10 @@ do { \ } \ if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ - if (stcb->sctp_socket->so_snd.sb_cc >= sp->length) { \ - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc,sp->length); \ + if (stcb->sctp_socket->so_snd.sb_ccc >= sp->length) { \ + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc,sp->length); \ } else { \ - stcb->sctp_socket->so_snd.sb_cc = 0; \ + stcb->sctp_socket->so_snd.sb_ccc = 0; \ } \ } \ } \ @@ -322,7 +322,7 @@ do { \ if ((stcb->sctp_socket != NULL) && \ ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ - atomic_add_int(&stcb->sctp_socket->so_snd.sb_cc,sz); \ + atomic_add_int(&stcb->sctp_socket->so_snd.sb_ccc,sz); \ } \ } while (0) Index: sys/netinet/sctp_input.c =================================================================== --- sys/netinet/sctp_input.c (.../head) (revision 270879) +++ sys/netinet/sctp_input.c (.../projects/sendfile) (revision 270881) @@ -1044,7 +1044,7 @@ sctp_handle_shutdown_ack(struct sctp_shutdown_ack_ if (stcb->sctp_socket) { if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) { - stcb->sctp_socket->so_snd.sb_cc = 0; + stcb->sctp_socket->so_snd.sb_ccc = 0; } sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED); } Index: sys/netinet/sctp_output.c =================================================================== --- sys/netinet/sctp_output.c (.../head) (revision 270879) +++ sys/netinet/sctp_output.c 
(.../projects/sendfile) (revision 270881) @@ -7257,7 +7257,7 @@ one_more_time: if ((stcb->sctp_socket != NULL) && \ ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc, sp->length); + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc, sp->length); } if (sp->data) { sctp_m_freem(sp->data); @@ -11537,7 +11537,7 @@ jump_out: drp->current_onq = htonl(asoc->size_on_reasm_queue + asoc->size_on_all_streams + asoc->my_rwnd_control_len + - stcb->sctp_socket->so_rcv.sb_cc); + stcb->sctp_socket->so_rcv.sb_ccc); } else { /*- * If my rwnd is 0, possibly from mbuf depletion as well as Index: sys/netinet/sctp_pcb.c =================================================================== --- sys/netinet/sctp_pcb.c (.../head) (revision 270879) +++ sys/netinet/sctp_pcb.c (.../projects/sendfile) (revision 270881) @@ -3407,7 +3407,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi if ((asoc->asoc.size_on_reasm_queue > 0) || (asoc->asoc.control_pdapi) || (asoc->asoc.size_on_all_streams > 0) || - (so && (so->so_rcv.sb_cc > 0))) { + (so && (so->so_rcv.sb_ccc > 0))) { /* Left with Data unread */ struct mbuf *op_err; @@ -3635,7 +3635,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi TAILQ_REMOVE(&inp->read_queue, sq, next); sctp_free_remote_addr(sq->whoFrom); if (so) - so->so_rcv.sb_cc -= sq->length; + so->so_rcv.sb_ccc -= sq->length; if (sq->data) { sctp_m_freem(sq->data); sq->data = NULL; @@ -4863,7 +4863,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, struct sct inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED; if (so) { SOCK_LOCK(so); - if (so->so_rcv.sb_cc == 0) { + if (so->so_rcv.sb_ccc == 0) { so->so_state &= ~(SS_ISCONNECTING | SS_ISDISCONNECTING | SS_ISCONFIRMING | Index: sys/netinet/sctp_pcb.h =================================================================== --- sys/netinet/sctp_pcb.h (.../head) (revision 270879) +++ sys/netinet/sctp_pcb.h 
(.../projects/sendfile) (revision 270881) @@ -369,7 +369,7 @@ struct sctp_inpcb { } ip_inp; - /* Socket buffer lock protects read_queue and of course sb_cc */ + /* Socket buffer lock protects read_queue and of course sb_ccc */ struct sctp_readhead read_queue; LIST_ENTRY(sctp_inpcb) sctp_list; /* lists all endpoints */ Index: sys/netinet/tcp_input.c =================================================================== --- sys/netinet/tcp_input.c (.../head) (revision 270879) +++ sys/netinet/tcp_input.c (.../projects/sendfile) (revision 270881) @@ -1734,7 +1734,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur); sowwakeup(so); - if (so->so_snd.sb_cc) + if (sbavail(&so->so_snd)) (void) tcp_output(tp); goto check_delack; } @@ -1844,7 +1844,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, newsize, so, NULL)) so->so_rcv.sb_flags &= ~SB_AUTOSIZE; m_adj(m, drop_hdrlen); /* delayed header drop */ - sbappendstream_locked(&so->so_rcv, m); + sbappendstream_locked(&so->so_rcv, m, 0); } /* NB: sorwakeup_locked() does an implicit unlock. */ sorwakeup_locked(so); @@ -2548,7 +2548,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, * Otherwise we would send pure ACKs. */ SOCKBUF_LOCK(&so->so_snd); - avail = so->so_snd.sb_cc - + avail = sbavail(&so->so_snd) - (tp->snd_nxt - tp->snd_una); SOCKBUF_UNLOCK(&so->so_snd); if (avail > 0) @@ -2683,10 +2683,10 @@ process_ACK: cc_ack_received(tp, th, CC_ACK); SOCKBUF_LOCK(&so->so_snd); - if (acked > so->so_snd.sb_cc) { - tp->snd_wnd -= so->so_snd.sb_cc; + if (acked > sbavail(&so->so_snd)) { + tp->snd_wnd -= sbavail(&so->so_snd); mfree = sbcut_locked(&so->so_snd, - (int)so->so_snd.sb_cc); + (int)sbavail(&so->so_snd)); ourfinisacked = 1; } else { mfree = sbcut_locked(&so->so_snd, acked); @@ -2812,7 +2812,7 @@ step6: * actually wanting to send this much urgent data. 
*/ SOCKBUF_LOCK(&so->so_rcv); - if (th->th_urp + so->so_rcv.sb_cc > sb_max) { + if (th->th_urp + sbavail(&so->so_rcv) > sb_max) { th->th_urp = 0; /* XXX */ thflags &= ~TH_URG; /* XXX */ SOCKBUF_UNLOCK(&so->so_rcv); /* XXX */ @@ -2834,7 +2834,7 @@ step6: */ if (SEQ_GT(th->th_seq+th->th_urp, tp->rcv_up)) { tp->rcv_up = th->th_seq + th->th_urp; - so->so_oobmark = so->so_rcv.sb_cc + + so->so_oobmark = sbavail(&so->so_rcv) + (tp->rcv_up - tp->rcv_nxt) - 1; if (so->so_oobmark == 0) so->so_rcv.sb_state |= SBS_RCVATMARK; @@ -2904,7 +2904,7 @@ dodata: /* XXX */ if (so->so_rcv.sb_state & SBS_CANTRCVMORE) m_freem(m); else - sbappendstream_locked(&so->so_rcv, m); + sbappendstream_locked(&so->so_rcv, m, 0); /* NB: sorwakeup_locked() does an implicit unlock. */ sorwakeup_locked(so); } else { Index: sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c =================================================================== --- sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../head) (revision 270879) +++ sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../projects/sendfile) (revision 270881) @@ -1127,9 +1127,8 @@ ng_btsocket_l2cap_process_l2ca_write_rsp(struct ng /* * Check if we have more data to send */ - sbdroprecord(&pcb->so->so_snd); - if (pcb->so->so_snd.sb_cc > 0) { + if (sbavail(&pcb->so->so_snd) > 0) { if (ng_btsocket_l2cap_send2(pcb) == 0) ng_btsocket_l2cap_timeout(pcb); else @@ -2514,7 +2513,7 @@ ng_btsocket_l2cap_send2(ng_btsocket_l2cap_pcb_p pc mtx_assert(&pcb->pcb_mtx, MA_OWNED); - if (pcb->so->so_snd.sb_cc == 0) + if (sbavail(&pcb->so->so_snd) == 0) return (EINVAL); /* XXX */ m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT); Index: sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c =================================================================== --- sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../head) (revision 270879) +++ sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../projects/sendfile) (revision 270881) @@ -3279,7 +3279,7 @@ 
ng_btsocket_rfcomm_pcb_send(ng_btsocket_rfcomm_pcb } for (error = 0, sent = 0; sent < limit; sent ++) { - length = min(pcb->mtu, pcb->so->so_snd.sb_cc); + length = min(pcb->mtu, sbavail(&pcb->so->so_snd)); if (length == 0) break; Index: sys/netgraph/bluetooth/socket/ng_btsocket_sco.c =================================================================== --- sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../head) (revision 270879) +++ sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../projects/sendfile) (revision 270881) @@ -906,7 +906,7 @@ ng_btsocket_sco_default_msg_input(struct ng_mesg * sbdroprecord(&pcb->so->so_snd); /* Send more if we have any */ - if (pcb->so->so_snd.sb_cc > 0) + if (sbavail(&pcb->so->so_snd) > 0) if (ng_btsocket_sco_send2(pcb) == 0) ng_btsocket_sco_timeout(pcb); @@ -1748,7 +1748,7 @@ ng_btsocket_sco_send2(ng_btsocket_sco_pcb_p pcb) mtx_assert(&pcb->pcb_mtx, MA_OWNED); while (pcb->rt->pending < pcb->rt->num_pkts && - pcb->so->so_snd.sb_cc > 0) { + sbavail(&pcb->so->so_snd) > 0) { /* Get a copy of the first packet on send queue */ m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT); if (m == NULL) { Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c =================================================================== --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../head) (revision 270879) +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../projects/sendfile) (revision 270881) @@ -183,7 +183,7 @@ sdp_post_recvs_needed(struct sdp_sock *ssk) * Compute bytes in the receive queue and socket buffer. 
*/ bytes_in_process = (posted - SDP_MIN_TX_CREDITS) * buffer_size; - bytes_in_process += ssk->socket->so_rcv.sb_cc; + bytes_in_process += sbused(&ssk->socket->so_rcv); return bytes_in_process < max_bytes; } Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c =================================================================== --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../head) (revision 270879) +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../projects/sendfile) (revision 270881) @@ -747,7 +747,7 @@ sdp_start_disconnect(struct sdp_sock *ssk) ("sdp_start_disconnect: sdp_drop() returned NULL")); } else { soisdisconnecting(so); - unread = so->so_rcv.sb_cc; + unread = sbused(&so->so_rcv); sbflush(&so->so_rcv); sdp_usrclosed(ssk); if (!(ssk->flags & SDP_DROPPED)) { @@ -889,7 +889,7 @@ sdp_append(struct sdp_sock *ssk, struct sockbuf *s m_adj(mb, SDP_HEAD_SIZE); n->m_pkthdr.len += mb->m_pkthdr.len; n->m_flags |= mb->m_flags & (M_PUSH | M_URG); - m_demote(mb, 1); + m_demote(mb, 1, 0); sbcompress(sb, mb, sb->sb_mbtail); return; } @@ -1259,7 +1259,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps /* We will never ever get anything unless we are connected. */ if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) { /* When disconnecting there may be still some data left. */ - if (sb->sb_cc > 0) + if (sbavail(sb)) goto deliver; if (!(so->so_state & SS_ISDISCONNECTED)) error = ENOTCONN; @@ -1267,7 +1267,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps } /* Socket buffer is empty and we shall not block. */ - if (sb->sb_cc == 0 && + if (sbavail(sb) == 0 && ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { error = EAGAIN; goto out; @@ -1278,7 +1278,7 @@ restart: /* Abort if socket has reported problems. */ if (so->so_error) { - if (sb->sb_cc > 0) + if (sbavail(sb)) goto deliver; if (oresid > uio->uio_resid) goto out; @@ -1290,7 +1290,7 @@ restart: /* Door is closed. Deliver what is left, if any. 
*/ if (sb->sb_state & SBS_CANTRCVMORE) { - if (sb->sb_cc > 0) + if (sbavail(sb)) goto deliver; else goto out; @@ -1297,18 +1297,18 @@ restart: } /* Socket buffer got some data that we shall deliver now. */ - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && + if (sbavail(sb) && !(flags & MSG_WAITALL) && ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)) || - sb->sb_cc >= sb->sb_lowat || - sb->sb_cc >= uio->uio_resid || - sb->sb_cc >= sb->sb_hiwat) ) { + sbavail(sb) >= sb->sb_lowat || + sbavail(sb) >= uio->uio_resid || + sbavail(sb) >= sb->sb_hiwat) ) { goto deliver; } /* On MSG_WAITALL we must wait until all data or error arrives. */ if ((flags & MSG_WAITALL) && - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat)) + (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_lowat)) goto deliver; /* @@ -1322,7 +1322,7 @@ restart: deliver: SOCKBUF_LOCK_ASSERT(&so->so_rcv); - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); + KASSERT(sbavail(sb), ("%s: sockbuf empty", __func__)); KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); /* Statistics. */ @@ -1330,7 +1330,7 @@ deliver: uio->uio_td->td_ru.ru_msgrcv++; /* Fill uio until full or current end of socket buffer is reached. */ - len = min(uio->uio_resid, sb->sb_cc); + len = min(uio->uio_resid, sbavail(sb)); if (mp0 != NULL) { /* Dequeue as many mbufs as possible. 
*/ if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { @@ -1510,7 +1510,7 @@ sdp_urg(struct sdp_sock *ssk, struct mbuf *mb) if (so == NULL) return; - so->so_oobmark = so->so_rcv.sb_cc + mb->m_pkthdr.len - 1; + so->so_oobmark = sbused(&so->so_rcv) + mb->m_pkthdr.len - 1; sohasoutofband(so); ssk->oobflags &= ~(SDP_HAVEOOB | SDP_HADOOB); if (!(so->so_options & SO_OOBINLINE)) { Index: sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c =================================================================== --- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../head) (revision 270879) +++ sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../projects/sendfile) (revision 270881) @@ -445,8 +445,8 @@ t3_push_frames(struct socket *so, int req_completi * Autosize the send buffer. */ if (snd->sb_flags & SB_AUTOSIZE && VNET(tcp_do_autosndbuf)) { - if (snd->sb_cc >= (snd->sb_hiwat / 8 * 7) && - snd->sb_cc < VNET(tcp_autosndbuf_max)) { + if (sbused(snd) >= (snd->sb_hiwat / 8 * 7) && + sbused(snd) < VNET(tcp_autosndbuf_max)) { if (!sbreserve_locked(snd, min(snd->sb_hiwat + VNET(tcp_autosndbuf_inc), VNET(tcp_autosndbuf_max)), so, curthread)) @@ -597,10 +597,10 @@ t3_rcvd(struct toedev *tod, struct tcpcb *tp) INP_WLOCK_ASSERT(inp); SOCKBUF_LOCK(so_rcv); - KASSERT(toep->tp_enqueued >= so_rcv->sb_cc, - ("%s: so_rcv->sb_cc > enqueued", __func__)); - toep->tp_rx_credits += toep->tp_enqueued - so_rcv->sb_cc; - toep->tp_enqueued = so_rcv->sb_cc; + KASSERT(toep->tp_enqueued >= sbused(so_rcv), + ("%s: sbused(so_rcv) > enqueued", __func__)); + toep->tp_rx_credits += toep->tp_enqueued - sbused(so_rcv); + toep->tp_enqueued = sbused(so_rcv); SOCKBUF_UNLOCK(so_rcv); must_send = toep->tp_rx_credits + 16384 >= tp->rcv_wnd; @@ -1199,7 +1199,7 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *r } toep->tp_enqueued += m->m_pkthdr.len; - sbappendstream_locked(so_rcv, m); + sbappendstream_locked(so_rcv, m, 0); sorwakeup_locked(so); SOCKBUF_UNLOCK_ASSERT(so_rcv); @@ -1768,7 +1768,7 @@ wr_ack(struct toepcb *toep, struct mbuf *m) 
so_sowwakeup_locked(so); } - if (snd->sb_sndptroff < snd->sb_cc) + if (snd->sb_sndptroff < sbused(snd)) t3_push_frames(so, 0); out_free: Index: sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c =================================================================== --- sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../head) (revision 270879) +++ sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../projects/sendfile) (revision 270881) @@ -1507,11 +1507,11 @@ process_data(struct iwch_ep *ep) process_mpa_request(ep); break; default: - if (ep->com.so->so_rcv.sb_cc) + if (sbavail(&ep->com.so->so_rcv)) printf("%s Unexpected streaming data." " ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n", __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state, - ep->com.so->so_rcv.sb_cc, ep->com.so->so_rcv.sb_mb); + sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb); break; } return; Index: sys/dev/cxgbe/tom/t4_cpl_io.c =================================================================== --- sys/dev/cxgbe/tom/t4_cpl_io.c (.../head) (revision 270879) +++ sys/dev/cxgbe/tom/t4_cpl_io.c (.../projects/sendfile) (revision 270881) @@ -365,15 +365,15 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp) INP_WLOCK_ASSERT(inp); SOCKBUF_LOCK(sb); - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); + __func__, sb, sbused(sb), toep->sb_cc)); if (toep->ulp_mode == ULP_MODE_ISCSI) { toep->rx_credits += toep->sb_cc; toep->sb_cc = 0; } else { - toep->rx_credits += toep->sb_cc - sb->sb_cc; - toep->sb_cc = sb->sb_cc; + toep->rx_credits += toep->sb_cc - sbused(sb); + toep->sb_cc = sbused(sb); } credits = toep->rx_credits; SOCKBUF_UNLOCK(sb); @@ -1079,15 +1079,15 @@ do_peer_close(struct sge_iq *iq, const struct rss_ tp->rcv_nxt = be32toh(cpl->rcv_nxt); toep->ddp_flags &= ~(DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE); - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: 
sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); #ifdef USE_DDP_RX_FLOW_CONTROL toep->rx_credits -= m->m_len; /* adjust for F_RX_FC_DDP */ #endif - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); } socantrcvmore_locked(so); /* unlocks the sockbuf */ @@ -1582,12 +1582,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_hea } } - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); sorwakeup_locked(so); SOCKBUF_UNLOCK_ASSERT(sb); Index: sys/dev/cxgbe/tom/t4_ddp.c =================================================================== --- sys/dev/cxgbe/tom/t4_ddp.c (.../head) (revision 270879) +++ sys/dev/cxgbe/tom/t4_ddp.c (.../projects/sendfile) (revision 270881) @@ -224,15 +224,15 @@ insert_ddp_data(struct toepcb *toep, uint32_t n) tp->rcv_wnd -= n; #endif - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); #ifdef USE_DDP_RX_FLOW_CONTROL toep->rx_credits -= n; /* adjust for F_RX_FC_DDP */ #endif - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); } /* SET_TCB_FIELD sent as a ULP command looks like this */ @@ -459,15 +459,15 @@ 
handle_ddp_data(struct toepcb *toep, __be32 ddp_re else discourage_ddp(toep); - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); #ifdef USE_DDP_RX_FLOW_CONTROL toep->rx_credits -= len; /* adjust for F_RX_FC_DDP */ #endif - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); wakeup: KASSERT(toep->ddp_flags & db_flag, ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x, report 0x%x", @@ -908,7 +908,7 @@ handle_ddp(struct socket *so, struct uio *uio, int #endif /* XXX: too eager to disable DDP, could handle NBIO better than this. */ - if (sb->sb_cc >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || + if (sbused(sb) >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || uio->uio_resid > MAX_DDP_BUFFER_SIZE || uio->uio_iovcnt > 1 || so->so_state & SS_NBIO || flags & (MSG_DONTWAIT | MSG_NBIO) || error || so->so_error || sb->sb_state & SBS_CANTRCVMORE) @@ -946,7 +946,7 @@ handle_ddp(struct socket *so, struct uio *uio, int * payload. */ ddp_flags = select_ddp_flags(so, flags, db_idx); - wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sb->sb_cc, ddp_flags); + wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sbused(sb), ddp_flags); if (wr == NULL) { /* * Just unhold the pages. The DDP buffer's software state is @@ -971,8 +971,9 @@ handle_ddp(struct socket *so, struct uio *uio, int */ rc = sbwait(sb); while (toep->ddp_flags & buf_flag) { + /* XXXGL: shouldn't here be sbwait() call? 
*/ sb->sb_flags |= SB_WAIT; - msleep(&sb->sb_cc, &sb->sb_mtx, PSOCK , "sbwait", 0); + msleep(&sb->sb_acc, &sb->sb_mtx, PSOCK , "sbwait", 0); } unwire_ddp_buffer(db); return (rc); @@ -1134,8 +1135,8 @@ restart: /* uio should be just as it was at entry */ KASSERT(oresid == uio->uio_resid, - ("%s: oresid = %d, uio_resid = %zd, sb_cc = %d", - __func__, oresid, uio->uio_resid, sb->sb_cc)); + ("%s: oresid = %d, uio_resid = %zd, sbused = %d", + __func__, oresid, uio->uio_resid, sbused(sb))); error = handle_ddp(so, uio, flags, 0); ddp_handled = 1; @@ -1145,7 +1146,7 @@ restart: /* Abort if socket has reported problems. */ if (so->so_error) { - if (sb->sb_cc > 0) + if (sbused(sb)) goto deliver; if (oresid > uio->uio_resid) goto out; @@ -1157,7 +1158,7 @@ restart: /* Door is closed. Deliver what is left, if any. */ if (sb->sb_state & SBS_CANTRCVMORE) { - if (sb->sb_cc > 0) + if (sbused(sb)) goto deliver; else goto out; @@ -1164,7 +1165,7 @@ restart: } /* Socket buffer is empty and we shall not block. */ - if (sb->sb_cc == 0 && + if (sbused(sb) == 0 && ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { error = EAGAIN; goto out; @@ -1171,18 +1172,18 @@ restart: } /* Socket buffer got some data that we shall deliver now. */ - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && + if (sbused(sb) && !(flags & MSG_WAITALL) && ((sb->sb_flags & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)) || - sb->sb_cc >= sb->sb_lowat || - sb->sb_cc >= uio->uio_resid || - sb->sb_cc >= sb->sb_hiwat) ) { + sbused(sb) >= sb->sb_lowat || + sbused(sb) >= uio->uio_resid || + sbused(sb) >= sb->sb_hiwat) ) { goto deliver; } /* On MSG_WAITALL we must wait until all data or error arrives. 
*/ if ((flags & MSG_WAITALL) && - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat)) + (sbused(sb) >= uio->uio_resid || sbused(sb) >= sb->sb_lowat)) goto deliver; /* @@ -1201,7 +1202,7 @@ restart: deliver: SOCKBUF_LOCK_ASSERT(&so->so_rcv); - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); + KASSERT(sbused(sb) > 0, ("%s: sockbuf empty", __func__)); KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); if (sb->sb_flags & SB_DDP_INDICATE && !ddp_handled) @@ -1212,7 +1213,7 @@ deliver: uio->uio_td->td_ru.ru_msgrcv++; /* Fill uio until full or current end of socket buffer is reached. */ - len = min(uio->uio_resid, sb->sb_cc); + len = min(uio->uio_resid, sbused(sb)); if (mp0 != NULL) { /* Dequeue as many mbufs as possible. */ if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { Index: sys/dev/cxgbe/iw_cxgbe/cm.c =================================================================== --- sys/dev/cxgbe/iw_cxgbe/cm.c (.../head) (revision 270879) +++ sys/dev/cxgbe/iw_cxgbe/cm.c (.../projects/sendfile) (revision 270881) @@ -584,8 +584,8 @@ process_data(struct c4iw_ep *ep) { struct sockaddr_in *local, *remote; - CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sb_cc %d", __func__, - ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc); + CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__, + ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv)); switch (state_read(&ep->com)) { case MPA_REQ_SENT: @@ -601,11 +601,11 @@ process_data(struct c4iw_ep *ep) process_mpa_request(ep); break; default: - if (ep->com.so->so_rcv.sb_cc) - log(LOG_ERR, "%s: Unexpected streaming data. " - "ep %p, state %d, so %p, so_state 0x%x, sb_cc %u\n", + if (sbused(&ep->com.so->so_rcv)) + log(LOG_ERR, "%s: Unexpected streaming data. 
ep %p, " + "state %d, so %p, so_state 0x%x, sbused %u\n", __func__, ep, state_read(&ep->com), ep->com.so, - ep->com.so->so_state, ep->com.so->so_rcv.sb_cc); + ep->com.so->so_state, sbused(&ep->com.so->so_rcv)); break; } } Index: sys/dev/iscsi/icl.c =================================================================== --- sys/dev/iscsi/icl.c (.../head) (revision 270879) +++ sys/dev/iscsi/icl.c (.../projects/sendfile) (revision 270881) @@ -758,7 +758,7 @@ icl_receive_thread(void *arg) * is enough data received to read the PDU. */ SOCKBUF_LOCK(&so->so_rcv); - available = so->so_rcv.sb_cc; + available = sbavail(&so->so_rcv); if (available < ic->ic_receive_len) { so->so_rcv.sb_lowat = ic->ic_receive_len; cv_wait(&ic->ic_receive_cv, &so->so_rcv.sb_mtx); Index: sys/dev/ti/if_ti.c =================================================================== --- sys/dev/ti/if_ti.c (.../head) (revision 270879) +++ sys/dev/ti/if_ti.c (.../projects/sendfile) (revision 270881) @@ -1637,7 +1637,7 @@ ti_newbuf_jumbo(struct ti_softc *sc, int idx, stru m[i]->m_data = (void *)sf_buf_kva(sf[i]); m[i]->m_len = PAGE_SIZE; MEXTADD(m[i], sf_buf_kva(sf[i]), PAGE_SIZE, - sf_buf_mext, (void*)sf_buf_kva(sf[i]), sf[i], + sf_mext_free, (void*)sf_buf_kva(sf[i]), sf[i], 0, EXT_DISPOSABLE); m[i]->m_next = m[i+1]; } @@ -1702,7 +1702,7 @@ nobufs: if (m[i]) m_freem(m[i]); if (sf[i]) - sf_buf_mext((void *)sf_buf_kva(sf[i]), sf[i]); + sf_mext_free((void *)sf_buf_kva(sf[i]), sf[i]); } return (ENOBUFS); } Index: sys/vm/vm_pager.h =================================================================== --- sys/vm/vm_pager.h (.../head) (revision 270879) +++ sys/vm/vm_pager.h (.../projects/sendfile) (revision 270881) @@ -51,18 +51,21 @@ typedef vm_object_t pgo_alloc_t(void *, vm_ooffset struct ucred *); typedef void pgo_dealloc_t(vm_object_t); typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int); +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int, + void(*)(void *), void *); typedef void 
pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *); typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *); typedef void pgo_pageunswapped_t(vm_page_t); struct pagerops { - pgo_init_t *pgo_init; /* Initialize pager. */ - pgo_alloc_t *pgo_alloc; /* Allocate pager. */ - pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ - pgo_getpages_t *pgo_getpages; /* Get (read) page. */ - pgo_putpages_t *pgo_putpages; /* Put (write) page. */ - pgo_haspage_t *pgo_haspage; /* Does pager have page? */ - pgo_pageunswapped_t *pgo_pageunswapped; + pgo_init_t *pgo_init; /* Initialize pager. */ + pgo_alloc_t *pgo_alloc; /* Allocate pager. */ + pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ + pgo_getpages_t *pgo_getpages; /* Get (read) page. */ + pgo_getpages_async_t *pgo_getpages_async; /* Get page asyncly. */ + pgo_putpages_t *pgo_putpages; /* Put (write) page. */ + pgo_haspage_t *pgo_haspage; /* Query page. */ + pgo_pageunswapped_t *pgo_pageunswapped; }; extern struct pagerops defaultpagerops; @@ -103,6 +106,8 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v void vm_pager_bufferinit(void); void vm_pager_deallocate(vm_object_t); static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int); +static __inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int, + int, void(*)(void *), void *); static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *); void vm_pager_init(void); vm_object_t vm_pager_object_lookup(struct pagerlst *, void *); @@ -131,6 +136,27 @@ vm_pager_get_pages( return (r); } +static __inline int +vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count, + int reqpage, void (*iodone)(void *), void *arg) +{ + int r; + + VM_OBJECT_ASSERT_WLOCKED(object); + + if (*pagertab[object->type]->pgo_getpages_async == NULL) { + /* Emulate async operation. 
*/ + r = vm_pager_get_pages(object, m, count, reqpage); + VM_OBJECT_WUNLOCK(object); + (iodone)(arg); + VM_OBJECT_WLOCK(object); + } else + r = (*pagertab[object->type]->pgo_getpages_async)(object, m, + count, reqpage, iodone, arg); + + return (r); +} + static __inline void vm_pager_put_pages( vm_object_t object, Index: sys/vm/vm_page.c =================================================================== --- sys/vm/vm_page.c (.../head) (revision 270879) +++ sys/vm/vm_page.c (.../projects/sendfile) (revision 270881) @@ -2692,6 +2692,8 @@ retrylookup: sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ? vm_page_xbusied(m) : vm_page_busied(m); if (sleep) { + if (allocflags & VM_ALLOC_NOWAIT) + return (NULL); /* * Reference the page before unlocking and * sleeping so that the page daemon is less @@ -2719,6 +2721,8 @@ retrylookup: } m = vm_page_alloc(object, pindex, allocflags & ~VM_ALLOC_IGN_SBUSY); if (m == NULL) { + if (allocflags & VM_ALLOC_NOWAIT) + return (NULL); VM_OBJECT_WUNLOCK(object); VM_WAIT; VM_OBJECT_WLOCK(object); Index: sys/vm/vm_page.h =================================================================== --- sys/vm/vm_page.h (.../head) (revision 270879) +++ sys/vm/vm_page.h (.../projects/sendfile) (revision 270881) @@ -391,6 +391,7 @@ vm_page_t PHYS_TO_VM_PAGE(vm_paddr_t pa); #define VM_ALLOC_IGN_SBUSY 0x1000 /* vm_page_grab() only */ #define VM_ALLOC_NODUMP 0x2000 /* don't include in dump */ #define VM_ALLOC_SBUSY 0x4000 /* Shared busy the page */ +#define VM_ALLOC_NOWAIT 0x8000 /* Return NULL instead of sleeping */ #define VM_ALLOC_COUNT_SHIFT 16 #define VM_ALLOC_COUNT(count) ((count) << VM_ALLOC_COUNT_SHIFT) Index: sys/vm/vnode_pager.c =================================================================== --- sys/vm/vnode_pager.c (.../head) (revision 270879) +++ sys/vm/vnode_pager.c (.../projects/sendfile) (revision 270881) @@ -83,6 +83,8 @@ static int vnode_pager_input_smlfs(vm_object_t obj static int vnode_pager_input_old(vm_object_t object, vm_page_t m); 
static void vnode_pager_dealloc(vm_object_t); static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int); +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int, + void(*)(void *), void *); static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *); static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t, @@ -92,6 +94,7 @@ struct pagerops vnodepagerops = { .pgo_alloc = vnode_pager_alloc, .pgo_dealloc = vnode_pager_dealloc, .pgo_getpages = vnode_pager_getpages, + .pgo_getpages_async = vnode_pager_getpages_async, .pgo_putpages = vnode_pager_putpages, .pgo_haspage = vnode_pager_haspage, }; @@ -664,6 +667,40 @@ vnode_pager_getpages(vm_object_t object, vm_page_t return rtval; } +static int +vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count, + int reqpage, void (*iodone)(void *), void *arg) +{ + int rtval; + struct vnode *vp; + int bytes = count * PAGE_SIZE; + + vp = object->handle; + VM_OBJECT_WUNLOCK(object); + rtval = VOP_GETPAGES_ASYNC(vp, m, bytes, reqpage, 0, iodone, arg); + KASSERT(rtval != EOPNOTSUPP, + ("vnode_pager: FS getpages_async not implemented\n")); + VM_OBJECT_WLOCK(object); + return rtval; +} + +struct getpages_softc { + vm_page_t *m; + struct buf *bp; + vm_object_t object; + vm_offset_t kva; + off_t foff; + int size; + int count; + int unmapped; + int reqpage; + void (*iodone)(void *); + void *arg; +}; + +int vnode_pager_generic_getpages_done(struct getpages_softc *); +void vnode_pager_generic_getpages_done_async(struct buf *); + /* * This is now called from local media FS's to operate against their * own vnodes if they fail to implement VOP_GETPAGES. 
@@ -670,11 +707,11 @@ vnode_pager_getpages(vm_object_t object, vm_page_t */ int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount, - int reqpage) + int reqpage, void (*iodone)(void *), void *arg) { vm_object_t object; vm_offset_t kva; - off_t foff, tfoff, nextoff; + off_t foff; int i, j, size, bsize, first; daddr_t firstaddr, reqblock; struct bufobj *bo; @@ -684,6 +721,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ struct mount *mp; int count; int error; + int unmapped; object = vp->v_object; count = bytecount / PAGE_SIZE; @@ -891,8 +929,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ * requires mapped buffers. */ mp = vp->v_mount; - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 && - unmapped_buf_allowed) { + unmapped = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS)); + if (unmapped && unmapped_buf_allowed) { bp->b_data = unmapped_buf; bp->b_kvabase = unmapped_buf; bp->b_offset = 0; @@ -905,7 +943,6 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ /* build a minimal buffer header */ bp->b_iocmd = BIO_READ; - bp->b_iodone = bdone; KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred")); KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred")); bp->b_rcred = crhold(curthread->td_ucred); @@ -923,10 +960,88 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ /* do the input */ bp->b_iooffset = dbtob(bp->b_blkno); - bstrategy(bp); - bwait(bp, PVM, "vnread"); + if (iodone) { /* async */ + struct getpages_softc *sc; + sc = malloc(sizeof(*sc), M_TEMP, M_WAITOK); + + sc->m = m; + sc->bp = bp; + sc->object = object; + sc->foff = foff; + sc->size = size; + sc->count = count; + sc->unmapped = unmapped; + sc->reqpage = reqpage; + sc->kva = kva; + + sc->iodone = iodone; + sc->arg = arg; + + bp->b_iodone = vnode_pager_generic_getpages_done_async; + bp->b_caller1 = sc; + BUF_KERNPROC(bp); + bstrategy(bp); + /* Good bye! 
*/ + } else { + struct getpages_softc sc; + + sc.m = m; + sc.bp = bp; + sc.object = object; + sc.foff = foff; + sc.size = size; + sc.count = count; + sc.unmapped = unmapped; + sc.reqpage = reqpage; + sc.kva = kva; + + bp->b_iodone = bdone; + bstrategy(bp); + bwait(bp, PVM, "vnread"); + error = vnode_pager_generic_getpages_done(&sc); + } + + return (error ? VM_PAGER_ERROR : VM_PAGER_OK); +} + +void +vnode_pager_generic_getpages_done_async(struct buf *bp) +{ + struct getpages_softc *sc = bp->b_caller1; + int error; + + error = vnode_pager_generic_getpages_done(sc); + + vm_page_xunbusy(sc->m[sc->reqpage]); + + sc->iodone(sc->arg); + + free(sc, M_TEMP); +} + +int +vnode_pager_generic_getpages_done(struct getpages_softc *sc) +{ + vm_object_t object; + vm_offset_t kva; + vm_page_t *m; + struct buf *bp; + off_t foff, tfoff, nextoff; + int i, size, count, unmapped, reqpage; + int error = 0; + + m = sc->m; + bp = sc->bp; + object = sc->object; + foff = sc->foff; + size = sc->size; + count = sc->count; + unmapped = sc->unmapped; + reqpage = sc->reqpage; + kva = sc->kva; + if ((bp->b_ioflags & BIO_ERROR) != 0) error = EIO; @@ -939,7 +1054,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ } if ((bp->b_flags & B_UNMAPPED) == 0) pmap_qremove(kva, count); - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) { + if (unmapped) { bp->b_data = (caddr_t)kva; bp->b_kvabase = (caddr_t)kva; bp->b_flags &= ~B_UNMAPPED; @@ -995,7 +1110,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ if (error) { printf("vnode_pager_getpages: I/O read error\n"); } - return (error ? 
VM_PAGER_ERROR : VM_PAGER_OK); + + return (error); } /* Index: sys/vm/vnode_pager.h =================================================================== --- sys/vm/vnode_pager.h (.../head) (revision 270879) +++ sys/vm/vnode_pager.h (.../projects/sendfile) (revision 270881) @@ -41,7 +41,7 @@ #ifdef _KERNEL int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, - int count, int reqpage); + int count, int reqpage, void (*iodone)(void *), void *arg); int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m, int count, boolean_t sync, int *rtvals); Index: usr.bin/netstat/inet.c =================================================================== --- usr.bin/netstat/inet.c (.../head) (revision 270879) +++ usr.bin/netstat/inet.c (.../projects/sendfile) (revision 270881) @@ -137,7 +137,7 @@ pcblist_sysctl(int proto, const char *name, char * static void sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) { - xsb->sb_cc = sb->sb_cc; + xsb->sb_cc = sb->sb_ccc; xsb->sb_hiwat = sb->sb_hiwat; xsb->sb_mbcnt = sb->sb_mbcnt; xsb->sb_mcnt = sb->sb_mcnt; @@ -479,7 +479,8 @@ protopr(u_long off, const char *name, int af1, int printf("%6u %6u %6u ", tp->t_sndrexmitpack, tp->t_rcvoopack, tp->t_sndzerowin); } else { - printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc); + printf("%6u %6u ", + so->so_rcv.sb_cc, so->so_snd.sb_cc); } if (numeric_port) { if (inp->inp_vflag & INP_IPV4) { Index: usr.bin/netstat/netgraph.c =================================================================== --- usr.bin/netstat/netgraph.c (.../head) (revision 270879) +++ usr.bin/netstat/netgraph.c (.../projects/sendfile) (revision 270881) @@ -119,7 +119,7 @@ netgraphprotopr(u_long off, const char *name, int if (Aflag) printf("%8lx ", (u_long) this); printf("%-5.5s %6u %6u ", - name, sockb.so_rcv.sb_cc, sockb.so_snd.sb_cc); + name, sockb.so_rcv.sb_ccc, sockb.so_snd.sb_ccc); /* Get info on associated node */ if (ngpcb.node_id == 0 || csock == -1) Index: usr.bin/netstat/unix.c 
=================================================================== --- usr.bin/netstat/unix.c (.../head) (revision 270879) +++ usr.bin/netstat/unix.c (.../projects/sendfile) (revision 270881) @@ -287,7 +287,8 @@ unixdomainpr(struct xunpcb *xunp, struct xsocket * } else { printf("%8lx %-6.6s %6u %6u %8lx %8lx %8lx %8lx", (long)so->so_pcb, socktype[so->so_type], so->so_rcv.sb_cc, - so->so_snd.sb_cc, (long)unp->unp_vnode, (long)unp->unp_conn, + so->so_snd.sb_cc, (long)unp->unp_vnode, + (long)unp->unp_conn, (long)LIST_FIRST(&unp->unp_refs), (long)LIST_NEXT(unp, unp_reflink)); } Index: usr.bin/systat/netstat.c =================================================================== --- usr.bin/systat/netstat.c (.../head) (revision 270879) +++ usr.bin/systat/netstat.c (.../projects/sendfile) (revision 270881) @@ -333,8 +333,8 @@ enter_kvm(struct inpcb *inp, struct socket *so, in struct netinfo *p; if ((p = enter(inp, state, proto)) != NULL) { - p->ni_rcvcc = so->so_rcv.sb_cc; - p->ni_sndcc = so->so_snd.sb_cc; + p->ni_rcvcc = so->so_rcv.sb_ccc; + p->ni_sndcc = so->so_snd.sb_ccc; } } Index: usr.bin/bluetooth/btsockstat/btsockstat.c =================================================================== --- usr.bin/bluetooth/btsockstat/btsockstat.c (.../head) (revision 270879) +++ usr.bin/bluetooth/btsockstat/btsockstat.c (.../projects/sendfile) (revision 270881) @@ -255,8 +255,8 @@ hcirawpr(kvm_t *kvmd, u_long addr) (unsigned long) pcb.so, (unsigned long) this, pcb.flags, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, pcb.addr.hci_node); } } /* hcirawpr */ @@ -303,8 +303,8 @@ l2caprawpr(kvm_t *kvmd, u_long addr) "%-8lx %-8lx %6d %6d %-17.17s\n", (unsigned long) pcb.so, (unsigned long) this, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, bdaddrpr(&pcb.src, NULL, 0)); } } /* l2caprawpr */ @@ -361,8 +361,8 @@ l2cappr(kvm_t *kvmd, u_long addr) fprintf(stdout, "%-8lx %6d %6d %-17.17s/%-5d %-17.17s %-5d %s\n", (unsigned long) 
this, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, bdaddrpr(&pcb.src, local, sizeof(local)), pcb.psm, bdaddrpr(&pcb.dst, remote, sizeof(remote)), @@ -467,8 +467,8 @@ rfcommpr(kvm_t *kvmd, u_long addr) fprintf(stdout, "%-8lx %6d %6d %-17.17s %-17.17s %-4d %-4d %s\n", (unsigned long) this, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, bdaddrpr(&pcb.src, local, sizeof(local)), bdaddrpr(&pcb.dst, remote, sizeof(remote)), pcb.channel, --hTiIB9CRvBOLTyqY-- From owner-freebsd-arch@FreeBSD.ORG Sun Aug 31 20:15:40 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A5E0D898; Sun, 31 Aug 2014 20:15:40 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [IPv6:2001:470:1f05:b76::196]) by mx1.freebsd.org (Postfix) with ESMTP id 890D713DF; Sun, 31 Aug 2014 20:15:40 +0000 (UTC) Received: from u10-2-16-021.office.norse-data.com (unknown [50.204.88.51]) by elvis.mu.org (Postfix) with ESMTPSA id A749F346DDEF; Sun, 31 Aug 2014 13:15:39 -0700 (PDT) Message-ID: <540382E2.3040004@freebsd.org> Date: Sun, 31 Aug 2014 13:17:38 -0700 From: Alfred Perlstein Organization: FreeBSD User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: freebsd-arch@freebsd.org, Gleb Smirnoff Subject: Re: [CFT/review] new sendfile(2) References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> In-Reply-To: <20140831165022.GE7693@FreeBSD.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: 
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 31 Aug 2014 20:15:40 -0000

On 8/31/14 9:50 AM, Gleb Smirnoff wrote:
> John-Mark,
>
> On Tue, Jul 29, 2014 at 04:24:04PM -0700, John-Mark Gurney wrote:
> J> Gleb Smirnoff wrote this message on Thu, May 29, 2014 at 14:20 +0400:
> J> > One of the approaches we are experimenting with is new sendfile(2)
> J> > implementation, that doesn't block on the I/O done from the file
> J> > descriptor.
> J>
> J> I know this is a reply to an old message, but...
>
> I am also sorry for late reply on late reply :)
>
> J> How is this different from:
> J> SF_NODISKIO. This flag causes any sendfile() call which would
> J> block on disk I/O to instead return EBUSY. Busy servers may bene-
> J> fit by transferring requests that would block to a separate I/O
> J> worker thread.
>
> It is very different. New sendfile(2) simply doesn't block and returns
> success :) The I/O completes outside of syscall context.
>
> J> > 1) Split of socket buffer sb_cc field into sb_acc and sb_ccc. Where
> J> > sb_acc stands for "available character count" and sb_ccc is "claimed
> J> > character count". This allows us to write a data to a socket, that is
> J> > not ready yet. The data sits in the socket, consumes its space, and
> J> > keeps itself in the right order with earlier or later writes to socket.
> J> > But it can be send only after it is marked as ready. This change is
> J> > split across many files.
> J>
> J> This change really should be split out and possibly committed seperately
> J> after a review by the proper people...
>
> Of course. It actually makes 80% of the volume of the patch.

This change has high value, although it has a lot of changes for what
appears to be an interesting edge case.

As I read this it really confused me, can't this be accomplished by
utilizing the socket's callback and pointer parameter instead?

Basically you would put all that accounting inside a struct hung off of
so->sb_snd.sb_upcallarg and set a callback to do your queuing.

That is how you can async drive thread to queue more data, in fact by
using aio to read/write to the socket from a stream.

It should be relatively simple, the only tricky part being that you'll
need to watch your locks and sleeps inside the so->sb_snd.sb_upcall
function.

Basically move the sb_acc and all of that into a special struct hung off
of so->sb_snd.sb_upcallarg and leverage so->sb_snd.sb_upcall to queue
more data as space becomes available.

At least that's how I would have tried to accomplish this... but maybe
you went down this path and hit a non-starter?

-Alfred

From owner-freebsd-arch@FreeBSD.ORG Sun Aug 31 22:05:20 2014
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 7CBDC63A;
	Sun, 31 Aug 2014 22:05:20 +0000 (UTC)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id B83AF1F4B;
	Sun, 31 Aug 2014 22:05:19 +0000 (UTC)
Received: from critter.freebsd.dk (unknown [192.168.60.3])
	by phk.freebsd.dk (Postfix) with ESMTP id 3D11116D0;
	Sun, 31 Aug 2014 22:05:13 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.14.9/8.14.9) with ESMTP id s7VM5B5v002771;
	Sun, 31 Aug 2014 22:05:12 GMT (envelope-from phk@phk.freebsd.dk)
To: Alfred Perlstein
Subject: Re: script(2) [was: [CFT/review] new sendfile(2)]
In-reply-to: <540382E2.3040004@freebsd.org>
From: "Poul-Henning Kamp"
References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID:
<2769.1409522711.1@critter.freebsd.dk>
Date: Sun, 31 Aug 2014 22:05:11 +0000
Message-ID: <2770.1409522711@critter.freebsd.dk>
Cc: Gleb Smirnoff , freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 31 Aug 2014 22:05:20 -0000

--------
In message <540382E2.3040004@freebsd.org>, Alfred Perlstein writes:

Can I inject an old idea whose time may finally have arrived?

The basic thing we are trying to do here is to avoid userland/kernel
context-switches, because they are so expensive.

This is a very old problem: the TTY line-disciplines, PCAP, accept
filters and sendfile are all hacks that try to "optimize" specific
use-cases.

Imagine we instead define a byte-code-engine which interprets a
string of commands, sort of like the pcap filtering engine already
does. The corresponding syscall would be "follow_the_script(2)"

The first set of commands in the language would be a sensible
subset of syscalls and library functions

	open file
	close file
	accept()
	read()
	write()
	memcpy()
	...

All of these functions work *exactly* the same as they would have
in userland, arguments mean exactly the same etc.

The value of this facility explodes as we add smarter commands which
can do stuff we would normally have to return to userland for:

	if {...} else {...}
	for(;;) {...}
	do {...} while()
	move bytes (directly) from fd to fd (optional: in the background)
	read at least N bytes
	write with timeout
	interpret [bl]e{8,16,32,64} at address
	search for pattern X

By suitably defining commands in the bytecode, pretty much all of
sendfile and accept-filters can be implemented using
this facility instead.

PCAP and ldisc are probably too entrenched to be worth the bother,
and neither is relevant in typical network server usage.
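
As an illustration only, here is a minimal user-space sketch of the kind of bounds-checked byte-code engine described above. Everything in it is an assumption made for the example: the names script_insn and script_run, the opcode set, and the use of memory buffers instead of file descriptors. It shows two of the listed commands ("interpret be16 at address" and "move bytes") plus a conditional jump that may only go forward, which is one simple way to guarantee termination.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes; not an existing kernel interface. */
enum script_op {
	OP_LDBE16,	/* acc = big-endian 16-bit value at in[a] */
	OP_COPY,	/* copy acc bytes from in[a] to out[b] */
	OP_JLT,		/* if acc < a, jump forward to instruction b */
	OP_END		/* stop and report acc */
};

struct script_insn {
	uint8_t op;
	uint32_t a;
	uint32_t b;
};

/*
 * Interpret a script. Returns 0 on success and stores the accumulator
 * in *result; returns -1 on any out-of-bounds access, backward jump,
 * unknown opcode, or a script that runs off its end.
 */
static int
script_run(const struct script_insn *prog, size_t ninsn,
    const uint8_t *in, size_t inlen, uint8_t *out, size_t outlen,
    uint32_t *result)
{
	uint32_t acc = 0;
	size_t pc = 0;

	while (pc < ninsn) {
		const struct script_insn *i = &prog[pc];

		switch (i->op) {
		case OP_LDBE16:
			/* Hard boundary check before touching input. */
			if (i->a > inlen || inlen - i->a < 2)
				return (-1);
			acc = (uint32_t)in[i->a] << 8 | in[i->a + 1];
			pc++;
			break;
		case OP_COPY:
			if (i->a > inlen || acc > inlen - i->a)
				return (-1);
			if (i->b > outlen || acc > outlen - i->b)
				return (-1);
			memcpy(out + i->b, in + i->a, acc);
			pc++;
			break;
		case OP_JLT:
			/* Forward-only jumps: the script cannot loop forever. */
			if (i->b <= pc)
				return (-1);
			pc = (acc < i->a) ? i->b : pc + 1;
			break;
		case OP_END:
			*result = acc;
			return (0);
		default:
			return (-1);
		}
	}
	return (-1);
}
```

A script of { OP_LDBE16 0, OP_COPY 2 0, OP_END } reads a 16-bit length prefix and copies that many payload bytes. A real facility would need loops with an instruction budget, copyin/copyout and fd-based commands, none of which this sketch attempts.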

One reason I think this idea's time has come is that the current
proposal for HTTP/2.0 is a mess, and will be very hard to implement
to anything approaching wire-speed without such a facility.

And yes, if we go far enough, you can basically push an entire
static-content HTTP/1.1 server into the kernel that way...

Of course the language has to be safe to execute in the kernel:
Hard boundary checks, copyin/copyout and all that, but that is
not rocket science: PCAP solved it ages ago.

And it's really not that hard to implement...

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG Sun Aug 31 23:33:27 2014
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 8DFFE154
	for ; Sun, 31 Aug 2014 23:33:27 +0000 (UTC)
Received: from elvis.mu.org (elvis.mu.org [IPv6:2001:470:1f05:b76::196])
	by mx1.freebsd.org (Postfix) with ESMTP id 6E9B51952
	for ; Sun, 31 Aug 2014 23:33:27 +0000 (UTC)
Received: from u10-2-16-021.office.norse-data.com (unknown [50.204.88.51])
	by elvis.mu.org (Postfix) with ESMTPSA id 5D5C2346DE13;
	Sun, 31 Aug 2014 16:33:26 -0700 (PDT)
Message-ID: <5403B13C.60008@freebsd.org>
Date: Sun, 31 Aug 2014 16:35:24 -0700
From: Alfred Perlstein
Organization: FreeBSD
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: freebsd-arch@freebsd.org, Poul-Henning Kamp
Subject: Re: script(2) [was: [CFT/review] new sendfile(2)]
References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk>
In-Reply-To: <2770.1409522711@critter.freebsd.dk> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 31 Aug 2014 23:33:27 -0000 Lua at the syscall level makes sense. :) -Alfred On 8/31/14 3:05 PM, Poul-Henning Kamp wrote: > -------- > In message <540382E2.3040004@freebsd.org>, Alfred Perlstein writes: > > Can I inject an old idea whose time may finally have arrived ? > > The basic thing we are trying to do here is to avoid userland/kernel > context-switches, because they are so expensive. > > This is a very old problem, the TTY line-disciplines, PCAP, accept > filters and sendfile are all hacks that try to "optimize" specific > use-cases. > > Imagine we instead define a byte-code-engine which interprets a > string of commands, sort of like the pcap filtering engine already > does. The corresponding syscall would be "follow_the_script(2)" > > The first set of commands in the language would be a sensible > subset of syscalls and library functions > > open file > close file > accept() > read() > write() > memcpy() > ... > > All of these functions works *exactly* the same as they would have > in userland, arguments mean exactly the same etc. > > The value of this facility explodes as we add smarter commands which > can do stuff we would normally have to return to userland for: > > if {...} else {...} > for(;;) {...} > do {...} while() > move bytes (directly) from fd to fd (optional: in the background) > read at least N bytes > write with timeout > interpret [bl]e{8,16,32,64} at address > search for pattern X > > By suitably defining commands in the bytecode, pretty much all of > sendfile and accept-filters can be implemented using > this facility instead. 
> > PCAP and ldisc are probably to entrenched to be worth the bother, > and neither are not relevant in typical network server usage. > > One reason I think this ideas time as come is that the current > proposal for HTTP/2.0 is a mess, and will be very hard to implement > to anything approaching wire-speed without such a facility. > > And yes, if we go far enough, you can basically push an entire > static-content HTTP/1.1 server into the kernel that way... > > Of course the language has to be safe to execute in the kernel: > Hard boundary checks, copyin/copyout and all that, but that is > not rocket science: PCAP solved it ages ago. > > And it's really not that hard to implement... > > Poul-Henning > From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 03:10:53 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CF5BEA2B; Mon, 1 Sep 2014 03:10:53 +0000 (UTC) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "cell.glebius.int.ru", Issuer "cell.glebius.int.ru" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C04A61E5D; Mon, 1 Sep 2014 03:10:52 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.9/8.14.9) with ESMTP id s813AnQ9089758 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 1 Sep 2014 07:10:49 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.9/8.14.9/Submit) id s813AnHP089757; Mon, 1 Sep 2014 07:10:49 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Mon, 1 Sep 2014 07:10:49 +0400 From: Gleb Smirnoff To: 
Alfred Perlstein Subject: Re: [CFT/review] new sendfile(2) Message-ID: <20140901031049.GF7693@glebius.int.ru> References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <540382E2.3040004@freebsd.org> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 03:10:53 -0000

  Alfred,

On Sun, Aug 31, 2014 at 01:17:38PM -0700, Alfred Perlstein wrote:
A> > J> > 1) Split of socket buffer sb_cc field into sb_acc and sb_ccc. Where
A> > J> > sb_acc stands for "available character count" and sb_ccc is "claimed
A> > J> > character count". This allows us to write a data to a socket, that is
A> > J> > not ready yet. The data sits in the socket, consumes its space, and
A> > J> > keeps itself in the right order with earlier or later writes to socket.
A> > J> > But it can be sent only after it is marked as ready. This change is
A> > J> > split across many files.
A> > J>
A> > J> This change really should be split out and possibly committed separately
A> > J> after a review by the proper people...
A> >
A> > Of course. It actually makes 80% of the volume of the patch.
A>
A> This change has high value, although it has a lot of changes for what
A> appears to be an interesting edge case.
A>
A> As I read this it really confused me, can't this be accomplished by
A> utilizing the socket's callback and pointer parameter instead?
A>
A> Basically you would put all that accounting inside a struct hung off of
A> so->sb_snd.sb_upcallarg and set a callback to do your queuing.
A>
A> That is how you can async drive thread to queue more data, in fact by
A> using aio to read/write to the socket from a stream.
A>
A> It should be relatively simple, the only tricky part being that you'll
A> need to watch your locks and sleeps inside the so->sb_snd.sb_upcall
A> function.
A>
A> Basically move the sb_acc and all of that into a special struct hung off
A> of so->sb_snd.sb_upcallarg and leverage so->sb_snd.sb_upcall to queue
A> more data as space becomes available.
A>
A> At least that's how I would have tried to accomplish this... but maybe
A> you went down this path and hit a non-starter?

As far as I understand your proposal, you suggest going even further and making sendfile(2) not block when the socket buffer is full, so that we can do sendfile(fd, s, 0, 1 Gb, ...) and return immediately, with the kernel bouncing between the disk and the socket buffer until the end. This is another step towards Poul-Henning's idea, which is in a nearby email.

No, I am not there yet :) My sendfile(2) simply doesn't block on the disk I/O. If the socket buffer is full, it will either block or, if the socket is SS_NBIO, return EAGAIN.

The change to socket buffers in the patch is quite primitive. It touches so many files just because it substitutes bare access to the sockbuf field with an inline.

--
Totus tuus, Glebius.
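The sb_acc/sb_ccc split discussed in this message can be illustrated with a toy userland model. This is only a minimal sketch in Python, not the kernel patch: ToySockbuf, hiwat, append(), append_pending(), complete() and drain() are hypothetical names made up for the illustration, and only sb_acc and sb_ccc come from the thread. The point it tries to show is that data which is not ready yet still consumes buffer space and keeps its place in the queue, while only the ready prefix can be sent.

```python
from collections import deque


class ToySockbuf:
    """Toy model of a send buffer with claimed and available byte counts."""

    def __init__(self, hiwat):
        self.hiwat = hiwat    # space limit (hypothetical parameter)
        self.sb_ccc = 0       # claimed character count: every queued byte
        self.sb_acc = 0       # available character count: ready prefix only
        self.chain = deque()  # entries in write order

    def _reserve(self, length, data):
        # A full buffer is where a real socket would block or fail
        # with EAGAIN; the model simply raises.
        if self.sb_ccc + length > self.hiwat:
            raise BlockingIOError("socket buffer full")
        entry = {"len": length, "data": data}
        self.chain.append(entry)
        self.sb_ccc += length
        self._recount()
        return entry

    def append(self, data):
        """Queue data that is ready to be sent right away."""
        return self._reserve(len(data), data)

    def append_pending(self, length):
        """Queue a placeholder whose contents are still being read."""
        return self._reserve(length, None)

    def complete(self, entry, data):
        """Mark a pending entry as ready once its I/O has finished."""
        if len(data) != entry["len"]:
            raise ValueError("completed data must match the reserved length")
        entry["data"] = data
        self._recount()

    def _recount(self):
        # Only the ready entries before the first pending one are available.
        acc = 0
        for entry in self.chain:
            if entry["data"] is None:
                break
            acc += entry["len"]
        self.sb_acc = acc

    def drain(self):
        """Hand out everything that is available, in order."""
        out = []
        while self.chain and self.chain[0]["data"] is not None:
            entry = self.chain.popleft()
            self.sb_ccc -= entry["len"]
            out.append(entry["data"])
        self._recount()
        return b"".join(out)


sb = ToySockbuf(hiwat=32)
sb.append(b"HEADER")
body = sb.append_pending(10)   # "disk I/O" still in flight
sb.append(b"TRAILER")          # queued behind the pending data
first = sb.drain()             # only the header can go out now
sb.complete(body, b"0123456789")
rest = sb.drain()              # the body and the trailer follow, in order
```

In this model the trailer is accepted into the buffer immediately but cannot overtake the pending body, which mirrors the ordering property described in the quoted text.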
From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 05:38:02 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 176CA645; Mon, 1 Sep 2014 05:38:02 +0000 (UTC) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 80C7C1562; Mon, 1 Sep 2014 05:38:01 +0000 (UTC) Received: from critter.freebsd.dk (unknown [192.168.60.3]) by phk.freebsd.dk (Postfix) with ESMTP id 01BE81598; Mon, 1 Sep 2014 05:37:59 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.9/8.14.9) with ESMTP id s815bxSn004205; Mon, 1 Sep 2014 05:37:59 GMT (envelope-from phk@phk.freebsd.dk) To: Alfred Perlstein Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] In-reply-to: <5403B13C.60008@freebsd.org> From: "Poul-Henning Kamp" References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> <5403B13C.60008@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <4203.1409549879.1@critter.freebsd.dk> Date: Mon, 01 Sep 2014 05:37:59 +0000 Message-ID: <4204.1409549879@critter.freebsd.dk> Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 05:38:02 -0000 -------- In message <5403B13C.60008@freebsd.org>, Alfred Perlstein writes: >Lua at the syscall level makes sense. :) I doubt it. We're looking at high performance stuff and we don't want a silly parser and string processing involved. 
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 12:30:52 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E027CD49; Mon, 1 Sep 2014 12:30:52 +0000 (UTC) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 99E6714B1; Mon, 1 Sep 2014 12:30:52 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.82 (FreeBSD)) (envelope-from ) id 1XOQkt-000I4Y-HB; Mon, 01 Sep 2014 16:30:43 +0400 Date: Mon, 1 Sep 2014 16:30:43 +0400 From: Slawa Olhovchenkov To: Poul-Henning Kamp Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] Message-ID: <20140901123043.GA15867@zxy.spb.ru> References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2770.1409522711@critter.freebsd.dk> User-Agent: Mutt/1.5.23 (2014-03-12) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false Cc: Alfred Perlstein , Gleb Smirnoff , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 12:30:53 -0000 On Sun, Aug 
31, 2014 at 10:05:11PM +0000, Poul-Henning Kamp wrote:
> --------
> In message <540382E2.3040004@freebsd.org>, Alfred Perlstein writes:
>
> Can I inject an old idea whose time may finally have arrived ?
>
> The basic thing we are trying to do here is to avoid userland/kernel
> context-switches, because they are so expensive.
>
> This is a very old problem, the TTY line-disciplines, PCAP, accept
> filters and sendfile are all hacks that try to "optimize" specific
> use-cases.

And even for firewalling, yes? Firewall rules (up to Level 7) could map onto some kind of trie (similar to a regex tree or a radix trie) with advanced match and extract rules in the nodes and leaves.

From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 20:04:22 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 79BE8B44 for ; Mon, 1 Sep 2014 20:04:22 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 64FF61B98 for ; Mon, 1 Sep 2014 20:04:22 +0000 (UTC) Received: from u10-2-16-021.office.norse-data.com (unknown [50.204.88.51]) by elvis.mu.org (Postfix) with ESMTPSA id 43929346DE11; Mon, 1 Sep 2014 13:04:16 -0700 (PDT) Message-ID: <5404D1B8.9010006@mu.org> Date: Mon, 01 Sep 2014 13:06:16 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: freebsd-arch@freebsd.org, Poul-Henning Kamp Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> <5403B13C.60008@freebsd.org> <4204.1409549879@critter.freebsd.dk> In-Reply-To: <4204.1409549879@critter.freebsd.dk>
Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 20:04:22 -0000 On 8/31/14 10:37 PM, Poul-Henning Kamp wrote: > -------- > In message <5403B13C.60008@freebsd.org>, Alfred Perlstein writes: > >> Lua at the syscall level makes sense. :) > I doubt it. > > We're looking at high performance stuff and we don't want a silly > parser and string processing involved. > Would it really matter? Lua is bytecode, sure it's "slow" but if the API exported to it has hooks for things like "read mbufs from socket" as opposed to "copyout data" then you can do zero copy without context switches. In addition you get a language that people know as opposed to YADSL (yet another domain specific language). http://duktape.org/ :) -Alfred From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 20:57:16 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D95B5E55; Mon, 1 Sep 2014 20:57:16 +0000 (UTC) Received: from mail.iXsystems.com (mail.ixsystems.com [12.229.62.4]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3CAE4107E; Mon, 1 Sep 2014 20:57:16 +0000 (UTC) Received: from localhost (mail.ixsystems.com [10.2.55.1]) by mail.iXsystems.com (Postfix) with ESMTP id DF02D7C0C6; Mon, 1 Sep 2014 13:57:15 -0700 (PDT) Received: from mail.iXsystems.com ([10.2.55.1]) by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP id 54770-02; 
Mon, 1 Sep 2014 13:57:15 -0700 (PDT) Received: from [10.8.0.34] (unknown [10.8.0.34]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id 2CC137C0C1; Mon, 1 Sep 2014 13:57:13 -0700 (PDT) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] From: Jordan Hubbard In-Reply-To: <2770.1409522711@critter.freebsd.dk> Date: Mon, 1 Sep 2014 13:57:12 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> To: Poul-Henning Kamp X-Mailer: Apple Mail (2.1878.6) Cc: Alfred Perlstein , Gleb Smirnoff , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 20:57:17 -0000

On Aug 31, 2014, at 3:05 PM, Poul-Henning Kamp wrote:

> Can I inject an old idea whose time may finally have arrived ?
> [ … ]
> Imagine we instead define a byte-code-engine which interprets a
> string of commands, sort of like the pcap filtering engine already
> does. The corresponding syscall would be "follow_the_script(2)"

Having seen this pattern used for several kernel-related things in a few of my former lives, I think this idea has a lot of merit, though I’d be careful not to conceptualize it purely (or only) as an “engine for off-loading work to in order to avoid the kernel/userland boundary cost” since I think the concept has a much broader application than that. It can also obviously be used for match filters (for the packet capture example already given) or security policies (firewalling, sandboxing) that are in the kernel simply because that’s the most logical place to put them, and that means that the “script” may be a full-on complex task or a really short little script fragment (scriptlet?) which potentially needs access to a lot more of the kernel than the file primitives. If it’s a firewall related task, obviously it wants to be able to interpose itself into the networking path. If it’s a sandbox rule script, it’s going to need to be able to gate access to a wide variety of kernel services (not unlike all the checks that phk added for jails). Perhaps that’s what phk meant and I’m just reading his original message too narrowly.

That’s also why I think the rubber will most meet the road in figuring out just how many “bytecode primitives” to surface, a far more bike-sheddy topic than the actual higher-level description format, though we also have plenty of empirical evidence to suggest that the MAC hook mechanism in TrustedBSD already pretty much describes all of the logical places to place the hooks and therefore also suggests what the full enumeration of bytecode primitives might look like. If TrustedBSD added a hook point, consider creating a corresponding primitive which can act on the corresponding subject/target at that point and boom, there’s your trail of breadcrumbs to follow.

I would also add a corresponding DFA engine for acting on paths, since I think that’s a necessary sub-component of the bytecode engine. Unix is path oriented. Allow all of the relevant primitives which act on files to have a DFA for matching the ones it applies to and you’ve really got something pretty powerful.
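The idea of a DFA for matching paths can be made concrete with a very small sketch. It assumes, purely for illustration, that the rules are literal path prefixes; a real rule compiler would accept regular expressions. The names compile_prefixes() and matches() are hypothetical. The table is built once (in userland, say), and the matcher then only walks it one character at a time with no backtracking, which is the property that makes this kind of structure attractive for in-kernel use.

```python
def compile_prefixes(prefixes):
    """Build a DFA, stored as a trie, from literal path prefixes.

    Returns (transitions, accepting). transitions[state] maps one
    character to the next state, and state 0 is the start state.
    """
    transitions = [{}]
    accepting = set()
    for prefix in prefixes:
        state = 0
        for ch in prefix:
            nxt = transitions[state].get(ch)
            if nxt is None:
                transitions.append({})
                nxt = len(transitions) - 1
                transitions[state][ch] = nxt
            state = nxt
        accepting.add(state)
    return transitions, accepting


def matches(dfa, path):
    """Return True if any compiled prefix is a prefix of path."""
    transitions, accepting = dfa
    state = 0
    if state in accepting:      # an empty prefix matches everything
        return True
    for ch in path:
        state = transitions[state].get(ch)
        if state is None:       # no transition: no rule can match any more
            return False
        if state in accepting:  # a whole prefix has been consumed
            return True
    return False


rules = compile_prefixes(["/tmp/", "/var/log/"])
allowed = matches(rules, "/var/log/messages")
denied = matches(rules, "/etc/passwd")
```

Each lookup costs at most one table step per character of the path, regardless of how many rules were compiled in.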
When we implemented application sandboxing in OS X and iOS, we chose to use Scheme as the implementation language (see /usr/share/sandbox on any OS X system for a good selection of examples) and a “sandbox compiler” process to turn that (and the regex DFAs) into bytecode, but we could have honestly chosen almost any scripting language so I really don’t think this discussion needs to get too hung up on the selection of a higher-level language. You want Lua? Sure. Just make it a “rule” that the kernel itself doesn’t have to know beans about Lua and some userland agent or library will turn the Lua code into the appropriate bytecode, and now you’ve got the ability to write your bytecode in Lua. When Lua is no longer in vogue and has been replaced by some other new hotness, that library/agent can be written too without having to change a line of kernel code. Yay for proper abstraction layers and not stuffing interpreters where they don’t belong anyway!

That said, I’ll also point out that we already have a bytecode “engine” in the kernel and a corresponding higher-level language which compiles into it. That language is called D and the bytecode interpreter is the DTrace support code. But all of you already knew that. The fact that Sun only chose to use it for instrumentation and debugging may be coloring everyone’s thinking insofar as what its theoretical limits as a more general purpose mechanism are, I don’t know. We can only speculate as to how much farther Sun might have taken it if they had survived as a company (if each dtrace “worker” were a kernel thread, for example, they could have added looping primitives and other features which assumed a longer lifetime for given units of work).

It’s an interesting topic of discussion, that’s for sure. I had a lot of fun with the sandboxing stuff at Apple. It would be interesting to see where FreeBSD could go with an even more general purpose mechanism.

- Jordan

From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 21:30:37 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 401EC5ED for ; Mon, 1 Sep 2014 21:30:37 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D482713BF for ; Mon, 1 Sep 2014 21:30:36 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s81LUX14009136; Mon, 1 Sep 2014 17:30:34 -0400 (EDT) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s81LUWCs009135; Mon, 1 Sep 2014 17:30:32 -0400 (EDT) (envelope-from wollman) Date: Mon, 1 Sep 2014 17:30:32 -0400 (EDT) From: Garrett Wollman Message-Id: <201409012130.s81LUWCs009135@hergotha.csail.mit.edu> To: jkh@mail.turbofuzz.com Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] X-Newsgroups: mit.lcs.mail.freebsd-arch References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> Organization: none X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Mon, 01 Sep 2014 17:30:34 -0400 (EDT) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on hergotha.csail.mit.edu Cc: freebsd-arch@freebsd.org X-BeenThere:
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 21:30:37 -0000 In article , Jordan writes: >Having seen this pattern used for several kernel-related things in a few >of my former lives, I think this idea has a lot of merit, though I’d be >careful not to conceptualize it purely (or only) as an “engine for >off-loading work to in order to avoid the kernel/userland boundary cost” >since I think the concept has a much broader application than that. [and more good stuff] This is all heading down the road of Exokernel. Except that Exokernel did it with proof-carrying native code.[1] Once they had that (and a few other related pieces), they could use kernel code to define only the bare minimum security properties and push everything else into libraries -- network, filesystems, and so on -- without taking the huge performance hit of the pure Mach-style implementation with privilege-management servers and message-passing and stuff.[2] Other similar systems (of which I think BPF was the first, and certainly one of the first to be widely deployed) avoid the need for a rigorous proof of safety by deliberately limiting the computational power of their virtual machines. -GAWollman [1] If I remember correctly. It's been a long time. I should ask Frans, but it's a holiday so neither of us are in the office. [2] Once every decade or so, the concept of a "library operating system" comes back into vogue before being steamrolled by the market. This time around it might have more staying power, because manycore is here and monolithic operating systems do not scale nicely on 512-core processors. 
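The point about limiting the computational power of the virtual machine can be sketched with a toy filter engine. The instruction set below is hypothetical and only loosely modeled on classic BPF; validate() and run() are illustrative names, not an existing API. The sketch assumes three rules: every jump goes forward and stays inside the program, the last instruction is a return, and out-of-range packet loads reject the packet. Under those rules the program counter strictly increases, so every accepted program terminates without any proof being supplied alongside it.

```python
# Toy instruction set (hypothetical):
#   ("ldb", k)          A = packet[k]; an out-of-range load returns 0
#   ("jeq", k, jt, jf)  skip jt instructions if A == k, otherwise skip jf
#   ("ret", k)          stop and return k


def validate(prog):
    """Accept only programs that provably terminate within bounds."""
    if not prog or prog[-1][0] != "ret":
        return False
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op == "ldb":
            if len(insn) != 2 or insn[1] < 0:
                return False
        elif op == "jeq":
            if len(insn) != 4:
                return False
            jt, jf = insn[2], insn[3]
            if jt < 0 or jf < 0:          # backward jumps would allow loops
                return False
            if pc + 1 + max(jt, jf) >= len(prog):
                return False              # jump target outside the program
        elif op == "ret":
            if len(insn) != 2:
                return False
        else:
            return False                  # unknown opcode
    return True


def run(prog, packet):
    """Interpret a validated program against a bytes-like packet."""
    if not validate(prog):
        raise ValueError("program rejected by validator")
    acc = 0
    pc = 0
    while True:                           # pc only ever moves forward
        insn = prog[pc]
        op = insn[0]
        if op == "ldb":
            if insn[1] >= len(packet):
                return 0                  # bounds check on every load
            acc = packet[insn[1]]
            pc += 1
        elif op == "jeq":
            pc += 1 + (insn[2] if acc == insn[1] else insn[3])
        else:
            return insn[1]


# Accept packets whose byte 0 is 0x45 and whose byte 9 is 6.
prog = [
    ("ldb", 0),
    ("jeq", 0x45, 0, 3),
    ("ldb", 9),
    ("jeq", 6, 0, 1),
    ("ret", 1),
    ("ret", 0),
]
verdict = run(prog, bytes([0x45] + [0] * 8 + [6]))
```

The validator is where the safety argument lives: the interpreter itself needs no step counter or timeout because the rules it enforces leave no way to express a loop.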
From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 21:34:08 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EB4F08B3 for ; Mon, 1 Sep 2014 21:34:08 +0000 (UTC) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id AD05714D0 for ; Mon, 1 Sep 2014 21:34:08 +0000 (UTC) Received: from critter.freebsd.dk (unknown [192.168.60.3]) by phk.freebsd.dk (Postfix) with ESMTP id 6FAD81598; Mon, 1 Sep 2014 21:34:07 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.9/8.14.9) with ESMTP id s81LY5XX040211; Mon, 1 Sep 2014 21:34:06 GMT (envelope-from phk@phk.freebsd.dk) To: Alfred Perlstein Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] In-reply-to: <5404D1B8.9010006@mu.org> From: "Poul-Henning Kamp" References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> <5403B13C.60008@freebsd.org> <4204.1409549879@critter.freebsd.dk> <5404D1B8.9010006@mu.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <40209.1409607245.1@critter.freebsd.dk> Date: Mon, 01 Sep 2014 21:34:05 +0000 Message-ID: <40210.1409607245@critter.freebsd.dk> Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 21:34:09 -0000 -------- In message <5404D1B8.9010006@mu.org>, Alfred Perlstein writes: >> In message <5403B13C.60008@freebsd.org>, Alfred Perlstein writes: >> >>> Lua at the syscall level makes sense. 
:)
>> I doubt it.
>>
>> We're looking at high performance stuff and we don't want a silly
>> parser and string processing involved.
>>
>Would it really matter? Lua is bytecode, [...]

I thought you wanted the interpreter in the kernel.

If it's only the executor, then ... maybe...

We'd need to do a serious audit of the lua bytecode first...

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG Mon Sep 1 21:37:47 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F09419DA for ; Mon, 1 Sep 2014 21:37:47 +0000 (UTC) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id B266F1508 for ; Mon, 1 Sep 2014 21:37:47 +0000 (UTC) Received: from critter.freebsd.dk (unknown [192.168.60.3]) by phk.freebsd.dk (Postfix) with ESMTP id DB8C616D0; Mon, 1 Sep 2014 21:37:46 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.9/8.14.9) with ESMTP id s81Lbj7f040237; Mon, 1 Sep 2014 21:37:46 GMT (envelope-from phk@phk.freebsd.dk) To: Garrett Wollman Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] In-reply-to: <201409012130.s81LUWCs009135@hergotha.csail.mit.edu> From: "Poul-Henning Kamp" References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> <201409012130.s81LUWCs009135@hergotha.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <40235.1409607465.1@critter.freebsd.dk> Content-Transfer-Encoding: quoted-printable
Date: Mon, 01 Sep 2014 21:37:45 +0000 Message-ID: <40236.1409607465@critter.freebsd.dk> Cc: jkh@mail.turbofuzz.com, freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2014 21:37:48 -0000

--------
In message <201409012130.s81LUWCs009135@hergotha.csail.mit.edu>, Garrett Wollman writes:

>[2] Once every decade or so, the concept of a "library operating
>system" comes back into vogue before being steamrolled by the market.
>This time around it might have more staying power, because manycore is
>here and monolithic operating systems do not scale nicely on 512-core
>processors.

40 GBit/sec ethernet may change the market's priorities.

(ie: NETMAP)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
From owner-freebsd-arch@FreeBSD.ORG Tue Sep 2 00:03:27 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 702EF7C0 for ; Tue, 2 Sep 2014 00:03:27 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 5B6DD1433 for ; Tue, 2 Sep 2014 00:03:26 +0000 (UTC) Received: from u10-2-16-021.office.norse-data.com (unknown [50.204.88.51]) by elvis.mu.org (Postfix) with ESMTPSA id AB10F346DDEB; Mon, 1 Sep 2014 17:03:25 -0700 (PDT) Message-ID: <540509C6.3090909@mu.org> Date: Mon, 01 Sep 2014 17:05:26 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Poul-Henning Kamp Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> <5403B13C.60008@freebsd.org> <4204.1409549879@critter.freebsd.dk> <5404D1B8.9010006@mu.org> <40210.1409607245@critter.freebsd.dk> In-Reply-To: <40210.1409607245@critter.freebsd.dk> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2014 00:03:27 -0000 On 9/1/14 2:34 PM, Poul-Henning Kamp wrote: > -------- > In message <5404D1B8.9010006@mu.org>, Alfred Perlstein writes: > >>> In message <5403B13C.60008@freebsd.org>, Alfred Perlstein writes: >>> >>>> Lua at the syscall level makes sense. :) >>> I doubt it. 
>>> >>> We're looking at high performance stuff and we don't want a silly >>> parser and string processing involved. >>> >> Would it really matter? Lua is bytecode, [...] > I though you wanted the interpreter in the kernel. > > If it's only the executor, then ... maybe... > > We'd need to do a serious audit of the lua bytecode first... > So you mean you'd inject the lua bytecode into kernel? Hmm, I'm not sure it matters, either way would be interesting. I think losing "eval" expressions might not be worth it. Just because you *can* write bad code, doesn't mean you should bar it because those facilities can be made to make very interesting things. -Alfred From owner-freebsd-arch@FreeBSD.ORG Tue Sep 2 02:46:38 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 47BDC4D2; Tue, 2 Sep 2014 02:46:38 +0000 (UTC) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 174CF1470; Tue, 2 Sep 2014 02:46:38 +0000 (UTC) Received: from julian-mbp3.pixel8networks.com (50-196-156-133-static.hfc.comcastbusiness.net [50.196.156.133]) (authenticated bits=0) by vps1.elischer.org (8.14.9/8.14.9) with ESMTP id s822kZln018295 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Mon, 1 Sep 2014 19:46:36 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <54052F86.1010906@freebsd.org> Date: Mon, 01 Sep 2014 19:46:30 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Jordan Hubbard , Poul-Henning Kamp Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] 
References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Cc: Alfred Perlstein , Gleb Smirnoff , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2014 02:46:38 -0000 On 9/1/14, 1:57 PM, Jordan Hubbard wrote: > > That said, I’ll also point out that we already have a bytecode “engine” in the kernel and a corresponding higher-level language which compiles into it. That language is called D and the bytecode interpreter is the DTrace support code. Actually I believe we have at least three. From owner-freebsd-arch@FreeBSD.ORG Tue Sep 2 19:32:28 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B624B5DB for ; Tue, 2 Sep 2014 19:32:28 +0000 (UTC) Received: from nm12-vm2.bullet.mail.ne1.yahoo.com (nm12-vm2.bullet.mail.ne1.yahoo.com [98.138.91.88]) by mx1.freebsd.org (Postfix) with ESMTP id 68F68199B for ; Tue, 2 Sep 2014 19:32:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1409686341; bh=xdvwpVILCgeTsmG2Sy8Ltod68wFTzfO0VBwAykwBjzU=; h=Received:Received:Received:DKIM-Signature:X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:Content-Type:Content-Disposition:In-Reply-To:User-Agent:From:Subject; 
b=Ny7MiF8t2cPSLflA9BuoH+AAm/zIgGSd0Lj4cEsZH1VlD9Cf1AcHzKbcXtSDBtS9UvGLpiBdKYzut0rC4ZSLo1MxgA3eFx4hq/0yAWFQkLFzL6yAuzbH89Lf9+RTVqV0fGCkDfiTGXpWHHX+rvr8DCh/v6soS6+eNx+e1c6FGX96c0DYMS7HOYNeR1OGugTR39qE2bzpuUcCbrCZHgyUnN/ABDuigGEOTVDFAuU6dvAwD97Rl8yBMLbhlQPRNGhHfSytwQuXiDi1vZ/LNaXwLE3frBBVxAJYjT/Y/CcDi6UgnfHj9In6HGdySDe/+pS5aH5ooo9UYsfZE2HnQDl4+w== DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s2048; d=yahoo.com; b=L9gnO9/nTJwGv84gfDKz79pEv4NV74N1kODhjMQb/c6LqOrWhESkkTiorGvqpK0d37Lrybc0zbxn+RU7UxZAT86UQ2h/bd3VTeTryfL28r6MGt1eFOgpDNkyu7TiYUImjpHn29rotQCY7q68IJSOzmzkk06mRz9QTlpIYekHdG+0/qnD8SIAwhFistdQPTA+ZVUYSXPEieiHWUC1EDGintVfRIwAl2qH7f8N9ickmYVCVpw+ELTxl3DVkUf8CFOjUMBVY6LrqtZLRX+NyVbJhafYWZSrOs+g6qWxJ4NRzVJwnZnA0PrF4RpqITfdzPH2dFE2ohS+KYeiB65jEZwv/A==; Received: from [98.138.100.103] by nm12.bullet.mail.ne1.yahoo.com with NNFMP; 02 Sep 2014 19:32:21 -0000 Received: from [98.138.226.133] by tm102.bullet.mail.ne1.yahoo.com with NNFMP; 02 Sep 2014 19:32:21 -0000 Received: from [127.0.0.1] by smtp220.mail.ne1.yahoo.com with NNFMP; 02 Sep 2014 19:32:20 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1409686340; bh=xdvwpVILCgeTsmG2Sy8Ltod68wFTzfO0VBwAykwBjzU=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:Content-Type:Content-Disposition:In-Reply-To:User-Agent; b=kqS8+bAISYS+lcXibEqf88OTnJR7QK2TbvWeoODNpMlK1Nvl36Shh0K7sTFZeqX1+et4i7CmbbwDY3lYgpu81NseYU4yfys8JDnp6YcvyG0FMPR1dKaH/qSGcqLKOAY+fJ+8bdMen2RRmz4NvzgCy9iou47w2WZuAi6Gf5FBKqU= X-Yahoo-Newman-Id: 980863.32643.bm@smtp220.mail.ne1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: .bLyV2UVM1k9DnxcWxXbbiVzYeI7pASWCXLfAha3lHpbl6M ZpYEsJofZrmCwVCUGhVHIKARL14w.xavWFrXEiNg1qM4WE0zL1LPgV1UaWjY VCDuyj2j38.PhKXxCQPr93fcbR35gaItdFmdttEcya_fBlod7lrpl1poGPop gR5JdnwmWb.P87c9hxcT05y8VRMeTwRKkncWCsLg_RNP.GCFlAxrpcdgOTHz 0lb5uCVYMxj2HBrbo0BYybrUcRlNCKzsaZiJx_d6kfG21zVwqDciEuEQntFU 
uEiCm_q8mC_TZtoNlYHcRt.knfSOHHYMEWX.Ra91dB5KXWdXayA2cI8rFsSj Dm.ajaIEmR12tLkYjuR_D822HvBaao1XI_o5VQE_gp6DtWpkSBwQ4ffDTVkj 4RX17iXQjJk0LbNes3p_R8h5qbQ5cd8l0bT1M8szorFiO7d1i3HwwWVOh2Bk DLTNUoYmPIinJGE3B3TG7sx8xG_0M7SrKR9fXTvbTaNJK8gkNGE6Gdfsgctp 4law- X-Yahoo-SMTP: yVvIDoOswBD5zOzqXnwUE.yVSR2Kvw-- Date: Tue, 2 Sep 2014 15:32:11 -0400 From: Walt Ford To: Poul-Henning Kamp Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] Message-ID: <20140902193211.GA29155@nbu> References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> <5403B13C.60008@freebsd.org> <4204.1409549879@critter.freebsd.dk> <5404D1B8.9010006@mu.org> <40210.1409607245@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <40210.1409607245@critter.freebsd.dk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Alfred Perlstein , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2014 19:32:28 -0000 On Mon, Sep 01, 2014 at 09:34:05PM +0000, Poul-Henning Kamp wrote: > In message <5404D1B8.9010006@mu.org>, Alfred Perlstein writes: >>> In message <5403B13C.60008@freebsd.org>, Alfred Perlstein writes: >>> >>>> Lua at the syscall level makes sense. :) >>> I doubt it. >>> >>> We're looking at high performance stuff and we don't want a silly >>> parser and string processing involved. >>> >>Would it really matter? Lua is bytecode, [...] > > I though you wanted the interpreter in the kernel. > > If it's only the executor, then ... maybe... > > We'd need to do a serious audit of the lua bytecode first... I've been sort of working on a Lua-based FreeBSD for years in my spare time just because I love both so much. 
I could be wrong, but I think making use of Lua in-kernel would require modifying the interpreter to include the kernel's idea of locks, mutexes, memory barriers, and threads. In Lua, threads and their safety must be written by end-users last I knew, but I don't follow Lua development closely. At least in my latest work, trying to replace init_main.c and mi_startup() with a Lua script, all of that is necessary. Really, I'd even need the interpreter to be aware of the FreeBSD scheduler for a Lua-based mi_startup to be workable. I've got a lot of Lua bits and pieces in userland utilities, libraries, and now init_main.c is in boot/ but none of it works yet. I have visions of a FreeBSD that unboot backwards to reload subsystems, but I'm not close. -- Walt From owner-freebsd-arch@FreeBSD.ORG Wed Sep 3 10:57:48 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 751EB30B for ; Wed, 3 Sep 2014 10:57:48 +0000 (UTC) Received: from mail-ig0-x22a.google.com (mail-ig0-x22a.google.com [IPv6:2607:f8b0:4001:c05::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3EFCC1436 for ; Wed, 3 Sep 2014 10:57:48 +0000 (UTC) Received: by mail-ig0-f170.google.com with SMTP id h3so5103558igd.1 for ; Wed, 03 Sep 2014 03:57:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=4+5gfG+NkzbHbnCB6lZ+5+Kd9izb+eMwp1qdwbeEYQg=; b=k/6G8SAwLoAsdGseJVXUYRX9pvapwmRyltQ1hgTPeylzuzIZ2z2WZ3fBqWtLWOKyVO Cp3ZvScjKbe8Qo+kptXB24jkRLqt2nyusiHk/5OvQgVu64cjMFBv7hXQLM4tDPi9XgvC IgGkVY66TKK156Zcl13McydiygKJ7U/7V0+LcMN2WaBOFVrDmDYyi3BsmZUqqr2CTCtU 
KixjSYJ3HMWH8w6jPb7eMFAhpupbKH3xDMLDhcWyevlfEJJAQXhp1wWdATlIpIT12lGZ 1rOzKUu2rNKKxskLnC2IpiTmJTjlbmotTmVV0IkuOHEdGjEypnFsZVl/3/oTarwxSvUV GH4Q== MIME-Version: 1.0 X-Received: by 10.50.26.66 with SMTP id j2mr35958930igg.45.1409741867563; Wed, 03 Sep 2014 03:57:47 -0700 (PDT) Received: by 10.107.129.25 with HTTP; Wed, 3 Sep 2014 03:57:47 -0700 (PDT) In-Reply-To: <20140902193211.GA29155@nbu> References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org> <540382E2.3040004@freebsd.org> <2770.1409522711@critter.freebsd.dk> <5403B13C.60008@freebsd.org> <4204.1409549879@critter.freebsd.dk> <5404D1B8.9010006@mu.org> <40210.1409607245@critter.freebsd.dk> <20140902193211.GA29155@nbu> Date: Wed, 3 Sep 2014 12:57:47 +0200 Message-ID: Subject: Re: script(2) [was: [CFT/review] new sendfile(2)] From: Johan Bucht To: Walt Ford Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: Poul-Henning Kamp , Alfred Perlstein , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2014 10:57:48 -0000 Example of Lua packet filtering: http://wingolog.org/archives/2014/09/02/high-performance-packet-filtering-with-pflua On Tue, Sep 2, 2014 at 9:32 PM, Walt Ford via freebsd-arch < freebsd-arch@freebsd.org> wrote: > On Mon, Sep 01, 2014 at 09:34:05PM +0000, Poul-Henning Kamp wrote: > > In message <5404D1B8.9010006@mu.org>, Alfred Perlstein writes: > >>> In message <5403B13C.60008@freebsd.org>, Alfred Perlstein writes: > >>> > >>>> Lua at the syscall level makes sense. :) > >>> I doubt it. > >>> > >>> We're looking at high performance stuff and we don't want a silly > >>> parser and string processing involved. > >>> > >>Would it really matter? Lua is bytecode, [...] 
> > > > I though you wanted the interpreter in the kernel. > > > > If it's only the executor, then ... maybe... > > > > We'd need to do a serious audit of the lua bytecode first... > > I've been sort of working on a Lua-based FreeBSD for years in my spare > time just because I love both so much. I could be wrong, but I think > making use of Lua in-kernel would require modifying the interpreter to > include the kernel's idea of locks, mutexes, memory barriers, and threads. > In Lua, threads and their safety must be written by end-users last I knew, > but I don't follow Lua development closely. > > At least in my latest work, trying to replace init_main.c and mi_startup() > with a Lua script, all of that is necessary. Really, I'd even need the > interpreter to be aware of the FreeBSD scheduler for a Lua-based mi_startup > to be workable. > > I've got a lot of Lua bits and pieces in userland utilities, libraries, > and now init_main.c is in boot/ but none of it works yet. I have visions > of a FreeBSD that unboot backwards to reload subsystems, but I'm not > close. 
> > -- > Walt > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 03:45:34 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B044689E; Thu, 4 Sep 2014 03:45:34 +0000 (UTC) Received: from mail-ig0-x22e.google.com (mail-ig0-x22e.google.com [IPv6:2607:f8b0:4001:c05::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6909B13A8; Thu, 4 Sep 2014 03:45:34 +0000 (UTC) Received: by mail-ig0-f174.google.com with SMTP id a13so401196igq.1 for ; Wed, 03 Sep 2014 20:45:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=fzi10Xq5OrtxRQYXI+yXIZcfReZp/8po0gexM0v5WaQ=; b=PtDsUbd9VgMG6Zh3Ahn1LVtIfMo1G1LGhTFug2Xqm/xHiOwaQ5+WFw0Vx2d82v0J2R 2ON7xWv8QJyaE4gxJIRi2ptfNFCwz2EjZyLD15+rySWsbIok0WOhf/l/0gAwNxqUkP/V N7fhFFePG3RoN9wPCZ6mUb7EHkXWdUQhXZ0xPl8Zara7BnrkfZGgt4WTM2/5SULFe6xX BX51C9NMxco8UhmrP3JZl5CEIfRxv4t+GQEiyCbwyg3hK/FcSxFXbfbKBe6zIYCDcetM tY7MhYADnO6S5fbAhuhP04rxqwmd3cqO5tlxi4lgOEXDP/9+jt5MVBldiC5sN8oTXKks +l0g== MIME-Version: 1.0 X-Received: by 10.42.126.82 with SMTP id d18mr871741ics.88.1409802333858; Wed, 03 Sep 2014 20:45:33 -0700 (PDT) Received: by 10.50.72.69 with HTTP; Wed, 3 Sep 2014 20:45:33 -0700 (PDT) Date: Wed, 3 Sep 2014 20:45:33 -0700 Message-ID: Subject: [RFC] Add __arraycount from NetBSD to sys/cdefs.h From: Garrett Cooper To: freebsd-arch@freebsd.org Content-Type: multipart/mixed; 
boundary=20cf300e5329504d8c0502352fbc Cc: Julio Merino , "rpaulo@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 03:45:34 -0000 --20cf300e5329504d8c0502352fbc Content-Type: text/plain; charset=UTF-8

Hi all,
In order to ease porting code and reduce divergence with NetBSD
when importing code (a large chunk of which for me are tests), I would
like to move nitems to sys/cdefs.h and alias __arraycount to nitems.
Here's the __arraycount #define in lib/libnetbsd/sys/cdefs.h:

44 /*
45  * Return the number of elements in a statically-allocated array,
46  * __x.
47  */
48 #define __arraycount(__x) (sizeof(__x) / sizeof(__x[0]))

Here's the nitems #define in sys/sys/param.h:

277 #define nitems(x) (sizeof((x)) / sizeof((x)[0]))

sys/cdefs.h gets pulled in automatically with sys/param.h, so
anything using nitems will continue to function like before (see below
for more details). I've attached a patch which addresses all hardcoded
definitions in the tree added by FreeBSD developers.
If there aren't any major concerns with my proposed change, I'll
put it up for review on Phabricator.
Thank you!
-Garrett

$ cat cdefs_pound_define.c
#include

#ifdef _SYS_CDEFS_H_
#warning "sys/cdefs.h has been included"
#endif
$ cc -c cdefs_pound_define.c
cdefs_pound_define.c:4:2: warning: "sys/cdefs.h has been included" [-W#warnings]
#warning "sys/cdefs.h has been included"
 ^
1 warning generated.
$ cc -D_KERNEL -c cdefs_pound_define.c
cdefs_pound_define.c:4:2: warning: "sys/cdefs.h has been included" [-W#warnings]
#warning "sys/cdefs.h has been included"
 ^
1 warning generated.
$ gcc -c cdefs_pound_define.c cdefs_pound_define.c:4:2: warning: #warning "sys/cdefs.h has been included" $ gcc -D_KERNEL -c cdefs_pound_define.c cdefs_pound_define.c:4:2: warning: #warning "sys/cdefs.h has been included" --20cf300e5329504d8c0502352fbc Content-Type: application/octet-stream; name="0001-Make-__arraycount-macros-from-NetBSD-to-ease-porting.patch" Content-Disposition: attachment; filename="0001-Make-__arraycount-macros-from-NetBSD-to-ease-porting.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hznjv7kz0 RnJvbSA2ZjkxZTAwNjQ5NjExOGFmNGIxOGYwMmFlZDIwMmExYzg1MGVhM2QwIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQ0KRnJvbTogR2FycmV0dCBDb29wZXIgPHlhbmVnb21pQGdtYWlsLmNvbT4N CkRhdGU6IFRodSwgNCBTZXAgMjAxNCAwMjo1MTozOSAtMDcwMA0KU3ViamVjdDogW1BBVENIXSBN YWtlIF9fYXJyYXljb3VudCBtYWNyb3MgZnJvbSBOZXRCU0QgdG8gZWFzZSBwb3J0aW5nIGZyb20N CiBOZXRCU0QNCg0KMS4gTW92ZSBuaXRlbXMgZnJvbSBzeXMvcGFyYW0uaCB0byBzeXMvY2RlZnMu aA0KMi4gQWxpYXMgX19hcnJheWNvdW50IHRvIG5pdGVtcw0KMy4gR2FyYmFnZSBjb2xsZWN0IGFs bCBhZCBob2MgZGVmaW5pdGlvbnMgaW4gdGhlIHRyZWUgdGhhdCBhcmVuJ3QNCiAgIHByb3RlY3Rl ZCBieSAjaWZuZGVmIF9fYXJyYXljb3VudCBub3QgcHJvdmlkZWQgYnkgYSB0aGlyZC1wYXJ0eSBz b3VyY2UNCg0KU3BvbnNvcmVkIGJ5OiBFTUMgLyBJc2lsb24gU3RvcmFnZSBEaXZpc2lvbg0KLS0t DQogY29udHJpYi9saWJjLXZpcy91bnZpcy5jICAgICAgICAgICB8IDYgLS0tLS0tDQogbGliL2xp Ym5ldGJzZC9zeXMvY2RlZnMuaCAgICAgICAgICB8IDYgLS0tLS0tDQogc3lzL2tlcm4vc3RhY2tf cHJvdGVjdG9yLmMgICAgICAgICB8IDEgLQ0KIHN5cy9zeXMvY2RlZnMuaCAgICAgICAgICAgICAg ICAgICAgfCAzICsrKw0KIHN5cy9zeXMvcGFyYW0uaCAgICAgICAgICAgICAgICAgICAgfCAxIC0N CiB1c3Iuc2Jpbi9ibHVldG9vdGgvYnRwYW5kL2J0cGFuZC5oIHwgNCAtLS0tDQogNiBmaWxlcyBj aGFuZ2VkLCAzIGluc2VydGlvbnMoKyksIDE4IGRlbGV0aW9ucygtKQ0KDQpkaWZmIC0tZ2l0IGEv Y29udHJpYi9saWJjLXZpcy91bnZpcy5jIGIvY29udHJpYi9saWJjLXZpcy91bnZpcy5jDQppbmRl eCA5Y2YxMTJjLi44M2ZmMmEwIDEwMDY0NA0KLS0tIGEvY29udHJpYi9saWJjLXZpcy91bnZpcy5j DQorKysgYi9jb250cmliL2xpYmMtdmlzL3VudmlzLmMNCkBAIC01MSwxMiArNTEsNiBAQCBfX0ZC U0RJRCgiJEZyZWVCU0QkIik7DQogDQogI2RlZmluZQlfRElBR0FTU0VSVCh4KQlhc3NlcnQoeCkN 
CiANCi0vKg0KLSAqIFJldHVybiB0aGUgbnVtYmVyIG9mIGVsZW1lbnRzIGluIGEgc3RhdGljYWxs eS1hbGxvY2F0ZWQgYXJyYXksDQotICogX194Lg0KLSAqLw0KLSNkZWZpbmUJX19hcnJheWNvdW50 KF9feCkJKHNpemVvZihfX3gpIC8gc2l6ZW9mKF9feFswXSkpDQotDQogI2lmZGVmIF9fd2Vha19h bGlhcw0KIF9fd2Vha19hbGlhcyhzdHJudW52aXN4LF9zdHJudW52aXN4KQ0KICNlbmRpZg0KZGlm ZiAtLWdpdCBhL2xpYi9saWJuZXRic2Qvc3lzL2NkZWZzLmggYi9saWIvbGlibmV0YnNkL3N5cy9j ZGVmcy5oDQppbmRleCBiM2Q1MmM5Li5mYjUzNjdmIDEwMDY0NA0KLS0tIGEvbGliL2xpYm5ldGJz ZC9zeXMvY2RlZnMuaA0KKysrIGIvbGliL2xpYm5ldGJzZC9zeXMvY2RlZnMuaA0KQEAgLTQxLDEy ICs0MSw2IEBADQogI2RlZmluZSBfX2RlYWQNCiAjZW5kaWYNCiANCi0vKg0KLSAqIFJldHVybiB0 aGUgbnVtYmVyIG9mIGVsZW1lbnRzIGluIGEgc3RhdGljYWxseS1hbGxvY2F0ZWQgYXJyYXksDQot ICogX194Lg0KLSAqLw0KLSNkZWZpbmUJX19hcnJheWNvdW50KF9feCkJKHNpemVvZihfX3gpIC8g c2l6ZW9mKF9feFswXSkpDQotDQogI2RlZmluZQlfX19TVFJJTkcoeCkJX19TVFJJTkcoeCkNCiAj ZGVmaW5lCV9fU1RSSU5HKHgpCSN4DQogDQpkaWZmIC0tZ2l0IGEvc3lzL2tlcm4vc3RhY2tfcHJv dGVjdG9yLmMgYi9zeXMva2Vybi9zdGFja19wcm90ZWN0b3IuYw0KaW5kZXggYjVmOTk3My4uNjNh Y2I5MCAxMDA2NDQNCi0tLSBhL3N5cy9rZXJuL3N0YWNrX3Byb3RlY3Rvci5jDQorKysgYi9zeXMv a2Vybi9zdGFja19wcm90ZWN0b3IuYw0KQEAgLTE3LDcgKzE3LDYgQEAgX19zdGFja19jaGtfZmFp bCh2b2lkKQ0KIAlwYW5pYygic3RhY2sgb3ZlcmZsb3cgZGV0ZWN0ZWQ7IGJhY2t0cmFjZSBtYXkg YmUgY29ycnVwdGVkIik7DQogfQ0KIA0KLSNkZWZpbmUgX19hcnJheWNvdW50KF9feCkJKHNpemVv ZihfX3gpIC8gc2l6ZW9mKF9feFswXSkpDQogc3RhdGljIHZvaWQNCiBfX3N0YWNrX2Noa19pbml0 KHZvaWQgKmR1bW15IF9fdW51c2VkKQ0KIHsNCmRpZmYgLS1naXQgYS9zeXMvc3lzL2NkZWZzLmgg Yi9zeXMvc3lzL2NkZWZzLmgNCmluZGV4IDRjNGMyYWYuLmJlZmM1OWQgMTAwNjQ0DQotLS0gYS9z eXMvc3lzL2NkZWZzLmgNCisrKyBiL3N5cy9zeXMvY2RlZnMuaA0KQEAgLTczOSw0ICs3MzksNyBA QA0KICNkZWZpbmUgX19OT19UTFMgMQ0KICNlbmRpZg0KIA0KKyNkZWZpbmUJbml0ZW1zKHgpCShz aXplb2YoKHgpKSAvIHNpemVvZigoeClbMF0pKQ0KKyNkZWZpbmUJX19hcnJheWNvdW50KHgpCW5p dGVtcyh4KQ0KKw0KICNlbmRpZiAvKiAhX1NZU19DREVGU19IXyAqLw0KZGlmZiAtLWdpdCBhL3N5 cy9zeXMvcGFyYW0uaCBiL3N5cy9zeXMvcGFyYW0uaA0KaW5kZXggMjY0YTM4YS4uN2VjYTc0YiAx 
MDA2NDQNCi0tLSBhL3N5cy9zeXMvcGFyYW0uaA0KKysrIGIvc3lzL3N5cy9wYXJhbS5oDQpAQCAt Mjc0LDcgKzI3NCw2IEBADQogI2lmbmRlZiBob3dtYW55DQogI2RlZmluZQlob3dtYW55KHgsIHkp CSgoKHgpKygoeSktMSkpLyh5KSkNCiAjZW5kaWYNCi0jZGVmaW5lCW5pdGVtcyh4KQkoc2l6ZW9m KCh4KSkgLyBzaXplb2YoKHgpWzBdKSkNCiAjZGVmaW5lCXJvdW5kZG93bih4LCB5KQkoKCh4KS8o eSkpKih5KSkNCiAjZGVmaW5lCXJvdW5kZG93bjIoeCwgeSkgKCh4KSYofigoeSktMSkpKSAgICAg ICAgICAvKiBpZiB5IGlzIHBvd2VyIG9mIHR3byAqLw0KICNkZWZpbmUJcm91bmR1cCh4LCB5KQko KCgoeCkrKCh5KS0xKSkvKHkpKSooeSkpICAvKiB0byBhbnkgeSAqLw0KZGlmZiAtLWdpdCBhL3Vz ci5zYmluL2JsdWV0b290aC9idHBhbmQvYnRwYW5kLmggYi91c3Iuc2Jpbi9ibHVldG9vdGgvYnRw YW5kL2J0cGFuZC5oDQppbmRleCA0NDBjY2ExLi5iMjBjNmM1IDEwMDY0NA0KLS0tIGEvdXNyLnNi aW4vYmx1ZXRvb3RoL2J0cGFuZC9idHBhbmQuaA0KKysrIGIvdXNyLnNiaW4vYmx1ZXRvb3RoL2J0 cGFuZC9idHBhbmQuaA0KQEAgLTQzLDEwICs0Myw2IEBADQogDQogI2luY2x1ZGUgImV2ZW50Lmgi DQogDQotI2lmbmRlZiBfX2FycmF5Y291bnQNCi0jZGVmaW5lIF9fYXJyYXljb3VudChfX3gpCShz aXplb2YoX194KSAvIHNpemVvZihfX3hbMF0pKQ0KLSNlbmRpZg0KLQ0KICNpZm5kZWYJTDJDQVBf UFNNX0lOVkFMSUQNCiAjZGVmaW5lCUwyQ0FQX1BTTV9JTlZBTElEKHBzbSkJKCgocHNtKSAmIDB4 MDEwMSkgIT0gMHgwMDAxKQ0KICNlbmRpZg0KLS0gDQoyLjAuMg0KDQo= --20cf300e5329504d8c0502352fbc-- From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 04:17:47 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5FE51A51 for ; Thu, 4 Sep 2014 04:17:47 +0000 (UTC) Received: from mail-ig0-f180.google.com (mail-ig0-f180.google.com [209.85.213.180]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 22ABD178A for ; Thu, 4 Sep 2014 04:17:46 +0000 (UTC) Received: by mail-ig0-f180.google.com with SMTP id hn18so435115igb.13 for ; Wed, 03 Sep 2014 21:17:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:content-type:mime-version:subject:from :in-reply-to:date:cc:message-id:references:to; bh=rLVqAjQfa9hYhlmXEBsGizx8Hwj3uwKSSLj1MfP4r0w=; b=bpQILFTPUAhpRfNtrlu3xiPjeZUGQDrpW7qI4hGhsue9JW3GKTKm0lA+IVwiRUAEIu iv4LpodaipWIWDXbc8l+YE25bVk+s/B8YVJl0nuqTLyDjLSumx7+dW0Aur4bxs8UzSlI WdU2/dJ6iaNjMcFYbgwq4sDBSa9SPbRdA/AOgStJhP8reEYlxvCQEnQHZz0oMpuZTtmo YTZ3lH3i9SvjSMozeCObzkoq+9jnYBlRD03D2VKOoqMno3ZgJnI6VRsdXtNYPy3u9W9m 9IWZ90y51k5h+QOFcV81SWp3tc3s65fAN7tiACZt439SMF628soQrf2lDKeYZ5HKlRyH LBhw== X-Gm-Message-State: ALoCoQmj1WY9wqVAmB0fr+5HEeW2YeaeQdCnJJ+k9gkeFQbTmagf8TZl3UQV9EBO1exSSfJvfcxu X-Received: by 10.42.35.8 with SMTP id o8mr1804870icd.41.1409804259980; Wed, 03 Sep 2014 21:17:39 -0700 (PDT) Received: from bsdimp.bsdimp.com (50-78-194-198-static.hfc.comcastbusiness.net. [50.78.194.198]) by mx.google.com with ESMTPSA id cz10sm7409559igc.5.2014.09.03.21.17.39 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 03 Sep 2014 21:17:39 -0700 (PDT) Sender: Warner Losh Content-Type: multipart/signed; boundary="Apple-Mail=_E656BEA7-B9C4-4EB2-A646-7AC292F62DF9"; protocol="application/pgp-signature"; micalg=pgp-sha512 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: [RFC] Add __arraycount from NetBSD to sys/cdefs.h From: Warner Losh In-Reply-To: Date: Wed, 3 Sep 2014 22:17:37 -0600 Message-Id: <8D279BDC-7D40-4750-8DA7-A4535DD2E458@bsdimp.com> References: To: Garrett Cooper X-Mailer: Apple Mail (2.1878.6) Cc: Julio Merino , "rpaulo@freebsd.org" , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 04:17:47 -0000 --Apple-Mail=_E656BEA7-B9C4-4EB2-A646-7AC292F62DF9 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 On Sep 
3, 2014, at 9:45 PM, Garrett Cooper wrote:
> Hi all,
> In order to ease porting code and reduce divergence with NetBSD
> when importing code (a large chunk of which for me are tests), I would
> like to move nitems to sys/cdefs.h and alias __arraycount to nitems.
> Here's the __arraycount #define in lib/libnetbsd/sys/cdefs.h:
>
> 44 /*
> 45  * Return the number of elements in a statically-allocated array,
> 46  * __x.
> 47  */
> 48 #define __arraycount(__x) (sizeof(__x) / sizeof(__x[0]))
>
> Here's the nitems #define in sys/sys/param.h:
>
> 277 #define nitems(x) (sizeof((x)) / sizeof((x)[0]))
>
> sys/cdefs.h gets pulled in automatically with sys/param.h, so
> anything using nitems will continue to function like before (see below
> for more details). I've attached a patch which addresses all hardcoded
> definitions in the tree added by FreeBSD developers.
> If there aren't any major concerns with my proposed change, I'll
> put it up for review on Phabricator.
> Thank you!
> -Garrett
>
> $ cat cdefs_pound_define.c
> #include
>
> #ifdef _SYS_CDEFS_H_
> #warning "sys/cdefs.h has been included"
> #endif
> $ cc -c cdefs_pound_define.c
> cdefs_pound_define.c:4:2: warning: "sys/cdefs.h has been included" [-W#warnings]
> #warning "sys/cdefs.h has been included"
> ^
> 1 warning generated.
> $ cc -D_KERNEL -c cdefs_pound_define.c
> cdefs_pound_define.c:4:2: warning: "sys/cdefs.h has been included" [-W#warnings]
> #warning "sys/cdefs.h has been included"
> ^
> 1 warning generated.
> $ gcc -c cdefs_pound_define.c
> cdefs_pound_define.c:4:2: warning: #warning "sys/cdefs.h has been included"
> $ gcc -D_KERNEL -c cdefs_pound_define.c
> cdefs_pound_define.c:4:2: warning: #warning "sys/cdefs.h has been included"

I wouldn't bother changing the nitems #define. There's no need, really, to do that.

I'd also be more inclined to believe the test if you tested what the thing does rather than test for an artificial, implementation defined side effect.
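
A behavioural check of the kind suggested here could look roughly like the sketch below. It is only an illustration and not part of the posted patch: array_macros_agree() and the sample arrays are made-up names, and the #ifndef fallbacks are an assumption added so that the file also builds on systems whose headers provide neither macro. On a tree with the proposed change, both names would come from the system headers instead.

```c
#include <sys/param.h>	/* nitems() lives here on FreeBSD today */

#include <assert.h>

/*
 * Assumption for this sketch: fall back to local definitions when the
 * system headers do not provide the macros, so the test builds anywhere.
 */
#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif
#ifndef __arraycount
#define	__arraycount(x)	nitems(x)
#endif

/* Hypothetical helper: returns 1 when both macros report the same counts. */
int
array_macros_agree(void)
{
	static const int primes[] = { 2, 3, 5, 7, 11 };
	static const char *const names[] = { "usr", "var", "home" };

	/* Test what the macros do, not which header happened to define them. */
	assert(nitems(primes) == 5);
	assert(nitems(names) == 3);

	return (nitems(primes) == __arraycount(primes) &&
	    nitems(names) == __arraycount(names));
}
```

Calling array_macros_agree() from a small test program exercises the macros through whatever header chain is in effect, rather than checking for the _SYS_CDEFS_H_ include guard.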
But honestly the amount of duplication saved here is rather tiny...

Warner

--Apple-Mail=_E656BEA7-B9C4-4EB2-A646-7AC292F62DF9 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJUB+fhAAoJEGwc0Sh9sBEAjp4QAI7/yB+LvUkAmVM/O1DoX4wp /vbOT1awGQCWCsQGpT8r7PMIskfzwHve+JQxEfPoa8ddHVeaDuDTBw0P4YjLTDqW 828TdMCcQQ6/QpFxZooOxhAsnAwJErko+HmPy0q+JMUPlqresg8m5kADA0X9wU+4 NT1TxFjE5wdET7NyWqic0A2nP6Z7WewSbTfZmXqkz9N0abGl4Lu4uGB8lrfJwR/m NMco9Q5odpNXXjdi+RiGmM+I4HSQUkEu1s9mG0KSNIPdH+fdPlLDdSIkH+0RALB+ wi3s/xvyMZXYISXsH1IT+4qe+BtPX2PZI+HrsQIJOXlAhWj+yDbzko0Xl84yS7A/ pnx6uD46aK8w1JsjjCPNpj0cQ9e1K4bBaj7dHbkuInMszseAeKEnJcjMNLlZ6Flu 0DjNvLEDg0CRX56Bxm5vYuACdf7m8ly6zwOdEY1bHr2NQmFEEC5432BfX0z2dwNk 2rkA0Bjfq7y+dK3DUqVn/8qeZS03Pf57V30tWxNc5Prm/9FHme7LqHH7K68ROm3u LVqGMUR69iLTL6UB2+9MGdSmAcVijdsgL2UjB6LUtRffRlbsCrs5vx9sijCQkqEx 4O5tmBYI5gXlPD0809COnM1RpIODJersiLsGZtrNT91RhxFsSafJPviMsrbkWMe8 T8LtfmW+iq3rPeXwDkNr =wgEK -----END PGP SIGNATURE----- --Apple-Mail=_E656BEA7-B9C4-4EB2-A646-7AC292F62DF9-- From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 05:53:46 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A1C0E2B7; Thu, 4 Sep 2014 05:53:46 +0000 (UTC) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 63D851F53; Thu, 4 Sep 2014 05:53:45 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 1E9A0D62B58; Thu, 4 Sep 2014
15:53:41 +1000 (EST) Date: Thu, 4 Sep 2014 15:53:24 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Garrett Cooper Subject: Re: [RFC] Add __arraycount from NetBSD to sys/cdefs.h In-Reply-To: Message-ID: <20140904151838.X1355@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=BdjhjNd2 c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=ho85ICRMCckA:10 a=XXALs8e33l8A:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=M3jX3SRTu_DTOVG5eQwA:9 a=CjuIK1q_8ugA:10 Cc: Julio Merino , "rpaulo@freebsd.org" , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 05:53:46 -0000 On Wed, 3 Sep 2014, Garrett Cooper wrote: > Hi all, > In order to ease porting code and reduce divergence with NetBSD > when importing code (a large chunk of which for me are tests), I would > like to move nitems to sys/cdefs.h and alias __arraycount to nitems. > Here's the __arraycount #define in lib/libnetbsd/sys/cdefs.h: > > 44 /* > 45 * Return the number of elements in a statically-allocated array, > 46 * __x. > 47 */ > 48 #define __arraycount(__x) (sizeof(__x) / sizeof(__x[0])) > > Here's the nitems #define in sys/sys/param.h: > > 277 #define nitems(x) (sizeof((x)) / sizeof((x)[0])) The NetBSD version is less namespace-polluting. The underscores in the name of its __x parameter are bogus (plain x would be in the macro's namespace). The version in the patch is vastly more namespace-polluting. It adds nitems() to and thus defeats the careful underscoring of all (?) other public names there. nitems() in was bad enough. It is of course not documented in any man page. has lots of historical pollution. It is still used a lot. 
Adding nitems() to it broke any application that uses this name for almost anything except possibly as a plain identifier (nitems followed by a left parenthesis gives the macro, but nitems not followed by a left parenthesis might not be broken). Since nitems is a very reasonable variable name, its pollution is more likely to break applications than most.

I don't like depending on either nitems() or __arraycount() being in a system header. Any use of these is unportable at best. Most uses of reserved identifiers are undefined (C99 7.1.3p2). Use of __arraycount is no exception, since it is of course not documented in any man page. In practice, the undefined behaviour is usually just unportability. The only portable way to use these macros is to write your own version and spell it without leading underscores. Even this is not portable, due to bugs like the undocumented nitems() in . Use of is also unportable, but it should be possible to know what is in it by reading its documentation and not necessary to reread it often to check for new pollution in it.

I usually spell my version of this macro more like arraycount() than nitems(). nitems() is too generic even for local use. arraycount() is too verbose without being completely precise about what is being counted.
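
Two of the points above can be sketched in a few lines: a function-like macro only takes effect when the name is followed by a left parenthesis, and an application can carry its own version under an unreserved name. This is an illustration under stated assumptions, not code from the patch: the local nitems() definition merely stands in for the macro a system header would supply, and arraycount(), sum_first() and sum_table() are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the macro a system header would define (assumption). */
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))

/* The application's own version, spelled without leading underscores. */
#define	arraycount(a)	(sizeof(a) / sizeof((a)[0]))

/*
 * "nitems" as a plain parameter name still works: a function-like macro
 * is only expanded when the name is followed by a left parenthesis.
 * A function declared as nitems(...) would be mangled by the macro.
 */
int
sum_first(const int *v, size_t nitems)
{
	int total = 0;

	assert(v != NULL || nitems == 0);
	for (size_t i = 0; i < nitems; i++)
		total += v[i];
	return (total);
}

/* Uses the local macro, so it does not depend on what a header exports. */
int
sum_table(void)
{
	static const int table[] = { 1, 2, 3, 4 };

	return (sum_first(table, arraycount(table)));
}
```

The local macro keeps the application independent of which system header, if any, happens to export a similar name.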
Bruce From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 05:59:17 2014 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2883A42B; Thu, 4 Sep 2014 05:59:17 +0000 (UTC) Received: from mail.beastielabs.net (unknown [IPv6:2001:888:1227:0:200:24ff:fec9:5934]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D29BF1F91; Thu, 4 Sep 2014 05:59:16 +0000 (UTC) Received: from beastie.hotsoft.nl (beastie.hotsoft.nl [IPv6:2001:888:1227:0:219:d1ff:fee8:91eb]) by mail.beastielabs.net (8.14.7/8.14.7) with ESMTP id s845xC3M078751; Thu, 4 Sep 2014 07:59:12 +0200 (CEST) (envelope-from hans@beastielabs.net) Message-ID: <5407FFB0.80203@beastielabs.net> Date: Thu, 04 Sep 2014 07:59:12 +0200 From: Hans Ottevanger User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: =?UTF-8?B?RWR3YXJkIFRvbWFzeiBOYXBpZXJhxYJh?= Subject: Re: [CFT] Autofs. References: <20140730071933.GA20122@pc5.home> <53F0878E.3000401@beastielabs.net> <20140817145059.GA5497@pc5.home> In-Reply-To: <20140817145059.GA5497@pc5.home> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-current@FreeBSD.org, freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 05:59:17 -0000 On 08/17/14 16:50, Edward Tomasz Napierała wrote: > On 0817T1244, Hans Ottevanger wrote: >> [...] >> Hi! >> >> Great to see a real autofs finally coming to FreeBSD.
>>
>> I already did some very cursory testing on a recent 11-CURRENT system
>> that I still happened to have and things with at least the /net map
>> look quite OK.
>>
>> I could do some more extensive testing if I could use some of my
>> 10-STABLE systems. I already checked that the patch applies cleanly
>> to a recent 10-STABLE (modulo a few offsets) and that both buildworld
>> and buildkernel succeed. Should I expect difficulties actually
>> running your autofs on 10-STABLE?
>
> No, it should be fine. Plan is to MFC this to 10 soon, btw.
>

Good to see that autofs has been MFC'd during my vacation 8-)

But I found a little problem...

When I try to access the NFS-exported file systems on an older test machine (running 7.x, but that is not so relevant; it also happens with other servers), with the following exports:

$ showmount -e soekris
Exports list on soekris:
/var   192.168.0.0
/usr   192.168.0.0
/home  192.168.0.0
/      192.168.0.0

I get:

$ ls /net/soekris
COPYRIGHT  dist     libexec     proc    tmp
bin        entropy  lost+found  rescue  usr
boot       etc      media       root    var
compat     home     mnt         sbin
dev        lib

which is correct, but the next level fails:

$ ls -l /net/soekris/usr
total 0

which is wrong, since /usr on soekris is definitely not empty.

Relevant output of mount:

...
map -hosts on /net (autofs)
soekris:/ on /net/soekris (nfs, nosuid, automounted)

This is on 10.1-PRERELEASE r270922. The kernel config is GENERIC minus devices I do not have and AUTOFS added. Config files (/etc/auto_master, et al) are default.

Mounting manually does succeed (in two steps, of course).

When trying this from Mac OS X (I am still on Snow Leopard) automounting works as expected. I have not yet had the opportunity to try a Linux box (I also do not know whether autofs there has been eaten by systemd already 8-)).

Am I missing something, or is this a bug?
Kind regards, Hans From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 07:15:42 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0641D56B; Thu, 4 Sep 2014 07:15:42 +0000 (UTC) Received: from mail-ig0-x231.google.com (mail-ig0-x231.google.com [IPv6:2607:f8b0:4001:c05::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B6A551987; Thu, 4 Sep 2014 07:15:41 +0000 (UTC) Received: by mail-ig0-f177.google.com with SMTP id r10so603707igi.16 for ; Thu, 04 Sep 2014 00:15:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=wzjUh6Hao6im93ca6cmXSoGFgFAiO5UudE3/kji/SAE=; b=OZqdf+43kcjrmN2xneYaQyaKxlwNH24ozhP6uBq6QYuIW8ZlL4EwNJthw0XyI96Qbc lc2IvMTOF/J84HrRuE1dxorIXAMKRzoIQ/+B69ieqDybMqV2kNkuEGv6seHkgaseadzQ lxL1sIWzPXh3xlQtjAZbAYNS+hy3xeh3HflMH5E9BFxmVhFtQwDlK+msjgT2md8uOcN1 /RBf1wXDxnpRLCob9ux8hPqmljwcq+qlCeFiamjNTGkfAjgyTrQVunDgUFR/KEFAWBGM GtU1yfyih9OOBNiRMjliESnaaQav8H7zPIMuVH1H4vVkRel9eaY7A9oTf811Lm1P0gAq MPMg== MIME-Version: 1.0 X-Received: by 10.50.82.98 with SMTP id h2mr3637515igy.26.1409814940903; Thu, 04 Sep 2014 00:15:40 -0700 (PDT) Received: by 10.50.72.69 with HTTP; Thu, 4 Sep 2014 00:15:40 -0700 (PDT) In-Reply-To: <5407FFB0.80203@beastielabs.net> References: <20140730071933.GA20122@pc5.home> <53F0878E.3000401@beastielabs.net> <20140817145059.GA5497@pc5.home> <5407FFB0.80203@beastielabs.net> Date: Thu, 4 Sep 2014 00:15:40 -0700 Message-ID: Subject: Re: [CFT] Autofs. 
From: Garrett Cooper To: Hans Ottevanger Content-Type: text/plain; charset=UTF-8 Cc: FreeBSD Current , =?UTF-8?Q?Edward_Tomasz_Napiera=C5=82a?= , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 07:15:42 -0000 Hi Hans! On Wed, Sep 3, 2014 at 10:59 PM, Hans Ottevanger wrote: > Good to see that autofa has been MFC'd during my vacation 8-) > > But I found a little problem... ... > Do I miss something, or is this a bug? Can you please provide the output of `mount -p' from your server? Thanks! -Garrett From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 07:35:55 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6E37EF24; Thu, 4 Sep 2014 07:35:55 +0000 (UTC) Received: from mail.beastielabs.net (unknown [IPv6:2001:888:1227:0:200:24ff:fec9:5934]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 254DB1E6B; Thu, 4 Sep 2014 07:35:54 +0000 (UTC) Received: from beastie.hotsoft.nl (beastie.hotsoft.nl [IPv6:2001:888:1227:0:219:d1ff:fee8:91eb]) by mail.beastielabs.net (8.14.7/8.14.7) with ESMTP id s847ZqjB079092; Thu, 4 Sep 2014 09:35:52 +0200 (CEST) (envelope-from hans@beastielabs.net) Message-ID: <54081658.9020609@beastielabs.net> Date: Thu, 04 Sep 2014 09:35:52 +0200 From: Hans Ottevanger User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Garrett Cooper Subject: Re: [CFT] Autofs. 
References: <20140730071933.GA20122@pc5.home> <53F0878E.3000401@beastielabs.net> <20140817145059.GA5497@pc5.home> <5407FFB0.80203@beastielabs.net> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Current , Edward Tomasz Napierała , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 07:35:55 -0000

On 09/04/14 09:15, Garrett Cooper wrote:
> Hi Hans!
>
> On Wed, Sep 3, 2014 at 10:59 PM, Hans Ottevanger wrote:
>
>> Good to see that autofs has been MFC'd during my vacation 8-)
>>
>> But I found a little problem...
>
> ...
>
>> Do I miss something, or is this a bug?
>
> Can you please provide the output of `mount -p' from your server?

Sure, looks like this:

[root@soekris ~]# mount -p
/dev/ad0s1a / ufs rw 1 1
devfs /dev devfs rw 0 0
/dev/ad0s1f /home ufs rw 2 2
/dev/ad0s1e /usr ufs rw 2 2
/dev/ad0s1d /var ufs rw 2 2

And as I mentioned, mounting manually succeeds.
Kind regards, Hans From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 12:43:37 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 04588EC6; Thu, 4 Sep 2014 12:43:37 +0000 (UTC) Received: from mail-lb0-x232.google.com (mail-lb0-x232.google.com [IPv6:2a00:1450:4010:c04::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 508901160; Thu, 4 Sep 2014 12:43:36 +0000 (UTC) Received: by mail-lb0-f178.google.com with SMTP id v6so11364128lbi.23 for ; Thu, 04 Sep 2014 05:43:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-type:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=xsIXfdb5SpsoCs+DNSVIUmzETgMvEF5kXWmrvUZLjiA=; b=BR65deIHDrouyQixJ+aovTfIUmhqqVI0k0QGY7krWkAMoRbDlXYCT8hJd6UKg5xiW5 RyWeXOA8cLoUzTOZhWlwmO+Zb4+O2+JJlBXJUWxjtrmGOOGv5s8qxv1v+wnW2z4zZnVQ 0IBhqDvPS3TwP1IAurhSebQTTFoCAat1/vSSNv7LPIpFmCaa2xKSXGSWyFTvljfrHIWg uARJR4y4fJ5c9xSWawFekOkBMqJxqB1c0lKYSWVz3Fu/aFA5xtbIGYhwd6MGYYrDTVxi Tnp4wT5sw81jg8OgaSjWxC+q9/ueuQEYmfm/vrxt3k5ROvqdzky06KWdPisfCEUxL9eS pMhQ== X-Received: by 10.112.135.230 with SMTP id pv6mr4131646lbb.105.1409834614135; Thu, 04 Sep 2014 05:43:34 -0700 (PDT) Received: from pc5.home (abwx83.neoplus.adsl.tpnet.pl. 
[83.8.247.83]) by mx.google.com with ESMTPSA id w3sm148106lal.13.2014.09.04.05.43.32 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Sep 2014 05:43:33 -0700 (PDT) Sender: Edward Tomasz Napierała Date: Thu, 4 Sep 2014 14:43:30 +0200 From: Edward Tomasz Napierała To: Hans Ottevanger Subject: Re: [CFT] Autofs. Message-ID: <20140904124330.GB4152@pc5.home> Mail-Followup-To: Hans Ottevanger , freebsd-arch@FreeBSD.org, freebsd-current@FreeBSD.org References: <20140730071933.GA20122@pc5.home> <53F0878E.3000401@beastielabs.net> <20140817145059.GA5497@pc5.home> <5407FFB0.80203@beastielabs.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5407FFB0.80203@beastielabs.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-current@FreeBSD.org, freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 12:43:37 -0000

On 0904T0759, Hans Ottevanger wrote:
> On 08/17/14 16:50, Edward Tomasz Napierała wrote:
> >On 0817T1244, Hans Ottevanger wrote:
> >>
> [...]
> >>Hi!
> >>
> >>Great to see a real autofs finally coming to FreeBSD.
> >>
> >>I already did some very cursory testing on a recent 11-CURRENT system
> >>that I still happened to have and things with at least the /net map
> >>look quite OK.
> >>
> >>I could do some more extensive testing if I could use some of my
> >>10-STABLE systems. I already checked that the patch applies cleanly
> >>to a recent 10-STABLE (modulo a few offsets) and that both buildworld
> >>and buildkernel succeed. Should I expect difficulties actually
> >>running your autofs on 10-STABLE?
> >
> >No, it should be fine. Plan is to MFC this to 10 soon, btw.
> >
>
> Good to see that autofs has been MFC'd during my vacation 8-)
>
> But I found a little problem...
>
> When I try to access the NFS exported file-systems on an older test
> machine (running 7.x, but that is not so relevant, it also happens
> with other servers), with the following exports:
>
> $ showmount -e soekris
> Exports list on soekris:
> /var 192.168.0.0
> /usr 192.168.0.0
> /home 192.168.0.0
> / 192.168.0.0
>
> I get:
>
> $ ls /net/soekris
> COPYRIGHT dist libexec proc tmp
> bin entropy lost+found rescue usr
> boot etc media root var
> compat home mnt sbin
> dev lib
>
> which is correct, but the next level fails:
>
> $ ls -l /net/soekris/usr
> total 0
>
> since /usr on soekris is definitely not empty.
> Relevant output of mount:
>
> ...
> map -hosts on /net (autofs)
> soekris:/ on /net/soekris (nfs, nosuid, automounted)
>
> This is on 10.1-PRERELEASE r270922. The kernel config is GENERIC
> minus devices I do not have and AUTOFS added. Config files
> (/etc/auto_master, et al) are default. Mounting manually does succeed
> (in two steps, of course).
>
> When trying this from Mac OS X (I am still on Snow Leopard)
> automounting works as expected. I did not have the opportunity yet to
> try a Linux box (also do not know whether autofs there has been eaten
> by systemd already 8-)).
>
> Do I miss something, or is this a bug?

It's a bug. Or rather, a missing feature. The problem here is that
the "/" export "shadows" the rest. To handle this correctly, automountd(8)
would need to mount the "/" share, then mount autofs on "/usr" etc, and
then call it done. This part is easy. The problem is: how to expire
(automatically unmount) it? Because of autofs mounts, the "/" share
will always be busy, and thus won't ever get automatically unmounted.
So, for now, we don't even try to handle this situation.

I'm not sure what would be the best way to solve it.
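
[Editorial sketch, not part of the original message.] The shadowing described above can be illustrated with a few lines of Python. This is not automountd(8) code and makes no claim about how FreeBSD implements or will implement it; the helper names `nearest_parent_export` and `mount_plan` are hypothetical. Assuming only the export list printed by `showmount -e`, it works out which exports sit underneath another export, which is the case where a plain NFS mount of the parent hides the children and a nested trigger would be needed.

```python
import posixpath


def nearest_parent_export(path, exports):
    """Return the closest *other* export that is an ancestor of path, or None."""
    path = posixpath.normpath(path)
    best = None
    for other in exports:
        other = posixpath.normpath(other)
        if other == path:
            continue
        # "/" is an ancestor of everything; "/usr" is an ancestor of "/usr/x"
        # but not of "/usrlocal", hence the trailing-slash comparison.
        prefix = other if other.endswith("/") else other + "/"
        if path.startswith(prefix) and (best is None or len(other) > len(best)):
            best = other
    return best


def mount_plan(exports):
    """Order exports parent-first and pair each with the export that shadows it."""
    unique = {posixpath.normpath(e) for e in exports}
    ordered = sorted(unique, key=lambda p: (0 if p == "/" else p.count("/"), p))
    return [(e, nearest_parent_export(e, ordered)) for e in ordered]


if __name__ == "__main__":
    # Export list taken from the showmount output quoted in this thread.
    for export, parent in mount_plan(["/var", "/usr", "/home", "/"]):
        if parent is None:
            print(f"{export}: mount directly")
        else:
            print(f"{export}: shadowed by {parent}, needs a nested trigger")
```

With the soekris exports, "/" comes out as the only directly mountable share and /home, /usr and /var are each reported as shadowed by it. The sketch says nothing about the expiry problem, which is the hard part raised in the message.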
From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 14:24:20 2014 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2BA7E118; Thu, 4 Sep 2014 14:24:20 +0000 (UTC) Received: from mail.beastielabs.net (unknown [IPv6:2001:888:1227:0:200:24ff:fec9:5934]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 544801EA6; Thu, 4 Sep 2014 14:24:19 +0000 (UTC) Received: from beastie.hotsoft.nl (beastie.hotsoft.nl [IPv6:2001:888:1227:0:219:d1ff:fee8:91eb]) by mail.beastielabs.net (8.14.7/8.14.7) with ESMTP id s84EOFJc081037; Thu, 4 Sep 2014 16:24:15 +0200 (CEST) (envelope-from hans@beastielabs.net) Message-ID: <5408760F.2000607@beastielabs.net> Date: Thu, 04 Sep 2014 16:24:15 +0200 From: Hans Ottevanger User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Edward Tomasz Napierała Subject: Re: [CFT] Autofs. References: <20140730071933.GA20122@pc5.home> <53F0878E.3000401@beastielabs.net> <20140817145059.GA5497@pc5.home> <5407FFB0.80203@beastielabs.net> <20140904124330.GB4152@pc5.home> In-Reply-To: <20140904124330.GB4152@pc5.home> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-current@FreeBSD.org, freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 14:24:20 -0000

On 09/04/14 14:43, Edward Tomasz Napierała wrote:
> On 0904T0759, Hans Ottevanger wrote:
>> On 08/17/14 16:50, Edward Tomasz Napierała wrote:
>>> On 0817T1244, Hans Ottevanger wrote:
>>>>
>> [...]
>>>> Hi!
>>>>
>>>> Great to see a real autofs finally coming to FreeBSD.
>>>>
>>>> I already did some very cursory testing on a recent 11-CURRENT system
>>>> that I still happened to have and things with at least the /net map
>>>> look quite OK.
>>>>
>>>> I could do some more extensive testing if I could use some of my
>>>> 10-STABLE systems. I already checked that the patch applies cleanly
>>>> to a recent 10-STABLE (modulo a few offsets) and that both buildworld
>>>> and buildkernel succeed. Should I expect difficulties actually
>>>> running your autofs on 10-STABLE?
>>>
>>> No, it should be fine. Plan is to MFC this to 10 soon, btw.
>>>
>>
>> Good to see that autofs has been MFC'd during my vacation 8-)
>>
>> But I found a little problem...
>>
>> When I try to access the NFS exported file-systems on an older test
>> machine (running 7.x, but that is not so relevant, it also happens
>> with other servers), with the following exports:
>>
>> $ showmount -e soekris
>> Exports list on soekris:
>> /var 192.168.0.0
>> /usr 192.168.0.0
>> /home 192.168.0.0
>> / 192.168.0.0
>>
>> I get:
>>
>> $ ls /net/soekris
>> COPYRIGHT dist libexec proc tmp
>> bin entropy lost+found rescue usr
>> boot etc media root var
>> compat home mnt sbin
>> dev lib
>>
>> which is correct, but the next level fails:
>>
>> $ ls -l /net/soekris/usr
>> total 0
>>
>> since /usr on soekris is definitely not empty.
>> Relevant output of mount:
>>
>> ...
>> map -hosts on /net (autofs)
>> soekris:/ on /net/soekris (nfs, nosuid, automounted)
>>
>> This is on 10.1-PRERELEASE r270922. The kernel config is GENERIC
>> minus devices I do not have and AUTOFS added. Config files
>> (/etc/auto_master, et al) are default. Mounting manually does succeed
>> (in two steps, of course).
>>
>> When trying this from Mac OS X (I am still on Snow Leopard)
>> automounting works as expected.
>> I did not have the opportunity yet to
>> try a Linux box (also do not know whether autofs there has been eaten
>> by systemd already 8-)).
>>
>> Do I miss something, or is this a bug?
>
> It's a bug. Or rather, a missing feature. The problem here is that
> the "/" export "shadows" the rest. To handle this correctly, automountd(8)
> would need to mount the "/" share, then mount autofs on "/usr" etc, and
> then call it done. This part is easy. The problem is: how to expire
> (automatically unmount) it? Because of autofs mounts, the "/" share
> will always be busy, and thus won't ever get automatically unmounted.
> So, for now, we don't even try to handle this situation.
>
> I'm not sure what would be the best way to solve it.
>

Maybe the same way as Mac OS X does.

On my old MacMini (Snow Leopard) I get in a quiescent state, before
automounting anything:

/dev/disk0s2 on / (hfs, local, journaled)
devfs on /dev (devfs, local, nobrowse)
map -hosts on /net (autofs, nosuid, automounted, nobrowse)
map auto_home on /home (autofs, automounted, nobrowse)

Immediately after "ls -l /net/soekris/usr":

/dev/disk0s2 on / (hfs, local, journaled)
devfs on /dev (devfs, local, nobrowse)
map -hosts on /net (autofs, nosuid, automounted, nobrowse)
map auto_home on /home (autofs, automounted, nobrowse)
soekris:/ on /net/soekris (nfs, nodev, nosuid, automounted, nobrowse)
trigger on /net/soekris/usr (autofs, automounted, nobrowse)
trigger on /net/soekris/var (autofs, automounted, nobrowse)
trigger on /net/soekris/home (autofs, automounted, nobrowse)
soekris:/usr on /net/soekris/usr (nfs, nodev, nosuid, automounted, nobrowse)

Then, after more than 400 seconds:

/dev/disk0s2 on / (hfs, local, journaled)
devfs on /dev (devfs, local, nobrowse)
map -hosts on /net (autofs, nosuid, automounted, nobrowse)
map auto_home on /home (autofs, automounted, nobrowse)
soekris:/ on /net/soekris (nfs, nodev, nosuid, automounted, nobrowse)
trigger on /net/soekris/usr (autofs, automounted, nobrowse)
trigger on /net/soekris/var (autofs, automounted, nobrowse)
trigger on /net/soekris/home (autofs, automounted, nobrowse)

and finally after 600 seconds we are back to:

/dev/disk0s2 on / (hfs, local, journaled)
devfs on /dev (devfs, local, nobrowse)
map -hosts on /net (autofs, nosuid, automounted, nobrowse)
map auto_home on /home (autofs, automounted, nobrowse)

So triggers for the subdirectories are automounted on their automounted
parent directory and expiration occurs in steps. BTW, I reconfigured the
automount timeout as 300s (was 3600s) so I do not fully understand why
the first time takes at least 400s.

If you think it is useful I can grab an older Linux box from my basement
and try to get autofs running on it, to do the same experiment. I
currently do not have a Solaris installation.

Kind regards,

Hans

From owner-freebsd-arch@FreeBSD.ORG Thu Sep 4 19:41:56 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 22494AEB; Thu, 4 Sep 2014 19:41:56 +0000 (UTC) Received: from mail-la0-x230.google.com (mail-la0-x230.google.com [IPv6:2a00:1450:4010:c03::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6C2BC1C09; Thu, 4 Sep 2014 19:41:55 +0000 (UTC) Received: by mail-la0-f48.google.com with SMTP id ty20so925159lab.21 for ; Thu, 04 Sep 2014 12:41:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-type:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=3m7brMSyrWCWth+zLZR3GcYUdWoSoKNXpEAmF8XRZTg=; b=a6cZv3/HYNMr1eFA7QbiuJyGBYj8JebsCCizaIeIBGFcs8lvJklxS9QigUP3bnDCbM
rVxfwZAfp7LDbdJrIGJxjDvlXbBO/drDBM5Fbpjokm8bg3bsi+wZmqMdQoEbjo1ESoYd Bk1zNU8eumMWxIR5P0/Caz/G9CHI2+kV2Okbzwv1jhjS/7Fbe2MnTPKS68KPO8qxA9Sr Uce5hUDozbet2PDEWKmfsuaymQlaTSE7ykO886R7IubMmi5UVXM7S0YmOcKn2nMVEHMq qfx8tlTAz1g4WWkgfWakC2ZI2RNQ7KtAw340LhCqGJhh7BQKeSgeXnWWYpvGz/hIl+DU bP6g== X-Received: by 10.152.6.133 with SMTP id b5mr6927017laa.16.1409859712895; Thu, 04 Sep 2014 12:41:52 -0700 (PDT) Received: from pc5.home (abwx83.neoplus.adsl.tpnet.pl. [83.8.247.83]) by mx.google.com with ESMTPSA id w11sm749359lbm.30.2014.09.04.12.41.51 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Sep 2014 12:41:52 -0700 (PDT) Sender: Edward Tomasz Napierała Date: Thu, 4 Sep 2014 21:41:49 +0200 From: Edward Tomasz Napierała To: Hans Ottevanger Subject: Re: [CFT] Autofs. Message-ID: <20140904194149.GA4650@pc5.home> Mail-Followup-To: Hans Ottevanger , freebsd-arch@FreeBSD.org, freebsd-current@FreeBSD.org References: <20140730071933.GA20122@pc5.home> <53F0878E.3000401@beastielabs.net> <20140817145059.GA5497@pc5.home> <5407FFB0.80203@beastielabs.net> <20140904124330.GB4152@pc5.home> <5408760F.2000607@beastielabs.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5408760F.2000607@beastielabs.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-current@FreeBSD.org, freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2014 19:41:56 -0000

On 0904T1624, Hans Ottevanger wrote:
> On 09/04/14 14:43, Edward Tomasz Napierała wrote:
> >On 0904T0759, Hans Ottevanger wrote:
> >>On 08/17/14 16:50, Edward Tomasz Napierała wrote:
> >>>On 0817T1244, Hans Ottevanger wrote:
> >>>>
> >>[...]
> >>>>Hi!
> >>>>
> >>>>Great to see a real autofs finally coming to FreeBSD.
> >>>>
> >>>>I already did some very cursory testing on a recent 11-CURRENT system
> >>>>that I still happened to have and things with at least the /net map
> >>>>look quite OK.
> >>>>
> >>>>I could do some more extensive testing if I could use some of my
> >>>>10-STABLE systems. I already checked that the patch applies cleanly
> >>>>to a recent 10-STABLE (modulo a few offsets) and that both buildworld
> >>>>and buildkernel succeed. Should I expect difficulties actually
> >>>>running your autofs on 10-STABLE?
> >>>
> >>>No, it should be fine. Plan is to MFC this to 10 soon, btw.
> >>>
> >>
> >>Good to see that autofs has been MFC'd during my vacation 8-)
> >>
> >>But I found a little problem...
> >>
> >>When I try to access the NFS exported file-systems on an older test
> >>machine (running 7.x, but that is not so relevant, it also happens
> >>with other servers), with the following exports:
> >>
> >>$ showmount -e soekris
> >>Exports list on soekris:
> >>/var 192.168.0.0
> >>/usr 192.168.0.0
> >>/home 192.168.0.0
> >>/ 192.168.0.0
> >>
> >>I get:
> >>
> >>$ ls /net/soekris
> >>COPYRIGHT dist libexec proc tmp
> >>bin entropy lost+found rescue usr
> >>boot etc media root var
> >>compat home mnt sbin
> >>dev lib
> >>
> >>which is correct, but the next level fails:
> >>
> >>$ ls -l /net/soekris/usr
> >>total 0
> >>
> >>since /usr on soekris is definitely not empty.
> >>Relevant output of mount:
> >>
> >>...
> >>map -hosts on /net (autofs)
> >>soekris:/ on /net/soekris (nfs, nosuid, automounted)
> >>
> >>This is on 10.1-PRERELEASE r270922. The kernel config is GENERIC
> >>minus devices I do not have and AUTOFS added. Config files
> >>(/etc/auto_master, et al) are default. Mounting manually does succeed
> >>(in two steps, of course).
> >>
> >>When trying this from Mac OS X (I am still on Snow Leopard)
> >>automounting works as expected. I did not have the opportunity yet to
> >>try a Linux box (also do not know whether autofs there has been eaten
> >>by systemd already 8-)).
> >>
> >>Do I miss something, or is this a bug?
> >
> >It's a bug. Or rather, a missing feature. The problem here is that
> >the "/" export "shadows" the rest. To handle this correctly, automountd(8)
> >would need to mount the "/" share, then mount autofs on "/usr" etc, and
> >then call it done. This part is easy. The problem is: how to expire
> >(automatically unmount) it? Because of autofs mounts, the "/" share
> >will always be busy, and thus won't ever get automatically unmounted.
> >So, for now, we don't even try to handle this situation.
> >
> >I'm not sure what would be the best way to solve it.
> >
>
> Maybe the same way as Mac OS X does.
>
> On my old MacMini (Snow Leopard) I get in a quiescent state, before
> automounting anything:
>
> /dev/disk0s2 on / (hfs, local, journaled)
> devfs on /dev (devfs, local, nobrowse)
> map -hosts on /net (autofs, nosuid, automounted, nobrowse)
> map auto_home on /home (autofs, automounted, nobrowse)
>
> Immediately after "ls -l /net/soekris/usr":
>
> /dev/disk0s2 on / (hfs, local, journaled)
> devfs on /dev (devfs, local, nobrowse)
> map -hosts on /net (autofs, nosuid, automounted, nobrowse)
> map auto_home on /home (autofs, automounted, nobrowse)
> soekris:/ on /net/soekris (nfs, nodev, nosuid, automounted, nobrowse)
> trigger on /net/soekris/usr (autofs, automounted, nobrowse)
> trigger on /net/soekris/var (autofs, automounted, nobrowse)
> trigger on /net/soekris/home (autofs, automounted, nobrowse)
> soekris:/usr on /net/soekris/usr (nfs, nodev, nosuid, automounted, nobrowse)
>
> Then, after more than 400 seconds:
>
> /dev/disk0s2 on / (hfs, local, journaled)
> devfs on /dev (devfs, local, nobrowse)
> map -hosts on /net (autofs, nosuid, automounted, nobrowse)
> map auto_home on /home (autofs, automounted, nobrowse)
> soekris:/ on /net/soekris (nfs, nodev, nosuid, automounted, nobrowse)

The problem is, in FreeBSD we would never be able to unmount the
filesystem above (/net/soekris) due to autofs instances below, which
are mounted on top of it:

> trigger on /net/soekris/usr (autofs, automounted, nobrowse)
> trigger on /net/soekris/var (autofs, automounted, nobrowse)
> trigger on /net/soekris/home (autofs, automounted, nobrowse)
>
> and finally after 600 seconds we are back to:
>
> /dev/disk0s2 on / (hfs, local, journaled)
> devfs on /dev (devfs, local, nobrowse)
> map -hosts on /net (autofs, nosuid, automounted, nobrowse)
> map auto_home on /home (autofs, automounted, nobrowse)
>
> So triggers for the subdirectories are automounted on their
> automounted parent directory and expiration occurs in steps. BTW, I
> reconfigured the automount timeout as 300s (was 3600s) so I do not
> fully understand why the first time takes at least 400s.
>
> If you think it is useful I can grab an older Linux box from my
> basement and try to get autofs running on it, to do the same
> experiment. I currently do not have a Solaris installation.

Nah, not really. I know what the problem is, I'm just not sure what's
the best way to approach it. And I really like to fix things the
_right_ way.
Give me some time :-)

From owner-freebsd-arch@FreeBSD.ORG Fri Sep 5 08:18:01 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F3050E57 for ; Fri, 5 Sep 2014 08:18:01 +0000 (UTC) Received: from mail-wi0-x232.google.com (mail-wi0-x232.google.com [IPv6:2a00:1450:400c:c05::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8BA371C58 for ; Fri, 5 Sep 2014 08:18:01 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id r20so2523357wiv.5 for ; Fri, 05 Sep 2014 01:17:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=IsSBKBBkgnVGyLJdvzkGThQfq5wZ6dyoUp0rL4aipgY=; b=Xsu6FijFIE/pa4nQHpCflqOCMCQn4KFCB0UUCOH8W4juM2XqkXuYRxrBeBfHMe+Tgy GmnUh3Etd/Sr2IZChr4gs77dfN2NJIKTjS2cTkrE4Og+KpSn2mmcR34dOO/l6I7mXPh0 AtYKuu+lGk9hNwqrhmVVH2hUKl6lbaTuQ8nYgvZkc2wqzOxoICTgAvEP0GZR3AsDBy1x o0rOe9RZdQ4NVH5XaENQrDk7PpPWjfwzbiq/gnk8k7Nj8mztYMIuFWffoVpWE+5b72Iu 2eJLhXCaV0ksJfHYkFpg9cFRt6OwgNfd97MTT0PfKnkr/NuerkdCH9ZSHeY5Z6Jpt7pQ
gslw== MIME-Version: 1.0 X-Received: by 10.180.20.196 with SMTP id p4mr1760788wie.56.1409905079799; Fri, 05 Sep 2014 01:17:59 -0700 (PDT) Received: by 10.217.173.196 with HTTP; Fri, 5 Sep 2014 01:17:59 -0700 (PDT) Date: Fri, 5 Sep 2014 04:17:59 -0400 Message-ID: Subject: New ASLR Patch From: Shawn Webb To: FreeBSD-current X-Mailman-Approved-At: Fri, 05 Sep 2014 11:32:56 +0000 Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: Alan Cox , dev@hardenedbsd.org, Bryan Drewery , Robert Watson , PaX Team , Dag-Erling Smørgrav X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Sep 2014 08:18:02 -0000

Hey All,

I've submitted a new revision of our ASLR patch to Phabricator. It can be applied to 11-CURRENT. The main changes include removal of the MAP_32BIT hack for amd64, a couple of bug fixes, and stylistic changes requested by a few people. I'm looking for commentary and volunteers for testing. The link to Phabricator is below and you can download the raw patch from there.
https://reviews.freebsd.org/D473 Thanks, Shawn From owner-freebsd-arch@FreeBSD.ORG Sat Sep 6 23:01:13 2014 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 58E99BE9; Sat, 6 Sep 2014 23:01:13 +0000 (UTC) Received: from wonkity.com (wonkity.com [67.158.26.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "wonkity.com", Issuer "wonkity.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 0FEE61FC8; Sat, 6 Sep 2014 23:01:12 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.9/8.14.9) with ESMTP id s86N1Auw073781 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 6 Sep 2014 17:01:10 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.9/8.14.9/Submit) with ESMTP id s86N1AOU073778; Sat, 6 Sep 2014 17:01:10 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Sat, 6 Sep 2014 17:01:10 -0600 (MDT) From: Warren Block To: freebsd-arch@FreeBSD.org Subject: Improving /etc/motd and ANSI Message-ID: User-Agent: Alpine 2.11 (BSF 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Sat, 06 Sep 2014 17:01:10 -0600 (MDT) Cc: wblock@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Sep 2014 23:01:13 -0000 /etc/motd has been in need of improvement for some time. 
Recently, I did a rewrite: http://www.wonkity.com/~wblock/motd/motd http://www.wonkity.com/~wblock/motd/motd.diff This new version still has the problem of using "in-band" quote marks to mark up the commands. We tell the reader to run `man man', for example, but it's not particularly obvious that the quotes should not be entered. As an experiment, this version uses ANSI underline escape sequences: http://www.wonkity.com/~wblock/motd/motd.ansi That reads better, is less likely to be misunderstood, and will work on normal consoles and most terminal emulations in use today. It will not display correctly on things that do not understand VT100/VT220 or ANSI codes, but I suspect that is a vanishingly small portion of the user base. Those users are also likely to be familiar with the problem. Is there some showstopper reason not to commit this ANSI version? From owner-freebsd-arch@FreeBSD.ORG Sat Sep 6 23:21:36 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 62F44190 for ; Sat, 6 Sep 2014 23:21:36 +0000 (UTC) Received: from mail-ie0-f179.google.com (mail-ie0-f179.google.com [209.85.223.179]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2954712AB for ; Sat, 6 Sep 2014 23:21:35 +0000 (UTC) Received: by mail-ie0-f179.google.com with SMTP id rl12so498221iec.38 for ; Sat, 06 Sep 2014 16:21:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:content-type:mime-version:subject:from :in-reply-to:date:cc:message-id:references:to; bh=QQvwZhcNOrIAxyBWfHb4j6+6soUb5UTgIeWzpisoah0=; b=gkfob5xS9p/mK9csWsO/NWROaheHSTKeLLDBKlNL1NVeDyYZq1eIAgp9yc/AgfF1dX 
Q7LXpNFU4B2cULkexuDn0p4GR57Y0KpXqfnpQxa+6A2xLUvXf/k8srAdXnwvEDA60cMH yivF/GKtcVJGnXfbfmkvy5eFwaFfCv8H2w473/5x0A/ezdw49Fzt6N/wUcLp6tlKYDjr /hc3M1tcjRnCJy8UZr547zzT310ZIo07Z12y0BD+g+JY59ONvF1n9RVy9VosYfcMvg2+ 2cTQCeP3HMUKXQOMB76GwuW8O42bKzOQ6As8agPQIKRjpDHfVCheYO0s3AoBNiKcmsJb bcBg== X-Gm-Message-State: ALoCoQmxdFhqE2EK884K4b95qJS+nA2jg1jMQnv2wPMzzfHrseMhei5Io8HfK/lPko9NM3a5PNnr X-Received: by 10.50.50.229 with SMTP id f5mr14096432igo.42.1410045353933; Sat, 06 Sep 2014 16:15:53 -0700 (PDT) Received: from netflix-mac.bsdimp.com (50-78-194-198-static.hfc.comcastbusiness.net. [50.78.194.198]) by mx.google.com with ESMTPSA id a11sm7185824igm.3.2014.09.06.16.15.53 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sat, 06 Sep 2014 16:15:53 -0700 (PDT) Sender: Warner Losh Content-Type: multipart/signed; boundary="Apple-Mail=_915E61AF-6F44-4602-A758-F6C844F0E0C6"; protocol="application/pgp-signature"; micalg=pgp-sha512 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: Improving /etc/motd and ANSI From: Warner Losh In-Reply-To: Date: Sat, 6 Sep 2014 17:15:51 -0600 Message-Id: References: To: Warren Block X-Mailer: Apple Mail (2.1878.6) Cc: freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Sep 2014 23:21:36 -0000

--Apple-Mail=_915E61AF-6F44-4602-A758-F6C844F0E0C6
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=windows-1252

On Sep 6, 2014, at 5:01 PM, Warren Block wrote:

> /etc/motd has been in need of improvement for some time. Recently, I did a rewrite:
>
> http://www.wonkity.com/~wblock/motd/motd
> http://www.wonkity.com/~wblock/motd/motd.diff
>
> This new version still has the problem of using "in-band" quote marks to mark up the commands.
We tell the reader to run `man man', for = example, but it's not particularly obvious that the quotes should not be = entered. >=20 > As an experiment, this version uses ANSI underline escape sequences: > http://www.wonkity.com/~wblock/motd/motd.ansi >=20 > That reads better, is less likely to be misunderstood, and will work = on normal consoles and most terminal emulations in use today. >=20 > It will not display correctly on things that do not understand = VT100/VT220 or ANSI codes, but I suspect that is a vanishingly small = portion of the user base. Those users are also likely to be familiar = with the problem. >=20 > Is there some showstopper reason not to commit this ANSI version? It embeds the notion that all the world is a VT100 and interprets the = ANSI escape code identically. In years past, this definitely wasn=92t the case. But in those years we = had many different breeds of terminal roaming the earth, and these terminals were all somewhat = different (even at the same installation you=92d have a heterogeneous setup because different = departments got different vendors to supply their gear). These changes would break that. One of = the nice things about Unix has always been it played very nicely in a heterogeneous = environment and all fancy smancy curses action was done through a layer of indirection so it would = work everywhere, unlike VMS where things were more hard-coded and it was always hard to = use non-DEC gear. It also assumes that all users want to see the fancy ANSI version with = underlines and such. While rather innocuous, one needn=92t look any farther than gnu=92s color ls = to see what madness lies not too far down this path. Finally, console scraping code may be affected in some minor way and = you=92ll wind up with text that looks weird. None of these are huge show-stoppers. 
But it is a very nice camel=92s = nose at the moment, and I=92d hate to see the rest of the camel=85[*] Warner [*] http://en.wikipedia.org/wiki/Camel's_nose --Apple-Mail=_915E61AF-6F44-4602-A758-F6C844F0E0C6 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJUC5WnAAoJEGwc0Sh9sBEAh3wP/3HrEa9HIkO59qmHhT+AjqHU cu2FfK+BS+Nmj+1X4JHo0OJNm4nTeld3dpD3Z/etzgTuiFh9T3v3tLp33rb2JROR UrwV9X2ge29b+aP2bC+Aid6dHih/apq+YhCdC9jwhexzPwYWQaWio73AnbCBPAR1 sb1K4gAa18/EC6K2fHdWhgHsxCqfUhToJFRxMwGwpUaB4c/FHH402NYuDieN4+M4 ZQFKjM8AckzDY3MmEd/cKxAuqRsfhjlM7OWZuV8gIlM+UEWNnCmyYEql5ZYdFdeA gTecCHF+mQZTDRPrQYHq3ajs1nCOCw4/y7v0BGP4/Q/lKR9rtvuuGQ55r8igjPx4 fouoLpQjJ9xOJfX+fUsHRGqWkzgsFOw8jWDaWlDaCzRi5NOXrga20aQspLFkU8m2 rFwMrlHaM0CaA9YIxQmLBsJ5gyLlQ3B9ax3WVk9FRWLHEvQdH68f1b8iY4rCsRWH agtF0iE+urk4J8Ud4p3mIbQndGxUOefbTE3deENERLdGKvk6SkzWY9r+PHXQQhIP qmHCkqZ4ulWKf3trzCPIKQPMcMH6jukV/WlbATEEmcGDUnqfwqWkPywhngoPG6lv bigxH4D6JOWt9Df5Z3ol1Eb0uTV5YAiFkZQhom/7inbhB3Vz+aPFImL9RjuvjyPb uT/4k80I/Mn3Lo2ewM7D =GRJx -----END PGP SIGNATURE----- --Apple-Mail=_915E61AF-6F44-4602-A758-F6C844F0E0C6--
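
For readers following the trade-off discussed above, here is a minimal sketch (not part of either message) of the two markup styles side by side: ANSI underlining when the terminal type looks ANSI-capable, and the older in-band quotes otherwise. The function name mark_command, the ANSI_TERM_PREFIXES list, and the choice of ESC[4m and ESC[0m are assumptions made for illustration; the actual motd.ansi file may use different sequences. A static /etc/motd cannot make this choice at display time, and a real program would normally consult terminfo or termcap rather than a hard-coded list.

```python
import os

ESC = "\x1b"
SGR_UNDERLINE = ESC + "[4m"  # ECMA-48 SGR 4: underline on
SGR_RESET = ESC + "[0m"      # ECMA-48 SGR 0: reset all attributes

# Hypothetical list of TERM prefixes assumed to honour ANSI SGR codes.
# A real program would look this up through terminfo/termcap instead.
ANSI_TERM_PREFIXES = ("vt100", "vt220", "xterm", "ansi")


def mark_command(command, term):
    """Return `command` marked up for the given terminal type.

    ANSI-capable terminals get underlining; anything else (including an
    empty or unknown TERM) falls back to in-band `quote' marks.
    """
    if term and term.startswith(ANSI_TERM_PREFIXES):
        return f"{SGR_UNDERLINE}{command}{SGR_RESET}"
    return f"`{command}'"


if __name__ == "__main__":
    term = os.environ.get("TERM", "")
    print("Run", mark_command("man man", term), "to read about manual pages.")
```

On a terminal that does not interpret the escape sequences, the underlined form would show up as literal control characters around the command, which is the failure mode the reply is concerned about.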