Date:      Sun, 31 Aug 2014 20:48:20 +0400
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        arch@FreeBSD.org
Cc:        alc@FreeBSD.org
Subject:   Re: [CFT/review] new sendfile(2)
Message-ID:  <20140831164820.GD7693@FreeBSD.org>
In-Reply-To: <20140529102054.GX50679@FreeBSD.org>
References:  <20140529102054.GX50679@FreeBSD.org>



  Hi!

  Just a followup with a fresh version of the patch. For details,
see below.

On Thu, May 29, 2014 at 02:20:54PM +0400, Gleb Smirnoff wrote:
T>   Hello!
T> 
T>   At Netflix and Nginx we are experimenting with improving FreeBSD
T> wrt sending large amounts of static data via HTTP.
T> 
T>   One of the approaches we are experimenting with is a new sendfile(2)
T> implementation that doesn't block on the I/O done from the file
T> descriptor.
T> 
T>   The problem with classic sendfile(2) is that if the request
T> length is large enough, and file data is not cached in VM, then
T> the sendfile(2) syscall does not return until it fills the socket
T> buffer with data. On the modern Internet, socket buffers can be up
T> to 1 MB, so the time taken by the syscall rises by an order of
T> magnitude. All that time, the nginx worker is blocked in the
T> syscall and doesn't process data from other clients. The best
T> current practice to mitigate that is known as "sendfile(2) +
T> aio_read(2)". This is a special mode of nginx operation on FreeBSD.
T> The sendfile(2) call is issued with the SF_NODISKIO flag, which
T> forbids the syscall from performing disk I/O, so it sends only
T> data that is cached by VM. If sendfile(2) reports that I/O needs
T> to be done (but is forbidden), then nginx does an aio_read() of a
T> chunk of the file. The data read is cached by VM as a side
T> effect. Then sendfile() is called again.
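
The retry loop described above can be sketched as a small userspace toy.
This is only a model of the control flow, not nginx or kernel code: the
toy_* names and the page array are made up for illustration, and the only
detail borrowed from the real interface is that sendfile(2) with
SF_NODISKIO fails with EBUSY when it would have to touch the disk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Toy "VM cache": resident[i] is true when page i of the file is cached. */
struct toy_file {
	bool resident[NPAGES];
};

/*
 * Stand-in for sendfile(2) with SF_NODISKIO (hypothetical helper): send
 * pages starting at *off while they are cached, and stop with EBUSY at
 * the first page that would need disk I/O.  Returns 0 once all pages
 * have been sent.
 */
static int
toy_sendfile_nodiskio(const struct toy_file *f, size_t *off)
{
	while (*off < NPAGES) {
		if (!f->resident[*off])
			return (EBUSY);
		(*off)++;
	}
	return (0);
}

/* Stand-in for a completed aio_read(2): the page is cached as a side effect. */
static void
toy_aio_read_done(struct toy_file *f, size_t page)
{
	assert(page < NPAGES);
	f->resident[page] = true;
}

/*
 * Worker loop: try to send; on EBUSY read the missing page and retry.
 * Returns the number of reads that were needed.
 */
static int
toy_serve(struct toy_file *f)
{
	size_t off = 0;
	int reads = 0;

	while (toy_sendfile_nodiskio(f, &off) == EBUSY) {
		toy_aio_read_done(f, off);
		reads++;
	}
	assert(off == NPAGES);
	return (reads);
}
```

With two uncached pages in the file, toy_serve() needs two reads and three
send attempts. A real worker would return to its event loop between the
read and the retry rather than spin.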
T> 
T>   Now for the new sendfile. The core idea is that sendfile()
T> schedules the I/O, but doesn't wait for it to complete. It
T> returns immediately to the process, and I/O completion is
T> processed in kernel context. Unlike aio(4), no additional
T> threads in kernel are created. The new sendfile is a drop-in
T> replacement for the old one. Applications (like nginx) need
T> neither recompilation nor configuration changes. The SF_NODISKIO
T> flag is ignored.
T> 
T>   The patch for review is available at:
T> 
T> https://phabric.freebsd.org/D102
T> 
T> And for those who prefer email attachments, it is also attached.
T> The patch contains 3 logically separate changes:
T> 
T> 1) Split of the socket buffer sb_cc field into sb_acc and sb_ccc, where
T> sb_acc stands for "available character count" and sb_ccc for "claimed
T> character count". This allows us to write data that is not ready yet
T> to a socket. The data sits in the socket, consumes its space, and
T> keeps its place in order relative to earlier and later writes to the
T> socket, but it can be sent only after it is marked as ready. This
T> change is spread across many files.
T> 
T> 2) A new vnode operation: VOP_GETPAGES_ASYNC(). This one lives in sys/vm.
T> 
T> 3) The actual implementation of the new sendfile(2). This one lives in
T> kern/uipc_syscalls.c
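
To make item 1 concrete, here is a minimal userspace model of the two
counters. It is a sketch under assumptions, not the patch's code: the toy_*
names and the fixed-size array are invented for illustration, and the
available count is simply recomputed from scratch, whereas the patch keeps
it up to date incrementally with mbuf flags and an sb_fnrdy pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAXCHUNKS 16

/* One queued write; stands in for an mbuf (hypothetical type). */
struct toy_chunk {
	size_t len;
	bool ready;		/* false while its disk read is in flight */
};

struct toy_sockbuf {
	struct toy_chunk q[TOY_MAXCHUNKS];
	size_t n;		/* chunks queued */
	size_t acc;		/* "available": bytes that may be sent now */
	size_t ccc;		/* "claimed": bytes occupying buffer space */
};

/* acc covers only the leading run of ready chunks, so ordering is kept. */
static void
toy_recount(struct toy_sockbuf *sb)
{
	size_t i;

	sb->acc = 0;
	for (i = 0; i < sb->n && sb->q[i].ready; i++)
		sb->acc += sb->q[i].len;
}

/* Queue a write; returns its index so it can be marked ready later. */
static size_t
toy_append(struct toy_sockbuf *sb, size_t len, bool ready)
{
	assert(sb->n < TOY_MAXCHUNKS);
	sb->q[sb->n].len = len;
	sb->q[sb->n].ready = ready;
	sb->ccc += len;		/* space is claimed immediately */
	sb->n++;
	toy_recount(sb);
	return (sb->n - 1);
}

/* I/O finished for chunk idx; acc grows if it was the first not-ready one. */
static void
toy_mark_ready(struct toy_sockbuf *sb, size_t idx)
{
	assert(idx < sb->n && !sb->q[idx].ready);
	sb->q[idx].ready = true;
	toy_recount(sb);
}
```

Queuing 100 ready bytes, then 200 not-ready bytes, then 50 ready bytes
leaves ccc at 350 but acc at 100: the last 50 bytes are ready yet stuck
behind the pending chunk. Marking the middle chunk ready raises acc to 350.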
T> 
T> 
T> 
T>   At Netflix, we already see improvements with the new sendfile(2).
T> We can send more data using the same amount of CPU, and we can
T> push closer to 0% idle without experiencing short lags.
T> 
T> However, we have a somewhat modified VM subsystem that behaves
T> optimally for our task, but suboptimally for an average FreeBSD
T> system. I'd like someone from the community to try the new
T> sendfile(2) on another setup and see how it serves you.
T> 
T>   To be an early tester you need to check out the projects/sendfile
T> branch and build a kernel from it. The world from head/ will
T> run fine with it.
T> 
T>   svn co http://svn.freebsd.org/base/projects/sendfile
T>   cd sendfile
T>   ... build kernel ...
T> 
T> Limitations:
T> - No testing was done on serving files from NFS.
T> - No testing was done on serving files from ZFS.
T> 
T> -- 
T> Totus tuus, Glebius.

T> Index: sys/dev/ti/if_ti.c
T> ===================================================================
T> --- sys/dev/ti/if_ti.c	(.../head)	(revision 266804)
T> +++ sys/dev/ti/if_ti.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1629,7 +1629,7 @@ ti_newbuf_jumbo(struct ti_softc *sc, int idx, stru
T>  			m[i]->m_data = (void *)sf_buf_kva(sf[i]);
T>  			m[i]->m_len = PAGE_SIZE;
T>  			MEXTADD(m[i], sf_buf_kva(sf[i]), PAGE_SIZE,
T> -			    sf_buf_mext, (void*)sf_buf_kva(sf[i]), sf[i],
T> +			    sf_mext_free, (void*)sf_buf_kva(sf[i]), sf[i],
T>  			    0, EXT_DISPOSABLE);
T>  			m[i]->m_next = m[i+1];
T>  		}
T> @@ -1694,7 +1694,7 @@ nobufs:
T>  		if (m[i])
T>  			m_freem(m[i]);
T>  		if (sf[i])
T> -			sf_buf_mext((void *)sf_buf_kva(sf[i]), sf[i]);
T> +			sf_mext_free((void *)sf_buf_kva(sf[i]), sf[i]);
T>  	}
T>  	return (ENOBUFS);
T>  }
T> Index: sys/dev/cxgbe/tom/t4_cpl_io.c
T> ===================================================================
T> --- sys/dev/cxgbe/tom/t4_cpl_io.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgbe/tom/t4_cpl_io.c	(.../projects/sendfile)	(revision 266807)
T> @@ -338,11 +338,11 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp)
T>  	INP_WLOCK_ASSERT(inp);
T>  
T>  	SOCKBUF_LOCK(sb);
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> -	toep->sb_cc = sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T> +	toep->sb_cc = sbused(sb);
T>  	credits = toep->rx_credits;
T>  	SOCKBUF_UNLOCK(sb);
T>  
T> @@ -863,15 +863,15 @@ do_peer_close(struct sge_iq *iq, const struct rss_
T>  		tp->rcv_nxt = be32toh(cpl->rcv_nxt);
T>  		toep->ddp_flags &= ~(DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE);
T>  
T> -		KASSERT(toep->sb_cc >= sb->sb_cc,
T> +		KASSERT(toep->sb_cc >= sbused(sb),
T>  		    ("%s: sb %p has more data (%d) than last time (%d).",
T> -		    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -		toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> +		    __func__, sb, sbused(sb), toep->sb_cc));
T> +		toep->rx_credits += toep->sb_cc - sbused(sb);
T>  #ifdef USE_DDP_RX_FLOW_CONTROL
T>  		toep->rx_credits -= m->m_len;	/* adjust for F_RX_FC_DDP */
T>  #endif
T> -		sbappendstream_locked(sb, m);
T> -		toep->sb_cc = sb->sb_cc;
T> +		sbappendstream_locked(sb, m, 0);
T> +		toep->sb_cc = sbused(sb);
T>  	}
T>  	socantrcvmore_locked(so);	/* unlocks the sockbuf */
T>  
T> @@ -1281,12 +1281,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_hea
T>  		}
T>  	}
T>  
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> -	sbappendstream_locked(sb, m);
T> -	toep->sb_cc = sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T> +	sbappendstream_locked(sb, m, 0);
T> +	toep->sb_cc = sbused(sb);
T>  	sorwakeup_locked(so);
T>  	SOCKBUF_UNLOCK_ASSERT(sb);
T>  
T> Index: sys/dev/cxgbe/tom/t4_ddp.c
T> ===================================================================
T> --- sys/dev/cxgbe/tom/t4_ddp.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgbe/tom/t4_ddp.c	(.../projects/sendfile)	(revision 266807)
T> @@ -224,15 +224,15 @@ insert_ddp_data(struct toepcb *toep, uint32_t n)
T>  	tp->rcv_wnd -= n;
T>  #endif
T>  
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T>  #ifdef USE_DDP_RX_FLOW_CONTROL
T>  	toep->rx_credits -= n;	/* adjust for F_RX_FC_DDP */
T>  #endif
T> -	sbappendstream_locked(sb, m);
T> -	toep->sb_cc = sb->sb_cc;
T> +	sbappendstream_locked(sb, m, 0);
T> +	toep->sb_cc = sbused(sb);
T>  }
T>  
T>  /* SET_TCB_FIELD sent as a ULP command looks like this */
T> @@ -459,15 +459,15 @@ handle_ddp_data(struct toepcb *toep, __be32 ddp_re
T>  	else
T>  		discourage_ddp(toep);
T>  
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T>  #ifdef USE_DDP_RX_FLOW_CONTROL
T>  	toep->rx_credits -= len;	/* adjust for F_RX_FC_DDP */
T>  #endif
T> -	sbappendstream_locked(sb, m);
T> -	toep->sb_cc = sb->sb_cc;
T> +	sbappendstream_locked(sb, m, 0);
T> +	toep->sb_cc = sbused(sb);
T>  wakeup:
T>  	KASSERT(toep->ddp_flags & db_flag,
T>  	    ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x, report 0x%x",
T> @@ -897,7 +897,7 @@ handle_ddp(struct socket *so, struct uio *uio, int
T>  #endif
T>  
T>  	/* XXX: too eager to disable DDP, could handle NBIO better than this. */
T> -	if (sb->sb_cc >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres ||
T> +	if (sbused(sb) >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres ||
T>  	    uio->uio_resid > MAX_DDP_BUFFER_SIZE || uio->uio_iovcnt > 1 ||
T>  	    so->so_state & SS_NBIO || flags & (MSG_DONTWAIT | MSG_NBIO) ||
T>  	    error || so->so_error || sb->sb_state & SBS_CANTRCVMORE)
T> @@ -935,7 +935,7 @@ handle_ddp(struct socket *so, struct uio *uio, int
T>  	 * payload.
T>  	 */
T>  	ddp_flags = select_ddp_flags(so, flags, db_idx);
T> -	wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sb->sb_cc, ddp_flags);
T> +	wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sbused(sb), ddp_flags);
T>  	if (wr == NULL) {
T>  		/*
T>  		 * Just unhold the pages.  The DDP buffer's software state is
T> @@ -960,8 +960,9 @@ handle_ddp(struct socket *so, struct uio *uio, int
T>  	 */
T>  	rc = sbwait(sb);
T>  	while (toep->ddp_flags & buf_flag) {
T> +		/* XXXGL: shouldn't here be sbwait() call? */
T>  		sb->sb_flags |= SB_WAIT;
T> -		msleep(&sb->sb_cc, &sb->sb_mtx, PSOCK , "sbwait", 0);
T> +		msleep(&sb->sb_acc, &sb->sb_mtx, PSOCK , "sbwait", 0);
T>  	}
T>  	unwire_ddp_buffer(db);
T>  	return (rc);
T> @@ -1123,8 +1124,8 @@ restart:
T>  
T>  		/* uio should be just as it was at entry */
T>  		KASSERT(oresid == uio->uio_resid,
T> -		    ("%s: oresid = %d, uio_resid = %zd, sb_cc = %d",
T> -		    __func__, oresid, uio->uio_resid, sb->sb_cc));
T> +		    ("%s: oresid = %d, uio_resid = %zd, sbused = %d",
T> +		    __func__, oresid, uio->uio_resid, sbused(sb)));
T>  
T>  		error = handle_ddp(so, uio, flags, 0);
T>  		ddp_handled = 1;
T> @@ -1134,7 +1135,7 @@ restart:
T>  
T>  	/* Abort if socket has reported problems. */
T>  	if (so->so_error) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbused(sb))
T>  			goto deliver;
T>  		if (oresid > uio->uio_resid)
T>  			goto out;
T> @@ -1146,7 +1147,7 @@ restart:
T>  
T>  	/* Door is closed.  Deliver what is left, if any. */
T>  	if (sb->sb_state & SBS_CANTRCVMORE) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbused(sb))
T>  			goto deliver;
T>  		else
T>  			goto out;
T> @@ -1153,7 +1154,7 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer is empty and we shall not block. */
T> -	if (sb->sb_cc == 0 &&
T> +	if (sbused(sb) == 0 &&
T>  	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T>  		error = EAGAIN;
T>  		goto out;
T> @@ -1160,18 +1161,18 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer got some data that we shall deliver now. */
T> -	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> +	if (sbused(sb) && !(flags & MSG_WAITALL) &&
T>  	    ((sb->sb_flags & SS_NBIO) ||
T>  	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> -	     sb->sb_cc >= sb->sb_lowat ||
T> -	     sb->sb_cc >= uio->uio_resid ||
T> -	     sb->sb_cc >= sb->sb_hiwat) ) {
T> +	     sbused(sb) >= sb->sb_lowat ||
T> +	     sbused(sb) >= uio->uio_resid ||
T> +	     sbused(sb) >= sb->sb_hiwat) ) {
T>  		goto deliver;
T>  	}
T>  
T>  	/* On MSG_WAITALL we must wait until all data or error arrives. */
T>  	if ((flags & MSG_WAITALL) &&
T> -	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat))
T> +	    (sbused(sb) >= uio->uio_resid || sbused(sb) >= sb->sb_lowat))
T>  		goto deliver;
T>  
T>  	/*
T> @@ -1190,7 +1191,7 @@ restart:
T>  
T>  deliver:
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> -	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> +	KASSERT(sbused(sb) > 0, ("%s: sockbuf empty", __func__));
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>  
T>  	if (sb->sb_flags & SB_DDP_INDICATE && !ddp_handled)
T> @@ -1201,7 +1202,7 @@ deliver:
T>  		uio->uio_td->td_ru.ru_msgrcv++;
T>  
T>  	/* Fill uio until full or current end of socket buffer is reached. */
T> -	len = min(uio->uio_resid, sb->sb_cc);
T> +	len = min(uio->uio_resid, sbused(sb));
T>  	if (mp0 != NULL) {
T>  		/* Dequeue as many mbufs as possible. */
T>  		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
T> Index: sys/dev/cxgbe/iw_cxgbe/cm.c
T> ===================================================================
T> --- sys/dev/cxgbe/iw_cxgbe/cm.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgbe/iw_cxgbe/cm.c	(.../projects/sendfile)	(revision 266807)
T> @@ -585,8 +585,8 @@ process_data(struct c4iw_ep *ep)
T>  {
T>  	struct sockaddr_in *local, *remote;
T>  
T> -	CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sb_cc %d", __func__,
T> -	    ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc);
T> +	CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__,
T> +	    ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv));
T>  
T>  	switch (state_read(&ep->com)) {
T>  	case MPA_REQ_SENT:
T> @@ -602,11 +602,11 @@ process_data(struct c4iw_ep *ep)
T>  		process_mpa_request(ep);
T>  		break;
T>  	default:
T> -		if (ep->com.so->so_rcv.sb_cc)
T> -			log(LOG_ERR, "%s: Unexpected streaming data.  "
T> -			    "ep %p, state %d, so %p, so_state 0x%x, sb_cc %u\n",
T> +		if (sbused(&ep->com.so->so_rcv))
T> +			log(LOG_ERR, "%s: Unexpected streaming data. ep %p, "
T> +			    "state %d, so %p, so_state 0x%x, sbused %u\n",
T>  			    __func__, ep, state_read(&ep->com), ep->com.so,
T> -			    ep->com.so->so_state, ep->com.so->so_rcv.sb_cc);
T> +			    ep->com.so->so_state, sbused(&ep->com.so->so_rcv));
T>  		break;
T>  	}
T>  }
T> Index: sys/dev/iscsi/icl.c
T> ===================================================================
T> --- sys/dev/iscsi/icl.c	(.../head)	(revision 266804)
T> +++ sys/dev/iscsi/icl.c	(.../projects/sendfile)	(revision 266807)
T> @@ -758,7 +758,7 @@ icl_receive_thread(void *arg)
T>  		 * is enough data received to read the PDU.
T>  		 */
T>  		SOCKBUF_LOCK(&so->so_rcv);
T> -		available = so->so_rcv.sb_cc;
T> +		available = sbavail(&so->so_rcv);
T>  		if (available < ic->ic_receive_len) {
T>  			so->so_rcv.sb_lowat = ic->ic_receive_len;
T>  			cv_wait(&ic->ic_receive_cv, &so->so_rcv.sb_mtx);
T> Index: sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c
T> ===================================================================
T> --- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c	(.../projects/sendfile)	(revision 266807)
T> @@ -445,8 +445,8 @@ t3_push_frames(struct socket *so, int req_completi
T>  	 * Autosize the send buffer.
T>  	 */
T>  	if (snd->sb_flags & SB_AUTOSIZE && VNET(tcp_do_autosndbuf)) {
T> -		if (snd->sb_cc >= (snd->sb_hiwat / 8 * 7) &&
T> -		    snd->sb_cc < VNET(tcp_autosndbuf_max)) {
T> +		if (sbused(snd) >= (snd->sb_hiwat / 8 * 7) &&
T> +		    sbused(snd) < VNET(tcp_autosndbuf_max)) {
T>  			if (!sbreserve_locked(snd, min(snd->sb_hiwat +
T>  			    VNET(tcp_autosndbuf_inc), VNET(tcp_autosndbuf_max)),
T>  			    so, curthread))
T> @@ -597,10 +597,10 @@ t3_rcvd(struct toedev *tod, struct tcpcb *tp)
T>  	INP_WLOCK_ASSERT(inp);
T>  
T>  	SOCKBUF_LOCK(so_rcv);
T> -	KASSERT(toep->tp_enqueued >= so_rcv->sb_cc,
T> -	    ("%s: so_rcv->sb_cc > enqueued", __func__));
T> -	toep->tp_rx_credits += toep->tp_enqueued - so_rcv->sb_cc;
T> -	toep->tp_enqueued = so_rcv->sb_cc;
T> +	KASSERT(toep->tp_enqueued >= sbused(so_rcv),
T> +	    ("%s: sbused(so_rcv) > enqueued", __func__));
T> +	toep->tp_rx_credits += toep->tp_enqueued - sbused(so_rcv);
T> +	toep->tp_enqueued = sbused(so_rcv);
T>  	SOCKBUF_UNLOCK(so_rcv);
T>  
T>  	must_send = toep->tp_rx_credits + 16384 >= tp->rcv_wnd;
T> @@ -1199,7 +1199,7 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *r
T>  	}
T>  
T>  	toep->tp_enqueued += m->m_pkthdr.len;
T> -	sbappendstream_locked(so_rcv, m);
T> +	sbappendstream_locked(so_rcv, m, 0);
T>  	sorwakeup_locked(so);
T>  	SOCKBUF_UNLOCK_ASSERT(so_rcv);
T>  
T> @@ -1768,7 +1768,7 @@ wr_ack(struct toepcb *toep, struct mbuf *m)
T>  		so_sowwakeup_locked(so);
T>  	}
T>  
T> -	if (snd->sb_sndptroff < snd->sb_cc)
T> +	if (snd->sb_sndptroff < sbused(snd))
T>  		t3_push_frames(so, 0);
T>  
T>  out_free:
T> Index: sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c
T> ===================================================================
T> --- sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1515,11 +1515,11 @@ process_data(struct iwch_ep *ep)
T>  		process_mpa_request(ep);
T>  		break;
T>  	default:
T> -		if (ep->com.so->so_rcv.sb_cc) 
T> +		if (sbavail(&ep->com.so->so_rcv)) 
T>  			printf("%s Unexpected streaming data."
T>  			       " ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n",
T>  			       __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state,
T> -			       ep->com.so->so_rcv.sb_cc, ep->com.so->so_rcv.sb_mb);
T> +			       sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb);
T>  		break;
T>  	}
T>  	return;
T> Index: sys/kern/uipc_debug.c
T> ===================================================================
T> --- sys/kern/uipc_debug.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_debug.c	(.../projects/sendfile)	(revision 266807)
T> @@ -403,7 +403,8 @@ db_print_sockbuf(struct sockbuf *sb, const char *s
T>  	db_printf("sb_sndptroff: %u\n", sb->sb_sndptroff);
T>  
T>  	db_print_indent(indent);
T> -	db_printf("sb_cc: %u   ", sb->sb_cc);
T> +	db_printf("sb_acc: %u   ", sb->sb_acc);
T> +	db_printf("sb_ccc: %u   ", sb->sb_ccc);
T>  	db_printf("sb_hiwat: %u   ", sb->sb_hiwat);
T>  	db_printf("sb_mbcnt: %u   ", sb->sb_mbcnt);
T>  	db_printf("sb_mbmax: %u\n", sb->sb_mbmax);
T> Index: sys/kern/uipc_mbuf.c
T> ===================================================================
T> --- sys/kern/uipc_mbuf.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_mbuf.c	(.../projects/sendfile)	(revision 266807)
T> @@ -389,7 +389,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m)
T>   * cleaned too.
T>   */
T>  void
T> -m_demote(struct mbuf *m0, int all)
T> +m_demote(struct mbuf *m0, int all, int flags)
T>  {
T>  	struct mbuf *m;
T>  
T> @@ -405,7 +405,7 @@ void
T>  			m_freem(m->m_nextpkt);
T>  			m->m_nextpkt = NULL;
T>  		}
T> -		m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE);
T> +		m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags);
T>  	}
T>  }
T>  
T> Index: sys/kern/sys_socket.c
T> ===================================================================
T> --- sys/kern/sys_socket.c	(.../head)	(revision 266804)
T> +++ sys/kern/sys_socket.c	(.../projects/sendfile)	(revision 266807)
T> @@ -167,20 +167,17 @@ soo_ioctl(struct file *fp, u_long cmd, void *data,
T>  
T>  	case FIONREAD:
T>  		/* Unlocked read. */
T> -		*(int *)data = so->so_rcv.sb_cc;
T> +		*(int *)data = sbavail(&so->so_rcv);
T>  		break;
T>  
T>  	case FIONWRITE:
T>  		/* Unlocked read. */
T> -		*(int *)data = so->so_snd.sb_cc;
T> +		*(int *)data = sbavail(&so->so_snd);
T>  		break;
T>  
T>  	case FIONSPACE:
T> -		if ((so->so_snd.sb_hiwat < so->so_snd.sb_cc) ||
T> -		    (so->so_snd.sb_mbmax < so->so_snd.sb_mbcnt))
T> -			*(int *)data = 0;
T> -		else
T> -			*(int *)data = sbspace(&so->so_snd);
T> +		/* Unlocked read. */
T> +		*(int *)data = sbspace(&so->so_snd);
T>  		break;
T>  
T>  	case FIOSETOWN:
T> @@ -246,6 +243,7 @@ soo_stat(struct file *fp, struct stat *ub, struct
T>      struct thread *td)
T>  {
T>  	struct socket *so = fp->f_data;
T> +	struct sockbuf *sb;
T>  #ifdef MAC
T>  	int error;
T>  #endif
T> @@ -261,15 +259,18 @@ soo_stat(struct file *fp, struct stat *ub, struct
T>  	 * If SBS_CANTRCVMORE is set, but there's still data left in the
T>  	 * receive buffer, the socket is still readable.
T>  	 */
T> -	SOCKBUF_LOCK(&so->so_rcv);
T> -	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 ||
T> -	    so->so_rcv.sb_cc != 0)
T> +	sb = &so->so_rcv;
T> +	SOCKBUF_LOCK(sb);
T> +	if ((sb->sb_state & SBS_CANTRCVMORE) == 0 || sbavail(sb))
T>  		ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH;
T> -	ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
T> -	SOCKBUF_UNLOCK(&so->so_rcv);
T> -	/* Unlocked read. */
T> -	if ((so->so_snd.sb_state & SBS_CANTSENDMORE) == 0)
T> +	ub->st_size = sbavail(sb) - sb->sb_ctl;
T> +	SOCKBUF_UNLOCK(sb);
T> +
T> +	sb = &so->so_snd;
T> +	SOCKBUF_LOCK(sb);
T> +	if ((sb->sb_state & SBS_CANTSENDMORE) == 0)
T>  		ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH;
T> +	SOCKBUF_UNLOCK(sb);
T>  	ub->st_uid = so->so_cred->cr_uid;
T>  	ub->st_gid = so->so_cred->cr_gid;
T>  	return (*so->so_proto->pr_usrreqs->pru_sense)(so, ub);
T> Index: sys/kern/uipc_usrreq.c
T> ===================================================================
T> --- sys/kern/uipc_usrreq.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_usrreq.c	(.../projects/sendfile)	(revision 266807)
T> @@ -790,11 +790,10 @@ uipc_rcvd(struct socket *so, int flags)
T>  	u_int mbcnt, sbcc;
T>  
T>  	unp = sotounpcb(so);
T> -	KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL"));
T> +	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
T> +	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_SEQPACKET,
T> +	    ("%s: socktype %d", __func__, so->so_type));
T>  
T> -	if (so->so_type != SOCK_STREAM && so->so_type != SOCK_SEQPACKET)
T> -		panic("uipc_rcvd socktype %d", so->so_type);
T> -
T>  	/*
T>  	 * Adjust backpressure on sender and wakeup any waiting to write.
T>  	 *
T> @@ -807,7 +806,7 @@ uipc_rcvd(struct socket *so, int flags)
T>  	 */
T>  	SOCKBUF_LOCK(&so->so_rcv);
T>  	mbcnt = so->so_rcv.sb_mbcnt;
T> -	sbcc = so->so_rcv.sb_cc;
T> +	sbcc = sbavail(&so->so_rcv);
T>  	SOCKBUF_UNLOCK(&so->so_rcv);
T>  	/*
T>  	 * There is a benign race condition at this point.  If we're planning to
T> @@ -843,7 +842,10 @@ uipc_send(struct socket *so, int flags, struct mbu
T>  	int error = 0;
T>  
T>  	unp = sotounpcb(so);
T> -	KASSERT(unp != NULL, ("uipc_send: unp == NULL"));
T> +	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
T> +	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_DGRAM ||
T> +	    so->so_type == SOCK_SEQPACKET,
T> +	    ("%s: socktype %d", __func__, so->so_type));
T>  
T>  	if (flags & PRUS_OOB) {
T>  		error = EOPNOTSUPP;
T> @@ -994,7 +996,7 @@ uipc_send(struct socket *so, int flags, struct mbu
T>  		}
T>  
T>  		mbcnt = so2->so_rcv.sb_mbcnt;
T> -		sbcc = so2->so_rcv.sb_cc;
T> +		sbcc = sbavail(&so2->so_rcv);
T>  		sorwakeup_locked(so2);
T>  
T>  		/*
T> @@ -1011,9 +1013,6 @@ uipc_send(struct socket *so, int flags, struct mbu
T>  		UNP_PCB_UNLOCK(unp2);
T>  		m = NULL;
T>  		break;
T> -
T> -	default:
T> -		panic("uipc_send unknown socktype");
T>  	}
T>  
T>  	/*
T> Index: sys/kern/vfs_default.c
T> ===================================================================
T> --- sys/kern/vfs_default.c	(.../head)	(revision 266804)
T> +++ sys/kern/vfs_default.c	(.../projects/sendfile)	(revision 266807)
T> @@ -111,6 +111,7 @@ struct vop_vector default_vnodeops = {
T>  	.vop_close =		VOP_NULL,
T>  	.vop_fsync =		VOP_NULL,
T>  	.vop_getpages =		vop_stdgetpages,
T> +	.vop_getpages_async =	vop_stdgetpages_async,
T>  	.vop_getwritemount = 	vop_stdgetwritemount,
T>  	.vop_inactive =		VOP_NULL,
T>  	.vop_ioctl =		VOP_ENOTTY,
T> @@ -726,10 +727,19 @@ vop_stdgetpages(ap)
T>  {
T>  
T>  	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
T> -	    ap->a_count, ap->a_reqpage);
T> +	    ap->a_count, ap->a_reqpage, NULL, NULL);
T>  }
T>  
T> +/* XXX Needs good comment and a manpage. */
T>  int
T> +vop_stdgetpages_async(struct vop_getpages_async_args *ap)
T> +{
T> +
T> +	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
T> +	    ap->a_count, ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg);
T> +}
T> +
T> +int
T>  vop_stdkqfilter(struct vop_kqfilter_args *ap)
T>  {
T>  	return vfs_kqfilter(ap);
T> Index: sys/kern/uipc_socket.c
T> ===================================================================
T> --- sys/kern/uipc_socket.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_socket.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1459,12 +1459,12 @@ restart:
T>  	 *   2. MSG_DONTWAIT is not set
T>  	 */
T>  	if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
T> -	    so->so_rcv.sb_cc < uio->uio_resid) &&
T> -	    so->so_rcv.sb_cc < so->so_rcv.sb_lowat &&
T> +	    sbavail(&so->so_rcv) < uio->uio_resid) &&
T> +	    sbavail(&so->so_rcv) < so->so_rcv.sb_lowat &&
T>  	    m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
T> -		KASSERT(m != NULL || !so->so_rcv.sb_cc,
T> -		    ("receive: m == %p so->so_rcv.sb_cc == %u",
T> -		    m, so->so_rcv.sb_cc));
T> +		KASSERT(m != NULL || !sbavail(&so->so_rcv),
T> +		    ("receive: m == %p sbavail == %u",
T> +		    m, sbavail(&so->so_rcv)));
T>  		if (so->so_error) {
T>  			if (m != NULL)
T>  				goto dontblock;
T> @@ -1746,9 +1746,7 @@ dontblock:
T>  						SOCKBUF_LOCK(&so->so_rcv);
T>  					}
T>  				}
T> -				m->m_data += len;
T> -				m->m_len -= len;
T> -				so->so_rcv.sb_cc -= len;
T> +				sbmtrim(&so->so_rcv, m, len);
T>  			}
T>  		}
T>  		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> @@ -1913,7 +1911,7 @@ restart:
T>  
T>  	/* Abort if socket has reported problems. */
T>  	if (so->so_error) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb) > 0)
T>  			goto deliver;
T>  		if (oresid > uio->uio_resid)
T>  			goto out;
T> @@ -1925,7 +1923,7 @@ restart:
T>  
T>  	/* Door is closed.  Deliver what is left, if any. */
T>  	if (sb->sb_state & SBS_CANTRCVMORE) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb) > 0)
T>  			goto deliver;
T>  		else
T>  			goto out;
T> @@ -1932,7 +1930,7 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer is empty and we shall not block. */
T> -	if (sb->sb_cc == 0 &&
T> +	if (sbavail(sb) == 0 &&
T>  	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T>  		error = EAGAIN;
T>  		goto out;
T> @@ -1939,18 +1937,18 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer got some data that we shall deliver now. */
T> -	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> +	if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) &&
T>  	    ((sb->sb_flags & SS_NBIO) ||
T>  	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> -	     sb->sb_cc >= sb->sb_lowat ||
T> -	     sb->sb_cc >= uio->uio_resid ||
T> -	     sb->sb_cc >= sb->sb_hiwat) ) {
T> +	     sbavail(sb) >= sb->sb_lowat ||
T> +	     sbavail(sb) >= uio->uio_resid ||
T> +	     sbavail(sb) >= sb->sb_hiwat) ) {
T>  		goto deliver;
T>  	}
T>  
T>  	/* On MSG_WAITALL we must wait until all data or error arrives. */
T>  	if ((flags & MSG_WAITALL) &&
T> -	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_hiwat))
T> +	    (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat))
T>  		goto deliver;
T>  
T>  	/*
T> @@ -1964,7 +1962,7 @@ restart:
T>  
T>  deliver:
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> -	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> +	KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__));
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>  
T>  	/* Statistics. */
T> @@ -1972,7 +1970,7 @@ deliver:
T>  		uio->uio_td->td_ru.ru_msgrcv++;
T>  
T>  	/* Fill uio until full or current end of socket buffer is reached. */
T> -	len = min(uio->uio_resid, sb->sb_cc);
T> +	len = min(uio->uio_resid, sbavail(sb));
T>  	if (mp0 != NULL) {
T>  		/* Dequeue as many mbufs as possible. */
T>  		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
T> @@ -1983,6 +1981,8 @@ deliver:
T>  			for (m = sb->sb_mb;
T>  			     m != NULL && m->m_len <= len;
T>  			     m = m->m_next) {
T> +				KASSERT(!(m->m_flags & M_NOTAVAIL),
T> +				    ("%s: m %p not available", __func__, m));
T>  				len -= m->m_len;
T>  				uio->uio_resid -= m->m_len;
T>  				sbfree(sb, m);
T> @@ -2107,9 +2107,9 @@ soreceive_dgram(struct socket *so, struct sockaddr
T>  	 */
T>  	SOCKBUF_LOCK(&so->so_rcv);
T>  	while ((m = so->so_rcv.sb_mb) == NULL) {
T> -		KASSERT(so->so_rcv.sb_cc == 0,
T> -		    ("soreceive_dgram: sb_mb NULL but sb_cc %u",
T> -		    so->so_rcv.sb_cc));
T> +		KASSERT(sbavail(&so->so_rcv) == 0,
T> +		    ("soreceive_dgram: sb_mb NULL but sbavail %u",
T> +		    sbavail(&so->so_rcv)));
T>  		if (so->so_error) {
T>  			error = so->so_error;
T>  			so->so_error = 0;
T> @@ -3157,7 +3157,7 @@ filt_soread(struct knote *kn, long hint)
T>  	so = kn->kn_fp->f_data;
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T>  
T> -	kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
T> +	kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl;
T>  	if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
T>  		kn->kn_flags |= EV_EOF;
T>  		kn->kn_fflags = so->so_error;
T> @@ -3167,7 +3167,7 @@ filt_soread(struct knote *kn, long hint)
T>  	else if (kn->kn_sfflags & NOTE_LOWAT)
T>  		return (kn->kn_data >= kn->kn_sdata);
T>  	else
T> -		return (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat);
T> +		return (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat);
T>  }
T>  
T>  static void
T> @@ -3350,7 +3350,7 @@ soisdisconnected(struct socket *so)
T>  	sorwakeup_locked(so);
T>  	SOCKBUF_LOCK(&so->so_snd);
T>  	so->so_snd.sb_state |= SBS_CANTSENDMORE;
T> -	sbdrop_locked(&so->so_snd, so->so_snd.sb_cc);
T> +	sbdrop_locked(&so->so_snd, sbused(&so->so_snd));
T>  	sowwakeup_locked(so);
T>  	wakeup(&so->so_timeo);
T>  }
T> Index: sys/kern/vnode_if.src
T> ===================================================================
T> --- sys/kern/vnode_if.src	(.../head)	(revision 266804)
T> +++ sys/kern/vnode_if.src	(.../projects/sendfile)	(revision 266807)
T> @@ -477,6 +477,19 @@ vop_getpages {
T>  };
T>  
T>  
T> +%% getpages_async	vp	L L L
T> +
T> +vop_getpages_async {
T> +	IN struct vnode *vp;
T> +	IN vm_page_t *m;
T> +	IN int count;
T> +	IN int reqpage;
T> +	IN vm_ooffset_t offset;
T> +	IN void (*vop_getpages_iodone)(void *);
T> +	IN void *arg;
T> +};
T> +
T> +
T>  %% putpages	vp	L L L
T>  
T>  vop_putpages {
T> Index: sys/kern/uipc_sockbuf.c
T> ===================================================================
T> --- sys/kern/uipc_sockbuf.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_sockbuf.c	(.../projects/sendfile)	(revision 266807)
T> @@ -68,7 +68,152 @@ static	u_long sb_efficiency = 8;	/* parameter for
T>  static struct mbuf	*sbcut_internal(struct sockbuf *sb, int len);
T>  static void	sbflush_internal(struct sockbuf *sb);
T>  
T> +static void
T> +sb_shift_nrdy(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +	KASSERT(m->m_flags & M_NOTREADY, ("%s: m %p !M_NOTREADY", __func__, m));
T> +
T> +	m = m->m_next;
T> +	while (m != NULL && !(m->m_flags & M_NOTREADY)) {
T> +		m->m_flags &= ~M_BLOCKED;
T> +		sb->sb_acc += m->m_len;
T> +		m = m->m_next;
T> +	}
T> +
T> +	sb->sb_fnrdy = m;
T> +}
T> +
T> +int
T> +sbready(struct sockbuf *sb, struct mbuf *m, int count)
T> +{
T> +	u_int blocker;
T> +
T> +	SOCKBUF_LOCK(sb);
T> +
T> +	if (sb->sb_state & SBS_CANTSENDMORE) {
T> +		SOCKBUF_UNLOCK(sb);
T> +		return (ENOTCONN);
T> +	}
T> +
T> +	KASSERT(sb->sb_fnrdy != NULL, ("%s: sb %p NULL fnrdy", __func__, sb));
T> +
T> +	blocker = (sb->sb_fnrdy == m) ? M_BLOCKED : 0;
T> +
T> +	for (int i = 0; i < count; i++, m = m->m_next) {
T> +		KASSERT(m->m_flags & M_NOTREADY,
T> +		    ("%s: m %p !M_NOTREADY", __func__, m));
T> +		m->m_flags &= ~(M_NOTREADY | blocker);
T> +		if (blocker)
T> +			sb->sb_acc += m->m_len;
T> +	}
T> +
T> +	if (!blocker) {
T> +		SOCKBUF_UNLOCK(sb);
T> +		return (EWOULDBLOCK);
T> +	}
T> +
T> +	/* This one was blocking all the queue. */
T> +	for (; m && (m->m_flags & M_NOTREADY) == 0; m = m->m_next) {
T> +		KASSERT(m->m_flags & M_BLOCKED,
T> +		    ("%s: m %p !M_BLOCKED", __func__, m));
T> +		m->m_flags &= ~M_BLOCKED;
T> +		sb->sb_acc += m->m_len;
T> +	}
T> +
T> +	sb->sb_fnrdy = m;
T> +
T> +	SOCKBUF_UNLOCK(sb);
T> +
T> +	return (0);
T> +}
T> +
T>  /*
T> + * Adjust sockbuf state reflecting allocation of m.
T> + */
T> +void
T> +sballoc(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +
T> +	sb->sb_ccc += m->m_len;
T> +
T> +	if (sb->sb_fnrdy == NULL) {
T> +		if (m->m_flags & M_NOTREADY)
T> +			sb->sb_fnrdy = m;
T> +		else
T> +			sb->sb_acc += m->m_len;
T> +	} else
T> +		m->m_flags |= M_BLOCKED;
T> +
T> +	if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> +		sb->sb_ctl += m->m_len;
T> +
T> +	sb->sb_mbcnt += MSIZE;
T> +	sb->sb_mcnt += 1;
T> +
T> +	if (m->m_flags & M_EXT) {
T> +		sb->sb_mbcnt += m->m_ext.ext_size;
T> +		sb->sb_ccnt += 1;
T> +	}
T> +}
T> +
T> +/*
T> + * Adjust sockbuf state reflecting freeing of m.
T> + */
T> +void
T> +sbfree(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> +#if 0	/* XXX: not yet: soclose() call path comes here w/o lock. */
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +
T> +	sb->sb_ccc -= m->m_len;
T> +
T> +	if (!(m->m_flags & M_NOTAVAIL))
T> +		sb->sb_acc -= m->m_len;
T> +
T> +	if (sb->sb_fnrdy == m)
T> +		sb_shift_nrdy(sb, m);
T> +
T> +	if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> +		sb->sb_ctl -= m->m_len;
T> +
T> +	sb->sb_mbcnt -= MSIZE;
T> +	sb->sb_mcnt -= 1;
T> +	if (m->m_flags & M_EXT) {
T> +		sb->sb_mbcnt -= m->m_ext.ext_size;
T> +		sb->sb_ccnt -= 1;
T> +	}
T> +
T> +	if (sb->sb_sndptr == m) {
T> +		sb->sb_sndptr = NULL;
T> +		sb->sb_sndptroff = 0;
T> +	}
T> +	if (sb->sb_sndptroff != 0)
T> +		sb->sb_sndptroff -= m->m_len;
T> +}
T> +
T> +/*
T> + * Trim some amount of data from (first?) mbuf in buffer.
T> + */
T> +void
T> +sbmtrim(struct sockbuf *sb, struct mbuf *m, int len)
T> +{
T> +
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +	KASSERT(len < m->m_len, ("%s: m %p len %d", __func__, m, len));
T> +
T> +	m->m_data += len;
T> +	m->m_len -= len;
T> +	sb->sb_acc -= len;
T> +	sb->sb_ccc -= len;
T> +}
T> +
T> +/*
T>   * Socantsendmore indicates that no more data will be sent on the socket; it
T>   * would normally be applied to a socket when the user informs the system
T>   * that no more data is to be sent, by the protocol code (in case
T> @@ -127,7 +272,7 @@ sbwait(struct sockbuf *sb)
T>  	SOCKBUF_LOCK_ASSERT(sb);
T>  
T>  	sb->sb_flags |= SB_WAIT;
T> -	return (msleep_sbt(&sb->sb_cc, &sb->sb_mtx,
T> +	return (msleep_sbt(&sb->sb_acc, &sb->sb_mtx,
T>  	    (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait",
T>  	    sb->sb_timeo, 0, 0));
T>  }
T> @@ -184,7 +329,7 @@ sowakeup(struct socket *so, struct sockbuf *sb)
T>  		sb->sb_flags &= ~SB_SEL;
T>  	if (sb->sb_flags & SB_WAIT) {
T>  		sb->sb_flags &= ~SB_WAIT;
T> -		wakeup(&sb->sb_cc);
T> +		wakeup(&sb->sb_acc);
T>  	}
T>  	KNOTE_LOCKED(&sb->sb_sel.si_note, 0);
T>  	if (sb->sb_upcall != NULL) {
T> @@ -519,7 +664,7 @@ sbappend(struct sockbuf *sb, struct mbuf *m)
T>   * that is, a stream protocol (such as TCP).
T>   */
T>  void
T> -sbappendstream_locked(struct sockbuf *sb, struct mbuf *m)
T> +sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags)
T>  {
T>  	SOCKBUF_LOCK_ASSERT(sb);
T>  
T> @@ -529,8 +674,8 @@ void
T>  	SBLASTMBUFCHK(sb);
T>  
T>  	/* Remove all packet headers and mbuf tags to get a pure data chain. */
T> -	m_demote(m, 1);
T> -	
T> +	m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0);
T> +
T>  	sbcompress(sb, m, sb->sb_mbtail);
T>  
T>  	sb->sb_lastrecord = sb->sb_mb;
T> @@ -543,38 +688,59 @@ void
T>   * that is, a stream protocol (such as TCP).
T>   */
T>  void
T> -sbappendstream(struct sockbuf *sb, struct mbuf *m)
T> +sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags)
T>  {
T>  
T>  	SOCKBUF_LOCK(sb);
T> -	sbappendstream_locked(sb, m);
T> +	sbappendstream_locked(sb, m, flags);
T>  	SOCKBUF_UNLOCK(sb);
T>  }
T>  
T>  #ifdef SOCKBUF_DEBUG
T>  void
T> -sbcheck(struct sockbuf *sb)
T> +sbcheck(struct sockbuf *sb, const char *file, int line)
T>  {
T> -	struct mbuf *m;
T> -	struct mbuf *n = 0;
T> -	u_long len = 0, mbcnt = 0;
T> +	struct mbuf *m, *n, *fnrdy;
T> +	u_long acc, ccc, mbcnt;
T>  
T>  	SOCKBUF_LOCK_ASSERT(sb);
T>  
T> +	acc = ccc = mbcnt = 0;
T> +	fnrdy = NULL;
T> +
T>  	for (m = sb->sb_mb; m; m = n) {
T>  	    n = m->m_nextpkt;
T>  	    for (; m; m = m->m_next) {
T> -		len += m->m_len;
T> +		if ((m->m_flags & M_NOTREADY) && fnrdy == NULL) {
T> +			if (m != sb->sb_fnrdy) {
T> +				printf("sb %p: fnrdy %p != m %p\n",
T> +				    sb, sb->sb_fnrdy, m);
T> +				goto fail;
T> +			}
T> +			fnrdy = m;
T> +		}
T> +		if (fnrdy) {
T> +			if (!(m->m_flags & M_NOTAVAIL)) {
T> +				printf("sb %p: fnrdy %p, m %p is avail\n",
T> +				    sb, sb->sb_fnrdy, m);
T> +				goto fail;
T> +			}
T> +		} else
T> +			acc += m->m_len;
T> +		ccc += m->m_len;
T>  		mbcnt += MSIZE;
T>  		if (m->m_flags & M_EXT) /*XXX*/ /* pretty sure this is bogus */
T>  			mbcnt += m->m_ext.ext_size;
T>  	    }
T>  	}
T> -	if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) {
T> -		printf("cc %ld != %u || mbcnt %ld != %u\n", len, sb->sb_cc,
T> -		    mbcnt, sb->sb_mbcnt);
T> -		panic("sbcheck");
T> +	if (acc != sb->sb_acc || ccc != sb->sb_ccc || mbcnt != sb->sb_mbcnt) {
T> +		printf("acc %lu/%u ccc %lu/%u mbcnt %lu/%u\n",
T> +		    acc, sb->sb_acc, ccc, sb->sb_ccc, mbcnt, sb->sb_mbcnt);
T> +		goto fail;
T>  	}
T> +	return;
T> +fail:
T> +	panic("%s from %s:%d", __func__, file, line);
T>  }
T>  #endif
T>  
T> @@ -800,6 +966,7 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
T>  		if (n && (n->m_flags & M_EOR) == 0 &&
T>  		    M_WRITABLE(n) &&
T>  		    ((sb->sb_flags & SB_NOCOALESCE) == 0) &&
T> +		    !(m->m_flags & M_NOTREADY) &&
T>  		    m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */
T>  		    m->m_len <= M_TRAILINGSPACE(n) &&
T>  		    n->m_type == m->m_type) {
T> @@ -806,7 +973,9 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
T>  			bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len,
T>  			    (unsigned)m->m_len);
T>  			n->m_len += m->m_len;
T> -			sb->sb_cc += m->m_len;
T> +			sb->sb_ccc += m->m_len;
T> +			if (sb->sb_fnrdy == NULL)
T> +				sb->sb_acc += m->m_len;
T>  			if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T>  				/* XXX: Probably don't need.*/
T>  				sb->sb_ctl += m->m_len;
T> @@ -843,13 +1012,13 @@ sbflush_internal(struct sockbuf *sb)
T>  		 * Don't call sbcut(sb, 0) if the leading mbuf is non-empty:
T>  		 * we would loop forever. Panic instead.
T>  		 */
T> -		if (!sb->sb_cc && (sb->sb_mb == NULL || sb->sb_mb->m_len))
T> +		if (sb->sb_ccc == 0 && (sb->sb_mb == NULL || sb->sb_mb->m_len))
T>  			break;
T> -		m_freem(sbcut_internal(sb, (int)sb->sb_cc));
T> +		m_freem(sbcut_internal(sb, (int)sb->sb_ccc));
T>  	}
T> -	if (sb->sb_cc || sb->sb_mb || sb->sb_mbcnt)
T> -		panic("sbflush_internal: cc %u || mb %p || mbcnt %u",
T> -		    sb->sb_cc, (void *)sb->sb_mb, sb->sb_mbcnt);
T> +	KASSERT(sb->sb_ccc == 0 && sb->sb_mb == 0 && sb->sb_mbcnt == 0,
T> +	    ("%s: ccc %u mb %p mbcnt %u", __func__,
T> +	    sb->sb_ccc, (void *)sb->sb_mb, sb->sb_mbcnt));
T>  }
T>  
T>  void
T> @@ -891,7 +1060,9 @@ sbcut_internal(struct sockbuf *sb, int len)
T>  		if (m->m_len > len) {
T>  			m->m_len -= len;
T>  			m->m_data += len;
T> -			sb->sb_cc -= len;
T> +			sb->sb_ccc -= len;
T> +			if (!(m->m_flags & M_NOTAVAIL))
T> +				sb->sb_acc -= len;
T>  			if (sb->sb_sndptroff != 0)
T>  				sb->sb_sndptroff -= len;
T>  			if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> @@ -977,8 +1148,8 @@ sbsndptr(struct sockbuf *sb, u_int off, u_int len,
T>  	struct mbuf *m, *ret;
T>  
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__));
T> -	KASSERT(off + len <= sb->sb_cc, ("%s: beyond sb", __func__));
T> -	KASSERT(sb->sb_sndptroff <= sb->sb_cc, ("%s: sndptroff broken", __func__));
T> +	KASSERT(off + len <= sb->sb_acc, ("%s: beyond sb", __func__));
T> +	KASSERT(sb->sb_sndptroff <= sb->sb_acc, ("%s: sndptroff broken", __func__));
T>  
T>  	/*
T>  	 * Is off below stored offset? Happens on retransmits.
T> @@ -1091,7 +1262,7 @@ void
T>  sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb)
T>  {
T>  
T> -	xsb->sb_cc = sb->sb_cc;
T> +	xsb->sb_cc = sb->sb_ccc;
T>  	xsb->sb_hiwat = sb->sb_hiwat;
T>  	xsb->sb_mbcnt = sb->sb_mbcnt;
T>  	xsb->sb_mcnt = sb->sb_mcnt;	
T> Index: sys/kern/uipc_syscalls.c
T> ===================================================================
T> --- sys/kern/uipc_syscalls.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_syscalls.c	(.../projects/sendfile)	(revision 266807)
T> @@ -132,9 +132,10 @@ static int	filt_sfsync(struct knote *kn, long hint
T>   */
T>  static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0,
T>      "sendfile(2) tunables");
T> -static int sfreadahead = 1;
T> +
T> +static int sfreadahead = 0;
T>  SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW,
T> -    &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks");
T> +    &sfreadahead, 0, "Read this many more pages than socket buffer can accept");
T>  
T>  #ifdef	SFSYNC_DEBUG
T>  static int sf_sync_debug = 0;
T> @@ -1988,7 +1989,7 @@ filt_sfsync(struct knote *kn, long hint)
T>   * Detach mapped page and release resources back to the system.
T>   */
T>  int
T> -sf_buf_mext(struct mbuf *mb, void *addr, void *args)
T> +sf_mext_free(struct mbuf *mb, void *addr, void *args)
T>  {
T>  	vm_page_t m;
T>  	struct sendfile_sync *sfs;
T> @@ -2009,13 +2010,42 @@ int
T>  		sfs = addr;
T>  		sf_sync_deref(sfs);
T>  	}
T> -	/*
T> -	 * sfs may be invalid at this point, don't use it!
T> -	 */
T>  	return (EXT_FREE_OK);
T>  }
T>  
T>  /*
T> + * Same as above, but forces the page to be detached from the object
T> + * and go into free pool.
T> + */
T> +static int
T> +sf_mext_free_nocache(struct mbuf *mb, void *addr, void *args)
T> +{
T> +	vm_page_t m;
T> +	struct sendfile_sync *sfs;
T> +
T> +	m = sf_buf_page(args);
T> +	sf_buf_free(args);
T> +	vm_page_lock(m);
T> +	vm_page_unwire(m, 0);
T> +	if (m->wire_count == 0) {
T> +		vm_object_t obj;
T> +
T> +		if ((obj = m->object) == NULL)
T> +			vm_page_free(m);
T> +		else if (!vm_page_xbusied(m) && VM_OBJECT_TRYWLOCK(obj)) {
T> +			vm_page_free(m);
T> +			VM_OBJECT_WUNLOCK(obj);
T> +		}
T> +	}
T> +	vm_page_unlock(m);
T> +	if (addr != NULL) {
T> +		sfs = addr;
T> +		sf_sync_deref(sfs);
T> +	}
T> +	return (EXT_FREE_OK);
T> +}
T> +
T> +/*
T>   * Called to remove a reference to a sf_sync object.
T>   *
T>   * This is generally done during the mbuf free path to signify
T> @@ -2608,106 +2638,181 @@ freebsd4_sendfile(struct thread *td, struct freebs
T>  }
T>  #endif /* COMPAT_FREEBSD4 */
T>  
T> +/*
T> + * How much data to put into page i of n.
T> + * Only first and last pages are special.
T> + */
T> +static inline off_t
T> +xfsize(int i, int n, off_t off, off_t len)
T> +{
T> +
T> +	if (i == 0)
T> +		return (omin(PAGE_SIZE - (off & PAGE_MASK), len));
T> +
T> +	if (i == n - 1 && ((off + len) & PAGE_MASK) > 0)
T> +		return ((off + len) & PAGE_MASK);
T> +
T> +	return (PAGE_SIZE);
T> +}
T> +
T> +/*
T> + * Offset within object for page i.
T> + */
T> +static inline vm_offset_t
T> +vmoff(int i, off_t off)
T> +{
T> +
T> +	if (i == 0)
T> +		return ((vm_offset_t)off);
T> +
T> +	return (trunc_page(off + i * PAGE_SIZE));
T> +}
T> +
T> +/*
T> + * Pretend as if we don't have enough space, subtract xfsize() of
T> + * all pages that failed.
T> + */
T> +static inline void
T> +fixspace(int old, int new, off_t off, int *space)
T> +{
T> +
T> +	KASSERT(old > new, ("%s: old %d new %d", __func__, old, new));
T> +
T> +	/* Subtract last one. */
T> +	*space -= xfsize(old - 1, old, off, *space);
T> +	old--;
T> +
T> +	if (new == old)
T> +		/* There was only one page. */
T> +		return;
T> +
T> +	/* Subtract first one. */
T> +	if (new == 0) {
T> +		*space -= xfsize(0, old, off, *space);
T> +		new++;
T> +	}
T> +
T> +	/* Rest of pages are full sized. */
T> +	*space -= (old - new) * PAGE_SIZE;
T> +
T> +	KASSERT(*space >= 0, ("%s: space went backwards", __func__));
T> +}
T> +
T> +struct sf_io {
T> +	u_int		nios;
T> +	int		npages;
T> +	struct file	*sock_fp;
T> +	struct mbuf	*m;
T> +	vm_page_t	pa[];
T> +};
T> +
T> +static void
T> +sf_io_done(void *arg)
T> +{
T> +	struct sf_io *sfio = arg;
T> +	struct socket *so;
T> +
T> +	if (!refcount_release(&sfio->nios))
T> +		return;
T> +
T> +	so  = sfio->sock_fp->f_data;
T> +
T> +	if (sbready(&so->so_snd, sfio->m, sfio->npages) == 0) {
T> +		struct mbuf *m;
T> +
T> +		m = m_get(M_NOWAIT, MT_DATA);
T> +		if (m == NULL) {
T> +			panic("XXXGL");
T> +		}
T> +		m->m_len = 0;
T> +		CURVNET_SET(so->so_vnet);
T> +		/* XXXGL: curthread */
T> +		(void)(so->so_proto->pr_usrreqs->pru_send)
T> +		    (so, 0, m, NULL, NULL, curthread);
T> +		CURVNET_RESTORE();
T> +	}
T> +
T> +	/* XXXGL: curthread */
T> +	fdrop(sfio->sock_fp, curthread);
T> +	free(sfio, M_TEMP);
T> +}
T> +
T>  static int
T> -sendfile_readpage(vm_object_t obj, struct vnode *vp, int nd,
T> -    off_t off, int xfsize, int bsize, struct thread *td, vm_page_t *res)
T> +sendfile_swapin(vm_object_t obj, struct sf_io *sfio, off_t off, off_t len,
T> +    int npages, int rhpages)
T>  {
T> -	vm_page_t m;
T> -	vm_pindex_t pindex;
T> -	ssize_t resid;
T> -	int error, readahead, rv;
T> +	vm_page_t *pa = sfio->pa;
T> +	int nios;
T>  
T> -	pindex = OFF_TO_IDX(off);
T> +	nios = 0;
T>  	VM_OBJECT_WLOCK(obj);
T> -	m = vm_page_grab(obj, pindex, (vp != NULL ? VM_ALLOC_NOBUSY |
T> -	    VM_ALLOC_IGN_SBUSY : 0) | VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
T> +	for (int i = 0; i < npages; i++)
T> +		pa[i] = vm_page_grab(obj, OFF_TO_IDX(vmoff(i, off)),
T> +		    VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
T>  
T> -	/*
T> -	 * Check if page is valid for what we need, otherwise initiate I/O.
T> -	 *
T> -	 * The non-zero nd argument prevents disk I/O, instead we
T> -	 * return the caller what he specified in nd.  In particular,
T> -	 * if we already turned some pages into mbufs, nd == EAGAIN
T> -	 * and the main function send them the pages before we come
T> -	 * here again and block.
T> -	 */
T> -	if (m->valid != 0 && vm_page_is_valid(m, off & PAGE_MASK, xfsize)) {
T> -		if (vp == NULL)
T> -			vm_page_xunbusy(m);
T> -		VM_OBJECT_WUNLOCK(obj);
T> -		*res = m;
T> -		return (0);
T> -	} else if (nd != 0) {
T> -		if (vp == NULL)
T> -			vm_page_xunbusy(m);
T> -		error = nd;
T> -		goto free_page;
T> -	}
T> +	for (int i = 0; i < npages;) {
T> +		int j, a, count, rv;
T>  
T> -	/*
T> -	 * Get the page from backing store.
T> -	 */
T> -	error = 0;
T> -	if (vp != NULL) {
T> -		VM_OBJECT_WUNLOCK(obj);
T> -		readahead = sfreadahead * MAXBSIZE;
T> +		if (vm_page_is_valid(pa[i], vmoff(i, off) & PAGE_MASK,
T> +		    xfsize(i, npages, off, len))) {
T> +			vm_page_xunbusy(pa[i]);
T> +			i++;
T> +			continue;
T> +		}
T>  
T> -		/*
T> -		 * Use vn_rdwr() instead of the pager interface for
T> -		 * the vnode, to allow the read-ahead.
T> -		 *
T> -		 * XXXMAC: Because we don't have fp->f_cred here, we
T> -		 * pass in NOCRED.  This is probably wrong, but is
T> -		 * consistent with our original implementation.
T> -		 */
T> -		error = vn_rdwr(UIO_READ, vp, NULL, readahead, trunc_page(off),
T> -		    UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((readahead /
T> -		    bsize) << IO_SEQSHIFT), td->td_ucred, NOCRED, &resid, td);
T> -		SFSTAT_INC(sf_iocnt);
T> -		VM_OBJECT_WLOCK(obj);
T> -	} else {
T> -		if (vm_pager_has_page(obj, pindex, NULL, NULL)) {
T> -			rv = vm_pager_get_pages(obj, &m, 1, 0);
T> -			SFSTAT_INC(sf_iocnt);
T> -			m = vm_page_lookup(obj, pindex);
T> -			if (m == NULL)
T> -				error = EIO;
T> -			else if (rv != VM_PAGER_OK) {
T> -				vm_page_lock(m);
T> -				vm_page_free(m);
T> -				vm_page_unlock(m);
T> -				m = NULL;
T> -				error = EIO;
T> +		for (j = i + 1; j < npages; j++)
T> +			if (vm_page_is_valid(pa[j], vmoff(j, off) & PAGE_MASK,
T> +			    xfsize(j, npages, off, len)))
T> +				break;
T> +
T> +		while (!vm_pager_has_page(obj, OFF_TO_IDX(vmoff(i, off)),
T> +		    NULL, &a) && i < j) {
T> +			pmap_zero_page(pa[i]);
T> +			pa[i]->valid = VM_PAGE_BITS_ALL;
T> +			pa[i]->dirty = 0;
T> +			vm_page_xunbusy(pa[i]);
T> +			i++;
T> +		}
T> +		if (i == j)
T> +			continue;
T> +
T> +		count = min(a + 1, npages + rhpages - i);
T> +		for (j = npages; j < i + count; j++) {
T> +			pa[j] = vm_page_grab(obj, OFF_TO_IDX(vmoff(j, off)),
T> +			    VM_ALLOC_NORMAL | VM_ALLOC_NOWAIT);
T> +			if (pa[j] == NULL) {
T> +				count = j - i;
T> +				break;
T>  			}
T> -		} else {
T> -			pmap_zero_page(m);
T> -			m->valid = VM_PAGE_BITS_ALL;
T> -			m->dirty = 0;
T> +			if (pa[j]->valid) {
T> +				vm_page_xunbusy(pa[j]);
T> +				count = j - i;
T> +				break;
T> +			}
T>  		}
T> -		if (m != NULL)
T> -			vm_page_xunbusy(m);
T> +
T> +		refcount_acquire(&sfio->nios);
T> +		rv = vm_pager_get_pages_async(obj, pa + i, count, 0,
T> +		    &sf_io_done, sfio);
T> +
T> +		KASSERT(rv == VM_PAGER_OK, ("%s: pager fail obj %p page %p",
T> +		    __func__, obj, pa[i]));
T> +
T> +		SFSTAT_INC(sf_iocnt);
T> +		nios++;
T> +
T> +		for (j = i; j < i + count && j < npages; j++)
T> +			KASSERT(pa[j] == vm_page_lookup(obj,
T> +			    OFF_TO_IDX(vmoff(j, off))),
T> +			    ("pa[j] %p lookup %p\n", pa[j],
T> +			    vm_page_lookup(obj, OFF_TO_IDX(vmoff(j, off)))));
T> +
T> +		i += count;
T>  	}
T> -	if (error == 0) {
T> -		*res = m;
T> -	} else if (m != NULL) {
T> -free_page:
T> -		vm_page_lock(m);
T> -		vm_page_unwire(m, 0);
T>  
T> -		/*
T> -		 * See if anyone else might know about this page.  If
T> -		 * not and it is not valid, then free it.
T> -		 */
T> -		if (m->wire_count == 0 && m->valid == 0 && !vm_page_busied(m))
T> -			vm_page_free(m);
T> -		vm_page_unlock(m);
T> -	}
T> -	KASSERT(error != 0 || (m->wire_count > 0 &&
T> -	    vm_page_is_valid(m, off & PAGE_MASK, xfsize)),
T> -	    ("wrong page state m %p off %#jx xfsize %d", m, (uintmax_t)off,
T> -	    xfsize));
T>  	VM_OBJECT_WUNLOCK(obj);
T> -	return (error);
T> +
T> +	return (nios);
T>  }
T>  
T>  static int
T> @@ -2814,41 +2919,26 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T>  	struct vnode *vp;
T>  	struct vm_object *obj;
T>  	struct socket *so;
T> -	struct mbuf *m;
T> +	struct mbuf *m, *mh, *mhtail;
T>  	struct sf_buf *sf;
T> -	struct vm_page *pg;
T>  	struct shmfd *shmfd;
T>  	struct vattr va;
T> -	off_t off, xfsize, fsbytes, sbytes, rem, obj_size;
T> -	int error, bsize, nd, hdrlen, mnw;
T> +	off_t off, sbytes, rem, obj_size;
T> +	int error, serror, bsize, hdrlen;
T>  
T> -	pg = NULL;
T>  	obj = NULL;
T>  	so = NULL;
T> -	m = NULL;
T> -	fsbytes = sbytes = 0;
T> -	hdrlen = mnw = 0;
T> -	rem = nbytes;
T> -	obj_size = 0;
T> +	m = mh = NULL;
T> +	sbytes = 0;
T>  
T>  	error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize);
T>  	if (error != 0)
T>  		return (error);
T> -	if (rem == 0)
T> -		rem = obj_size;
T>  
T>  	error = kern_sendfile_getsock(td, sockfd, &sock_fp, &so);
T>  	if (error != 0)
T>  		goto out;
T>  
T> -	/*
T> -	 * Do not wait on memory allocations but return ENOMEM for
T> -	 * caller to retry later.
T> -	 * XXX: Experimental.
T> -	 */
T> -	if (flags & SF_MNOWAIT)
T> -		mnw = 1;
T> -
T>  #ifdef MAC
T>  	error = mac_socket_check_send(td->td_ucred, so);
T>  	if (error != 0)
T> @@ -2856,31 +2946,27 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T>  #endif
T>  
T>  	/* If headers are specified copy them into mbufs. */
T> -	if (hdr_uio != NULL) {
T> +	if (hdr_uio != NULL && hdr_uio->uio_resid > 0) {
T>  		hdr_uio->uio_td = td;
T>  		hdr_uio->uio_rw = UIO_WRITE;
T> -		if (hdr_uio->uio_resid > 0) {
T> -			/*
T> -			 * In FBSD < 5.0 the nbytes to send also included
T> -			 * the header.  If compat is specified subtract the
T> -			 * header size from nbytes.
T> -			 */
T> -			if (kflags & SFK_COMPAT) {
T> -				if (nbytes > hdr_uio->uio_resid)
T> -					nbytes -= hdr_uio->uio_resid;
T> -				else
T> -					nbytes = 0;
T> -			}
T> -			m = m_uiotombuf(hdr_uio, (mnw ? M_NOWAIT : M_WAITOK),
T> -			    0, 0, 0);
T> -			if (m == NULL) {
T> -				error = mnw ? EAGAIN : ENOBUFS;
T> -				goto out;
T> -			}
T> -			hdrlen = m_length(m, NULL);
T> +		/*
T> +		 * In FBSD < 5.0 the nbytes to send also included
T> +		 * the header.  If compat is specified subtract the
T> +		 * header size from nbytes.
T> +		 */
T> +		if (kflags & SFK_COMPAT) {
T> +			if (nbytes > hdr_uio->uio_resid)
T> +				nbytes -= hdr_uio->uio_resid;
T> +			else
T> +				nbytes = 0;
T>  		}
T> -	}
T> +		mh = m_uiotombuf(hdr_uio, M_WAITOK, 0, 0, 0);
T> +		hdrlen = m_length(mh, &mhtail);
T> +	} else
T> +		hdrlen = 0;
T>  
T> +	rem = nbytes ? omin(nbytes, obj_size - offset) : obj_size - offset;
T> +
T>  	/*
T>  	 * Protect against multiple writers to the socket.
T>  	 *
T> @@ -2900,21 +2986,13 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T>  	 * The outer loop checks the state and available space of the socket
T>  	 * and takes care of the overall progress.
T>  	 */
T> -	for (off = offset; ; ) {
T> +	for (off = offset; rem > 0; ) {
T> +		struct sf_io *sfio;
T> +		vm_page_t *pa;
T>  		struct mbuf *mtail;
T> -		int loopbytes;
T> -		int space;
T> -		int done;
T> +		int nios, space, npages, rhpages;
T>  
T> -		if ((nbytes != 0 && nbytes == fsbytes) ||
T> -		    (nbytes == 0 && obj_size == fsbytes))
T> -			break;
T> -
T>  		mtail = NULL;
T> -		loopbytes = 0;
T> -		space = 0;
T> -		done = 0;
T> -
T>  		/*
T>  		 * Check the socket state for ongoing connection,
T>  		 * no errors and space in socket buffer.
T> @@ -2990,53 +3068,44 @@ retry_space:
T>  				VOP_UNLOCK(vp, 0);
T>  				goto done;
T>  			}
T> -			obj_size = va.va_size;
T> +			if (va.va_size != obj_size) {
T> +				if (nbytes == 0)
T> +					rem += va.va_size - obj_size;
T> +				else if (offset + nbytes > va.va_size)
T> +					rem -= (offset + nbytes - va.va_size);
T> +				obj_size = va.va_size;
T> +			}
T>  		}
T>  
T> +		if (space > rem)
T> +			space = rem;
T> +
T> +		if (off & PAGE_MASK)
T> +			npages = 1 + howmany(space -
T> +			    (PAGE_SIZE - (off & PAGE_MASK)), PAGE_SIZE);
T> +		else
T> +			npages = howmany(space, PAGE_SIZE);
T> +
T> +		rhpages = SF_READAHEAD(flags) ?
T> +		    SF_READAHEAD(flags) : sfreadahead;
T> +		rhpages = min(howmany(obj_size - (off & ~PAGE_MASK) -
T> +		    (npages * PAGE_SIZE), PAGE_SIZE), rhpages);
T> +
T> +		sfio = malloc(sizeof(struct sf_io) +
T> +		    (rhpages + npages) * sizeof(vm_page_t), M_TEMP, M_WAITOK);
T> +		refcount_init(&sfio->nios, 1);
T> +
T> +		nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages);
T> +
T>  		/*
T>  		 * Loop and construct maximum sized mbuf chain to be bulk
T>  		 * dumped into socket buffer.
T>  		 */
T> -		while (space > loopbytes) {
T> -			vm_offset_t pgoff;
T> +		pa = sfio->pa;
T> +		for (int i = 0; i < npages; i++) {
T>  			struct mbuf *m0;
T>  
T>  			/*
T> -			 * Calculate the amount to transfer.
T> -			 * Not to exceed a page, the EOF,
T> -			 * or the passed in nbytes.
T> -			 */
T> -			pgoff = (vm_offset_t)(off & PAGE_MASK);
T> -			rem = obj_size - offset;
T> -			if (nbytes != 0)
T> -				rem = omin(rem, nbytes);
T> -			rem -= fsbytes + loopbytes;
T> -			xfsize = omin(PAGE_SIZE - pgoff, rem);
T> -			xfsize = omin(space - loopbytes, xfsize);
T> -			if (xfsize <= 0) {
T> -				done = 1;		/* all data sent */
T> -				break;
T> -			}
T> -
T> -			/*
T> -			 * Attempt to look up the page.  Allocate
T> -			 * if not found or wait and loop if busy.
T> -			 */
T> -			if (m != NULL)
T> -				nd = EAGAIN; /* send what we already got */
T> -			else if ((flags & SF_NODISKIO) != 0)
T> -				nd = EBUSY;
T> -			else
T> -				nd = 0;
T> -			error = sendfile_readpage(obj, vp, nd, off,
T> -			    xfsize, bsize, td, &pg);
T> -			if (error != 0) {
T> -				if (error == EAGAIN)
T> -					error = 0;	/* not a real error */
T> -				break;
T> -			}
T> -
T> -			/*
T>  			 * Get a sendfile buf.  When allocating the
T>  			 * first buffer for mbuf chain, we usually
T>  			 * wait as long as necessary, but this wait
T> @@ -3045,17 +3114,18 @@ retry_space:
T>  			 * threads might exhaust the buffers and then
T>  			 * deadlock.
T>  			 */
T> -			sf = sf_buf_alloc(pg, (mnw || m != NULL) ? SFB_NOWAIT :
T> -			    SFB_CATCH);
T> +			sf = sf_buf_alloc(pa[i],
T> +			    m != NULL ? SFB_NOWAIT : SFB_CATCH);
T>  			if (sf == NULL) {
T>  				SFSTAT_INC(sf_allocfail);
T> -				vm_page_lock(pg);
T> -				vm_page_unwire(pg, 0);
T> -				KASSERT(pg->object != NULL,
T> -				    ("%s: object disappeared", __func__));
T> -				vm_page_unlock(pg);
T> +				for (int j = i; j < npages; j++) {
T> +					vm_page_lock(pa[j]);
T> +					vm_page_unwire(pa[j], 0);
T> +					vm_page_unlock(pa[j]);
T> +				}
T>  				if (m == NULL)
T> -					error = (mnw ? EAGAIN : EINTR);
T> +					error = ENOBUFS;
T> +				fixspace(npages, i, off, &space);
T>  				break;
T>  			}
T>  
T> @@ -3063,36 +3133,26 @@ retry_space:
T>  			 * Get an mbuf and set it up as having
T>  			 * external storage.
T>  			 */
T> -			m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA);
T> -			if (m0 == NULL) {
T> -				error = (mnw ? EAGAIN : ENOBUFS);
T> -				(void)sf_buf_mext(NULL, NULL, sf);
T> -				break;
T> -			}
T> -			if (m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE,
T> -			    sf_buf_mext, sfs, sf, M_RDONLY, EXT_SFBUF,
T> -			    (mnw ? M_NOWAIT : M_WAITOK)) != 0) {
T> -				error = (mnw ? EAGAIN : ENOBUFS);
T> -				(void)sf_buf_mext(NULL, NULL, sf);
T> -				m_freem(m0);
T> -				break;
T> -			}
T> -			m0->m_data = (char *)sf_buf_kva(sf) + pgoff;
T> -			m0->m_len = xfsize;
T> +			m0 = m_get(M_WAITOK, MT_DATA);
T> +			(void)m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE,
T> +			    (flags & SF_NOCACHE) ? sf_mext_free_nocache :
T> +			    sf_mext_free, sfs, sf, M_RDONLY, EXT_SFBUF,
T> +			    M_WAITOK);
T> +			m0->m_data = (char *)sf_buf_kva(sf) +
T> +			    (vmoff(i, off) & PAGE_MASK);
T> +			m0->m_len = xfsize(i, npages, off, space);
T> +			m0->m_flags |= M_NOTREADY;
T>  
T> +			if (i == 0)
T> +				sfio->m = m0;
T> +
T>  			/* Append to mbuf chain. */
T>  			if (mtail != NULL)
T>  				mtail->m_next = m0;
T> -			else if (m != NULL)
T> -				m_last(m)->m_next = m0;
T>  			else
T>  				m = m0;
T>  			mtail = m0;
T>  
T> -			/* Keep track of bits processed. */
T> -			loopbytes += xfsize;
T> -			off += xfsize;
T> -
T>  			/*
T>  			 * XXX eventually this should be a sfsync
T>  			 * method call!
T> @@ -3104,47 +3164,51 @@ retry_space:
T>  		if (vp != NULL)
T>  			VOP_UNLOCK(vp, 0);
T>  
T> +		/* Keep track of bytes processed. */
T> +		off += space;
T> +		rem -= space;
T> +
T> +		/* Prepend header, if any. */
T> +		if (hdrlen) {
T> +			mhtail->m_next = m;
T> +			m = mh;
T> +			mh = NULL;
T> +		}
T> +
T> +		if (error) {
T> +			free(sfio, M_TEMP);
T> +			goto done;
T> +		}
T> +
T>  		/* Add the buffer chain to the socket buffer. */
T> -		if (m != NULL) {
T> -			int mlen, err;
T> +		KASSERT(m_length(m, NULL) == space + hdrlen,
T> +		    ("%s: mlen %u space %d hdrlen %d",
T> +		    __func__, m_length(m, NULL), space, hdrlen));
T>  
T> -			mlen = m_length(m, NULL);
T> -			SOCKBUF_LOCK(&so->so_snd);
T> -			if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
T> -				error = EPIPE;
T> -				SOCKBUF_UNLOCK(&so->so_snd);
T> -				goto done;
T> -			}
T> -			SOCKBUF_UNLOCK(&so->so_snd);
T> -			CURVNET_SET(so->so_vnet);
T> -			/* Avoid error aliasing. */
T> -			err = (*so->so_proto->pr_usrreqs->pru_send)
T> -				    (so, 0, m, NULL, NULL, td);
T> -			CURVNET_RESTORE();
T> -			if (err == 0) {
T> -				/*
T> -				 * We need two counters to get the
T> -				 * file offset and nbytes to send
T> -				 * right:
T> -				 * - sbytes contains the total amount
T> -				 *   of bytes sent, including headers.
T> -				 * - fsbytes contains the total amount
T> -				 *   of bytes sent from the file.
T> -				 */
T> -				sbytes += mlen;
T> -				fsbytes += mlen;
T> -				if (hdrlen) {
T> -					fsbytes -= hdrlen;
T> -					hdrlen = 0;
T> -				}
T> -			} else if (error == 0)
T> -				error = err;
T> -			m = NULL;	/* pru_send always consumes */
T> +		CURVNET_SET(so->so_vnet);
T> +		if (nios == 0) {
T> +			free(sfio, M_TEMP);
T> +			serror = (*so->so_proto->pr_usrreqs->pru_send)
T> +			    (so, 0, m, NULL, NULL, td);
T> +		} else {
T> +			sfio->sock_fp = sock_fp;
T> +			sfio->npages = npages;
T> +			fhold(sock_fp);
T> +			serror = (*so->so_proto->pr_usrreqs->pru_send)
T> +			    (so, PRUS_NOTREADY, m, NULL, NULL, td);
T> +			sf_io_done(sfio);
T>  		}
T> +		CURVNET_RESTORE();
T>  
T> -		/* Quit outer loop on error or when we're done. */
T> -		if (done)
T> -			break;
T> +		if (serror == 0) {
T> +			sbytes += space + hdrlen;
T> +			if (hdrlen)
T> +				hdrlen = 0;
T> +		} else if (error == 0)
T> +			error = serror;
T> +		m = NULL;	/* pru_send always consumes */
T> +
T> +		/* Quit outer loop on error. */
T>  		if (error != 0)
T>  			goto done;
T>  	}
T> @@ -3179,6 +3243,8 @@ out:
T>  		fdrop(sock_fp, td);
T>  	if (m)
T>  		m_freem(m);
T> +	if (mh)
T> +		m_freem(mh);
T>  
T>  	if (error == ERESTART)
T>  		error = EINTR;
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c	(.../head)	(revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1127,9 +1127,8 @@ ng_btsocket_l2cap_process_l2ca_write_rsp(struct ng
T>  	/*
T>   	 * Check if we have more data to send
T>   	 */
T> -
T>  	sbdroprecord(&pcb->so->so_snd);
T> -	if (pcb->so->so_snd.sb_cc > 0) {
T> +	if (sbavail(&pcb->so->so_snd) > 0) {
T>  		if (ng_btsocket_l2cap_send2(pcb) == 0)
T>  			ng_btsocket_l2cap_timeout(pcb);
T>  		else
T> @@ -2510,7 +2509,7 @@ ng_btsocket_l2cap_send2(ng_btsocket_l2cap_pcb_p pc
T>  	
T>  	mtx_assert(&pcb->pcb_mtx, MA_OWNED);
T>  
T> -	if (pcb->so->so_snd.sb_cc == 0)
T> +	if (sbavail(&pcb->so->so_snd) == 0)
T>  		return (EINVAL); /* XXX */
T>  
T>  	m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	(.../head)	(revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	(.../projects/sendfile)	(revision 266807)
T> @@ -3274,7 +3274,7 @@ ng_btsocket_rfcomm_pcb_send(ng_btsocket_rfcomm_pcb
T>  	}
T>  
T>  	for (error = 0, sent = 0; sent < limit; sent ++) { 
T> -		length = min(pcb->mtu, pcb->so->so_snd.sb_cc);
T> +		length = min(pcb->mtu, sbavail(&pcb->so->so_snd));
T>  		if (length == 0)
T>  			break;
T>  
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_sco.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_sco.c	(.../head)	(revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_sco.c	(.../projects/sendfile)	(revision 266807)
T> @@ -906,7 +906,7 @@ ng_btsocket_sco_default_msg_input(struct ng_mesg *
T>  				sbdroprecord(&pcb->so->so_snd);
T>  
T>  			/* Send more if we have any */
T> -			if (pcb->so->so_snd.sb_cc > 0)
T> +			if (sbavail(&pcb->so->so_snd) > 0)
T>  				if (ng_btsocket_sco_send2(pcb) == 0)
T>  					ng_btsocket_sco_timeout(pcb);
T>  
T> @@ -1744,7 +1744,7 @@ ng_btsocket_sco_send2(ng_btsocket_sco_pcb_p pcb)
T>  	mtx_assert(&pcb->pcb_mtx, MA_OWNED);
T>  
T>  	while (pcb->rt->pending < pcb->rt->num_pkts &&
T> -	       pcb->so->so_snd.sb_cc > 0) {
T> +	       sbavail(&pcb->so->so_snd) > 0) {
T>  		/* Get a copy of the first packet on send queue */
T>  		m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
T>  		if (m == NULL) {
T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c
T> ===================================================================
T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c	(.../head)	(revision 266804)
T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c	(.../projects/sendfile)	(revision 266807)
T> @@ -746,7 +746,7 @@ sdp_start_disconnect(struct sdp_sock *ssk)
T>  		    ("sdp_start_disconnect: sdp_drop() returned NULL"));
T>  	} else {
T>  		soisdisconnecting(so);
T> -		unread = so->so_rcv.sb_cc;
T> +		unread = sbused(&so->so_rcv);
T>  		sbflush(&so->so_rcv);
T>  		sdp_usrclosed(ssk);
T>  		if (!(ssk->flags & SDP_DROPPED)) {
T> @@ -888,7 +888,7 @@ sdp_append(struct sdp_sock *ssk, struct sockbuf *s
T>  		m_adj(mb, SDP_HEAD_SIZE);
T>  		n->m_pkthdr.len += mb->m_pkthdr.len;
T>  		n->m_flags |= mb->m_flags & (M_PUSH | M_URG);
T> -		m_demote(mb, 1);
T> +		m_demote(mb, 1, 0);
T>  		sbcompress(sb, mb, sb->sb_mbtail);
T>  		return;
T>  	}
T> @@ -1258,7 +1258,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
T>  	/* We will never ever get anything unless we are connected. */
T>  	if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
T>  		/* When disconnecting there may be still some data left. */
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb))
T>  			goto deliver;
T>  		if (!(so->so_state & SS_ISDISCONNECTED))
T>  			error = ENOTCONN;
T> @@ -1266,7 +1266,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
T>  	}
T>  
T>  	/* Socket buffer is empty and we shall not block. */
T> -	if (sb->sb_cc == 0 &&
T> +	if (sbavail(sb) == 0 &&
T>  	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T>  		error = EAGAIN;
T>  		goto out;
T> @@ -1277,7 +1277,7 @@ restart:
T>  
T>  	/* Abort if socket has reported problems. */
T>  	if (so->so_error) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb))
T>  			goto deliver;
T>  		if (oresid > uio->uio_resid)
T>  			goto out;
T> @@ -1289,7 +1289,7 @@ restart:
T>  
T>  	/* Door is closed.  Deliver what is left, if any. */
T>  	if (sb->sb_state & SBS_CANTRCVMORE) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb))
T>  			goto deliver;
T>  		else
T>  			goto out;
T> @@ -1296,18 +1296,18 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer got some data that we shall deliver now. */
T> -	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> +	if (sbavail(sb) && !(flags & MSG_WAITALL) &&
T>  	    ((so->so_state & SS_NBIO) ||
T>  	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> -	     sb->sb_cc >= sb->sb_lowat ||
T> -	     sb->sb_cc >= uio->uio_resid ||
T> -	     sb->sb_cc >= sb->sb_hiwat) ) {
T> +	     sbavail(sb) >= sb->sb_lowat ||
T> +	     sbavail(sb) >= uio->uio_resid ||
T> +	     sbavail(sb) >= sb->sb_hiwat) ) {
T>  		goto deliver;
T>  	}
T>  
T>  	/* On MSG_WAITALL we must wait until all data or error arrives. */
T>  	if ((flags & MSG_WAITALL) &&
T> -	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat))
T> +	    (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_lowat))
T>  		goto deliver;
T>  
T>  	/*
T> @@ -1321,7 +1321,7 @@ restart:
T>  
T>  deliver:
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> -	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> +	KASSERT(sbavail(sb), ("%s: sockbuf empty", __func__));
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>  
T>  	/* Statistics. */
T> @@ -1329,7 +1329,7 @@ deliver:
T>  		uio->uio_td->td_ru.ru_msgrcv++;
T>  
T>  	/* Fill uio until full or current end of socket buffer is reached. */
T> -	len = min(uio->uio_resid, sb->sb_cc);
T> +	len = min(uio->uio_resid, sbavail(sb));
T>  	if (mp0 != NULL) {
T>  		/* Dequeue as many mbufs as possible. */
T>  		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
T> @@ -1509,7 +1509,7 @@ sdp_urg(struct sdp_sock *ssk, struct mbuf *mb)
T>  	if (so == NULL)
T>  		return;
T>  
T> -	so->so_oobmark = so->so_rcv.sb_cc + mb->m_pkthdr.len - 1;
T> +	so->so_oobmark = sbused(&so->so_rcv) + mb->m_pkthdr.len - 1;
T>  	sohasoutofband(so);
T>  	ssk->oobflags &= ~(SDP_HAVEOOB | SDP_HADOOB);
T>  	if (!(so->so_options & SO_OOBINLINE)) {
T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c
T> ===================================================================
T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c	(.../head)	(revision 266804)
T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c	(.../projects/sendfile)	(revision 266807)
T> @@ -183,7 +183,7 @@ sdp_post_recvs_needed(struct sdp_sock *ssk)
T>  	 * Compute bytes in the receive queue and socket buffer.
T>  	 */
T>  	bytes_in_process = (posted - SDP_MIN_TX_CREDITS) * buffer_size;
T> -	bytes_in_process += ssk->socket->so_rcv.sb_cc;
T> +	bytes_in_process += sbused(&ssk->socket->so_rcv);
T>  
T>  	return bytes_in_process < max_bytes;
T>  }
T> Index: sys/sys/socket.h
T> ===================================================================
T> --- sys/sys/socket.h	(.../head)	(revision 266804)
T> +++ sys/sys/socket.h	(.../projects/sendfile)	(revision 266807)
T> @@ -602,12 +602,15 @@ struct sf_hdtr_all {
T>   * Sendfile-specific flag(s)
T>   */
T>  #define	SF_NODISKIO     0x00000001
T> -#define	SF_MNOWAIT	0x00000002
T> +#define	SF_MNOWAIT	0x00000002	/* unused since 11.0 */
T>  #define	SF_SYNC		0x00000004
T>  #define	SF_KQUEUE	0x00000008
T> +#define	SF_NOCACHE	0x00000010
T> +#define	SF_FLAGS(rh, flags)	(((rh) << 16) | (flags))
T>  
T>  #ifdef _KERNEL
T>  #define	SFK_COMPAT	0x00000001
T> +#define	SF_READAHEAD(flags)	((flags) >> 16)
T>  #endif /* _KERNEL */
T>  #endif /* __BSD_VISIBLE */
T>  
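
A note on the two new macros in the socket.h hunk above: SF_FLAGS() puts a readahead value into the upper 16 bits of the flags word, and SF_READAHEAD() (kernel-only in the patch) reads it back. Below is a minimal userspace sketch of that packing, not part of the patch. The helpers sf_pack() and sf_flag_bits() are hypothetical names, and the 16-bit range checks are my assumption about sensible limits rather than something the patch enforces.

```c
#include <assert.h>

/* Values copied from the sys/sys/socket.h hunk above. */
#define	SF_NODISKIO	0x00000001
#define	SF_NOCACHE	0x00000010
#define	SF_FLAGS(rh, flags)	(((rh) << 16) | (flags))
#define	SF_READAHEAD(flags)	((flags) >> 16)

/* Hypothetical helper: pack a readahead value and flag bits into one word. */
static unsigned
sf_pack(unsigned readahead, unsigned flags)
{

	/* Assumed limits: each half must fit into 16 bits. */
	assert(readahead <= 0xffff);
	assert(flags <= 0xffff);
	return (SF_FLAGS(readahead, flags));
}

/* Hypothetical helper: recover the flag bits from the low 16 bits. */
static unsigned
sf_flag_bits(unsigned packed)
{

	return (packed & 0xffff);
}
```

Unsigned arithmetic is used here on purpose, so that a large readahead value does not shift into the sign bit of an int.
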
T> Index: sys/sys/sockbuf.h
T> ===================================================================
T> --- sys/sys/sockbuf.h	(.../head)	(revision 266804)
T> +++ sys/sys/sockbuf.h	(.../projects/sendfile)	(revision 266807)
T> @@ -89,8 +89,13 @@ struct	sockbuf {
T>  	struct	mbuf *sb_lastrecord;	/* (c/d) first mbuf of last
T>  					 * record in socket buffer */
T>  	struct	mbuf *sb_sndptr; /* (c/d) pointer into mbuf chain */
T> +	struct	mbuf *sb_fnrdy;	/* (c/d) pointer to first not ready buffer */
T> +#if 0
T> +	struct	mbuf *sb_lnrdy;	/* (c/d) pointer to last not ready buffer */
T> +#endif
T>  	u_int	sb_sndptroff;	/* (c/d) byte offset of ptr into chain */
T> -	u_int	sb_cc;		/* (c/d) actual chars in buffer */
T> +	u_int	sb_acc;		/* (c/d) available chars in buffer */
T> +	u_int	sb_ccc;		/* (c/d) claimed chars in buffer */
T>  	u_int	sb_hiwat;	/* (c/d) max actual char count */
T>  	u_int	sb_mbcnt;	/* (c/d) chars of mbufs used */
T>  	u_int   sb_mcnt;        /* (c/d) number of mbufs in buffer */
T> @@ -120,10 +125,17 @@ struct	sockbuf {
T>  #define	SOCKBUF_LOCK_ASSERT(_sb)	mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED)
T>  #define	SOCKBUF_UNLOCK_ASSERT(_sb)	mtx_assert(SOCKBUF_MTX(_sb), MA_NOTOWNED)
T>  
T> +/*
T> + * Socket buffer private mbuf(9) flags.
T> + */
T> +#define	M_NOTREADY	M_PROTO1	/* m_data not populated yet */
T> +#define	M_BLOCKED	M_PROTO2	/* M_NOTREADY in front of m */
T> +#define	M_NOTAVAIL	(M_NOTREADY | M_BLOCKED)
T> +
T>  void	sbappend(struct sockbuf *sb, struct mbuf *m);
T>  void	sbappend_locked(struct sockbuf *sb, struct mbuf *m);
T> -void	sbappendstream(struct sockbuf *sb, struct mbuf *m);
T> -void	sbappendstream_locked(struct sockbuf *sb, struct mbuf *m);
T> +void	sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags);
T> +void	sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags);
T>  int	sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa,
T>  	    struct mbuf *m0, struct mbuf *control);
T>  int	sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa,
T> @@ -136,7 +148,6 @@ int	sbappendcontrol_locked(struct sockbuf *sb, str
T>  	    struct mbuf *control);
T>  void	sbappendrecord(struct sockbuf *sb, struct mbuf *m0);
T>  void	sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0);
T> -void	sbcheck(struct sockbuf *sb);
T>  void	sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n);
T>  struct mbuf *
T>  	sbcreatecontrol(caddr_t p, int size, int type, int level);
T> @@ -162,59 +173,54 @@ void	sbtoxsockbuf(struct sockbuf *sb, struct xsock
T>  int	sbwait(struct sockbuf *sb);
T>  int	sblock(struct sockbuf *sb, int flags);
T>  void	sbunlock(struct sockbuf *sb);
T> +void	sballoc(struct sockbuf *, struct mbuf *);
T> +void	sbfree(struct sockbuf *, struct mbuf *);
T> +void	sbmtrim(struct sockbuf *, struct mbuf *, int);
T> +int	sbready(struct sockbuf *, struct mbuf *, int);
T>  
T> +static inline u_int
T> +sbavail(struct sockbuf *sb)
T> +{
T> +
T> +#if 0
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +	return (sb->sb_acc);
T> +}
T> +
T> +static inline u_int
T> +sbused(struct sockbuf *sb)
T> +{
T> +
T> +#if 0
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +	return (sb->sb_ccc);
T> +}
T> +
T>  /*
T>   * How much space is there in a socket buffer (so->so_snd or so->so_rcv)?
T>   * This is problematical if the fields are unsigned, as the space might
T> - * still be negative (cc > hiwat or mbcnt > mbmax).  Should detect
T> - * overflow and return 0.  Should use "lmin" but it doesn't exist now.
T> + * still be negative (ccc > hiwat or mbcnt > mbmax).
T>   */
T> -static __inline
T> -long
T> +static inline long
T>  sbspace(struct sockbuf *sb)
T>  {
T> -	long bleft;
T> -	long mleft;
T> +	long bleft, mleft;
T>  
T> +#if 0
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +
T>  	if (sb->sb_flags & SB_STOP)
T>  		return(0);
T> -	bleft = sb->sb_hiwat - sb->sb_cc;
T> +
T> +	bleft = sb->sb_hiwat - sb->sb_ccc;
T>  	mleft = sb->sb_mbmax - sb->sb_mbcnt;
T> -	return((bleft < mleft) ? bleft : mleft);
T> -}
T>  
T> -/* adjust counters in sb reflecting allocation of m */
T> -#define	sballoc(sb, m) { \
T> -	(sb)->sb_cc += (m)->m_len; \
T> -	if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \
T> -		(sb)->sb_ctl += (m)->m_len; \
T> -	(sb)->sb_mbcnt += MSIZE; \
T> -	(sb)->sb_mcnt += 1; \
T> -	if ((m)->m_flags & M_EXT) { \
T> -		(sb)->sb_mbcnt += (m)->m_ext.ext_size; \
T> -		(sb)->sb_ccnt += 1; \
T> -	} \
T> +	return ((bleft < mleft) ? bleft : mleft);
T>  }
T>  
T> -/* adjust counters in sb reflecting freeing of m */
T> -#define	sbfree(sb, m) { \
T> -	(sb)->sb_cc -= (m)->m_len; \
T> -	if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \
T> -		(sb)->sb_ctl -= (m)->m_len; \
T> -	(sb)->sb_mbcnt -= MSIZE; \
T> -	(sb)->sb_mcnt -= 1; \
T> -	if ((m)->m_flags & M_EXT) { \
T> -		(sb)->sb_mbcnt -= (m)->m_ext.ext_size; \
T> -		(sb)->sb_ccnt -= 1; \
T> -	} \
T> -	if ((sb)->sb_sndptr == (m)) { \
T> -		(sb)->sb_sndptr = NULL; \
T> -		(sb)->sb_sndptroff = 0; \
T> -	} \
T> -	if ((sb)->sb_sndptroff != 0) \
T> -		(sb)->sb_sndptroff -= (m)->m_len; \
T> -}
T> -
T>  #define SB_EMPTY_FIXUP(sb) do {						\
T>  	if ((sb)->sb_mb == NULL) {					\
T>  		(sb)->sb_mbtail = NULL;					\
T> @@ -224,13 +230,15 @@ sbspace(struct sockbuf *sb)
T>  
T>  #ifdef SOCKBUF_DEBUG
T>  void	sblastrecordchk(struct sockbuf *, const char *, int);
T> +void	sblastmbufchk(struct sockbuf *, const char *, int);
T> +void	sbcheck(struct sockbuf *, const char *, int);
T>  #define	SBLASTRECORDCHK(sb)	sblastrecordchk((sb), __FILE__, __LINE__)
T> -
T> -void	sblastmbufchk(struct sockbuf *, const char *, int);
T>  #define	SBLASTMBUFCHK(sb)	sblastmbufchk((sb), __FILE__, __LINE__)
T> +#define	SBCHECK(sb)		sbcheck((sb), __FILE__, __LINE__)
T>  #else
T> -#define	SBLASTRECORDCHK(sb)      /* nothing */
T> -#define	SBLASTMBUFCHK(sb)        /* nothing */
T> +#define	SBLASTRECORDCHK(sb)	do {} while (0)
T> +#define	SBLASTMBUFCHK(sb)	do {} while (0)
T> +#define	SBCHECK(sb)		do {} while (0)
T>  #endif /* SOCKBUF_DEBUG */
T>  
T>  #endif /* _KERNEL */
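
To illustrate the split of sb_cc into sb_acc and sb_ccc in the sockbuf.h hunk above, here is a toy userspace model, not kernel code. It follows my reading of the comments in the patch: sb_ccc counts every byte claimed in the buffer, while sb_acc counts only the bytes that are not M_NOTREADY and not M_BLOCKED behind a not-ready mbuf. The types toy_mbuf and toy_counts and the function toy_count() are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an mbuf: a length and a "data not populated yet" bit. */
struct toy_mbuf {
	unsigned	len;
	int		notready;	/* models M_NOTREADY */
};

struct toy_counts {
	unsigned	acc;	/* models sb_acc, what sbavail() returns */
	unsigned	ccc;	/* models sb_ccc, what sbused() returns */
};

/* Walk the chain once and compute both counters. */
static struct toy_counts
toy_count(const struct toy_mbuf *chain, size_t n)
{
	struct toy_counts c = { 0, 0 };
	int blocked = 0;

	assert(chain != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Every byte in the chain is claimed. */
		c.ccc += chain[i].len;
		/* Once a not-ready mbuf is seen, all later ones are blocked. */
		if (chain[i].notready)
			blocked = 1;
		if (!blocked)
			c.acc += chain[i].len;
	}
	return (c);
}
```

In this model a reader would consult acc (there is nothing to read past the first not-ready mbuf), while space accounting would consult ccc, which matches how the patch uses sbavail() and sbused() at the call sites.
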
T> Index: sys/sys/protosw.h
T> ===================================================================
T> --- sys/sys/protosw.h	(.../head)	(revision 266804)
T> +++ sys/sys/protosw.h	(.../projects/sendfile)	(revision 266807)
T> @@ -209,6 +209,7 @@ struct pr_usrreqs {
T>  #define	PRUS_OOB	0x1
T>  #define	PRUS_EOF	0x2
T>  #define	PRUS_MORETOCOME	0x4
T> +#define	PRUS_NOTREADY	0x8
T>  	int	(*pru_sense)(struct socket *so, struct stat *sb);
T>  	int	(*pru_shutdown)(struct socket *so);
T>  	int	(*pru_flush)(struct socket *so, int direction);
T> Index: sys/sys/sf_buf.h
T> ===================================================================
T> --- sys/sys/sf_buf.h	(.../head)	(revision 266804)
T> +++ sys/sys/sf_buf.h	(.../projects/sendfile)	(revision 266807)
T> @@ -52,7 +52,7 @@ struct sfstat {				/* sendfile statistics */
T>  #include <machine/sf_buf.h>
T>  #include <sys/systm.h>
T>  #include <sys/counter.h>
T> -struct mbuf;	/* for sf_buf_mext() */
T> +struct mbuf;	/* for sf_mext_free() */
T>  
T>  extern counter_u64_t sfstat[sizeof(struct sfstat) / sizeof(uint64_t)];
T>  #define	SFSTAT_ADD(name, val)	\
T> @@ -61,6 +61,6 @@ extern counter_u64_t sfstat[sizeof(struct sfstat)
T>  #define	SFSTAT_INC(name)	SFSTAT_ADD(name, 1)
T>  #endif /* _KERNEL */
T>  
T> -int	sf_buf_mext(struct mbuf *mb, void *addr, void *args);
T> +int	sf_mext_free(struct mbuf *mb, void *addr, void *args);
T>  
T>  #endif /* !_SYS_SF_BUF_H_ */
T> Index: sys/sys/vnode.h
T> ===================================================================
T> --- sys/sys/vnode.h	(.../head)	(revision 266804)
T> +++ sys/sys/vnode.h	(.../projects/sendfile)	(revision 266807)
T> @@ -719,6 +719,7 @@ int	vop_stdbmap(struct vop_bmap_args *);
T>  int	vop_stdfsync(struct vop_fsync_args *);
T>  int	vop_stdgetwritemount(struct vop_getwritemount_args *);
T>  int	vop_stdgetpages(struct vop_getpages_args *);
T> +int	vop_stdgetpages_async(struct vop_getpages_async_args *);
T>  int	vop_stdinactive(struct vop_inactive_args *);
T>  int	vop_stdislocked(struct vop_islocked_args *);
T>  int	vop_stdkqfilter(struct vop_kqfilter_args *);
T> Index: sys/sys/socketvar.h
T> ===================================================================
T> --- sys/sys/socketvar.h	(.../head)	(revision 266804)
T> +++ sys/sys/socketvar.h	(.../projects/sendfile)	(revision 266807)
T> @@ -205,7 +205,7 @@ struct xsocket {
T>  
T>  /* can we read something from so? */
T>  #define	soreadabledata(so) \
T> -    ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \
T> +    (sbavail(&(so)->so_rcv) >= (so)->so_rcv.sb_lowat || \
T>  	!TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error)
T>  #define	soreadable(so) \
T>  	(soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE))
T> Index: sys/sys/mbuf.h
T> ===================================================================
T> --- sys/sys/mbuf.h	(.../head)	(revision 266804)
T> +++ sys/sys/mbuf.h	(.../projects/sendfile)	(revision 266807)
T> @@ -922,7 +922,7 @@ struct mbuf	*m_copypacket(struct mbuf *, int);
T>  void		 m_copy_pkthdr(struct mbuf *, struct mbuf *);
T>  struct mbuf	*m_copyup(struct mbuf *, int, int);
T>  struct mbuf	*m_defrag(struct mbuf *, int);
T> -void		 m_demote(struct mbuf *, int);
T> +void		 m_demote(struct mbuf *, int, int);
T>  struct mbuf	*m_devget(char *, int, int, struct ifnet *,
T>  		    void (*)(char *, caddr_t, u_int));
T>  struct mbuf	*m_dup(struct mbuf *, int);
T> Index: sys/vm/vnode_pager.h
T> ===================================================================
T> --- sys/vm/vnode_pager.h	(.../head)	(revision 266804)
T> +++ sys/vm/vnode_pager.h	(.../projects/sendfile)	(revision 266807)
T> @@ -41,7 +41,7 @@
T>  #ifdef _KERNEL
T>  
T>  int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m,
T> -					  int count, int reqpage);
T> +    int count, int reqpage, void (*iodone)(void *), void *arg);
T>  int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m,
T>  					  int count, boolean_t sync,
T>  					  int *rtvals);
T> Index: sys/vm/vm_pager.h
T> ===================================================================
T> --- sys/vm/vm_pager.h	(.../head)	(revision 266804)
T> +++ sys/vm/vm_pager.h	(.../projects/sendfile)	(revision 266807)
T> @@ -51,18 +51,21 @@ typedef vm_object_t pgo_alloc_t(void *, vm_ooffset
T>      struct ucred *);
T>  typedef void pgo_dealloc_t(vm_object_t);
T>  typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int);
T> +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int,
T> +    void(*)(void *), void *);
T>  typedef void pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *);
T>  typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *);
T>  typedef void pgo_pageunswapped_t(vm_page_t);
T>  
T>  struct pagerops {
T> -	pgo_init_t	*pgo_init;		/* Initialize pager. */
T> -	pgo_alloc_t	*pgo_alloc;		/* Allocate pager. */
T> -	pgo_dealloc_t	*pgo_dealloc;		/* Disassociate. */
T> -	pgo_getpages_t	*pgo_getpages;		/* Get (read) page. */
T> -	pgo_putpages_t	*pgo_putpages;		/* Put (write) page. */
T> -	pgo_haspage_t	*pgo_haspage;		/* Does pager have page? */
T> -	pgo_pageunswapped_t *pgo_pageunswapped;
T> +	pgo_init_t		*pgo_init;		/* Initialize pager. */
T> +	pgo_alloc_t		*pgo_alloc;		/* Allocate pager. */
T> +	pgo_dealloc_t		*pgo_dealloc;		/* Disassociate. */
T> +	pgo_getpages_t		*pgo_getpages;		/* Get (read) page. */
T> +	pgo_getpages_async_t	*pgo_getpages_async;	/* Get page asyncly. */
T> +	pgo_putpages_t		*pgo_putpages;		/* Put (write) page. */
T> +	pgo_haspage_t		*pgo_haspage;		/* Query page. */
T> +	pgo_pageunswapped_t	*pgo_pageunswapped;
T>  };
T>  
T>  extern struct pagerops defaultpagerops;
T> @@ -103,6 +106,8 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v
T>  void vm_pager_bufferinit(void);
T>  void vm_pager_deallocate(vm_object_t);
T>  static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int);
T> +static __inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int,
T> +    int, void(*)(void *), void *);
T>  static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *);
T>  void vm_pager_init(void);
T>  vm_object_t vm_pager_object_lookup(struct pagerlst *, void *);
T> @@ -131,6 +136,27 @@ vm_pager_get_pages(
T>  	return (r);
T>  }
T>  
T> +static __inline int
T> +vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count,
T> +    int reqpage, void (*iodone)(void *), void *arg)
T> +{
T> +	int r;
T> +
T> +	VM_OBJECT_ASSERT_WLOCKED(object);
T> +
T> +	if (*pagertab[object->type]->pgo_getpages_async == NULL) {
T> +		/* Emulate async operation. */
T> +		r = vm_pager_get_pages(object, m, count, reqpage);
T> +		VM_OBJECT_WUNLOCK(object);
T> +		(iodone)(arg);
T> +		VM_OBJECT_WLOCK(object);
T> +	} else
T> +		r = (*pagertab[object->type]->pgo_getpages_async)(object, m,
T> +		    count, reqpage, iodone, arg);
T> +
T> +	return (r);
T> +}
T> +
T>  static __inline void
T>  vm_pager_put_pages(
T>  	vm_object_t object,
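
The fallback in vm_pager_get_pages_async() above can be sketched in userspace as follows. This is not the kernel code: the object locking is left out, and toy_pagerops, toy_get_pages_async(), toy_sync_fill() and toy_count_done() are hypothetical names used only here. The point it shows is that a pager without an async method still honours the contract, because the wrapper does the read synchronously and then calls the completion callback itself.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*iodone_t)(void *);

/* Toy method table with an optional async entry. */
struct toy_pagerops {
	int	(*getpages)(int *pages, int count);
	int	(*getpages_async)(int *pages, int count, iodone_t iodone,
		    void *arg);
};

static int
toy_get_pages_async(const struct toy_pagerops *ops, int *pages, int count,
    iodone_t iodone, void *arg)
{
	int r;

	assert(ops != NULL && ops->getpages != NULL && iodone != NULL);
	if (ops->getpages_async == NULL) {
		/* Emulate async operation: read now, then signal completion. */
		r = ops->getpages(pages, count);
		iodone(arg);
		return (r);
	}
	return (ops->getpages_async(pages, count, iodone, arg));
}

/* Example sync method: marks every page as filled and reports success. */
static int
toy_sync_fill(int *pages, int count)
{

	for (int i = 0; i < count; i++)
		pages[i] = 1;
	return (0);
}

/* Example completion callback: counts how many times it was invoked. */
static void
toy_count_done(void *arg)
{

	(*(int *)arg)++;
}
```

With this shape the caller always gets exactly one iodone call per request, whether or not the underlying pager implements the async method.
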
T> Index: sys/vm/vm_page.c
T> ===================================================================
T> --- sys/vm/vm_page.c	(.../head)	(revision 266804)
T> +++ sys/vm/vm_page.c	(.../projects/sendfile)	(revision 266807)
T> @@ -2689,6 +2689,8 @@ retrylookup:
T>  		sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ?
T>  		    vm_page_xbusied(m) : vm_page_busied(m);
T>  		if (sleep) {
T> +			if (allocflags & VM_ALLOC_NOWAIT)
T> +				return (NULL);
T>  			/*
T>  			 * Reference the page before unlocking and
T>  			 * sleeping so that the page daemon is less
T> @@ -2716,6 +2718,8 @@ retrylookup:
T>  	}
T>  	m = vm_page_alloc(object, pindex, allocflags & ~VM_ALLOC_IGN_SBUSY);
T>  	if (m == NULL) {
T> +		if (allocflags & VM_ALLOC_NOWAIT)
T> +			return (NULL);
T>  		VM_OBJECT_WUNLOCK(object);
T>  		VM_WAIT;
T>  		VM_OBJECT_WLOCK(object);
T> Index: sys/vm/vm_page.h
T> ===================================================================
T> --- sys/vm/vm_page.h	(.../head)	(revision 266804)
T> +++ sys/vm/vm_page.h	(.../projects/sendfile)	(revision 266807)
T> @@ -390,6 +390,7 @@ vm_page_t PHYS_TO_VM_PAGE(vm_paddr_t pa);
T>  #define	VM_ALLOC_IGN_SBUSY	0x1000	/* vm_page_grab() only */
T>  #define	VM_ALLOC_NODUMP		0x2000	/* don't include in dump */
T>  #define	VM_ALLOC_SBUSY		0x4000	/* Shared busy the page */
T> +#define	VM_ALLOC_NOWAIT		0x8000	/* Return NULL instead of sleeping */
T>  
T>  #define	VM_ALLOC_COUNT_SHIFT	16
T>  #define	VM_ALLOC_COUNT(count)	((count) << VM_ALLOC_COUNT_SHIFT)
T> Index: sys/vm/vnode_pager.c
T> ===================================================================
T> --- sys/vm/vnode_pager.c	(.../head)	(revision 266804)
T> +++ sys/vm/vnode_pager.c	(.../projects/sendfile)	(revision 266807)
T> @@ -83,6 +83,8 @@ static int vnode_pager_input_smlfs(vm_object_t obj
T>  static int vnode_pager_input_old(vm_object_t object, vm_page_t m);
T>  static void vnode_pager_dealloc(vm_object_t);
T>  static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int);
T> +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int,
T> +    void(*)(void  *), void *);
T>  static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *);
T>  static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *);
T>  static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t,
T> @@ -92,6 +94,7 @@ struct pagerops vnodepagerops = {
T>  	.pgo_alloc =	vnode_pager_alloc,
T>  	.pgo_dealloc =	vnode_pager_dealloc,
T>  	.pgo_getpages =	vnode_pager_getpages,
T> +	.pgo_getpages_async = vnode_pager_getpages_async,
T>  	.pgo_putpages =	vnode_pager_putpages,
T>  	.pgo_haspage =	vnode_pager_haspage,
T>  };
T> @@ -664,6 +667,40 @@ vnode_pager_getpages(vm_object_t object, vm_page_t
T>  	return rtval;
T>  }
T>  
T> +static int
T> +vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count,
T> +    int reqpage, void (*iodone)(void *), void *arg)
T> +{
T> +	int rtval;
T> +	struct vnode *vp;
T> +	int bytes = count * PAGE_SIZE;
T> +
T> +	vp = object->handle;
T> +	VM_OBJECT_WUNLOCK(object);
T> +	rtval = VOP_GETPAGES_ASYNC(vp, m, bytes, reqpage, 0, iodone, arg);
T> +	KASSERT(rtval != EOPNOTSUPP,
T> +	    ("vnode_pager: FS getpages_async not implemented\n"));
T> +	VM_OBJECT_WLOCK(object);
T> +	return rtval;
T> +}
T> +
T> +struct getpages_softc {
T> +	vm_page_t *m;
T> +	struct buf *bp;
T> +	vm_object_t object;
T> +	vm_offset_t kva;
T> +	off_t foff;
T> +	int size;
T> +	int count;
T> +	int unmapped;
T> +	int reqpage;
T> +	void (*iodone)(void *);
T> +	void *arg;
T> +};
T> +
T> +int	vnode_pager_generic_getpages_done(struct getpages_softc *);
T> +void	vnode_pager_generic_getpages_done_async(struct buf *);
T> +
T>  /*
T>   * This is now called from local media FS's to operate against their
T>   * own vnodes if they fail to implement VOP_GETPAGES.
T> @@ -670,11 +707,11 @@ vnode_pager_getpages(vm_object_t object, vm_page_t
T>   */
T>  int
T>  vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount,
T> -    int reqpage)
T> +    int reqpage, void (*iodone)(void *), void *arg)
T>  {
T>  	vm_object_t object;
T>  	vm_offset_t kva;
T> -	off_t foff, tfoff, nextoff;
T> +	off_t foff;
T>  	int i, j, size, bsize, first;
T>  	daddr_t firstaddr, reqblock;
T>  	struct bufobj *bo;
T> @@ -684,6 +721,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	struct mount *mp;
T>  	int count;
T>  	int error;
T> +	int unmapped;
T>  
T>  	object = vp->v_object;
T>  	count = bytecount / PAGE_SIZE;
T> @@ -891,8 +929,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	 * requires mapped buffers.
T>  	 */
T>  	mp = vp->v_mount;
T> -	if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 &&
T> -	    unmapped_buf_allowed) {
T> +	unmapped = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS));
T> +	if (unmapped && unmapped_buf_allowed) {
T>  		bp->b_data = unmapped_buf;
T>  		bp->b_kvabase = unmapped_buf;
T>  		bp->b_offset = 0;
T> @@ -905,7 +943,6 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  
T>  	/* build a minimal buffer header */
T>  	bp->b_iocmd = BIO_READ;
T> -	bp->b_iodone = bdone;
T>  	KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred"));
T>  	KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred"));
T>  	bp->b_rcred = crhold(curthread->td_ucred);
T> @@ -923,10 +960,88 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  
T>  	/* do the input */
T>  	bp->b_iooffset = dbtob(bp->b_blkno);
T> -	bstrategy(bp);
T>  
T> -	bwait(bp, PVM, "vnread");
T> +	if (iodone) { /* async */
T> +		struct getpages_softc *sc;
T>  
T> +		sc = malloc(sizeof(*sc), M_TEMP, M_WAITOK);
T> +
T> +		sc->m = m;
T> +		sc->bp = bp;
T> +		sc->object = object;
T> +		sc->foff = foff;
T> +		sc->size = size;
T> +		sc->count = count;
T> +		sc->unmapped = unmapped;
T> +		sc->reqpage = reqpage;
T> +		sc->kva = kva;
T> +
T> +		sc->iodone = iodone;
T> +		sc->arg = arg;
T> +
T> +		bp->b_iodone = vnode_pager_generic_getpages_done_async;
T> +		bp->b_caller1 = sc;
T> +		BUF_KERNPROC(bp);
T> +		bstrategy(bp);
T> +		/* Good bye! */
T> +	} else {
T> +		struct getpages_softc sc;
T> +
T> +		sc.m = m;
T> +		sc.bp = bp;
T> +		sc.object = object;
T> +		sc.foff = foff;
T> +		sc.size = size;
T> +		sc.count = count;
T> +		sc.unmapped = unmapped;
T> +		sc.reqpage = reqpage;
T> +		sc.kva = kva;
T> +
T> +		bp->b_iodone = bdone;
T> +		bstrategy(bp);
T> +		bwait(bp, PVM, "vnread");
T> +		error = vnode_pager_generic_getpages_done(&sc);
T> +	}
T> +
T> +	return (error ? VM_PAGER_ERROR : VM_PAGER_OK);
T> +}
T> +
T> +void
T> +vnode_pager_generic_getpages_done_async(struct buf *bp)
T> +{
T> +	struct getpages_softc *sc = bp->b_caller1;
T> +	int error;
T> +
T> +	error = vnode_pager_generic_getpages_done(sc);
T> +
T> +	vm_page_xunbusy(sc->m[sc->reqpage]);
T> +
T> +	sc->iodone(sc->arg);
T> +
T> +	free(sc, M_TEMP);
T> +}
T> +
T> +int
T> +vnode_pager_generic_getpages_done(struct getpages_softc *sc)
T> +{
T> +	vm_object_t object;
T> +	vm_offset_t kva;
T> +	vm_page_t *m;
T> +	struct buf *bp;
T> +	off_t foff, tfoff, nextoff;
T> +	int i, size, count, unmapped, reqpage;
T> +	int error = 0;
T> +
T> +	m = sc->m;
T> +	bp = sc->bp;
T> +	object = sc->object;
T> +	foff = sc->foff;
T> +	size = sc->size;
T> +	count = sc->count;
T> +	unmapped = sc->unmapped;
T> +	reqpage = sc->reqpage;
T> +	kva = sc->kva;
T> +
T>  	if ((bp->b_ioflags & BIO_ERROR) != 0)
T>  		error = EIO;
T>  
T> @@ -939,7 +1054,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	}
T>  	if ((bp->b_flags & B_UNMAPPED) == 0)
T>  		pmap_qremove(kva, count);
T> -	if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) {
T> +	if (unmapped) {
T>  		bp->b_data = (caddr_t)kva;
T>  		bp->b_kvabase = (caddr_t)kva;
T>  		bp->b_flags &= ~B_UNMAPPED;
T> @@ -995,7 +1110,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	if (error) {
T>  		printf("vnode_pager_getpages: I/O read error\n");
T>  	}
T> -	return (error ? VM_PAGER_ERROR : VM_PAGER_OK);
T> +
T> +	return (error);
T>  }
T>  
T>  /*
T> Index: sys/rpc/clnt_vc.c
T> ===================================================================
T> --- sys/rpc/clnt_vc.c	(.../head)	(revision 266804)
T> +++ sys/rpc/clnt_vc.c	(.../projects/sendfile)	(revision 266807)
T> @@ -860,7 +860,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int
T>  			 * error condition
T>  			 */
T>  			do_read = FALSE;
T> -			if (so->so_rcv.sb_cc >= sizeof(uint32_t)
T> +			if (sbavail(&so->so_rcv) >= sizeof(uint32_t)
T>  			    || (so->so_rcv.sb_state & SBS_CANTRCVMORE)
T>  			    || so->so_error)
T>  				do_read = TRUE;
T> @@ -913,7 +913,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int
T>  			 * buffered.
T>  			 */
T>  			do_read = FALSE;
T> -			if (so->so_rcv.sb_cc >= ct->ct_record_resid
T> +			if (sbavail(&so->so_rcv) >= ct->ct_record_resid
T>  			    || (so->so_rcv.sb_state & SBS_CANTRCVMORE)
T>  			    || so->so_error)
T>  				do_read = TRUE;
T> Index: sys/rpc/svc_vc.c
T> ===================================================================
T> --- sys/rpc/svc_vc.c	(.../head)	(revision 266804)
T> +++ sys/rpc/svc_vc.c	(.../projects/sendfile)	(revision 266807)
T> @@ -546,7 +546,7 @@ svc_vc_ack(SVCXPRT *xprt, uint32_t *ack)
T>  {
T>  
T>  	*ack = atomic_load_acq_32(&xprt->xp_snt_cnt);
T> -	*ack -= xprt->xp_socket->so_snd.sb_cc;
T> +	*ack -= sbused(&xprt->xp_socket->so_snd);
T>  	return (TRUE);
T>  }
T>  
T> Index: sys/ufs/ffs/ffs_vnops.c
T> ===================================================================
T> --- sys/ufs/ffs/ffs_vnops.c	(.../head)	(revision 266804)
T> +++ sys/ufs/ffs/ffs_vnops.c	(.../projects/sendfile)	(revision 266807)
T> @@ -105,6 +105,7 @@ extern int	ffs_rawread(struct vnode *vp, struct ui
T>  static vop_fsync_t	ffs_fsync;
T>  static vop_lock1_t	ffs_lock;
T>  static vop_getpages_t	ffs_getpages;
T> +static vop_getpages_async_t ffs_getpages_async;
T>  static vop_read_t	ffs_read;
T>  static vop_write_t	ffs_write;
T>  static int	ffs_extread(struct vnode *vp, struct uio *uio, int ioflag);
T> @@ -125,6 +126,7 @@ struct vop_vector ffs_vnodeops1 = {
T>  	.vop_default =		&ufs_vnodeops,
T>  	.vop_fsync =		ffs_fsync,
T>  	.vop_getpages =		ffs_getpages,
T> +	.vop_getpages_async =	ffs_getpages_async,
T>  	.vop_lock1 =		ffs_lock,
T>  	.vop_read =		ffs_read,
T>  	.vop_reallocblks =	ffs_reallocblks,
T> @@ -847,18 +849,16 @@ ffs_write(ap)
T>  }
T>  
T>  /*
T> - * get page routine
T> + * Get page routines.
T>   */
T>  static int
T> -ffs_getpages(ap)
T> -	struct vop_getpages_args *ap;
T> +ffs_getpages_checkvalid(vm_page_t *m, int count, int reqpage)
T>  {
T> -	int i;
T>  	vm_page_t mreq;
T>  	int pcount;
T>  
T> -	pcount = round_page(ap->a_count) / PAGE_SIZE;
T> -	mreq = ap->a_m[ap->a_reqpage];
T> +	pcount = round_page(count) / PAGE_SIZE;
T> +	mreq = m[reqpage];
T>  
T>  	/*
T>  	 * if ANY DEV_BSIZE blocks are valid on a large filesystem block,
T> @@ -870,24 +870,48 @@ static int
T>  	if (mreq->valid) {
T>  		if (mreq->valid != VM_PAGE_BITS_ALL)
T>  			vm_page_zero_invalid(mreq, TRUE);
T> -		for (i = 0; i < pcount; i++) {
T> -			if (i != ap->a_reqpage) {
T> -				vm_page_lock(ap->a_m[i]);
T> -				vm_page_free(ap->a_m[i]);
T> -				vm_page_unlock(ap->a_m[i]);
T> +		for (int i = 0; i < pcount; i++) {
T> +			if (i != reqpage) {
T> +				vm_page_lock(m[i]);
T> +				vm_page_free(m[i]);
T> +				vm_page_unlock(m[i]);
T>  			}
T>  		}
T>  		VM_OBJECT_WUNLOCK(mreq->object);
T> -		return VM_PAGER_OK;
T> +		return (VM_PAGER_OK);
T>  	}
T>  	VM_OBJECT_WUNLOCK(mreq->object);
T>  
T> -	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
T> -					    ap->a_count,
T> -					    ap->a_reqpage);
T> +	return (-1);
T>  }
T>  
T> +static int
T> +ffs_getpages(struct vop_getpages_args *ap)
T> +{
T> +	int rv;
T>  
T> +	rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage);
T> +	if (rv == VM_PAGER_OK)
T> +		return (rv);
T> +
T> +	return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count,
T> +	    ap->a_reqpage, NULL, NULL));
T> +}
T> +
T> +static int
T> +ffs_getpages_async(struct vop_getpages_async_args *ap)
T> +{
T> +	int rv;
T> +
T> +	rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage);
T> +	if (rv == VM_PAGER_OK) {
T> +		(ap->a_vop_getpages_iodone)(ap->a_arg);
T> +		return (rv);
T> +	}
T> +	return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count,
T> +	    ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg));
T> +}
T> +
T>  /*
T>   * Extended attribute area reading.
T>   */
T> Index: sys/tools/vnode_if.awk
T> ===================================================================
T> --- sys/tools/vnode_if.awk	(.../head)	(revision 266804)
T> +++ sys/tools/vnode_if.awk	(.../projects/sendfile)	(revision 266807)
T> @@ -254,16 +254,26 @@ while ((getline < srcfile) > 0) {
T>  		if (sub(/;$/, "") < 1)
T>  			die("Missing end-of-line ; in \"%s\".", $0);
T>  
T> -		# pick off variable name
T> -		if ((argp = match($0, /[A-Za-z0-9_]+$/)) < 1)
T> -			die("Missing var name \"a_foo\" in \"%s\".", $0);
T> -		args[numargs] = substr($0, argp);
T> -		$0 = substr($0, 1, argp - 1);
T> -
T> -		# what is left must be type
T> -		# remove trailing space (if any)
T> -		sub(/ $/, "");
T> -		types[numargs] = $0;
T> +		# pick off argument name
T> +		if ((argp = match($0, /[A-Za-z0-9_]+$/)) > 0) {
T> +			args[numargs] = substr($0, argp);
T> +			$0 = substr($0, 1, argp - 1);
T> +			sub(/ $/, "");
T> +			delete fargs[numargs];
T> +			types[numargs] = $0;
T> +		} else {	# try to parse a function pointer argument
T> +			if ((argp = match($0,
T> +			    /\(\*[A-Za-z0-9_]+\)\([A-Za-z0-9_*, ]+\)$/)) < 1)
T> +				die("Missing var name \"a_foo\" in \"%s\".",
T> +				    $0);
T> +			args[numargs] = substr($0, argp + 2);
T> +			sub(/\).+/, "", args[numargs]);
T> +			fargs[numargs] = substr($0, argp);
T> +			sub(/^\([^)]+\)/, "", fargs[numargs]);
T> +			$0 = substr($0, 1, argp - 1);
T> +			sub(/ $/, "");
T> +			types[numargs] = $0;
T> +		}
T>  	}
T>  	if (numargs > 4)
T>  		ctrargs = 4;
T> @@ -286,8 +296,13 @@ while ((getline < srcfile) > 0) {
T>  	if (hfile) {
T>  		# Print out the vop_F_args structure.
T>  		printh("struct "name"_args {\n\tstruct vop_generic_args a_gen;");
T> -		for (i = 0; i < numargs; ++i)
T> -			printh("\t" t_spc(types[i]) "a_" args[i] ";");
T> +		for (i = 0; i < numargs; ++i) {
T> +			if (fargs[i]) {
T> +				printh("\t" t_spc(types[i]) "(*a_" args[i] \
T> +				    ")" fargs[i] ";");
T> +			} else
T> +				printh("\t" t_spc(types[i]) "a_" args[i] ";");
T> +		}
T>  		printh("};");
T>  		printh("");
T>  
T> @@ -301,8 +316,14 @@ while ((getline < srcfile) > 0) {
T>  		printh("");
T>  		printh("static __inline int " uname "(");
T>  		for (i = 0; i < numargs; ++i) {
T> -			printh("\t" t_spc(types[i]) args[i] \
T> -			    (i < numargs - 1 ? "," : ")"));
T> +			if (fargs[i]) {
T> +				printh("\t" t_spc(types[i]) "(*" args[i] \
T> +				    ")" fargs[i] \
T> +				    (i < numargs - 1 ? "," : ")"));
T> +			} else {
T> +				printh("\t" t_spc(types[i]) args[i] \
T> +				    (i < numargs - 1 ? "," : ")"));
T> +			}
T>  		}
T>  		printh("{");
T>  		printh("\tstruct " name "_args a;");
T> Index: sys/netinet/tcp_reass.c
T> ===================================================================
T> --- sys/netinet/tcp_reass.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_reass.c	(.../projects/sendfile)	(revision 266807)
T> @@ -248,7 +248,7 @@ present:
T>  			m_freem(mq);
T>  		else {
T>  			mq->m_nextpkt = NULL;
T> -			sbappendstream_locked(&so->so_rcv, mq);
T> +			sbappendstream_locked(&so->so_rcv, mq, 0);
T>  			wakeup = 1;
T>  		}
T>  	}
T> Index: sys/netinet/accf_http.c
T> ===================================================================
T> --- sys/netinet/accf_http.c	(.../head)	(revision 266804)
T> +++ sys/netinet/accf_http.c	(.../projects/sendfile)	(revision 266807)
T> @@ -92,7 +92,7 @@ sbfull(struct sockbuf *sb)
T>  	    "mbcnt(%ld) >= mbmax(%ld): %d",
T>  	    sb->sb_cc, sb->sb_hiwat, sb->sb_cc >= sb->sb_hiwat,
T>  	    sb->sb_mbcnt, sb->sb_mbmax, sb->sb_mbcnt >= sb->sb_mbmax);
T> -	return (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
T> +	return (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
T>  }
T>  
T>  /*
T> @@ -162,13 +162,14 @@ static int
T>  sohashttpget(struct socket *so, void *arg, int waitflag)
T>  {
T>  
T> -	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && !sbfull(&so->so_rcv)) {
T> +	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 &&
T> +	    !sbfull(&so->so_rcv)) {
T>  		struct mbuf *m;
T>  		char *cmp;
T>  		int	cmplen, cc;
T>  
T>  		m = so->so_rcv.sb_mb;
T> -		cc = so->so_rcv.sb_cc - 1;
T> +		cc = sbavail(&so->so_rcv) - 1;
T>  		if (cc < 1)
T>  			return (SU_OK);
T>  		switch (*mtod(m, char *)) {
T> @@ -215,7 +216,7 @@ soparsehttpvers(struct socket *so, void *arg, int
T>  		goto fallout;
T>  
T>  	m = so->so_rcv.sb_mb;
T> -	cc = so->so_rcv.sb_cc;
T> +	cc = sbavail(&so->so_rcv);
T>  	inspaces = spaces = 0;
T>  	for (m = so->so_rcv.sb_mb; m; m = n) {
T>  		n = m->m_nextpkt;
T> @@ -304,7 +305,7 @@ soishttpconnected(struct socket *so, void *arg, in
T>  	 * have NCHRS left
T>  	 */
T>  	copied = 0;
T> -	ccleft = so->so_rcv.sb_cc;
T> +	ccleft = sbavail(&so->so_rcv);
T>  	if (ccleft < NCHRS)
T>  		goto readmore;
T>  	a = b = c = '\0';
T> Index: sys/netinet/sctp_os_bsd.h
T> ===================================================================
T> --- sys/netinet/sctp_os_bsd.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_os_bsd.h	(.../projects/sendfile)	(revision 266807)
T> @@ -405,7 +405,7 @@ typedef struct callout sctp_os_timer_t;
T>  #define SCTP_SOWAKEUP(so)	wakeup(&(so)->so_timeo)
T>  /* clear the socket buffer state */
T>  #define SCTP_SB_CLEAR(sb)	\
T> -	(sb).sb_cc = 0;		\
T> +	(sb).sb_ccc = 0;		\
T>  	(sb).sb_mb = NULL;	\
T>  	(sb).sb_mbcnt = 0;
T>  
T> Index: sys/netinet/tcp_output.c
T> ===================================================================
T> --- sys/netinet/tcp_output.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_output.c	(.../projects/sendfile)	(revision 266807)
T> @@ -322,7 +322,7 @@ after_sack_rexmit:
T>  			 * to send then the probe will be the FIN
T>  			 * itself.
T>  			 */
T> -			if (off < so->so_snd.sb_cc)
T> +			if (off < sbavail(&so->so_snd))
T>  				flags &= ~TH_FIN;
T>  			sendwin = 1;
T>  		} else {
T> @@ -348,7 +348,8 @@ after_sack_rexmit:
T>  	 */
T>  	if (sack_rxmit == 0) {
T>  		if (sack_bytes_rxmt == 0)
T> -			len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off);
T> +			len = ((long)ulmin(sbavail(&so->so_snd), sendwin) -
T> +			    off);
T>  		else {
T>  			long cwin;
T>  
T> @@ -357,8 +358,8 @@ after_sack_rexmit:
T>  			 * sending new data, having retransmitted all the
T>  			 * data possible in the scoreboard.
T>  			 */
T> -			len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd) 
T> -			       - off);
T> +			len = ((long)ulmin(sbavail(&so->so_snd), tp->snd_wnd) -
T> +			    off);
T>  			/*
T>  			 * Don't remove this (len > 0) check !
T>  			 * We explicitly check for len > 0 here (although it 
T> @@ -457,12 +458,15 @@ after_sack_rexmit:
T>  	 * TODO: Shrink send buffer during idle periods together
T>  	 * with congestion window.  Requires another timer.  Has to
T>  	 * wait for upcoming tcp timer rewrite.
T> +	 *
T> +	 * XXXGL: should sbused() or sbavail() be used here?
T>  	 */
T>  	if (V_tcp_do_autosndbuf && so->so_snd.sb_flags & SB_AUTOSIZE) {
T>  		if ((tp->snd_wnd / 4 * 5) >= so->so_snd.sb_hiwat &&
T> -		    so->so_snd.sb_cc >= (so->so_snd.sb_hiwat / 8 * 7) &&
T> -		    so->so_snd.sb_cc < V_tcp_autosndbuf_max &&
T> -		    sendwin >= (so->so_snd.sb_cc - (tp->snd_nxt - tp->snd_una))) {
T> +		    sbused(&so->so_snd) >= (so->so_snd.sb_hiwat / 8 * 7) &&
T> +		    sbused(&so->so_snd) < V_tcp_autosndbuf_max &&
T> +		    sendwin >= (sbused(&so->so_snd) -
T> +		    (tp->snd_nxt - tp->snd_una))) {
T>  			if (!sbreserve_locked(&so->so_snd,
T>  			    min(so->so_snd.sb_hiwat + V_tcp_autosndbuf_inc,
T>  			     V_tcp_autosndbuf_max), so, curthread))
T> @@ -499,10 +503,11 @@ after_sack_rexmit:
T>  		tso = 1;
T>  
T>  	if (sack_rxmit) {
T> -		if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
T> +		if (SEQ_LT(p->rxmit + len, tp->snd_una + sbavail(&so->so_snd)))
T>  			flags &= ~TH_FIN;
T>  	} else {
T> -		if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc))
T> +		if (SEQ_LT(tp->snd_nxt + len, tp->snd_una +
T> +		    sbavail(&so->so_snd)))
T>  			flags &= ~TH_FIN;
T>  	}
T>  
T> @@ -532,7 +537,7 @@ after_sack_rexmit:
T>  		 */
T>  		if (!(tp->t_flags & TF_MORETOCOME) &&	/* normal case */
T>  		    (idle || (tp->t_flags & TF_NODELAY)) &&
T> -		    len + off >= so->so_snd.sb_cc &&
T> +		    len + off >= sbavail(&so->so_snd) &&
T>  		    (tp->t_flags & TF_NOPUSH) == 0) {
T>  			goto send;
T>  		}
T> @@ -660,7 +665,7 @@ dontupdate:
T>  	 * if window is nonzero, transmit what we can,
T>  	 * otherwise force out a byte.
T>  	 */
T> -	if (so->so_snd.sb_cc && !tcp_timer_active(tp, TT_REXMT) &&
T> +	if (sbavail(&so->so_snd) && !tcp_timer_active(tp, TT_REXMT) &&
T>  	    !tcp_timer_active(tp, TT_PERSIST)) {
T>  		tp->t_rxtshift = 0;
T>  		tcp_setpersist(tp);
T> @@ -786,7 +791,7 @@ send:
T>  			 * fractional unless the send sockbuf can
T>  			 * be emptied.
T>  			 */
T> -			if (sendalot && off + len < so->so_snd.sb_cc) {
T> +			if (sendalot && off + len < sbavail(&so->so_snd)) {
T>  				len -= len % (tp->t_maxopd - optlen);
T>  				sendalot = 1;
T>  			}
T> @@ -889,7 +894,7 @@ send:
T>  		 * give data to the user when a buffer fills or
T>  		 * a PUSH comes in.)
T>  		 */
T> -		if (off + len == so->so_snd.sb_cc)
T> +		if (off + len == sbavail(&so->so_snd))
T>  			flags |= TH_PUSH;
T>  		SOCKBUF_UNLOCK(&so->so_snd);
T>  	} else {
T> Index: sys/netinet/siftr.c
T> ===================================================================
T> --- sys/netinet/siftr.c	(.../head)	(revision 266804)
T> +++ sys/netinet/siftr.c	(.../projects/sendfile)	(revision 266807)
T> @@ -781,9 +781,9 @@ siftr_siftdata(struct pkt_node *pn, struct inpcb *
T>  	pn->flags = tp->t_flags;
T>  	pn->rxt_length = tp->t_rxtcur;
T>  	pn->snd_buf_hiwater = inp->inp_socket->so_snd.sb_hiwat;
T> -	pn->snd_buf_cc = inp->inp_socket->so_snd.sb_cc;
T> +	pn->snd_buf_cc = sbused(&inp->inp_socket->so_snd);
T>  	pn->rcv_buf_hiwater = inp->inp_socket->so_rcv.sb_hiwat;
T> -	pn->rcv_buf_cc = inp->inp_socket->so_rcv.sb_cc;
T> +	pn->rcv_buf_cc = sbused(&inp->inp_socket->so_rcv);
T>  	pn->sent_inflight_bytes = tp->snd_max - tp->snd_una;
T>  	pn->t_segqlen = tp->t_segqlen;
T>  
T> Index: sys/netinet/sctp_indata.c
T> ===================================================================
T> --- sys/netinet/sctp_indata.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_indata.c	(.../projects/sendfile)	(revision 266807)
T> @@ -70,7 +70,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
T>  
T>  	/*
T>  	 * This is really set wrong with respect to a 1-2-m socket. Since
T> -	 * the sb_cc is the count that everyone as put up. When we re-write
T> +	 * the sb_ccc is the count that everyone as put up. When we re-write
T>  	 * sctp_soreceive then we will fix this so that ONLY this
T>  	 * associations data is taken into account.
T>  	 */
T> @@ -77,7 +77,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
T>  	if (stcb->sctp_socket == NULL)
T>  		return (calc);
T>  
T> -	if (stcb->asoc.sb_cc == 0 &&
T> +	if (stcb->asoc.sb_ccc == 0 &&
T>  	    asoc->size_on_reasm_queue == 0 &&
T>  	    asoc->size_on_all_streams == 0) {
T>  		/* Full rwnd granted */
T> @@ -1358,7 +1358,7 @@ sctp_process_a_data_chunk(struct sctp_tcb *stcb, s
T>  		 * When we have NO room in the rwnd we check to make sure
T>  		 * the reader is doing its job...
T>  		 */
T> -		if (stcb->sctp_socket->so_rcv.sb_cc) {
T> +		if (stcb->sctp_socket->so_rcv.sb_ccc) {
T>  			/* some to read, wake-up */
T>  #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
T>  			struct socket *so;
T> Index: sys/netinet/sctp_pcb.c
T> ===================================================================
T> --- sys/netinet/sctp_pcb.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_pcb.c	(.../projects/sendfile)	(revision 266807)
T> @@ -3328,7 +3328,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi
T>  			if ((asoc->asoc.size_on_reasm_queue > 0) ||
T>  			    (asoc->asoc.control_pdapi) ||
T>  			    (asoc->asoc.size_on_all_streams > 0) ||
T> -			    (so && (so->so_rcv.sb_cc > 0))) {
T> +			    (so && (so->so_rcv.sb_ccc > 0))) {
T>  				/* Left with Data unread */
T>  				struct mbuf *op_err;
T>  
T> @@ -3556,7 +3556,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi
T>  		TAILQ_REMOVE(&inp->read_queue, sq, next);
T>  		sctp_free_remote_addr(sq->whoFrom);
T>  		if (so)
T> -			so->so_rcv.sb_cc -= sq->length;
T> +			so->so_rcv.sb_ccc -= sq->length;
T>  		if (sq->data) {
T>  			sctp_m_freem(sq->data);
T>  			sq->data = NULL;
T> @@ -4775,7 +4775,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, struct sct
T>  			inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED;
T>  			if (so) {
T>  				SOCK_LOCK(so);
T> -				if (so->so_rcv.sb_cc == 0) {
T> +				if (so->so_rcv.sb_ccc == 0) {
T>  					so->so_state &= ~(SS_ISCONNECTING |
T>  					    SS_ISDISCONNECTING |
T>  					    SS_ISCONFIRMING |
T> Index: sys/netinet/sctp_pcb.h
T> ===================================================================
T> --- sys/netinet/sctp_pcb.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_pcb.h	(.../projects/sendfile)	(revision 266807)
T> @@ -369,7 +369,7 @@ struct sctp_inpcb {
T>  	}     ip_inp;
T>  
T>  
T> -	/* Socket buffer lock protects read_queue and of course sb_cc */
T> +	/* Socket buffer lock protects read_queue and of course sb_ccc */
T>  	struct sctp_readhead read_queue;
T>  
T>  	              LIST_ENTRY(sctp_inpcb) sctp_list;	/* lists all endpoints */
T> Index: sys/netinet/sctp_usrreq.c
T> ===================================================================
T> --- sys/netinet/sctp_usrreq.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_usrreq.c	(.../projects/sendfile)	(revision 266807)
T> @@ -586,7 +586,7 @@ sctp_must_try_again:
T>  	if (((flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) &&
T>  	    (atomic_cmpset_int(&inp->sctp_flags, flags, (flags | SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_CLOSE_IP)))) {
T>  		if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) ||
T> -		    (so->so_rcv.sb_cc > 0)) {
T> +		    (so->so_rcv.sb_ccc > 0)) {
T>  #ifdef SCTP_LOG_CLOSING
T>  			sctp_log_closing(inp, NULL, 13);
T>  #endif
T> @@ -751,7 +751,7 @@ sctp_disconnect(struct socket *so)
T>  			}
T>  			if (((so->so_options & SO_LINGER) &&
T>  			    (so->so_linger == 0)) ||
T> -			    (so->so_rcv.sb_cc > 0)) {
T> +			    (so->so_rcv.sb_ccc > 0)) {
T>  				if (SCTP_GET_STATE(asoc) !=
T>  				    SCTP_STATE_COOKIE_WAIT) {
T>  					/* Left with Data unread */
T> @@ -916,7 +916,7 @@ sctp_flush(struct socket *so, int how)
T>  		inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_CANT_READ;
T>  		SCTP_INP_READ_UNLOCK(inp);
T>  		SCTP_INP_WUNLOCK(inp);
T> -		so->so_rcv.sb_cc = 0;
T> +		so->so_rcv.sb_ccc = 0;
T>  		so->so_rcv.sb_mbcnt = 0;
T>  		so->so_rcv.sb_mb = NULL;
T>  	}
T> @@ -925,7 +925,7 @@ sctp_flush(struct socket *so, int how)
T>  		 * First make sure the sb will be happy, we don't use these
T>  		 * except maybe the count
T>  		 */
T> -		so->so_snd.sb_cc = 0;
T> +		so->so_snd.sb_ccc = 0;
T>  		so->so_snd.sb_mbcnt = 0;
T>  		so->so_snd.sb_mb = NULL;
T>  
T> Index: sys/netinet/sctp_structs.h
T> ===================================================================
T> --- sys/netinet/sctp_structs.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_structs.h	(.../projects/sendfile)	(revision 266807)
T> @@ -982,7 +982,7 @@ struct sctp_association {
T>  
T>  	uint32_t total_output_queue_size;
T>  
T> -	uint32_t sb_cc;		/* shadow of sb_cc */
T> +	uint32_t sb_ccc;		/* shadow of sb_ccc */
T>  	uint32_t sb_send_resv;	/* amount reserved on a send */
T>  	uint32_t my_rwnd_control_len;	/* shadow of sb_mbcnt used for rwnd
T>  					 * control */
T> Index: sys/netinet/tcp_input.c
T> ===================================================================
T> --- sys/netinet/tcp_input.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_input.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1729,7 +1729,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
T>  					tcp_timer_activate(tp, TT_REXMT,
T>  						      tp->t_rxtcur);
T>  				sowwakeup(so);
T> -				if (so->so_snd.sb_cc)
T> +				if (sbavail(&so->so_snd))
T>  					(void) tcp_output(tp);
T>  				goto check_delack;
T>  			}
T> @@ -1837,7 +1837,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
T>  					    newsize, so, NULL))
T>  						so->so_rcv.sb_flags &= ~SB_AUTOSIZE;
T>  				m_adj(m, drop_hdrlen);	/* delayed header drop */
T> -				sbappendstream_locked(&so->so_rcv, m);
T> +				sbappendstream_locked(&so->so_rcv, m, 0);
T>  			}
T>  			/* NB: sorwakeup_locked() does an implicit unlock. */
T>  			sorwakeup_locked(so);
T> @@ -2541,7 +2541,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
T>  					 * Otherwise we would send pure ACKs.
T>  					 */
T>  					SOCKBUF_LOCK(&so->so_snd);
T> -					avail = so->so_snd.sb_cc -
T> +					avail = sbavail(&so->so_snd) -
T>  					    (tp->snd_nxt - tp->snd_una);
T>  					SOCKBUF_UNLOCK(&so->so_snd);
T>  					if (avail > 0)
T> @@ -2676,10 +2676,10 @@ process_ACK:
T>  		cc_ack_received(tp, th, CC_ACK);
T>  
T>  		SOCKBUF_LOCK(&so->so_snd);
T> -		if (acked > so->so_snd.sb_cc) {
T> -			tp->snd_wnd -= so->so_snd.sb_cc;
T> +		if (acked > sbavail(&so->so_snd)) {
T> +			tp->snd_wnd -= sbavail(&so->so_snd);
T>  			mfree = sbcut_locked(&so->so_snd,
T> -			    (int)so->so_snd.sb_cc);
T> +			    (int)sbavail(&so->so_snd));
T>  			ourfinisacked = 1;
T>  		} else {
T>  			mfree = sbcut_locked(&so->so_snd, acked);
T> @@ -2805,7 +2805,7 @@ step6:
T>  		 * actually wanting to send this much urgent data.
T>  		 */
T>  		SOCKBUF_LOCK(&so->so_rcv);
T> -		if (th->th_urp + so->so_rcv.sb_cc > sb_max) {
T> +		if (th->th_urp + sbavail(&so->so_rcv) > sb_max) {
T>  			th->th_urp = 0;			/* XXX */
T>  			thflags &= ~TH_URG;		/* XXX */
T>  			SOCKBUF_UNLOCK(&so->so_rcv);	/* XXX */
T> @@ -2827,7 +2827,7 @@ step6:
T>  		 */
T>  		if (SEQ_GT(th->th_seq+th->th_urp, tp->rcv_up)) {
T>  			tp->rcv_up = th->th_seq + th->th_urp;
T> -			so->so_oobmark = so->so_rcv.sb_cc +
T> +			so->so_oobmark = sbavail(&so->so_rcv) +
T>  			    (tp->rcv_up - tp->rcv_nxt) - 1;
T>  			if (so->so_oobmark == 0)
T>  				so->so_rcv.sb_state |= SBS_RCVATMARK;
T> @@ -2897,7 +2897,7 @@ dodata:							/* XXX */
T>  			if (so->so_rcv.sb_state & SBS_CANTRCVMORE)
T>  				m_freem(m);
T>  			else
T> -				sbappendstream_locked(&so->so_rcv, m);
T> +				sbappendstream_locked(&so->so_rcv, m, 0);
T>  			/* NB: sorwakeup_locked() does an implicit unlock. */
T>  			sorwakeup_locked(so);
T>  		} else {
T> Index: sys/netinet/sctp_input.c
T> ===================================================================
T> --- sys/netinet/sctp_input.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_input.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1042,7 +1042,7 @@ sctp_handle_shutdown_ack(struct sctp_shutdown_ack_
T>  	if (stcb->sctp_socket) {
T>  		if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
T>  		    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) {
T> -			stcb->sctp_socket->so_snd.sb_cc = 0;
T> +			stcb->sctp_socket->so_snd.sb_ccc = 0;
T>  		}
T>  		sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
T>  	}
T> Index: sys/netinet/sctp_var.h
T> ===================================================================
T> --- sys/netinet/sctp_var.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_var.h	(.../projects/sendfile)	(revision 266807)
T> @@ -82,9 +82,9 @@ extern struct pr_usrreqs sctp_usrreqs;
T>  
T>  #define sctp_maxspace(sb) (max((sb)->sb_hiwat,SCTP_MINIMAL_RWND))
T>  
T> -#define	sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_cc) ? (sctp_maxspace(sb) - (asoc)->sb_cc) : 0))
T> +#define	sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_ccc) ? (sctp_maxspace(sb) - (asoc)->sb_ccc) : 0))
T>  
T> -#define	sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_cc) ? (sctp_maxspace(sb) - (sb)->sb_cc) : 0))
T> +#define	sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_ccc) ? (sctp_maxspace(sb) - (sb)->sb_ccc) : 0))
T>  
T>  #define sctp_sbspace_sub(a,b) ((a > b) ? (a - b) : 0)
T>  
T> @@ -195,10 +195,10 @@ extern struct pr_usrreqs sctp_usrreqs;
T>  }
T>  
T>  #define sctp_sbfree(ctl, stcb, sb, m) { \
T> -	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_cc, SCTP_BUF_LEN((m))); \
T> +	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_ccc, SCTP_BUF_LEN((m))); \
T>  	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_mbcnt, MSIZE); \
T>  	if (((ctl)->do_not_ref_stcb == 0) && stcb) {\
T> -		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_cc, SCTP_BUF_LEN((m))); \
T> +		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_ccc, SCTP_BUF_LEN((m))); \
T>  		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
T>  	} \
T>  	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
T> @@ -207,10 +207,10 @@ extern struct pr_usrreqs sctp_usrreqs;
T>  }
T>  
T>  #define sctp_sballoc(stcb, sb, m) { \
T> -	atomic_add_int(&(sb)->sb_cc,SCTP_BUF_LEN((m))); \
T> +	atomic_add_int(&(sb)->sb_ccc,SCTP_BUF_LEN((m))); \
T>  	atomic_add_int(&(sb)->sb_mbcnt, MSIZE); \
T>  	if (stcb) { \
T> -		atomic_add_int(&(stcb)->asoc.sb_cc,SCTP_BUF_LEN((m))); \
T> +		atomic_add_int(&(stcb)->asoc.sb_ccc,SCTP_BUF_LEN((m))); \
T>  		atomic_add_int(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
T>  	} \
T>  	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
T> Index: sys/netinet/sctp_output.c
T> ===================================================================
T> --- sys/netinet/sctp_output.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_output.c	(.../projects/sendfile)	(revision 266807)
T> @@ -7104,7 +7104,7 @@ one_more_time:
T>  			if ((stcb->sctp_socket != NULL) && \
T>  			    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
T>  			    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) {
T> -				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc, sp->length);
T> +				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc, sp->length);
T>  			}
T>  			if (sp->data) {
T>  				sctp_m_freem(sp->data);
T> @@ -11382,7 +11382,7 @@ jump_out:
T>  		drp->current_onq = htonl(asoc->size_on_reasm_queue +
T>  		    asoc->size_on_all_streams +
T>  		    asoc->my_rwnd_control_len +
T> -		    stcb->sctp_socket->so_rcv.sb_cc);
T> +		    stcb->sctp_socket->so_rcv.sb_ccc);
T>  	} else {
T>  		/*-
T>  		 * If my rwnd is 0, possibly from mbuf depletion as well as
T> Index: sys/netinet/tcp_usrreq.c
T> ===================================================================
T> --- sys/netinet/tcp_usrreq.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_usrreq.c	(.../projects/sendfile)	(revision 266807)
T> @@ -826,7 +826,7 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  		m_freem(control);	/* empty control, just free it */
T>  	}
T>  	if (!(flags & PRUS_OOB)) {
T> -		sbappendstream(&so->so_snd, m);
T> +		sbappendstream(&so->so_snd, m, flags);
T>  		if (nam && tp->t_state < TCPS_SYN_SENT) {
T>  			/*
T>  			 * Do implied connect if not yet connected,
T> @@ -858,7 +858,8 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  			socantsendmore(so);
T>  			tcp_usrclosed(tp);
T>  		}
T> -		if (!(inp->inp_flags & INP_DROPPED)) {
T> +		if (!(inp->inp_flags & INP_DROPPED) &&
T> +		    !(flags & PRUS_NOTREADY)) {
T>  			if (flags & PRUS_MORETOCOME)
T>  				tp->t_flags |= TF_MORETOCOME;
T>  			error = tcp_output(tp);
T> @@ -884,7 +885,7 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  		 * of data past the urgent section.
T>  		 * Otherwise, snd_up should be one lower.
T>  		 */
T> -		sbappendstream_locked(&so->so_snd, m);
T> +		sbappendstream_locked(&so->so_snd, m, flags);
T>  		SOCKBUF_UNLOCK(&so->so_snd);
T>  		if (nam && tp->t_state < TCPS_SYN_SENT) {
T>  			/*
T> @@ -908,10 +909,12 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  			tp->snd_wnd = TTCP_CLIENT_SND_WND;
T>  			tcp_mss(tp, -1);
T>  		}
T> -		tp->snd_up = tp->snd_una + so->so_snd.sb_cc;
T> -		tp->t_flags |= TF_FORCEDATA;
T> -		error = tcp_output(tp);
T> -		tp->t_flags &= ~TF_FORCEDATA;
T> +		tp->snd_up = tp->snd_una + sbavail(&so->so_snd);
T> +		if (!(flags & PRUS_NOTREADY)) {
T> +			tp->t_flags |= TF_FORCEDATA;
T> +			error = tcp_output(tp);
T> +			tp->t_flags &= ~TF_FORCEDATA;
T> +		}
T>  	}
T>  out:
T>  	TCPDEBUG2((flags & PRUS_OOB) ? PRU_SENDOOB :
T> Index: sys/netinet/accf_dns.c
T> ===================================================================
T> --- sys/netinet/accf_dns.c	(.../head)	(revision 266804)
T> +++ sys/netinet/accf_dns.c	(.../projects/sendfile)	(revision 266807)
T> @@ -75,7 +75,7 @@ sohasdns(struct socket *so, void *arg, int waitfla
T>  	struct sockbuf *sb = &so->so_rcv;
T>  
T>  	/* If the socket is full, we're ready. */
T> -	if (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
T> +	if (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
T>  		goto ready;
T>  
T>  	/* Check to see if we have a request. */
T> @@ -115,7 +115,7 @@ skippacket(struct sockbuf *sb) {
T>  	unsigned long packlen;
T>  	struct packet q, *p = &q;
T>  
T> -	if (sb->sb_cc < 2)
T> +	if (sbavail(sb) < 2)
T>  		return DNS_WAIT;
T>  
T>  	q.m = sb->sb_mb;
T> @@ -122,7 +122,7 @@ skippacket(struct sockbuf *sb) {
T>  	q.n = q.m->m_nextpkt;
T>  	q.moff = 0;
T>  	q.offset = 0;
T> -	q.len = sb->sb_cc;
T> +	q.len = sbavail(sb);
T>  
T>  	GET16(p, packlen);
T>  	if (packlen + 2 > q.len)
T> Index: sys/netinet/sctputil.c
T> ===================================================================
T> --- sys/netinet/sctputil.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctputil.c	(.../projects/sendfile)	(revision 266807)
T> @@ -67,9 +67,9 @@ sctp_sblog(struct sockbuf *sb, struct sctp_tcb *st
T>  	struct sctp_cwnd_log sctp_clog;
T>  
T>  	sctp_clog.x.sb.stcb = stcb;
T> -	sctp_clog.x.sb.so_sbcc = sb->sb_cc;
T> +	sctp_clog.x.sb.so_sbcc = sb->sb_ccc;
T>  	if (stcb)
T> -		sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_cc;
T> +		sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_ccc;
T>  	else
T>  		sctp_clog.x.sb.stcb_sbcc = 0;
T>  	sctp_clog.x.sb.incr = incr;
T> @@ -4356,7 +4356,7 @@ sctp_add_to_readq(struct sctp_inpcb *inp,
T>  {
T>  	/*
T>  	 * Here we must place the control on the end of the socket read
T> -	 * queue AND increment sb_cc so that select will work properly on
T> +	 * queue AND increment sb_ccc so that select will work properly on
T>  	 * read.
T>  	 */
T>  	struct mbuf *m, *prev = NULL;
T> @@ -4482,7 +4482,7 @@ sctp_append_to_readq(struct sctp_inpcb *inp,
T>  	 * the reassembly queue.
T>  	 * 
T>  	 * If PDAPI this means we need to add m to the end of the data.
T> -	 * Increase the length in the control AND increment the sb_cc.
T> +	 * Increase the length in the control AND increment the sb_ccc.
T>  	 * Otherwise sb is NULL and all we need to do is put it at the end
T>  	 * of the mbuf chain.
T>  	 */
T> @@ -4694,10 +4694,10 @@ sctp_free_bufspace(struct sctp_tcb *stcb, struct s
T>  
T>  	if (stcb->sctp_socket && (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) ||
T>  	    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)))) {
T> -		if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) {
T> -			stcb->sctp_socket->so_snd.sb_cc -= tp1->book_size;
T> +		if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) {
T> +			stcb->sctp_socket->so_snd.sb_ccc -= tp1->book_size;
T>  		} else {
T> -			stcb->sctp_socket->so_snd.sb_cc = 0;
T> +			stcb->sctp_socket->so_snd.sb_ccc = 0;
T>  
T>  		}
T>  	}
T> @@ -5232,11 +5232,11 @@ sctp_sorecvmsg(struct socket *so,
T>  	in_eeor_mode = sctp_is_feature_on(inp, SCTP_PCB_FLAGS_EXPLICIT_EOR);
T>  	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) {
T>  		sctp_misc_ints(SCTP_SORECV_ENTER,
T> -		    rwnd_req, in_eeor_mode, so->so_rcv.sb_cc, uio->uio_resid);
T> +		    rwnd_req, in_eeor_mode, so->so_rcv.sb_ccc, uio->uio_resid);
T>  	}
T>  	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) {
T>  		sctp_misc_ints(SCTP_SORECV_ENTERPL,
T> -		    rwnd_req, block_allowed, so->so_rcv.sb_cc, uio->uio_resid);
T> +		    rwnd_req, block_allowed, so->so_rcv.sb_ccc, uio->uio_resid);
T>  	}
T>  	error = sblock(&so->so_rcv, (block_allowed ? SBL_WAIT : 0));
T>  	if (error) {
T> @@ -5255,7 +5255,7 @@ restart_nosblocks:
T>  	    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) {
T>  		goto out;
T>  	}
T> -	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_cc == 0)) {
T> +	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_ccc == 0)) {
T>  		if (so->so_error) {
T>  			error = so->so_error;
T>  			if ((in_flags & MSG_PEEK) == 0)
T> @@ -5262,7 +5262,7 @@ restart_nosblocks:
T>  				so->so_error = 0;
T>  			goto out;
T>  		} else {
T> -			if (so->so_rcv.sb_cc == 0) {
T> +			if (so->so_rcv.sb_ccc == 0) {
T>  				/* indicate EOF */
T>  				error = 0;
T>  				goto out;
T> @@ -5269,9 +5269,9 @@ restart_nosblocks:
T>  			}
T>  		}
T>  	}
T> -	if ((so->so_rcv.sb_cc <= held_length) && block_allowed) {
T> +	if ((so->so_rcv.sb_ccc <= held_length) && block_allowed) {
T>  		/* we need to wait for data */
T> -		if ((so->so_rcv.sb_cc == 0) &&
T> +		if ((so->so_rcv.sb_ccc == 0) &&
T>  		    ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
T>  		    (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) {
T>  			if ((inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0) {
T> @@ -5307,7 +5307,7 @@ restart_nosblocks:
T>  		}
T>  		held_length = 0;
T>  		goto restart_nosblocks;
T> -	} else if (so->so_rcv.sb_cc == 0) {
T> +	} else if (so->so_rcv.sb_ccc == 0) {
T>  		if (so->so_error) {
T>  			error = so->so_error;
T>  			if ((in_flags & MSG_PEEK) == 0)
T> @@ -5364,11 +5364,11 @@ restart_nosblocks:
T>  			SCTP_INP_READ_LOCK(inp);
T>  		}
T>  		control = TAILQ_FIRST(&inp->read_queue);
T> -		if ((control == NULL) && (so->so_rcv.sb_cc != 0)) {
T> +		if ((control == NULL) && (so->so_rcv.sb_ccc != 0)) {
T>  #ifdef INVARIANTS
T>  			panic("Huh, its non zero and nothing on control?");
T>  #endif
T> -			so->so_rcv.sb_cc = 0;
T> +			so->so_rcv.sb_ccc = 0;
T>  		}
T>  		SCTP_INP_READ_UNLOCK(inp);
T>  		hold_rlock = 0;
T> @@ -5489,11 +5489,11 @@ restart_nosblocks:
T>  		}
T>  		/*
T>  		 * if we reach here, not suitable replacement is available
T> -		 * <or> fragment interleave is NOT on. So stuff the sb_cc
T> +		 * <or> fragment interleave is NOT on. So stuff the sb_ccc
T>  		 * into the our held count, and its time to sleep again.
T>  		 */
T> -		held_length = so->so_rcv.sb_cc;
T> -		control->held_length = so->so_rcv.sb_cc;
T> +		held_length = so->so_rcv.sb_ccc;
T> +		control->held_length = so->so_rcv.sb_ccc;
T>  		goto restart;
T>  	}
T>  	/* Clear the held length since there is something to read */
T> @@ -5790,10 +5790,10 @@ get_more_data:
T>  					if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_SB_LOGGING_ENABLE) {
T>  						sctp_sblog(&so->so_rcv, control->do_not_ref_stcb ? NULL : stcb, SCTP_LOG_SBFREE, cp_len);
T>  					}
T> -					atomic_subtract_int(&so->so_rcv.sb_cc, cp_len);
T> +					atomic_subtract_int(&so->so_rcv.sb_ccc, cp_len);
T>  					if ((control->do_not_ref_stcb == 0) &&
T>  					    stcb) {
T> -						atomic_subtract_int(&stcb->asoc.sb_cc, cp_len);
T> +						atomic_subtract_int(&stcb->asoc.sb_ccc, cp_len);
T>  					}
T>  					copied_so_far += cp_len;
T>  					freed_so_far += cp_len;
T> @@ -5938,7 +5938,7 @@ wait_some_more:
T>  		    (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE))) {
T>  			goto release;
T>  		}
T> -		if (so->so_rcv.sb_cc <= control->held_length) {
T> +		if (so->so_rcv.sb_ccc <= control->held_length) {
T>  			error = sbwait(&so->so_rcv);
T>  			if (error) {
T>  				goto release;
T> @@ -5965,8 +5965,8 @@ wait_some_more:
T>  				}
T>  				goto done_with_control;
T>  			}
T> -			if (so->so_rcv.sb_cc > held_length) {
T> -				control->held_length = so->so_rcv.sb_cc;
T> +			if (so->so_rcv.sb_ccc > held_length) {
T> +				control->held_length = so->so_rcv.sb_ccc;
T>  				held_length = 0;
T>  			}
T>  			goto wait_some_more;
T> @@ -6113,13 +6113,13 @@ out:
T>  			    freed_so_far,
T>  			    ((uio) ? (slen - uio->uio_resid) : slen),
T>  			    stcb->asoc.my_rwnd,
T> -			    so->so_rcv.sb_cc);
T> +			    so->so_rcv.sb_ccc);
T>  		} else {
T>  			sctp_misc_ints(SCTP_SORECV_DONE,
T>  			    freed_so_far,
T>  			    ((uio) ? (slen - uio->uio_resid) : slen),
T>  			    0,
T> -			    so->so_rcv.sb_cc);
T> +			    so->so_rcv.sb_ccc);
T>  		}
T>  	}
T>  stage_left:
T> Index: sys/netinet/sctputil.h
T> ===================================================================
T> --- sys/netinet/sctputil.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctputil.h	(.../projects/sendfile)	(revision 266807)
T> @@ -284,10 +284,10 @@ do { \
T>  		} \
T>     	        if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
T>  	            (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
T> -			if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { \
T> -				atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_cc), tp1->book_size); \
T> +			if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { \
T> +				atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_ccc), tp1->book_size); \
T>  			} else { \
T> -				stcb->sctp_socket->so_snd.sb_cc = 0; \
T> +				stcb->sctp_socket->so_snd.sb_ccc = 0; \
T>  			} \
T>  		} \
T>          } \
T> @@ -305,10 +305,10 @@ do { \
T>  		} \
T>     	        if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
T>  	            (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
T> -			if (stcb->sctp_socket->so_snd.sb_cc >= sp->length) { \
T> -				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc,sp->length); \
T> +			if (stcb->sctp_socket->so_snd.sb_ccc >= sp->length) { \
T> +				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc,sp->length); \
T>  			} else { \
T> -				stcb->sctp_socket->so_snd.sb_cc = 0; \
T> +				stcb->sctp_socket->so_snd.sb_ccc = 0; \
T>  			} \
T>  		} \
T>          } \
T> @@ -320,7 +320,7 @@ do { \
T>  	if ((stcb->sctp_socket != NULL) && \
T>  	    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
T>  	     (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
T> -		atomic_add_int(&stcb->sctp_socket->so_snd.sb_cc,sz); \
T> +		atomic_add_int(&stcb->sctp_socket->so_snd.sb_ccc,sz); \
T>  	} \
T>  } while (0)
T>  
T> Index: usr.bin/bluetooth/btsockstat/btsockstat.c
T> ===================================================================
T> --- usr.bin/bluetooth/btsockstat/btsockstat.c	(.../head)	(revision 266804)
T> +++ usr.bin/bluetooth/btsockstat/btsockstat.c	(.../projects/sendfile)	(revision 266807)
T> @@ -255,8 +255,8 @@ hcirawpr(kvm_t *kvmd, u_long addr)
T>  			(unsigned long) pcb.so,
T>  			(unsigned long) this,
T>  			pcb.flags,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			pcb.addr.hci_node);
T>  	}
T>  } /* hcirawpr */
T> @@ -303,8 +303,8 @@ l2caprawpr(kvm_t *kvmd, u_long addr)
T>  "%-8lx %-8lx %6d %6d %-17.17s\n",
T>  			(unsigned long) pcb.so,
T>  			(unsigned long) this,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			bdaddrpr(&pcb.src, NULL, 0));
T>  	}
T>  } /* l2caprawpr */
T> @@ -361,8 +361,8 @@ l2cappr(kvm_t *kvmd, u_long addr)
T>  		fprintf(stdout,
T>  "%-8lx %6d %6d %-17.17s/%-5d %-17.17s %-5d %s\n",
T>  			(unsigned long) this,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			bdaddrpr(&pcb.src, local, sizeof(local)),
T>  			pcb.psm,
T>  			bdaddrpr(&pcb.dst, remote, sizeof(remote)),
T> @@ -467,8 +467,8 @@ rfcommpr(kvm_t *kvmd, u_long addr)
T>  		fprintf(stdout,
T>  "%-8lx %6d %6d %-17.17s %-17.17s %-4d %-4d %s\n",
T>  			(unsigned long) this,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			bdaddrpr(&pcb.src, local, sizeof(local)),
T>  			bdaddrpr(&pcb.dst, remote, sizeof(remote)),
T>  			pcb.channel,
T> Index: usr.bin/systat/netstat.c
T> ===================================================================
T> --- usr.bin/systat/netstat.c	(.../head)	(revision 266804)
T> +++ usr.bin/systat/netstat.c	(.../projects/sendfile)	(revision 266807)
T> @@ -333,8 +333,8 @@ enter_kvm(struct inpcb *inp, struct socket *so, in
T>  	struct netinfo *p;
T>  
T>  	if ((p = enter(inp, state, proto)) != NULL) {
T> -		p->ni_rcvcc = so->so_rcv.sb_cc;
T> -		p->ni_sndcc = so->so_snd.sb_cc;
T> +		p->ni_rcvcc = so->so_rcv.sb_ccc;
T> +		p->ni_sndcc = so->so_snd.sb_ccc;
T>  	}
T>  }
T>  
T> Index: usr.bin/netstat/netgraph.c
T> ===================================================================
T> --- usr.bin/netstat/netgraph.c	(.../head)	(revision 266804)
T> +++ usr.bin/netstat/netgraph.c	(.../projects/sendfile)	(revision 266807)
T> @@ -119,7 +119,7 @@ netgraphprotopr(u_long off, const char *name, int
T>  		if (Aflag)
T>  			printf("%8lx ", (u_long) this);
T>  		printf("%-5.5s %6u %6u ",
T> -		    name, sockb.so_rcv.sb_cc, sockb.so_snd.sb_cc);
T> +		    name, sockb.so_rcv.sb_ccc, sockb.so_snd.sb_ccc);
T>  
T>  		/* Get info on associated node */
T>  		if (ngpcb.node_id == 0 || csock == -1)
T> Index: usr.bin/netstat/unix.c
T> ===================================================================
T> --- usr.bin/netstat/unix.c	(.../head)	(revision 266804)
T> +++ usr.bin/netstat/unix.c	(.../projects/sendfile)	(revision 266807)
T> @@ -287,7 +287,8 @@ unixdomainpr(struct xunpcb *xunp, struct xsocket *
T>  	} else {
T>  		printf("%8lx %-6.6s %6u %6u %8lx %8lx %8lx %8lx",
T>  		    (long)so->so_pcb, socktype[so->so_type], so->so_rcv.sb_cc,
T> -		    so->so_snd.sb_cc, (long)unp->unp_vnode, (long)unp->unp_conn,
T> +		    so->so_snd.sb_cc, (long)unp->unp_vnode,
T> +		    (long)unp->unp_conn,
T>  		    (long)LIST_FIRST(&unp->unp_refs),
T>  		    (long)LIST_NEXT(unp, unp_reflink));
T>  	}
T> Index: usr.bin/netstat/inet.c
T> ===================================================================
T> --- usr.bin/netstat/inet.c	(.../head)	(revision 266804)
T> +++ usr.bin/netstat/inet.c	(.../projects/sendfile)	(revision 266807)
T> @@ -137,7 +137,7 @@ pcblist_sysctl(int proto, const char *name, char *
T>  static void
T>  sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb)
T>  {
T> -	xsb->sb_cc = sb->sb_cc;
T> +	xsb->sb_cc = sb->sb_ccc;
T>  	xsb->sb_hiwat = sb->sb_hiwat;
T>  	xsb->sb_mbcnt = sb->sb_mbcnt;
T>  	xsb->sb_mcnt = sb->sb_mcnt;
T> @@ -479,7 +479,8 @@ protopr(u_long off, const char *name, int af1, int
T>  				printf("%6u %6u %6u ", tp->t_sndrexmitpack,
T>  				       tp->t_rcvoopack, tp->t_sndzerowin);
T>  		} else {
T> -			printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc);
T> +			printf("%6u %6u ",
T> +			    so->so_rcv.sb_cc, so->so_snd.sb_cc);
T>  		}
T>  		if (numeric_port) {
T>  			if (inp->inp_vflag & INP_IPV4) {

T> _______________________________________________
T> freebsd-arch@freebsd.org mailing list
T> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
T> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
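
  For readers skimming the diff, here is a minimal userland sketch of the
available/claimed accounting that sballoc() and sbready() implement. It is
an illustration only, not part of the patch: struct buf, struct sbuf,
toy_append() and toy_ready() are hypothetical stand-ins for struct mbuf,
struct sockbuf, sballoc() and sbready(), and locking, records and mbuf
clusters are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the patch maps these to M_PROTO1/M_PROTO2. */
#define	M_NOTREADY	0x1	/* data not populated yet */
#define	M_BLOCKED	0x2	/* a not ready buffer sits in front */

struct buf {			/* stand-in for struct mbuf */
	struct buf	*next;
	unsigned	len;
	unsigned	flags;
};

struct sbuf {			/* stand-in for struct sockbuf */
	struct buf	*head, *tail;
	struct buf	*fnrdy;	/* first not ready buffer */
	unsigned	acc;	/* available chars */
	unsigned	ccc;	/* claimed chars */
};

/* Append m and account for it, following the logic of sballoc(). */
static void
toy_append(struct sbuf *sb, struct buf *m)
{

	m->next = NULL;
	if (sb->tail != NULL)
		sb->tail->next = m;
	else
		sb->head = m;
	sb->tail = m;

	sb->ccc += m->len;		/* always claimed */
	if (sb->fnrdy == NULL) {
		if (m->flags & M_NOTREADY)
			sb->fnrdy = m;	/* first blocker in the queue */
		else
			sb->acc += m->len;
	} else
		m->flags |= M_BLOCKED;	/* queued behind a blocker */
}

/*
 * Mark count buffers starting at m as ready, following sbready().
 * Returns 0 if data became available, 1 if an earlier buffer still
 * blocks the queue (the patch returns EINPROGRESS in that case).
 */
static int
toy_ready(struct sbuf *sb, struct buf *m, int count)
{
	unsigned blocker;

	assert(sb->fnrdy != NULL);
	blocker = (sb->fnrdy == m) ? M_BLOCKED : 0;

	for (int i = 0; i < count; i++, m = m->next) {
		assert(m->flags & M_NOTREADY);
		m->flags &= ~(M_NOTREADY | blocker);
		if (blocker)
			sb->acc += m->len;
	}
	if (!blocker)
		return (1);

	/* Release everything up to the next not ready buffer. */
	for (; m != NULL && !(m->flags & M_NOTREADY); m = m->next) {
		m->flags &= ~M_BLOCKED;
		sb->acc += m->len;
	}
	sb->fnrdy = m;
	return (0);
}
```

  In this model, appending a ready buffer, then a not ready one, then
another ready one leaves acc covering only the first buffer while ccc
covers all three; once the middle buffer is marked ready, acc catches up
with ccc.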


-- 
Totus tuus, Glebius.

--hTiIB9CRvBOLTyqY
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="sendfile.diff"

Index: sys/sys/vnode.h
===================================================================
--- sys/sys/vnode.h	(.../head)	(revision 270879)
+++ sys/sys/vnode.h	(.../projects/sendfile)	(revision 270881)
@@ -727,6 +727,7 @@ int	vop_stdbmap(struct vop_bmap_args *);
 int	vop_stdfsync(struct vop_fsync_args *);
 int	vop_stdgetwritemount(struct vop_getwritemount_args *);
 int	vop_stdgetpages(struct vop_getpages_args *);
+int	vop_stdgetpages_async(struct vop_getpages_async_args *);
 int	vop_stdinactive(struct vop_inactive_args *);
 int	vop_stdislocked(struct vop_islocked_args *);
 int	vop_stdkqfilter(struct vop_kqfilter_args *);
Index: sys/sys/socket.h
===================================================================
--- sys/sys/socket.h	(.../head)	(revision 270879)
+++ sys/sys/socket.h	(.../projects/sendfile)	(revision 270881)
@@ -602,12 +602,15 @@ struct sf_hdtr_all {
  * Sendfile-specific flag(s)
  */
 #define	SF_NODISKIO     0x00000001
-#define	SF_MNOWAIT	0x00000002
+#define	SF_MNOWAIT	0x00000002	/* unused since 11.0 */
 #define	SF_SYNC		0x00000004
 #define	SF_KQUEUE	0x00000008
+#define	SF_NOCACHE	0x00000010
+#define	SF_FLAGS(rh, flags)	(((rh) << 16) | (flags))
 
 #ifdef _KERNEL
 #define	SFK_COMPAT	0x00000001
+#define	SF_READAHEAD(flags)	((flags) >> 16)
 #endif /* _KERNEL */
 #endif /* __BSD_VISIBLE */
 
Index: sys/sys/sockbuf.h
===================================================================
--- sys/sys/sockbuf.h	(.../head)	(revision 270879)
+++ sys/sys/sockbuf.h	(.../projects/sendfile)	(revision 270881)
@@ -89,8 +89,13 @@ struct	sockbuf {
 	struct	mbuf *sb_lastrecord;	/* (c/d) first mbuf of last
 					 * record in socket buffer */
 	struct	mbuf *sb_sndptr; /* (c/d) pointer into mbuf chain */
+	struct	mbuf *sb_fnrdy;	/* (c/d) pointer to first not ready buffer */
+#if 0
+	struct	mbuf *sb_lnrdy;	/* (c/d) pointer to last not ready buffer */
+#endif
 	u_int	sb_sndptroff;	/* (c/d) byte offset of ptr into chain */
-	u_int	sb_cc;		/* (c/d) actual chars in buffer */
+	u_int	sb_acc;		/* (c/d) available chars in buffer */
+	u_int	sb_ccc;		/* (c/d) claimed chars in buffer */
 	u_int	sb_hiwat;	/* (c/d) max actual char count */
 	u_int	sb_mbcnt;	/* (c/d) chars of mbufs used */
 	u_int   sb_mcnt;        /* (c/d) number of mbufs in buffer */
@@ -120,10 +125,17 @@ struct	sockbuf {
 #define	SOCKBUF_LOCK_ASSERT(_sb)	mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED)
 #define	SOCKBUF_UNLOCK_ASSERT(_sb)	mtx_assert(SOCKBUF_MTX(_sb), MA_NOTOWNED)
 
+/*
+ * Socket buffer private mbuf(9) flags.
+ */
+#define	M_NOTREADY	M_PROTO1	/* m_data not populated yet */
+#define	M_BLOCKED	M_PROTO2	/* M_NOTREADY in front of m */
+#define	M_NOTAVAIL	(M_NOTREADY | M_BLOCKED)
+
 void	sbappend(struct sockbuf *sb, struct mbuf *m);
 void	sbappend_locked(struct sockbuf *sb, struct mbuf *m);
-void	sbappendstream(struct sockbuf *sb, struct mbuf *m);
-void	sbappendstream_locked(struct sockbuf *sb, struct mbuf *m);
+void	sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags);
+void	sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags);
 int	sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa,
 	    struct mbuf *m0, struct mbuf *control);
 int	sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa,
@@ -136,7 +148,6 @@ int	sbappendcontrol_locked(struct sockbuf *sb, str
 	    struct mbuf *control);
 void	sbappendrecord(struct sockbuf *sb, struct mbuf *m0);
 void	sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0);
-void	sbcheck(struct sockbuf *sb);
 void	sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n);
 struct mbuf *
 	sbcreatecontrol(caddr_t p, int size, int type, int level);
@@ -162,59 +173,54 @@ void	sbtoxsockbuf(struct sockbuf *sb, struct xsock
 int	sbwait(struct sockbuf *sb);
 int	sblock(struct sockbuf *sb, int flags);
 void	sbunlock(struct sockbuf *sb);
+void	sballoc(struct sockbuf *, struct mbuf *);
+void	sbfree(struct sockbuf *, struct mbuf *);
+void	sbmtrim(struct sockbuf *, struct mbuf *, int);
+int	sbready(struct sockbuf *, struct mbuf *, int);
 
+static inline u_int
+sbavail(struct sockbuf *sb)
+{
+
+#if 0
+	SOCKBUF_LOCK_ASSERT(sb);
+#endif
+	return (sb->sb_acc);
+}
+
+static inline u_int
+sbused(struct sockbuf *sb)
+{
+
+#if 0
+	SOCKBUF_LOCK_ASSERT(sb);
+#endif
+	return (sb->sb_ccc);
+}
+
 /*
  * How much space is there in a socket buffer (so->so_snd or so->so_rcv)?
  * This is problematical if the fields are unsigned, as the space might
- * still be negative (cc > hiwat or mbcnt > mbmax).  Should detect
- * overflow and return 0.  Should use "lmin" but it doesn't exist now.
+ * still be negative (ccc > hiwat or mbcnt > mbmax).
  */
-static __inline
-long
+static inline long
 sbspace(struct sockbuf *sb)
 {
-	long bleft;
-	long mleft;
+	long bleft, mleft;
 
+#if 0
+	SOCKBUF_LOCK_ASSERT(sb);
+#endif
+
 	if (sb->sb_flags & SB_STOP)
 		return(0);
-	bleft = sb->sb_hiwat - sb->sb_cc;
+
+	bleft = sb->sb_hiwat - sb->sb_ccc;
 	mleft = sb->sb_mbmax - sb->sb_mbcnt;
-	return((bleft < mleft) ? bleft : mleft);
-}
 
-/* adjust counters in sb reflecting allocation of m */
-#define	sballoc(sb, m) { \
-	(sb)->sb_cc += (m)->m_len; \
-	if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \
-		(sb)->sb_ctl += (m)->m_len; \
-	(sb)->sb_mbcnt += MSIZE; \
-	(sb)->sb_mcnt += 1; \
-	if ((m)->m_flags & M_EXT) { \
-		(sb)->sb_mbcnt += (m)->m_ext.ext_size; \
-		(sb)->sb_ccnt += 1; \
-	} \
+	return ((bleft < mleft) ? bleft : mleft);
 }
 
-/* adjust counters in sb reflecting freeing of m */
-#define	sbfree(sb, m) { \
-	(sb)->sb_cc -= (m)->m_len; \
-	if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \
-		(sb)->sb_ctl -= (m)->m_len; \
-	(sb)->sb_mbcnt -= MSIZE; \
-	(sb)->sb_mcnt -= 1; \
-	if ((m)->m_flags & M_EXT) { \
-		(sb)->sb_mbcnt -= (m)->m_ext.ext_size; \
-		(sb)->sb_ccnt -= 1; \
-	} \
-	if ((sb)->sb_sndptr == (m)) { \
-		(sb)->sb_sndptr = NULL; \
-		(sb)->sb_sndptroff = 0; \
-	} \
-	if ((sb)->sb_sndptroff != 0) \
-		(sb)->sb_sndptroff -= (m)->m_len; \
-}
-
 #define SB_EMPTY_FIXUP(sb) do {						\
 	if ((sb)->sb_mb == NULL) {					\
 		(sb)->sb_mbtail = NULL;					\
@@ -224,13 +230,15 @@ sbspace(struct sockbuf *sb)
 
 #ifdef SOCKBUF_DEBUG
 void	sblastrecordchk(struct sockbuf *, const char *, int);
+void	sblastmbufchk(struct sockbuf *, const char *, int);
+void	sbcheck(struct sockbuf *, const char *, int);
 #define	SBLASTRECORDCHK(sb)	sblastrecordchk((sb), __FILE__, __LINE__)
-
-void	sblastmbufchk(struct sockbuf *, const char *, int);
 #define	SBLASTMBUFCHK(sb)	sblastmbufchk((sb), __FILE__, __LINE__)
+#define	SBCHECK(sb)		sbcheck((sb), __FILE__, __LINE__)
 #else
-#define	SBLASTRECORDCHK(sb)      /* nothing */
-#define	SBLASTMBUFCHK(sb)        /* nothing */
+#define	SBLASTRECORDCHK(sb)	do {} while (0)
+#define	SBLASTMBUFCHK(sb)	do {} while (0)
+#define	SBCHECK(sb)		do {} while (0)
 #endif /* SOCKBUF_DEBUG */
 
 #endif /* _KERNEL */
Index: sys/sys/protosw.h
===================================================================
--- sys/sys/protosw.h	(.../head)	(revision 270879)
+++ sys/sys/protosw.h	(.../projects/sendfile)	(revision 270881)
@@ -208,6 +208,8 @@ struct pr_usrreqs {
 #define	PRUS_OOB	0x1
 #define	PRUS_EOF	0x2
 #define	PRUS_MORETOCOME	0x4
+#define	PRUS_NOTREADY	0x8
+	int	(*pru_ready)(struct socket *so, struct mbuf *m, int count);
 	int	(*pru_sense)(struct socket *so, struct stat *sb);
 	int	(*pru_shutdown)(struct socket *so);
 	int	(*pru_flush)(struct socket *so, int direction);
@@ -251,6 +253,7 @@ int	pru_rcvd_notsupp(struct socket *so, int flags)
 int	pru_rcvoob_notsupp(struct socket *so, struct mbuf *m, int flags);
 int	pru_send_notsupp(struct socket *so, int flags, struct mbuf *m,
 	    struct sockaddr *addr, struct mbuf *control, struct thread *td);
+int	pru_ready_notsupp(struct socket *so, struct mbuf *m, int count);
 int	pru_sense_null(struct socket *so, struct stat *sb);
 int	pru_shutdown_notsupp(struct socket *so);
 int	pru_sockaddr_notsupp(struct socket *so, struct sockaddr **nam);
Index: sys/sys/mbuf.h
===================================================================
--- sys/sys/mbuf.h	(.../head)	(revision 270879)
+++ sys/sys/mbuf.h	(.../projects/sendfile)	(revision 270881)
@@ -330,12 +330,13 @@ struct mbuf {
  * External mbuf storage buffer types.
  */
 #define	EXT_CLUSTER	1	/* mbuf cluster */
-#define	EXT_SFBUF	2	/* sendfile(2)'s sf_bufs */
+#define	EXT_SFBUF	2	/* sendfile(2)'s sf_buf */
 #define	EXT_JUMBOP	3	/* jumbo cluster 4096 bytes */
 #define	EXT_JUMBO9	4	/* jumbo cluster 9216 bytes */
 #define	EXT_JUMBO16	5	/* jumbo cluster 16184 bytes */
 #define	EXT_PACKET	6	/* mbuf+cluster from packet zone */
 #define	EXT_MBUF	7	/* external mbuf reference (M_IOVEC) */
+#define	EXT_SFBUF_NOCACHE 8	/* sendfile(2)'s sf_buf not to be cached */
 
 #define	EXT_VENDOR1	224	/* for vendor-internal use */
 #define	EXT_VENDOR2	225	/* for vendor-internal use */
@@ -384,6 +385,7 @@ struct mbuf {
  */
 void sf_ext_ref(void *, void *);
 void sf_ext_free(void *, void *);
+void sf_ext_free_nocache(void *, void *);
 
 /*
  * Flags indicating checksum, segmentation and other offload work to be
@@ -929,7 +931,7 @@ struct mbuf	*m_copypacket(struct mbuf *, int);
 void		 m_copy_pkthdr(struct mbuf *, struct mbuf *);
 struct mbuf	*m_copyup(struct mbuf *, int, int);
 struct mbuf	*m_defrag(struct mbuf *, int);
-void		 m_demote(struct mbuf *, int);
+void		 m_demote(struct mbuf *, int, int);
 struct mbuf	*m_devget(char *, int, int, struct ifnet *,
 		    void (*)(char *, caddr_t, u_int));
 struct mbuf	*m_dup(struct mbuf *, int);
Index: sys/sys/socketvar.h
===================================================================
--- sys/sys/socketvar.h	(.../head)	(revision 270879)
+++ sys/sys/socketvar.h	(.../projects/sendfile)	(revision 270881)
@@ -207,7 +207,7 @@ struct xsocket {
 
 /* can we read something from so? */
 #define	soreadabledata(so) \
-    ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \
+    (sbavail(&(so)->so_rcv) >= (so)->so_rcv.sb_lowat || \
 	!TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error)
 #define	soreadable(so) \
 	(soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE))
Index: sys/rpc/svc_vc.c
===================================================================
--- sys/rpc/svc_vc.c	(.../head)	(revision 270879)
+++ sys/rpc/svc_vc.c	(.../projects/sendfile)	(revision 270881)
@@ -546,7 +546,7 @@ svc_vc_ack(SVCXPRT *xprt, uint32_t *ack)
 {
 
 	*ack = atomic_load_acq_32(&xprt->xp_snt_cnt);
-	*ack -= xprt->xp_socket->so_snd.sb_cc;
+	*ack -= sbused(&xprt->xp_socket->so_snd);
 	return (TRUE);
 }
 
Index: sys/rpc/clnt_vc.c
===================================================================
--- sys/rpc/clnt_vc.c	(.../head)	(revision 270879)
+++ sys/rpc/clnt_vc.c	(.../projects/sendfile)	(revision 270881)
@@ -860,7 +860,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int
 			 * error condition
 			 */
 			do_read = FALSE;
-			if (so->so_rcv.sb_cc >= sizeof(uint32_t)
+			if (sbavail(&so->so_rcv) >= sizeof(uint32_t)
 			    || (so->so_rcv.sb_state & SBS_CANTRCVMORE)
 			    || so->so_error)
 				do_read = TRUE;
@@ -913,7 +913,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int
 			 * buffered.
 			 */
 			do_read = FALSE;
-			if (so->so_rcv.sb_cc >= ct->ct_record_resid
+			if (sbavail(&so->so_rcv) >= ct->ct_record_resid
 			    || (so->so_rcv.sb_state & SBS_CANTRCVMORE)
 			    || so->so_error)
 				do_read = TRUE;
Index: sys/ufs/ffs/ffs_vnops.c
===================================================================
--- sys/ufs/ffs/ffs_vnops.c	(.../head)	(revision 270879)
+++ sys/ufs/ffs/ffs_vnops.c	(.../projects/sendfile)	(revision 270881)
@@ -105,6 +105,7 @@ extern int	ffs_rawread(struct vnode *vp, struct ui
 static vop_fsync_t	ffs_fsync;
 static vop_lock1_t	ffs_lock;
 static vop_getpages_t	ffs_getpages;
+static vop_getpages_async_t ffs_getpages_async;
 static vop_read_t	ffs_read;
 static vop_write_t	ffs_write;
 static int	ffs_extread(struct vnode *vp, struct uio *uio, int ioflag);
@@ -125,6 +126,7 @@ struct vop_vector ffs_vnodeops1 = {
 	.vop_default =		&ufs_vnodeops,
 	.vop_fsync =		ffs_fsync,
 	.vop_getpages =		ffs_getpages,
+	.vop_getpages_async =	ffs_getpages_async,
 	.vop_lock1 =		ffs_lock,
 	.vop_read =		ffs_read,
 	.vop_reallocblks =	ffs_reallocblks,
@@ -847,18 +849,16 @@ ffs_write(ap)
 }
 
 /*
- * get page routine
+ * Get page routines.
  */
 static int
-ffs_getpages(ap)
-	struct vop_getpages_args *ap;
+ffs_getpages_checkvalid(vm_page_t *m, int count, int reqpage)
 {
-	int i;
 	vm_page_t mreq;
 	int pcount;
 
-	pcount = round_page(ap->a_count) / PAGE_SIZE;
-	mreq = ap->a_m[ap->a_reqpage];
+	pcount = round_page(count) / PAGE_SIZE;
+	mreq = m[reqpage];
 
 	/*
 	 * if ANY DEV_BSIZE blocks are valid on a large filesystem block,
@@ -870,24 +870,48 @@ static int
 	if (mreq->valid) {
 		if (mreq->valid != VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
-		for (i = 0; i < pcount; i++) {
-			if (i != ap->a_reqpage) {
-				vm_page_lock(ap->a_m[i]);
-				vm_page_free(ap->a_m[i]);
-				vm_page_unlock(ap->a_m[i]);
+		for (int i = 0; i < pcount; i++) {
+			if (i != reqpage) {
+				vm_page_lock(m[i]);
+				vm_page_free(m[i]);
+				vm_page_unlock(m[i]);
 			}
 		}
 		VM_OBJECT_WUNLOCK(mreq->object);
-		return VM_PAGER_OK;
+		return (VM_PAGER_OK);
 	}
 	VM_OBJECT_WUNLOCK(mreq->object);
 
-	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
-					    ap->a_count,
-					    ap->a_reqpage);
+	return (-1);
 }
 
+static int
+ffs_getpages(struct vop_getpages_args *ap)
+{
+	int rv;
 
+	rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage);
+	if (rv == VM_PAGER_OK)
+		return (rv);
+
+	return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count,
+	    ap->a_reqpage, NULL, NULL));
+}
+
+static int
+ffs_getpages_async(struct vop_getpages_async_args *ap)
+{
+	int rv;
+
+	rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage);
+	if (rv == VM_PAGER_OK) {
+		(ap->a_vop_getpages_iodone)(ap->a_arg);
+		return (rv);
+	}
+	return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count,
+	    ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg));
+}
+
 /*
  * Extended attribute area reading.
  */
Index: sys/kern/uipc_domain.c
===================================================================
--- sys/kern/uipc_domain.c	(.../head)	(revision 270879)
+++ sys/kern/uipc_domain.c	(.../projects/sendfile)	(revision 270881)
@@ -152,6 +152,7 @@ protosw_init(struct protosw *pr)
 	DEFAULT(pu->pru_sosend, sosend_generic);
 	DEFAULT(pu->pru_soreceive, soreceive_generic);
 	DEFAULT(pu->pru_sopoll, sopoll_generic);
+	DEFAULT(pu->pru_ready, pru_ready_notsupp);
 #undef DEFAULT
 	if (pr->pr_init)
 		(*pr->pr_init)();
Index: sys/kern/vnode_if.src
===================================================================
--- sys/kern/vnode_if.src	(.../head)	(revision 270879)
+++ sys/kern/vnode_if.src	(.../projects/sendfile)	(revision 270881)
@@ -477,6 +477,19 @@ vop_getpages {
 };
 
 
+%% getpages_async	vp	L L L
+
+vop_getpages_async {
+	IN struct vnode *vp;
+	IN vm_page_t *m;
+	IN int count;
+	IN int reqpage;
+	IN vm_ooffset_t offset;
+	IN void (*vop_getpages_iodone)(void *);
+	IN void *arg;
+};
+
+
 %% putpages	vp	L L L
 
 vop_putpages {
Index: sys/kern/uipc_sockbuf.c
===================================================================
--- sys/kern/uipc_sockbuf.c	(.../head)	(revision 270879)
+++ sys/kern/uipc_sockbuf.c	(.../projects/sendfile)	(revision 270881)
@@ -68,7 +68,145 @@ static	u_long sb_efficiency = 8;	/* parameter for
 static struct mbuf	*sbcut_internal(struct sockbuf *sb, int len);
 static void	sbflush_internal(struct sockbuf *sb);
 
+static void
+sb_shift_nrdy(struct sockbuf *sb, struct mbuf *m)
+{
+
+#if 0	/* XXX: not yet: soclose() call path comes here w/o lock. */
+	SOCKBUF_LOCK_ASSERT(sb);
+#endif
+	KASSERT(m->m_flags & M_NOTREADY, ("%s: m %p !M_NOTREADY", __func__, m));
+
+	m = m->m_next;
+	while (m != NULL && !(m->m_flags & M_NOTREADY)) {
+		m->m_flags &= ~M_BLOCKED;
+		sb->sb_acc += m->m_len;
+		m = m->m_next;
+	}
+
+	sb->sb_fnrdy = m;
+}
+
+int
+sbready(struct sockbuf *sb, struct mbuf *m, int count)
+{
+	u_int blocker;
+
+	SOCKBUF_LOCK_ASSERT(sb);
+
+	KASSERT(sb->sb_fnrdy != NULL, ("%s: sb %p NULL fnrdy", __func__, sb));
+
+	blocker = (sb->sb_fnrdy == m) ? M_BLOCKED : 0;
+
+	for (int i = 0; i < count; i++, m = m->m_next) {
+		KASSERT(m->m_flags & M_NOTREADY,
+		    ("%s: m %p !M_NOTREADY", __func__, m));
+		m->m_flags &= ~(M_NOTREADY | blocker);
+		if (blocker)
+			sb->sb_acc += m->m_len;
+	}
+
+	if (!blocker)
+		return (EINPROGRESS);
+
+	/* This one was blocking the whole queue. */
+	for (; m && (m->m_flags & M_NOTREADY) == 0; m = m->m_next) {
+		KASSERT(m->m_flags & M_BLOCKED,
+		    ("%s: m %p !M_BLOCKED", __func__, m));
+		m->m_flags &= ~M_BLOCKED;
+		sb->sb_acc += m->m_len;
+	}
+
+	sb->sb_fnrdy = m;
+
+	return (0);
+}
+
 /*
+ * Adjust sockbuf state reflecting allocation of m.
+ */
+void
+sballoc(struct sockbuf *sb, struct mbuf *m)
+{
+
+	SOCKBUF_LOCK_ASSERT(sb);
+
+	sb->sb_ccc += m->m_len;
+
+	if (sb->sb_fnrdy == NULL) {
+		if (m->m_flags & M_NOTREADY)
+			sb->sb_fnrdy = m;
+		else
+			sb->sb_acc += m->m_len;
+	} else
+		m->m_flags |= M_BLOCKED;
+
+	if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
+		sb->sb_ctl += m->m_len;
+
+	sb->sb_mbcnt += MSIZE;
+	sb->sb_mcnt += 1;
+
+	if (m->m_flags & M_EXT) {
+		sb->sb_mbcnt += m->m_ext.ext_size;
+		sb->sb_ccnt += 1;
+	}
+}
+
+/*
+ * Adjust sockbuf state reflecting freeing of m.
+ */
+void
+sbfree(struct sockbuf *sb, struct mbuf *m)
+{
+
+#if 0	/* XXX: not yet: soclose() call path comes here w/o lock. */
+	SOCKBUF_LOCK_ASSERT(sb);
+#endif
+
+	sb->sb_ccc -= m->m_len;
+
+	if (!(m->m_flags & M_NOTAVAIL))
+		sb->sb_acc -= m->m_len;
+
+	if (sb->sb_fnrdy == m)
+		sb_shift_nrdy(sb, m);
+
+	if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
+		sb->sb_ctl -= m->m_len;
+
+	sb->sb_mbcnt -= MSIZE;
+	sb->sb_mcnt -= 1;
+	if (m->m_flags & M_EXT) {
+		sb->sb_mbcnt -= m->m_ext.ext_size;
+		sb->sb_ccnt -= 1;
+	}
+
+	if (sb->sb_sndptr == m) {
+		sb->sb_sndptr = NULL;
+		sb->sb_sndptroff = 0;
+	}
+	if (sb->sb_sndptroff != 0)
+		sb->sb_sndptroff -= m->m_len;
+}
+
+/*
+ * Trim some amount of data from (first?) mbuf in buffer.
+ */
+void
+sbmtrim(struct sockbuf *sb, struct mbuf *m, int len)
+{
+
+	SOCKBUF_LOCK_ASSERT(sb);
+	KASSERT(len < m->m_len, ("%s: m %p len %d", __func__, m, len));
+
+	m->m_data += len;
+	m->m_len -= len;
+	sb->sb_acc -= len;
+	sb->sb_ccc -= len;
+}
+
+/*
  * Socantsendmore indicates that no more data will be sent on the socket; it
  * would normally be applied to a socket when the user informs the system
  * that no more data is to be sent, by the protocol code (in case
@@ -127,7 +265,7 @@ sbwait(struct sockbuf *sb)
 	SOCKBUF_LOCK_ASSERT(sb);
 
 	sb->sb_flags |= SB_WAIT;
-	return (msleep_sbt(&sb->sb_cc, &sb->sb_mtx,
+	return (msleep_sbt(&sb->sb_acc, &sb->sb_mtx,
 	    (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait",
 	    sb->sb_timeo, 0, 0));
 }
@@ -184,7 +322,7 @@ sowakeup(struct socket *so, struct sockbuf *sb)
 		sb->sb_flags &= ~SB_SEL;
 	if (sb->sb_flags & SB_WAIT) {
 		sb->sb_flags &= ~SB_WAIT;
-		wakeup(&sb->sb_cc);
+		wakeup(&sb->sb_acc);
 	}
 	KNOTE_LOCKED(&sb->sb_sel.si_note, 0);
 	if (sb->sb_upcall != NULL) {
@@ -519,7 +657,7 @@ sbappend(struct sockbuf *sb, struct mbuf *m)
  * that is, a stream protocol (such as TCP).
  */
 void
-sbappendstream_locked(struct sockbuf *sb, struct mbuf *m)
+sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags)
 {
 	SOCKBUF_LOCK_ASSERT(sb);
 
@@ -529,8 +667,8 @@ void
 	SBLASTMBUFCHK(sb);
 
 	/* Remove all packet headers and mbuf tags to get a pure data chain. */
-	m_demote(m, 1);
-	
+	m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0);
+
 	sbcompress(sb, m, sb->sb_mbtail);
 
 	sb->sb_lastrecord = sb->sb_mb;
@@ -543,38 +681,59 @@ void
  * that is, a stream protocol (such as TCP).
  */
 void
-sbappendstream(struct sockbuf *sb, struct mbuf *m)
+sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags)
 {
 
 	SOCKBUF_LOCK(sb);
-	sbappendstream_locked(sb, m);
+	sbappendstream_locked(sb, m, flags);
 	SOCKBUF_UNLOCK(sb);
 }
 
 #ifdef SOCKBUF_DEBUG
 void
-sbcheck(struct sockbuf *sb)
+sbcheck(struct sockbuf *sb, const char *file, int line)
 {
-	struct mbuf *m;
-	struct mbuf *n = 0;
-	u_long len = 0, mbcnt = 0;
+	struct mbuf *m, *n, *fnrdy;
+	u_long acc, ccc, mbcnt;
 
 	SOCKBUF_LOCK_ASSERT(sb);
 
+	acc = ccc = mbcnt = 0;
+	fnrdy = NULL;
+
 	for (m = sb->sb_mb; m; m = n) {
 	    n = m->m_nextpkt;
 	    for (; m; m = m->m_next) {
-		len += m->m_len;
+		if ((m->m_flags & M_NOTREADY) && fnrdy == NULL) {
+			if (m != sb->sb_fnrdy) {
+				printf("sb %p: fnrdy %p != m %p\n",
+				    sb, sb->sb_fnrdy, m);
+				goto fail;
+			}
+			fnrdy = m;
+		}
+		if (fnrdy) {
+			if (!(m->m_flags & M_NOTAVAIL)) {
+				printf("sb %p: fnrdy %p, m %p is avail\n",
+				    sb, sb->sb_fnrdy, m);
+				goto fail;
+			}
+		} else
+			acc += m->m_len;
+		ccc += m->m_len;
 		mbcnt += MSIZE;
 		if (m->m_flags & M_EXT) /*XXX*/ /* pretty sure this is bogus */
 			mbcnt += m->m_ext.ext_size;
 	    }
 	}
-	if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) {
-		printf("cc %ld != %u || mbcnt %ld != %u\n", len, sb->sb_cc,
-		    mbcnt, sb->sb_mbcnt);
-		panic("sbcheck");
+	if (acc != sb->sb_acc || ccc != sb->sb_ccc || mbcnt != sb->sb_mbcnt) {
+		printf("acc %ld/%u ccc %ld/%u mbcnt %ld/%u\n",
+		    acc, sb->sb_acc, ccc, sb->sb_ccc, mbcnt, sb->sb_mbcnt);
+		goto fail;
 	}
+	return;
+fail:
+	panic("%s from %s:%u", __func__, file, line);
 }
 #endif
 
@@ -800,6 +959,7 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
 		if (n && (n->m_flags & M_EOR) == 0 &&
 		    M_WRITABLE(n) &&
 		    ((sb->sb_flags & SB_NOCOALESCE) == 0) &&
+		    !(m->m_flags & M_NOTREADY) &&
 		    m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */
 		    m->m_len <= M_TRAILINGSPACE(n) &&
 		    n->m_type == m->m_type) {
@@ -806,7 +966,9 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
 			bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len,
 			    (unsigned)m->m_len);
 			n->m_len += m->m_len;
-			sb->sb_cc += m->m_len;
+			sb->sb_ccc += m->m_len;
+			if (sb->sb_fnrdy == NULL)
+				sb->sb_acc += m->m_len;
 			if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
 				/* XXX: Probably don't need.*/
 				sb->sb_ctl += m->m_len;
@@ -843,13 +1005,13 @@ sbflush_internal(struct sockbuf *sb)
 		 * Don't call sbcut(sb, 0) if the leading mbuf is non-empty:
 		 * we would loop forever. Panic instead.
 		 */
-		if (!sb->sb_cc && (sb->sb_mb == NULL || sb->sb_mb->m_len))
+		if (sb->sb_ccc == 0 && (sb->sb_mb == NULL || sb->sb_mb->m_len))
 			break;
-		m_freem(sbcut_internal(sb, (int)sb->sb_cc));
+		m_freem(sbcut_internal(sb, (int)sb->sb_ccc));
 	}
-	if (sb->sb_cc || sb->sb_mb || sb->sb_mbcnt)
-		panic("sbflush_internal: cc %u || mb %p || mbcnt %u",
-		    sb->sb_cc, (void *)sb->sb_mb, sb->sb_mbcnt);
+	KASSERT(sb->sb_ccc == 0 && sb->sb_mb == 0 && sb->sb_mbcnt == 0,
+	    ("%s: ccc %u mb %p mbcnt %u", __func__,
+	    sb->sb_ccc, (void *)sb->sb_mb, sb->sb_mbcnt));
 }
 
 void
@@ -891,7 +1053,9 @@ sbcut_internal(struct sockbuf *sb, int len)
 		if (m->m_len > len) {
 			m->m_len -= len;
 			m->m_data += len;
-			sb->sb_cc -= len;
+			sb->sb_ccc -= len;
+			if (!(m->m_flags & M_NOTAVAIL))
+				sb->sb_acc -= len;
 			if (sb->sb_sndptroff != 0)
 				sb->sb_sndptroff -= len;
 			if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
@@ -977,8 +1141,8 @@ sbsndptr(struct sockbuf *sb, u_int off, u_int len,
 	struct mbuf *m, *ret;
 
 	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__));
-	KASSERT(off + len <= sb->sb_cc, ("%s: beyond sb", __func__));
-	KASSERT(sb->sb_sndptroff <= sb->sb_cc, ("%s: sndptroff broken", __func__));
+	KASSERT(off + len <= sb->sb_acc, ("%s: beyond sb", __func__));
+	KASSERT(sb->sb_sndptroff <= sb->sb_acc, ("%s: sndptroff broken", __func__));
 
 	/*
 	 * Is off below stored offset? Happens on retransmits.
@@ -1096,7 +1260,7 @@ void
 sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb)
 {
 
-	xsb->sb_cc = sb->sb_cc;
+	xsb->sb_cc = sb->sb_ccc;
 	xsb->sb_hiwat = sb->sb_hiwat;
 	xsb->sb_mbcnt = sb->sb_mbcnt;
 	xsb->sb_mcnt = sb->sb_mcnt;	
Index: sys/kern/uipc_syscalls.c
===================================================================
--- sys/kern/uipc_syscalls.c	(.../head)	(revision 270879)
+++ sys/kern/uipc_syscalls.c	(.../projects/sendfile)	(revision 270881)
@@ -132,9 +132,10 @@ static int	filt_sfsync(struct knote *kn, long hint
  */
 static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0,
     "sendfile(2) tunables");
-static int sfreadahead = 1;
+
+static int sfreadahead = 0;
 SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW,
-    &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks");
+    &sfreadahead, 0, "Read this more pages than socket buffer can accept");
 
 #ifdef	SFSYNC_DEBUG
 static int sf_sync_debug = 0;
@@ -2035,6 +2036,37 @@ sf_ext_free(void *arg1, void *arg2)
 }
 
 /*
+ * Same as above, but forces the page to be detached from the object
+ * and go into free pool.
+ */
+void
+sf_ext_free_nocache(void *arg1, void *arg2)
+{
+	struct sf_buf *sf = arg1;
+	struct sendfile_sync *sfs = arg2;
+	vm_page_t pg = sf_buf_page(sf);
+
+	sf_buf_free(sf);
+
+	vm_page_lock(pg);
+	vm_page_unwire(pg, 0);
+	if (pg->wire_count == 0) {
+		vm_object_t obj;
+
+		if ((obj = pg->object) == NULL)
+			vm_page_free(pg);
+		else if (!vm_page_xbusied(pg) && VM_OBJECT_TRYWLOCK(obj)) {
+			vm_page_free(pg);
+			VM_OBJECT_WUNLOCK(obj);
+		}
+	}
+	vm_page_unlock(pg);
+
+	if (sfs != NULL)
+		sf_sync_deref(sfs);
+}
+
+/*
  * Called to remove a reference to a sf_sync object.
  *
  * This is generally done during the mbuf free path to signify
@@ -2627,106 +2659,168 @@ freebsd4_sendfile(struct thread *td, struct freebs
 }
 #endif /* COMPAT_FREEBSD4 */
 
+/*
+ * How much data to put into page i of n.
+ * Only first and last pages are special.
+ */
+static inline off_t
+xfsize(int i, int n, off_t off, off_t len)
+{
+
+	if (i == 0)
+		return (omin(PAGE_SIZE - (off & PAGE_MASK), len));
+
+	if (i == n - 1 && ((off + len) & PAGE_MASK) > 0)
+		return ((off + len) & PAGE_MASK);
+
+	return (PAGE_SIZE);
+}
+
+/*
+ * Offset within object for the i-th page.
+ */
+static inline vm_offset_t
+vmoff(int i, off_t off)
+{
+
+	if (i == 0)
+		return ((vm_offset_t)off);
+
+	return (trunc_page(off + i * PAGE_SIZE));
+}
+
+/*
+ * Pretend as if we don't have enough space, subtract xfsize() of
+ * all pages that failed.
+ */
+static inline void
+fixspace(int old, int new, off_t off, int *space)
+{
+
+	KASSERT(old > new, ("%s: old %d new %d", __func__, old, new));
+
+	/* Subtract last one. */
+	*space -= xfsize(old - 1, old, off, *space);
+	old--;
+
+	if (new == old)
+		/* There was only one page. */
+		return;
+
+	/* Subtract first one. */
+	if (new == 0) {
+		*space -= xfsize(0, old, off, *space);
+		new++;
+	}
+
+	/* Rest of pages are full sized. */
+	*space -= (old - new) * PAGE_SIZE;
+
+	KASSERT(*space >= 0, ("%s: space went backwards", __func__));
+}
+
+struct sf_io {
+	u_int		nios;
+	int		npages;
+	struct file	*sock_fp;
+	struct mbuf	*m;
+	vm_page_t	pa[];
+};
+
+static void
+sf_io_done(void *arg)
+{
+	struct sf_io *sfio = arg;
+	struct socket *so;
+
+	if (!refcount_release(&sfio->nios))
+		return;
+
+	so = sfio->sock_fp->f_data;
+
+	(void)(so->so_proto->pr_usrreqs->pru_ready)(so, sfio->m, sfio->npages);
+
+	/* XXXGL: curthread */
+	fdrop(sfio->sock_fp, curthread);
+	free(sfio, M_TEMP);
+}
+
 static int
-sendfile_readpage(vm_object_t obj, struct vnode *vp, int nd,
-    off_t off, int xfsize, int bsize, struct thread *td, vm_page_t *res)
+sendfile_swapin(vm_object_t obj, struct sf_io *sfio, off_t off, off_t len,
+    int npages, int rhpages)
 {
-	vm_page_t m;
-	vm_pindex_t pindex;
-	ssize_t resid;
-	int error, readahead, rv;
+	vm_page_t *pa = sfio->pa;
+	int nios;
 
-	pindex = OFF_TO_IDX(off);
+	nios = 0;
 	VM_OBJECT_WLOCK(obj);
-	m = vm_page_grab(obj, pindex, (vp != NULL ? VM_ALLOC_NOBUSY |
-	    VM_ALLOC_IGN_SBUSY : 0) | VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
+	for (int i = 0; i < npages; i++)
+		pa[i] = vm_page_grab(obj, OFF_TO_IDX(vmoff(i, off)),
+		    VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
 
-	/*
-	 * Check if page is valid for what we need, otherwise initiate I/O.
-	 *
-	 * The non-zero nd argument prevents disk I/O, instead we
-	 * return the caller what he specified in nd.  In particular,
-	 * if we already turned some pages into mbufs, nd == EAGAIN
-	 * and the main function send them the pages before we come
-	 * here again and block.
-	 */
-	if (m->valid != 0 && vm_page_is_valid(m, off & PAGE_MASK, xfsize)) {
-		if (vp == NULL)
-			vm_page_xunbusy(m);
-		VM_OBJECT_WUNLOCK(obj);
-		*res = m;
-		return (0);
-	} else if (nd != 0) {
-		if (vp == NULL)
-			vm_page_xunbusy(m);
-		error = nd;
-		goto free_page;
-	}
+	for (int i = 0; i < npages;) {
+		int j, a, count, rv;
 
-	/*
-	 * Get the page from backing store.
-	 */
-	error = 0;
-	if (vp != NULL) {
-		VM_OBJECT_WUNLOCK(obj);
-		readahead = sfreadahead * MAXBSIZE;
+		if (vm_page_is_valid(pa[i], vmoff(i, off) & PAGE_MASK,
+		    xfsize(i, npages, off, len))) {
+			vm_page_xunbusy(pa[i]);
+			i++;
+			continue;
+		}
 
-		/*
-		 * Use vn_rdwr() instead of the pager interface for
-		 * the vnode, to allow the read-ahead.
-		 *
-		 * XXXMAC: Because we don't have fp->f_cred here, we
-		 * pass in NOCRED.  This is probably wrong, but is
-		 * consistent with our original implementation.
-		 */
-		error = vn_rdwr(UIO_READ, vp, NULL, readahead, trunc_page(off),
-		    UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((readahead /
-		    bsize) << IO_SEQSHIFT), td->td_ucred, NOCRED, &resid, td);
-		SFSTAT_INC(sf_iocnt);
-		VM_OBJECT_WLOCK(obj);
-	} else {
-		if (vm_pager_has_page(obj, pindex, NULL, NULL)) {
-			rv = vm_pager_get_pages(obj, &m, 1, 0);
-			SFSTAT_INC(sf_iocnt);
-			m = vm_page_lookup(obj, pindex);
-			if (m == NULL)
-				error = EIO;
-			else if (rv != VM_PAGER_OK) {
-				vm_page_lock(m);
-				vm_page_free(m);
-				vm_page_unlock(m);
-				m = NULL;
-				error = EIO;
+		for (j = i + 1; j < npages; j++)
+			if (vm_page_is_valid(pa[j], vmoff(j, off) & PAGE_MASK,
+			    xfsize(j, npages, off, len)))
+				break;
+
+		while (!vm_pager_has_page(obj, OFF_TO_IDX(vmoff(i, off)),
+		    NULL, &a) && i < j) {
+			pmap_zero_page(pa[i]);
+			pa[i]->valid = VM_PAGE_BITS_ALL;
+			pa[i]->dirty = 0;
+			vm_page_xunbusy(pa[i]);
+			i++;
+		}
+		if (i == j)
+			continue;
+
+		count = min(a + 1, npages + rhpages - i);
+		for (j = npages; j < i + count; j++) {
+			pa[j] = vm_page_grab(obj, OFF_TO_IDX(vmoff(j, off)),
+			    VM_ALLOC_NORMAL | VM_ALLOC_NOWAIT);
+			if (pa[j] == NULL) {
+				count = j - i;
+				break;
 			}
-		} else {
-			pmap_zero_page(m);
-			m->valid = VM_PAGE_BITS_ALL;
-			m->dirty = 0;
+			if (pa[j]->valid) {
+				vm_page_xunbusy(pa[j]);
+				count = j - i;
+				break;
+			}
 		}
-		if (m != NULL)
-			vm_page_xunbusy(m);
+
+		refcount_acquire(&sfio->nios);
+		rv = vm_pager_get_pages_async(obj, pa + i, count, 0,
+		    &sf_io_done, sfio);
+
+		KASSERT(rv == VM_PAGER_OK, ("%s: pager fail obj %p page %p",
+		    __func__, obj, pa[i]));
+
+		SFSTAT_INC(sf_iocnt);
+		nios++;
+
+		for (j = i; j < i + count && j < npages; j++)
+			KASSERT(pa[j] == vm_page_lookup(obj,
+			    OFF_TO_IDX(vmoff(j, off))),
+			    ("pa[j] %p lookup %p\n", pa[j],
+			    vm_page_lookup(obj, OFF_TO_IDX(vmoff(j, off)))));
+
+		i += count;
 	}
-	if (error == 0) {
-		*res = m;
-	} else if (m != NULL) {
-free_page:
-		vm_page_lock(m);
-		vm_page_unwire(m, PQ_INACTIVE);
 
-		/*
-		 * See if anyone else might know about this page.  If
-		 * not and it is not valid, then free it.
-		 */
-		if (m->wire_count == 0 && m->valid == 0 && !vm_page_busied(m))
-			vm_page_free(m);
-		vm_page_unlock(m);
-	}
-	KASSERT(error != 0 || (m->wire_count > 0 &&
-	    vm_page_is_valid(m, off & PAGE_MASK, xfsize)),
-	    ("wrong page state m %p off %#jx xfsize %d", m, (uintmax_t)off,
-	    xfsize));
 	VM_OBJECT_WUNLOCK(obj);
-	return (error);
+
+	return (nios);
 }
 
 static int
@@ -2833,41 +2927,26 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
 	struct vnode *vp;
 	struct vm_object *obj;
 	struct socket *so;
-	struct mbuf *m;
+	struct mbuf *m, *mh, *mhtail;
 	struct sf_buf *sf;
-	struct vm_page *pg;
 	struct shmfd *shmfd;
 	struct vattr va;
-	off_t off, xfsize, fsbytes, sbytes, rem, obj_size;
-	int error, bsize, nd, hdrlen, mnw;
+	off_t off, sbytes, rem, obj_size;
+	int error, serror, bsize, hdrlen;
 
-	pg = NULL;
 	obj = NULL;
 	so = NULL;
-	m = NULL;
-	fsbytes = sbytes = 0;
-	hdrlen = mnw = 0;
-	rem = nbytes;
-	obj_size = 0;
+	m = mh = NULL;
+	sbytes = 0;
 
 	error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize);
 	if (error != 0)
 		return (error);
-	if (rem == 0)
-		rem = obj_size;
 
 	error = kern_sendfile_getsock(td, sockfd, &sock_fp, &so);
 	if (error != 0)
 		goto out;
 
-	/*
-	 * Do not wait on memory allocations but return ENOMEM for
-	 * caller to retry later.
-	 * XXX: Experimental.
-	 */
-	if (flags & SF_MNOWAIT)
-		mnw = 1;
-
 #ifdef MAC
 	error = mac_socket_check_send(td->td_ucred, so);
 	if (error != 0)
@@ -2875,31 +2954,27 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
 #endif
 
 	/* If headers are specified copy them into mbufs. */
-	if (hdr_uio != NULL) {
+	if (hdr_uio != NULL && hdr_uio->uio_resid > 0) {
 		hdr_uio->uio_td = td;
 		hdr_uio->uio_rw = UIO_WRITE;
-		if (hdr_uio->uio_resid > 0) {
-			/*
-			 * In FBSD < 5.0 the nbytes to send also included
-			 * the header.  If compat is specified subtract the
-			 * header size from nbytes.
-			 */
-			if (kflags & SFK_COMPAT) {
-				if (nbytes > hdr_uio->uio_resid)
-					nbytes -= hdr_uio->uio_resid;
-				else
-					nbytes = 0;
-			}
-			m = m_uiotombuf(hdr_uio, (mnw ? M_NOWAIT : M_WAITOK),
-			    0, 0, 0);
-			if (m == NULL) {
-				error = mnw ? EAGAIN : ENOBUFS;
-				goto out;
-			}
-			hdrlen = m_length(m, NULL);
+		/*
+		 * In FBSD < 5.0 the nbytes to send also included
+		 * the header.  If compat is specified subtract the
+		 * header size from nbytes.
+		 */
+		if (kflags & SFK_COMPAT) {
+			if (nbytes > hdr_uio->uio_resid)
+				nbytes -= hdr_uio->uio_resid;
+			else
+				nbytes = 0;
 		}
-	}
+		mh = m_uiotombuf(hdr_uio, M_WAITOK, 0, 0, 0);
+		hdrlen = m_length(mh, &mhtail);
+	} else
+		hdrlen = 0;
 
+	rem = nbytes ? omin(nbytes, obj_size - offset) : obj_size - offset;
+
 	/*
 	 * Protect against multiple writers to the socket.
 	 *
@@ -2919,21 +2994,13 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
 	 * The outer loop checks the state and available space of the socket
 	 * and takes care of the overall progress.
 	 */
-	for (off = offset; ; ) {
+	for (off = offset; rem > 0; ) {
+		struct sf_io *sfio;
+		vm_page_t *pa;
 		struct mbuf *mtail;
-		int loopbytes;
-		int space;
-		int done;
+		int nios, space, npages, rhpages;
 
-		if ((nbytes != 0 && nbytes == fsbytes) ||
-		    (nbytes == 0 && obj_size == fsbytes))
-			break;
-
 		mtail = NULL;
-		loopbytes = 0;
-		space = 0;
-		done = 0;
-
 		/*
 		 * Check the socket state for ongoing connection,
 		 * no errors and space in socket buffer.
@@ -3009,53 +3076,44 @@ retry_space:
 				VOP_UNLOCK(vp, 0);
 				goto done;
 			}
-			obj_size = va.va_size;
+			if (va.va_size != obj_size) {
+				if (nbytes == 0)
+					rem += va.va_size - obj_size;
+				else if (offset + nbytes > va.va_size)
+					rem -= (offset + nbytes - va.va_size);
+				obj_size = va.va_size;
+			}
 		}
 
+		if (space > rem)
+			space = rem;
+
+		if (off & PAGE_MASK)
+			npages = 1 + howmany(space -
+			    (PAGE_SIZE - (off & PAGE_MASK)), PAGE_SIZE);
+		else
+			npages = howmany(space, PAGE_SIZE);
+
+		rhpages = SF_READAHEAD(flags) ?
+		    SF_READAHEAD(flags) : sfreadahead;
+		rhpages = min(howmany(obj_size - (off & ~PAGE_MASK) -
+		    (npages * PAGE_SIZE), PAGE_SIZE), rhpages);
+
+		sfio = malloc(sizeof(struct sf_io) +
+		    (rhpages + npages) * sizeof(vm_page_t), M_TEMP, M_WAITOK);
+		refcount_init(&sfio->nios, 1);
+
+		nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages);
+
 		/*
 		 * Loop and construct maximum sized mbuf chain to be bulk
 		 * dumped into socket buffer.
 		 */
-		while (space > loopbytes) {
-			vm_offset_t pgoff;
+		pa = sfio->pa;
+		for (int i = 0; i < npages; i++) {
 			struct mbuf *m0;
 
 			/*
-			 * Calculate the amount to transfer.
-			 * Not to exceed a page, the EOF,
-			 * or the passed in nbytes.
-			 */
-			pgoff = (vm_offset_t)(off & PAGE_MASK);
-			rem = obj_size - offset;
-			if (nbytes != 0)
-				rem = omin(rem, nbytes);
-			rem -= fsbytes + loopbytes;
-			xfsize = omin(PAGE_SIZE - pgoff, rem);
-			xfsize = omin(space - loopbytes, xfsize);
-			if (xfsize <= 0) {
-				done = 1;		/* all data sent */
-				break;
-			}
-
-			/*
-			 * Attempt to look up the page.  Allocate
-			 * if not found or wait and loop if busy.
-			 */
-			if (m != NULL)
-				nd = EAGAIN; /* send what we already got */
-			else if ((flags & SF_NODISKIO) != 0)
-				nd = EBUSY;
-			else
-				nd = 0;
-			error = sendfile_readpage(obj, vp, nd, off,
-			    xfsize, bsize, td, &pg);
-			if (error != 0) {
-				if (error == EAGAIN)
-					error = 0;	/* not a real error */
-				break;
-			}
-
-			/*
 			 * Get a sendfile buf.  When allocating the
 			 * first buffer for mbuf chain, we usually
 			 * wait as long as necessary, but this wait
@@ -3064,56 +3122,60 @@ retry_space:
 			 * threads might exhaust the buffers and then
 			 * deadlock.
 			 */
-			sf = sf_buf_alloc(pg, (mnw || m != NULL) ? SFB_NOWAIT :
-			    SFB_CATCH);
+			sf = sf_buf_alloc(pa[i],
+			    m != NULL ? SFB_NOWAIT : SFB_CATCH);
 			if (sf == NULL) {
 				SFSTAT_INC(sf_allocfail);
-				vm_page_lock(pg);
-				vm_page_unwire(pg, PQ_INACTIVE);
-				KASSERT(pg->object != NULL,
-				    ("%s: object disappeared", __func__));
-				vm_page_unlock(pg);
+				for (int j = i; j < npages; j++) {
+					vm_page_lock(pa[j]);
+					vm_page_unwire(pa[j], PQ_INACTIVE);
+					vm_page_unlock(pa[j]);
+				}
 				if (m == NULL)
-					error = (mnw ? EAGAIN : EINTR);
+					error = ENOBUFS;
+				fixspace(npages, i, off, &space);
 				break;
 			}
 
 			/*
-			 * Get an mbuf and set it up as having
-			 * external storage.
+			 * Get an mbuf and set it up.
+			 *
+			 * SF_NOCACHE marks the page to be freed upon send.
+			 * However, we ignore it for the last page in 'space'
+			 * if that page is truncated and either we have more
+			 * data to send (rem > space) or readahead is
+			 * configured (rhpages > 0).
 			 */
-			m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA);
-			if (m0 == NULL) {
-				error = (mnw ? EAGAIN : ENOBUFS);
-				sf_ext_free(sf, NULL);
-				break;
-			}
-			/*
-			 * Attach EXT_SFBUF external storage.
-			 */
-			m0->m_ext.ext_buf = (caddr_t )sf_buf_kva(sf);
+			m0 = m_get(M_WAITOK, MT_DATA);
+			m0->m_ext.ext_buf = (char *)sf_buf_kva(sf);
 			m0->m_ext.ext_size = PAGE_SIZE;
 			m0->m_ext.ext_arg1 = sf;
 			m0->m_ext.ext_arg2 = sfs;
-			m0->m_ext.ext_type = EXT_SFBUF;
+			if ((flags & SF_NOCACHE) == 0 ||
+			    (i == npages - 1 &&
+			    ((off + space) & PAGE_MASK) &&
+			    (rem > space || rhpages > 0)))
+				m0->m_ext.ext_type = EXT_SFBUF;
+			else
+				m0->m_ext.ext_type = EXT_SFBUF_NOCACHE;
 			m0->m_ext.ext_flags = 0;
-			m0->m_flags |= (M_EXT|M_RDONLY);
-			m0->m_data = (char *)sf_buf_kva(sf) + pgoff;
-			m0->m_len = xfsize;
+			m0->m_flags |= (M_EXT | M_RDONLY);
+			if (nios)
+				m0->m_flags |= M_NOTREADY;
+			m0->m_data = (char *)sf_buf_kva(sf) +
+			    (vmoff(i, off) & PAGE_MASK);
+			m0->m_len = xfsize(i, npages, off, space);
 
+			if (i == 0)
+				sfio->m = m0;
+
 			/* Append to mbuf chain. */
 			if (mtail != NULL)
 				mtail->m_next = m0;
-			else if (m != NULL)
-				m_last(m)->m_next = m0;
 			else
 				m = m0;
 			mtail = m0;
 
-			/* Keep track of bits processed. */
-			loopbytes += xfsize;
-			off += xfsize;
-
 			/*
 			 * XXX eventually this should be a sfsync
 			 * method call!
@@ -3125,47 +3187,51 @@ retry_space:
 		if (vp != NULL)
 			VOP_UNLOCK(vp, 0);
 
+		/* Keep track of bytes processed. */
+		off += space;
+		rem -= space;
+
+		/* Prepend header, if any. */
+		if (hdrlen) {
+			mhtail->m_next = m;
+			m = mh;
+			mh = NULL;
+		}
+
+		if (error) {
+			free(sfio, M_TEMP);
+			goto done;
+		}
+
 		/* Add the buffer chain to the socket buffer. */
-		if (m != NULL) {
-			int mlen, err;
+		KASSERT(m_length(m, NULL) == space + hdrlen,
+		    ("%s: mlen %u space %d hdrlen %d",
+		    __func__, m_length(m, NULL), space, hdrlen));
 
-			mlen = m_length(m, NULL);
-			SOCKBUF_LOCK(&so->so_snd);
-			if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
-				error = EPIPE;
-				SOCKBUF_UNLOCK(&so->so_snd);
-				goto done;
-			}
-			SOCKBUF_UNLOCK(&so->so_snd);
-			CURVNET_SET(so->so_vnet);
-			/* Avoid error aliasing. */
-			err = (*so->so_proto->pr_usrreqs->pru_send)
-				    (so, 0, m, NULL, NULL, td);
-			CURVNET_RESTORE();
-			if (err == 0) {
-				/*
-				 * We need two counters to get the
-				 * file offset and nbytes to send
-				 * right:
-				 * - sbytes contains the total amount
-				 *   of bytes sent, including headers.
-				 * - fsbytes contains the total amount
-				 *   of bytes sent from the file.
-				 */
-				sbytes += mlen;
-				fsbytes += mlen;
-				if (hdrlen) {
-					fsbytes -= hdrlen;
-					hdrlen = 0;
-				}
-			} else if (error == 0)
-				error = err;
-			m = NULL;	/* pru_send always consumes */
+		CURVNET_SET(so->so_vnet);
+		if (nios == 0) {
+			free(sfio, M_TEMP);
+			serror = (*so->so_proto->pr_usrreqs->pru_send)
+			    (so, 0, m, NULL, NULL, td);
+		} else {
+			sfio->sock_fp = sock_fp;
+			sfio->npages = npages;
+			fhold(sock_fp);
+			serror = (*so->so_proto->pr_usrreqs->pru_send)
+			    (so, PRUS_NOTREADY, m, NULL, NULL, td);
+			sf_io_done(sfio);
 		}
+		CURVNET_RESTORE();
 
-		/* Quit outer loop on error or when we're done. */
-		if (done)
-			break;
+		if (serror == 0) {
+			sbytes += space + hdrlen;
+			if (hdrlen)
+				hdrlen = 0;
+		} else if (error == 0)
+			error = serror;
+		m = NULL;	/* pru_send always consumes */
+
+		/* Quit outer loop on error. */
 		if (error != 0)
 			goto done;
 	}
@@ -3200,6 +3266,8 @@ out:
 		fdrop(sock_fp, td);
 	if (m)
 		m_freem(m);
+	if (mh)
+		m_freem(mh);
 
 	if (error == ERESTART)
 		error = EINTR;
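
Note on the page arithmetic in the uipc_syscalls.c change above: the new loop cuts each chunk of 'space' bytes into pages, where only the first and last page may be partial. Below is a rough userland sketch of that split, mirroring xfsize(), vmoff() and the npages computation from the patch. It is not kernel code: the sk_ names are hypothetical, a 4 KiB page size is assumed, and the kernel's omin()/howmany()/trunc_page() are replaced by local equivalents.

```c
#include <sys/types.h>

/* Assumption: 4 KiB pages; the kernel takes these from machine headers. */
#define SK_PAGE_SIZE	4096
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)
/* Round-up division, same shape as the kernel's howmany(). */
#define SK_HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/* Bytes carried by page i of n for a chunk of len bytes starting at off. */
static off_t
sk_xfsize(int i, int n, off_t off, off_t len)
{
	off_t first = SK_PAGE_SIZE - (off & SK_PAGE_MASK);

	if (i == 0)
		return (first < len ? first : len);
	if (i == n - 1 && ((off + len) & SK_PAGE_MASK) > 0)
		return ((off + len) & SK_PAGE_MASK);
	return (SK_PAGE_SIZE);
}

/* File offset at which page i of the chunk starts supplying data. */
static off_t
sk_vmoff(int i, off_t off)
{
	if (i == 0)
		return (off);
	return ((off + (off_t)i * SK_PAGE_SIZE) & ~(off_t)SK_PAGE_MASK);
}

/* Number of pages touched by a chunk of space bytes starting at off. */
static int
sk_npages(off_t off, off_t space)
{
	if (off & SK_PAGE_MASK)
		return (1 + (int)SK_HOWMANY(space -
		    (SK_PAGE_SIZE - (off & SK_PAGE_MASK)), SK_PAGE_SIZE));
	return ((int)SK_HOWMANY(space, SK_PAGE_SIZE));
}

/* Sum of the per-page sizes; equals space when the split is consistent. */
static off_t
sk_total(off_t off, off_t space)
{
	int n = sk_npages(off, space);
	off_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += sk_xfsize(i, n, off, space);
	return (sum);
}
```

For example, with off = 100 and space = 10000 the sketch yields three pages carrying 3996, 4096 and 1908 bytes, which add up to 10000. This is the same invariant that the KASSERT on m_length(m, NULL) == space + hdrlen checks in the patch.
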
Index: sys/kern/uipc_debug.c
===================================================================
--- sys/kern/uipc_debug.c	(.../head)	(revision 270879)
+++ sys/kern/uipc_debug.c	(.../projects/sendfile)	(revision 270881)
@@ -403,7 +403,8 @@ db_print_sockbuf(struct sockbuf *sb, const char *s
 	db_printf("sb_sndptroff: %u\n", sb->sb_sndptroff);
 
 	db_print_indent(indent);
-	db_printf("sb_cc: %u   ", sb->sb_cc);
+	db_printf("sb_acc: %u   ", sb->sb_acc);
+	db_printf("sb_ccc: %u   ", sb->sb_ccc);
 	db_printf("sb_hiwat: %u   ", sb->sb_hiwat);
 	db_printf("sb_mbcnt: %u   ", sb->sb_mbcnt);
 	db_printf("sb_mbmax: %u\n", sb->sb_mbmax);
Index: sys/kern/uipc_mbuf.c
===================================================================
--- sys/kern/uipc_mbuf.c	(.../head)	(revision 270879)
+++ sys/kern/uipc_mbuf.c	(.../projects/sendfile)	(revision 270881)
@@ -300,6 +300,9 @@ mb_free_ext(struct mbuf *m)
 	case EXT_SFBUF:
 		sf_ext_free(m->m_ext.ext_arg1, m->m_ext.ext_arg2);
 		break;
+	case EXT_SFBUF_NOCACHE:
+		sf_ext_free_nocache(m->m_ext.ext_arg1, m->m_ext.ext_arg2);
+		break;
 	default:
 		KASSERT(m->m_ext.ext_cnt != NULL,
 		    ("%s: no refcounting pointer on %p", __func__, m));
@@ -366,6 +369,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m)
 
 	switch (m->m_ext.ext_type) {
 	case EXT_SFBUF:
+	case EXT_SFBUF_NOCACHE:
 		sf_ext_ref(m->m_ext.ext_arg1, m->m_ext.ext_arg2);
 		break;
 	default:
@@ -388,7 +392,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m)
  * cleaned too.
  */
 void
-m_demote(struct mbuf *m0, int all)
+m_demote(struct mbuf *m0, int all, int flags)
 {
 	struct mbuf *m;
 
@@ -404,7 +408,7 @@ void
 			m_freem(m->m_nextpkt);
 			m->m_nextpkt = NULL;
 		}
-		m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE);
+		m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags);
 	}
 }
 
Index: sys/kern/sys_socket.c
===================================================================
--- sys/kern/sys_socket.c	(.../head)	(revision 270879)
+++ sys/kern/sys_socket.c	(.../projects/sendfile)	(revision 270881)
@@ -165,20 +165,17 @@ soo_ioctl(struct file *fp, u_long cmd, void *data,
 
 	case FIONREAD:
 		/* Unlocked read. */
-		*(int *)data = so->so_rcv.sb_cc;
+		*(int *)data = sbavail(&so->so_rcv);
 		break;
 
 	case FIONWRITE:
 		/* Unlocked read. */
-		*(int *)data = so->so_snd.sb_cc;
+		*(int *)data = sbavail(&so->so_snd);
 		break;
 
 	case FIONSPACE:
-		if ((so->so_snd.sb_hiwat < so->so_snd.sb_cc) ||
-		    (so->so_snd.sb_mbmax < so->so_snd.sb_mbcnt))
-			*(int *)data = 0;
-		else
-			*(int *)data = sbspace(&so->so_snd);
+		/* Unlocked read. */
+		*(int *)data = sbspace(&so->so_snd);
 		break;
 
 	case FIOSETOWN:
@@ -244,6 +241,7 @@ soo_stat(struct file *fp, struct stat *ub, struct
     struct thread *td)
 {
 	struct socket *so = fp->f_data;
+	struct sockbuf *sb;
 #ifdef MAC
 	int error;
 #endif
@@ -259,15 +257,18 @@ soo_stat(struct file *fp, struct stat *ub, struct
 	 * If SBS_CANTRCVMORE is set, but there's still data left in the
 	 * receive buffer, the socket is still readable.
 	 */
-	SOCKBUF_LOCK(&so->so_rcv);
-	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 ||
-	    so->so_rcv.sb_cc != 0)
+	sb = &so->so_rcv;
+	SOCKBUF_LOCK(sb);
+	if ((sb->sb_state & SBS_CANTRCVMORE) == 0 || sbavail(sb))
 		ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH;
-	ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
-	SOCKBUF_UNLOCK(&so->so_rcv);
-	/* Unlocked read. */
-	if ((so->so_snd.sb_state & SBS_CANTSENDMORE) == 0)
+	ub->st_size = sbavail(sb) - sb->sb_ctl;
+	SOCKBUF_UNLOCK(sb);
+
+	sb = &so->so_snd;
+	SOCKBUF_LOCK(sb);
+	if ((sb->sb_state & SBS_CANTSENDMORE) == 0)
 		ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH;
+	SOCKBUF_UNLOCK(sb);
 	ub->st_uid = so->so_cred->cr_uid;
 	ub->st_gid = so->so_cred->cr_gid;
 	return (*so->so_proto->pr_usrreqs->pru_sense)(so, ub);
Index: sys/kern/uipc_usrreq.c
===================================================================
--- sys/kern/uipc_usrreq.c	(.../head)	(revision 270879)
+++ sys/kern/uipc_usrreq.c	(.../projects/sendfile)	(revision 270881)
@@ -793,11 +793,10 @@ uipc_rcvd(struct socket *so, int flags)
 	u_int mbcnt, sbcc;
 
 	unp = sotounpcb(so);
-	KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL"));
+	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
+	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_SEQPACKET,
+	    ("%s: socktype %d", __func__, so->so_type));
 
-	if (so->so_type != SOCK_STREAM && so->so_type != SOCK_SEQPACKET)
-		panic("uipc_rcvd socktype %d", so->so_type);
-
 	/*
 	 * Adjust backpressure on sender and wakeup any waiting to write.
 	 *
@@ -810,7 +809,7 @@ uipc_rcvd(struct socket *so, int flags)
 	 */
 	SOCKBUF_LOCK(&so->so_rcv);
 	mbcnt = so->so_rcv.sb_mbcnt;
-	sbcc = so->so_rcv.sb_cc;
+	sbcc = sbavail(&so->so_rcv);
 	SOCKBUF_UNLOCK(&so->so_rcv);
 	/*
 	 * There is a benign race condition at this point.  If we're planning to
@@ -846,7 +845,10 @@ uipc_send(struct socket *so, int flags, struct mbu
 	int error = 0;
 
 	unp = sotounpcb(so);
-	KASSERT(unp != NULL, ("uipc_send: unp == NULL"));
+	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
+	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_DGRAM ||
+	    so->so_type == SOCK_SEQPACKET,
+	    ("%s: socktype %d", __func__, so->so_type));
 
 	if (flags & PRUS_OOB) {
 		error = EOPNOTSUPP;
@@ -997,8 +999,11 @@ uipc_send(struct socket *so, int flags, struct mbu
 		}
 
 		mbcnt = so2->so_rcv.sb_mbcnt;
-		sbcc = so2->so_rcv.sb_cc;
-		sorwakeup_locked(so2);
+		sbcc = sbavail(&so2->so_rcv);
+		if (sbcc)
+			sorwakeup_locked(so2);
+		else
+			SOCKBUF_UNLOCK(&so2->so_rcv);
 
 		/*
 		 * The PCB lock on unp2 protects the SB_STOP flag.  Without it,
@@ -1014,9 +1019,6 @@ uipc_send(struct socket *so, int flags, struct mbu
 		UNP_PCB_UNLOCK(unp2);
 		m = NULL;
 		break;
-
-	default:
-		panic("uipc_send unknown socktype");
 	}
 
 	/*
@@ -1046,6 +1048,35 @@ release:
 }
 
 static int
+uipc_ready(struct socket *so, struct mbuf *m, int count)
+{
+	struct unpcb *unp, *unp2;
+	struct socket *so2;
+	int error;
+
+	unp = sotounpcb(so);
+
+	UNP_LINK_RLOCK();
+	unp2 = unp->unp_conn;
+	UNP_PCB_LOCK(unp2);
+	so2 = unp2->unp_socket;
+
+	SOCKBUF_LOCK(&so2->so_rcv);
+	if (so2->so_rcv.sb_state & SBS_CANTRCVMORE) {
+		SOCKBUF_UNLOCK(&so2->so_rcv);
+		error = ENOTCONN;
+	} else if ((error = sbready(&so2->so_rcv, m, count)) == 0)
+		sorwakeup_locked(so2);
+	else
+		SOCKBUF_UNLOCK(&so2->so_rcv);
+
+	UNP_PCB_UNLOCK(unp2);
+	UNP_LINK_RUNLOCK();
+
+	return (error);
+}
+
+static int
 uipc_sense(struct socket *so, struct stat *sb)
 {
 	struct unpcb *unp;
@@ -1115,6 +1146,7 @@ static struct pr_usrreqs uipc_usrreqs_dgram = {
 	.pru_peeraddr =		uipc_peeraddr,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
+	.pru_ready =		uipc_ready,
 	.pru_sense =		uipc_sense,
 	.pru_shutdown =		uipc_shutdown,
 	.pru_sockaddr =		uipc_sockaddr,
@@ -1137,6 +1169,7 @@ static struct pr_usrreqs uipc_usrreqs_seqpacket =
 	.pru_peeraddr =		uipc_peeraddr,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
+	.pru_ready =		uipc_ready,
 	.pru_sense =		uipc_sense,
 	.pru_shutdown =		uipc_shutdown,
 	.pru_sockaddr =		uipc_sockaddr,
@@ -1159,6 +1192,7 @@ static struct pr_usrreqs uipc_usrreqs_stream = {
 	.pru_peeraddr =		uipc_peeraddr,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
+	.pru_ready =		uipc_ready,
 	.pru_sense =		uipc_sense,
 	.pru_shutdown =		uipc_shutdown,
 	.pru_sockaddr =		uipc_sockaddr,
Index: sys/kern/vfs_default.c
===================================================================
--- sys/kern/vfs_default.c	(.../head)	(revision 270879)
+++ sys/kern/vfs_default.c	(.../projects/sendfile)	(revision 270881)
@@ -111,6 +111,7 @@ struct vop_vector default_vnodeops = {
 	.vop_close =		VOP_NULL,
 	.vop_fsync =		VOP_NULL,
 	.vop_getpages =		vop_stdgetpages,
+	.vop_getpages_async =	vop_stdgetpages_async,
 	.vop_getwritemount = 	vop_stdgetwritemount,
 	.vop_inactive =		VOP_NULL,
 	.vop_ioctl =		VOP_ENOTTY,
@@ -726,10 +727,19 @@ vop_stdgetpages(ap)
 {
 
 	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
-	    ap->a_count, ap->a_reqpage);
+	    ap->a_count, ap->a_reqpage, NULL, NULL);
 }
 
+/* XXX Needs good comment and a manpage. */
 int
+vop_stdgetpages_async(struct vop_getpages_async_args *ap)
+{
+
+	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
+	    ap->a_count, ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg);
+}
+
+int
 vop_stdkqfilter(struct vop_kqfilter_args *ap)
 {
 	return vfs_kqfilter(ap);
Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c	(.../head)	(revision 270879)
+++ sys/kern/uipc_socket.c	(.../projects/sendfile)	(revision 270881)
@@ -1526,12 +1526,12 @@ restart:
 	 *   2. MSG_DONTWAIT is not set
 	 */
 	if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
-	    so->so_rcv.sb_cc < uio->uio_resid) &&
-	    so->so_rcv.sb_cc < so->so_rcv.sb_lowat &&
+	    sbavail(&so->so_rcv) < uio->uio_resid) &&
+	    sbavail(&so->so_rcv) < so->so_rcv.sb_lowat &&
 	    m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
-		KASSERT(m != NULL || !so->so_rcv.sb_cc,
-		    ("receive: m == %p so->so_rcv.sb_cc == %u",
-		    m, so->so_rcv.sb_cc));
+		KASSERT(m != NULL || !sbavail(&so->so_rcv),
+		    ("receive: m == %p sbavail == %u",
+		    m, sbavail(&so->so_rcv)));
 		if (so->so_error) {
 			if (m != NULL)
 				goto dontblock;
@@ -1710,7 +1710,8 @@ dontblock:
 	 */
 	moff = 0;
 	offset = 0;
-	while (m != NULL && uio->uio_resid > 0 && error == 0) {
+	while (m != NULL && !(m->m_flags & M_NOTAVAIL) && uio->uio_resid > 0
+	    && error == 0) {
 		/*
 		 * If the type of mbuf has changed since the last mbuf
 		 * examined ('type'), end the receive operation.
@@ -1813,9 +1814,7 @@ dontblock:
 						SOCKBUF_LOCK(&so->so_rcv);
 					}
 				}
-				m->m_data += len;
-				m->m_len -= len;
-				so->so_rcv.sb_cc -= len;
+				sbmtrim(&so->so_rcv, m, len);
 			}
 		}
 		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
@@ -1980,7 +1979,7 @@ restart:
 
 	/* Abort if socket has reported problems. */
 	if (so->so_error) {
-		if (sb->sb_cc > 0)
+		if (sbavail(sb) > 0)
 			goto deliver;
 		if (oresid > uio->uio_resid)
 			goto out;
@@ -1992,7 +1991,7 @@ restart:
 
 	/* Door is closed.  Deliver what is left, if any. */
 	if (sb->sb_state & SBS_CANTRCVMORE) {
-		if (sb->sb_cc > 0)
+		if (sbavail(sb) > 0)
 			goto deliver;
 		else
 			goto out;
@@ -1999,7 +1998,7 @@ restart:
 	}
 
 	/* Socket buffer is empty and we shall not block. */
-	if (sb->sb_cc == 0 &&
+	if (sbavail(sb) == 0 &&
 	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
 		error = EAGAIN;
 		goto out;
@@ -2006,18 +2005,18 @@ restart:
 	}
 
 	/* Socket buffer got some data that we shall deliver now. */
-	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
+	if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) &&
 	    ((sb->sb_flags & SS_NBIO) ||
 	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
-	     sb->sb_cc >= sb->sb_lowat ||
-	     sb->sb_cc >= uio->uio_resid ||
-	     sb->sb_cc >= sb->sb_hiwat) ) {
+	     sbavail(sb) >= sb->sb_lowat ||
+	     sbavail(sb) >= uio->uio_resid ||
+	     sbavail(sb) >= sb->sb_hiwat) ) {
 		goto deliver;
 	}
 
 	/* On MSG_WAITALL we must wait until all data or error arrives. */
 	if ((flags & MSG_WAITALL) &&
-	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_hiwat))
+	    (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat))
 		goto deliver;
 
 	/*
@@ -2031,7 +2030,7 @@ restart:
 
 deliver:
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
-	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
+	KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__));
 	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
 
 	/* Statistics. */
@@ -2039,7 +2038,7 @@ deliver:
 		uio->uio_td->td_ru.ru_msgrcv++;
 
 	/* Fill uio until full or current end of socket buffer is reached. */
-	len = min(uio->uio_resid, sb->sb_cc);
+	len = min(uio->uio_resid, sbavail(sb));
 	if (mp0 != NULL) {
 		/* Dequeue as many mbufs as possible. */
 		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
@@ -2050,6 +2049,8 @@ deliver:
 			for (m = sb->sb_mb;
 			     m != NULL && m->m_len <= len;
 			     m = m->m_next) {
+				KASSERT(!(m->m_flags & M_NOTAVAIL),
+				    ("%s: m %p not available", __func__, m));
 				len -= m->m_len;
 				uio->uio_resid -= m->m_len;
 				sbfree(sb, m);
@@ -2174,9 +2175,9 @@ soreceive_dgram(struct socket *so, struct sockaddr
 	 */
 	SOCKBUF_LOCK(&so->so_rcv);
 	while ((m = so->so_rcv.sb_mb) == NULL) {
-		KASSERT(so->so_rcv.sb_cc == 0,
-		    ("soreceive_dgram: sb_mb NULL but sb_cc %u",
-		    so->so_rcv.sb_cc));
+		KASSERT(sbavail(&so->so_rcv) == 0,
+		    ("soreceive_dgram: sb_mb NULL but sbavail %u",
+		    sbavail(&so->so_rcv)));
 		if (so->so_error) {
 			error = so->so_error;
 			so->so_error = 0;
@@ -3178,6 +3179,13 @@ pru_send_notsupp(struct socket *so, int flags, str
 	return EOPNOTSUPP;
 }
 
+int
+pru_ready_notsupp(struct socket *so, struct mbuf *m, int count)
+{
+
+	return (EOPNOTSUPP);
+}
+
 /*
  * This isn't really a ``null'' operation, but it's the default one and
  * doesn't do anything destructive.
@@ -3249,7 +3257,7 @@ filt_soread(struct knote *kn, long hint)
 	so = kn->kn_fp->f_data;
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 
-	kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
+	kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl;
 	if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 		kn->kn_flags |= EV_EOF;
 		kn->kn_fflags = so->so_error;
@@ -3261,7 +3269,7 @@ filt_soread(struct knote *kn, long hint)
 		if (kn->kn_data >= kn->kn_sdata)
 			return 1;
 	} else {
-		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat)
+		if (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat)
 			return 1;
 	}
 
@@ -3456,7 +3464,7 @@ soisdisconnected(struct socket *so)
 	sorwakeup_locked(so);
 	SOCKBUF_LOCK(&so->so_snd);
 	so->so_snd.sb_state |= SBS_CANTSENDMORE;
-	sbdrop_locked(&so->so_snd, so->so_snd.sb_cc);
+	sbdrop_locked(&so->so_snd, sbused(&so->so_snd));
 	sowwakeup_locked(so);
 	wakeup(&so->so_timeo);
 }
Index: sys/tools/vnode_if.awk
===================================================================
--- sys/tools/vnode_if.awk	(.../head)	(revision 270879)
+++ sys/tools/vnode_if.awk	(.../projects/sendfile)	(revision 270881)
@@ -254,16 +254,26 @@ while ((getline < srcfile) > 0) {
 		if (sub(/;$/, "") < 1)
 			die("Missing end-of-line ; in \"%s\".", $0);
 
-		# pick off variable name
-		if ((argp = match($0, /[A-Za-z0-9_]+$/)) < 1)
-			die("Missing var name \"a_foo\" in \"%s\".", $0);
-		args[numargs] = substr($0, argp);
-		$0 = substr($0, 1, argp - 1);
-
-		# what is left must be type
-		# remove trailing space (if any)
-		sub(/ $/, "");
-		types[numargs] = $0;
+		# pick off argument name
+		if ((argp = match($0, /[A-Za-z0-9_]+$/)) > 0) {
+			args[numargs] = substr($0, argp);
+			$0 = substr($0, 1, argp - 1);
+			sub(/ $/, "");
+			delete fargs[numargs];
+			types[numargs] = $0;
+		} else {	# try to parse a function pointer argument
+			if ((argp = match($0,
+			    /\(\*[A-Za-z0-9_]+\)\([A-Za-z0-9_*, ]+\)$/)) < 1)
+				die("Missing var name \"a_foo\" in \"%s\".",
+				    $0);
+			args[numargs] = substr($0, argp + 2);
+			sub(/\).+/, "", args[numargs]);
+			fargs[numargs] = substr($0, argp);
+			sub(/^\([^)]+\)/, "", fargs[numargs]);
+			$0 = substr($0, 1, argp - 1);
+			sub(/ $/, "");
+			types[numargs] = $0;
+		}
 	}
 	if (numargs > 4)
 		ctrargs = 4;
@@ -286,8 +296,13 @@ while ((getline < srcfile) > 0) {
 	if (hfile) {
 		# Print out the vop_F_args structure.
 		printh("struct "name"_args {\n\tstruct vop_generic_args a_gen;");
-		for (i = 0; i < numargs; ++i)
-			printh("\t" t_spc(types[i]) "a_" args[i] ";");
+		for (i = 0; i < numargs; ++i) {
+			if (fargs[i]) {
+				printh("\t" t_spc(types[i]) "(*a_" args[i] \
+				    ")" fargs[i] ";");
+			} else
+				printh("\t" t_spc(types[i]) "a_" args[i] ";");
+		}
 		printh("};");
 		printh("");
 
@@ -301,8 +316,14 @@ while ((getline < srcfile) > 0) {
 		printh("");
 		printh("static __inline int " uname "(");
 		for (i = 0; i < numargs; ++i) {
-			printh("\t" t_spc(types[i]) args[i] \
-			    (i < numargs - 1 ? "," : ")"));
+			if (fargs[i]) {
+				printh("\t" t_spc(types[i]) "(*" args[i] \
+				    ")" fargs[i] \
+				    (i < numargs - 1 ? "," : ")"));
+			} else {
+				printh("\t" t_spc(types[i]) args[i] \
+				    (i < numargs - 1 ? "," : ")"));
+			}
 		}
 		printh("{");
 		printh("\tstruct " name "_args a;");
Index: sys/netinet/sctp_var.h
===================================================================
--- sys/netinet/sctp_var.h	(.../head)	(revision 270879)
+++ sys/netinet/sctp_var.h	(.../projects/sendfile)	(revision 270881)
@@ -82,9 +82,9 @@ extern struct pr_usrreqs sctp_usrreqs;
 
 #define sctp_maxspace(sb) (max((sb)->sb_hiwat,SCTP_MINIMAL_RWND))
 
-#define	sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_cc) ? (sctp_maxspace(sb) - (asoc)->sb_cc) : 0))
+#define	sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_ccc) ? (sctp_maxspace(sb) - (asoc)->sb_ccc) : 0))
 
-#define	sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_cc) ? (sctp_maxspace(sb) - (sb)->sb_cc) : 0))
+#define	sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_ccc) ? (sctp_maxspace(sb) - (sb)->sb_ccc) : 0))
 
 #define sctp_sbspace_sub(a,b) ((a > b) ? (a - b) : 0)
 
@@ -195,10 +195,10 @@ extern struct pr_usrreqs sctp_usrreqs;
 }
 
 #define sctp_sbfree(ctl, stcb, sb, m) { \
-	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_cc, SCTP_BUF_LEN((m))); \
+	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_ccc, SCTP_BUF_LEN((m))); \
 	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_mbcnt, MSIZE); \
 	if (((ctl)->do_not_ref_stcb == 0) && stcb) {\
-		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_cc, SCTP_BUF_LEN((m))); \
+		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_ccc, SCTP_BUF_LEN((m))); \
 		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
 	} \
 	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
@@ -207,10 +207,10 @@ extern struct pr_usrreqs sctp_usrreqs;
 }
 
 #define sctp_sballoc(stcb, sb, m) { \
-	atomic_add_int(&(sb)->sb_cc,SCTP_BUF_LEN((m))); \
+	atomic_add_int(&(sb)->sb_ccc,SCTP_BUF_LEN((m))); \
 	atomic_add_int(&(sb)->sb_mbcnt, MSIZE); \
 	if (stcb) { \
-		atomic_add_int(&(stcb)->asoc.sb_cc,SCTP_BUF_LEN((m))); \
+		atomic_add_int(&(stcb)->asoc.sb_ccc,SCTP_BUF_LEN((m))); \
 		atomic_add_int(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
 	} \
 	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
Index: sys/netinet/tcp_usrreq.c
===================================================================
--- sys/netinet/tcp_usrreq.c	(.../head)	(revision 270879)
+++ sys/netinet/tcp_usrreq.c	(.../projects/sendfile)	(revision 270881)
@@ -826,7 +826,7 @@ tcp_usr_send(struct socket *so, int flags, struct
 		m_freem(control);	/* empty control, just free it */
 	}
 	if (!(flags & PRUS_OOB)) {
-		sbappendstream(&so->so_snd, m);
+		sbappendstream(&so->so_snd, m, flags);
 		if (nam && tp->t_state < TCPS_SYN_SENT) {
 			/*
 			 * Do implied connect if not yet connected,
@@ -858,7 +858,8 @@ tcp_usr_send(struct socket *so, int flags, struct
 			socantsendmore(so);
 			tcp_usrclosed(tp);
 		}
-		if (!(inp->inp_flags & INP_DROPPED)) {
+		if (!(inp->inp_flags & INP_DROPPED) &&
+		    !(flags & PRUS_NOTREADY)) {
 			if (flags & PRUS_MORETOCOME)
 				tp->t_flags |= TF_MORETOCOME;
 			error = tcp_output(tp);
@@ -884,7 +885,7 @@ tcp_usr_send(struct socket *so, int flags, struct
 		 * of data past the urgent section.
 		 * Otherwise, snd_up should be one lower.
 		 */
-		sbappendstream_locked(&so->so_snd, m);
+		sbappendstream_locked(&so->so_snd, m, flags);
 		SOCKBUF_UNLOCK(&so->so_snd);
 		if (nam && tp->t_state < TCPS_SYN_SENT) {
 			/*
@@ -908,10 +909,12 @@ tcp_usr_send(struct socket *so, int flags, struct
 			tp->snd_wnd = TTCP_CLIENT_SND_WND;
 			tcp_mss(tp, -1);
 		}
-		tp->snd_up = tp->snd_una + so->so_snd.sb_cc;
-		tp->t_flags |= TF_FORCEDATA;
-		error = tcp_output(tp);
-		tp->t_flags &= ~TF_FORCEDATA;
+		tp->snd_up = tp->snd_una + sbavail(&so->so_snd);
+		if (!(flags & PRUS_NOTREADY)) {
+			tp->t_flags |= TF_FORCEDATA;
+			error = tcp_output(tp);
+			tp->t_flags &= ~TF_FORCEDATA;
+		}
 	}
 out:
 	TCPDEBUG2((flags & PRUS_OOB) ? PRU_SENDOOB :
@@ -922,6 +925,38 @@ out:
 	return (error);
 }
 
+static int
+tcp_usr_ready(struct socket *so, struct mbuf *m, int count)
+{
+	struct inpcb *inp;
+	struct tcpcb *tp;
+	int error;
+
+	inp = sotoinpcb(so);
+	INP_WLOCK(inp);
+	if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
+		INP_WUNLOCK(inp);
+		return (ECONNRESET);
+	}
+	tp = intotcpcb(inp);
+
+	SOCKBUF_LOCK(&so->so_snd);
+	if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
+		SOCKBUF_UNLOCK(&so->so_snd);
+		error = ENOTCONN;
+	} else if (sbready(&so->so_snd, m, count) == 0) {
+		SOCKBUF_UNLOCK(&so->so_snd);
+		error = tcp_output(tp);
+	} else {
+		SOCKBUF_UNLOCK(&so->so_snd);
+		error = EINPROGRESS;
+	}
+
+	INP_WUNLOCK(inp);
+
+	return (error);
+}
+
 /*
  * Abort the TCP.  Drop the connection abruptly.
  */
@@ -1056,6 +1091,7 @@ struct pr_usrreqs tcp_usrreqs = {
 	.pru_rcvd =		tcp_usr_rcvd,
 	.pru_rcvoob =		tcp_usr_rcvoob,
 	.pru_send =		tcp_usr_send,
+	.pru_ready =		tcp_usr_ready,
 	.pru_shutdown =		tcp_usr_shutdown,
 	.pru_sockaddr =		in_getsockaddr,
 	.pru_sosetlabel =	in_pcbsosetlabel,
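
Note on the hunks above: with PRUS_NOTREADY the data is appended to the
send buffer but tcp_output() is skipped, and tcp_usr_ready() later calls
tcp_output() only when sbready() returns 0. Judging by how the rest of the
patch uses them, sbused() appears to count every byte occupying the buffer,
while sbavail() counts only bytes that can be sent now. The following is a
toy userspace sketch of that accounting, not kernel code: every toy_* name
is hypothetical, and the rule that a pending chunk blocks everything queued
behind it is an assumption about what sbready() does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; NOT the kernel's struct sockbuf. */
#define	TOY_MAXCHUNKS	16

struct toy_chunk {
	size_t	len;	/* bytes in this chunk */
	int	ready;	/* 0 while the simulated disk read is pending */
};

struct toy_sockbuf {
	struct toy_chunk chunk[TOY_MAXCHUNKS];
	size_t	nchunks;
};

/* Append a chunk; notready stands in for passing PRUS_NOTREADY. */
static void
toy_append(struct toy_sockbuf *sb, size_t len, int notready)
{

	assert(sb->nchunks < TOY_MAXCHUNKS);
	sb->chunk[sb->nchunks].len = len;
	sb->chunk[sb->nchunks].ready = !notready;
	sb->nchunks++;
}

/* Every byte occupying the buffer (the role sbused() seems to play). */
static size_t
toy_used(const struct toy_sockbuf *sb)
{
	size_t i, total = 0;

	for (i = 0; i < sb->nchunks; i++)
		total += sb->chunk[i].len;
	return (total);
}

/* Bytes sendable now: the leading run of ready chunks (sbavail() role). */
static size_t
toy_avail(const struct toy_sockbuf *sb)
{
	size_t i, total = 0;

	for (i = 0; i < sb->nchunks && sb->chunk[i].ready; i++)
		total += sb->chunk[i].len;
	return (total);
}

/*
 * Mark chunk idx ready.  Returns 0 if the sendable amount grew (the caller
 * would now run its output routine), 1 if earlier chunks are still pending
 * (the case where tcp_usr_ready() reports EINPROGRESS).
 */
static int
toy_ready(struct toy_sockbuf *sb, size_t idx)
{
	size_t before = toy_avail(sb);

	assert(idx < sb->nchunks);
	sb->chunk[idx].ready = 1;
	return (toy_avail(sb) > before ? 0 : 1);
}
```

In this model, buffer-full checks would use toy_used(), since pending data
still consumes space, and transmit decisions would use toy_avail(). That
matches the split between sbused() and sbavail() in the hunks of this patch.
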
Index: sys/netinet/siftr.c
===================================================================
--- sys/netinet/siftr.c	(.../head)	(revision 270879)
+++ sys/netinet/siftr.c	(.../projects/sendfile)	(revision 270881)
@@ -781,9 +781,9 @@ siftr_siftdata(struct pkt_node *pn, struct inpcb *
 	pn->flags = tp->t_flags;
 	pn->rxt_length = tp->t_rxtcur;
 	pn->snd_buf_hiwater = inp->inp_socket->so_snd.sb_hiwat;
-	pn->snd_buf_cc = inp->inp_socket->so_snd.sb_cc;
+	pn->snd_buf_cc = sbused(&inp->inp_socket->so_snd);
 	pn->rcv_buf_hiwater = inp->inp_socket->so_rcv.sb_hiwat;
-	pn->rcv_buf_cc = inp->inp_socket->so_rcv.sb_cc;
+	pn->rcv_buf_cc = sbused(&inp->inp_socket->so_rcv);
 	pn->sent_inflight_bytes = tp->snd_max - tp->snd_una;
 	pn->t_segqlen = tp->t_segqlen;
 
Index: sys/netinet/sctp_os_bsd.h
===================================================================
--- sys/netinet/sctp_os_bsd.h	(.../head)	(revision 270879)
+++ sys/netinet/sctp_os_bsd.h	(.../projects/sendfile)	(revision 270881)
@@ -405,7 +405,7 @@ typedef struct callout sctp_os_timer_t;
 #define SCTP_SOWAKEUP(so)	wakeup(&(so)->so_timeo)
 /* clear the socket buffer state */
 #define SCTP_SB_CLEAR(sb)	\
-	(sb).sb_cc = 0;		\
+	(sb).sb_ccc = 0;		\
 	(sb).sb_mb = NULL;	\
 	(sb).sb_mbcnt = 0;
 
Index: sys/netinet/tcp_reass.c
===================================================================
--- sys/netinet/tcp_reass.c	(.../head)	(revision 270879)
+++ sys/netinet/tcp_reass.c	(.../projects/sendfile)	(revision 270881)
@@ -248,7 +248,7 @@ present:
 			m_freem(mq);
 		else {
 			mq->m_nextpkt = NULL;
-			sbappendstream_locked(&so->so_rcv, mq);
+			sbappendstream_locked(&so->so_rcv, mq, 0);
 			wakeup = 1;
 		}
 	}
Index: sys/netinet/sctp_indata.c
===================================================================
--- sys/netinet/sctp_indata.c	(.../head)	(revision 270879)
+++ sys/netinet/sctp_indata.c	(.../projects/sendfile)	(revision 270881)
@@ -70,7 +70,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
 
 	/*
 	 * This is really set wrong with respect to a 1-2-m socket. Since
-	 * the sb_cc is the count that everyone as put up. When we re-write
+	 * the sb_ccc is the count that everyone as put up. When we re-write
 	 * sctp_soreceive then we will fix this so that ONLY this
 	 * associations data is taken into account.
 	 */
@@ -77,7 +77,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
 	if (stcb->sctp_socket == NULL)
 		return (calc);
 
-	if (stcb->asoc.sb_cc == 0 &&
+	if (stcb->asoc.sb_ccc == 0 &&
 	    asoc->size_on_reasm_queue == 0 &&
 	    asoc->size_on_all_streams == 0) {
 		/* Full rwnd granted */
@@ -1363,7 +1363,7 @@ sctp_process_a_data_chunk(struct sctp_tcb *stcb, s
 		 * When we have NO room in the rwnd we check to make sure
 		 * the reader is doing its job...
 		 */
-		if (stcb->sctp_socket->so_rcv.sb_cc) {
+		if (stcb->sctp_socket->so_rcv.sb_ccc) {
 			/* some to read, wake-up */
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			struct socket *so;
Index: sys/netinet/accf_http.c
===================================================================
--- sys/netinet/accf_http.c	(.../head)	(revision 270879)
+++ sys/netinet/accf_http.c	(.../projects/sendfile)	(revision 270881)
@@ -92,7 +92,7 @@ sbfull(struct sockbuf *sb)
 	    "mbcnt(%ld) >= mbmax(%ld): %d",
 	    sb->sb_cc, sb->sb_hiwat, sb->sb_cc >= sb->sb_hiwat,
 	    sb->sb_mbcnt, sb->sb_mbmax, sb->sb_mbcnt >= sb->sb_mbmax);
-	return (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
+	return (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
 }
 
 /*
@@ -162,13 +162,14 @@ static int
 sohashttpget(struct socket *so, void *arg, int waitflag)
 {
 
-	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && !sbfull(&so->so_rcv)) {
+	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 &&
+	    !sbfull(&so->so_rcv)) {
 		struct mbuf *m;
 		char *cmp;
 		int	cmplen, cc;
 
 		m = so->so_rcv.sb_mb;
-		cc = so->so_rcv.sb_cc - 1;
+		cc = sbavail(&so->so_rcv) - 1;
 		if (cc < 1)
 			return (SU_OK);
 		switch (*mtod(m, char *)) {
@@ -215,7 +216,7 @@ soparsehttpvers(struct socket *so, void *arg, int
 		goto fallout;
 
 	m = so->so_rcv.sb_mb;
-	cc = so->so_rcv.sb_cc;
+	cc = sbavail(&so->so_rcv);
 	inspaces = spaces = 0;
 	for (m = so->so_rcv.sb_mb; m; m = n) {
 		n = m->m_nextpkt;
@@ -304,7 +305,7 @@ soishttpconnected(struct socket *so, void *arg, in
 	 * have NCHRS left
 	 */
 	copied = 0;
-	ccleft = so->so_rcv.sb_cc;
+	ccleft = sbavail(&so->so_rcv);
 	if (ccleft < NCHRS)
 		goto readmore;
 	a = b = c = '\0';
Index: sys/netinet/accf_dns.c
===================================================================
--- sys/netinet/accf_dns.c	(.../head)	(revision 270879)
+++ sys/netinet/accf_dns.c	(.../projects/sendfile)	(revision 270881)
@@ -75,7 +75,7 @@ sohasdns(struct socket *so, void *arg, int waitfla
 	struct sockbuf *sb = &so->so_rcv;
 
 	/* If the socket is full, we're ready. */
-	if (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
+	if (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
 		goto ready;
 
 	/* Check to see if we have a request. */
@@ -115,7 +115,7 @@ skippacket(struct sockbuf *sb) {
 	unsigned long packlen;
 	struct packet q, *p = &q;
 
-	if (sb->sb_cc < 2)
+	if (sbavail(sb) < 2)
 		return DNS_WAIT;
 
 	q.m = sb->sb_mb;
@@ -122,7 +122,7 @@ skippacket(struct sockbuf *sb) {
 	q.n = q.m->m_nextpkt;
 	q.moff = 0;
 	q.offset = 0;
-	q.len = sb->sb_cc;
+	q.len = sbavail(sb);
 
 	GET16(p, packlen);
 	if (packlen + 2 > q.len)
Index: sys/netinet/sctp_structs.h
===================================================================
--- sys/netinet/sctp_structs.h	(.../head)	(revision 270879)
+++ sys/netinet/sctp_structs.h	(.../projects/sendfile)	(revision 270881)
@@ -990,7 +990,7 @@ struct sctp_association {
 
 	uint32_t total_output_queue_size;
 
-	uint32_t sb_cc;		/* shadow of sb_cc */
+	uint32_t sb_ccc;		/* shadow of sb_ccc */
 	uint32_t sb_send_resv;	/* amount reserved on a send */
 	uint32_t my_rwnd_control_len;	/* shadow of sb_mbcnt used for rwnd
 					 * control */
Index: sys/netinet/tcp_output.c
===================================================================
--- sys/netinet/tcp_output.c	(.../head)	(revision 270879)
+++ sys/netinet/tcp_output.c	(.../projects/sendfile)	(revision 270881)
@@ -322,7 +322,7 @@ after_sack_rexmit:
 			 * to send then the probe will be the FIN
 			 * itself.
 			 */
-			if (off < so->so_snd.sb_cc)
+			if (off < sbavail(&so->so_snd))
 				flags &= ~TH_FIN;
 			sendwin = 1;
 		} else {
@@ -348,7 +348,8 @@ after_sack_rexmit:
 	 */
 	if (sack_rxmit == 0) {
 		if (sack_bytes_rxmt == 0)
-			len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off);
+			len = ((long)ulmin(sbavail(&so->so_snd), sendwin) -
+			    off);
 		else {
 			long cwin;
 
@@ -357,8 +358,8 @@ after_sack_rexmit:
 			 * sending new data, having retransmitted all the
 			 * data possible in the scoreboard.
 			 */
-			len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd) 
-			       - off);
+			len = ((long)ulmin(sbavail(&so->so_snd), tp->snd_wnd) -
+			    off);
 			/*
 			 * Don't remove this (len > 0) check !
 			 * We explicitly check for len > 0 here (although it 
@@ -457,12 +458,15 @@ after_sack_rexmit:
 	 * TODO: Shrink send buffer during idle periods together
 	 * with congestion window.  Requires another timer.  Has to
 	 * wait for upcoming tcp timer rewrite.
+	 *
+	 * XXXGL: should sbused() or sbavail() be used here?
 	 */
 	if (V_tcp_do_autosndbuf && so->so_snd.sb_flags & SB_AUTOSIZE) {
 		if ((tp->snd_wnd / 4 * 5) >= so->so_snd.sb_hiwat &&
-		    so->so_snd.sb_cc >= (so->so_snd.sb_hiwat / 8 * 7) &&
-		    so->so_snd.sb_cc < V_tcp_autosndbuf_max &&
-		    sendwin >= (so->so_snd.sb_cc - (tp->snd_nxt - tp->snd_una))) {
+		    sbused(&so->so_snd) >= (so->so_snd.sb_hiwat / 8 * 7) &&
+		    sbused(&so->so_snd) < V_tcp_autosndbuf_max &&
+		    sendwin >= (sbused(&so->so_snd) -
+		    (tp->snd_nxt - tp->snd_una))) {
 			if (!sbreserve_locked(&so->so_snd,
 			    min(so->so_snd.sb_hiwat + V_tcp_autosndbuf_inc,
 			     V_tcp_autosndbuf_max), so, curthread))
@@ -499,10 +503,11 @@ after_sack_rexmit:
 		tso = 1;
 
 	if (sack_rxmit) {
-		if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
+		if (SEQ_LT(p->rxmit + len, tp->snd_una + sbavail(&so->so_snd)))
 			flags &= ~TH_FIN;
 	} else {
-		if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc))
+		if (SEQ_LT(tp->snd_nxt + len, tp->snd_una +
+		    sbavail(&so->so_snd)))
 			flags &= ~TH_FIN;
 	}
 
@@ -532,7 +537,7 @@ after_sack_rexmit:
 		 */
 		if (!(tp->t_flags & TF_MORETOCOME) &&	/* normal case */
 		    (idle || (tp->t_flags & TF_NODELAY)) &&
-		    len + off >= so->so_snd.sb_cc &&
+		    len + off >= sbavail(&so->so_snd) &&
 		    (tp->t_flags & TF_NOPUSH) == 0) {
 			goto send;
 		}
@@ -660,7 +665,7 @@ dontupdate:
 	 * if window is nonzero, transmit what we can,
 	 * otherwise force out a byte.
 	 */
-	if (so->so_snd.sb_cc && !tcp_timer_active(tp, TT_REXMT) &&
+	if (sbavail(&so->so_snd) && !tcp_timer_active(tp, TT_REXMT) &&
 	    !tcp_timer_active(tp, TT_PERSIST)) {
 		tp->t_rxtshift = 0;
 		tcp_setpersist(tp);
@@ -786,7 +791,7 @@ send:
 			 * fractional unless the send sockbuf can
 			 * be emptied.
 			 */
-			if (sendalot && off + len < so->so_snd.sb_cc) {
+			if (sendalot && off + len < sbavail(&so->so_snd)) {
 				len -= len % (tp->t_maxopd - optlen);
 				sendalot = 1;
 			}
@@ -889,7 +894,7 @@ send:
 		 * give data to the user when a buffer fills or
 		 * a PUSH comes in.)
 		 */
-		if (off + len == so->so_snd.sb_cc)
+		if (off + len == sbavail(&so->so_snd))
 			flags |= TH_PUSH;
 		SOCKBUF_UNLOCK(&so->so_snd);
 	} else {
Index: sys/netinet/sctputil.c
===================================================================
--- sys/netinet/sctputil.c	(.../head)	(revision 270879)
+++ sys/netinet/sctputil.c	(.../projects/sendfile)	(revision 270881)
@@ -67,9 +67,9 @@ sctp_sblog(struct sockbuf *sb, struct sctp_tcb *st
 	struct sctp_cwnd_log sctp_clog;
 
 	sctp_clog.x.sb.stcb = stcb;
-	sctp_clog.x.sb.so_sbcc = sb->sb_cc;
+	sctp_clog.x.sb.so_sbcc = sb->sb_ccc;
 	if (stcb)
-		sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_cc;
+		sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_ccc;
 	else
 		sctp_clog.x.sb.stcb_sbcc = 0;
 	sctp_clog.x.sb.incr = incr;
@@ -4363,7 +4363,7 @@ sctp_add_to_readq(struct sctp_inpcb *inp,
 {
 	/*
 	 * Here we must place the control on the end of the socket read
-	 * queue AND increment sb_cc so that select will work properly on
+	 * queue AND increment sb_ccc so that select will work properly on
 	 * read.
 	 */
 	struct mbuf *m, *prev = NULL;
@@ -4489,7 +4489,7 @@ sctp_append_to_readq(struct sctp_inpcb *inp,
 	 * the reassembly queue.
 	 * 
 	 * If PDAPI this means we need to add m to the end of the data.
-	 * Increase the length in the control AND increment the sb_cc.
+	 * Increase the length in the control AND increment the sb_ccc.
 	 * Otherwise sb is NULL and all we need to do is put it at the end
 	 * of the mbuf chain.
 	 */
@@ -4701,10 +4701,10 @@ sctp_free_bufspace(struct sctp_tcb *stcb, struct s
 
 	if (stcb->sctp_socket && (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) ||
 	    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)))) {
-		if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) {
-			stcb->sctp_socket->so_snd.sb_cc -= tp1->book_size;
+		if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) {
+			stcb->sctp_socket->so_snd.sb_ccc -= tp1->book_size;
 		} else {
-			stcb->sctp_socket->so_snd.sb_cc = 0;
+			stcb->sctp_socket->so_snd.sb_ccc = 0;
 
 		}
 	}
@@ -5254,11 +5254,11 @@ sctp_sorecvmsg(struct socket *so,
 	in_eeor_mode = sctp_is_feature_on(inp, SCTP_PCB_FLAGS_EXPLICIT_EOR);
 	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) {
 		sctp_misc_ints(SCTP_SORECV_ENTER,
-		    rwnd_req, in_eeor_mode, so->so_rcv.sb_cc, uio->uio_resid);
+		    rwnd_req, in_eeor_mode, so->so_rcv.sb_ccc, uio->uio_resid);
 	}
 	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) {
 		sctp_misc_ints(SCTP_SORECV_ENTERPL,
-		    rwnd_req, block_allowed, so->so_rcv.sb_cc, uio->uio_resid);
+		    rwnd_req, block_allowed, so->so_rcv.sb_ccc, uio->uio_resid);
 	}
 	error = sblock(&so->so_rcv, (block_allowed ? SBL_WAIT : 0));
 	if (error) {
@@ -5277,7 +5277,7 @@ restart_nosblocks:
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) {
 		goto out;
 	}
-	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_cc == 0)) {
+	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_ccc == 0)) {
 		if (so->so_error) {
 			error = so->so_error;
 			if ((in_flags & MSG_PEEK) == 0)
@@ -5284,7 +5284,7 @@ restart_nosblocks:
 				so->so_error = 0;
 			goto out;
 		} else {
-			if (so->so_rcv.sb_cc == 0) {
+			if (so->so_rcv.sb_ccc == 0) {
 				/* indicate EOF */
 				error = 0;
 				goto out;
@@ -5291,9 +5291,9 @@ restart_nosblocks:
 			}
 		}
 	}
-	if ((so->so_rcv.sb_cc <= held_length) && block_allowed) {
+	if ((so->so_rcv.sb_ccc <= held_length) && block_allowed) {
 		/* we need to wait for data */
-		if ((so->so_rcv.sb_cc == 0) &&
+		if ((so->so_rcv.sb_ccc == 0) &&
 		    ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 		    (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) {
 			if ((inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0) {
@@ -5329,7 +5329,7 @@ restart_nosblocks:
 		}
 		held_length = 0;
 		goto restart_nosblocks;
-	} else if (so->so_rcv.sb_cc == 0) {
+	} else if (so->so_rcv.sb_ccc == 0) {
 		if (so->so_error) {
 			error = so->so_error;
 			if ((in_flags & MSG_PEEK) == 0)
@@ -5386,11 +5386,11 @@ restart_nosblocks:
 			SCTP_INP_READ_LOCK(inp);
 		}
 		control = TAILQ_FIRST(&inp->read_queue);
-		if ((control == NULL) && (so->so_rcv.sb_cc != 0)) {
+		if ((control == NULL) && (so->so_rcv.sb_ccc != 0)) {
 #ifdef INVARIANTS
 			panic("Huh, its non zero and nothing on control?");
 #endif
-			so->so_rcv.sb_cc = 0;
+			so->so_rcv.sb_ccc = 0;
 		}
 		SCTP_INP_READ_UNLOCK(inp);
 		hold_rlock = 0;
@@ -5511,11 +5511,11 @@ restart_nosblocks:
 		}
 		/*
 		 * if we reach here, not suitable replacement is available
-		 * <or> fragment interleave is NOT on. So stuff the sb_cc
+		 * <or> fragment interleave is NOT on. So stuff the sb_ccc
 		 * into the our held count, and its time to sleep again.
 		 */
-		held_length = so->so_rcv.sb_cc;
-		control->held_length = so->so_rcv.sb_cc;
+		held_length = so->so_rcv.sb_ccc;
+		control->held_length = so->so_rcv.sb_ccc;
 		goto restart;
 	}
 	/* Clear the held length since there is something to read */
@@ -5812,10 +5812,10 @@ get_more_data:
 					if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_SB_LOGGING_ENABLE) {
 						sctp_sblog(&so->so_rcv, control->do_not_ref_stcb ? NULL : stcb, SCTP_LOG_SBFREE, cp_len);
 					}
-					atomic_subtract_int(&so->so_rcv.sb_cc, cp_len);
+					atomic_subtract_int(&so->so_rcv.sb_ccc, cp_len);
 					if ((control->do_not_ref_stcb == 0) &&
 					    stcb) {
-						atomic_subtract_int(&stcb->asoc.sb_cc, cp_len);
+						atomic_subtract_int(&stcb->asoc.sb_ccc, cp_len);
 					}
 					copied_so_far += cp_len;
 					freed_so_far += cp_len;
@@ -5960,7 +5960,7 @@ wait_some_more:
 		    (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE))) {
 			goto release;
 		}
-		if (so->so_rcv.sb_cc <= control->held_length) {
+		if (so->so_rcv.sb_ccc <= control->held_length) {
 			error = sbwait(&so->so_rcv);
 			if (error) {
 				goto release;
@@ -5987,8 +5987,8 @@ wait_some_more:
 				}
 				goto done_with_control;
 			}
-			if (so->so_rcv.sb_cc > held_length) {
-				control->held_length = so->so_rcv.sb_cc;
+			if (so->so_rcv.sb_ccc > held_length) {
+				control->held_length = so->so_rcv.sb_ccc;
 				held_length = 0;
 			}
 			goto wait_some_more;
@@ -6135,13 +6135,13 @@ out:
 			    freed_so_far,
 			    ((uio) ? (slen - uio->uio_resid) : slen),
 			    stcb->asoc.my_rwnd,
-			    so->so_rcv.sb_cc);
+			    so->so_rcv.sb_ccc);
 		} else {
 			sctp_misc_ints(SCTP_SORECV_DONE,
 			    freed_so_far,
 			    ((uio) ? (slen - uio->uio_resid) : slen),
 			    0,
-			    so->so_rcv.sb_cc);
+			    so->so_rcv.sb_ccc);
 		}
 	}
 stage_left:
Index: sys/netinet/sctp_usrreq.c
===================================================================
--- sys/netinet/sctp_usrreq.c	(.../head)	(revision 270879)
+++ sys/netinet/sctp_usrreq.c	(.../projects/sendfile)	(revision 270881)
@@ -586,7 +586,7 @@ sctp_must_try_again:
 	if (((flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) &&
 	    (atomic_cmpset_int(&inp->sctp_flags, flags, (flags | SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_CLOSE_IP)))) {
 		if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) ||
-		    (so->so_rcv.sb_cc > 0)) {
+		    (so->so_rcv.sb_ccc > 0)) {
 #ifdef SCTP_LOG_CLOSING
 			sctp_log_closing(inp, NULL, 13);
 #endif
@@ -751,7 +751,7 @@ sctp_disconnect(struct socket *so)
 			}
 			if (((so->so_options & SO_LINGER) &&
 			    (so->so_linger == 0)) ||
-			    (so->so_rcv.sb_cc > 0)) {
+			    (so->so_rcv.sb_ccc > 0)) {
 				if (SCTP_GET_STATE(asoc) !=
 				    SCTP_STATE_COOKIE_WAIT) {
 					/* Left with Data unread */
@@ -916,7 +916,7 @@ sctp_flush(struct socket *so, int how)
 		inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_CANT_READ;
 		SCTP_INP_READ_UNLOCK(inp);
 		SCTP_INP_WUNLOCK(inp);
-		so->so_rcv.sb_cc = 0;
+		so->so_rcv.sb_ccc = 0;
 		so->so_rcv.sb_mbcnt = 0;
 		so->so_rcv.sb_mb = NULL;
 	}
@@ -925,7 +925,7 @@ sctp_flush(struct socket *so, int how)
 		 * First make sure the sb will be happy, we don't use these
 		 * except maybe the count
 		 */
-		so->so_snd.sb_cc = 0;
+		so->so_snd.sb_ccc = 0;
 		so->so_snd.sb_mbcnt = 0;
 		so->so_snd.sb_mb = NULL;
 
Index: sys/netinet/sctputil.h
===================================================================
--- sys/netinet/sctputil.h	(.../head)	(revision 270879)
+++ sys/netinet/sctputil.h	(.../projects/sendfile)	(revision 270881)
@@ -286,10 +286,10 @@ do { \
 		} \
 		if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
 		    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
-			if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { \
-				atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_cc), tp1->book_size); \
+			if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { \
+				atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_ccc), tp1->book_size); \
 			} else { \
-				stcb->sctp_socket->so_snd.sb_cc = 0; \
+				stcb->sctp_socket->so_snd.sb_ccc = 0; \
 			} \
 		} \
 	} \
@@ -307,10 +307,10 @@ do { \
 		} \
 		if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
 		    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
-			if (stcb->sctp_socket->so_snd.sb_cc >= sp->length) { \
-				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc,sp->length); \
+			if (stcb->sctp_socket->so_snd.sb_ccc >= sp->length) { \
+				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc,sp->length); \
 			} else { \
-				stcb->sctp_socket->so_snd.sb_cc = 0; \
+				stcb->sctp_socket->so_snd.sb_ccc = 0; \
 			} \
 		} \
 	} \
@@ -322,7 +322,7 @@ do { \
 	if ((stcb->sctp_socket != NULL) && \
 	    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
 	     (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
-		atomic_add_int(&stcb->sctp_socket->so_snd.sb_cc,sz); \
+		atomic_add_int(&stcb->sctp_socket->so_snd.sb_ccc,sz); \
 	} \
 } while (0)
 
Index: sys/netinet/sctp_input.c
===================================================================
--- sys/netinet/sctp_input.c	(.../head)	(revision 270879)
+++ sys/netinet/sctp_input.c	(.../projects/sendfile)	(revision 270881)
@@ -1044,7 +1044,7 @@ sctp_handle_shutdown_ack(struct sctp_shutdown_ack_
 	if (stcb->sctp_socket) {
 		if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 		    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) {
-			stcb->sctp_socket->so_snd.sb_cc = 0;
+			stcb->sctp_socket->so_snd.sb_ccc = 0;
 		}
 		sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 	}
Index: sys/netinet/sctp_output.c
===================================================================
--- sys/netinet/sctp_output.c	(.../head)	(revision 270879)
+++ sys/netinet/sctp_output.c	(.../projects/sendfile)	(revision 270881)
@@ -7257,7 +7257,7 @@ one_more_time:
 			if ((stcb->sctp_socket != NULL) && \
 			    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 			    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) {
-				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc, sp->length);
+				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc, sp->length);
 			}
 			if (sp->data) {
 				sctp_m_freem(sp->data);
@@ -11537,7 +11537,7 @@ jump_out:
 		drp->current_onq = htonl(asoc->size_on_reasm_queue +
 		    asoc->size_on_all_streams +
 		    asoc->my_rwnd_control_len +
-		    stcb->sctp_socket->so_rcv.sb_cc);
+		    stcb->sctp_socket->so_rcv.sb_ccc);
 	} else {
 		/*-
 		 * If my rwnd is 0, possibly from mbuf depletion as well as
Index: sys/netinet/sctp_pcb.c
===================================================================
--- sys/netinet/sctp_pcb.c	(.../head)	(revision 270879)
+++ sys/netinet/sctp_pcb.c	(.../projects/sendfile)	(revision 270881)
@@ -3407,7 +3407,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi
 			if ((asoc->asoc.size_on_reasm_queue > 0) ||
 			    (asoc->asoc.control_pdapi) ||
 			    (asoc->asoc.size_on_all_streams > 0) ||
-			    (so && (so->so_rcv.sb_cc > 0))) {
+			    (so && (so->so_rcv.sb_ccc > 0))) {
 				/* Left with Data unread */
 				struct mbuf *op_err;
 
@@ -3635,7 +3635,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi
 		TAILQ_REMOVE(&inp->read_queue, sq, next);
 		sctp_free_remote_addr(sq->whoFrom);
 		if (so)
-			so->so_rcv.sb_cc -= sq->length;
+			so->so_rcv.sb_ccc -= sq->length;
 		if (sq->data) {
 			sctp_m_freem(sq->data);
 			sq->data = NULL;
@@ -4863,7 +4863,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, struct sct
 			inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED;
 			if (so) {
 				SOCK_LOCK(so);
-				if (so->so_rcv.sb_cc == 0) {
+				if (so->so_rcv.sb_ccc == 0) {
 					so->so_state &= ~(SS_ISCONNECTING |
 					    SS_ISDISCONNECTING |
 					    SS_ISCONFIRMING |
Index: sys/netinet/sctp_pcb.h
===================================================================
--- sys/netinet/sctp_pcb.h	(.../head)	(revision 270879)
+++ sys/netinet/sctp_pcb.h	(.../projects/sendfile)	(revision 270881)
@@ -369,7 +369,7 @@ struct sctp_inpcb {
 	}     ip_inp;
 
 
-	/* Socket buffer lock protects read_queue and of course sb_cc */
+	/* Socket buffer lock protects read_queue and of course sb_ccc */
 	struct sctp_readhead read_queue;
 
 	              LIST_ENTRY(sctp_inpcb) sctp_list;	/* lists all endpoints */
Index: sys/netinet/tcp_input.c
===================================================================
--- sys/netinet/tcp_input.c	(.../head)	(revision 270879)
+++ sys/netinet/tcp_input.c	(.../projects/sendfile)	(revision 270881)
@@ -1734,7 +1734,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
 					tcp_timer_activate(tp, TT_REXMT,
 						      tp->t_rxtcur);
 				sowwakeup(so);
-				if (so->so_snd.sb_cc)
+				if (sbavail(&so->so_snd))
 					(void) tcp_output(tp);
 				goto check_delack;
 			}
@@ -1844,7 +1844,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
 					    newsize, so, NULL))
 						so->so_rcv.sb_flags &= ~SB_AUTOSIZE;
 				m_adj(m, drop_hdrlen);	/* delayed header drop */
-				sbappendstream_locked(&so->so_rcv, m);
+				sbappendstream_locked(&so->so_rcv, m, 0);
 			}
 			/* NB: sorwakeup_locked() does an implicit unlock. */
 			sorwakeup_locked(so);
@@ -2548,7 +2548,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
 					 * Otherwise we would send pure ACKs.
 					 */
 					SOCKBUF_LOCK(&so->so_snd);
-					avail = so->so_snd.sb_cc -
+					avail = sbavail(&so->so_snd) -
 					    (tp->snd_nxt - tp->snd_una);
 					SOCKBUF_UNLOCK(&so->so_snd);
 					if (avail > 0)
@@ -2683,10 +2683,10 @@ process_ACK:
 		cc_ack_received(tp, th, CC_ACK);
 
 		SOCKBUF_LOCK(&so->so_snd);
-		if (acked > so->so_snd.sb_cc) {
-			tp->snd_wnd -= so->so_snd.sb_cc;
+		if (acked > sbavail(&so->so_snd)) {
+			tp->snd_wnd -= sbavail(&so->so_snd);
 			mfree = sbcut_locked(&so->so_snd,
-			    (int)so->so_snd.sb_cc);
+			    (int)sbavail(&so->so_snd));
 			ourfinisacked = 1;
 		} else {
 			mfree = sbcut_locked(&so->so_snd, acked);
@@ -2812,7 +2812,7 @@ step6:
 		 * actually wanting to send this much urgent data.
 		 */
 		SOCKBUF_LOCK(&so->so_rcv);
-		if (th->th_urp + so->so_rcv.sb_cc > sb_max) {
+		if (th->th_urp + sbavail(&so->so_rcv) > sb_max) {
 			th->th_urp = 0;			/* XXX */
 			thflags &= ~TH_URG;		/* XXX */
 			SOCKBUF_UNLOCK(&so->so_rcv);	/* XXX */
@@ -2834,7 +2834,7 @@ step6:
 		 */
 		if (SEQ_GT(th->th_seq+th->th_urp, tp->rcv_up)) {
 			tp->rcv_up = th->th_seq + th->th_urp;
-			so->so_oobmark = so->so_rcv.sb_cc +
+			so->so_oobmark = sbavail(&so->so_rcv) +
 			    (tp->rcv_up - tp->rcv_nxt) - 1;
 			if (so->so_oobmark == 0)
 				so->so_rcv.sb_state |= SBS_RCVATMARK;
@@ -2904,7 +2904,7 @@ dodata:							/* XXX */
 			if (so->so_rcv.sb_state & SBS_CANTRCVMORE)
 				m_freem(m);
 			else
-				sbappendstream_locked(&so->so_rcv, m);
+				sbappendstream_locked(&so->so_rcv, m, 0);
 			/* NB: sorwakeup_locked() does an implicit unlock. */
 			sorwakeup_locked(so);
 		} else {
Index: sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c
===================================================================
--- sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c	(.../head)	(revision 270879)
+++ sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c	(.../projects/sendfile)	(revision 270881)
@@ -1127,9 +1127,8 @@ ng_btsocket_l2cap_process_l2ca_write_rsp(struct ng
 	/*
  	 * Check if we have more data to send
  	 */
-
 	sbdroprecord(&pcb->so->so_snd);
-	if (pcb->so->so_snd.sb_cc > 0) {
+	if (sbavail(&pcb->so->so_snd) > 0) {
 		if (ng_btsocket_l2cap_send2(pcb) == 0)
 			ng_btsocket_l2cap_timeout(pcb);
 		else
@@ -2514,7 +2513,7 @@ ng_btsocket_l2cap_send2(ng_btsocket_l2cap_pcb_p pc
 	
 	mtx_assert(&pcb->pcb_mtx, MA_OWNED);
 
-	if (pcb->so->so_snd.sb_cc == 0)
+	if (sbavail(&pcb->so->so_snd) == 0)
 		return (EINVAL); /* XXX */
 
 	m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
Index: sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c
===================================================================
--- sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	(.../head)	(revision 270879)
+++ sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	(.../projects/sendfile)	(revision 270881)
@@ -3279,7 +3279,7 @@ ng_btsocket_rfcomm_pcb_send(ng_btsocket_rfcomm_pcb
 	}
 
 	for (error = 0, sent = 0; sent < limit; sent ++) { 
-		length = min(pcb->mtu, pcb->so->so_snd.sb_cc);
+		length = min(pcb->mtu, sbavail(&pcb->so->so_snd));
 		if (length == 0)
 			break;
 
Index: sys/netgraph/bluetooth/socket/ng_btsocket_sco.c
===================================================================
--- sys/netgraph/bluetooth/socket/ng_btsocket_sco.c	(.../head)	(revision 270879)
+++ sys/netgraph/bluetooth/socket/ng_btsocket_sco.c	(.../projects/sendfile)	(revision 270881)
@@ -906,7 +906,7 @@ ng_btsocket_sco_default_msg_input(struct ng_mesg *
 				sbdroprecord(&pcb->so->so_snd);
 
 			/* Send more if we have any */
-			if (pcb->so->so_snd.sb_cc > 0)
+			if (sbavail(&pcb->so->so_snd) > 0)
 				if (ng_btsocket_sco_send2(pcb) == 0)
 					ng_btsocket_sco_timeout(pcb);
 
@@ -1748,7 +1748,7 @@ ng_btsocket_sco_send2(ng_btsocket_sco_pcb_p pcb)
 	mtx_assert(&pcb->pcb_mtx, MA_OWNED);
 
 	while (pcb->rt->pending < pcb->rt->num_pkts &&
-	       pcb->so->so_snd.sb_cc > 0) {
+	       sbavail(&pcb->so->so_snd) > 0) {
 		/* Get a copy of the first packet on send queue */
 		m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
 		if (m == NULL) {
Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c
===================================================================
--- sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c	(.../head)	(revision 270879)
+++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c	(.../projects/sendfile)	(revision 270881)
@@ -183,7 +183,7 @@ sdp_post_recvs_needed(struct sdp_sock *ssk)
 	 * Compute bytes in the receive queue and socket buffer.
 	 */
 	bytes_in_process = (posted - SDP_MIN_TX_CREDITS) * buffer_size;
-	bytes_in_process += ssk->socket->so_rcv.sb_cc;
+	bytes_in_process += sbused(&ssk->socket->so_rcv);
 
 	return bytes_in_process < max_bytes;
 }
Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c
===================================================================
--- sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c	(.../head)	(revision 270879)
+++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c	(.../projects/sendfile)	(revision 270881)
@@ -747,7 +747,7 @@ sdp_start_disconnect(struct sdp_sock *ssk)
 		    ("sdp_start_disconnect: sdp_drop() returned NULL"));
 	} else {
 		soisdisconnecting(so);
-		unread = so->so_rcv.sb_cc;
+		unread = sbused(&so->so_rcv);
 		sbflush(&so->so_rcv);
 		sdp_usrclosed(ssk);
 		if (!(ssk->flags & SDP_DROPPED)) {
@@ -889,7 +889,7 @@ sdp_append(struct sdp_sock *ssk, struct sockbuf *s
 		m_adj(mb, SDP_HEAD_SIZE);
 		n->m_pkthdr.len += mb->m_pkthdr.len;
 		n->m_flags |= mb->m_flags & (M_PUSH | M_URG);
-		m_demote(mb, 1);
+		m_demote(mb, 1, 0);
 		sbcompress(sb, mb, sb->sb_mbtail);
 		return;
 	}
@@ -1259,7 +1259,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
 	/* We will never ever get anything unless we are connected. */
 	if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
 		/* When disconnecting there may be still some data left. */
-		if (sb->sb_cc > 0)
+		if (sbavail(sb))
 			goto deliver;
 		if (!(so->so_state & SS_ISDISCONNECTED))
 			error = ENOTCONN;
@@ -1267,7 +1267,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
 	}
 
 	/* Socket buffer is empty and we shall not block. */
-	if (sb->sb_cc == 0 &&
+	if (sbavail(sb) == 0 &&
 	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
 		error = EAGAIN;
 		goto out;
@@ -1278,7 +1278,7 @@ restart:
 
 	/* Abort if socket has reported problems. */
 	if (so->so_error) {
-		if (sb->sb_cc > 0)
+		if (sbavail(sb))
 			goto deliver;
 		if (oresid > uio->uio_resid)
 			goto out;
@@ -1290,7 +1290,7 @@ restart:
 
 	/* Door is closed.  Deliver what is left, if any. */
 	if (sb->sb_state & SBS_CANTRCVMORE) {
-		if (sb->sb_cc > 0)
+		if (sbavail(sb))
 			goto deliver;
 		else
 			goto out;
@@ -1297,18 +1297,18 @@ restart:
 	}
 
 	/* Socket buffer got some data that we shall deliver now. */
-	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
+	if (sbavail(sb) && !(flags & MSG_WAITALL) &&
 	    ((so->so_state & SS_NBIO) ||
 	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
-	     sb->sb_cc >= sb->sb_lowat ||
-	     sb->sb_cc >= uio->uio_resid ||
-	     sb->sb_cc >= sb->sb_hiwat) ) {
+	     sbavail(sb) >= sb->sb_lowat ||
+	     sbavail(sb) >= uio->uio_resid ||
+	     sbavail(sb) >= sb->sb_hiwat) ) {
 		goto deliver;
 	}
 
 	/* On MSG_WAITALL we must wait until all data or error arrives. */
 	if ((flags & MSG_WAITALL) &&
-	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat))
+	    (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_lowat))
 		goto deliver;
 
 	/*
@@ -1322,7 +1322,7 @@ restart:
 
 deliver:
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
-	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
+	KASSERT(sbavail(sb), ("%s: sockbuf empty", __func__));
 	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
 
 	/* Statistics. */
@@ -1330,7 +1330,7 @@ deliver:
 		uio->uio_td->td_ru.ru_msgrcv++;
 
 	/* Fill uio until full or current end of socket buffer is reached. */
-	len = min(uio->uio_resid, sb->sb_cc);
+	len = min(uio->uio_resid, sbavail(sb));
 	if (mp0 != NULL) {
 		/* Dequeue as many mbufs as possible. */
 		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
@@ -1510,7 +1510,7 @@ sdp_urg(struct sdp_sock *ssk, struct mbuf *mb)
 	if (so == NULL)
 		return;
 
-	so->so_oobmark = so->so_rcv.sb_cc + mb->m_pkthdr.len - 1;
+	so->so_oobmark = sbused(&so->so_rcv) + mb->m_pkthdr.len - 1;
 	sohasoutofband(so);
 	ssk->oobflags &= ~(SDP_HAVEOOB | SDP_HADOOB);
 	if (!(so->so_options & SO_OOBINLINE)) {
Index: sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c
===================================================================
--- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c	(.../head)	(revision 270879)
+++ sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c	(.../projects/sendfile)	(revision 270881)
@@ -445,8 +445,8 @@ t3_push_frames(struct socket *so, int req_completi
 	 * Autosize the send buffer.
 	 */
 	if (snd->sb_flags & SB_AUTOSIZE && VNET(tcp_do_autosndbuf)) {
-		if (snd->sb_cc >= (snd->sb_hiwat / 8 * 7) &&
-		    snd->sb_cc < VNET(tcp_autosndbuf_max)) {
+		if (sbused(snd) >= (snd->sb_hiwat / 8 * 7) &&
+		    sbused(snd) < VNET(tcp_autosndbuf_max)) {
 			if (!sbreserve_locked(snd, min(snd->sb_hiwat +
 			    VNET(tcp_autosndbuf_inc), VNET(tcp_autosndbuf_max)),
 			    so, curthread))
@@ -597,10 +597,10 @@ t3_rcvd(struct toedev *tod, struct tcpcb *tp)
 	INP_WLOCK_ASSERT(inp);
 
 	SOCKBUF_LOCK(so_rcv);
-	KASSERT(toep->tp_enqueued >= so_rcv->sb_cc,
-	    ("%s: so_rcv->sb_cc > enqueued", __func__));
-	toep->tp_rx_credits += toep->tp_enqueued - so_rcv->sb_cc;
-	toep->tp_enqueued = so_rcv->sb_cc;
+	KASSERT(toep->tp_enqueued >= sbused(so_rcv),
+	    ("%s: sbused(so_rcv) > enqueued", __func__));
+	toep->tp_rx_credits += toep->tp_enqueued - sbused(so_rcv);
+	toep->tp_enqueued = sbused(so_rcv);
 	SOCKBUF_UNLOCK(so_rcv);
 
 	must_send = toep->tp_rx_credits + 16384 >= tp->rcv_wnd;
@@ -1199,7 +1199,7 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *r
 	}
 
 	toep->tp_enqueued += m->m_pkthdr.len;
-	sbappendstream_locked(so_rcv, m);
+	sbappendstream_locked(so_rcv, m, 0);
 	sorwakeup_locked(so);
 	SOCKBUF_UNLOCK_ASSERT(so_rcv);
 
@@ -1768,7 +1768,7 @@ wr_ack(struct toepcb *toep, struct mbuf *m)
 		so_sowwakeup_locked(so);
 	}
 
-	if (snd->sb_sndptroff < snd->sb_cc)
+	if (snd->sb_sndptroff < sbused(snd))
 		t3_push_frames(so, 0);
 
 out_free:
Index: sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c
===================================================================
--- sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c	(.../head)	(revision 270879)
+++ sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c	(.../projects/sendfile)	(revision 270881)
@@ -1507,11 +1507,11 @@ process_data(struct iwch_ep *ep)
 		process_mpa_request(ep);
 		break;
 	default:
-		if (ep->com.so->so_rcv.sb_cc) 
+		if (sbavail(&ep->com.so->so_rcv)) 
 			printf("%s Unexpected streaming data."
 			       " ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n",
 			       __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state,
-			       ep->com.so->so_rcv.sb_cc, ep->com.so->so_rcv.sb_mb);
+			       sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb);
 		break;
 	}
 	return;
Index: sys/dev/cxgbe/tom/t4_cpl_io.c
===================================================================
--- sys/dev/cxgbe/tom/t4_cpl_io.c	(.../head)	(revision 270879)
+++ sys/dev/cxgbe/tom/t4_cpl_io.c	(.../projects/sendfile)	(revision 270881)
@@ -365,15 +365,15 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp)
 	INP_WLOCK_ASSERT(inp);
 
 	SOCKBUF_LOCK(sb);
-	KASSERT(toep->sb_cc >= sb->sb_cc,
+	KASSERT(toep->sb_cc >= sbused(sb),
 	    ("%s: sb %p has more data (%d) than last time (%d).",
-	    __func__, sb, sb->sb_cc, toep->sb_cc));
+	    __func__, sb, sbused(sb), toep->sb_cc));
 	if (toep->ulp_mode == ULP_MODE_ISCSI) {
 		toep->rx_credits += toep->sb_cc;
 		toep->sb_cc = 0;
 	} else {
-		toep->rx_credits += toep->sb_cc - sb->sb_cc;
-		toep->sb_cc = sb->sb_cc;
+		toep->rx_credits += toep->sb_cc - sbused(sb);
+		toep->sb_cc = sbused(sb);
 	}
 	credits = toep->rx_credits;
 	SOCKBUF_UNLOCK(sb);
@@ -1079,15 +1079,15 @@ do_peer_close(struct sge_iq *iq, const struct rss_
 		tp->rcv_nxt = be32toh(cpl->rcv_nxt);
 		toep->ddp_flags &= ~(DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE);
 
-		KASSERT(toep->sb_cc >= sb->sb_cc,
+		KASSERT(toep->sb_cc >= sbused(sb),
 		    ("%s: sb %p has more data (%d) than last time (%d).",
-		    __func__, sb, sb->sb_cc, toep->sb_cc));
-		toep->rx_credits += toep->sb_cc - sb->sb_cc;
+		    __func__, sb, sbused(sb), toep->sb_cc));
+		toep->rx_credits += toep->sb_cc - sbused(sb);
 #ifdef USE_DDP_RX_FLOW_CONTROL
 		toep->rx_credits -= m->m_len;	/* adjust for F_RX_FC_DDP */
 #endif
-		sbappendstream_locked(sb, m);
-		toep->sb_cc = sb->sb_cc;
+		sbappendstream_locked(sb, m, 0);
+		toep->sb_cc = sbused(sb);
 	}
 	socantrcvmore_locked(so);	/* unlocks the sockbuf */
 
@@ -1582,12 +1582,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_hea
 		}
 	}
 
-	KASSERT(toep->sb_cc >= sb->sb_cc,
+	KASSERT(toep->sb_cc >= sbused(sb),
 	    ("%s: sb %p has more data (%d) than last time (%d).",
-	    __func__, sb, sb->sb_cc, toep->sb_cc));
-	toep->rx_credits += toep->sb_cc - sb->sb_cc;
-	sbappendstream_locked(sb, m);
-	toep->sb_cc = sb->sb_cc;
+	    __func__, sb, sbused(sb), toep->sb_cc));
+	toep->rx_credits += toep->sb_cc - sbused(sb);
+	sbappendstream_locked(sb, m, 0);
+	toep->sb_cc = sbused(sb);
 	sorwakeup_locked(so);
 	SOCKBUF_UNLOCK_ASSERT(sb);
 
Index: sys/dev/cxgbe/tom/t4_ddp.c
===================================================================
--- sys/dev/cxgbe/tom/t4_ddp.c	(.../head)	(revision 270879)
+++ sys/dev/cxgbe/tom/t4_ddp.c	(.../projects/sendfile)	(revision 270881)
@@ -224,15 +224,15 @@ insert_ddp_data(struct toepcb *toep, uint32_t n)
 	tp->rcv_wnd -= n;
 #endif
 
-	KASSERT(toep->sb_cc >= sb->sb_cc,
+	KASSERT(toep->sb_cc >= sbused(sb),
 	    ("%s: sb %p has more data (%d) than last time (%d).",
-	    __func__, sb, sb->sb_cc, toep->sb_cc));
-	toep->rx_credits += toep->sb_cc - sb->sb_cc;
+	    __func__, sb, sbused(sb), toep->sb_cc));
+	toep->rx_credits += toep->sb_cc - sbused(sb);
 #ifdef USE_DDP_RX_FLOW_CONTROL
 	toep->rx_credits -= n;	/* adjust for F_RX_FC_DDP */
 #endif
-	sbappendstream_locked(sb, m);
-	toep->sb_cc = sb->sb_cc;
+	sbappendstream_locked(sb, m, 0);
+	toep->sb_cc = sbused(sb);
 }
 
 /* SET_TCB_FIELD sent as a ULP command looks like this */
@@ -459,15 +459,15 @@ handle_ddp_data(struct toepcb *toep, __be32 ddp_re
 	else
 		discourage_ddp(toep);
 
-	KASSERT(toep->sb_cc >= sb->sb_cc,
+	KASSERT(toep->sb_cc >= sbused(sb),
 	    ("%s: sb %p has more data (%d) than last time (%d).",
-	    __func__, sb, sb->sb_cc, toep->sb_cc));
-	toep->rx_credits += toep->sb_cc - sb->sb_cc;
+	    __func__, sb, sbused(sb), toep->sb_cc));
+	toep->rx_credits += toep->sb_cc - sbused(sb);
 #ifdef USE_DDP_RX_FLOW_CONTROL
 	toep->rx_credits -= len;	/* adjust for F_RX_FC_DDP */
 #endif
-	sbappendstream_locked(sb, m);
-	toep->sb_cc = sb->sb_cc;
+	sbappendstream_locked(sb, m, 0);
+	toep->sb_cc = sbused(sb);
 wakeup:
 	KASSERT(toep->ddp_flags & db_flag,
 	    ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x, report 0x%x",
@@ -908,7 +908,7 @@ handle_ddp(struct socket *so, struct uio *uio, int
 #endif
 
 	/* XXX: too eager to disable DDP, could handle NBIO better than this. */
-	if (sb->sb_cc >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres ||
+	if (sbused(sb) >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres ||
 	    uio->uio_resid > MAX_DDP_BUFFER_SIZE || uio->uio_iovcnt > 1 ||
 	    so->so_state & SS_NBIO || flags & (MSG_DONTWAIT | MSG_NBIO) ||
 	    error || so->so_error || sb->sb_state & SBS_CANTRCVMORE)
@@ -946,7 +946,7 @@ handle_ddp(struct socket *so, struct uio *uio, int
 	 * payload.
 	 */
 	ddp_flags = select_ddp_flags(so, flags, db_idx);
-	wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sb->sb_cc, ddp_flags);
+	wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sbused(sb), ddp_flags);
 	if (wr == NULL) {
 		/*
 		 * Just unhold the pages.  The DDP buffer's software state is
@@ -971,8 +971,9 @@ handle_ddp(struct socket *so, struct uio *uio, int
 	 */
 	rc = sbwait(sb);
 	while (toep->ddp_flags & buf_flag) {
+		/* XXXGL: shouldn't here be sbwait() call? */
 		sb->sb_flags |= SB_WAIT;
-		msleep(&sb->sb_cc, &sb->sb_mtx, PSOCK , "sbwait", 0);
+		msleep(&sb->sb_acc, &sb->sb_mtx, PSOCK , "sbwait", 0);
 	}
 	unwire_ddp_buffer(db);
 	return (rc);
@@ -1134,8 +1135,8 @@ restart:
 
 		/* uio should be just as it was at entry */
 		KASSERT(oresid == uio->uio_resid,
-		    ("%s: oresid = %d, uio_resid = %zd, sb_cc = %d",
-		    __func__, oresid, uio->uio_resid, sb->sb_cc));
+		    ("%s: oresid = %d, uio_resid = %zd, sbused = %d",
+		    __func__, oresid, uio->uio_resid, sbused(sb)));
 
 		error = handle_ddp(so, uio, flags, 0);
 		ddp_handled = 1;
@@ -1145,7 +1146,7 @@ restart:
 
 	/* Abort if socket has reported problems. */
 	if (so->so_error) {
-		if (sb->sb_cc > 0)
+		if (sbused(sb))
 			goto deliver;
 		if (oresid > uio->uio_resid)
 			goto out;
@@ -1157,7 +1158,7 @@ restart:
 
 	/* Door is closed.  Deliver what is left, if any. */
 	if (sb->sb_state & SBS_CANTRCVMORE) {
-		if (sb->sb_cc > 0)
+		if (sbused(sb))
 			goto deliver;
 		else
 			goto out;
@@ -1164,7 +1165,7 @@ restart:
 	}
 
 	/* Socket buffer is empty and we shall not block. */
-	if (sb->sb_cc == 0 &&
+	if (sbused(sb) == 0 &&
 	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
 		error = EAGAIN;
 		goto out;
@@ -1171,18 +1172,18 @@ restart:
 	}
 
 	/* Socket buffer got some data that we shall deliver now. */
-	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
+	if (sbused(sb) && !(flags & MSG_WAITALL) &&
 	    ((sb->sb_flags & SS_NBIO) ||
 	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
-	     sb->sb_cc >= sb->sb_lowat ||
-	     sb->sb_cc >= uio->uio_resid ||
-	     sb->sb_cc >= sb->sb_hiwat) ) {
+	     sbused(sb) >= sb->sb_lowat ||
+	     sbused(sb) >= uio->uio_resid ||
+	     sbused(sb) >= sb->sb_hiwat) ) {
 		goto deliver;
 	}
 
 	/* On MSG_WAITALL we must wait until all data or error arrives. */
 	if ((flags & MSG_WAITALL) &&
-	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat))
+	    (sbused(sb) >= uio->uio_resid || sbused(sb) >= sb->sb_lowat))
 		goto deliver;
 
 	/*
@@ -1201,7 +1202,7 @@ restart:
 
 deliver:
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
-	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
+	KASSERT(sbused(sb) > 0, ("%s: sockbuf empty", __func__));
 	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
 
 	if (sb->sb_flags & SB_DDP_INDICATE && !ddp_handled)
@@ -1212,7 +1213,7 @@ deliver:
 		uio->uio_td->td_ru.ru_msgrcv++;
 
 	/* Fill uio until full or current end of socket buffer is reached. */
-	len = min(uio->uio_resid, sb->sb_cc);
+	len = min(uio->uio_resid, sbused(sb));
 	if (mp0 != NULL) {
 		/* Dequeue as many mbufs as possible. */
 		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
Index: sys/dev/cxgbe/iw_cxgbe/cm.c
===================================================================
--- sys/dev/cxgbe/iw_cxgbe/cm.c	(.../head)	(revision 270879)
+++ sys/dev/cxgbe/iw_cxgbe/cm.c	(.../projects/sendfile)	(revision 270881)
@@ -584,8 +584,8 @@ process_data(struct c4iw_ep *ep)
 {
 	struct sockaddr_in *local, *remote;
 
-	CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sb_cc %d", __func__,
-	    ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc);
+	CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__,
+	    ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv));
 
 	switch (state_read(&ep->com)) {
 	case MPA_REQ_SENT:
@@ -601,11 +601,11 @@ process_data(struct c4iw_ep *ep)
 		process_mpa_request(ep);
 		break;
 	default:
-		if (ep->com.so->so_rcv.sb_cc)
-			log(LOG_ERR, "%s: Unexpected streaming data.  "
-			    "ep %p, state %d, so %p, so_state 0x%x, sb_cc %u\n",
+		if (sbused(&ep->com.so->so_rcv))
+			log(LOG_ERR, "%s: Unexpected streaming data. ep %p, "
+			    "state %d, so %p, so_state 0x%x, sbused %u\n",
 			    __func__, ep, state_read(&ep->com), ep->com.so,
-			    ep->com.so->so_state, ep->com.so->so_rcv.sb_cc);
+			    ep->com.so->so_state, sbused(&ep->com.so->so_rcv));
 		break;
 	}
 }
Index: sys/dev/iscsi/icl.c
===================================================================
--- sys/dev/iscsi/icl.c	(.../head)	(revision 270879)
+++ sys/dev/iscsi/icl.c	(.../projects/sendfile)	(revision 270881)
@@ -758,7 +758,7 @@ icl_receive_thread(void *arg)
 		 * is enough data received to read the PDU.
 		 */
 		SOCKBUF_LOCK(&so->so_rcv);
-		available = so->so_rcv.sb_cc;
+		available = sbavail(&so->so_rcv);
 		if (available < ic->ic_receive_len) {
 			so->so_rcv.sb_lowat = ic->ic_receive_len;
 			cv_wait(&ic->ic_receive_cv, &so->so_rcv.sb_mtx);
Index: sys/dev/ti/if_ti.c
===================================================================
--- sys/dev/ti/if_ti.c	(.../head)	(revision 270879)
+++ sys/dev/ti/if_ti.c	(.../projects/sendfile)	(revision 270881)
@@ -1637,7 +1637,7 @@ ti_newbuf_jumbo(struct ti_softc *sc, int idx, stru
 			m[i]->m_data = (void *)sf_buf_kva(sf[i]);
 			m[i]->m_len = PAGE_SIZE;
 			MEXTADD(m[i], sf_buf_kva(sf[i]), PAGE_SIZE,
-			    sf_buf_mext, (void*)sf_buf_kva(sf[i]), sf[i],
+			    sf_mext_free, (void*)sf_buf_kva(sf[i]), sf[i],
 			    0, EXT_DISPOSABLE);
 			m[i]->m_next = m[i+1];
 		}
@@ -1702,7 +1702,7 @@ nobufs:
 		if (m[i])
 			m_freem(m[i]);
 		if (sf[i])
-			sf_buf_mext((void *)sf_buf_kva(sf[i]), sf[i]);
+			sf_mext_free((void *)sf_buf_kva(sf[i]), sf[i]);
 	}
 	return (ENOBUFS);
 }
Index: sys/vm/vm_pager.h
===================================================================
--- sys/vm/vm_pager.h	(.../head)	(revision 270879)
+++ sys/vm/vm_pager.h	(.../projects/sendfile)	(revision 270881)
@@ -51,18 +51,21 @@ typedef vm_object_t pgo_alloc_t(void *, vm_ooffset
     struct ucred *);
 typedef void pgo_dealloc_t(vm_object_t);
 typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int);
+typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int,
+    void(*)(void *), void *);
 typedef void pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *);
 typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *);
 typedef void pgo_pageunswapped_t(vm_page_t);
 
 struct pagerops {
-	pgo_init_t	*pgo_init;		/* Initialize pager. */
-	pgo_alloc_t	*pgo_alloc;		/* Allocate pager. */
-	pgo_dealloc_t	*pgo_dealloc;		/* Disassociate. */
-	pgo_getpages_t	*pgo_getpages;		/* Get (read) page. */
-	pgo_putpages_t	*pgo_putpages;		/* Put (write) page. */
-	pgo_haspage_t	*pgo_haspage;		/* Does pager have page? */
-	pgo_pageunswapped_t *pgo_pageunswapped;
+	pgo_init_t		*pgo_init;		/* Initialize pager. */
+	pgo_alloc_t		*pgo_alloc;		/* Allocate pager. */
+	pgo_dealloc_t		*pgo_dealloc;		/* Disassociate. */
+	pgo_getpages_t		*pgo_getpages;		/* Get (read) page. */
+	pgo_getpages_async_t	*pgo_getpages_async;	/* Get page asyncly. */
+	pgo_putpages_t		*pgo_putpages;		/* Put (write) page. */
+	pgo_haspage_t		*pgo_haspage;		/* Query page. */
+	pgo_pageunswapped_t	*pgo_pageunswapped;
 };
 
 extern struct pagerops defaultpagerops;
@@ -103,6 +106,8 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v
 void vm_pager_bufferinit(void);
 void vm_pager_deallocate(vm_object_t);
 static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int);
+static __inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int,
+    int, void(*)(void *), void *);
 static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *);
 void vm_pager_init(void);
 vm_object_t vm_pager_object_lookup(struct pagerlst *, void *);
@@ -131,6 +136,27 @@ vm_pager_get_pages(
 	return (r);
 }
 
+static __inline int
+vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count,
+    int reqpage, void (*iodone)(void *), void *arg)
+{
+	int r;
+
+	VM_OBJECT_ASSERT_WLOCKED(object);
+
+	if (*pagertab[object->type]->pgo_getpages_async == NULL) {
+		/* Emulate async operation. */
+		r = vm_pager_get_pages(object, m, count, reqpage);
+		VM_OBJECT_WUNLOCK(object);
+		(iodone)(arg);
+		VM_OBJECT_WLOCK(object);
+	} else
+		r = (*pagertab[object->type]->pgo_getpages_async)(object, m,
+		    count, reqpage, iodone, arg);
+
+	return (r);
+}
+
 static __inline void
 vm_pager_put_pages(
 	vm_object_t object,
Index: sys/vm/vm_page.c
===================================================================
--- sys/vm/vm_page.c	(.../head)	(revision 270879)
+++ sys/vm/vm_page.c	(.../projects/sendfile)	(revision 270881)
@@ -2692,6 +2692,8 @@ retrylookup:
 		sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ?
 		    vm_page_xbusied(m) : vm_page_busied(m);
 		if (sleep) {
+			if (allocflags & VM_ALLOC_NOWAIT)
+				return (NULL);
 			/*
 			 * Reference the page before unlocking and
 			 * sleeping so that the page daemon is less
@@ -2719,6 +2721,8 @@ retrylookup:
 	}
 	m = vm_page_alloc(object, pindex, allocflags & ~VM_ALLOC_IGN_SBUSY);
 	if (m == NULL) {
+		if (allocflags & VM_ALLOC_NOWAIT)
+			return (NULL);
 		VM_OBJECT_WUNLOCK(object);
 		VM_WAIT;
 		VM_OBJECT_WLOCK(object);
Index: sys/vm/vm_page.h
===================================================================
--- sys/vm/vm_page.h	(.../head)	(revision 270879)
+++ sys/vm/vm_page.h	(.../projects/sendfile)	(revision 270881)
@@ -391,6 +391,7 @@ vm_page_t PHYS_TO_VM_PAGE(vm_paddr_t pa);
 #define	VM_ALLOC_IGN_SBUSY	0x1000	/* vm_page_grab() only */
 #define	VM_ALLOC_NODUMP		0x2000	/* don't include in dump */
 #define	VM_ALLOC_SBUSY		0x4000	/* Shared busy the page */
+#define	VM_ALLOC_NOWAIT		0x8000	/* Return NULL instead of sleeping */
 
 #define	VM_ALLOC_COUNT_SHIFT	16
 #define	VM_ALLOC_COUNT(count)	((count) << VM_ALLOC_COUNT_SHIFT)
Index: sys/vm/vnode_pager.c
===================================================================
--- sys/vm/vnode_pager.c	(.../head)	(revision 270879)
+++ sys/vm/vnode_pager.c	(.../projects/sendfile)	(revision 270881)
@@ -83,6 +83,8 @@ static int vnode_pager_input_smlfs(vm_object_t obj
 static int vnode_pager_input_old(vm_object_t object, vm_page_t m);
 static void vnode_pager_dealloc(vm_object_t);
 static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int);
+static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int,
+    void(*)(void  *), void *);
 static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *);
 static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *);
 static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t,
@@ -92,6 +94,7 @@ struct pagerops vnodepagerops = {
 	.pgo_alloc =	vnode_pager_alloc,
 	.pgo_dealloc =	vnode_pager_dealloc,
 	.pgo_getpages =	vnode_pager_getpages,
+	.pgo_getpages_async = vnode_pager_getpages_async,
 	.pgo_putpages =	vnode_pager_putpages,
 	.pgo_haspage =	vnode_pager_haspage,
 };
@@ -664,6 +667,40 @@ vnode_pager_getpages(vm_object_t object, vm_page_t
 	return rtval;
 }
 
+static int
+vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count,
+    int reqpage, void (*iodone)(void *), void *arg)
+{
+	int rtval;
+	struct vnode *vp;
+	int bytes = count * PAGE_SIZE;
+
+	vp = object->handle;
+	VM_OBJECT_WUNLOCK(object);
+	rtval = VOP_GETPAGES_ASYNC(vp, m, bytes, reqpage, 0, iodone, arg);
+	KASSERT(rtval != EOPNOTSUPP,
+	    ("vnode_pager: FS getpages_async not implemented\n"));
+	VM_OBJECT_WLOCK(object);
+	return rtval;
+}
+
+struct getpages_softc {
+	vm_page_t *m;
+	struct buf *bp;
+	vm_object_t object;
+	vm_offset_t kva;
+	off_t foff;
+	int size;
+	int count;
+	int unmapped;
+	int reqpage;
+	void (*iodone)(void *);
+	void *arg;
+};
+
+int	vnode_pager_generic_getpages_done(struct getpages_softc *);
+void	vnode_pager_generic_getpages_done_async(struct buf *);
+
 /*
  * This is now called from local media FS's to operate against their
  * own vnodes if they fail to implement VOP_GETPAGES.
@@ -670,11 +707,11 @@ vnode_pager_getpages(vm_object_t object, vm_page_t
  */
 int
 vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount,
-    int reqpage)
+    int reqpage, void (*iodone)(void *), void *arg)
 {
 	vm_object_t object;
 	vm_offset_t kva;
-	off_t foff, tfoff, nextoff;
+	off_t foff;
 	int i, j, size, bsize, first;
 	daddr_t firstaddr, reqblock;
 	struct bufobj *bo;
@@ -684,6 +721,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
 	struct mount *mp;
 	int count;
 	int error;
+	int unmapped;
 
 	object = vp->v_object;
 	count = bytecount / PAGE_SIZE;
@@ -891,8 +929,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
 	 * requires mapped buffers.
 	 */
 	mp = vp->v_mount;
-	if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 &&
-	    unmapped_buf_allowed) {
+	unmapped = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS));
+	if (unmapped && unmapped_buf_allowed) {
 		bp->b_data = unmapped_buf;
 		bp->b_kvabase = unmapped_buf;
 		bp->b_offset = 0;
@@ -905,7 +943,6 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
 
 	/* build a minimal buffer header */
 	bp->b_iocmd = BIO_READ;
-	bp->b_iodone = bdone;
 	KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred"));
 	KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred"));
 	bp->b_rcred = crhold(curthread->td_ucred);
@@ -923,10 +960,88 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
 
 	/* do the input */
 	bp->b_iooffset = dbtob(bp->b_blkno);
-	bstrategy(bp);
 
-	bwait(bp, PVM, "vnread");
+	if (iodone) { /* async */
+		struct getpages_softc *sc;
 
+		sc = malloc(sizeof(*sc), M_TEMP, M_WAITOK);
+
+		sc->m = m;
+		sc->bp = bp;
+		sc->object = object;
+		sc->foff = foff;
+		sc->size = size;
+		sc->count = count;
+		sc->unmapped = unmapped;
+		sc->reqpage = reqpage;
+		sc->kva = kva;
+
+		sc->iodone = iodone;
+		sc->arg = arg;
+
+		bp->b_iodone = vnode_pager_generic_getpages_done_async;
+		bp->b_caller1 = sc;
+		BUF_KERNPROC(bp);
+		bstrategy(bp);
+		/* Good bye! */
+	} else {
+		struct getpages_softc sc;
+
+		sc.m = m;
+		sc.bp = bp;
+		sc.object = object;
+		sc.foff = foff;
+		sc.size = size;
+		sc.count = count;
+		sc.unmapped = unmapped;
+		sc.reqpage = reqpage;
+		sc.kva = kva;
+
+		bp->b_iodone = bdone;
+		bstrategy(bp);
+		bwait(bp, PVM, "vnread");
+		error = vnode_pager_generic_getpages_done(&sc);
+	}
+
+	return (error ? VM_PAGER_ERROR : VM_PAGER_OK);
+}
+
+void
+vnode_pager_generic_getpages_done_async(struct buf *bp)
+{
+	struct getpages_softc *sc = bp->b_caller1;
+	int error;
+
+	error = vnode_pager_generic_getpages_done(sc);
+
+	vm_page_xunbusy(sc->m[sc->reqpage]);
+
+	sc->iodone(sc->arg);
+
+	free(sc, M_TEMP);
+}
+
+int
+vnode_pager_generic_getpages_done(struct getpages_softc *sc)
+{
+	vm_object_t object;
+	vm_offset_t kva;
+	vm_page_t *m;
+	struct buf *bp;
+	off_t foff, tfoff, nextoff;
+	int i, size, count, unmapped, reqpage;
+	int error = 0;
+
+	m = sc->m;
+	bp = sc->bp;
+	object = sc->object;
+	foff = sc->foff;
+	size = sc->size;
+	count = sc->count;
+	unmapped = sc->unmapped;
+	reqpage = sc->reqpage;
+	kva = sc->kva;
+
 	if ((bp->b_ioflags & BIO_ERROR) != 0)
 		error = EIO;
 
@@ -939,7 +1054,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
 	}
 	if ((bp->b_flags & B_UNMAPPED) == 0)
 		pmap_qremove(kva, count);
-	if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) {
+	if (unmapped) {
 		bp->b_data = (caddr_t)kva;
 		bp->b_kvabase = (caddr_t)kva;
 		bp->b_flags &= ~B_UNMAPPED;
@@ -995,7 +1110,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
 	if (error) {
 		printf("vnode_pager_getpages: I/O read error\n");
 	}
-	return (error ? VM_PAGER_ERROR : VM_PAGER_OK);
+
+	return (error);
 }
 
 /*
Index: sys/vm/vnode_pager.h
===================================================================
--- sys/vm/vnode_pager.h	(.../head)	(revision 270879)
+++ sys/vm/vnode_pager.h	(.../projects/sendfile)	(revision 270881)
@@ -41,7 +41,7 @@
 #ifdef _KERNEL
 
 int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m,
-					  int count, int reqpage);
+    int count, int reqpage, void (*iodone)(void *), void *arg);
 int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m,
 					  int count, boolean_t sync,
 					  int *rtvals);
Index: usr.bin/netstat/inet.c
===================================================================
--- usr.bin/netstat/inet.c	(.../head)	(revision 270879)
+++ usr.bin/netstat/inet.c	(.../projects/sendfile)	(revision 270881)
@@ -137,7 +137,7 @@ pcblist_sysctl(int proto, const char *name, char *
 static void
 sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb)
 {
-	xsb->sb_cc = sb->sb_cc;
+	xsb->sb_cc = sb->sb_ccc;
 	xsb->sb_hiwat = sb->sb_hiwat;
 	xsb->sb_mbcnt = sb->sb_mbcnt;
 	xsb->sb_mcnt = sb->sb_mcnt;
@@ -479,7 +479,8 @@ protopr(u_long off, const char *name, int af1, int
 				printf("%6u %6u %6u ", tp->t_sndrexmitpack,
 				       tp->t_rcvoopack, tp->t_sndzerowin);
 		} else {
-			printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc);
+			printf("%6u %6u ",
+			    so->so_rcv.sb_cc, so->so_snd.sb_cc);
 		}
 		if (numeric_port) {
 			if (inp->inp_vflag & INP_IPV4) {
Index: usr.bin/netstat/netgraph.c
===================================================================
--- usr.bin/netstat/netgraph.c	(.../head)	(revision 270879)
+++ usr.bin/netstat/netgraph.c	(.../projects/sendfile)	(revision 270881)
@@ -119,7 +119,7 @@ netgraphprotopr(u_long off, const char *name, int
 		if (Aflag)
 			printf("%8lx ", (u_long) this);
 		printf("%-5.5s %6u %6u ",
-		    name, sockb.so_rcv.sb_cc, sockb.so_snd.sb_cc);
+		    name, sockb.so_rcv.sb_ccc, sockb.so_snd.sb_ccc);
 
 		/* Get info on associated node */
 		if (ngpcb.node_id == 0 || csock == -1)
Index: usr.bin/netstat/unix.c
===================================================================
--- usr.bin/netstat/unix.c	(.../head)	(revision 270879)
+++ usr.bin/netstat/unix.c	(.../projects/sendfile)	(revision 270881)
@@ -287,7 +287,8 @@ unixdomainpr(struct xunpcb *xunp, struct xsocket *
 	} else {
 		printf("%8lx %-6.6s %6u %6u %8lx %8lx %8lx %8lx",
 		    (long)so->so_pcb, socktype[so->so_type], so->so_rcv.sb_cc,
-		    so->so_snd.sb_cc, (long)unp->unp_vnode, (long)unp->unp_conn,
+		    so->so_snd.sb_cc, (long)unp->unp_vnode,
+		    (long)unp->unp_conn,
 		    (long)LIST_FIRST(&unp->unp_refs),
 		    (long)LIST_NEXT(unp, unp_reflink));
 	}
Index: usr.bin/systat/netstat.c
===================================================================
--- usr.bin/systat/netstat.c	(.../head)	(revision 270879)
+++ usr.bin/systat/netstat.c	(.../projects/sendfile)	(revision 270881)
@@ -333,8 +333,8 @@ enter_kvm(struct inpcb *inp, struct socket *so, in
 	struct netinfo *p;
 
 	if ((p = enter(inp, state, proto)) != NULL) {
-		p->ni_rcvcc = so->so_rcv.sb_cc;
-		p->ni_sndcc = so->so_snd.sb_cc;
+		p->ni_rcvcc = so->so_rcv.sb_ccc;
+		p->ni_sndcc = so->so_snd.sb_ccc;
 	}
 }
 
Index: usr.bin/bluetooth/btsockstat/btsockstat.c
===================================================================
--- usr.bin/bluetooth/btsockstat/btsockstat.c	(.../head)	(revision 270879)
+++ usr.bin/bluetooth/btsockstat/btsockstat.c	(.../projects/sendfile)	(revision 270881)
@@ -255,8 +255,8 @@ hcirawpr(kvm_t *kvmd, u_long addr)
 			(unsigned long) pcb.so,
 			(unsigned long) this,
 			pcb.flags,
-			so.so_rcv.sb_cc,
-			so.so_snd.sb_cc,
+			so.so_rcv.sb_ccc,
+			so.so_snd.sb_ccc,
 			pcb.addr.hci_node);
 	}
 } /* hcirawpr */
@@ -303,8 +303,8 @@ l2caprawpr(kvm_t *kvmd, u_long addr)
 "%-8lx %-8lx %6d %6d %-17.17s\n",
 			(unsigned long) pcb.so,
 			(unsigned long) this,
-			so.so_rcv.sb_cc,
-			so.so_snd.sb_cc,
+			so.so_rcv.sb_ccc,
+			so.so_snd.sb_ccc,
 			bdaddrpr(&pcb.src, NULL, 0));
 	}
 } /* l2caprawpr */
@@ -361,8 +361,8 @@ l2cappr(kvm_t *kvmd, u_long addr)
 		fprintf(stdout,
 "%-8lx %6d %6d %-17.17s/%-5d %-17.17s %-5d %s\n",
 			(unsigned long) this,
-			so.so_rcv.sb_cc,
-			so.so_snd.sb_cc,
+			so.so_rcv.sb_ccc,
+			so.so_snd.sb_ccc,
 			bdaddrpr(&pcb.src, local, sizeof(local)),
 			pcb.psm,
 			bdaddrpr(&pcb.dst, remote, sizeof(remote)),
@@ -467,8 +467,8 @@ rfcommpr(kvm_t *kvmd, u_long addr)
 		fprintf(stdout,
 "%-8lx %6d %6d %-17.17s %-17.17s %-4d %-4d %s\n",
 			(unsigned long) this,
-			so.so_rcv.sb_cc,
-			so.so_snd.sb_cc,
+			so.so_rcv.sb_ccc,
+			so.so_snd.sb_ccc,
 			bdaddrpr(&pcb.src, local, sizeof(local)),
 			bdaddrpr(&pcb.dst, remote, sizeof(remote)),
 			pcb.channel,

--hTiIB9CRvBOLTyqY--


