Date:      Mon, 30 Aug 2004 12:20:30 -0600 (MDT)
From:      "M. Warner Losh" <imp@bsdimp.com>
To:        scottl@freebsd.org
Cc:        freebsd-arch@freebsd.org
Subject:   Re: splxxx level?
Message-ID:  <20040830.122030.48201747.imp@bsdimp.com>
In-Reply-To: <41336DC8.7080808@freebsd.org>
References:  <4133682D.3000403@freebsd.org> <20040830.120124.28086427.imp@bsdimp.com> <41336DC8.7080808@freebsd.org>

In message: <41336DC8.7080808@freebsd.org>
            Scott Long <scottl@freebsd.org> writes:
: M. Warner Losh wrote:
: 
: > In message: <4133682D.3000403@freebsd.org>
: >             Scott Long <scottl@freebsd.org> writes:
: > : M. Warner Losh wrote:
: > : 
: > : > In message: <20040830.102606.130865377.imp@bsdimp.com>
: > : >             "M. Warner Losh" <imp@bsdimp.com> writes:
: > : > : In message: <41334C3B.4070101@freebsd.org>
: > : > :             Scott Long <scottl@freebsd.org> writes:
: > : > : : Sam wrote:
: > : > : : 
: > : > : : > Hello -
: > : > : : > 
: > : > : : > I'm almost to testing on my AoE driver for 4.x and have
: > : > : : > a question about interrupt priority levels.
: > : > : : > 
: > : > : : > There are currently three entry points into the driver:
: > : > : : > 
: > : > : : > a) strategy routine
: > : > : : > b) network frame reception routine
: > : > : : > c) timer rexmit routine
: > : > : : > 
: > : > : : > Any of the three can diddle with the device structure
: > : > : : > and thusly I need to ensure they're not running simultaneously.
: > : > : : > For example, the network reception can cause a buf to be completed
: > : > : : > and the rexmit timer can cause a buf to be failed.
: > : > : : > 
: > : > : : > So, what kind of contexts are the callout, strategy, and
: > : > : : > network soft interrupt called in?  Which splxxx will give
: > : > : : > one of them exclusive access to whatever they need?
: > : > : : > 
: > : > : : > Just as a reality check -- I am thinking about this correctly, right?
: > : > : : > 
: > : > : : > Cheers,
: > : > : : > 
: > : > : : > Sam
: > : > : : > 
: > : > : : 
: > : > : : With 4.x, only one CPU can be in the kernel at a time.  You won't have
: > : > : : to worry about multiple processes trying to get into strategy at the
: > : > : : same time and whatnot.  However, you can be preempted by your interrupt
: > : > : : handler or by a timeout or by a software interrupt like the netisr.  I
: > : > : : don't remember if your driver is for a specific piece of hardware or if
: > : > : : it's a generic layer that sits in between the network interface and the
: > : > : : block layer.  If it's for dedicated hardware then you'll need to define
: > : > : : an interrupt type in bus_setup_intr() and use that type for the spl
: > : > : : (i.e. INTR_TYPE_NET translates to splnet(), INTR_TYPE_BIO translates to
: > : > : : splbio(), etc).
: > : > : : 
: > : > : : The safe way to go is to protect all of your critical code sections with
: > : > : : the appropriate spl level regardless.  spls are very cheap and can be
: > : > : : set/reset from an interrupt context so there is little penalty in using
: > : > : : them liberally at first and then narrowing them down later.  Just make
: > : > : : sure that you don't leak any spl references, and don't hold an spl for so
: > : > : : long that it creates priority inversions.  Since the only interrupts and
: > : > : : timeouts that you'll likely be dealing with are network related,
: > : > : : splnet() is probably the right one to use.
: > : > : 
: > : > : splimp() is what you want to use, not splnet().  Yes, this is
: > : > : confusing, but it appears to be what all the other network drivers
: > : > : use.  None of them are using splnet() that I could find.  splimp() is
: > : > : also used by the mbuf routines to protect mbuf operations.
: > : > : 
: > : > : splnet() masks the software interrupts used by the network
: > : > : stack.
: > : > : 
: > : > : splimp() is splnet() plus the hardware interrupts, so it is more
: > : > : appropriate for blocking things called from the driver.  Especially one
: > : > : that's described as having timeouts.  If it is a network driver, you
: > : > : might consider using the timeout functionality in the net stack as
: > : > : opposed to the callout functions.  This makes it possible to have
: > : > : almost the entire driver w/o doing any spls (most of the network
: > : > : drivers in 4 don't do spl at all, except for entry points that are
: > : > : outside the scope of the network/interrupt entry points, e.g.
: > : > : device_suspend).
: > : > 
: > : > Ah, just saw 'AoE' in the reply.
: > : > 
: > : > This likely means that you are writing a network stack that's glued
: > : > into the system with the strategy routine.  Inside the strategy
: > : > routine, splbio() is likely what you need to hold while dealing with
: > : > the struct buf that's passed into that routine.  I'm not sure what
: > : > spl level you need to be at to call into the network code.  I'm
: > : > thinking it is splimp(), but I'm not 100% sure about that.  Stevens
: > : > will likely be a good resource for 4.x.
: > : > 
: > : > You may have to define a software interrupt to pass the packets to the
: > : > network code to make the spls work out correctly.
: > : > 
: > : > Warner
: > : 
: > : No, splbio() is not explicitly needed in this case.  You'll be
: > : dealing with a bioq that might need protection, but you have total
: > : control over that.  There is nothing else in the block layer that
: > : will interrupt you, not like the netisr in the network stack or the
: > : camisr in the CAM layer.
: > 
: > Why does da use splbio in its strategy routine then:
: > 
: > static void
: > dastrategy(struct buf *bp)
: > {
: > ...
: > 
: > 	/*
: > 	 * Mask interrupts so that the pack cannot be invalidated until
: > 	 * after we are in the queue.  Otherwise, we might not properly
: > 	 * clean up one of the buffers.
: > 	 */
: > 	s = splbio();
: > 	
: > ...(error cases)
: > 	/*
: > 	 * Place it in the queue of disk activities for this disk
: > 	 */
: > 	bufqdisksort(&softc->buf_queue, bp);
: > 
: > 	splx(s);
: > 	
: > 	/*
: > 	 * Schedule ourselves for performing the work.
: > 	 */
: > 	xpt_schedule(periph, /* XXX priority */1);
: > 
: > 	return;
: > ... (error cases)
: > }
: > 
: > Is that due to the ISR routine from the cam layer?  Or is the
: > softc->buf_queue the 'bioq' that you were talking about?
: > 
: > Warner
: 
: Well, CAM is special in that it expects that all of the hardware drivers
: underneath it will use INTR_TYPE_BIO for their interrupts.  That of
: course won't be true in the AoE case with network interface devices.
: Again, the only reason it is holding splbio() there is to protect the
: bioq during disksort.  It can be protected using whatever spl you want,
: likely splimp() in the case of AoE, or nothing at all if you ensure that
: the bioq won't be touched from an interrupt/timeout/callout context.

OK.  That makes perfect sense.  I wasn't sure if that was the reason,
or if there were other things that might also be mucking with the
bioq.  It is good to know that it is explicitly due to the SCSI host
adapter driver's ISR, running at splbio, calling back into the
peripheral driver's code.  Thanks for the clarification.

Warner


