Date:      Thu, 8 Dec 2011 17:34:30 -0800
From:      Chuck Tuffli <chuck@tuffli.net>
To:        freebsd-scsi <freebsd-scsi@freebsd.org>
Subject:   CAM locking question
Message-ID:  <CAM0tzX0KefAfzJx_iVME1BGhX%2BykahT0z1%2BVscSFxrT7OHxcNg@mail.gmail.com>

Hi -

I've been debugging a hang and am wondering whether it might be a CAM
problem. The setup is 8-STABLE, an FC initiator driver I'm developing,
and fio using the POSIX aio engine with a queue depth > 1 (i.e., many
concurrent IOs). The same setup runs without issue at a queue depth
of 1.

The symptom is that fio gets stuck in aio_suspend() waiting for
submitted IOs to complete, even though I've verified that the driver
has already completed the IOs in question.
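
For reproducing the symptom outside fio, the wait pattern can be
approximated in user space. This is a minimal sketch, assuming fio's
POSIX aio engine behaves roughly like this; submit_reads() and
wait_all_aio() are hypothetical helpers, not fio code. It queues
several reads at once (queue depth > 1) and then loops on
aio_suspend() until every request has left EINPROGRESS. Passing a
timeout to aio_suspend() turns a lost completion into an EAGAIN error
instead of an indefinite hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: queue `depth` concurrent reads of `len` bytes,
 * each from a consecutive offset of fd into consecutive slices of buf.
 * On failure, requests already submitted are left in flight. */
int submit_reads(int fd, struct aiocb cbs[], struct aiocb *reqs[],
                 char *buf, int depth, size_t len)
{
    for (int i = 0; i < depth; i++) {
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = buf + (size_t)i * len;
        cbs[i].aio_nbytes = len;
        cbs[i].aio_offset = (off_t)i * (off_t)len;
        cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE; /* poll only */
        if (aio_read(&cbs[i]) == -1)
            return -1;
        reqs[i] = &cbs[i];
    }
    return 0;
}

/* Hypothetical helper: wait until every request in reqs[] has finished.
 * Completed slots are set to NULL, which aio_suspend() ignores.
 * Returns 0 once all are reaped, or -1 if aio_suspend() fails, e.g.
 * with EAGAIN because nothing completed within timeout_sec. */
int wait_all_aio(struct aiocb *reqs[], int n, time_t timeout_sec)
{
    int pending = n;

    while (pending > 0) {
        struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };

        if (aio_suspend((const struct aiocb *const *)reqs, n, &ts) == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* EAGAIN: no progress before timeout */
        }
        pending = 0;
        for (int i = 0; i < n; i++) {
            if (reqs[i] == NULL)
                continue;
            if (aio_error(reqs[i]) == EINPROGRESS) {
                pending++;
            } else {
                (void)aio_return(reqs[i]); /* reap finished request */
                reqs[i] = NULL;
            }
        }
    }
    return 0;
}
```

With a completion that never arrives, wait_all_aio() returns -1 with
errno set to EAGAIN after timeout_sec instead of blocking forever the
way fio does.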

Using DTrace, it appears that camisr_runqueue() runs at the same time
as the driver is completing CCBs with xpt_done(), albeit on different
processors. From reading the code, camisr_runqueue() holds
CAM_SIM_LOCK() while it manipulates the sim_doneq list, but xpt_done()
appears to touch the same sim_doneq without taking that lock.

As an experiment, I added CAM_SIM_LOCK/CAM_SIM_UNLOCK around the
TAILQ_INSERT_TAIL() in xpt_done(). The hang, which previously
reproduced reliably within a minute or two, has not appeared in an
hour of running.
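
The locking discipline this experiment enforces can be modeled in user
space: one thread plays the xpt_done() role and appends to a TAILQ,
the caller plays the camisr_runqueue() role and drains it, and both
take the same mutex. This is a hedged sketch only; fake_sim, fake_ccb,
fake_done(), fake_runqueue() and run_model() are hypothetical
stand-ins, not CAM's structures, and the pthread mutex stands in for
the SIM lock. It uses the same <sys/queue.h> TAILQ macros the kernel
uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

#define NCCBS 100000

/* Hypothetical stand-ins shaped like a CCB header and a SIM done queue. */
struct fake_ccb {
    int id;
    TAILQ_ENTRY(fake_ccb) links;
};
TAILQ_HEAD(doneq_head, fake_ccb);

struct fake_sim {
    pthread_mutex_t lock;     /* plays the role of the SIM lock */
    struct doneq_head doneq;  /* plays the role of sim_doneq */
};

/* Completion path (xpt_done() role): insert only while holding the lock. */
void fake_done(struct fake_sim *sim, struct fake_ccb *ccb)
{
    pthread_mutex_lock(&sim->lock);
    TAILQ_INSERT_TAIL(&sim->doneq, ccb, links);
    pthread_mutex_unlock(&sim->lock);
}

/* Drain path (camisr_runqueue() role): remove under the same lock.
 * Returns the number of entries reaped in this pass. */
int fake_runqueue(struct fake_sim *sim)
{
    struct fake_ccb *ccb;
    int n = 0;

    pthread_mutex_lock(&sim->lock);
    while ((ccb = TAILQ_FIRST(&sim->doneq)) != NULL) {
        TAILQ_REMOVE(&sim->doneq, ccb, links);
        free(ccb);
        n++;
    }
    pthread_mutex_unlock(&sim->lock);
    return n;
}

/* Completes NCCBS fake CCBs from a second thread ("another CPU"). */
static void *producer(void *arg)
{
    struct fake_sim *sim = arg;

    for (int i = 0; i < NCCBS; i++) {
        struct fake_ccb *ccb = malloc(sizeof *ccb);
        if (ccb == NULL)
            abort();
        ccb->id = i;
        fake_done(sim, ccb);
    }
    return NULL;
}

/* Drains on this thread while the producer runs concurrently; returns
 * the total reaped (NCCBS when no completion is lost), or -1 if the
 * producer thread cannot be started. */
int run_model(void)
{
    struct fake_sim sim;
    pthread_t tid;
    int reaped = 0;

    pthread_mutex_init(&sim.lock, NULL);
    TAILQ_INIT(&sim.doneq);
    if (pthread_create(&tid, NULL, producer, &sim) != 0)
        return -1;
    while (reaped < NCCBS)
        reaped += fake_runqueue(&sim);
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&sim.lock);
    return reaped;
}
```

Dropping the lock/unlock pair in fake_done() turns the concurrent
insert and remove into a data race on the list pointers, which is the
kind of corruption that could lose a completed CCB. One question the
model leaves open is whether the lock belongs inside the completion
routine or should already be held by its caller; the sketch takes it
inside, as the experiment did.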

I'm not sure this is the right fix, but I wanted to run the scenario
by the experts to get some feedback. Thoughts?

---chuck


