Date: Thu, 8 Dec 2011 17:34:30 -0800
From: Chuck Tuffli <chuck@tuffli.net>
To: freebsd-scsi <freebsd-scsi@freebsd.org>
Subject: CAM locking question
Message-ID: <CAM0tzX0KefAfzJx_iVME1BGhX%2BykahT0z1%2BVscSFxrT7OHxcNg@mail.gmail.com>
Hi -

I've been debugging a hang and am wondering if this might be a CAM problem. The setup is 8-stable, an FC initiator I'm developing, and fio using the POSIX aio engine with a queue depth > 1 (i.e. a bunch of concurrent IO). Note that the same setup with a queue depth of 1 runs without issue.

The symptom is that fio gets stuck in aio_suspend() waiting for submitted IOs to complete, even though I've verified the driver has already completed the IOs in question.

Playing around with DTrace, it appears that camisr_runqueue() is running at the same time the driver is completing CCBs with xpt_done(), albeit on different processors. Staring at the code seems to indicate that camisr_runqueue() holds CAM_SIM_LOCK() while manipulating the sim_doneq list, but it looks like xpt_done() could potentially be touching the same sim_doneq without locking.

As an experiment, I added CAM_SIM_LOCK/CAM_SIM_UNLOCK around the TAILQ_INSERT_TAIL() in xpt_done(), and what was a reliable hang after a minute or two hasn't shown up in an hour. I'm not sure this is the right fix, but I wanted to run the scenario by the experts to get some feedback. Thoughts?

---chuck
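
To make the scenario concrete, here is a minimal user-space sketch of the pattern I'm describing. It is not the CAM code: fake_sim, fake_ccb, fake_done(), fake_runqueue() and run_demo() are hypothetical names, and a pthread mutex stands in for the SIM lock. The only thing it is meant to show is that the tail insert and the drain both touch the same TAILQ, so both need to hold the same lock.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for a CCB sitting on a done queue. */
struct fake_ccb {
    TAILQ_ENTRY(fake_ccb) links;
};
TAILQ_HEAD(fake_doneq, fake_ccb);

/* Hypothetical stand-in for a SIM: one done queue, one lock. */
struct fake_sim {
    pthread_mutex_t mtx;
    struct fake_doneq doneq;
};

struct producer_arg {
    struct fake_sim *sim;
    int count;
};

/* Completion path: the insert is wrapped in the lock (the experiment). */
static void fake_done(struct fake_sim *sim, struct fake_ccb *ccb)
{
    pthread_mutex_lock(&sim->mtx);
    TAILQ_INSERT_TAIL(&sim->doneq, ccb, links);
    pthread_mutex_unlock(&sim->mtx);
}

/* Drain path: removes entries under the same lock, returns the count. */
static int fake_runqueue(struct fake_sim *sim)
{
    struct fake_ccb *ccb;
    int n = 0;

    pthread_mutex_lock(&sim->mtx);
    while ((ccb = TAILQ_FIRST(&sim->doneq)) != NULL) {
        TAILQ_REMOVE(&sim->doneq, ccb, links);
        free(ccb);
        n++;
    }
    pthread_mutex_unlock(&sim->mtx);
    return n;
}

/* Each producer thread "completes" count CCBs. */
static void *producer(void *p)
{
    struct producer_arg *arg = p;

    for (int i = 0; i < arg->count; i++) {
        struct fake_ccb *ccb = malloc(sizeof(*ccb));
        if (ccb == NULL)
            break;
        fake_done(arg->sim, ccb);
    }
    return NULL;
}

/* Run producers while draining; returns the total number drained. */
int run_demo(int nthreads, int per_thread)
{
    struct fake_sim sim;
    struct producer_arg arg = { &sim, per_thread };
    pthread_t tids[MAX_THREADS];
    int started = 0, total;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    pthread_mutex_init(&sim.mtx, NULL);
    TAILQ_INIT(&sim.doneq);

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, producer, &arg) == 0)
            started++;

    total = fake_runqueue(&sim);    /* may overlap with the inserts */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    total += fake_runqueue(&sim);   /* pick up whatever is left */

    pthread_mutex_destroy(&sim.mtx);
    return total;
}
```

Calling run_demo(4, 1000) from a small main() and building with -pthread should account for every queued entry. If the lock/unlock pair in fake_done() is removed, the insert can interleave with the drain and corrupt the list links, which in this toy version would show up as lost entries or a crash rather than a hung fio.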