From: Matthew Jacob
Date: Sat, 10 Apr 2010 09:30:55 -0700
To: freebsd-scsi@freebsd.org
Subject: cam_periph, and locking?

This subject seems to have petered out a bit. Where are we on the locking for the list? I personally like Alexander's unit_lock change.

On my own front, some work priorities shifted, so I haven't (yet) finished much of the test-to-destruction work, but I have made some findings and found some (partial, incomplete) remedies. Here are my notes from the other day. Bear with me: they aren't the most polished, and it's still WIP. Comments welcome.

A) Four basic problems

  + Periph invalidation can occur after a periph_find, and not all calls are protected by a sim lock.
  + The probe state machine can (sometimes) continue despite a failure that caused a periph invalidation.
  + Some of the periph driver callbacks (dasysctlinit, some side effects of disk_create) are not aware of periph invalidation and blindly use pointers, etc.
  + Periph invalidation *during* probe can lead to a reference after free or a bad reference (panics).

  Note that some of this stuff is not really affected by locking.

  (Minor addendum: cam_periph_release_locked can cause the refcount to go negative.)

B) Remedies

  => periph_find bumps a refcount. (This has obvious MFC and other implications, since every caller has to remember to release it.)
  => The probe periph driver should do a periph_hold so that the periph doesn't disappear until the periph driver explicitly unholds it.
  => Periph drivers can't use callbacks that only carry pointers to an unheld periph structure.

With these changes in place, my simulated unit test ran much better. It still ended up with a bug where cam_periph_runccb never came back, but at least I wasn't instantly stuck in panics and references after free as I was before.
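The three remedies can be sketched as a small userland model. This assumes a single-threaded caller (no sim lock is modeled) and, as a simplification, that a periph's storage is freed only once it is both invalidated and unreferenced. Everything named model_* is hypothetical and is not the CAM API: model_periph_find stands in for the proposed refcounting periph_find, the probe-time periph_hold is modeled as an ordinary reference taken by model_task_schedule, and model_periph_release refuses to take the count below zero (the cam_periph_release_locked addendum).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userland model; model_periph and model_* are illustrative names,
 * not the CAM API. No locking is modeled: callers are assumed single-threaded. */
struct model_periph {
	char name[8];                 /* driver name, e.g. "da" */
	unsigned int unit;
	int refcount;                 /* outstanding references from callers */
	bool invalid;                 /* set on invalidation; finds skip it */
	struct model_periph *next;
};
static struct model_periph *periph_list;
static int release_underflows;    /* releases refused because refcount was 0 */

struct model_periph *model_periph_create(const char *name, unsigned int unit)
{
	struct model_periph *p = calloc(1, sizeof(*p));
	if (p == NULL)
		return NULL;
	strncpy(p->name, name, sizeof(p->name) - 1);
	p->unit = unit;
	p->next = periph_list;
	periph_list = p;
	return p;
}

/* Free only once p is invalid AND nobody holds a reference. */
static bool maybe_free(struct model_periph *p)
{
	struct model_periph **pp;
	assert(p->refcount >= 0);
	if (!p->invalid || p->refcount != 0)
		return false;
	for (pp = &periph_list; *pp != NULL; pp = &(*pp)->next)
		if (*pp == p) {
			*pp = p->next;
			break;
		}
	free(p);
	return true;
}

/* Remedy 1: a successful find returns a reference the caller must release. */
struct model_periph *model_periph_find(const char *name, unsigned int unit)
{
	struct model_periph *p;
	for (p = periph_list; p != NULL; p = p->next)
		if (!p->invalid && p->unit == unit && strcmp(p->name, name) == 0) {
			p->refcount++;
			return p;
		}
	return NULL;
}

/* Drops one reference, refusing to go below zero. Returns true if p was freed. */
bool model_periph_release(struct model_periph *p)
{
	if (p->refcount <= 0) {
		release_underflows++;
		return false;
	}
	p->refcount--;
	return maybe_free(p);
}

/* Marks p invalid; its storage survives until the last reference is released. */
bool model_periph_invalidate(struct model_periph *p)
{
	p->invalid = true;
	return maybe_free(p);
}

/* Remedies 2 and 3: deferred work (a probe step, sysctl setup) holds a reference. */
struct model_task { struct model_periph *periph; };

bool model_task_schedule(struct model_task *t, struct model_periph *p)
{
	if (p->invalid)
		return false;
	p->refcount++;                /* keeps p's storage valid until the task runs */
	t->periph = p;
	return true;
}

/* Returns 1 if the work ran, 0 if it stopped because p was invalidated. */
int model_task_run(struct model_task *t)
{
	struct model_periph *p = t->periph;
	int did_work = 0;
	if (!p->invalid)
		did_work = 1;         /* safe to use p->name and p->unit here */
	t->periph = NULL;
	model_periph_release(p);      /* may free p; do not touch it afterwards */
	return did_work;
}
```

The interesting case in this model is invalidation while work is pending: model_periph_invalidate only marks the periph, the pending task sees the flag and skips its work instead of carrying on, and the task's own release is what finally frees the storage. Counting refused releases instead of letting the count go negative makes an unbalanced release visible rather than silently corrupting the refcount.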