Message-ID: <423C4BAE.3010202@samsco.org>
Date: Sat, 19 Mar 2005 08:56:30 -0700
From: Scott Long
To: Danny Braniss
cc: Sam Leffler
cc: scsi@freebsd.org
cc: net@freebsd.org
Subject: Re: iSCSI initiator driver beta version, testers wanted
References: <423C4037.3090801@samsco.org>
In-Reply-To: <423C4037.3090801@samsco.org>

Scott Long wrote:
> Danny Braniss wrote:
>
>>> with tags enabled, iSCSI is much faster, but it also causes a
>>> deadlock :-(
>>> this is what i run:
>>>     newfs -U /
>>>     cd /
>>>     restore rf /home/file.dump
>>>
>>> on the same motherboard, a dual Xeon, with smp disabled all is OK
>>> with smp enabled restore gets stuck, usually waiting on biord.
>>> the iscsi driver shows that all requests have been done, the sniffing
>>> shows the same (i.e. all requests have been done).
>>>
>>> so this leads me to think that there is some race condition that i'm not
>>> aware of in an SMP system, where xpt_done(ccb) is called while
>>> another process is calling biowait.
>>>
>>> another lead is that after restore gets stuck, the system slowly gets
>>> 'stalled'.
>>>
>>> any insight is most welcome!, i'm also stuck.
>>
>> ahh, hate talking to myself :-)
>>
>> grabbing Giant before calling xpt_done solved it, so the problem is
>> most probably in the CAM ...
>>
>> danny
>>
>
> No, you need to grab Giant when calling xpt_done().  I even put an
> assertion into CAM to make sure of that.  Are you running with WITNESS
> and/or INVARIANTS enabled?  Those would have caught this problem.
>
> Scott

Oops, I forgot to mention that I recently addressed this in 6-CURRENT.
Now, much of the rest of the CAM API still requires Giant to be held, but
xpt_done() does not.  This only applies to 6-CURRENT, and I doubt that
it will be backported to 5-STABLE.

Scott
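
To illustrate the rule discussed above (on 5.x the caller must hold Giant
around xpt_done(), and an ownership assertion catches callers that do
not), here is a rough userspace sketch.  It is not kernel code and not the
actual CAM implementation: every model_* name is hypothetical, a pthread
mutex stands in for Giant, and the completion routine returns -1 where an
INVARIANTS kernel would panic.  In a real 5.x driver the equivalent would
presumably be mtx_lock(&Giant); xpt_done(ccb); mtx_unlock(&Giant); with the
check inside CAM being something along the lines of
mtx_assert(&Giant, MA_OWNED); the exact form of that assertion is an
assumption here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's Giant mutex. */
struct model_giant {
	pthread_mutex_t mtx;
	pthread_t owner;	/* meaningful only while held is true */
	bool held;
};

static struct model_giant giant = { .mtx = PTHREAD_MUTEX_INITIALIZER };

/*
 * True if the calling thread holds the model lock.  Good enough for a
 * sketch; the unlocked read is only reliable for the owning thread.
 */
static bool
model_giant_owned(void)
{
	return (giant.held && pthread_equal(giant.owner, pthread_self()));
}

static void
model_giant_lock(void)
{
	pthread_mutex_lock(&giant.mtx);
	giant.owner = pthread_self();
	giant.held = true;
}

static void
model_giant_unlock(void)
{
	assert(model_giant_owned());	/* only the owner may unlock */
	giant.held = false;
	pthread_mutex_unlock(&giant.mtx);
}

/* Hypothetical stand-in for a CCB; it only tracks completion. */
struct model_ccb {
	bool done;
};

/*
 * Stand-in for a completion routine that requires the lock to be held.
 * Returns -1 instead of panicking when the caller does not own it.
 */
static int
model_done(struct model_ccb *ccb)
{
	if (!model_giant_owned())
		return (-1);
	ccb->done = true;
	return (0);
}

/* The pattern a driver completion path would follow: lock, complete, unlock. */
static int
model_driver_complete(struct model_ccb *ccb)
{
	int error;

	model_giant_lock();
	error = model_done(ccb);
	model_giant_unlock();
	return (error);
}
```

Calling model_done() directly on a fresh model_ccb fails and leaves it
incomplete, while model_driver_complete() succeeds.  That mirrors the
difference between calling xpt_done() bare and calling it with Giant held
on 5.x.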