From: "Steven Hartland"
To: "Eygene Ryabinkin"
Cc: freebsd-hackers@freebsd.org, mav@freebsd.org
Date: Fri, 5 Aug 2011 10:59:43 +0100
Subject: Re: cam / ata timeout limited to 2147 due to overflow bug?

Fri, Aug 05, 2011 at 12:02:19AM +0100, Steven Hartland wrote:
>> So I suspect that this is what's happening, resulting in an extremely
>> small timeout instead of a large one. Now I know that the value passed
>> in to the timeout is seconds * 1000, so we should be seeing 2148000
>> for ccb->ccb_h.timeout. Now multiply that by 1000 (hz) and you're over
>> the int wrap point, 2147483647.
>>
>> So instead of the wrap point being 2147483 seconds (24 days), I suspect
>> because of the way this is structured it's actually 2147 seconds
>> (about 36 mins).
>>
>> If this is the case the fix is likely to be something like:-
>> callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),
>
> It will give you 0 timeout for all values of hz that are lower than
> 2000: hz is int, so you'll get integer division. Since ccb_h.timeout
> is u_int32_t, the proper way to handle this situation would be
> {{{
> ((u_int64_t)ccb->ccb_h.timeout * (u_int32_t)hz)/2000
> }}}
> as long as the value of hz won't be greater than 2^32.

Ahh of course, was late ;-)

> Can you try the patch at
> http://codelabs.ru/fbsd/patches/ahci/AHCI-properly-convert-CAM-timeout-to-ticks.diff
>
>> What I don't understand is why the /2000
>
> It gives (timeout_in_ticks)/2. The code in ahci_timeout does the following:
> {{{
> /* Check if slot was not being executed last time we checked. */
> if (slot->state < AHCI_SLOT_EXECUTING) {
snip..
>
> So, my theory is that the first half of the timeout time is devoted
> to the transition from AHCI_SLOT_RUNNING -> AHCI_SLOT_EXECUTING and
> the second one is the transition from AHCI_SLOT_RUNNING -> TIMEOUT
> to give the whole process the duration of a full timeout.
> However, judging by the code, if the slot won't start executing at the
> first invocation of ahci_timeout that was spawned by the callout armed
> in ahci_execute_transaction, we can have timeouts longer than the
> specified amount of time. And if the slot will never start its
> execution, the callout will spin forever, unless I am missing something
> important here.
>
> Maybe Alexander can shed some light on this?

Interesting, thanks for the explanation.

I've tried the patch and it had a few cut and paste errors, which I've
fixed, and confirmed it works as expected, so thanks for that :)

There's also a load more drivers with the same issue, so I've gone
through and fixed all the occurrences I can find.

Here's the updated patch:-
http://blog.multiplay.co.uk/dropzone/freebsd/ccb_timeout.patch

    Regards
    Steve
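
For reference, the arithmetic discussed in this thread can be sketched in
standalone C. This is only an illustration, not code from either patch: the
function names are hypothetical, hz = 1000 is assumed in the examples, and
the 32-bit wrap is modelled with unsigned arithmetic so the sketch stays
well defined (signed overflow in the real expression is undefined
behaviour, though it typically wraps the same way).

```c
#include <stdint.h>

/*
 * Hypothetical helpers, named for illustration only.
 * timeout_ms plays the role of ccb->ccb_h.timeout (milliseconds).
 */

/* Models a 32-bit (int)timeout * hz / 2000: the product wraps first. */
int ticks_wrapping(uint32_t timeout_ms, int hz)
{
    uint32_t product = timeout_ms * (uint32_t)hz;   /* wraps modulo 2^32 */
    /* Reinterpret the wrapped value as a two's-complement 32-bit int. */
    int64_t as_signed = (product >= UINT32_C(0x80000000))
        ? (int64_t)product - INT64_C(4294967296)
        : (int64_t)product;
    return (int)(as_signed / 2000);
}

/* Models timeout * (hz / 2000): integer division gives 0 when hz < 2000. */
int ticks_truncating(uint32_t timeout_ms, int hz)
{
    return (int)(timeout_ms * (uint32_t)(hz / 2000));
}

/* Widens to 64 bits before multiplying, then halves the tick count. */
int ticks_fixed(uint32_t timeout_ms, int hz)
{
    return (int)(((uint64_t)timeout_ms * (uint32_t)hz) / 2000);
}
```

With hz = 1000, a 2147 second timeout (2147000 ms) still fits in 32 bits, so
ticks_wrapping() and ticks_fixed() agree. At 2148 seconds (2148000 ms) the
32-bit product passes 2147483647 and ticks_wrapping() goes negative, while
ticks_truncating() returns 0 for any timeout and ticks_fixed() returns the
expected half-timeout in ticks.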