Date: Thu, 5 Sep 2013 11:16:42 -0400
Subject: Re: gmirror crash writing to disk? Or is it su+j crash?
From: Zaphod Beeblebrox
To: Edward Tomasz Napierała
Cc: FreeBSD Stable, Ian Lepore

Replying to myself again: I doubled bio_transient_maxcnt once more
(original value 180, failed doubling 360, new value 720), and the machine
was able to successfully run
"for i in `jot 10`; do make -j4 buildkernel; done".

But doesn't this mean that we still have a resource exhaustion to worry
about? Isn't this just another race waiting for the right set of
conditions?

On Tue, Sep 3, 2013 at 11:06 AM, Zaphod Beeblebrox wrote:

> Since there weren't any more ideas here, I tried turning off
> hyper-threading. This is an old Pentium D type CPU --- that is: one core
> with HT. I'm wondering if the HT nature is helping this resource
> exhaustion, so I turned off HT (basically making this a single-threaded
> CPU) and it seems to have made the problem go away.
>
> That is not to say that the problem is fixed: it simply means that
> replication may be tied to multiple CPUs and/or the allocation of
> resources by an HT CPU core.
>
> On Mon, Sep 2, 2013 at 3:53 AM, Zaphod Beeblebrox wrote:
>
>> The first one (kern.geom.transient_map_retries) causes the system to
>> wedge.
>>
>> The second one (default is 180, I doubled to 360) causes the system to
>> crash but not dump.
>>
>> So... neither fixes the problem.
>>
>> On Sat, Aug 31, 2013 at 5:27 AM, Edward Tomasz Napierała
>> <trasz@freebsd.org> wrote:
>>
>>> Message written by Zaphod Beeblebrox on 31 Aug 2013, at 00:49:
>>> > Because someone said that there would be no logging of underlying
>>> > ATA errors without verbose, I rebooted with verbose and tried the
>>> > same make -j4 again... and here is the relatively similar core.txt.5
>>> >
>>> > https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
>>> >
>>> > Looking at it, gmirror is dropping the same error and the underlying
>>> > hardware is not causing the error...
>>>
>>> Let me quote Konstantin:
>>>
>>> > It is either an exhaustion of the transient map, or a deadlock.
>>> > For the first, setting kern.geom.transient_map_retries to 0 could help.
>>> > For the second, the count of the transient buffers must be increased,
>>> > by kern.bio_transient_maxcnt loader tunable.
>>>
>>> Could you try both and tell which one of them fixed the problem? Thanks!
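
For anyone repeating the experiment described in this thread, here is a
minimal sketch in plain sh. Only the tunable name kern.bio_transient_maxcnt,
the values 360 and 720, and the "make -j4 buildkernel" loop come from the
thread. The helper names double_maxcnt and repeat_build are hypothetical,
and putting the printed line into /boot/loader.conf is an assumption based
on the tunable being described as a loader tunable.

```shell
#!/bin/sh

# Hypothetical helper: print a loader.conf-style line that doubles the
# given kern.bio_transient_maxcnt value (the tunable name is from the thread).
double_maxcnt() {
    current=$1
    echo "kern.bio_transient_maxcnt=\"$((current * 2))\""
}

# Hypothetical helper: run a command N times and stop at the first failure,
# so a crash or build error is not hidden by later iterations.
repeat_build() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
    done
    echo "all $count runs completed"
}
```

Usage, assuming a FreeBSD source tree in the current directory:
"double_maxcnt 360" prints the line for the 720 setting, and
"repeat_build 10 make -j4 buildkernel" repeats the stress test from the
message above. A clean run only shows that this load did not exhaust the
buffers, not that the underlying race is gone.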