Date: Thu, 5 Sep 2013 11:16:42 -0400
Subject: Re: gmirror crash writing to disk? Or is it su+j crash?
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: Edward Tomasz Napierała <trasz@freebsd.org>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>, Ian Lepore <ian@freebsd.org>

Replying to myself again: I doubled kern.bio_transient_maxcnt once more
(original value 180, failed doubling 360, new value 720), and the machine was
able to complete "for i in $(jot 10); do make -j4 buildkernel; done".
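
For anyone who wants to repeat this, here is a rough sketch of the procedure.
The helper names double_tunable and repeat_build are made up for illustration.
The tunable name is the one from Konstantin's advice quoted below, and I am
assuming it goes in /boot/loader.conf and only takes effect after a reboot.

```sh
#!/bin/sh
# Hypothetical helper: print a loader.conf line with the given value doubled.
double_tunable() {
    printf '%s="%d"\n' "$1" "$(( $2 * 2 ))"
}

# Hypothetical helper: run a command up to N times, stopping at the first
# failure so a bad run is not hidden by later successful ones.
repeat_build() {
    count=$1; shift
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
    done
}

# Line to add to /boot/loader.conf (assumed location) before rebooting:
double_tunable kern.bio_transient_maxcnt 360

# After the reboot, from the source tree (assumed to be /usr/src):
#   repeat_build 10 make -j4 buildkernel
```

Unlike the plain for loop, repeat_build stops at the first failing build, which
matters on a box that may not manage to dump when it crashes.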

But doesn't this mean that we still have a resource exhaustion problem to
worry about?  Isn't this just another race waiting for the right set of
conditions?


On Tue, Sep 3, 2013 at 11:06 AM, Zaphod Beeblebrox <zbeeble@gmail.com> wrote:

> Since there weren't any more ideas here, I tried turning off
> hyper-threading.  This is an old pentium-D type CPU --- that is: one core
> with HT.  I'm wondering if the HT nature is helping this resource
> exhaustion, so I turned off HT (basically making this a single-threaded
> CPU) and it seems to have made the problem go away.
>
> That is not to say that the problem is fixed: it simply means that
> replication may be tied to multiple CPUs and/or the allocation of resources
> by an HT CPU core.
>
>
> On Mon, Sep 2, 2013 at 3:53 AM, Zaphod Beeblebrox <zbeeble@gmail.com> wrote:
>
>> The first one (kern.geom.transient_map_retries) causes the system to
>> wedge.
>>
>> The second one (default is 180, I doubled to 360) causes the system to
>> crash but not dump.
>>
>> So... neither fixes the problem.
>>
>>
>> On Sat, Aug 31, 2013 at 5:27 AM, Edward Tomasz Napierała <
>> trasz@freebsd.org> wrote:
>>
>>> Message written by Zaphod Beeblebrox <zbeeble@gmail.com> on
>>> 31 Aug 2013, at 00:49:
>>> > Because someone said that there would be no logging of underlying ATA
>>> errors without verbose, I rebooted with verbose and tried the same make -j4
>>> again... and here is the relatively similar core.txt.5
>>> >
>>> >
>>> https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
>>> >
>>> > Looking at it, gmirror is dropping the same error and the underlying
>>> hardware is not causing the error...
>>>
>>> Let me quote Konstantin:
>>>
>>> > It is either an exhaustion of the transient map, or a deadlock.
>>> > For the first, setting kern.geom.transient_map_retries to 0 could help.
>>> > For the second, the count of the transient buffers must be increased,
>>> > by kern.bio_transient_maxcnt loader tunable.
>>>
>>> Could you try both and tell which one of them fixed the problem?  Thanks!
>>>
>>>
>>
>