From: Pete French
To: freebsd-stable@freebsd.org
Date: Fri, 23 Oct 2009 11:56:24 +0100
Subject: problems with gmirror on ggate over slow link

[Originally sent to geom, but I am throwing it open to a wider audience as I didn't get any replies there.]

I am using 7.2-STABLE from October 7th on all machines, but this has been going on for a while. Very simply, I am mirroring together a pair of discs, one local, one remote. The remote disc is accessed using ggate. If the remote disc is actually on a very close machine, e.g. a server plugged into the same Ethernet, then all works fine.
If I make the remote disc somewhere substantially further away on the network, however, then when I attach the disc it starts to rebuild the mirror but fails a fraction of a second later, thus:

GEOM_MIRROR: Device mysql0: rebuilding provider ggate1a.
GEOM_MIRROR: Synchronization request failed (error=5). ggate1a[WRITE(offset=1310720, length=131072)]
GEOM_MIRROR: Device mysql0: provider ggate1a disconnected.
GEOM_MIRROR: Device mysql0: rebuilding provider ggate1a stopped.

The interesting thing is that the problem is only with gmirror, not with the underlying ggate disc, which remains attached and accessible. I tested this by adding a second partition (ggate1b in the example above) and mounting a UFS filesystem on that.

I've looked at the kernel code briefly, but it is not clear to me what is causing that write to fail. My conjecture would be that a buffer somewhere is filling up, causing a write to fail, and that gmirror, instead of waiting and retrying, just fails the synchronisation.

Any ideas? Is this actually a bug? I am wondering if it would also happen when mirroring a very fast disc against a very slow one (i.e. maybe it is independent of ggate).

-pete.
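
To make the conjecture above concrete, here is a minimal sketch in Python of the two behaviours being contrasted: giving up on the first failed write versus waiting and retrying. It is a toy model only, not gmirror or ggate code. The names SlowProvider and synchronise, the buffer capacity, and the drain rate are all hypothetical, and whether a full buffer is really what produces error=5 is exactly the open question.

```python
import errno


class SlowProvider:
    """Hypothetical stand-in for a remote disc behind a small buffer."""

    def __init__(self, capacity, drain_per_tick):
        self.capacity = capacity              # requests the buffer can hold
        self.drain_per_tick = drain_per_tick  # requests the link clears per tick
        self.queued = 0

    def tick(self):
        # Model time passing: the slow link drains some queued requests.
        self.queued = max(0, self.queued - self.drain_per_tick)

    def write(self):
        # Assumption: a full buffer is reported as EIO (error=5 in the log).
        if self.queued >= self.capacity:
            return errno.EIO
        self.queued += 1
        return 0


def synchronise(provider, requests, max_retries):
    """Issue `requests` writes; return (completed, disconnected)."""
    done = 0
    while done < requests:
        err = provider.write()
        retries = 0
        while err != 0 and retries < max_retries:
            provider.tick()  # wait for the link to drain, then try again
            retries += 1
            err = provider.write()
        if err != 0:
            return done, True  # give up and drop the provider
        done += 1
    return done, False


# Fail-fast: the rebuild stops as soon as the buffer fills.
print(synchronise(SlowProvider(capacity=10, drain_per_tick=1), 100, max_retries=0))
# Wait-and-retry: the same slow provider completes the rebuild.
print(synchronise(SlowProvider(capacity=10, drain_per_tick=1), 100, max_retries=5))
```

With max_retries=0 the model stops after the buffer fills, which resembles the "disconnected" message arriving a fraction of a second into the rebuild. With a small retry budget the same slow provider finishes. If the real cause is something else, such as a timeout, the model does not apply.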