Date:      Fri, 11 Aug 2017 08:09:29 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-fs@FreeBSD.org
Subject:   [Bug 219760] ZFS iSCSI w/ Win10 Initiator Causes pool corruption
Message-ID:  <bug-219760-3630-Cm6I06vHHE@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-219760-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-219760-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219760

emz@norma.perm.ru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |emz@norma.perm.ru

--- Comment #6 from emz@norma.perm.ru ---
I observed similar behaviour on one of my SAN systems.

In my opinion, iSCSI + zfs is broken somewhere between r310734 and r320056.

Symptoms:

- random fatal trap 12 panics.
- random general protection fault panics.
- random "Solaris(panic): zfs: allocating allocated segment" panics.
- zfs pool corruption that happens ONLY on pools that serve zvols as
iSCSI target devices (a sketch of that kind of setup follows this list).
- zfs pool corruption happening _on the fly_, without the system panicking.
- no zfs corruption at all on the pools that do not serve devices for the
iSCSI targets.
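
For context, here is a minimal sketch of the kind of setup on the affected
pools (the pool, zvol and target names below are made up, not taken from
the real systems): a zvol is created on the pool and exported through
ctld(8) as an iSCSI LUN, roughly like this:

  # create the backing zvol on the pool
  zfs create -V 2T tank/iscsi/vol0

  # /etc/ctl.conf -- export it as an iSCSI target for the Windows initiator
  portal-group pg0 {
          discovery-auth-group no-authentication
          listen 0.0.0.0
  }

  target iqn.2017-08.com.example:target0 {
          auth-group no-authentication
          portal-group pg0
          lun 0 {
                  path /dev/zvol/tank/iscsi/vol0
          }
  }

  # then enable and start the target daemon
  sysrc ctld_enable=YES
  service ctld start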

I have 7 SAN systems running this setup. None of the systems running
r310734 or older shows that behaviour. The only system running something
more recent than r310734 (at least r320056, and up to 11.1-RELEASE) was
affected by this, and became healthy when downgraded to r310734 (r310734
was chosen simply because it's the most recent revision among all of the 7).

At first I had the strong impression that we had a hardware problem.
Memtest86+ found no problems. We found multiple SMART ATA errors that were
caused by bad cabling, and for a while that looked like the root cause, but
after switching to a new cable (and also to a new HBA, a new server and a
new enclosure) the problem didn't vanish. It was solved only after the
downgrade to r310734. That SAN system has now been up and running for 48
hours without pool corruption (which usually happened within the first 12
hours of uptime) and without panics (which usually happened within the
first 24 hours).
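
For reference, the kind of check that surfaces this sort of corruption (a
rough sketch; the pool name is made up, and it assumes the damage is
something a scrub can report):

  zpool scrub tank
  zpool status -v tank    # watch for CKSUM errors / "permanent errors"

  # on a quiesced pool, zdb can cross-check the block accounting:
  zdb -b tank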

Unfortunately, I have no crash dumps, because mpr(4) blocks crash dump
collection (see the discussion on freebsd-scsi@). I only have the
backtraces from the IPMI serial-over-LAN console, which I will attach here.

Initial problem description:

https://lists.freebsd.org/pipermail/freebsd-fs/2017-August/025099.html

-- 
You are receiving this mail because:
You are the assignee for the bug.


