Date:      Tue, 9 Mar 2010 17:03:41 -0600
From:      Kevin Day <toasty@dragondata.com>
To:        freebsd-fs@freebsd.org
Subject:   iscsi over HAST backed storage partial success
Message-ID:  <7418ECC2-55C1-4A28-82EA-0972AFE745EF@dragondata.com>


I'm running istgt (iSCSI target) on HAST-backed storage. For the most part, it seems to work really well. I have ucarp running to move the IP address that istgt is bound to, and I modified the ucarp scripts to start/stop istgt depending on which side is the master. If I shut down the primary, the secondary takes over and all seems well.
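For readers unfamiliar with this kind of setup, here is a minimal sketch of the ordering such ucarp up/down hooks typically need. It is hypothetical: it assumes the hooks also switch the HAST role (the message only says they start/stop istgt), takes the resource name iscsi1 from the log below, and assumes istgt's rc.d script lives at the usual ports location. It only builds and prints the commands (a dry run); it does not execute anything.

```python
import shlex

# Hypothetical values: "iscsi1" matches the HAST resource in the log;
# the rc.d path is an assumption (typical location for the istgt port).
HAST_RESOURCE = "iscsi1"
ISTGT_RC = "/usr/local/etc/rc.d/istgt"


def failover_commands(state: str, resource: str = HAST_RESOURCE) -> list[list[str]]:
    """Return the commands to run, in order, for a CARP state change.

    state is "master" (this node takes over the shared IP) or "backup".
    HAST must be primary before istgt opens /dev/hast/<resource>, and
    istgt must be stopped before HAST is demoted to secondary.
    """
    if state == "master":
        return [
            ["hastctl", "role", "primary", resource],
            # "onestart" runs the rc script even without istgt_enable="YES",
            # so istgt does not start at boot on the backup node.
            [ISTGT_RC, "onestart"],
        ]
    if state == "backup":
        return [
            [ISTGT_RC, "onestop"],
            ["hastctl", "role", "secondary", resource],
        ]
    raise ValueError(f"unknown CARP state: {state!r}")


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for state in ("master", "backup"):
        for cmd in failover_commands(state):
            print(state, shlex.join(cmd))
```

The key design point is the ordering: promoting HAST before starting the target, and stopping the target before demoting HAST, so istgt never holds the device open on a node where HAST is secondary.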

However, if I reboot the secondary, the primary starts freezing up for long periods:

Mar  9 22:46:27 cs04 hastd: [iscsi1] (primary) Unable to r: Socket is not connected.
Mar  9 22:46:27 cs04 hastd: [iscsi1] (primary) Unable to co: Connection refused.
Mar  9 22:46:42 cs04 last message repeated 3 times
Mar  9 22:46:53 cs04 istgt[14298]: ABORT_TASK
Mar  9 22:47:35 cs04 last message repeated 3 times
Mar  9 22:48:02 cs04 hastd: [iscsi1] (primary) Unable to co: Operation timed out.
Mar  9 22:48:02 cs04 istgt[14298]: CmdSN(45748), OP=0x2a, ElapsedTime=74 cleared
Mar  9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c:640:istgt_iscsi_write_pdu: ***ERROR*** iscsi_write() failed (errno=32)
Mar  9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c:3327:istgt_iscsi_op_task: ***ERROR*** iscsi_write_pdu() failed
Mar  9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c:3867:istgt_iscsi_execute: ***ERROR*** iscsi_op_task() failed
Mar  9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c:4337:worker: ***ERROR*** iscsi_execute() failed
Mar  9 22:48:02 cs04 istgt[14298]: CmdSN(490802), OP=0x2a, ElapsedTime=73 cleared
Mar  9 22:48:02 cs04 istgt[14298]: CmdSN(28387), OP=0x2a, ElapsedTime=73 cleared
Mar  9 22:48:14 cs04 istgt[14298]: ABORT_TASK
Mar  9 22:48:52 cs04 last message repeated 2 times
Mar  9 22:49:22 cs04 hastd: [iscsi1] (primary) Unable to co: Operation timed out.

As soon as the secondary comes back online, everything starts behaving again and all is well.

Is this expected behavior at this point, or should hastd not block like this while the secondary is down?

-- Kevin



