Date:      Sat, 12 Jun 2010 16:26:29 +0200
From:      Rolf Grossmann <rg@xamine.com>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: ZFS reports problem on iscsi target
Message-ID:  <4C139915.8040308@xamine.com>
In-Reply-To: <E1ONLjb-00086F-C8@kabab.cs.huji.ac.il>
References:  <4C12538C.9000400@xamine.com> <E1ONLjb-00086F-C8@kabab.cs.huji.ac.il>

On 12.06.2010 10:06, Daniel Braniss wrote:
>> Hi,
>>
>> I'm having some trouble with iscsi on FreeBSD 8. My current setup is a
>> stock FreeBSD 8.1-PRERELEASE (as of 2 days ago), GENERIC kernel with
>> some modules loaded, running on a Dell PowerEdge R905 with 64GB RAM, 4
>> quad-core CPUs. Attached is an EqualLogic PS6500 storage array with some
>> configured volumes, one of which is for testing. It is configured in
>> /etc/iscsi.conf like this:
>>
>> test2 {
>>   TargetName=iqn.2001-05.com.equallogic:0-8a0906-7a4bb9f06-038000000304c0d1-test2
>>   TargetAddress=10.26.17.10:3260,1
>>   tags = 256
>> }
>>
>> Now I'm running the following sequence of commands (shown with output):
>>
>> # iscontrol -n test2
>> iscontrol[56255]: running
>> iscontrol[56255]: (pass2:iscsi0:0:0:0):  tagged openings now 256
>> iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:1
>> iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:2
>> iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:3
>> iscontrol: supervise starting main loop
>> # zpool create test2 da2
>> # zpool scrub test2
>> # zpool status test2
>>   pool: test2
>>  state: ONLINE
>>  scrub: scrub completed after 0h0m with 0 errors on Fri Jun 11 16:56:33 2010
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         test2       ONLINE       0     0     0
>>           da2       ONLINE       0     0     0
>>
>> errors: No known data errors
>> # cp -Rp /export/system /test2/
>> # zpool scrub test2
>> # zpool status test2
>>   pool: test2
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: scrub completed after 0h0m with 19 errors on Fri Jun 11 17:00:38 2010
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         test2       ONLINE       0     0    19
>>           da2       ONLINE       0     0    38
>>
>> errors: 19 data errors, use '-v' for a list
>> #
>>
>> /export/system is a FreeBSD distribution (make install
>> DESTDIR=/export/system). Note how zfs thinks 19 files are broken
>> after the copy. If I repeat the process, the affected files vary, but
>> some are always reported as broken. In this case they don't seem to
>> actually be damaged (as checked with md5 and rsync --checksum), but in
>> other runs I've had files that only returned an I/O error. Also, if I
>> repeat the same steps on a local disk, zfs reports no errors.
>>
>> What I would like to know is:
>> - Is there anything I'm doing wrong? Is there a known problem?
>> - Are there any tools to debug, or to more reliably reproduce (and narrow
>> down), the problem? I've tried fsx (from /usr/src/tools/regression), but
>> I couldn't find any usage suggestions (other than the usage message it
>> prints when run without options), and it doesn't report any errors.
>> - On a different system I've tried a newer iscsi version from
>> http://www.cs.huji.ac.il/~danny/ftp/freebsd/ but it didn't make any
>> difference. Should I still prefer it?
>>
>> Some help would be appreciated.
> 
> Hi Rolf,
> 	I just ran a bunch of tests, like yours, without any problem.
> My setup:
> the target is a NetApp; the host running the initiator has an
> AMD Phenom(tm) II X6 1090T Processor, running a very recent 8.1-PRERELEASE
> with 4GB of RAM, so "vfs.zfs.prefetch_disable" is true. Maybe
> you can try disabling prefetch?
> Apart from that, maybe you can check the EqualLogic's logs.
> HTH,
> 	danny
> PS: you should use the latest iscsi-2.2.4.tar.gz
> 
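
The copy-and-scrub cycle from the original report is easy to repeat in a loop, and comparing runs is simpler if the error counters are extracted programmatically. A minimal Python sketch, assuming the `zpool status` layout shown in the report (a NAME/STATE/READ/WRITE/CKSUM table ended by a blank line); the helper name `parse_zpool_counters` is hypothetical, and capturing the output (for example with `subprocess.run(["zpool", "status", "test2"], capture_output=True, text=True)`) is left to the caller.

```python
import re

# A config row such as "          da2       ONLINE       0     0    38":
# device name, state, then the READ, WRITE and CKSUM counters.
_ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_zpool_counters(status_text):
    """Return {device: (read, write, cksum)} from `zpool status` text.

    Hypothetical helper: only rows between the NAME/STATE/READ/WRITE/CKSUM
    header and the next blank line are considered. Rows whose counters are
    not plain integers (zpool may abbreviate large counts) are skipped.
    """
    counters = {}
    in_config = False
    for line in status_text.splitlines():
        if line.split()[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if in_config:
            if not line.strip():
                break  # a blank line ends the config table
            m = _ROW.match(line)
            if m:
                name, _state, read, write, cksum = m.groups()
                counters[name] = (int(read), int(write), int(cksum))
    return counters


# The table from the failing scrub in the report above.
SAMPLE = """config:

        NAME        STATE     READ WRITE CKSUM
        test2       ONLINE       0     0    19
          da2       ONLINE       0     0    38

errors: 19 data errors, use '-v' for a list
"""

print(parse_zpool_counters(SAMPLE))  # {'test2': (0, 0, 19), 'da2': (0, 0, 38)}
```

A loop that copies, scrubs, and records these tuples per iteration would show whether the checksum errors grow with the amount of data written.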
> 
Hi Danny,

Thanks for your reply. I've just tried again with
vfs.zfs.prefetch_disable=1, but it makes no difference. I also don't
think zfs is my problem, so I tried ufs instead, with the following
result (still on stock 8.1-PRERELEASE):

# newfs /dev/da2
/dev/da2: 20490.0MB (41963520 sectors) block size 16384, fragment size 2048
        using 112 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
 160, 376416, 752672, 1128928, 1505184, 1881440, 2257696, 2633952,
[...]
 41388320, 41764576
# fsck -t ufs /dev/da2
** /dev/da2
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
PARTIALLY ALLOCATED INODE I=94272
CLEAR? [yn]

*ouch*

Interestingly, this time the problem is very repeatable. I can have
fsck fix it (and subsequent fsck runs are clean), but after another
newfs, fsck complains about the same inode again.
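
Because the newfs/fsck failure is repeatable, a raw write/read-back test on the volume might show whether data is damaged between initiator and array without any filesystem in the way. A minimal Python sketch, assuming the volume appears as a disk device such as /dev/da2; it is destructive, and `BLOCK_SIZE`, the seed, and the function names are hypothetical choices. Each block carries its own index and the run's seed, so misplaced or stale blocks are caught as well as flipped bits. For the check to mean anything, the read-back has to come from the array rather than from a cache, which this sketch does not enforce.

```python
import hashlib
import struct

BLOCK_SIZE = 64 * 1024  # hypothetical test block size (a multiple of 512)


def make_block(index, seed):
    """Deterministic contents for block `index`: a 16-byte header holding
    the index and seed, followed by SHAKE-256 output derived from it."""
    header = struct.pack("<QQ", index, seed)
    return header + hashlib.shake_256(header).digest(BLOCK_SIZE - len(header))


def write_pattern(path, nblocks, seed):
    """Overwrite the first `nblocks` blocks of `path`. DESTROYS its data."""
    with open(path, "r+b") as f:
        for i in range(nblocks):
            f.write(make_block(i, seed))
        f.flush()  # push Python's buffer to the OS before closing


def verify_pattern(path, nblocks, seed):
    """Read the blocks back and return the indices whose contents differ."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            if f.read(BLOCK_SIZE) != make_block(i, seed):
                bad.append(i)
    return bad


# Example (destructive, hypothetical device name):
#   write_pattern("/dev/da2", 16384, seed=1)   # first 1 GiB
#   print(verify_pattern("/dev/da2", 16384, seed=1))
```

A non-empty result from `verify_pattern` lists block indices whose byte offsets (index × `BLOCK_SIZE`) could then be looked for in a packet capture; changing the seed between runs makes stale data from a previous run show up as a mismatch.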

There is nothing in the EqualLogic's logs except for connect and
disconnect entries. Also, I'm using a different volume on the same
EqualLogic from another machine running Ubuntu Linux with open-iscsi
and fuse-zfs without any problems (apart from lower performance ;P), so
I don't suspect a hardware problem.

I guess I'll spend some time looking at a tcpdump of the newfs/fsck
test, but it will be a while until I understand all the protocols
involved. Any other suggestions would be very welcome.
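
As a starting point for reading such a capture: each iSCSI PDU begins with a 48-byte Basic Header Segment whose layout is defined in RFC 3720 (section 10.2.1). A minimal Python sketch that decodes the common fields, plus the LBA and length of READ(10)/WRITE(10) commands, from raw PDU bytes; reassembling the TCP stream out of the tcpdump file is assumed to happen elsewhere, and `parse_bhs` is a hypothetical name. Comparing the LBAs of WRITE(10) commands against the region of the disk holding the inode fsck complains about might help show whether a write goes missing or lands in the wrong place.

```python
import struct

BHS_LEN = 48  # every iSCSI PDU starts with a 48-byte Basic Header Segment

# Opcode names from RFC 3720, section 10.2.1.2.
OPCODES = {
    0x00: "NOP-Out", 0x01: "SCSI Command", 0x02: "Task Mgmt Request",
    0x03: "Login Request", 0x04: "Text Request", 0x05: "SCSI Data-Out",
    0x06: "Logout Request", 0x10: "SNACK Request",
    0x20: "NOP-In", 0x21: "SCSI Response", 0x22: "Task Mgmt Response",
    0x23: "Login Response", 0x24: "Text Response", 0x25: "SCSI Data-In",
    0x26: "Logout Response", 0x31: "R2T", 0x32: "Async Message",
    0x3F: "Reject",
}


def parse_bhs(pdu):
    """Decode the fields shared by all iSCSI PDUs from the first 48 bytes."""
    if len(pdu) < BHS_LEN:
        raise ValueError("need at least 48 bytes for a BHS")
    opcode = pdu[0] & 0x3F
    ahs_words = pdu[4]                              # TotalAHSLength, 4-byte words
    data_len = int.from_bytes(pdu[5:8], "big")      # DataSegmentLength, bytes
    info = {
        "opcode": opcode,
        "name": OPCODES.get(opcode, "unknown"),
        "immediate": bool(pdu[0] & 0x40),
        "final": bool(pdu[1] & 0x80),  # F bit (Login PDUs use it as T)
        "ahs_len": ahs_words * 4,
        "data_len": data_len,
        "itt": struct.unpack(">I", pdu[16:20])[0],  # Initiator Task Tag
        # BHS + AHS + data padded to 4 bytes; header/data digests, if
        # negotiated, add 4 bytes each and are not counted here.
        "pdu_len": BHS_LEN + ahs_words * 4 + (data_len + 3) // 4 * 4,
    }
    if opcode == 0x01:  # SCSI Command: the CDB occupies bytes 32..47
        cdb = pdu[32:48]
        info["cdb_opcode"] = cdb[0]
        if cdb[0] in (0x28, 0x2A):  # READ(10) / WRITE(10)
            info["lba"] = struct.unpack(">I", cdb[2:6])[0]
            info["blocks"] = struct.unpack(">H", cdb[7:9])[0]
    return info


# Build a WRITE(10) SCSI Command BHS by hand to exercise the parser.
hdr = bytearray(BHS_LEN)
hdr[0] = 0x01                              # SCSI Command, not immediate
hdr[1] = 0x80 | 0x20                       # F (final) and W (write) flags
hdr[5:8] = (8192).to_bytes(3, "big")       # DataSegmentLength
hdr[16:20] = (0x1234).to_bytes(4, "big")   # Initiator Task Tag
# WRITE(10) CDB: opcode, flags, LBA (4), group, transfer length (2), control
cdb = (bytes([0x2A, 0]) + (1000).to_bytes(4, "big") + b"\x00"
       + (16).to_bytes(2, "big") + b"\x00")
hdr[32:32 + len(cdb)] = cdb
print(parse_bhs(bytes(hdr)))
```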

Thanks, Rolf.


