Date: Wed, 12 Jun 2013 19:40:49 -0400
From: Jona Schuman <jonaschuman@gmail.com>
To: freebsd-fs@freebsd.org
Subject: zfs send/recv dies when transferring large-ish dataset
Message-ID: <CAC-LZTYLzFPTvA6S4CN0xTd-E_x9c3kxYwQoFed5LkVBrwVk0Q@mail.gmail.com>
Hi,

I'm getting some strange behavior from zfs send/recv and I'm hoping someone may be able to provide some insight.

I have two identical machines running 9.0-RELEASE-p3, each with a ZFS pool (zfs v5, zpool v28) for storage. I want to use zfs send/recv for replication between the two machines. For the most part, this has worked as expected. However, send/recv fails when transferring the largest dataset (both in actual size and in number of files) on either machine. With these datasets, issuing:

    machine2# nc -d -l 9999 | zfs recv -d storagepool
    machine1# zfs send dataset@snap | nc machine2 9999

terminates early on the sending side without any error messages. The receiving end carries on as expected, cleaning up the partial data received so far and reverting to its initial state. (I've tried mbuffer in place of nc, and plain ssh, both with similar results; the exact mbuffer invocation is in the P.S. below.)

Oddly, zfs send dies slightly differently depending on how the two machines are connected. When connected through the rack-top switch, zfs send dies quietly without any indication that the transfer has failed. When connected directly with a crossover cable, zfs send dies quietly and machine1 becomes unresponsive (no network, no keyboard, hard reset required). In both cases, no messages are printed to the console or to anything in /var/log/.

I can transfer the same datasets successfully if I send/recv to/from a file:

    machine1# zfs send dataset@snap > /tmp/dump
    machine1# scp /tmp/dump machine2:/tmp/dump
    machine2# zfs recv -d storagepool < /tmp/dump

so I don't think the datasets themselves are the issue. I've also successfully run send/recv over the network using different network interfaces (10GbE ixgbe cards instead of the 1GbE igb links), which suggests the issue is with the 1GbE links.

Might there be some buffering parameter that I'm neglecting to tune, one that is essential on the 1GbE links but matters less on the faster links? Are there any known issues with the igb driver that might be the culprit here? Any other suggestions?

Thanks,
Jona
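
P.S. For completeness, here is roughly how I invoked mbuffer in place of nc. The exact block and buffer sizes are from memory, so treat them as illustrative rather than exactly what I ran:

    machine2# mbuffer -s 128k -m 1G -I 9999 | zfs recv -d storagepool
    machine1# zfs send dataset@snap | mbuffer -s 128k -m 1G -O machine2:9999

And these are the socket-buffer sysctls I know to look at, in case one of them is the knob I'm missing; I haven't changed any of them from the 9.0 defaults:

    machine1# sysctl kern.ipc.maxsockbuf net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max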