Date:      Wed, 17 Aug 2016 13:03:19 -0500
From:      Linda Kateley <lkateley@kateley.com>
To:        Chris Watson <bsdunix44@gmail.com>, linda@kateley.com
Cc:        freebsd-fs@freebsd.org
Subject:   Re: HAST + ZFS + NFS + CARP
Message-ID:  <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com>
In-Reply-To: <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com>
References:  <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <AE372BF0-02BE-4BF3-9073-A05DB4E7FE34@ixsystems.com> <20160704193131.GJ41276@mordor.lan> <E7D42341-D324-41C7-B03A-2420DA7A7952@sarenet.es> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com>

I just do consulting, so I don't always get to see the end of the 
project, although we are starting to do more ongoing support so we can 
see the progress.

I have worked with some of the guys from high-availability.com for maybe 
20 years. RSF-1 is the cluster software bundled with Nexenta, and it works 
beautifully with OmniOS/illumos. The one customer I have running it in 
production is an ISP in South America running OpenStack with ZFS on FreeBSD 
as iSCSI storage. Big boxes, 90+ drives per frame. If someone would like to 
try it, I have some contacts there. Ping me off-list.

You do risk losing data if you batch zfs send; it is very hard to run 
that in real time. You have to take the snapshot and then send the 
snapshot. Most people run it from cron, and even if it's not in cron you 
would want one run to finish before you start the next. If you lose the 
sending host before the receive is complete, you won't have a full copy 
on the receiver. With ZFS, though, you will probably still have the data 
on the sending host, however long it takes to bring it back up. RSF-1 
runs in the ZFS stack and sends the writes to the second system. It's 
kind of pricey, but actually much less expensive than the commercial 
alternatives.
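
A rough sketch of what I mean is below. The dataset, host, and snapshot 
names are made up, and it's driven from cron under lockf(1) so one pass 
has to finish before the next one starts:

#!/bin/sh
# Hypothetical batched replication: take the snapshot, then send it.
# Run from cron under lockf(1) so runs never overlap, e.g.:
#   */5 * * * * lockf -t 0 /var/run/zfs-repl.lock /usr/local/sbin/zfs-replicate.sh
DATASET="tank/data"     # placeholder
REMOTE="backup-host"    # placeholder
NEW="repl-$(date -u +%Y%m%d%H%M%S)"

# Most recent snapshot from a previous run (naming convention assumed).
PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" \
        | grep '@repl-' | tail -1)

# Take the snapshot first, then send it; incremental if an earlier one exists.
zfs snapshot "${DATASET}@${NEW}" || exit 1
if [ -n "$PREV" ]; then
        zfs send -i "$PREV" "${DATASET}@${NEW}" | ssh "$REMOTE" zfs receive -u "$DATASET"
else
        zfs send "${DATASET}@${NEW}" | ssh "$REMOTE" zfs receive -u "$DATASET"
fi

If the sender dies in the middle of a transfer, the receive just fails and 
the backup keeps its last complete snapshot. That is exactly the window 
I'm talking about.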

Anytime you run anything synchronously it adds latency, but it makes 
things safer. There is also a cool tool I like, called Zerto for VMware, 
that sits in the hypervisor and sends a synchronous copy of each write 
locally and then an asynchronous one remotely. It's pretty cool. I 
haven't run it myself, but I have a bunch of customers running it. I 
believe it works with Proxmox too.

Most people I run into (these days) don't mind losing 5 or even 30 
minutes of data. Small shops. They usually have a copy somewhere else, 
or the cost of 5-30 minutes isn't that great. I used to work as a 
datacenter architect for Sun/Oracle with only Fortune 500 customers. 
There, losing 1 second could put a large company out of business. I 
worked with banks and exchanges; they couldn't ever lose a single 
transaction. Most people nowadays do the replication/availability in the 
application though, and don't care about the underlying hardware, 
especially disk.


On 8/17/16 11:55 AM, Chris Watson wrote:
> Of course, if you are willing to accept some amount of data loss, that 
> opens up a lot more options. :)
>
> Some may find that acceptable though, like turning off fsync with 
> PostgreSQL to get much higher throughput, as long as you are made 
> *very* aware of the risks.
>
> It's good to have input in this thread from someone with more experience 
> with RSF-1 than the rest of us. You confirm what others have said 
> about RSF-1, that it's stable and works well. What were you deploying 
> it on?
>
> Chris
>
> Sent from my iPhone 5
>
> On Aug 17, 2016, at 11:18 AM, Linda Kateley <lkateley@kateley.com> wrote:
>
>> The question I always ask, as an architect, is "can you lose 1 minute 
>> worth of data?" If you can, then batched replication is perfect. If 
>> you can't, then HA. Every place I have positioned it, RSF-1 has 
>> worked extremely well. If I remember right, it works at the DMU level. 
>> I would suggest trying it. They have been working toward a full FreeBSD 
>> solution; I have several customers running it well.
>>
>> linda
>>
>>
>> On 8/17/16 4:52 AM, Julien Cigar wrote:
>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen 
>>> Gotteswinter wrote:
>>>>
>>>> On 17.08.2016 at 10:54, Julien Cigar wrote:
>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen 
>>>>> Gotteswinter wrote:
>>>>>>
>>>>>> On 11.08.2016 at 11:24, Borja Marcos wrote:
>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar <julien@perdition.city> wrote:
>>>>>>>>
>>>>>>>> As I said in a previous post, I tested the zfs send/receive 
>>>>>>>> approach (with zrep) and it works (more or less) perfectly, so I 
>>>>>>>> concur with all you said, especially about off-site replication 
>>>>>>>> and synchronous replication.
>>>>>>>>
>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP setup at 
>>>>>>>> the moment. I'm in the early tests and haven't done any heavy 
>>>>>>>> writes yet, but ATM it works as expected; I haven't managed to 
>>>>>>>> corrupt the zpool.
>>>>>>> I must be too old school, but I don’t quite like the idea of 
>>>>>>> using an essentially unreliable transport
>>>>>>> (Ethernet) for low-level filesystem operations.
>>>>>>>
>>>>>>> In case something went wrong, that approach could risk 
>>>>>>> corrupting a pool. Although, frankly,
>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA 
>>>>>>> problem that caused some
>>>>>>> silent corruption.
>>>>>> try a dual split import :D I mean, zpool import -f on 2 machines 
>>>>>> hooked up to the same disk chassis.
>>>>> Yes, this is the first thing on the list to avoid.. :)
>>>>>
>>>>> I'm still busy testing the whole setup here, including the
>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent
>>>>> that thanks to the following:
>>>>>
>>>>> - As long as ctld is running on the BACKUP the disks are locked
>>>>> and you can't import the pool (even with -f) for ex (filer2 is the
>>>>> BACKUP):
>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f
>>>>>
>>>>> - The shared pool should not be mounted at boot, and you should ensure
>>>>> that the failover script is not executed at boot time either: this is
>>>>> to handle the case where both machines power off and/or come back up at
>>>>> the same time. Indeed, the CARP interface can "flip" its status if both
>>>>> machines are powered on at the same time, for example:
>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and
>>>>> you will have a split-brain scenario
>>>>>
>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons
>>>>> (freebsd-update, etc.) and the MASTER -> BACKUP switch should not
>>>>> happen; this can be handled with a trigger file or something like that
>>>>>
>>>>> - I still have to check whether the order is OK, but I think that as
>>>>> long as you shut down the replication interface and adapt the
>>>>> advskew (including the config file) of the CARP interface before the
>>>>> zpool import -f in the failover script, you can be relatively confident
>>>>> that nothing will be written to the iSCSI targets (see the sketch after
>>>>> the script link below)
>>>>>
>>>>> - A zpool scrub should be run at regular intervals
>>>>>
>>>>> This is my MASTER -> BACKUP CARP script ATM
>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7
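>>>>>
>>>>> A simplified sketch of that order is below. It is NOT the script from
>>>>> the gist: interface names, vhid, advskew values and the pool name are
>>>>> placeholders, and stopping ctld follows from the point above that the
>>>>> pool can't be imported while ctld holds the disks.
>>>>>
>>>>> #!/bin/sh
>>>>> REPL_IF="em1"    # dedicated replication / iSCSI interface (placeholder)
>>>>> CARP_IF="em0"    # CARP-enabled interface (placeholder)
>>>>> POOL="tank"      # placeholder
>>>>>
>>>>> # Planned MASTER reboot (freebsd-update, ...): don't fail over.
>>>>> [ -e /var/run/failover.disabled ] && exit 0
>>>>>
>>>>> # 1. Cut the replication link so the old MASTER can no longer reach
>>>>> #    the iSCSI targets.
>>>>> ifconfig "$REPL_IF" down
>>>>>
>>>>> # 2. Stop ctld so the local disks are released and the pool can be
>>>>> #    imported on this node.
>>>>> service ctld stop
>>>>>
>>>>> # 3. Lower the advskew so this box stays CARP MASTER (also update the
>>>>> #    advskew in /etc/rc.conf so it survives a reboot).
>>>>> ifconfig "$CARP_IF" vhid 1 advskew 10
>>>>>
>>>>> # 4. Only now force-import the shared pool.
>>>>> zpool import -f "$POOL"
>>>>>
>>>>> (For the scrub point, periodic(8) can take care of it with
>>>>> daily_scrub_zfs_enable="YES" in /etc/periodic.conf.)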
>>>>>
>>>>> Julien
>>>>>
>>>> The 100€ question, without looking at that script in detail: yes, at
>>>> first view it's super simple, but why are solutions like RSF-1 so much
>>>> more powerful / feature-rich? There's a reason for it, which is that
>>>> they try to cover every possible situation (which makes perfect sense
>>>> here).
>>> I've never used RSF-1 so I can't say much more about it, but I have
>>> no doubts about its ability to handle "complex situations" where
>>> multiple nodes / networks are involved.
>>>
>>>> That script works for sure, within very limited cases imho
>>>>
>>>>>> Kaboom, really ugly kaboom. That's what is very likely to happen
>>>>>> sooner or later, especially when it comes to homegrown automation
>>>>>> solutions. Even the commercial products, where much more time/work
>>>>>> goes into such solutions, fail on a regular basis.
>>>>>>
>>>>>>> The advantage of ZFS send/receive of datasets is, however, that 
>>>>>>> you can consider it
>>>>>>> essentially atomic. A transport corruption should not cause 
>>>>>>> trouble (apart from a failed
>>>>>>> "zfs receive") and with snapshot retention you can even roll 
>>>>>>> back. You can’t roll back
>>>>>>> zpool replications :)
>>>>>>>
>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your 
>>>>>>> zfs receive doesn’t involve a rollback
>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. 
>>>>>>> Just make sure that your replica datasets
>>>>>>> aren’t mounted and zfs receive won’t complain.
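>>>>>>>
>>>>>>> For example (the pool/dataset names below are just placeholders),
>>>>>>> keeping the replica unmounted is a one-time property change on the
>>>>>>> receiving side, or you can receive with -u so it is never mounted
>>>>>>> at all:
>>>>>>>
>>>>>>> # on the receiving host; "backup/tank" is a placeholder name
>>>>>>> zfs set canmount=noauto backup/tank
>>>>>>> zfs set readonly=on backup/tank
>>>>>>> # or simply receive without mounting:
>>>>>>> zfs send tank@snap | ssh backuphost zfs receive -u backup/tank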
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Borja.
>>>>>>>
>>>>>>>
>>>>>>>



