Date:      Thu, 10 Sep 2015 13:44:18 -0400
From:      "Chad J. Milios" <milios@ccsys.com>
To:        Erich Dollansky <erichsfreebsdlist@alogt.com>
Cc:        "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Re: mdconfig creating file based memory disk
Message-ID:  <A646B4C4-04DD-4840-A478-2EB28B0951F9@ccsys.com>
In-Reply-To: <20150910111034.20b97c41@X220.alogt.com>
References:  <20150910111034.20b97c41@X220.alogt.com>

> On Sep 9, 2015, at 11:10 PM, Erich Dollansky <erichsfreebsdlist@alogt.com> wrote:
>
> Hi,
>
> I just came across a simple question. What will happen when I create
> two memory disks using the same file?
>
> Example:
>
> mdconfig -f /usr/home/swap/swapfile -u 0
> mdconfig -f /usr/home/swap/swapfile -u 1
>
> and then I do a
>
> swapon /dev/md0
> swapon /dev/md1
>
> It gives me double the size of 'swapfile' as swap space. It is obvious
> to me that this must fail.
>
> Shouldn't there be a note in the documentation?
>
> Erich

Perhaps, but if we documented every way in which FreeBSD allows one to
shoot oneself in the foot, the docs would probably more than triple in
size. :)

This is an interesting experiment, but I can't imagine anyone inviting
the danger while actually expecting to get away with such a
configuration, and I don't think stumbling onto it by accident is any
more likely than stumbling onto any of the countless other potentially
dangerous misconfigurations of *nix. I doubt this merits a mention for
safety's sake, though as an illustration of how swap actually works
internally it has a lot of merit. I'd be curious to see more thorough
test results and discussion from those with intimate knowledge of the
virtual memory and swapper/pager systems.

Imagine the following analogy: a hypothetical database that mmap()s a
file, possibly larger than physical memory, and relies on the VM system
for demand paging. Now imagine two or more instances of that database
being started with hard links to the same underlying file, all of them
allowed to read and write. If the software is SMP-capable and uses locks
or data structures WITHIN the mapped region to handle synchronization
(and doesn't go out of its way to cache or process the data outside that
region, beyond the help the kernel already provides, for moments during
which the data could become stale), then the multiple instances could
all serve data from, AND modify data in, that same single source of
truth, and they will remain stable and in sync even without msync()ing
to the underlying file or storage. I'm also positive this holds true
through any number or combination (even an arbitrary and very large one)
of indirections through hardlinks, symlinks, mdconfig, nullfs and/or
unionfs (or it intends to, so any failure or race should be considered a
kernel bug).
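
To make the analogy concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: the file names and the
function shared_mapping_demo() are made up, it uses two mappings inside
one process rather than two database instances, and it relies on
Python's mmap defaulting to a shared mapping on Unix. It shows a write
made through one hard-linked name becoming visible through the other
without any explicit flush.

```python
import mmap
import os
import tempfile


def shared_mapping_demo():
    """Map one file through two hard-linked names and write via one mapping."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "data")        # hypothetical file name
        path_b = os.path.join(tmp, "data-link")   # second name, same inode

        # One page of backing store, then a hard link to it.
        with open(path_a, "wb") as f:
            f.write(b"\0" * mmap.PAGESIZE)
        os.link(path_a, path_b)

        with open(path_a, "r+b") as fa, open(path_b, "r+b") as fb:
            same_inode = (os.fstat(fa.fileno()).st_ino
                          == os.fstat(fb.fileno()).st_ino)
            # Assumption: on Unix these are MAP_SHARED mappings by default.
            map_a = mmap.mmap(fa.fileno(), mmap.PAGESIZE)
            map_b = mmap.mmap(fb.fileno(), mmap.PAGESIZE)
            try:
                map_a[0:5] = b"hello"       # write through the first mapping
                seen = bytes(map_b[0:5])    # read through the second, no flush()
            finally:
                map_a.close()
                map_b.close()
    return same_inode, seen


if __name__ == "__main__":
    print(shared_mapping_demo())
```

Both mappings are backed by the same pages in the kernel, which is why
no msync() (flush() in Python) is needed for the second view to see the
write. Real cooperating processes would still need locking inside the
mapped region, as described above.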

So, without having inspected the relevant kernel source myself, and
based on the little experiment you've conducted, I can imagine swap
having been set up in such a way that the data structures that map
swapped regions live either fully inside or fully outside the swap
partition/file, so that any "surprise" data showing up in the "other"
swap device (besides the one it was written to) ends up being
non-problematic. I am just brainstorming here and would love it if
someone with knowledge rather than conjecture chimed in. :)
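
The "surprise" data can be illustrated with a toy model in Python. This
is conjecture in code form, not a description of how the FreeBSD swap
pager works: the class FileBackedDisk, the function aliasing_demo() and
the 4096-byte block size are all invented for the illustration. It only
shows that two block devices backed by one file share every block, so a
write to block 0 of one is what a later read of block 0 of the other
returns.

```python
import os
import tempfile

BLOCK = 4096  # illustrative block size, not taken from the thread


class FileBackedDisk:
    """Hypothetical stand-in for a file-backed memory disk: block I/O on a file."""

    def __init__(self, path):
        self.path = path

    def write_block(self, n, data):
        # Pad the payload to a full block and store it at block n.
        with open(self.path, "r+b") as f:
            f.seek(n * BLOCK)
            f.write(data.ljust(BLOCK, b"\0"))

    def read_block(self, n):
        # Return block n with the zero padding stripped.
        with open(self.path, "rb") as f:
            f.seek(n * BLOCK)
            return f.read(BLOCK).rstrip(b"\0")


def aliasing_demo():
    with tempfile.TemporaryDirectory() as tmp:
        backing = os.path.join(tmp, "swapfile")
        with open(backing, "wb") as f:
            f.truncate(4 * BLOCK)          # four zero-filled blocks

        md0 = FileBackedDisk(backing)      # two "devices", one backing file
        md1 = FileBackedDisk(backing)

        md0.write_block(0, b"page of process A")
        md1.write_block(0, b"page of process B")   # same bytes on disk
        return md0.read_block(0)


if __name__ == "__main__":
    print(aliasing_demo())
```

If each device were handed out blocks independently, the page written
through md0 would come back as whatever was last written through md1.
Whether the real swap code can ever allocate the same block number on
both devices at once is exactly the part I can't answer without reading
the source.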

At the outset of the experiment you describe, my expectation was almost
certain, spectacular failure. Anything else is actually quite curious,
and if such a config doesn't just burst right into flames I consider it
quite a testament to sound *nix engineering. I'd be interested to hear
from someone who exercises it with more swapping out and paging in of
data while verifying the data and semantics.


