Date:      Fri, 13 Jun 2003 10:14:21 -0400
From:      Ken Smith <kensmith@cse.Buffalo.EDU>
To:        freebsd-hubs@freebsd.org
Subject:   RFC - extra sync functionality...
Message-ID:  <20030613141421.GD13868@electra.cse.Buffalo.EDU>

I ran this by Jun and he said the idea seems like it might be a good
one, but it's not possible right now due to the way data gets loaded
onto ftp-master.  But if we're changing stuff around maybe this could
change too.  I thought I'd post it here first for comments before
seeing if it's worth pursuing a change on ftp-master.  If this doesn't
seem like it would be worthwhile I'll just forget about it for now.

While thinking about push mechanisms it struck me that the whole
mirror system has a lot in common with DNS and zone files: one
master server where data gets loaded, and slave servers that need
to be kept up to date.  Sometimes you want the data to propagate
instantly, other times you're not in a huge rush.  And so on...

Taking that view of things, Jun and others have already started to
carve the FTP site into the equivalent of zones.  The trick though
is coming up with the equivalent of what triggers a zone transfer,
which in DNS is a change in the serial number in the SOA record.
Slave servers poll the master periodically to check the serial number
and only do a zone transfer if the serial number went up.  There is
also the NOTIFY mechanism in DNS to have data propagate faster than
it would with the normal polling.

I thought that maybe there could be a top-level directory in the
FTP repository named "serial_numbers" and inside that would be an
individual file for each of the modules making up the FTP site.
I wouldn't want these files to be part of what gets *transferred*
with a module, because I'd like the local copies to be under the
control of a client-side Perl script.  That is why they live in a
separate directory rather than inside the module itself (e.g. the
serial number file for the ports section would not go in the ports
directory).

Mirror sites would run a Perl script from their nightly cron jobs (I'd
be able to provide this I think...).  It starts by transferring the
contents of the serial_numbers directory but does not write them to
its local directory yet - it just holds them in memory.  The serial
numbers use the standard format recommended in the DNS docs,
YYYYMMDDXX, where XX is just two extra digits in case you decide to
change the contents more than once in a day (e.g. today would be
2003061300).  The Perl script then calls your transfer script (you get
to choose what that uses - rsync, cvsup, whatever...) only for the
modules whose serial numbers have changed.  *After* a successful
transfer the Perl script writes the new serial number to your local
directory.
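
A minimal sketch of the script logic described above, written in
Python rather than Perl purely for illustration.  The function names,
the dict of remote serial numbers and the `transfer` callback are all
hypothetical; fetching the serial_numbers directory from the master
is left out, and nothing here reflects an existing tool.

```python
import datetime
from pathlib import Path


def next_serial(current: int, today: datetime.date) -> int:
    """Return the next YYYYMMDDXX serial after `current` for a change made `today`."""
    base = int(today.strftime("%Y%m%d")) * 100
    # The first change of the day gets XX = 00; later changes bump the counter.
    return base if current < base else current + 1


def changed_modules(remote: dict, local: dict) -> list:
    """Modules whose remote serial is higher than the one recorded locally."""
    return sorted(m for m, s in remote.items() if s > local.get(m, 0))


def sync(remote: dict, state_dir: Path, transfer) -> list:
    """Transfer changed modules; record a serial only after a successful transfer.

    `remote` maps module name -> serial as fetched from the master and held
    in memory.  `transfer` is a hypothetical callback (e.g. a wrapper around
    rsync or cvsup) that returns True on success.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    local = {p.name: int(p.read_text()) for p in state_dir.iterdir()}
    done = []
    for module in changed_modules(remote, local):
        if transfer(module):
            # Written last, so a failed transfer is retried on the next run.
            (state_dir / module).write_text(f"{remote[module]}\n")
            done.append(module)
    return done
```

Because the local serial is only written after the transfer succeeds,
a run that dies halfway leaves the old number in place and the module
is picked up again the next night.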

This would also help with a few scenarios I've run into elsewhere.
Suppose you're a Tier-2 mirror and the Tier-1 you try to connect to
is actually a DNS round-robin of a couple of machines.  What happens
if those Tier-1's could potentially be updated at different times
and you happen to connect to one that's a bit behind after you've
connected to one that's exactly current?  Here we can do the standard
DNS thing - don't do the transfer and whine about the serial number
having gone down.
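
The round-robin case above could be handled with a three-way
comparison instead of a plain "changed or not" test.  This is only a
sketch: the function name and the returned strings are made up for
illustration, and it uses ordinary integer comparison, whereas real
DNS serials are compared with wrap-around arithmetic (RFC 1982), which
a date-based serial for a mirror presumably would not need.

```python
def decide(remote_serial: int, local_serial: int) -> str:
    """Decide what to do for one module from its remote and local serials."""
    if remote_serial > local_serial:
        return "transfer"      # upstream has newer content
    if remote_serial < local_serial:
        return "complain"      # upstream is behind us: skip and warn
    return "skip"              # already current


# Hypothetical example: we synced from a current Tier-1 yesterday and
# today's round-robin handed us a Tier-1 that is one update behind.
if decide(2003061200, 2003061300) == "complain":
    print("ports: upstream serial went down, not transferring")
```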

You're also only running the transfer scripts (sometimes a large load
on both you and the site you're pulling from) when there is actually
something new to fetch.

We would probably recommend (or implement in this some sort of force
mechanism...) running a real transfer pass once a week or so "just
because", in case someone doesn't quite play nice with the serial
number generation and updates content without bumping the serial
number.  I know there have been times I've forgotten to update the
serial number in DNS SOA records when I should have...

None of this would break the existing mechanisms; if some sites chose
not to use it - no big deal.  There would just be one more small
directory at the top of the site that gets transferred...

It would also give us an easy way to check on whether sites are
staying up to date if anyone ever decided that was a good idea.  I
don't know if advertised sites have ever just stopped updating but
kept allowing anonymous FTP connects, but I've seen other projects
start to pay at least a little attention to that (Apache pops to
mind - someone checks on the Apache mirrors and nags periodically).
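
Both the weekly forced pass and the "is this mirror keeping up" check
fall out of the date embedded in the serial number.  A rough sketch,
again with hypothetical names, and a 7-day default that is just the
"once a week or so" from above:

```python
import datetime


def serial_date(serial: int) -> datetime.date:
    """Extract the date from a YYYYMMDDXX serial (the XX counter is dropped)."""
    return datetime.datetime.strptime(str(serial // 100), "%Y%m%d").date()


def days_behind(master_serial: int, mirror_serial: int) -> int:
    """Days between the master's and a mirror's serial dates (0 if not behind)."""
    delta = serial_date(master_serial) - serial_date(mirror_serial)
    return max(0, delta.days)


def full_pass_due(last_full_pass: datetime.date, today: datetime.date,
                  interval_days: int = 7) -> bool:
    """True when a real transfer pass should run regardless of serial numbers."""
    return (today - last_full_pass).days >= interval_days
```

The days-behind figure is only a rough indicator: it compares the
dates of the last recorded changes, not how long a mirror has actually
gone without checking.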

The catch is, of course, the reliable generation of the serial number
files.  That's where existing practice on ftp-master makes this hard.

Is this a good idea?  Is it worth seeing if existing practice on ftp-master
can be changed?  If you reply to this please don't include the whole
thing, just the relevant pieces - this list has already been more
active than normal and I'm starting to worry some folks might be
considering leaving.

-- 
						Ken Smith
- From there to here, from here to      |       kensmith@cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |
