Date: Fri, 13 Jun 2003 10:14:21 -0400
From: Ken Smith
To: freebsd-hubs@freebsd.org
Subject: RFC - extra sync functionality...
Message-ID: <20030613141421.GD13868@electra.cse.Buffalo.EDU>

I ran this by Jun and he said the idea seems like it might be a good one but it's not possible right now due to the way data gets loaded onto ftp-master. But if we're changing stuff around maybe this could change too. I thought I'd post it here first for comments before seeing if it's worth following up with that. If this doesn't seem like it would be worthwhile I'll just forget about it for now.

In thinking about push mechanism type stuff I thought the whole mirror system has a lot in common with DNS and zone files. One master server where data gets loaded, and slave servers that need to be kept up to date.
Sometimes you want the data to propagate instantly, other times you're not in a huge rush. And so on...

Taking that view of things Jun and others have already started to carve the FTP site into the equivalent of zones. The trick though is coming up with the equivalent of what triggers a Zone Transfer, which is a different serial number in the SOA for DNS. Slave servers poll the master periodically to check the serial number and only do a zone transfer if the serial number went up. There is also the NOTIFY mechanism in DNS to have data propagate faster than would happen by the normal polling.

I thought that maybe there could be a top-level directory in the FTP repository named "serial_numbers" and inside that would be an individual file for each of the modules making up the FTP site. I wouldn't want these files to be part of what gets *transferred* because I'd like it to be under the control of a client-side perl script which is why the separate directory and not having these serial number files part of the module itself (e.g. not putting the serial number file for the ports section in the ports directory).

Mirror sites use a perl script to run for the nightly cron jobs (I'd be able to provide this I think...). It starts by transferring the contents of the serial_numbers directory but not writing them to its local directory yet - just storing them in memory. The serial numbers are the standard one recommended in the DNS docs, YYYYMMDDXX where XX is just two extra digits in case you decide to change the contents more than once in a day (e.g. today would be 2003061300). The perl script then only calls your transfer script (you get to choose what that uses - rsync, cvsup, whatever...) for the modules that have had serial number changes. *After* successful transfer the perl script writes the new serial number to your local directory.

This would also help with a few scenarios I've run into elsewhere.
Suppose you're a Tier-2 mirror and the Tier-1 you try to connect to is actually a DNS round-robin of a couple machines. What happens if those Tier-1's could potentially be updated at different times and you happen to connect to one that's a bit behind after you've connected to one that's exactly current? Here we can do the standard DNS thing - don't do the transfer and whine about the serial number having gone down. And you're only running the transfer scripts (sometimes large load on both you and the site you're pulling from) if it would be useful.

We would probably recommend (or implement in this some sort of force mechanism...) running a real transfer pass once a week or so "just because" in case someone(s) don't quite play nice with the serial number generation and update content by mistake. I know there have been times I've forgotten to update the serial number in DNS SOA records when I should have...

None of this would break the existing mechanisms if some sites chose to not use it - no big deal. There would just be one more small directory at the top of the site that gets transferred...

It also would give us an easy way to check on whether sites are staying up to date if anyone ever decided that was a good idea. I don't know if advertised sites have ever just stopped updating but kept allowing anonymous FTP connects but I've seen other distributions start to pay at least a little attention to that (Apache pops to mind - someone checks on the Apache mirrors and nags periodically).

The catch is, of course, the reliable generation of the serial number files. That's where existing practice on ftp-master makes this hard.

Is this a good idea? Is it worth seeing if existing practice on ftp-master can be changed?

If you reply to this please don't include the whole thing, just the relevant pieces - this list has already been more active than normal and I'm starting to worry some folks might be considering leaving.
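
To make the nightly pass described above a bit more concrete, here is a rough sketch of the decision logic. It is written in Python rather than the perl the proposal has in mind, and it is only an illustration under assumptions: the names parse_serial, plan_sync and run_sync are made up for this example, serial numbers are held in plain dictionaries instead of being fetched from a serial_numbers directory, and the transfer argument stands in for whatever rsync or cvsup wrapper a site already uses.

```python
import re
import sys
from typing import Callable, Dict, List, Tuple

# Assumed format from the proposal: YYYYMMDDXX, ten decimal digits.
SERIAL_RE = re.compile(r"^\d{10}$")


def parse_serial(text: str) -> int:
    """Turn the contents of one serial number file into an integer."""
    text = text.strip()
    if not SERIAL_RE.match(text):
        raise ValueError(f"not a YYYYMMDDXX serial number: {text!r}")
    return int(text)


def plan_sync(master: Dict[str, int], local: Dict[str, int],
              force: bool = False) -> Tuple[List[str], List[str]]:
    """Return (modules to transfer, modules whose serial went down)."""
    to_transfer: List[str] = []
    regressed: List[str] = []
    for module in sorted(master):
        new = master[module]
        old = local.get(module, 0)  # a module never synced counts as serial 0
        if new < old:
            regressed.append(module)    # upstream is behind us: skip it
        elif new > old or force:
            to_transfer.append(module)  # serial went up, or a forced pass
    return to_transfer, regressed


def run_sync(master: Dict[str, int], local: Dict[str, int],
             transfer: Callable[[str], bool],
             force: bool = False) -> Dict[str, int]:
    """Run one pass and return the serial numbers to store locally."""
    to_transfer, regressed = plan_sync(master, local, force)
    for module in regressed:
        print(f"warning: serial number for {module} went down "
              f"({local[module]} -> {master[module]}), not transferring",
              file=sys.stderr)
    updated = dict(local)
    for module in to_transfer:
        # Record the new serial only *after* a successful transfer.
        if transfer(module):
            updated[module] = master[module]
    return updated
```

With a local serial of 2003061200 for ports and a master serial of 2003061300, only ports would be transferred; a module whose master serial is lower than the local one is skipped with a warning, and passing force=True covers the weekly "just because" pass for modules whose serial did not change. Plain integer comparison is enough here because YYYYMMDDXX values only grow over time.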

-- 
                                                Ken Smith
- From there to here, from here to      |       kensmith@cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |