Date:      Wed, 19 Mar 2008 06:36:30 -0700
From:      Jeremy Chadwick <koitsu@freebsd.org>
To:        Michael Grant <mg-fbsd3@grant.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Upgrading to 7.0 - stupid requirements
Message-ID:  <20080319133630.GA14376@eos.sc1.parodius.com>
In-Reply-To: <62b856460803190546t4abfcb9fu7d3410646d81b656@mail.gmail.com>
References:  <867igo3cih.fsf@zid.claresco.hr> <200803191047.m2JAl7YL070946@lurza.secnetix.de> <62b856460803190546t4abfcb9fu7d3410646d81b656@mail.gmail.com>

On Wed, Mar 19, 2008 at 01:46:07PM +0100, Michael Grant wrote:
> My server is live and serving customers.  I can't afford to take the
> box down for a whole day while I upgrade ports.  Is there any
> intelligent way to do this?

The ways people have given you are proper *and* intelligent.  I think
the piece of information you're not understanding (or haven't been
given?) is that library semantics change.  That doesn't just mean
library version numbers -- it means actual API functionality changes.

> For example, could I do everything on a second disk while running the
> live system on the first disk?  For example using a chroot so it
> thinks it's
> 
> For example, might this work?
> 
> 1) upgrade system in the canonical way:
> # make buildworld
> # make buildkernel
> # make installkernel
> # reboot
> # mergemaster -p
> # make installworld
> # mergemaster
> # reboot

Assuming the first "reboot" means booting into single-user mode: yep, correct.
Follow the procedure documented in /usr/src/Makefile to a tee.  Do not
deviate from it unless you know *exactly* the risks of doing so.  :-)
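
A rough sketch of the steps quoted above, split at the first reboot.
It is a dry run by default and only prints what it would do.  The
function names, the SRCDIR/KERNCONF/DRY_RUN variables, and the GENERIC
default are my own placeholders; /usr/src/Makefile for your release
remains the authority on the exact sequence.

```shell
#!/bin/sh
# Sketch only: prints the upgrade steps unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
SRCDIR="${SRCDIR:-/usr/src}"
KERNCONF="${KERNCONF:-GENERIC}"   # assumption: substitute your own kernel config

# Run a command from the source tree, or just print it in dry-run mode.
run_in_src() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run (in $SRCDIR): $*"
    else
        ( cd "$SRCDIR" && "$@" )
    fi
}

# Phase 1: multi-user, on the old world, before the first reboot.
phase_build() {
    run_in_src make buildworld
    run_in_src make buildkernel KERNCONF="$KERNCONF"
    run_in_src make installkernel KERNCONF="$KERNCONF"
    echo "now reboot into single-user mode"
}

# Phase 2: single-user, on the new kernel.  Filesystems normally have to
# be mounted read-write by hand before this point.
phase_install() {
    run_in_src mergemaster -p
    run_in_src make installworld
    run_in_src mergemaster
    echo "now reboot into multi-user mode"
}

case "${1:-}" in
    build)   phase_build ;;
    install) phase_install ;;
esac
```

Splitting it into two phases makes the downtime window explicit:
everything in phase_install happens with the box out of service.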

By the way, all of the above guarantees downtime.  You seem to be of the
"I must achieve Five Nines" mentality, so I'll point you to why Five
Nines is absurd: http://en.wikipedia.org/wiki/Myth_of_the_nines

> 2) make sure misc/compat6x is installed

You're asking for trouble right here.  I've discussed in a previous
thread (if you want me to dig it up I can) the dangers of using the
compatXx packages.  You can:

* End up with dual libraries in your ld.so path, which can result
  in functions of the same name being loaded, which is a problem
  because...
* Function semantics/API differences exist between RELENG_7 and
  RELENG_6, 6 and 5, 5 and 4, 4 and 3 -- but function names remain
  the same.
* Library innards change.  Best example?  libkvm.  These changes are so
  major that an entry goes into /usr/src/UPDATING when the semantics
  change.  People who don't follow the proper upgrade procedure get
  amusing results: "top doesn't work, it spits out some weird error!"
  "why is ps broken?!?" etc.
* "Library nightmare" syndrome, which is the UNIX equivalent of
  Windows' "DLL Hell".  Some program on the machine attempts to
  link to a library called "libapemans.so.4", and you have two
  versions of it: one for 6, and one for 7.  Uh oh!
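
If you want to see whether a given binary is picking up a compat
library, ldd will tell you what it resolves to.  A small sketch follows.
The "/lib/compat/" pattern is an assumption about where the compat
packages put their libraries on your system, and the function names
are made up for illustration.

```shell
#!/bin/sh
# Assumption: compat libraries live in a directory whose path contains
# "/lib/compat/".  Adjust COMPAT_PATTERN if yours are elsewhere.
COMPAT_PATTERN="${COMPAT_PATTERN:-/lib/compat/}"

# Read ldd-style output on stdin and print only the lines that resolve
# to a compat directory.  Prints nothing when there are no matches.
compat_libs() {
    grep -F -- "$COMPAT_PATTERN" || true
}

# Show the compat libraries (if any) that one binary links against.
check_binary() {
    ldd "$1" 2>/dev/null | compat_libs
}
```

Something like `check_binary /usr/local/sbin/httpd` (path is just an
example) printing any lines at all means that binary is still tied to
the old libraries and is a candidate for a rebuild.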

How do you avoid these problems?

You use software that you have the source for, and rebuild that software
from the source.  You then guarantee (assuming compatXx isn't installed)
that the software links to a proper library, works with proper API
functionality (or else it won't compile/link properly), and you retain a
clean library tree.

In the case of packages, they're also OK.  Just be sure to use ones for
the OS release you're using; don't go pkg_add'ing packages from RELENG_5
on your RELENG_7 box, for example.  :-)  Remember: you have the source.

Do you have programs from commercial vendors who do not give you source,
and which rely upon RELENG_6?  If so, you should consider *not* migrating
from RELENG_6, and instead getting your vendor to build their software
on RELENG_7!  Work with them, help them, test with them.

> 3) on a second disk or in a directory somewhere like /new
> a) nullfs mount read-only all the things one needs inside a chroot to
> work except /usr/local
> b) create a writable /usr/local, /usr/X11R6, /compat/linux and /var/db
> in the chroot
> 
> 4) then for each package installed, install it within the chroot

I don't see how this is going to work.  You would need to copy a TON of
files from /usr/lib, /libexec, /usr/libexec, and other places into your
chroot tree before anything will work.  You'll also have to use chflags
to deal with files that're schg.

Additionally, if you're installing **packages**, why are you bothering
with the chroot aspect?  This should be a VERY quick task -- no
compiling needed, since they're all binary.  It WILL NOT take an entire
day.  I'd say 30 minutes -- tops.

All that said...

I'm a hosting provider myself, not just of websites, but of servers and
of rack space as well.  I've done it since 1993, using Linux from 1993
to 1997.

We have users who are lax ("oh, the server was down for 2 hours?  No
biggie"), and those who are so incredibly anal that I consider
terminating them ("The site was offline for 15 seconds when you reloaded
Apache, why?!").  We also have commercial customers who demand *prior*
notice of when we do things, and trust our ability/judgement.

In the case of all clientele, we tell them when there's going to be
downtime (unless it's unexpected), and give them a general estimate
of the downtime when it's going to happen.  They have come to accept
that, regardless of how long.  Customers are actually not too bad as
long as you communicate with them honestly and openly.

We have multiple servers.  Some run RELENG_7, others RELENG_6.  We've
gone through the pain of RELENG_3 --> 4 --> 5 --> 6 --> 7 over the
years, and definitely found what's most effective.  We reinstall the OS,
because that's what works best -- and because the "magical major
revision upgrade path" is chaotic, has a history of leaving random crap
lying around on your filesystem, and (any good SA will know what I mean
by this) does not give the "good feeling" of a clean system.

So how did we upgrade from RELENG_4 to RELENG_6 (we skipped 5) on our
main webserver?  It was fairly simple:

We had a RELENG_6 server built, and compiled all the same ports we had
on the RELENG_4 box.  I spent a week or so making sure the machine had a
working Apache server, and did my best to thoroughly test it --
including asking some users "would you mind if we moved your site over
to a beta/test box to make sure things worked?"  Things did work.

I then announced downtime to all the clientele, saying "this is a major
upgrade, and will probably take a few hours guaranteed".  I started
moving home directories over using dump/restore, updated DNS records,
and did numerous other things which I can't even remember.  Entire
downtime was about an hour, most of which was due to the amount of data
we had to copy over.  If we had used NFS, this would've been much easier.
There was one mistake which happened (some incorrect firewall rules, the
result of our ipfw --> pf migration), and I fixed it when someone
reported it ("I can't FTP...").
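
For the dump/restore part, the general shape is a level-0 dump piped
into restore on the new box.  The sketch below only builds and prints
such a pipeline so you can inspect it first.  Using ssh as the
transport, the host name, and the function name are assumptions for
illustration, not a record of what we actually ran.

```shell
#!/bin/sh
# Assumptions: the source is its own filesystem (dump works on
# filesystems), and ssh is the transport to the new machine.
# -0 = full dump, -a = no tape-length calculation, -f - = write to stdout.
# Adding -L (snapshot of a live filesystem) needs UFS snapshot support,
# so it is left out of the default.
DUMP_FLAGS="${DUMP_FLAGS:--0 -a}"

# Print the pipeline for: source filesystem, destination host,
# destination directory.  Nothing is executed here.
migrate_cmd() {
    src_fs="$1"
    dest_host="$2"
    dest_dir="$3"
    echo "dump $DUMP_FLAGS -f - $src_fs | ssh $dest_host 'cd $dest_dir && restore -r -f -'"
}
```

For example, `migrate_cmd /home newbox /home` prints the command to
review before you paste it into a root shell during the announced window.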

We'll have to do the same thing when going from 6 to 7.  I have a spare
box ready to go for said migration path.

My advice to you is get another box.  Prep it for the migration beforehand,
then move things over.  It's the logical, intelligent, and professional
way of doing upgrades in any production environment.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |



