Date:      Sun, 5 May 2002 21:17:31 -0400
From:      Anthony Schneider <aschneid@mail.slc.edu>
To:        Patrick Thomas <root@utility.clubscholarship.com>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: what causes a userland to stop, but allows kernel to continue ?
Message-ID:  <20020505211731.A1386@mail.slc.edu>
In-Reply-To: <20020505162455.K86733-100000@utility.clubscholarship.com>; from root@utility.clubscholarship.com on Sun, May 05, 2002 at 04:31:36PM -0700
References:  <20020505162455.K86733-100000@utility.clubscholarship.com>



FWIW, I've very recently had something similar happen to a 4.5-STABLE box.
The machine was NOT SMP, and the cause, as far as we know, was that /var
had been filled by apache's error_log -- a funky new mod_throttle install
with lots of
critical_acquire() failed: Permission denied
critical_release() failed: Permission denied
entries.

Now, I assume that these errors are not a result of /var being full, but of
the System V semaphore locking in the mod_throttle code.

In mod_throttle-3.1.2...
The critical_acquire() code from mod_throttle.c (assuming
defined(USE_SYSTEM_V_SERIALIZATION)):

<snip>

struct critical {
        int id;
        struct sembuf on;
        struct sembuf off;
};

</snip><snip>

static int
critical_acquire(t_critical *mp)
{
        for (errno = 0; semop(mp->id, &mp->on, 1) < 0; ) {
                if (errno != EINTR) {
                        /*** We really should kill the server here. ***/
                        perror("critical_acquire() failed");

                        /* Neither of these calls appear to shutdown the
                         * server and its children; exit(APEXIT_CHILDFATAL),
                         * appears to kill only the parent process.
                         */
                        ap_start_shutdown();
                        return -1;
                }
        }

        return 0;
}

</snip>
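
For reference, a minimal sketch of how the errno values that semop(2)
documents could be told apart around the same kind of retry loop.  The names
semop_errno_hint() and acquire_with_hint() are hypothetical and are not part
of mod_throttle.  The only point is that "Permission denied" is EACCES, which
semop(2) describes as missing alter permission on the semaphore set, whereas
running out of kernel space for SEM_UNDO bookkeeping would be expected to
show up as ENOSPC (or ENOMEM on some systems) instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper, not from mod_throttle: map a semop(2) errno
 * to a short hint about what to look at next. */
static const char *
semop_errno_hint(int err)
{
        switch (err) {
        case EINTR:
                return "interrupted by a signal; retry";
        case EACCES:
                return "no alter permission on the semaphore set";
        case EINVAL:
        case EIDRM:
                return "invalid or removed semaphore set";
        case ENOSPC:
        case ENOMEM:
                return "kernel out of space for SEM_UNDO bookkeeping";
        default:
                return "unexpected error";
        }
}

/* Same retry-on-EINTR shape as critical_acquire(), but the failure
 * message says which class of error was seen. */
static int
acquire_with_hint(int semid, struct sembuf *op)
{
        int saved;

        assert(op != NULL);
        while (semop(semid, op, 1) < 0) {
                if (errno == EINTR)
                        continue;
                saved = errno;  /* keep errno safe across fprintf() */
                fprintf(stderr, "semop failed: %s\n",
                    semop_errno_hint(saved));
                return -1;
        }
        return 0;
}
```

Called as acquire_with_hint(mp->id, &mp->on) in place of the loop above, this
would still return -1 on the same failures; it only makes the log line more
specific.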

Livelock, maybe?  Is there some sort of internal kernel semaphore table which
might be getting filled up or something?  I'd also like to find out more about
this, but sadly, the machine is a remote one and I can't drop into ddb as
suggested...
Thank you all very much.  Hope this information is of use.
-Anthony.


On Sun, May 05, 2002 at 04:31:36PM -0700, Patrick Thomas wrote:
>
> So, based on a previous thread, it looks like I have a server whose
> userland halted, essentially, but the kernel continued running.
>
> As evidenced by:
>
> - you can still ping the server just fine
> - you can still connect to running services just fine - if you ssh to it,
> `ssh -v` (verbose) claims a connection is established, but the server
> doesn't respond in any way over that connection.  Further, you can telnet
> to POP or IMAP or HTTP ports, and get a connection, but you can't get any
> response.
> - cron does NOT run while the server is in this state - no jobs run
> - no response from the console - caps lock does NOT toggle the LED
>
> So, as was suggested in the previous thread, it looks like my kernel is
> still running, but the userland has halted.  There are no log entries that
> give any clue as to why this happened last week.
>
>
> 1. from a theoretical standpoint, how would this happen ?
> 2. Is there any way to watchdog for it and escape from it before the
> userland completely crashes ?
> 3. any previous/old problems that would cause this behavior ?
>
>
> It is a FreeBSD 4.5-RELEASE system, and it is SMP - fairly heavily loaded
> (averages 60% CPU idle in `top` output).
>
> thanks,
>
> PT
>
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
-----------------------------------------------
PGP key at:
    http://www.keyserver.net/
    http://www.anthonydotcom.com/gpgkey/key.txt
Home:
    http://www.anthonydotcom.com
-----------------------------------------------


