Date:      Fri, 4 Jul 2014 08:10:28 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Marc Fournier <scrappy@hub.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: FreeBSD 10.x + LiquidSoap + NFS == Server Hang
Message-ID:  <1805070922.7180203.1404475828988.JavaMail.root@uoguelph.ca>
In-Reply-To: <BA548D77-CCE9-454E-97E4-C78DDA837975@hub.org>

Marc Fournier wrote:
>
> k, just found
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html
> and setup KDB/DDB and just tested that using the ‘sysctl’ works to
> get me to the KDB prompt … hopefully this will allow me to provide
> more useful information, if someone can let me know what exactly
> that would be for next time it hangs? :)
>
>
> thx
>
>
> On Jul 3, 2014, at 9:26 PM, Marc Fournier <scrappy@hub.org> wrote:
>
> >
> > Oh, on the remote console, last two lines I see are:
> >
> > ==
> > nfs_getpages: error 4
> > vm_fault: pager read error, pid 2957 (liquid soap)
> > ==
4 is EINTR. That would suggest you might have the "intr" option on the mount?

If so, try removing the "intr" option from the mount.
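
As a minimal sketch of that check: the helper below tests whether a
comma-separated NFS option list (for example, the options field of an
/etc/fstab entry) contains "intr". The function name has_intr and the sample
option strings are made up for illustration and are not taken from your setup.

```shell
#!/bin/sh
# has_intr: hypothetical helper, not part of FreeBSD.
# Returns 0 if the comma-separated option list in $1 contains
# the exact option "intr", and 1 otherwise.
has_intr() {
    case ",$1," in
        *,intr,*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Sample option strings (illustrative only).
for opts in "rw,intr,tcp" "rw,tcp"; do
    if has_intr "$opts"; then
        echo "$opts: intr is set"
    else
        echo "$opts: intr is not set"
    fi
done
```

Wrapping the list in commas before matching means only a whole option named
"intr" counts, not a longer option that merely ends in those letters.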

The problem with "intr" is that, if anything posts a signal to a process while
I/O is in progress, the I/O will fail. In this case the failure is in
nfs_getpages(), which is a pagein operation (and you don't want those to fail).

If you aren't using "intr", then I have no idea why a read would fail with
EINTR.

rick

> >
> > if that helps any ...
> >
> > On Jul 3, 2014, at 9:23 PM, Marc Fournier <scrappy@hub.org> wrote:
> >
> >>
> >> Hi all …
> >>
> >> 	I have a jail running on FreeBSD 10-STABLE (svn update as of July
> >> 	2nd @ ~05:30 UTC:
> >>
> >> ==
> >> Working Copy Root Path: /usr/src
> >> URL: https://svn0.us-east.freebsd.org/base/stable/10
> >> Relative URL: ^/stable/10
> >> Repository Root: https://svn0.us-east.freebsd.org/base
> >> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> >> Revision: 268135
> >> Node Kind: directory
> >> Schedule: normal
> >> Last Changed Author: pfg
> >> Last Changed Rev: 268132
> >> Last Changed Date: 2014-07-02 01:28:38 +0000 (Wed, 02 Jul 2014)
> >> ==
> >>
> >> 	Currently it has 3 jail’d environments running off it, with the
> >> 	files for them NFS mounted from a NetApp filer … and right now,
> >> 	the NFS mount that these jails are running from is “locked” … a
> >> 	‘df’ hangs … trying to do a ‘jexec # /bin/tcsh’ into one of the
> >> 	jail’s hangs … etc.
> >>
> >> 	The same NFS file system is mounted and running on a half dozen
> >> 	other servers, and they are all operating just fine, so the
> >> 	NetApp is operating properly.
> >>
> >> 	If I move the jail with liquidsoap running around to a different
> >> 	server, the hang will follow to the new server, and the old
> >> 	server will once more become rock solid …
> >>
> >> 	I’m not 100% certain it is liquidsoap, but the hang appears to
> >> 	always coincide with reloading a new playlist … and although it
> >> 	happens frequently (more with recent upgrades), it doesn’t
> >> 	happen *every* night …
> >>
> >> 	This is on a remote server … so doing things at the console isn’t
> >> 	possible, and although I’ve got a remote console on this, I’ve
> >> 	never figured out how to break to the debugger through it,
> >> 	although I’m going to work on it to see if I can’t get it to
> >> 	work …
> >>
> >> 	Barring breaking to the debugger (is there a way, from the command
> >> 	line, to force it to break to the debugger?), is there anything
> >> 	else I can use to provide some sort of useful information?
> >>
> >> ps aux for the process shows:
> >>
> >> # ps aux | grep liq
> >> 1002     2957   0.0  0.7 226888 112792  -  TLJ   4:45AM
> >>   370:27.23 /usr/local/bin/liquidsoap -q -d
> >> /usr/local/etc/liquidsoap/liquidsoap.liq
> >>
> >> and:
> >>
> >> # ps auxxwl | grep 2957
> >> 1002     2957   0.0  0.7 226888 112792  -  TLJ   4:45AM
> >>   370:27.23 /usr/local/bin/l  1002     1   0  20  0 -
> >> 1002    96280   0.0  0.0  12316      0  -  IWJ  -
> >>           0:00.00 pwait 2957        1002 96274   0  52  0 kqread
> >> root    96508   0.0  0.0  18788   1828  4  S+    4:19AM
> >>     0:00.00 grep 2957            0 96505   0  20  0 piperd
> >>
> >> 	Other commands I can / should run next time it happens … ?
> >> 	   Which won’t take long ...
> >>
> >> Thanks …
> >>
> >>
> >
> > _______________________________________________
> > freebsd-stable@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to
> > "freebsd-stable-unsubscribe@freebsd.org"


