Date: Sat, 09 Jan 2010 05:58:19 +0100
From: Rainer Duffner <rainer@ultra-secure.de>
To: freebsd-stable@freebsd.org
Subject: apache hanging on 8.0 AMD64
Message-ID: <4B480CEB.4040806@ultra-secure.de>
Hi,

we have an "interesting" problem with FreeBSD 8.0 AMD64.

The server is an HP DL380 G5 with two Harpertown-class CPUs and 8 GB of RAM. It runs MySQL, Apache (worker MPM) and PHP as CGI via FastCGI and suEXEC. It has over 500 ZFS filesystems holding various customers' websites, each running PHP as its own user.

Soon after we put this system into production, we saw httpd processes stalling in the "ucond" state, which brought the Apache server to a complete standstill (Apache somehow blocked itself). I disabled ZFS prefetching and the problem went away for a couple of days, until yesterday, when it happened again. Swap was unused the last time it happened.

I switched top into thread mode (M) and saw that the processes actually seemed to be in different states (zio->i, arc_mr, tx->tx, RUN). I cannot get any information from kstat: when the problem happens and I attach to one of the processes, I get nothing back; it just sits there.

Is there anything I can look at to debug this problem further?
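When a hang like this recurs, it can help to count how many threads sit in each wait state instead of reading the list by eye. A minimal Python sketch, assuming top's per-thread output has been saved as plain text with the column layout used in this message (STATE is the seventh whitespace-separated column); the function name `tally_thread_states` is hypothetical and not part of top or FreeBSD:

```python
from collections import Counter


def tally_thread_states(top_output):
    """Count process/thread rows per STATE value in saved top output.

    Assumes the layout: PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    """
    counts = Counter()
    for line in top_output.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID; header and summary lines do not.
        if len(fields) >= 7 and fields[0].isdigit():
            counts[fields[6]] += 1
    return counts


# A few rows copied from the listing in this message.
sample = """\
  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 6009 www       44    0   103M 26952K tx->tx  3   0:55  1.76% {httpd}
 6030 www       44    0   101M 26168K zio->i  5   0:55  1.66% {httpd}
 5970 www       44    0   108M 27700K arc_mr  4   0:59  1.46% {httpd}
 6024 www       44    0   106M 26568K tx->tx  5   0:50  1.46% {httpd}
"""

for state, n in tally_thread_states(sample).most_common():
    print(state, n)
```

Run against a full snapshot taken during a hang, a tally like this makes it easier to see whether most httpd threads are piling up in the ZFS-related states (tx->tx, zio->i, arc_mr) or elsewhere.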
At the time of the hang, no swap was used:

    last pid:  6450;  load averages: 36.32, 30.17, 17.75  up 4+11:15:44  20:11:01
    482 processes: 28 running, 452 sleeping, 1 zombie, 1 lock
    CPU:  % user,  % nice,  % system,  % interrupt,  % idle
    Mem: 1619M Active, 3829M Inact, 2066M Wired, 211M Cache, 827M Buf, 188M Free
    Swap: 8192M Total, 8192M Free

      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
     6011 user1     44    0 24960K  3432K RUN     1   2:50  7.08% pure-ftpd
     6038 user2     66    0   161M 18856K RUN     3   1:26  3.47% php-cgi
      716 root      46    0 32452K 13776K select  5 104:53  3.08% snmpd
     6021 user3     63    0   163M 20232K RUN     7   1:28  2.49% php-cgi
     6009 www       44    0   103M 26952K tx->tx  3   0:55  1.76% {httpd}
     6030 www       44    0   101M 26168K CPU4    7   0:57  1.66% {httpd}
     6028 www       44    0   101M 26476K tx->tx  2   0:55  1.66% {httpd}
     6030 www       44    0   101M 26168K zio->i  5   0:55  1.66% {httpd}
     6008 www       44    0   102M 26640K RUN     2   1:23  1.56% {httpd}
     6009 www       46    0   103M 26952K tx->tx  3   1:22  1.56% {httpd}
     6016 www       44    0   102M 26636K tx->tx  2   1:17  1.56% {httpd}
     6024 www       44    0   106M 26568K RUN     1   1:07  1.56% {httpd}
     5978 www       44    0   102M 26960K RUN     0   1:00  1.56% {httpd}
     6008 www       44    0   102M 26640K zio->i  7   0:55  1.56% {httpd}
     5970 www       44    0   108M 27700K arc_mr  4   0:59  1.46% {httpd}
     6024 www       44    0   106M 26568K tx->tx  5   0:50  1.46% {httpd}
     5979 www       45    0   102M 26904K zio->i  1   1:14  1.37% {httpd}
     6009 www       47    0   103M 26952K zio->i  7   1:11  1.37% {httpd}

I disabled all the Apache modules we don't need.

This is currently the only system of its kind we have, but we would really like to get this fixed so we can move more of our hosting customers to similarly set-up servers.

Another detail: because every user has an access log and an error log, we had to bump FD_SETSIZE to 16384U.

We tried raising kern.maxvnodes to larger and larger values (it is now at 400000, and fewer than 200k are in use), but it didn't help much.

Disabling prefetching helped a lot (only one crash in 5 days), but we would like to know why it actually happens and then fix it for good ;-)

Best Regards,
Rainer
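The FD_SETSIZE remark can be made concrete with a rough count. A minimal Python sketch, assuming each per-user log is opened once in the Apache parent and inherited by every child, and using a hypothetical allowance of 64 descriptors for stdio, listening sockets and client connections; 1024 is FreeBSD's default FD_SETSIZE from <sys/select.h>, and 500 users is taken from the "over 500 ZFS filesystems" above:

```python
# FreeBSD's default from <sys/select.h>; the server above was rebuilt with 16384U.
DEFAULT_FD_SETSIZE = 1024
BUMPED_FD_SETSIZE = 16384


def estimate_open_descriptors(users, logs_per_user=2, other=64):
    """Rough per-process count of open descriptors.

    Assumption: each user's access log and error log are opened once in the
    Apache parent and inherited by every child. `other` is a hypothetical
    allowance for stdio, listening sockets, client connections, etc.
    """
    return users * logs_per_user + other


def fits_in_fd_set(open_descriptors, fd_setsize):
    # With descriptors numbered 0..n-1, select() can track all of them
    # only if the highest number, n - 1, is below FD_SETSIZE.
    return open_descriptors - 1 < fd_setsize


needed = estimate_open_descriptors(users=500)
print(needed, fits_in_fd_set(needed, DEFAULT_FD_SETSIZE),
      fits_in_fd_set(needed, BUMPED_FD_SETSIZE))
```

Under these assumptions the 1000 log descriptors alone leave only 24 numbers below the default limit, so any socket opened after the logs would likely land above 1023, which is consistent with having to raise FD_SETSIZE.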