Date:      Thu, 29 Oct 2015 16:46:25 +0100
From:      Zara Kanaeva <zara.kanaeva@ggi.uni-tuebingen.de>
To:        Дмитрий Долбнин <bad_hdd@list.ru>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Stuck processes in unkillable (STOP) state, listen queue overflow
Message-ID:  <20151029164625.Horde.xUq7LWav-EtuUEJ1LMs31F1@webmail.uni-tuebingen.de>
In-Reply-To: <1446080762.820771804@f25.i.mail.ru>
References:  <mailman.11.1446033600.61768.freebsd-stable@freebsd.org> <1446080762.820771804@f25.i.mail.ru>

Hello Дмитрий,

thank you very much for your message.

First of all: I like FreeBSD (the installation logic, the good
documentation, etc.), which is why I use FreeBSD as a server OS. But in
my case I must disagree with your strong theoretical probability
argument. I have one machine (7 years old) that had 1-2 spontaneous
reboots a year. I got a lot of "already in queue awaiting acceptance"
errors, and the machine rebooted immediately afterwards.

I will soon get a replacement for this old machine, with at least
32 GB RAM and (of course) a new power supply. Then I will see whether my
problem (perhaps it is only my problem) still persists.

Greetings, Z. Kanaeva.

Quoting Дмитрий Долбнин <bad_hdd@list.ru>:

> Good day, everyone!
> From my point of view, it seems you are experiencing degraded
> hardware performance, which is causing the problems you describe.
> Try switching to a new power supply at least.
> Why do I think so? Because bad power supplies are encountered much
> more often than bad source code in FreeBSD. Of course, I can't tell
> you that you're completely wrong.
> Best regards, Dimitry.
>> Wednesday, 28 October 2015, 12:00 UTC, from freebsd-stable-request@freebsd.org:
>>
>> Today's Topics:
>>
>>    1. Re: Stuck processes in unkillable (STOP) state, listen queue
>>       overflow (Zara Kanaeva)
>>    2. Re: Stuck processes in unkillable (STOP) state, listen queue
>>       overflow (Nagy, Attila)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 27 Oct 2015 14:42:42 +0100
>> From: Zara Kanaeva < zara.kanaeva@ggi.uni-tuebingen.de >
>> To:  freebsd-stable@freebsd.org
>> Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
>> overflow
>> Message-ID:
>> < 20151027144242.Horde.3Xc1_RqzaVMAZ12X6OPXfdN@webmail.uni-tuebingen.de >
>>
>> Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes
>>
>> Hello,
>>
>> I have the same experience with apache and mapserver. It happens on a
>> physical machine and ends with a spontaneous reboot. This machine was
>> updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
>> this machine doesn't have enough RAM (only 8GB), but I think that
>> should not be a reason for a spontaneous reboot.
>>
>> I saw no such behavior with the same machine and FreeBSD 9.0-RELEASE
>> on it (I am not 100% sure, and I have no way to test it yet).
>>
>> Regards, Z. Kanaeva.
>>
>> Quoting "Nagy, Attila" < bra@fsn.hu >:
>>
>>> Hi,
>>>
>>> Recently I've started to see a lot of cases where the log is full
>>> of "listen queue overflow" messages and the process behind the
>>> network socket is unavailable.
>>> When I open a TCP connection to it, the connection opens but nothing
>>> happens (for example, I get no SMTP banner from postfix, nor do I get
>>> a log entry about the new connection).
>>>
>>> I've seen this with Java programs, postfix and redis, basically
>>> everything that opens a TCP socket and listens on the machine.
>>>
>>> For example, I have a redis process which listens on 6381. When I
>>> telnet into it, the TCP connection opens, but the program doesn't
>>> respond.
>>> When I kill it, nothing happens. Even kill -9 yields only this state:
>>>   PID USERNAME       THR PRI NICE   SIZE    RES STATE   C TIME    WCPU COMMAN
>>>   776 redis            2  20    0 24112K  2256K STOP    3 16:56   0.00% redis-
>>>
>>> When I tcpdrop the connections of the process, tcpdrop reports
>>> success for the first time and failure for the second (No such
>>> process), but the connections remain:
>>> # sockstat -4 | grep 776
>>> redis    redis-serv 776   6  tcp4   *:6381 *:*
>>> redis    redis-serv 776   9  tcp4   *:16381 *:*
>>> redis    redis-serv 776   10 tcp4   127.0.0.1:16381 127.0.0.1:10460
>>> redis    redis-serv 776   11 tcp4   127.0.0.1:16381 127.0.0.1:35795
>>> redis    redis-serv 776   13 tcp4   127.0.0.1:30027 127.0.0.1:16379
>>> redis    redis-serv 776   14 tcp4   127.0.0.1:58802 127.0.0.1:16384
>>> redis    redis-serv 776   17 tcp4   127.0.0.1:16381 127.0.0.1:24354
>>> redis    redis-serv 776   18 tcp4   127.0.0.1:16381 127.0.0.1:56999
>>> redis    redis-serv 776   19 tcp4   127.0.0.1:16381 127.0.0.1:39488
>>> redis    redis-serv 776   20 tcp4   127.0.0.1:6381 127.0.0.1:39491
>>> # sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
>>> tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided, or not known
>>> tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided, or not known
>>> tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
>>> tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
>>> tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
>>> tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
>>> tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
>>> tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
>>> tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
>>> tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
>>> # sockstat -4 | grep 776
>>> redis    redis-serv 776   6  tcp4   *:6381 *:*
>>> redis    redis-serv 776   9  tcp4   *:16381 *:*
>>> redis    redis-serv 776   10 tcp4   127.0.0.1:16381 127.0.0.1:10460
>>> redis    redis-serv 776   11 tcp4   127.0.0.1:16381 127.0.0.1:35795
>>> redis    redis-serv 776   13 tcp4   127.0.0.1:30027 127.0.0.1:16379
>>> redis    redis-serv 776   14 tcp4   127.0.0.1:58802 127.0.0.1:16384
>>> redis    redis-serv 776   17 tcp4   127.0.0.1:16381 127.0.0.1:24354
>>> redis    redis-serv 776   18 tcp4   127.0.0.1:16381 127.0.0.1:56999
>>> redis    redis-serv 776   19 tcp4   127.0.0.1:16381 127.0.0.1:39488
>>> redis    redis-serv 776   20 tcp4   127.0.0.1:6381 127.0.0.1:39491
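The two getaddrinfo errors in the transcript above come from the listening sockets, whose addresses are printed as "*" and cannot be passed to tcpdrop. A minimal sketch, assuming the sockstat column layout shown in the listing, that builds tcpdrop commands only for connected sockets and matches on the PID column rather than grepping the whole line. The function name tcpdrop_commands is hypothetical, and this only avoids the error messages; it is not expected to free a process that is already stuck.

```python
def tcpdrop_commands(sockstat_output, pid):
    """Build tcpdrop command lines for the connected sockets of one PID."""
    commands = []
    for line in sockstat_output.splitlines():
        fields = line.split()
        # Columns as in the listing above: USER COMMAND PID FD PROTO LOCAL FOREIGN
        if len(fields) < 7 or fields[2] != str(pid):
            continue
        local, foreign = fields[5], fields[6]
        # Listening sockets show "*" for an address or port; there is no
        # connection to drop, so skip them instead of passing "*" to tcpdrop.
        if "*" in local or "*" in foreign:
            continue
        local_addr, local_port = local.rsplit(":", 1)
        foreign_addr, foreign_port = foreign.rsplit(":", 1)
        commands.append(
            f"tcpdrop {local_addr} {local_port} {foreign_addr} {foreign_port}"
        )
    return commands


# Sample lines copied from the sockstat output quoted in this thread.
sample = """\
redis    redis-serv 776   6  tcp4   *:6381 *:*
redis    redis-serv 776   10 tcp4   127.0.0.1:16381 127.0.0.1:10460
redis    redis-serv 776   20 tcp4   127.0.0.1:6381 127.0.0.1:39491
"""

for command in tcpdrop_commands(sample, 776):
    print(command)
```

Matching the PID in its own column also avoids accidentally selecting a socket of another process whose port number happens to contain "776".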
>>>
>>> $ procstat -k 776
>>>   PID    TID COMM             TDNAME           KSTACK
>>>   776 100725 redis-server     -                mi_switch sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall Xfast_syscall
>>>   776 100744 redis-server     -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
>>>
>>> I can do nothing to get out from this state, only reboot helps.
>>>
>>> The OS is stable/10@r289313, but I could observe this behaviour with
>>> earlier releases too.
>>>
>>> The dmesg is full with lines like these:
>>> sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193 already in queue awaiting acceptance (3142 occurrences)
>>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3068 occurrences)
>>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3057 occurrences)
>>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3037 occurrences)
>>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3015 occurrences)
>>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3035 occurrences)
>>>
>>> I guess this is the effect of the process freeze, not the cause (the
>>> listen queue fills up because the app can't handle the incoming
>>> connections).
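To see which listening sockets are affected and how heavily, the overflow messages can be grouped by pcb address. A minimal sketch, assuming the message format quoted in the dmesg excerpt above; the function name summarize_overflows and the handling of a message without an "(N occurrences)" suffix are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches the kernel message format quoted above; \s+ tolerates the line
# wrapping that mail clients introduce in the middle of a message.
PATTERN = re.compile(
    r"sonewconn: pcb (0x[0-9a-f]+): Listen queue overflow: "
    r"(\d+)\s+already in queue awaiting acceptance"
    r"(?:\s+\((\d+) occurrences\))?"
)


def summarize_overflows(log_text):
    """Return {pcb: {"messages": n, "occurrences": total, "max_queued": q}}."""
    summary = defaultdict(
        lambda: {"messages": 0, "occurrences": 0, "max_queued": 0}
    )
    for pcb, queued, occurrences in PATTERN.findall(log_text):
        entry = summary[pcb]
        entry["messages"] += 1
        # Assumption: a message without an "(N occurrences)" suffix is
        # counted as a single event.
        entry["occurrences"] += int(occurrences) if occurrences else 1
        entry["max_queued"] = max(entry["max_queued"], int(queued))
    return dict(summary)


# Sample lines copied from the dmesg excerpt in this thread.
sample = """\
sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193
already in queue awaiting acceptance (3142 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
already in queue awaiting acceptance (3068 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
already in queue awaiting acceptance (3057 occurrences)
"""

for pcb, stats in summarize_overflows(sample).items():
    print(pcb, stats)
```

A pcb that keeps accumulating occurrences while its queue depth stays pinned at the same value is consistent with a listener that has stopped calling accept(), which matches the guess that the overflow is an effect of the freeze rather than its cause.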
>>>
>>> I'm not sure it matters, but some of the machines (including the one
>>> above) run on an ESX hypervisor (as far as I can remember, I have
>>> seen this on physical machines too, but I'm not sure about that).
>>> Also, so far I have only seen this where some "exotic" stuff ran,
>>> like a Java- or Erlang-based server (opendj, elasticsearch and
>>> rabbitmq).
>>>
>>> I'm also not sure what triggers this. I've never seen it after only
>>> some hours of uptime; at least some days or a week must have passed
>>> before a process gets stuck like the above.
>>>
>>> Any ideas about this?
>>>
>>> Thanks,
>>> _______________________________________________
>>>  freebsd-stable@freebsd.org mailing list
>>>  https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>> To unsubscribe, send any mail to " freebsd-stable-unsubscribe@freebsd.org "
>>
>>
>>
>> --
>> Dipl.-Inf. Zara Kanaeva
>> Heidelberger Akademie der Wissenschaften
>> Forschungsstelle "The role of culture in early expansions of humans"
>> an der Universität Tübingen
>> Geographisches Institut
>> Universität Tübingen
>> Ruemelinstr. 19-23
>> 72070 Tuebingen
>>
>> Tel.: +49-(0)7071-2972132
>> e-mail:  zara.kanaeva@geographie.uni-tuebingen.de
>> -------
>> - Theory is when you know something but it doesn't work.
>> - Practice is when something works but you don't know why.
>> - Usually we combine theory and practice:
>>          Nothing works and we don't know why.
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 27 Oct 2015 17:25:01 +0100
>> From: "Nagy, Attila" < bra@fsn.hu >
>> To: Zara Kanaeva < zara.kanaeva@ggi.uni-tuebingen.de >,
>> freebsd-stable@freebsd.org
>> Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
>> overflow
>> Message-ID: < 562FA55D.6050503@fsn.hu >
>> Content-Type: text/plain; charset=utf-8; format=flowed
>>
>> Hi,
>>
>> (following topposting)
>> I have seen this with 16 and 32 GiB of RAM, but anyways, it shouldn't
>> matter.
>> Do you use zfs? Although it doesn't seem to be stuck on IO...
>>
>> On 10/27/15 14:42, Zara Kanaeva wrote:
>>> Hello,
>>>
>>> I have the same experience with apache and mapserver. It happens on a
>>> physical machine and ends with a spontaneous reboot. This machine was
>>> updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
>>> this machine doesn't have enough RAM (only 8GB), but I think that
>>> should not be a reason for a spontaneous reboot.
>>>
>>> I saw no such behavior with the same machine and FreeBSD 9.0-RELEASE
>>> on it (I am not 100% sure, and I have no way to test it yet).
>>>
>>> Regards, Z. Kanaeva.
>>
>



-- 
Dipl.-Inf. Zara Kanaeva
Heidelberger Akademie der Wissenschaften
Forschungsstelle "The role of culture in early expansions of humans"
an der Universität Tübingen
Geographisches Institut
Universität Tübingen
Ruemelinstr. 19-23
72070 Tuebingen

Tel.: +49-(0)7071-2972132
e-mail: zara.kanaeva@geographie.uni-tuebingen.de
-------
- Theory is when you know something but it doesn't work.
- Practice is when something works but you don't know why.
- Usually we combine theory and practice:
         Nothing works and we don't know why.



