From: Volodymyr Kostyrko
Date: Tue, 13 Mar 2012 10:59:54 +0200
To: Matthew Seaman
Cc: freebsd-questions@FreeBSD.org
Subject: Re: 9.0 spontaneously reboots

Matthew Seaman wrote:
>> The only load I know of that reliably causes a lockup within a few hours
>> is memcached. The project has now been migrated to redis, and the
>> machines have survived for two weeks. The most common cause of the
>> lockup is an ECC error.
>
> I see. That puts a different complexion on things. Although it is
> application specific it doesn't rule out hardware problems. In fact,
> given the nature of the error -- ECC problems -- it pretty much nails it
> as something wrong with the RAM in that machine.
>
> Given that memtest86 doesn't show any problems, and you can run a
> similar workload with different software it suggests that you have a
> memory stick (or sticks) that are marginal. Something like extra heat
> due to higher rates of memory accesses from a particular application
> could be tipping it over the edge into failure.
>
> The 'marginal' behaviour need not be a fault in the memory stick per se.
> It could simply be the particular characteristics of the memory you
> have installed not being exactly compatible with your motherboard. In
> theory the memory conforming to a particular standard should avoid this
> sort of problem, but this is unfortunately not completely infallible.
> Swapping out memory sticks for an equivalent specification from a
> different manufacturer should give good results.

I already switched from Kingston to Hynix memory, with no luck. My next
guess is a motherboard problem (as memory is split between the
processors) or a processor problem. I'm going to pull one processor out
and leave all the memory on the other one.

The only other weird thing about this server is:

dev.cpu.0.temperature: 37,0C
dev.cpu.1.temperature: 37,0C
dev.cpu.2.temperature: 35,0C
dev.cpu.3.temperature: 35,0C
dev.cpu.4.temperature: 43,0C
dev.cpu.5.temperature: 43,0C
dev.cpu.6.temperature: 38,0C
dev.cpu.7.temperature: 38,0C
dev.cpu.8.temperature: 38,0C
dev.cpu.9.temperature: 38,0C
dev.cpu.10.temperature: 37,0C
dev.cpu.11.temperature: 37,0C
dev.cpu.12.temperature: 33,0C
dev.cpu.13.temperature: 33,0C
dev.cpu.14.temperature: 34,0C
dev.cpu.15.temperature: 34,0C

And it's consistent: cores 4 and 5 are always hotter than any other.
This could be something to do with the scheduler; however, it started
before there was any actual load. Though the numbers are within normal
range, I have never seen anything like this...

-- 
Sphinx of black quartz judge my vow.
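The per-core readings listed above (as printed by `sysctl dev.cpu` when a temperature driver such as coretemp(4) is loaded) can be screened for a consistently hot core with a small script. This is a minimal sketch, assuming the readings arrive in exactly the format shown, with either a comma or a dot as the decimal separator. The function names and the 3-degree margin over the mean are hypothetical choices for illustration, not part of any FreeBSD tool.

```python
import re
from statistics import mean

# Matches lines such as "dev.cpu.4.temperature: 43,0C". The decimal
# separator may be a comma (as in the readings above) or a dot.
LINE_RE = re.compile(r"dev\.cpu\.(\d+)\.temperature:\s*(\d+)[.,](\d+)C")


def parse_temperatures(text):
    """Map each CPU index to its temperature in degrees Celsius."""
    temps = {}
    for m in LINE_RE.finditer(text):
        temps[int(m.group(1))] = float(f"{m.group(2)}.{m.group(3)}")
    return temps


def hot_cpus(temps, margin=3.0):
    """Return CPU indices whose reading exceeds the mean by more than `margin`.

    The 3-degree default is an arbitrary, hypothetical threshold.
    """
    avg = mean(temps.values())
    return sorted(cpu for cpu, t in temps.items() if t - avg > margin)


# The readings quoted in this message, one value per CPU index 0..15.
SAMPLE = "\n".join(
    f"dev.cpu.{i}.temperature: {t},0C"
    for i, t in enumerate([37, 37, 35, 35, 43, 43, 38, 38,
                           38, 38, 37, 37, 33, 33, 34, 34])
)

if __name__ == "__main__":
    print(hot_cpus(parse_temperatures(SAMPLE)))  # prints [4, 5]
```

Against the readings above the mean is 36.875 C, so only CPUs 4 and 5 (at 43 C) clear the margin. The script only confirms that the pair stands out; it says nothing about the cause.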