Date: Mon, 4 Jan 2016 23:07:07 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: shahzaibcb <shahzaib.cb@gmail.com>
Cc: freebsd-current@freebsd.org
Subject: Re: FreeBsd MCA Panic Crash !!
Message-ID: <20160104200707.GI70867@zxy.spb.ru>
In-Reply-To: <1451903649383-6064691.post@n5.nabble.com>
References: <1451903649383-6064691.post@n5.nabble.com>
On Mon, Jan 04, 2016 at 03:34:09AM -0700, shahzaibcb wrote:
> Hi,
>
> We switched to FreeBSD recently to accommodate large video storage, as we
> run a video streaming website. The job of the FreeBSD server is to
> transcode uploaded videos using ffmpeg and serve them to users via the nginx
> web server, but so far our experience with it has not been very good. It
> crashes every 2-3 days and we're unable to track down the problem. The server
> specs are pretty high:
>
> Supermicro X5690 (12 cores, 24 threads - 2U)
> 96GB RAM
> 12x3TB RAID-10 (HBA-LSI9211)
>
> Here is a screenshot of a recent crash:
>
> http://prntscr.com/9er3pk
>
> One thing worth mentioning: before going down there's no load on the server,
> and free RAM is usually around 12GB. We've tried the following solutions
> so far:
>
> - Updated the FreeBSD OS
> - Replaced the 800W PS with a 900W one
> - Reduced CMOS from MAX(26x) to 18x, as suggested in this post

Have you tried replacing the CPU?

> http://unix.stackexchange.com/questions/60574/determining-cause-of-linux-kernel-panic
>
> The solution we haven't tried so far is:
>
> - Disable MCA using (hw.mca.enabled: 0), as we're getting MCA panics.
>
> Here is the crash dump:
>
> [root@cw001 /var/crash]# mcelog --no-dmi --ascii --file core.txt.1
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 3 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 3 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 2 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 2 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 3 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 3 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 2 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 2 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
>
> -----------------------------------------------------------------------------------
>
> I showed these hardware errors to the vendor from whom we purchased the
> Supermicro servers. This is what he had to say:
>
> -----------------------------------
> Why don't you set up a test environment with CentOS or another Linux you
> know how to use, and see if you get the same errors? If not, then you know
> the errors come from the OS, not from the hardware. (CentOS, Red Hat, etc.
> work differently from FreeBSD: FreeBSD works directly on the hardware, and if
> you don't have the right kernel settings the server can crash. CentOS,
> Red Hat, etc. don't work directly on the hardware; they distribute the
> resource load better, and you have better control and can debug a situation
> more easily.)
> -----------------------------------
>
> Now we're in a black hole, unable to tell whether the issue is with FreeBSD
> or the hardware. We're thinking of disabling MCA in loader.conf, but people
> are advising against it. If you guys can help us, it would be very kind.
>
> --
> View this message in context: http://freebsd.1045724.n5.nabble.com/FreeBsd-MCA-Panic-Crash-tp6064691.html
> Sent from the freebsd-current mailing list archive at Nabble.com.
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
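The mcelog lines in the quoted crash dump are a text rendering of two raw register values, `STATUS be00000000800400` and `MCGSTATUS 4`. As a minimal Python sketch, assuming the IA32_MCi_STATUS and IA32_MCG_STATUS bit layouts described in Intel's SDM Vol. 3 (Machine-Check Architecture), the following decodes them by hand. The function and table names are hypothetical, and only the one simple MCA error code that appears in this log is named.

```python
"""Decode the raw machine-check values printed by mcelog in the crash dump.

Bit positions follow Intel's SDM Vol. 3 (Machine-Check Architecture).
All helper names are hypothetical; this is an illustration, not a
replacement for mcelog or for FreeBSD's own MCA reporting.
"""

# IA32_MCi_STATUS flag bits (high end of the 64-bit register).
MCI_STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# IA32_MCG_STATUS flag bits.
MCG_STATUS_FLAGS = {
    0: "RIPV (restart IP valid)",
    1: "EIPV (error IP valid)",
    2: "MCIP (machine check in progress)",
}

# Only the simple MCA error code seen in this log; other codes are not listed.
SIMPLE_ERROR_CODES = {
    0x0400: "internal timer error",
}


def set_flags(value: int, table: dict) -> list:
    """Return the names of the flag bits in `table` that are set in `value`,
    highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_mci_status(status: int) -> dict:
    """Split an IA32_MCi_STATUS value into its flags and error-code fields."""
    mca_code = status & 0xFFFF            # bits 15:0, architectural MCA error code
    model_code = (status >> 16) & 0xFFFF  # bits 31:16, model-specific error code
    return {
        "flags": set_flags(status, MCI_STATUS_FLAGS),
        "mca_error_code": mca_code,
        "mca_error_name": SIMPLE_ERROR_CODES.get(mca_code, "not in this sketch's table"),
        "model_specific_code": model_code,
    }


if __name__ == "__main__":
    # Values copied from the log line "STATUS be00000000800400 MCGSTATUS 4".
    decoded = decode_mci_status(int("be00000000800400", 16))
    for flag in decoded["flags"]:
        print(flag)
    print(f"MCA error code: {decoded['mca_error_code']:#06x} ({decoded['mca_error_name']})")
    print(f"Model-specific code: {decoded['model_specific_code']:#06x}")
    print("MCG_STATUS:", ", ".join(set_flags(4, MCG_STATUS_FLAGS)))
```

Run as-is, it reports VAL, UC, EN, MISCV, ADDRV and PCC set (OVER clear), MCA error code 0x0400, and MCIP for MCG_STATUS, which matches mcelog's text. UC together with PCC means the CPU itself flagged the error as uncorrected with possibly corrupt processor state, which is consistent with the kernel panicking rather than logging the event and continuing.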