Date: Sat, 21 Jun 1997 10:49:01 -0700 (PDT)
From: Simon Shapiro <Shimon@i-Connect.Net>
To: FreeBSD-Hackers@FreeBSD.ORG, FreeBSD-SCSI@FreeBSD.ORG
Subject: Mystery of The missing I/O - Help Solicited
Message-ID: <XFMail.970621104901.Shimon@i-Connect.Net>
Hi Y'all

This message is for all those who are still speaking to me after daring to suggest that plastic (yuck!) disk carriers can be as good as steel ones (imagine that!) :-))

No, really, there is something serious we could be helped with:

With the new DPT driver, we were plagued with occasional hangs. What happens is that after a few minutes of operation, or after a few days of operation, under varying loads, any process which goes to a certain disk would just block indefinitely.

We verified that we do not miss processing any interrupt. We fixed a minor hole that causes biodone to get confused every million I/O's or so. We traced individual commands to make sure that we do not have any SCSI command which we do not return to sd.c. To make these verifications we built all kinds of strange and interesting tools. Nothing helps.

Oh, to confuse everyone, we can reproduce this problem only on Pentium Pros. Pentium-100's simply will not fail. We brought the load on test systems all the way up to about 120. Nothing.

Next hint set: We can reliably reproduce the problem only on sendero, only when doing make release. So we thought.

Today we decided to try something else. We quieted down ALL networking activity on the system, including disconnecting PPP. We managed to build make release flawlessly. Several times. Connect PPP and SCSI command completions seem to disappear somewhere between sd.c and the driver or higher. Disconnect PPP and all is well.

Before someone tells me to shut down the software interrupts, I will be quick to point out that I can #ifdef them out and still get the same problem. Exactly.

Let me point out that the DPT can complete a SCSI READ/WRITE command in about 250 microseconds (on a cache hit). We measured, occasionally, interrupts coming as fast as 4 microseconds apart (like two consecutive cache hits).

We are at our wits' end to find an explanation for this. Any suggestion will be greatly appreciated.
Thanks,
Simon

Quiz: How many SCSI commands does it take to run make release?
Answer: 300,000 reads and 2.1 million writes.
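
The command tracing mentioned in the message (checking that every SCSI command handed to the driver eventually comes back to sd.c) can be illustrated with a small in-flight ledger. This is only a user-space sketch under stated assumptions, not the tooling actually used with the DPT driver: the names `ledger_issue`, `ledger_complete`, `ledger_stuck`, the `tag` field, and the slot count are all hypothetical, and a real kernel version would need to protect the table against interrupt-time access.

```c
#include <stddef.h>
#include <stdint.h>

#define LEDGER_SLOTS 256   /* hypothetical cap on commands in flight */

/* One entry per command handed down and not yet returned. */
struct ledger_entry {
    int      in_use;
    uint32_t tag;        /* hypothetical per-command identifier */
    uint64_t issued_us;  /* time the command was queued, in microseconds */
};

struct ledger {
    struct ledger_entry slot[LEDGER_SLOTS];
    uint64_t issued;     /* total commands queued */
    uint64_t completed;  /* total commands returned */
};

/* Record a command on its way down. Returns 0, or -1 if the ledger is full. */
int ledger_issue(struct ledger *l, uint32_t tag, uint64_t now_us)
{
    for (size_t i = 0; i < LEDGER_SLOTS; i++) {
        if (!l->slot[i].in_use) {
            l->slot[i].in_use = 1;
            l->slot[i].tag = tag;
            l->slot[i].issued_us = now_us;
            l->issued++;
            return 0;
        }
    }
    return -1;
}

/* Record a completion. Returns 0, or -1 if no matching command was issued. */
int ledger_complete(struct ledger *l, uint32_t tag)
{
    for (size_t i = 0; i < LEDGER_SLOTS; i++) {
        if (l->slot[i].in_use && l->slot[i].tag == tag) {
            l->slot[i].in_use = 0;
            l->completed++;
            return 0;
        }
    }
    return -1;
}

/*
 * Count commands outstanding for longer than limit_us. If oldest is not
 * NULL and at least one command is stuck, store the tag of the one that
 * has been waiting longest.
 */
size_t ledger_stuck(const struct ledger *l, uint64_t now_us,
                    uint64_t limit_us, uint32_t *oldest)
{
    size_t count = 0;
    uint64_t oldest_us = UINT64_MAX;

    for (size_t i = 0; i < LEDGER_SLOTS; i++) {
        const struct ledger_entry *e = &l->slot[i];
        if (e->in_use && now_us - e->issued_us > limit_us) {
            count++;
            if (e->issued_us < oldest_us) {
                oldest_us = e->issued_us;
                if (oldest != NULL)
                    *oldest = e->tag;
            }
        }
    }
    return count;
}
```

With a zero-initialized `struct ledger`, a difference between `issued` and `completed` that never drains, or a non-zero result from `ledger_stuck`, would point at a command that went down and never came back.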