From owner-freebsd-small Sun Dec 23 22:22:19 2001
Message-ID: <003d01c18c43$5446ad20$0104010a@famzon.com.au>
From: "Andrew Hannam"
Subject: Disk Writes
Date: Mon, 24 Dec 2001 16:21:58 +1000
Organization: FamZon Systems

I have been doing some embedded work on Free/PicoBSD, based on a 4.2
kernel, but have come across a problem that shows its head occasionally.

We are currently in the process of rolling out 1000 devices (currently at
75) over the next 6 months. Fortunately I have the ability to upgrade the
devices in the field over a GSM mobile modem.

The device has a 128M read-only root file system, an MFS /tmp file-system
(currently backed by hard-disk based swap), and a 64M /app read/write file
system on hard-disk for storing transaction data.

Transactions occur roughly 5->10 times a day. The application has been
written to only append to files or to replace the file by creating a new
file and then writing a single byte to redirect which file is in use.

The problem is that in certain circumstances in power down (which can
occur at any time), the /app file system is being corrupted beyond what
an automatic fsck at boot can repair.

Unfortunately it is not possible to mount and unmount the /app file-system
around each write.

My current kernel settings are:

    sysctl -w vfs.write_behind=0 kern.filedelay=2 kern.dirdelay=1 kern.metadelay=0

Can anyone please help with this write problem?

I am also looking to move to compact flash in the next few months - how
should the read/write file-system be mounted (options etc.) to work
economically on the compact flash?

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-small" in the body of the message

From owner-freebsd-small Sun Dec 23 22:58:49 2001
Message-ID: <20011224065846.76833.qmail@web10108.mail.yahoo.com>
Date: Sun, 23 Dec 2001 22:58:46 -0800 (PST)
From: John Hanley
Subject: Re: Disk Writes
To: Andrew Hannam, freebsd-small@freebsd.org
In-Reply-To: <003d01c18c43$5446ad20$0104010a@famzon.com.au>

--- Andrew Hannam wrote:
> Transactions occur roughly 5->10 times a day. The application has been
> written to only append to files or to replace the file by creating a new
> file and then writing a single byte to redirect which file is in use.
>
> The problem is that in certain circumstances in power down (which can
> occur at any time), the /app file system is being corrupted beyond what
> an automatic fsck at boot can repair.

It sounds like you do a short duration write followed by fsync()
(or at least you could sync if you wanted to).

Did you enable soft updates for /app?

What did ``fsck -y'' report?

Cheers,
JH

From owner-freebsd-small Tue Dec 25 19:08:06 2001
Message-ID: <001901c18dba$83dfcbc0$0104010a@famzon.com.au>
From: "Andrew Hannam"
To: "John Hanley"
References: <20011224070645.77398.qmail@web10108.mail.yahoo.com>
Subject: Re: Disk Writes
Date: Wed, 26 Dec 2001 13:07:46 +1000
Organization: FamZon Systems

Thanks for your help...

Just a short note on the power-fail condition: with our equipment a power
failure is more likely to come just after a transaction that requires
writing to the disk. The power fail (if it occurs) is most likely to occur
5 -> 10 seconds after the transaction and is the result of action by a
serviceman at the machine (about once or twice a week). I therefore believe
it should be possible to achieve a safe write 100% of the time.
This however has not been borne out in practice, with a failure rate of
about 0.5% in these conditions. With 1000 machines in the field this would
equate to a failure of about 1 to 2 machines a day. This is not acceptable
in practice so I must find a solution.

The hard-links idea is a useful bit - I'll add it to my toolkit for FreeBSD.
I had tried this technique before on a Linux box but in Linux the rename(2)
call is not atomic where the target file already exists.

Using fsync() or even sync are not generally options for me as I am using
Java for a large part of the application. Where C has been used - it is
liberally sprinkled with fsync and sync. Special care has been taken with
the Java to ensure that files are being closed properly. Without the files
being closed after each write operation I found that, even on unbuffered
writes, there was a high probability of file corruption on power-down.
Now that files are being closed after each write, I never seem to lose
information during an fsck auto-repair (a great improvement); however,
occasionally fsck is not able to repair it at all, effectively causing the
device to be inoperable with a return to depot for repair. The return to
depot is expensive and requires special low level data extraction to try and
get the information off the now badly corrupted disk.

I have tried both with and without soft updates. The largest problem of
using soft updates is the latency - using the standard parameters it can
take up to 30 seconds for data to actually be written out to disk, thus
introducing a good probability of losing information on a power down.

The 0, 1 & 2 second delays are much better (using my kernel parameters) but
2 seconds is still a long time in my application. Looking through the code -
0, 1 & 2 second delays appear to be the smallest periods available without
affecting the way soft updates work. With soft updates I seem to be more
likely to lose information but less likely to kill the disk. Given this
compromise I have turned off soft-updates.

Is there an alternative file-system that would be more tolerant to
power-down issues? For example, with the original DOS operating system I
can't remember ever having this sort of problem until Windows started
adding write caching.

Is there some option to make file-system calls totally synchronous (turning
off all write caching), thus significantly reducing the risk? Write speed
performance is not a critical criterion. Integrity and completeness are far
more important.

It might be true that voltage sag on write is toasting extra super-block
copies or alternatively that the super-blocks are not being written
synchronously. I have two variants of the motherboard hardware using
different chipsets with two different sized hard-drives (3" and 2") so it
doesn't appear to be a hardware specific problem.

If this is what is happening then having more than one super-block is an
integrity risk rather than an integrity improver because I cannot manually
fsck after a super-block corruption. How then would I turn off the extra
super-block copies? I presume this would be done at file-system creation
time.

An example of redundant information being useless is in the original FAT
file-system. The second copy of the FAT is only ever used to detect that the
two copies of the FAT are out of sync. I have never seen a DOS or Windows
utility that takes any notice of the information written in the second copy
of the FAT. For example, scandisk (equivalent to fsck) detects that they are
different but the only repair option is to write the first copy of the FAT
on top of the second copy of the FAT.

Are the UFS file-system and fsck different in this regard?

----- Original Message -----
From: "John Hanley"
To: "Andrew Hannam"
Sent: Monday, December 24, 2001 5:06 PM
Subject: Re: Disk Writes

> --- Andrew Hannam wrote:
> > The application has been
> > written to only append to files or to replace the file by creating a new
> > file and then writing a single byte to redirect which file is in use.
>
> BTW, using hard links might be pretty slick, here. Watch me atomically
> delete "file":
>
>   $ date > file
>   $ TMP=file.$$
>   $ mv $TMP file
>
> At worst, file.${pid} is left around.
> At all times, we have either the old or new contents available.
> The rename(2) does the "delete" and the "make new contents available"
> operations as a single atomic operation, safe in the face of power fails.
>
> > My current kernel settings are:
> > sysctl -w vfs.write_behind=0 kern.filedelay=2 kern.dirdelay=1
> > kern.metadelay=0
>
> I'll bet that combination of parameters has received less testing
> than the default parameters. I feel that the default params with
> soft updates should be working just fine for you.
>
> Is power fail pretty straightforward? Does the CPU go down before
> the device that /app is on goes down? Maybe voltage sag at the time
> of a write is toasting one or more superblock replicas?
>
> Cheers,
> JH
>
> __________________________________________________
> Do You Yahoo!?
> Send your FREE holiday greetings online!
> http://greetings.yahoo.com

From owner-freebsd-small Wed Dec 26 15:09:55 2001
Date: Wed, 26 Dec 2001 16:09:12 -0700 (MST)
From: "Forrest W. Christian"
To: Andrew Hannam
Cc: John Hanley, freebsd-small@FreeBSD.ORG
Subject: Re: Disk Writes
In-Reply-To: <001901c18dba$83dfcbc0$0104010a@famzon.com.au>

Are these IDE drives? If so, you probably need to turn off write
caching. I *THINK* this is hw.ata.wc but it needs to be turned off in a
special way. See man tuning.

- Forrest W. Christian (forrestc@imach.com) AC7DE
----------------------------------------------------------------------
The Innovation Machine Ltd.                              P.O. Box 5749
http://www.imach.com/                                 Helena, MT 59604
Home of PacketFlux Technogies and BackupDNS.com         (406)-442-6648
----------------------------------------------------------------------
Protect your personal freedoms - visit http://www.lp.org/
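
A note on the rename(2) idiom quoted in this thread: as quoted, the
snippet sets TMP but never writes anything to that file, so the mv
would have nothing to move. The sequence being described can be sketched
as follows. This is only a minimal illustration under stated assumptions:
the helper name atomic_replace and the scratch paths are hypothetical,
and the temporary file is assumed to sit in the same directory (hence the
same file system) as the target. sync(8) is system-wide and may return
before the data has physically reached the disk, so it narrows the
power-fail window rather than closing it.

```sh
#!/bin/sh
# Sketch: replace a file so that a reader sees either the complete old
# contents or the complete new contents, never a partial write.
set -e

# atomic_replace TARGET  (hypothetical helper; new contents come from stdin)
atomic_replace() {
    target=$1
    tmp="$target.$$"          # temp name next to the target: same file system
    cat > "$tmp"              # 1. write the complete new contents to the temp file
    sync                      # 2. ask the kernel to flush dirty buffers
    mv -f "$tmp" "$target"    # 3. rename(2): the name switches to the new contents
    sync                      # 4. flush the directory update as well
}

# Demonstration in a scratch directory (assumed path, created by mktemp).
dir=$(mktemp -d)
echo "old contents" > "$dir/file"
echo "new contents" | atomic_replace "$dir/file"
cat "$dir/file"
```

Keeping the temporary file beside the target matters: when source and
destination are on different file systems, mv falls back to copy and
delete, which is not a single atomic step. If power fails before step 3,
the worst case is a leftover file.<pid> next to an intact old file.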