Date:      Sun, 20 Dec 2009 07:16:17 -0800 (PST)
From:      Mark Terribile <materribile@yahoo.com>
To:        freebsd-questions@freebsd.org
Subject:   Editing a binary file
Message-ID:  <227962.95563.qm@web110314.mail.gq1.yahoo.com>
In-Reply-To: <20091219233405.6E2421065764@hub.freebsd.org>

> On Fri, Dec 18, 2009 at 09:33:49AM -0700, Warren Block wrote:
> > perryh@pluto.rain.com wrote:
> > > Greg Larkin <glarkin@freebsd.org> wrote:
> > > > ...
> > > > > truncate -s -4 myfile should get rid of the last four bytes.  Maybe
> > > > > there's a similar efficient way to truncate the start of a file.
> > > >
> > > > This should do it:
> > > >
> > > > dd if=oldfile of=newfile bs=1 skip=4
> > > >
> > > > Or, perhaps marginally more efficient:
> > > >
> > > > dd if=oldfile of=newfile bs=4 skip=1
> > >
> > > It would be nice to avoid the file copy, but maybe there's no way to do
> > > that.  The small buffer size for dd will probably make copies of
> > > multi-gig files slow.  This might be faster:
> > >
> > > tail -c +5 myfile > outfile
> > > > truncate -s -4 outfile

> Yes, quite. On a 1.5GHz ia64, with a 1GB binary file, tail takes about 25 s,
> but dd... I killed it after 25 min (!) and it had only done 1/3 of the file.
> 
> But even tail is too slow.
> 
> So I'll probably have to write a C I/O routine and avoid Fortran I/O
> altogether, so that I write just my data straight away.

I'm a ksh partisan, so I tried it this way:

  { dd bs=4 count=1 of=/dev/null ; cat ; } < oldfile > newfile

I ran this on a 640M file residing on a 10K rpm SCSI disk on an old 5.4 system.  (Yes, I'm trying to upgrade, but the ports are killing me; I may have questions later.)  It took 111 seconds of wall time, or roughly 5.8M per second.  Not great, but not bad for 640M going through the file system.  Both files were on the same disk, which was buzzing along at about 120 tps.

I'm sure this is possible in csh, though I'd have to spend some man page time to get the syntax right.
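
For anyone who wants to reuse the brace-group trick, here is a minimal sketch of it as a plain sh function.  The name strip_head and its arguments are made up for illustration, and I have swapped cat for a second dd with a 1 MiB block size (written as a plain number, which both BSD and GNU dd accept); whether that is any faster than cat on a given system is an assumption I have not measured.

```shell
#!/bin/sh
# strip_head SRC DST [NBYTES]   (hypothetical helper, not a standard tool)
# Copy SRC to DST, dropping the first NBYTES bytes (default 4).
# NBYTES must be at least 1, because dd rejects a block size of 0.
strip_head() {
    src=$1
    dst=$2
    n=${3:-4}
    {
        # Read and discard the first n bytes.  Both dd commands share the
        # same open file, so the second one starts where the first stopped.
        dd bs="$n" count=1 of=/dev/null 2>/dev/null
        # Copy everything that is left, 1 MiB at a time.
        dd bs=1048576 2>/dev/null
    } < "$src" > "$dst"
}
```

Usage would be "strip_head oldfile newfile" for the four-byte case, or "strip_head oldfile newfile 8" to drop eight bytes.  This still copies the whole file; it only avoids the one-byte or four-byte block size that made the plain dd versions so slow.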

Yes, a custom program will be faster if you go through stdio or C++'s iostreams AND OPEN THE FILE EXPLICITLY because they do the read via mmap, saving one copy.  If you do the read via read(2) it won't be that much faster.  I suspect (but have not bothered to prove) that in this case cat(1) used simple reads.
