Date:      Mon, 02 Jul 2007 11:57:33 +0200
From:      Alexander Leidinger <Alexander@Leidinger.net>
To:        Garrett Cooper <youshi10@u.washington.edu>
Cc:        ports@FreeBSD.org, "[LoN]Kamikaze" <LoN_Kamikaze@gmx.de>
Subject:   Re: +CONTENTS files
Message-ID:  <20070702115733.3fotau92scgs4g4s@webmail.leidinger.net>
In-Reply-To: <4688AF6D.90904@u.washington.edu>
References:  <46887FD3.3080307@u.washington.edu> <46889F5D.70801@gmx.de> <4688AF6D.90904@u.washington.edu>

Quoting Garrett Cooper <youshi10@u.washington.edu> (from Mon, 02 Jul
2007 00:55:25 -0700):

> [LoN]Kamikaze wrote:
>> Garrett Cooper wrote:
>>
>>> Pardon me for being naive, but wouldn't it be wiser for all of the data
>>> in the +CONTENTS file to be aggregated into sections instead of having
>>> line by line info?
>>>
>>> Example (net/samba_3.0.25a):
>>>
>>> @comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
>>> man/man1/log2pcap.1.gz
>>> [~100 lines of repetitive data...]
>>> @comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
>>> man/man8/vfs_notify_fam.8.gz
>>>
>>>   Could be aggregated into:
>>>
>>> @MD5
>>> 9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
>>> c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
>>> [etc..]
>>> @end MD5
>>>
>>>   or something similar to XML.
>>>
>>>   This would reduce the filesize from n bytes to n - (9 + 4 -1) *
>>> i_entries + 8. In larger package files this would reduce the amount of
>>> data parsing by a long shot. Also, more powerful scripting languages
>>> like Perl, Python, or smart parsers in C could make short work of this
>>> data and just extract the MD5 elements for comparison.
>>>
>>>   Also, by doing a little extra work when creating packages by
>>> organizing all the sections together, I think that the file size could
>>> be reduced by a large degree.
>>>
>>>   Similar fields to @comment MD5 could be reduced I believe, but with
>>> less benefit maybe, other than just the @unexec rmdir, etc lines.
>>>
>>>   Either that, or the data should be organized into separate files I
>>> think (increases number of files, but reduces overall processing time IMO).

>> In some cases the order of data stored is important and thus it cannot be
>> separated into sections. Also, this layout allows for very simple parsing
>> with the usual UNIX tools (sed, cut, awk, perl, simply everything). Unlike
>> XML, which is rather complex and thus does not belong in base, in my
>> opinion.

We have libbsdxml in the base already (an older version of the one in ports).

>    I didn't say XML exactly. I said XML-like, with implied end and begin
> tags, but keeping with the Makefile-like syntax of @MD5 ... @end MD5,
> or something similar.

The problem is that such a change would break existing installations, as
they cannot cope with a new format. Feel free to propose
improvements, but keep in mind that any supported
FreeBSD release has to be able to install packages with only the
package tools available in the base system.

>    My point is that the +CONTENTS file is bloated a lot by
> useless lines, and I would think it would help speed up package
> processing if it were clipped or reduced somehow.

You need to provide numbers. Without them this is pure speculation.
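
As a starting point for such numbers, here is a minimal Python sketch that
measures how many bytes the proposed @MD5 ... @end MD5 block would save for
a given file. It assumes the pairing shown in the quoted excerpt (an
"@comment MD5:" line followed by the path it belongs to); the function names
aggregate_md5 and size_report are made up for illustration and are not part
of any package tool.

```python
MD5_PREFIX = "@comment MD5:"


def aggregate_md5(contents: str) -> str:
    """Move '@comment MD5:<hash>' + path pairs into one @MD5 ... @end MD5 block.

    Assumption: each MD5 comment is immediately followed by its path, as in
    the excerpt quoted in this thread. All other lines are kept unchanged
    and placed before the aggregated block.
    """
    lines = contents.splitlines()
    other, pairs = [], []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(MD5_PREFIX) and i + 1 < len(lines):
            pairs.append((line[len(MD5_PREFIX):], lines[i + 1]))
            i += 2
        else:
            other.append(line)
            i += 1
    block = ["@MD5"] + [f"{digest} {path}" for digest, path in pairs] + ["@end MD5"]
    return "\n".join(other + block) + "\n"


def size_report(contents: str):
    """Return (bytes before, bytes after, bytes saved) for the aggregation."""
    before = len(contents.encode())
    after = len(aggregate_md5(contents).encode())
    return before, after, before - after
```

Note that this sketch throws away the position of the MD5 lines relative to
the other directives, which is exactly the ordering concern raised above, so
it is only good for counting bytes, not as a proposal for the format itself.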

And you have to explain why the current parsing routines cannot be
sped up for the current format; maybe the implementation is just a
little bit outdated compared to today's parsing knowledge...
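
To illustrate what I mean, the following Python sketch reads the current
line-by-line layout in a single pass and times it, so the result can be
compared against whatever the existing tools need for the same file. It is
not how the package tools are implemented; parse_md5_pairs and time_parse
are hypothetical names, and the pairing of an "@comment MD5:" line with the
path on the next line is assumed from the quoted excerpt.

```python
import time

MD5_PREFIX = "@comment MD5:"


def parse_md5_pairs(lines):
    """Single pass over the line-by-line layout; returns {path: md5}.

    Assumption: an MD5 comment applies to the path on the very next line.
    An MD5 comment followed by another '@' directive is ignored.
    """
    digests = {}
    pending = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(MD5_PREFIX):
            pending = line[len(MD5_PREFIX):]
            continue
        if pending is not None and line and not line.startswith("@"):
            digests[line] = pending
        pending = None
    return digests


def time_parse(lines, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` parses."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        parse_md5_pairs(lines)
        best = min(best, time.perf_counter() - start)
    return best
```

Running time_parse on the lines of a large +CONTENTS file gives a rough
baseline for the current format, which is the kind of number that is needed
before talking about a new one.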

Bye,
Alexander.

-- 
Life is a grand adventure -- or it is nothing.
		-- Helen Keller

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
