Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 14 Apr 2010 00:56:22 -0700
From:      Garrett Cooper <gcooper@FreeBSD.org>
To:        Garrett Cooper <gcooper@freebsd.org>
Cc:        David Forsythe <dforsyth@freebsd.org>, Florent Thoumie <flz@esat.net>, Tim Kientzle <kientzle@freebsd.org>, Perforce Change Reviews <perforce@freebsd.org>
Subject:   Re: PERFORCE change 176831 for review
Message-ID:  <g2v364299f41004140056pee1cde28laee26c8d5d1c9d19@mail.gmail.com>
In-Reply-To: <r2x364299f41004140053y7b43d454qb7804e111d1920a6@mail.gmail.com>
References:  <201004121230.o3CCUsIX029146@repoman.freebsd.org> <4BC3EB5B.5070801@freebsd.org> <l2y364299f41004122249q2f5734a7j89be807581e42dac@mail.gmail.com> <4BC53714.80805@freebsd.org> <r2x364299f41004140053y7b43d454qb7804e111d1920a6@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Apr 14, 2010 at 12:53 AM, Garrett Cooper <gcooper@freebsd.org> wrot=
e:
> On Tue, Apr 13, 2010 at 8:31 PM, Tim Kientzle <kientzle@freebsd.org> wrot=
e:
>>>> ... I would also put in a simple verification
>>>> that the entry you found is a regular entry:
>>>> =A0archive_entry_filetype(entry) =3D=3D AE_IFREG
>>>
>>> Ok. What does a regular entry correspond to (I assume not a regular fil=
e)?
>>
>> Sorry, "regular entry" =3D=3D "regular file" (or at least, what
>> will become a regular file once you write it to disk, if you
>> want to be really pedantic about it ;-). =A0Coincidentally,
>> AE_IFREG =3D=3D S_IFREG. =A0The separate AE_IFxxx constants
>> are there mostly portability shims (e.g., Windows doesn't
>> have block devices, so doesn't define S_IFBLK,
>> but AE_IFBLK is always available).
>
> Ok. I think I got everything you were trying to say before, but just
> to be clear:
>
> To prevent race conditions [with regular files only], I should use
> open(2). For all other files I would use the standard extract method
> (so I don't get too fancy?). It seems like the file creation times
> should be sufficiently fast with other file types that this kind of
> behavior of open(2), blah wouldn't be required.
>
>>>>> + * NOTE: the exit code is 0 / 1 so that this can be fed directly int=
o
>>>>> exit
>>>>> + * when doing piped tar commands for copying hierarchies *hint*,
>>>>> *hint*.
>>>>
>>>> Why do you want to copy hierarchies?
>>>> Seems a waste of disk bandwidth. =A0*hint* *hint* ;-)
>>>
>>> There's a fair amount of tar cpf - -C <dir_a> . | tar xpf - -C <dir_b>
>>> in libinstall . This is being done to preserve file attributes,
>>> fflags, modes, ownership, permissions, etc. ... Installing directly to
>>> hierarchies works in theory as well, but it's
>>> considerably more dangerous unless you do an information transfer
>>> between the pkg contents and the install base, and unfortunately that
>>> is tough to achieve with 100% clarity from what I've seen.
>>
>> That's exactly what I was getting to: =A0Someday, pkg_add
>> should write the files directly to their final location
>> and avoid the copy. =A0(You're exactly right that you
>> don't want to build a "copy tree" facility.)
>>
>> Clearly, direct install is not something you want to
>> tackle in this stage of the rewrite. =A0There's a lot of
>> gains from exactly what you're doing.
>
> Yeah, I've been thinking about what you said... and there's ultimately
> nothing to be gained through a staged install (in fact there's a lot
> lost). So unless two files overwrite one another (which suggest broken
> or conflicting packages, or not so clever users), people shouldn't be
> installing conflicting packages or partially installing multiple
> versions of the same package unless they're using -f, and if they're
> doing that, they're basically damning the consequences of doing so.
>
> Until we have functional tests in place to actually test this though,
> I need to stick with a conservative, functional analog to piped
> tar(1)'ing though.
>
>> But I really do believe that single-pass direct
>> install is feasible and is eventually where we want
>> to be. =A0The key insight for me was when Florent recently
>> pointed out that you could just read the +CONTENTS,
>> then do a verify pass, then extract everything.
>> (Any conflict during the extraction pass
>> would be a fatal error: =A0delete everything
>> extracted so far and scream loudly.)
>
> Agreed on the final point. I'm kind of interested though about the
> first point -- the verify pass -- are you verifying that the contents
> are sane on the disk or in the tarball? If so, why not just verify the
> contents after the fact [with mtree -- it's has checksumming, mode and
> permissions checking, you name it... everything minus some special
> facilities like ACLs, fflags, etc -- I guess I'll need to add those],
> then roll everything back if it isn't?
>
> One the entire operation is atomic and exclusive -- only one package
> operation is allowed to run at any given time on a particular package,
> running into this problem with the packaging software will be a
> non-issue, but instead a problem with an external source (user, etc),
> or a bad package.
>
>> Dealing with dependent packages is still
>> a little tricky but I think it's all doable.
>
> Yeah, there's a crapload of overlap between multiple packages, like
> cad-salome, and with some packages where you're need to build up a
> hierarchy that's common between several packages, it becomes a royal
> pain in the ass, but it's required otherwise stuff won't get setup and
> torn down properly.
>
> So I'm guessing for right now we should just avoid squealing loudly
> except with regular files; maybe whining a bit with non-regular files
> would be a good idea though because it would force people to clean up
> packages, and then things could be completely fixed later on so that
> they're hard errors.
>
>>> I saw that in the original comments and while I think that you have a
>>> point, I still am leery about someone specifying a package with
>>> contents that start with + because it immediately breaks the
>>> assumption. ...
>>
>> Unless I'm mistaken, pkg_add has used "+*" to identify
>> metadata files for a long time, so I think your own
>> argument from compatibility might be against you here.
>> (Though I suspect that you're right in the long term that
>> we should be obeying the +CONTENTS declarations instead of
>> using filename conventions.)
>
> I still want to avoid this potential corner case. Users have an
> incapability to surprise developers, and I don't want to have to deal

s/in/unerring /

> with this item in the future. A few less CPU cycles processing strings
> is worth sacrificing over trying to be clever and producing a broken
> program.
>
>>> unpack_file, blah with the appropriate metadata filenames considering
>>> that it's a lot quicker than having to go through the entire tarball
>>> end to end (especially if it's a large tarfile like openoffice),
>>> unless someone unfortunately put the files at the end of the tarball
>>> (in that case they do need to modify how the package is created).
>>
>> As for the speed argument: =A0Do you really need to
>> extract the metadata files before you extract
>> everything else? =A0Those files are going to a
>> different place, but that's not an issue with a
>> libarchive-driven extraction. =A0At one point, I had
>> convinced myself that you could do it all in one
>> pass, just writing the files to different places
>> depending on whether they were metadata files
>> (however identified) or not. =A0I recall, for example,
>> that the +MTREE file needs to be used only at the
>> end of the extraction (to fix-up permissions).
>
> Yes. pkg_info's primary uses (not -W) read the metadata if the
> metadata isn't on the disk already in $PKGDBDIR; it's an optimization,
> and I don't want to reduce performance in pkg_info.
>
> Other than that, pkg_install really doesn't care one iota about those
> files (minus +CONTENTS).
>
>>> Thanks again for the comments :)... it would be helpful BTW if the
>>> manpages linked together because archive_read(3) doesn't reference the
>>> archive_read_disk(3) APIs, etc, so I kind of have to fish for the
>>> right APIs to use, and I feel like I'm catching more boots than fish
>>> sometimes... but it's a learning experience and I don't expect to get
>>> it right the first time.
>>
>> Please let me know any doc shortcomings; the documentation
>> could use a lot of work, I know. =A0I also have a growing
>> Examples page on the libarchive Wiki and conversations
>> like this help me a lot to better understand what needs
>> to go there.
>
> Will definitely help whenever I can :).
>
> Also, getting back to a topic you mentioned earlier, the issue with
> creating the decompressor multiple times can be alleviated like so:
>
> Make the basic extract API follow a format like so:
>
> struct package_archive;
>
> struct package_archive {
> =A0 =A0/*
> =A0 =A0 * File descriptor for the archive. Set to NULL if you don't want
> to reuse the
> =A0 =A0 * descriptor, i.e. want to only use the initializer once.
> =A0 =A0 */
> =A0 =A0int *pkg_fd;
> =A0 =A0/*
> =A0 =A0 * Compare a package filename vs an expression.
> =A0 =A0 *
> =A0 =A0 * Could be a small wrapper above fnmatch (for simple glob express=
ions),
> =A0 =A0 * strncmp (for exact matches -- would be set to PATH_MAX),
> glob(3), or it could
> =A0 =A0 * always return 0 (for match all). Would return 0 if the files
> matched, non-zero
> =A0 =A0 * if they didn't match.
> =A0 =A0 *
> =A0 =A0 * This would help us avoid hardcoding and at the small cost of a =
stack
> =A0 =A0 * push, pop, the additional overhead would be fairly negligible I=
 would
> =A0 =A0 * think, and could potentially be optimized out using a smart com=
piler.
> =A0 =A0 */
> =A0 =A0int (*pkg_file_comparator) (const struct *package_archive, const c=
har *);
> };
>
> =A0 =A0I'd need to develop new and free `methods', but that's the general
> idea I had in mind.
> Thanks!
> -Garrett
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?g2v364299f41004140056pee1cde28laee26c8d5d1c9d19>