Date: Wed, 14 Apr 2010 00:56:22 -0700 From: Garrett Cooper <gcooper@FreeBSD.org> To: Garrett Cooper <gcooper@freebsd.org> Cc: David Forsythe <dforsyth@freebsd.org>, Florent Thoumie <flz@esat.net>, Tim Kientzle <kientzle@freebsd.org>, Perforce Change Reviews <perforce@freebsd.org> Subject: Re: PERFORCE change 176831 for review Message-ID: <g2v364299f41004140056pee1cde28laee26c8d5d1c9d19@mail.gmail.com> In-Reply-To: <r2x364299f41004140053y7b43d454qb7804e111d1920a6@mail.gmail.com> References: <201004121230.o3CCUsIX029146@repoman.freebsd.org> <4BC3EB5B.5070801@freebsd.org> <l2y364299f41004122249q2f5734a7j89be807581e42dac@mail.gmail.com> <4BC53714.80805@freebsd.org> <r2x364299f41004140053y7b43d454qb7804e111d1920a6@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Apr 14, 2010 at 12:53 AM, Garrett Cooper <gcooper@freebsd.org> wrot= e: > On Tue, Apr 13, 2010 at 8:31 PM, Tim Kientzle <kientzle@freebsd.org> wrot= e: >>>> ... I would also put in a simple verification >>>> that the entry you found is a regular entry: >>>> =A0archive_entry_filetype(entry) =3D=3D AE_IFREG >>> >>> Ok. What does a regular entry correspond to (I assume not a regular fil= e)? >> >> Sorry, "regular entry" =3D=3D "regular file" (or at least, what >> will become a regular file once you write it to disk, if you >> want to be really pedantic about it ;-). =A0Coincidentally, >> AE_IFREG =3D=3D S_IFREG. =A0The separate AE_IFxxx constants >> are there mostly portability shims (e.g., Windows doesn't >> have block devices, so doesn't define S_IFBLK, >> but AE_IFBLK is always available). > > Ok. I think I got everything you were trying to say before, but just > to be clear: > > To prevent race conditions [with regular files only], I should use > open(2). For all other files I would use the standard extract method > (so I don't get too fancy?). It seems like the file creation times > should be sufficiently fast with other file types that this kind of > behavior of open(2), blah wouldn't be required. > >>>>> + * NOTE: the exit code is 0 / 1 so that this can be fed directly int= o >>>>> exit >>>>> + * when doing piped tar commands for copying hierarchies *hint*, >>>>> *hint*. >>>> >>>> Why do you want to copy hierarchies? >>>> Seems a waste of disk bandwidth. =A0*hint* *hint* ;-) >>> >>> There's a fair amount of tar cpf - -C <dir_a> . | tar xpf - -C <dir_b> >>> in libinstall . This is being done to preserve file attributes, >>> fflags, modes, ownership, permissions, etc. ... Installing directly to >>> hierarchies works in theory as well, but it's >>> considerably more dangerous unless you do an information transfer >>> between the pkg contents and the install base, and unfortunately that >>> is tough to achieve with 100% clarity from what I've seen. >> >> That's exactly what I was getting to: =A0Someday, pkg_add >> should write the files directly to their final location >> and avoid the copy. =A0(You're exactly right that you >> don't want to build a "copy tree" facility.) >> >> Clearly, direct install is not something you want to >> tackle in this stage of the rewrite. =A0There's a lot of >> gains from exactly what you're doing. > > Yeah, I've been thinking about what you said... and there's ultimately > nothing to be gained through a staged install (in fact there's a lot > lost). So unless two files overwrite one another (which suggest broken > or conflicting packages, or not so clever users), people shouldn't be > installing conflicting packages or partially installing multiple > versions of the same package unless they're using -f, and if they're > doing that, they're basically damning the consequences of doing so. > > Until we have functional tests in place to actually test this though, > I need to stick with a conservative, functional analog to piped > tar(1)'ing though. > >> But I really do believe that single-pass direct >> install is feasible and is eventually where we want >> to be. =A0The key insight for me was when Florent recently >> pointed out that you could just read the +CONTENTS, >> then do a verify pass, then extract everything. >> (Any conflict during the extraction pass >> would be a fatal error: =A0delete everything >> extracted so far and scream loudly.) > > Agreed on the final point. I'm kind of interested though about the > first point -- the verify pass -- are you verifying that the contents > are sane on the disk or in the tarball? If so, why not just verify the > contents after the fact [with mtree -- it's has checksumming, mode and > permissions checking, you name it... everything minus some special > facilities like ACLs, fflags, etc -- I guess I'll need to add those], > then roll everything back if it isn't? > > One the entire operation is atomic and exclusive -- only one package > operation is allowed to run at any given time on a particular package, > running into this problem with the packaging software will be a > non-issue, but instead a problem with an external source (user, etc), > or a bad package. > >> Dealing with dependent packages is still >> a little tricky but I think it's all doable. > > Yeah, there's a crapload of overlap between multiple packages, like > cad-salome, and with some packages where you're need to build up a > hierarchy that's common between several packages, it becomes a royal > pain in the ass, but it's required otherwise stuff won't get setup and > torn down properly. > > So I'm guessing for right now we should just avoid squealing loudly > except with regular files; maybe whining a bit with non-regular files > would be a good idea though because it would force people to clean up > packages, and then things could be completely fixed later on so that > they're hard errors. > >>> I saw that in the original comments and while I think that you have a >>> point, I still am leery about someone specifying a package with >>> contents that start with + because it immediately breaks the >>> assumption. ... >> >> Unless I'm mistaken, pkg_add has used "+*" to identify >> metadata files for a long time, so I think your own >> argument from compatibility might be against you here. >> (Though I suspect that you're right in the long term that >> we should be obeying the +CONTENTS declarations instead of >> using filename conventions.) > > I still want to avoid this potential corner case. Users have an > incapability to surprise developers, and I don't want to have to deal s/in/unerring / > with this item in the future. A few less CPU cycles processing strings > is worth sacrificing over trying to be clever and producing a broken > program. > >>> unpack_file, blah with the appropriate metadata filenames considering >>> that it's a lot quicker than having to go through the entire tarball >>> end to end (especially if it's a large tarfile like openoffice), >>> unless someone unfortunately put the files at the end of the tarball >>> (in that case they do need to modify how the package is created). >> >> As for the speed argument: =A0Do you really need to >> extract the metadata files before you extract >> everything else? =A0Those files are going to a >> different place, but that's not an issue with a >> libarchive-driven extraction. =A0At one point, I had >> convinced myself that you could do it all in one >> pass, just writing the files to different places >> depending on whether they were metadata files >> (however identified) or not. =A0I recall, for example, >> that the +MTREE file needs to be used only at the >> end of the extraction (to fix-up permissions). > > Yes. pkg_info's primary uses (not -W) read the metadata if the > metadata isn't on the disk already in $PKGDBDIR; it's an optimization, > and I don't want to reduce performance in pkg_info. > > Other than that, pkg_install really doesn't care one iota about those > files (minus +CONTENTS). > >>> Thanks again for the comments :)... it would be helpful BTW if the >>> manpages linked together because archive_read(3) doesn't reference the >>> archive_read_disk(3) APIs, etc, so I kind of have to fish for the >>> right APIs to use, and I feel like I'm catching more boots than fish >>> sometimes... but it's a learning experience and I don't expect to get >>> it right the first time. >> >> Please let me know any doc shortcomings; the documentation >> could use a lot of work, I know. =A0I also have a growing >> Examples page on the libarchive Wiki and conversations >> like this help me a lot to better understand what needs >> to go there. > > Will definitely help whenever I can :). > > Also, getting back to a topic you mentioned earlier, the issue with > creating the decompressor multiple times can be alleviated like so: > > Make the basic extract API follow a format like so: > > struct package_archive; > > struct package_archive { > =A0 =A0/* > =A0 =A0 * File descriptor for the archive. Set to NULL if you don't want > to reuse the > =A0 =A0 * descriptor, i.e. want to only use the initializer once. > =A0 =A0 */ > =A0 =A0int *pkg_fd; > =A0 =A0/* > =A0 =A0 * Compare a package filename vs an expression. > =A0 =A0 * > =A0 =A0 * Could be a small wrapper above fnmatch (for simple glob express= ions), > =A0 =A0 * strncmp (for exact matches -- would be set to PATH_MAX), > glob(3), or it could > =A0 =A0 * always return 0 (for match all). Would return 0 if the files > matched, non-zero > =A0 =A0 * if they didn't match. > =A0 =A0 * > =A0 =A0 * This would help us avoid hardcoding and at the small cost of a = stack > =A0 =A0 * push, pop, the additional overhead would be fairly negligible I= would > =A0 =A0 * think, and could potentially be optimized out using a smart com= piler. > =A0 =A0 */ > =A0 =A0int (*pkg_file_comparator) (const struct *package_archive, const c= har *); > }; > > =A0 =A0I'd need to develop new and free `methods', but that's the general > idea I had in mind. > Thanks! > -Garrett >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?g2v364299f41004140056pee1cde28laee26c8d5d1c9d19>