Date:      Fri, 12 Aug 2016 10:04:31 +0200
From:      Polytropon <freebsd@edvax.de>
To:        galtsev@kicp.uchicago.edu
Cc:        freebsd-questions@freebsd.org
Subject:   Re: script to make webpage snapshot
Message-ID:  <20160812100431.8af84eeb.freebsd@edvax.de>
In-Reply-To: <33717.128.135.52.6.1470954497.squirrel@cosmo.uchicago.edu>
References:  <33717.128.135.52.6.1470954497.squirrel@cosmo.uchicago.edu>

On Thu, 11 Aug 2016 17:28:17 -0500 (CDT), Valeri Galtsev wrote:
> Dear Experts,
> 
> Could someone recommend a script or utility one can run from command line
> on Linux or UNIX machine to make a snapshot image of webpage?

When you say "snapshot", what exactly do you mean? I'm not sure
I understand your description correctly. Is a snapshot

(a) a _visual_ snapshot (image format or PDF) of how the web page
    renders inside a web browser, or

(b) an exact local _copy_ (files and directories) on your disk?

For option (a), lang/phantomjs has been suggested. Check the
mailing list archives - I asked that kind of question some
years ago, but I cannot remember (or even find) the answers
I got. ;-)
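
As a starting point, a minimal sketch of such a PhantomJS
script might look like this (the URL, window size and output
file name are placeholders - adjust as needed):

	var page = require('webpage').create();
	// Size of the virtual browser window the page renders in.
	page.viewportSize = { width: 1280, height: 1024 };
	page.open('http://example.com/', function (status) {
		if (status !== 'success') {
			console.log('Failed to load page');
			phantom.exit(1);
		}
		// Write the rendered page; .png, .jpg and .pdf work.
		page.render('snapshot.png');
		phantom.exit(0);
	});

Save it as, say, snapshot.js and run:

	% phantomjs snapshot.js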

For option (b), wget probably isn't bad, as long as you add some
options to avoid unneeded traffic, such as

	% wget -r -l 0 -k -nc <source>
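
Here -r enables recursion, -l 0 removes the depth limit, -k
converts the links so the copy can be browsed locally, and
-nc ("no clobber") skips files that already exist on disk.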

If you are interested only in a subset of file types (or want
to reject some), use the -A (accept) or -R (reject) options;
to limit the crawl to a specific sub-path, -I and -X do the
same for directories. Use -U to set the user agent string to
that of a "real" web browser if needed. See "man wget" for
details.
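
Combining these, a sketch (the accept list and agent string
here are just examples):

	% wget -r -l 0 -k -nc -A '*.html,*.css,*.png' \
	       -U 'Mozilla/5.0' <source>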

This set of options limits each run to content you don't have
yet: because of -nc, things you already have on your local
disk won't be downloaded again.
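
If you want _changed_ files re-fetched as well, wget's -N
(timestamping) option may be worth a look; note that it cannot
be combined with -nc. Something like

	% wget -r -l 0 -k -N <source>

would re-download a file only when the server reports a newer
timestamp than the local copy has.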



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


