Date:      Sat, 13 Oct 2012 23:15:36 +0200
From:      Polytropon <freebsd@edvax.de>
To:        Gary Kline <kline@thought.org>
Cc:        FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject:   Re: editing pdf files
Message-ID:  <20121013231536.c703bc21.freebsd@edvax.de>
In-Reply-To: <20121013204701.GE14155@ethic.thought.org>
References:  <5074A6B9.8040209@dreamchaser.org> <5078641D.4050905@passap.ru> <20121012234628.GA11112@ethic.thought.org> <20121013131907.c666bfc2.freebsd@edvax.de> <20121013204701.GE14155@ethic.thought.org>

On Sat, 13 Oct 2012 13:47:01 -0700, Gary Kline wrote:
> On Sat, Oct 13, 2012 at 01:19:07PM +0200, Polytropon wrote:
> > On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> > 
> > The disassembling can be done with 
> > 
> > 	% pdfimages source.pdf .
> > 
> > Then the files can be edited with whatever tool you like, e.g. Gimp.
> > They often come out in PBM format.
> > 
> 
> 
> 	A question I should have asked last time.  This book is a history or
> 	bio of Richland County, Ohio; in type, it's 650 or more
> 	pages.  So: is pdfimages going to spit out 650 files?  As noted
> 	in my last email, only a couple of these images are of any interest

Depends on what actually _is_ in the PDF file. If every page is
represented as a picture, 650 pictures will be created. If it
contains text _and_ images, it will _only_ output the images,
with no real relation to where they have been placed in the text.
As suggested by the name "pdfimages", it takes the images from
the PDF file. :-)
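
Since only a couple of the images are of interest, the extraction
can be limited to a page range instead of the whole book. A minimal
sketch, assuming your pdfimages supports the -f (first page) and
-l (last page) options, as the xpdf and poppler versions I know do;
the function name and the "img" prefix are made up for illustration:

```shell
# Sketch only: extract the images from a page range of a PDF.
# Assumes pdfimages accepts -f <first page> and -l <last page>.
# extract_page_images and the default prefix "img" are hypothetical names.
extract_page_images() {
    pdf=$1          # source PDF file
    first=$2        # first page to scan
    last=$3         # last page to scan
    prefix=${4:-img}   # output files are named after this prefix
    pdfimages -f "$first" -l "$last" "$pdf" "$prefix"
}
```

For example, "extract_page_images source.pdf 10 12" would write only
the images found on pages 10 to 12, with file names starting with
"img" (the page numbers here are placeholders).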

The easiest way to check for possible text is to install xpdf,
which provides the "pdftotext" binary (if I remember correctly that
this tool is in _that_ package). You can then use it like this:

	% pdftotext source.pdf

It will create "source.txt" with all actual text (but of course
without _any_ formatting except line breaks and ^L page breaks),
including page numbers. But hey, it's pure ASCII text suitable
for further processing. :-)
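
To see quickly whether any text came out at all, the resulting
file can be inspected with standard tools. A rough sketch, assuming
pdftotext wrote one ^L (form feed) per page as described above;
both function names are hypothetical:

```shell
# Sketch only: inspect the text file written by pdftotext.

# Print the number of form feed (^L) characters in a file.
# With one ^L per page, this approximates the page count.
count_form_feeds() {
    tr -cd '\f' < "$1" | wc -c | tr -d ' '
}

# Succeed (exit status 0) if the file contains at least one
# letter or digit, i.e. some real text was extracted.
has_text() {
    grep -q '[[:alnum:]]' "$1"
}
```

After "pdftotext source.pdf", "count_form_feeds source.txt" should
print roughly the number of pages, and "has_text source.txt" succeeds
only if letters or digits were extracted. If it fails, the pages are
most likely stored as scanned pictures.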

Run "pdftotext" without parameters for a short summary of its
options; a manual page ("man pdftotext") is also provided.


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


