Date:      Wed, 16 May 2001 19:16:55 GMT
From:      Salvo Bartolotta <bartequi@inwind.it>
To:        freebsd-questions@FreeBSD.ORG
Subject:   Re: Manipulating pdf/ps files -- closer to a solution
Message-ID:  <20010516.19165500@bartequi.ottodomain.org>
References:  <20010513.18294500@bartequi.ottodomain.org> <20010515.1075700@bartequi.ottodomain.org>

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 5/15/01, 3:07:57 AM, Salvo Bartolotta <bartequi@inwind.it> wrote
regarding Re: Manipulating pdf/ps files:


> >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

> On 5/13/01, 8:29:45 PM, Salvo Bartolotta <bartequi@inwind.it> wrote
> regarding Manipulating pdf/ps files:


> > Dear FreeBSD'ers,

> > I would like to perform such operations as the following:

> > -- merge PDF/ps files
> > -- modify PDF/ps files in a more or less "graphical" (read:
> > human-understandable) fashion
> > -- convert PDF/ps files to other formats (eg text).

> > Browsing the archives, I learnt about pdf2ps, ps2pdf, pstotext and
> > psutils (both in the ports). I had also browsed the ports tree as well as
> > the Doc-primer, but I am probably missing something trivial here.

> > I have found some difficulties: eg, psmerge seems not to work on a few ps
> > files, which files I downloaded (originally as PDF files) from a www
> > site. I have reason to believe those files were generated from one main
> > file (containing data arranged in a table) split into several pieces,
> > BTW. I couldn't convert the ps files to txt, either: pstotext generated
> > strings of hashes (the "#" character).




> I meet with problems when trying to convert PDF/ps files containing data
> arranged in a table, each row of data being preceded as well as followed
> by a (continuous) horizontal line like this (the data were probably
> formatted with M$ excel):

> -------------------------------------------
> data data data...
> -------------------------------------------
> data data data...
> -------------------------------------------


> For example, running pdfinfo on one of the files spits out:

> Creator:      Windows NT 4.0
> Producer:     Acrobat Distiller 4.0 for Windows
> CreationDate: 20010511130351
> ModDate:      20010511130351+02'00'
> Pages:        60
> Encrypted:    no
> Linearized:   yes



> I tried xpdf (in the ports), namely pdftotext, but it didn't work.

> Summing up: I can convert those PDF files into ps, the information in the
> ps files IS displayed correctly, but I have managed to convert neither
> the above-mentioned PDF nor ps files into plain text. There is a txt2pdf
> utility on the Net, but I can't seem to find a **working** pdf2txt or
> ps2txt one. BTW, the "clipboard" (ie the mouse middle button) DOES copy
> from Acrobat Reader (running in linux comp. layer) to other text editors
> within X, but it copies (raw) PDF data.




To whomever it may concern,

I keep replying to myself, but I seem to have made some progress.

I had successfully converted the PDF files into ps ones. The reason why
pstotext didn't work is probably that such files (eg a 60-page PDF file)
are **images** (in the preceding example: a collection of 60 images, one
per page, as was pointed out by ImageMagick). Which is also the reason
why pdftotext didn't work, BTW.
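
A rough way to spot this situation from the extractor's output alone
might look like the Python sketch below. It only inspects text that has
already been produced (eg by pstotext or pdftotext) and guesses whether
it is junk such as the strings of "#" mentioned earlier. The function
name and the 0.5 threshold are assumptions made for illustration, not
part of any of the tools named here.

```python
def looks_image_only(extracted_text, min_alnum_ratio=0.5):
    """Guess whether extractor output came from an image-only file.

    Hypothetical helper: returns True when the output is empty or when
    fewer than min_alnum_ratio of its visible characters are letters
    or digits (eg nothing but '#' characters).
    """
    # Ignore spaces and newlines; only visible characters count.
    visible = [ch for ch in extracted_text if not ch.isspace()]
    if not visible:
        return True
    alnum = sum(1 for ch in visible if ch.isalnum())
    return alnum / len(visible) < min_alnum_ratio


if __name__ == "__main__":
    print(looks_image_only("#### ####\n######"))
    print(looks_image_only("data data data"))
```

This is a crude heuristic: a real text table with many ruled lines of
"-" characters would also lower the ratio, so the threshold may need
adjusting.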

Since I had to deal with PDF "images", not "text" PDF files... I asked
(wait for it) ImageMagick for help :-)

convert <name_of_PDF_file_of_type_"image">  <name...jpg> DID work, and
created a collection of jpeg images (one per page). Thus, I can convert
PDF "images", or data acquired/manipulated/treated as such --
specifically, a M$ Excel table -- into other image formats.
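
The convert step could be scripted roughly as in the Python sketch
below. The helper names, the 150 dpi density and the page-%03d.jpg
output pattern are assumptions chosen for illustration, not what was
actually typed. It relies only on ImageMagick's convert accepting
-density before the input file and a printf-style scene number in the
output name.

```python
import shutil
import subprocess


def build_convert_command(pdf_path, out_pattern="page-%03d.jpg", density=150):
    """Return the argv list for an ImageMagick convert call.

    -density is placed before the input so that it sets the resolution
    used when the PDF pages are rasterized.
    """
    return ["convert", "-density", str(density), pdf_path, out_pattern]


def pdf_to_jpegs(pdf_path, out_pattern="page-%03d.jpg", density=150):
    """Write one JPEG per page; raise if convert is missing or fails."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found in PATH")
    subprocess.run(build_convert_command(pdf_path, out_pattern, density),
                   check=True)


if __name__ == "__main__":
    # "table.pdf" is a placeholder file name.
    print(" ".join(build_convert_command("table.pdf")))
```

Keeping the command construction separate from its execution makes it
possible to inspect the exact command line before running it on a
60-page file.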

<aside>pdftoimages does NOT seem to work, however</aside>

<question type="dumb">
AAARGH! I am only missing the last step: how to recover text from eg such
jpeg images; and/or... which image format to choose in order to be able
to extract text from the images.
</question>

<advocacy>
Once again, I would very much like to work under FreeBSD, and NOT make
use of any M$-related product; the negation "NOT" extending from the
coasts of Western Europe to the Pacific coasts of USA -- just to make
sure that M$ is within the scope of negation :-)))
</advocacy>

MTIA,
Salvo (with apologies for the dumb question)
