Date:      Sat, 23 Jan 2021 11:14:41 +0100
From:      Polytropon <freebsd@edvax.de>
To:        "Steve O'Hara-Smith" <steve@sohara.org>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Convert PDF to Excel
Message-ID:  <20210123111441.b8c5de4e.freebsd@edvax.de>
In-Reply-To: <20210123090421.7fb3ede1754fe280b685f83c@sohara.org>
References:  <CAAdA2WPoqEaew-OuDwAJ4pTbNUJsAzc2MpZE9di5HrJfGu+exw@mail.gmail.com> <20210123054209.f03ac420.freebsd@edvax.de> <CAAdA2WP+Ah6-9pFdB4VJg5asxqHKpEUNOrtxY0TsT9PVpWu26w@mail.gmail.com> <20210123094041.f932fd4c.freebsd@edvax.de> <20210123090421.7fb3ede1754fe280b685f83c@sohara.org>

On Sat, 23 Jan 2021 09:04:21 +0000, Steve O'Hara-Smith wrote:
> On Sat, 23 Jan 2021 09:40:41 +0100
> Polytropon <freebsd@edvax.de> wrote:
> 
> > They contain text, so the OCR problem is out of the way.
> > Sadly, the text is re-arranged so the optimal solution (one
> > line in a table equals one line of text, with the columns
> > being separated by whitespace) does not appear, instead it
> > is the other way round: one line equals one column.
> 
> 	I spy a fun interview question buried in this problem - flipping a
> text file like that efficiently is far from easy - dead easy if you
> don't mind eating memory of course.

The lesson to learn from this potential interview question
is simply RTFM; from "man pdftotext": -layout will try its
best to preserve the original display in the raw output.
So data that belongs together in lines, but is arranged in
columns, will then be output as columns, with each "dataset"
on one line.
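
For the archive, the call would look something like
"pdftotext -layout input.pdf output.txt" (the file names are
placeholders). If a text file has already been extracted the
wrong way round, the "dead easy if you don't mind eating
memory" variant of the flip can be sketched as below. This is
only a minimal illustration, not anything pdftotext does
itself: the function name transpose_text is made up, it reads
the whole text into memory, and it assumes that no value
contains embedded whitespace.

```python
from itertools import zip_longest


def transpose_text(text, fill=""):
    """Flip 'one line per column' text into 'one line per row' text.

    Hypothetical helper: holds everything in memory and assumes that
    values are separated by whitespace and contain none themselves.
    """
    # Each non-empty input line carries the values of one table column.
    columns = [line.split() for line in text.splitlines() if line.strip()]
    # zip_longest pairs up the n-th value of every column; shorter
    # columns are padded with `fill` so that no row is silently dropped.
    rows = zip_longest(*columns, fillvalue=fill)
    # Emit one tab-separated line per table row.
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    print(transpose_text("name alice bob\nage 30 25"))
```

Tab-separated output was chosen here because a spreadsheet
program can usually import it directly.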



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


