Date: Sat, 23 Jan 2021 11:14:41 +0100
From: Polytropon <freebsd@edvax.de>
To: "Steve O'Hara-Smith" <steve@sohara.org>
Cc: freebsd-questions@freebsd.org
Subject: Re: Convert PDF to Excel
Message-ID: <20210123111441.b8c5de4e.freebsd@edvax.de>
In-Reply-To: <20210123090421.7fb3ede1754fe280b685f83c@sohara.org>
References: <CAAdA2WPoqEaew-OuDwAJ4pTbNUJsAzc2MpZE9di5HrJfGu%2Bexw@mail.gmail.com> <20210123054209.f03ac420.freebsd@edvax.de> <CAAdA2WP%2BAh6-9pFdB4VJg5asxqHKpEUNOrtxY0TsT9PVpWu26w@mail.gmail.com> <20210123094041.f932fd4c.freebsd@edvax.de> <20210123090421.7fb3ede1754fe280b685f83c@sohara.org>

On Sat, 23 Jan 2021 09:04:21 +0000, Steve O'Hara-Smith wrote:
> On Sat, 23 Jan 2021 09:40:41 +0100
> Polytropon <freebsd@edvax.de> wrote:
>
> > They contain text, so the OCR problem is out of the way.
> > Sadly, the text is re-arranged so the optimal solution (one
> > line in a table equals one line of text, with the columns
> > being separated by whitespace) does not appear, instead it
> > is the other way round: one line equals one column.
>
> I spy a fun interview question buried in this problem - flipping a
> text file like that efficiently is far from easy - dead easy if you
> don't mind eating memory of course.

The lesson to learn for this potential interview question
simply is RTFM; from "man pdftotext": -layout will try its
best to preserve the original display in the raw output.
So data that is in lines, but arranged to columns, will then
be output as columns; each "dataset" is one line.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
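
For illustration only, here is a minimal Python sketch of the "dead easy
if you don't mind eating memory" variant of the flip discussed above. It
is not something posted in the thread. The function name transpose_text
is hypothetical, and the sketch assumes that every input line holds the
cells of one column, separated by whitespace, with no whitespace inside
a cell.

```python
from itertools import zip_longest


def transpose_text(text, fill=""):
    """Turn 'one line per column' text into 'one line per table row' text.

    Assumption: cells are separated by whitespace and contain none themselves.
    """
    # Read everything into memory: one list of cells per non-blank input line.
    columns = [line.split() for line in text.splitlines() if line.strip()]
    # zip_longest pairs up the n-th cell of every column and pads short
    # columns with `fill`, so ragged input does not silently lose cells.
    rows = zip_longest(*columns, fillvalue=fill)
    # Join each row with single spaces; drop trailing padding whitespace.
    return "\n".join(" ".join(row).rstrip() for row in rows)


if __name__ == "__main__":
    sample = "name alice bob\nage 30 25\n"
    print(transpose_text(sample))
```

The whole input is held in memory at once, which is exactly the
trade-off mentioned in the quoted text. For text coming out of a PDF,
running pdftotext with -layout, as described above, should make this
step unnecessary.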