Date:      Wed, 28 Jan 2009 11:22:11 -0800
From:      Gary Kline <kline@thought.org>
To:        Reko Turja <reko.turja@liukuma.net>
Cc:        FreeBSD Mailing List <freebsd-questions@FreeBSD.ORG>
Subject:   Re: OCR...
Message-ID:  <20090128192211.GB22208@thought.org>
In-Reply-To: <319D789FD18042DBB7A19571DA26E5AE@rivendell>
References:  <20090128040802.GA94236@thought.org> <319D789FD18042DBB7A19571DA26E5AE@rivendell>

On Wed, Jan 28, 2009 at 12:08:55PM +0200, Reko Turja wrote:
> >so what is the best commercial/shareware that can read a 10pt-font
> >file?  (( also, when i have time to get back into actually hacking,
> >this [[turning imaged pdf into OCR'able ascii or 8859-1]] is going
> >to be a first target.  any idea which team i should go with?  gOCR
> >looks best so far to me. ))
> 
> ABBYY FineReader - OmniPage hasn't been able to catch it in several 
> years, either feature- or quality-wise. No idea if FineReader runs 
> under an emulator though.  If the file is already a PDF at 72 DPI with 
> text as graphics, most of the damage has already been done, and it 
> will be extremely hard to OCR.
> 

	well, damage is probably done.  how can i check the resolution?
	i tried to increase it by creating huge ppm and tif files, but
	then that's really absurd since there can only be just so much
	data per image.  i _could_ try xv and jpeg and smoothing image to
	refine, but too much hassle.  
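
	on checking the resolution: a rough sketch, assuming the pixel
	size of the embedded page image and the page size in PDF points
	(1 point = 1/72 inch) are already known.  the function names and
	the sample numbers below are illustrative only, not taken from
	any particular tool.  it also shows why upscaling does not help:
	the pixel count actually scanned is what fixes the glyph size.

```python
def effective_dpi(pixels: int, points: float) -> float:
    """Pixels per inch for an image that spans `points` PDF points.

    One PDF point is 1/72 inch, so the span in inches is points / 72.
    """
    if points <= 0:
        raise ValueError("points must be positive")
    return pixels / (points / 72.0)


def em_height_pixels(font_pt: float, dpi: float) -> float:
    """Nominal height in pixels of a font's em box at a given resolution."""
    return font_pt * dpi / 72.0


if __name__ == "__main__":
    # Hypothetical example: US Letter page, 612 points (8.5 inches) wide.
    page_width_pt = 612.0
    for width_px in (612, 2550):
        dpi = effective_dpi(width_px, page_width_pt)
        em = em_height_pixels(10.0, dpi)
        print(f"{width_px} px wide: {dpi:.0f} DPI, 10pt em box is about {em:.1f} px tall")
```

	if the page image is only as many pixels wide as the page is
	points wide, it is a 72 DPI scan, and a 10pt glyph has about ten
	pixels of height to work with no matter how large a ppm or tif
	is made from it afterwards.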

	(i used gocr -m 130 and "saw" the glyphs it (presumably) saw.
	seemed pretty much okay to my eyes.  but then i'm not a computer
	program.  [MAYBE :)])

	gary



> -Reko 
> 

-- 
 Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
    The 2.23a release of Jottings: http://jottings.thought.org/index.php



