Date:      Wed, 18 Sep 2002 19:35:12 +0300 (EEST)
From:      Alexandr Kovalenko <never@nevermind.kiev.ua>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   ports/42931: New port: textproc/enca: detects file encoding
Message-ID:  <200209181635.g8IGZCdt087457@mile.nevermind.kiev.ua>


>Number:         42931
>Category:       ports
>Synopsis:       New port: textproc/enca: detects file encoding
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-ports
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 18 09:40:02 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Alexandr Kovalenko
>Release:        FreeBSD 4.7-RC i386
>Organization:
Net.Style
>Environment:
System: FreeBSD mile.nevermind.kiev.ua 4.7-RC FreeBSD 4.7-RC #0: Wed Sep 18 12:04:53 EEST 2002 root@mile.nevermind.kiev.ua:/usr/obj/usr/src/sys/mile i386

>Description:
WWW: http://www.physics.muni.cz/~yeti/software/enca.shtml

Enca is an Extremely Naive Charset Analyser. It detects the encoding of text
files and can also convert them to other encodings.

Enca can currently determine the 8-bit charsets of Belarusian, Czech, Polish,
Russian, Slovak and Ukrainian texts, as well as some multibyte encodings
independent of the language (provided it is a European language). The main
features include:

    * recognises the following 8-bit charsets:
          o Belarusian: CP1251, IBM866, ISO-8859-5, KOI8-UNI, maccyr, IBM855
          o Czech: ISO-8859-2, KEYBCS2, IBM852, macce, KOI-8_CS_2, CP1250
          o Polish: ISO-8859-2, IBM852, macce, ISO-8859-13, ISO-8859-16,
            CP1250, baltic
          o Russian: KOI8-R, IBM866, CP1251, ISO-8859-5, maccyr
          o Slovak: CP1250, KEYBCS2, IBM852, macce, KOI-8_CS_2, ISO-8859-2
          o Ukrainian: CP1251, IBM855, ISO-8859-5, KOI8-U, maccyr, CP1125

    * recognises several multibyte encodings: UCS-2, UCS-4, UTF-8, UTF-7 and
      TeX accents
    * recognises all common EOL types, byte orders and also Quoted-Printable
      encoding
    * can report charset names following various conventions (or programs) as
      well as human-readable descriptions; accepts all common charset aliases
    * works with multiple files and can act as an intelligent filter
    * converts files using a built-in converter, the GNU recode library, UNIX98
      iconv functions or an external converter that can be specified on the
      command line (e.g. cstocs, GNU recode)
    * has a special ambiguous mode for very short texts
    * can filter out binary parts of a file and/or box-drawing characters
      before guessing, so it can determine the encoding of pretty messy files
    * uses various tricks to resolve hard-to-decide cases, such as
      distinguishing between ISO-8859-2 and CP1250
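
To give a rough feel for what "detecting" an encoding means, here is a minimal
sketch of the simplest possible check: trial decoding. It is not how enca
works internally (this report does not describe its heuristics); the function
names looks_like_utf8 and guess_encoding are made up for illustration, and the
sketch assumes a POSIX iconv(1) utility is installed.

```shell
#!/bin/sh
# Illustrative sketch only; NOT enca's algorithm. Assumes iconv(1) is present.

# looks_like_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
looks_like_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# guess_encoding FILE: print a coarse guess about FILE's encoding.
guess_encoding() {
    # Delete every 7-bit byte; if nothing is left, the file is pure ASCII.
    if [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]; then
        echo "ASCII"
    elif looks_like_utf8 "$1"; then
        echo "UTF-8"
    else
        # Some 8-bit charset; telling CP1251 from KOI8-R and so on needs
        # language-specific analysis, which this sketch does not attempt.
        echo "unknown 8-bit"
    fi
}

# Usage: guess the encoding of each file named on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(guess_encoding "$f")"
done
```

The sketch stops exactly where the interesting problem starts: a file that is
neither ASCII nor valid UTF-8 could be in any of the 8-bit charsets listed
above, and choosing among them is the part that needs knowledge of the
language.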
>How-To-Repeat:
N/A
>Fix:
# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	enca
#	enca/files
#	enca/files/patch-lib::encnames.c
#	enca/pkg-comment
#	enca/pkg-descr
#	enca/pkg-plist
#
echo c - enca
mkdir -p enca > /dev/null 2>&1
echo c - enca/files
mkdir -p enca/files > /dev/null 2>&1
echo x - enca/files/patch-lib::encnames.c
sed 's/^X//' >enca/files/patch-lib::encnames.c << 'END-of-enca/files/patch-lib::encnames.c'
X--- lib/encnames.c.orig	Sun Aug 18 13:05:20 2002
X+++ lib/encnames.c	Wed Sep 18 17:36:39 2002
X@@ -25,7 +25,7 @@
X 
X #include "enca.h"
X #include "internal.h"
X-#include "encodings.h"
X+#include "tools/encodings.h"
X 
X #define NCHARSETS (sizeof(CHARSET_INFO)/sizeof(EncaCharsetInfo))
X #define NALIASES (sizeof(ALIAS_LIST)/sizeof(char *))
END-of-enca/files/patch-lib::encnames.c
echo x - enca/pkg-comment
sed 's/^X//' >enca/pkg-comment << 'END-of-enca/pkg-comment'
XDetects encoding of text files
END-of-enca/pkg-comment
echo x - enca/pkg-descr
sed 's/^X//' >enca/pkg-descr << 'END-of-enca/pkg-descr'
XEnca can currently determine the 8-bit charsets of Belarusian, Czech, Polish,
XRussian, Slovak and Ukrainian texts, as well as some multibyte encodings
Xindependent of the language (provided it is a European language).
X
XWWW: http://www.physics.muni.cz/~yeti/software/enca.shtml
X
X- Alexandr "Nevermind" Kovalenko
Xnever@nevermind.kiev.ua
END-of-enca/pkg-descr
echo x - enca/pkg-plist
sed 's/^X//' >enca/pkg-plist << 'END-of-enca/pkg-plist'
Xbin/b-cstocs
Xbin/b-map
Xbin/b-recode
Xbin/enca
Xbin/enconv
Xinclude/enca.h
Xlib/libenca.so.1
Xlib/libenca.so
Xlib/libenca.la
Xlib/libenca.a
Xshare/doc/enca/libenca/c1197.html
Xshare/doc/enca/libenca/c4.html
Xshare/doc/enca/libenca/index.html
Xshare/doc/enca/libenca/libenca-analyser.html
Xshare/doc/enca/libenca/libenca-auxiliary-functions.html
Xshare/doc/enca/libenca/libenca-charsets-and-surfaces.html
Xshare/doc/enca/libenca/libenca-internal-functions.html
Xshare/doc/enca/libenca/libenca-typedefs-and-constants.html
Xshare/doc/enca/libenca/index.sgml
X@dirrm share/doc/enca/libenca
X@dirrm share/doc/enca
END-of-enca/pkg-plist
exit
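
For readers unfamiliar with shell archives, the unpacking mechanism used above
can be sketched in miniature. Each file is written by sed from a here-document
whose quoted delimiter disables expansion, and the leading X on every payload
line is stripped on the way out. The function name unpack_demo, the path
demo/hello.txt and the file contents are hypothetical, chosen only for this
illustration.

```shell
#!/bin/sh
# unpack_demo DIR: unpack a one-file, shar-style archive into DIR.
# The archive contents below are invented for illustration.
unpack_demo() {
    (
        cd "$1" || exit 1
        echo c - demo
        mkdir -p demo > /dev/null 2>&1
        echo x - demo/hello.txt
        # The quoted delimiter keeps the shell from expanding anything in the
        # here-document; sed removes only the first X of each payload line.
        sed 's/^X//' >demo/hello.txt << 'END-of-demo/hello.txt'
XLiteral text: $HOME is not expanded here.
XX-prefixed lines lose only their first X.
END-of-demo/hello.txt
    )
}

# Usage: unpack into a scratch directory and show the result.
tmpdir=$(mktemp -d)
unpack_demo "$tmpdir"
cat "$tmpdir/demo/hello.txt"
rm -rf "$tmpdir"
```

The X prefix guarantees that no payload line can be mistaken for the
here-document delimiter, and the final exit in a real archive stops sh from
trying to execute whatever text follows it, such as the trailing fields of
this report.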

>Release-Note:
>Audit-Trail:
>Unformatted:
