Date:      Thu, 13 May 1999 21:14:58 +0100
From:      Nik Clayton <nik@nothing-going-on.demon.co.uk>
To:        doc@freebsd.org, freebsd-translate@ngo.org.uk
Subject:   FDP Directory Reorganisation
Message-ID:  <19990513211458.B70767@catkin.nothing-going-on.org>

Folks,

[ Sent to:

    doc@freebsd.org        FreeBSD Documentation Project

    freebsd-translate@ngo.org.uk      
                           FreeBSD Translation Teams

  Bcc'd (so that they don't get caught up in people doing "group replies"
  unless they want to) to

    ache@freebsd.org       Andrey Chernov, listed as being responsible for
                           "Internationalization" in the FreeBSD Handbook.

    terry@lambert.org      Terry Lambert.  My addled brain recalls that
                           Terry's occasionally posted messages about
                           i18n and l10n issues to the FreeBSD mailing 
                           lists, and I thought he might have useful input
                           to make. 

  To Andrey and Terry; I'd be particularly interested in your thoughts
  about the sense and viability of organising the documentation by language
  and character set encoding that I outline here. ]
 
   
   This is an attempt to set down all my thoughts about my plans for an
   FDP directory reorganisation so that they can be critiqued. Comments
   are welcome; please send them to the <doc@freebsd.org> mailing list.
     _________________________________________________________________
   
Overview

   The FreeBSD Documentation Project (FDP) directory structure has grown
   haphazardly over time. This was tolerable when the FDP repository only
   contained English versions of the documentation. However, as more
   translations are added to the repository it becomes important to have
   a consistent directory naming scheme followed by each translation.
   
   A consistent directory naming scheme will make it easier to write
   software that can automatically process FDP documentation without
   needing to be configured as to exactly where that documentation is in
   the tree; automated tools will be able to deduce this. Moving existing
   content that conflicts with this scheme will make automated tools
   simpler, as they will not need to handle exceptions to the rules.
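
   As a rough illustration of what "deduce" could mean in practice, the
   sketch below assumes the doc/<lang>/<charset>/<category>/<name> layout
   proposed later in this message under "The change". The function and
   type names (parse_doc_path, DocLocation) are hypothetical and are not
   part of any existing FDP tool.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Directory names that may appear below <charset>/ in the proposed layout.
CATEGORIES = {"articles", "books", "man", "share"}


class DocLocation(NamedTuple):
    lang: str             # e.g. "en"
    charset: str          # e.g. "iso8859-1"
    category: str         # one of CATEGORIES
    name: Optional[str]   # document directory, e.g. "handbook"; None if absent


def parse_doc_path(path: str) -> Optional[DocLocation]:
    """Deduce language, charset, category and document name from a path
    given relative to the repository root.

    Returns None for paths that do not fit the proposed layout, such as
    doc/share/... or doc/<lang>/share/...
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[0] != "doc":
        return None
    lang, charset, category = parts[1], parts[2], parts[3]
    if lang == "share" or charset == "share" or category not in CATEGORIES:
        return None
    name = parts[4] if len(parts) > 4 else None
    return DocLocation(lang, charset, category, name)
```

   Because every document sits at the same depth, the function needs no
   per-language or per-document configuration.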
   
   Finally, a consistent approach is much easier to document and to
   learn. Anything that can reduce the learning curve required before
   people can contribute to the FDP is a good thing.
     _________________________________________________________________
   
Current situation

   At the time of writing, the doc/ repository contains the following
   directories (ignoring empty directories):
    doc/
        FAQ/
        en/
           handbook/
           tutorials/
                     docproj-primer/
                     fonts/
                     ...
           share/
                 sgml/
        es/
           FAQ/
        ja/
           FAQ/
           handbook/
           man/
        ru/
           FAQ/
        share/
              sgml/
              mk/
        zh/
           FAQ/

   There are a number of anomalies and potential problems with this
   structure. It also gets a few things right.
     * doc/FAQ is out of place. It is the English version of the FreeBSD
       FAQ, and is a holdover from when the repository only contained the
       English documentation.
     * The English tutorials are one level lower in the tree than the
       English Handbook. Any commands to process the documentation that
       rely on relative paths will need to ensure that this is
       compensated for before running the command. See the current
       DOC_PREFIX kludge for an example of this.
     * Some of the documentation in tutorials/ should not be considered
       to be tutorials. A more neutral term would better describe the
       content.
     * No attempt is made to specify the character set used to write the
       documentation. While this is not a problem for the English
       documentation, other languages, such as Japanese, Korean, and
       Chinese, have multiple character sets that could be used to encode
       the documentation. Some way of differentiating between these
       character sets should be provided, as should a mechanism for
       allowing multiple translations to the same language differing only
       in the choice of character set.
     * There is a proposed plan to split the Handbook up, and replace it
       with a number of smaller books with a tighter focus. The existing
       layout does not support this approach at all.
     * The use of share/ directories to contain files that are language
       neutral (doc/share/) or can be used by all translations to a
       specific language (doc/en/share/) is a good idea.
     _________________________________________________________________
   
The change

   Migrate to a new directory structure that follows this layout:
    doc/
        <lang>/
               <charset>/
                         articles/
                                  fonts/
                                  ...
                         books/
                               FAQ/
                               FDP-primer/
                               printing/
                               ...
                         man/
                             ...
                         share/
                               sgml/
               share/
                     ...
        share/
              sgml/
                   ...
              mk/
                 ...

   There are two kinds of top level directory. The first is <lang>, named
   after the language code, as we currently use it. The language codes
   are defined in ISO 639, which can be found in /usr/share/misc/iso639
   on a relatively recent FreeBSD system.
   
   The second top level directory is share/, which will contain language
   neutral files.
   
   Under each <lang> directory is at least one directory named after the
   character set encoding used. This approach will be followed even if
   there is only one character set that could be considered ``standard''
   for that language.
   
   I understand that for some languages (such as English) this introduces
   an additional directory where one is not strictly needed. However,
   this will ensure that the SGML source files are kept at the same level
   in the directory tree relative to one another. This helps avoid
   ambiguities with relative paths, and the need to special-case between
   languages that have multiple possible character sets, and languages
   that need only one character set. After all, a language with one
   character set is just a subset of a language that can be encoded in
   multiple character sets.
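
   The relative path argument can be checked with a small sketch. It
   compares a few directories from the current tree with a few from the
   proposed one, and computes the relative path each would need to reach
   doc/share/mk. The helper name path_to_root_share is hypothetical, and
   the directories are taken from the listings in this message.

```python
from pathlib import PurePosixPath


def path_to_root_share(doc_dir: str, shared: str = "share/mk") -> str:
    """Return the relative path from a document directory to doc/<shared>.

    doc_dir is given relative to the repository root, e.g. "doc/en/handbook".
    """
    depth = len(PurePosixPath(doc_dir).parts) - 1  # levels below doc/
    return "/".join([".."] * depth + [shared])


# Directories from the current tree: three different depths.
current = ["doc/FAQ", "doc/en/handbook", "doc/en/tutorials/fonts"]

# Directories from the proposed tree: always
# doc/<lang>/<charset>/<category>/<name>.
proposed = [
    "doc/en/iso8859-1/books/handbook",
    "doc/ja/euc-jp/books/handbook",
    "doc/zh/big5/books/FAQ",
]

current_prefixes = {path_to_root_share(d) for d in current}
proposed_prefixes = {path_to_root_share(d) for d in proposed}

print(sorted(current_prefixes))   # one distinct path per depth
print(sorted(proposed_prefixes))  # a single path shared by every document
```

   With the proposed layout every document reaches the shared files
   through the same relative path, so nothing like DOC_PREFIX has to be
   adjusted per document.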
   
   There might also be a share/ directory under each <lang> directory, to
   contain files that can be shared by all translations to this language,
   regardless of character set.
   
   Below the <charset> directory the documentation is categorised
   further. There are three categories that each document might be in
   (articles/, books/, and man/), alongside a share/ directory:
   
   articles/
          An article is a short piece of documentation (although
          ``short'' is a relative term). In general, if the documentation
          does not contain any chapters then it is an article, and should
          be placed in a subdirectory of this directory.
          
          ``article'' is a neutral term that does not convey information
          about the nature of the information contained within the
          article (unlike ``tutorials'').
          
          Examples of existing documentation that would fall into this
          category are:
          
          + Using FreeBSD with other Operating Systems
          + ``Making the world'' your own
          + This document.
            
   books/
          Books are longer sets of documentation, characterised by their
          organisation into multiple chapters.
          
          Examples of existing documentation that would fall into this
          category are:
          
          + FreeBSD FAQ
          + FreeBSD Handbook
          + FDP Primer
            
   man/
          The system manual pages, translated to the target language.
          
          While it is feasible that the English manual pages could move
          out of the src/ repository and into doc/, I don't see this
          actually happening any time soon (certainly not within my
          lifetime). The historical pressure to keep them in src/ is too
          great.
          
   share/
          Content that can be shared between different documentation, but
          is language and character set specific.
          
          For example, as a translation team translates the documentation
          there will be sections that haven't been translated yet. You
          can put the translation of the phrase ``This section has not
          been translated yet'' into a file in this directory, and then
          use a general entity to include it in all the documentation
          where it is necessary.
          
   So, there are three levels of shared content between the language
   projects: content that is shared globally (doc/share/), content that
   is specific to a particular language (doc/<lang>/share/), and content
   that is specific to a particular language and character set encoding
   (doc/<lang>/<charset>/share/).
   
   Each one of these share/ directories can (and will) contain
   subdirectories: share/sgml/ for SGML content, share/mk/ for
   includable Makefiles, and so on.
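
   One way a tool might use these three levels is to look for a shared
   file in the most specific share/ directory first and fall back to the
   more general ones. The search order is an assumption made for this
   sketch; the proposal itself does not define one. The function names
   (share_dirs, find_shared) are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def share_dirs(root: Path, lang: str, charset: str, kind: str = "sgml") -> List[Path]:
    """List the three share/<kind>/ directories that apply to a document,
    most specific first (assumed order, not mandated by the proposal)."""
    return [
        root / lang / charset / "share" / kind,  # language and charset specific
        root / lang / "share" / kind,            # language specific
        root / "share" / kind,                   # shared globally
    ]


def find_shared(root: Path, lang: str, charset: str, filename: str,
                kind: str = "sgml") -> Optional[Path]:
    """Return the first existing copy of filename, or None if there is none.

    root is the path to the doc/ directory itself.
    """
    for directory in share_dirs(root, lang, charset, kind):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

   For example, find_shared(Path("doc"), "ja", "euc-jp", "foo.sgml")
   would try doc/ja/euc-jp/share/sgml/foo.sgml, then
   doc/ja/share/sgml/foo.sgml, then doc/share/sgml/foo.sgml (foo.sgml
   being a made-up file name).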
   
   Based on the current doc/, the converted directory structure will look
   like this.
    doc/
        en/
           share/
                 sgml/
           iso8859-1/
                     articles/
                              writing-device-drivers/
                              programming-tools/
                              formatting-media/
                              ...
                     books/
                           FAQ/
                           FDP-primer/
                           handbook/
                           ...
                     share/
                           sgml/
        ja/
           share/
                 sgml/
           euc-jp/
                  books/
                        FAQ/
                        handbook/
                        ...
                  man/
                      ...
                  share/
                        sgml/
        zh/
           share/
                 sgml/
           big5/
                books/
                      FAQ/
                share/
                      sgml/
           gb/
              books/
                    FAQ/
              share/
                    sgml/
        fr/
           share/
                 sgml/
           iso8859-1/
                     books/
                           handbook/
                     share/
                           sgml/
        share/
              sgml/
              mk/

   I don't know everything I need to know about i18n and l10n yet, so
   there may be some problems with the above example. For example, is
   euc-jp the correct name to use for a character set, or is there a more
   precise term for it (perhaps an ISO number?) that should be used
   instead?
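
   The mapping from the current tree to the converted one could be
   sketched as below. Several things here are assumptions rather than
   part of the proposal: the one-charset-per-language table is read off
   the example tree (zh is left out because the example shows both big5
   and gb), the docproj-primer to FDP-primer rename is inferred by
   comparing the two listings, and the function name new_location is
   hypothetical. share/ directories are not handled.

```python
from typing import Optional

# Assumption: charset per language, taken from the example tree above.
DEFAULT_CHARSET = {"en": "iso8859-1", "ja": "euc-jp"}

# Inferred: tutorials/docproj-primer appears to become books/FDP-primer.
RENAMED = {"docproj-primer": ("books", "FDP-primer")}

# Documents the proposal lists as books.
BOOKS = {"FAQ", "handbook"}


def new_location(old_path: str) -> Optional[str]:
    """Map a directory in the current tree to its place in the new tree.

    Returns None when this sketch cannot decide (unknown charset, share/
    directories, or anything else not covered by the rules below).
    """
    parts = old_path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "doc":
        return None
    if parts[1] == "FAQ":
        # doc/FAQ is the English FAQ, left over from the English-only days.
        lang, rest = "en", ["FAQ"]
    else:
        lang, rest = parts[1], parts[2:]
    charset = DEFAULT_CHARSET.get(lang)
    if charset is None or not rest:
        return None
    if rest[0] == "tutorials":
        if len(rest) < 2:
            return None
        category, name = RENAMED.get(rest[1], ("articles", rest[1]))
    elif rest[0] in BOOKS:
        category, name = "books", rest[0]
    elif rest[0] == "man":
        return f"doc/{lang}/{charset}/man"
    else:
        return None
    return f"doc/{lang}/{charset}/{category}/{name}"
```

   For example, new_location("doc/en/tutorials/fonts") gives
   "doc/en/iso8859-1/articles/fonts", while new_location("doc/zh/FAQ")
   gives None because the sketch has no way to tell which character set
   the existing Chinese FAQ uses.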
     _________________________________________________________________
   
Making the change

   This is quite a large change, and will need careful thought about how
   to carry it out. In particular, we want to avoid bloating the CVS
   repository any more than we have to.
   
   How files are moved will depend on their current DTD.
   
   All documentation that is already marked up according to the DocBook
   DTD (and the manual pages) can be moved within the repository by the
   repository managers (Peter Wemm and John Polstra). Some of the
   Makefiles will then need small changes made to them to reflect the
   directory names, but that should be about all.
   
   All documentation that is marked up according to the LinuxDoc DTD is
   treated differently. The original LinuxDoc files are left where they
   are. As each document is converted to DocBook, the new DocBook files
   are stored in the new directories as appropriate. We will then have
   two versions of the document in the repository, one marked up in
   LinuxDoc, one in DocBook. The Makefiles can continue to point to the
   LinuxDoc version until that document's DocBook conversion is
   complete, at which point the LinuxDoc version can be removed.
   
   The conversion will be complete when the last piece of LinuxDoc
   documentation has been removed from the tree.
     _________________________________________________________________
   
Additional resources

   I've found the following links useful while trying to find out more
   information about i18n and l10n.
   
   http://czyborra.com/charsets/
          Lots of information about different character sets, the
          iso8859* characters, and so on.
          
   http://www.ora.com/people/authors/lunde/cjk_inf.html
          The Chinese, Japanese, Korean information page has lots of
          information about how to encode these languages.
          
   http://www.vlsivie.tuwien.ac.at/mike/8bit/FAQ-ISO-8859-1
          The ISO8859-1 FAQ contains useful information.
-- 
    There's some milk in the fridge about to go off. . . and there it goes.

