Date:      Fri, 3 Aug 2012 14:33:04 +0100
From:      "Simon L. B. Nielsen" <simon@FreeBSD.org>
To:        Gabor Kovesdan <gabor@freebsd.org>
Cc:        doc@freebsd.org, www@freebsd.org
Subject:   Re: RFC: doc/www cleanup
Message-ID:  <CAC8HS2E2ekMKJgY04qPrQGbEe_tPJ%2BHrf5_ToERptf0yawYoQA@mail.gmail.com>
In-Reply-To: <501BAFBD.3010008@FreeBSD.org>
References:  <501BAFBD.3010008@FreeBSD.org>

On Fri, Aug 3, 2012 at 12:02 PM, Gabor Kovesdan <gabor@freebsd.org> wrote:

> 1, Removing emacs PSGML comments: PSGML is an emacs mode for SGML editing.
> It can be instructed to behave in a determined way by SGML comments or
> separately with a configuration file (described in fdp-primer). Our
> documentation is littered with PSGML comments like this:
>
> <!--
>      Local Variables:
>      mode: sgml
>      sgml-indent-data: t
>      sgml-omittag: nil
>      sgml-always-quote-attributes: t
>      End:
> -->
>
> XML requires tags to be closed and attributes to be always quoted so this
> loses most of its utility and these comments just confuse people who don't
> know what they mean. Indenting or any other specific option can be
> configured in the .emacs file. I propose dropping these comments.

I don't care too much about it confusing people, but if they lose
their point with XML it sounds like a sane enough reason to remove
them.

> 2, Relaxing character entity usage: To be able to read non-ASCII characters
> on ASCII-only systems, we have been using character entities, like &aacute;.
> But in CJK languages, Greek and Russian every character is non-ASCII so
> practically they cannot be used nor were they used. So they are only used in
> ISO-8859 encodings (except Greek, which is also from this family). In fact,
> displaying these Latin-based characters nowadays isn't that problematic any
> more. Furthermore, if you edit text in a given language then we can suppose
> that you understand the language so you know what you should see and you
> know how to configure your system if you don't see the desired result. As a
> result, these entities nowadays don't have any real advantage any more but
> they highly "pollute" the text and make it much harder to edit and read. One

I agree that the entities should generally not be used. I think we
should just switch to UTF-8 as the character set wherever possible to
simplify it even more.

And on that note, kill the useless character-set part of all our
language directories which generate horrible paths with no additional
value.

> exception is using characters in a specific language that aren't present
> there, e.g. a non-English developer name in the English documentation, etc.

UTF-8 would fix that.

> So I propose for every translation to convert back entities to normal
> characters and only conserve those that aren't present in the given
> language. Abundance of character entities used to mean difficulties for new
> documentation people, especially for those who don't have that much IT
> background. This change would make the texts more natural.

Sounds good to me.
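
As a rough illustration of the conversion proposed above, here is a minimal
Python sketch. It assumes the entities in question are the standard HTML 4
named entities (as listed in Python's html.entities.name2codepoint) and that
each translation has a known target encoding. The function name
expand_entities is hypothetical, not an existing doc tool. Entities the
target encoding cannot represent, markup-significant ones, and any unknown
names are left untouched.

```python
import re
from html.entities import name2codepoint

# Matches named entities such as &aacute; (numeric references are ignored).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Markup-significant entities that must stay escaped.
KEEP = {"amp", "lt", "gt", "quot", "apos"}


def expand_entities(text, encoding):
    """Replace named entities with literal characters when `encoding`
    can represent them; leave every other entity as it is."""

    def repl(match):
        name = match.group(1)
        if name in KEEP:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Not a standard HTML 4 entity (e.g. a project-specific one).
            return match.group(0)
        char = chr(codepoint)
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            # The character is not present in this language's encoding.
            return match.group(0)
        return char

    return ENTITY_RE.sub(repl, text)
```

With a UTF-8 target every standard entity expands, which matches the point
that UTF-8 would remove the remaining exception.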

> 3, Preferring XML/XSLT over scripts: Some parts of the web, like the A-Z
> index and sitemap pages have their own format that is processed with shell
> scripts. It would be more consistent to use an XML data file with an XSLT
> stylesheet for this objective. It would give us more flexibility for further
> changes and would reduce the several different methods we use to generate
> things.

To a degree. IMO XSLT is a horrible language to work with unless you
are really used to it, and I suspect most people aren't compared to
normal scripting languages.

Using XML as the main format sounds fine to me, but only use XSLT if
it can be done briefly and sanely.

The more relevant part of this to fix, IMO, is that both the sitemap
and A-Z indexes are horribly out of date / incomplete.

> 4, Stricter XHTML: I don't propose going directly to XHTML Strict 1.0 but

Eh, why would you go to XHTML at all considering it's basically
deprecated in favor of HTML5 (yes, there is no standard for that, but
still...).

> there are very inconsistently marked up <hr/>'s, <table>'s, etc. I would
> like to make them more consistent and prefer CSS styling when applicable.

> There are also empty paragraphs used as line breaks, which should also be
> eliminated. This would give us a more consistent look and more
> structure-oriented webpage files.

I agree with that, but do be aware that there might be reasons for it
being done that way at times... I.e., don't blindly convert without
checking the output.

> And after the migration, I plan:
>
> 5, Identifying obsolete webpages: There are moved pages both in the English
> pages and translations that only serve for redirection. These pages were
> moved a very long time ago so any interested party could update her
> bookmarks. I would like to remove these finally. On the other hand, there

I personally prefer not killing the redirects if possible, but they
could be done better at the HTTP level. If you can just generate a
list of redirects I can add them at the HTTP layer.
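
One way such a list might be generated is sketched below in Python. It
assumes the moved pages redirect with a <meta http-equiv="refresh"> tag,
which may not hold for every page, and the names RefreshFinder,
redirect_target and redirect_list are hypothetical. The output is a list of
(old path, new location) pairs that could then be turned into HTTP-level
redirects.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/new/location.html".
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")


def redirect_target(page_source):
    """Return the refresh target of a page, or None if it has none."""
    finder = RefreshFinder()
    finder.feed(page_source)
    return finder.target


def redirect_list(pages):
    """pages maps a site path to its page source; returns sorted
    (old_path, new_location) pairs for pages that only redirect."""
    result = []
    for path, source in pages.items():
        target = redirect_target(source)
        if target:
            result.append((path, target))
    return sorted(result)
```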

> are leftovers in translations, i.e. pages that were removed from the English
> web but not from the translations. I would like to generate a list of them
> and send patches to translation projects to clean these up.

That also hints at the general problem of stale translated pages,
which can be much worse than pages not being translated at all.

Do we ever check how out of date pages are currently?
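
For the leftover pages mentioned in the quoted proposal, a simple comparison
of the two directory trees might be enough as a first pass. The Python
sketch below assumes the English and translated trees use the same relative
file names and that page sources end in ".xml"; both are assumptions, and
leftover_pages is a hypothetical helper. It does not detect pages that exist
in both trees but are out of date.

```python
from pathlib import Path


def relative_files(root, suffix=".xml"):
    """Return the set of file paths under `root`, relative to it."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*" + suffix)
        if p.is_file()
    }


def leftover_pages(english_root, translation_root, suffix=".xml"):
    """List pages present in the translation but absent from English."""
    english = relative_files(english_root, suffix)
    translated = relative_files(translation_root, suffix)
    return sorted(translated - english)
```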

-- 
Simon L. B. Nielsen


