Date:      Sun, 2 Mar 2014 23:07:02 +0400
From:      Dmitry Selyutin <ghostman.sd@gmail.com>
To:        Edward Tomasz Napierała <trasz@freebsd.org>
Cc:        Jordan Hubbard <jkh@turbofuzz.com>, Łukasz Wójcik <lukasz.wojcik@zoho.com>, John-Mark Gurney <jmg@funkthat.com>, hackers@freebsd.org, Fernando Apesteguía <fernando.apesteguia@gmail.com>
Subject:   Re: GSoC proposal: Quirinus C library (qc)
Message-ID:  <CAMqzjet98Bo7b_ENBH_W6XoG5Fm6Z02sth2CDS60eVL1R1hc-Q@mail.gmail.com>

Hello everyone!

To meet the GSoC requirements and to implement the most necessary features
first, I think it is a good idea to start the qc library with the core and
i18n modules, since this seems to be the area where Open Source suffers
most. But first I'd like to describe some general changes that affect the
entire library. After that I'll send a letter (or maybe two, I guess)
describing the *core* and *i18n* modules.

1. Types in the *core* module carry no module prefix; the names of all
other types consist of the module name followed by the type name (e.g.
*qc_bytes*, but *qc_i18n_calendar*).

2. Each new type will receive a macro that initializes a variable on the
stack. The macro is the uppercase form of the type name, e.g.:
  qc_bytes encbuf = QC_BYTES; /* initialize new object */
I want this because I've realized that users sometimes need to allocate an
object on the stack, not on the heap. qc_TYPE_release may be used to free
the memory held by such an object after initialization or modification. So
we will actually have four general functions:
  /* allocate new TYPE on the heap */
  void * qc_TYPE_construct(void);
  /* allocate new TYPE and copy data from the old one */
  void * qc_TYPE_replicate(void const*);
  /* deallocate TYPE and all its data */
  void qc_TYPE_destruct(void*);
  /* free memory occupied by data of TYPE */
  void qc_TYPE_release(void*);
One may argue that qc_TYPE_construct is overkill while we have qc_alloc,
and that having both qc_TYPE_destruct and qc_TYPE_release is even worse.
The main reason for them is to support a general-purpose array type that
behaves like a C++ vector. I'd like to discuss this; qc_TYPE_replicate and
qc_TYPE_release alone are probably enough.
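
To make the discussion concrete, here is a minimal sketch of the four
functions for qc_bytes. The struct layout, the value of QC_BYTES and the
helper qc_bytes_assign are assumptions made for illustration only; the
proposal does not define them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the proposal does not specify the fields. */
typedef struct qc_bytes {
    unsigned char *data; /* heap buffer owned by the object, or NULL */
    size_t size;         /* number of bytes stored in data */
} qc_bytes;

/* Initializer for objects that live on the stack (assumed value). */
#define QC_BYTES { NULL, 0 }

/* Free the data owned by the object; the object itself stays usable. */
void qc_bytes_release(void *self)
{
    qc_bytes *b = self;
    if (b == NULL)
        return;
    free(b->data);
    b->data = NULL;
    b->size = 0;
}

/* Allocate a new, empty object on the heap. */
void *qc_bytes_construct(void)
{
    qc_bytes *b = malloc(sizeof *b);
    if (b != NULL) {
        qc_bytes empty = QC_BYTES;
        *b = empty;
    }
    return b;
}

/* Hypothetical helper: replace the contents with a copy of src. */
int qc_bytes_assign(qc_bytes *self, void const *src, size_t size)
{
    unsigned char *fresh = NULL;
    if (size != 0) {
        fresh = malloc(size);
        if (fresh == NULL)
            return -1;
        memcpy(fresh, src, size);
    }
    free(self->data);
    self->data = fresh;
    self->size = size;
    return 0;
}

/* Allocate a new object on the heap and copy the data of another one. */
void *qc_bytes_replicate(void const *other)
{
    qc_bytes const *src = other;
    qc_bytes *dst;
    if (src == NULL)
        return NULL;
    dst = qc_bytes_construct();
    if (dst == NULL)
        return NULL;
    if (qc_bytes_assign(dst, src->data, src->size) != 0) {
        free(dst);
        return NULL;
    }
    return dst;
}

/* Free the data and then the object itself (heap objects only). */
void qc_bytes_destruct(void *self)
{
    qc_bytes_release(self);
    free(self);
}
```

In this sketch a stack object is paired with qc_bytes_release, and a heap
object with qc_bytes_destruct. Because all four functions take and return
void pointers, a generic array could presumably store them as function
pointers and manage elements of any type.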

3. Allocations and reallocations store the size of the allocated memory
just before the returned pointer (i.e. they allocate `size` bytes plus
`sizeof(size_t)` bytes in front). This lets the `qc_realloc` and
`qc_crealloc` functions copy the data immediately, because the old size is
always known. (Plain `realloc` also preserves the old contents, but it
gives the caller no portable way to query the old size.) Neither
`qc_dealloc` nor `qc_TYPE_dealloc` nor `qc_TYPE_release` shall set the
qc_errno variable.
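
The size-header scheme could be sketched as follows. This is only an
illustration under stated assumptions: the header is padded with a union so
that the returned pointer stays suitably aligned (so it may be larger than
`sizeof(size_t)`), `qc_alloc_size` is a hypothetical helper, and this
`qc_realloc` always moves the data to a new block for simplicity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header placed immediately before the pointer given to the caller.
 * The extra union members exist only to pad the header for alignment;
 * this padding is an assumption of the sketch. */
typedef union qc_header {
    size_t size;
    long long ll;
    long double ld;
    void *ptr;
} qc_header;

/* Allocate size bytes and remember the size in front of them. */
void *qc_alloc(size_t size)
{
    qc_header *h;
    if (size > SIZE_MAX - sizeof *h)
        return NULL; /* header plus payload would overflow */
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Hypothetical helper: read the stored size back. */
size_t qc_alloc_size(void const *ptr)
{
    return ptr != NULL ? ((qc_header const *)ptr - 1)->size : 0;
}

/* Free a block obtained from qc_alloc; reports no error state. */
void qc_dealloc(void *ptr)
{
    if (ptr != NULL)
        free((qc_header *)ptr - 1);
}

/* Allocate a new block and copy as much old data as fits into it.
 * On failure the old block is left untouched. */
void *qc_realloc(void *ptr, size_t size)
{
    void *fresh = qc_alloc(size);
    if (fresh == NULL)
        return NULL;
    if (ptr != NULL) {
        size_t old = qc_alloc_size(ptr);
        memcpy(fresh, ptr, old < size ? old : size);
        qc_dealloc(ptr);
    }
    return fresh;
}
```

The caller never passes the old size: `qc_realloc` reads it from the header
and copies the smaller of the old and new sizes.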

4. Old null-terminated char and wchar_t strings shall be deprecated
wherever possible. I'd rather avoid them altogether, since: 1) char may be
signed, which seems absurd to me, at least on modern systems; 2) the size
of wchar_t varies between platforms, and it implies neither UCS-4 nor UCS-2
nor UTF-16. I see two solutions:
  A). Since some programmers and APIs are more used to working with
null-terminated char strings, I've decided to keep some functions for them,
such as strcmp, stricmp, strlen, strdup, strchr and strrchr. A naming
convention distinguishes the character types, so strcmp comes in four
flavours: qc_strcmp, qc_wstrcmp, qc_mbstrcmp and qc_ucstrcmp, where the
first works with char, the second with wchar_t, and the last two with
qc_byte and qc_ucs respectively.
  B). The only places where such strings would actually be used are the
qc_TYPE_import functions, where the qc library assumes that all data given
is either an ASCII-encoded character sequence or well-formed Unicode. The
only string types in the qc library are qc_bytes and qc_unicode.
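
For solution A, the four strcmp flavours might look like the sketch below.
The typedefs for qc_byte and qc_ucs are assumptions (an unsigned 8-bit byte
and a 32-bit code point); the proposal names the types but does not define
them. The first two functions simply wrap the standard strcmp and wcscmp.

```c
#include <stdint.h>
#include <string.h>
#include <wchar.h>

typedef unsigned char qc_byte; /* assumed definition */
typedef uint32_t qc_ucs;       /* assumed: one UCS-4 code point */

/* char strings: defer to the standard library. */
int qc_strcmp(char const *a, char const *b)
{
    return strcmp(a, b);
}

/* wchar_t strings: defer to the standard library. */
int qc_wstrcmp(wchar_t const *a, wchar_t const *b)
{
    return wcscmp(a, b);
}

/* qc_byte strings: compare as unsigned values up to the terminator. */
int qc_mbstrcmp(qc_byte const *a, qc_byte const *b)
{
    while (*a != 0 && *a == *b) {
        ++a;
        ++b;
    }
    return (*a > *b) - (*a < *b);
}

/* qc_ucs strings: compare code points up to the terminator. */
int qc_ucstrcmp(qc_ucs const *a, qc_ucs const *b)
{
    while (*a != 0 && *a == *b) {
        ++a;
        ++b;
    }
    return (*a > *b) - (*a < *b);
}
```

Each function returns a negative value, zero, or a positive value, like
strcmp. Because qc_byte and qc_ucs are unsigned here, the ordering does not
depend on whether plain char is signed on the platform.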

I'd like to discuss this question too. I know that old null-terminated
strings may be convenient; however, I think they are the first common
mistake in string handling (the second is to use UTF-16 extensively in
APIs, as Microsoft does, along with a lot of libraries and languages like
ICU, Java, Python (until Py3K), etc.). See e.g. this discussion:
http://stackoverflow.com/questions/4418708/whats-the-rationale-for-null-terminated-strings.
To me it seems better to avoid null-terminated strings altogether. No,
seriously.
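
One concrete difference can be shown with a small sketch. The byte_span type
below is hypothetical and merely stands in for a length-carrying string such
as qc_bytes; it is not part of the proposal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view of a byte sequence. */
typedef struct byte_span {
    unsigned char const *data;
    size_t size;
} byte_span;

/* Length of a null-terminated string: scans until the first zero byte. */
size_t cstr_length(char const *s)
{
    return strlen(s);
}

/* Length of a span: stored explicitly, zero bytes are ordinary data. */
size_t span_length(byte_span s)
{
    return s.size;
}

/* Two spans are equal when sizes match and all bytes match. */
int span_equal(byte_span a, byte_span b)
{
    return a.size == b.size
        && (a.size == 0 || memcmp(a.data, b.data, a.size) == 0);
}
```

For the five bytes 'a', 'b', 0, 'c', 'd', cstr_length stops at the embedded
zero byte, while a span with size 5 keeps all of the data and reports its
length without scanning.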

These are my general thoughts about the future of the library. What do you
think about them? If any of you is considering this project as a potential
mentor, I'd probably want to write up my proposal in a wiki or something
similar, where it is easier to edit and review. It can be difficult to look
through letters sometimes.

Thank you for your attention, and I'm looking forward to your letters.

-- 
With best regards,
Dmitry Selyutin


