Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 25 Aug 1999 23:32:05 +0900
From:      Akira Wada <a-wada@mars.dti.ne.jp>
To:        Christopher Seiwald <seiwald@perforce.com>
Cc:        archie@whistle.com, vanderh@ecf.utoronto.ca, freebsd-hackers@FreeBSD.ORG
Subject:   Re: anybody love qsort.c?
Message-ID:  <199908251432.AA00055@a.mars.dti.ne.jp>
In-Reply-To: <199908230728.AAA24061@perforce.com>
References:  <199908230728.AAA24061@perforce.com>

next in thread | previous in thread | raw e-mail | index | archive | help

Christopher Seiwald wrote...
 >Archie's mod to qsort:
 >
 >|  >-	if (swap_cnt == 0) {  /* Switch to insertion sort */
 >|  >+	if (n <= 32 && swap_cnt == 0) {  /* Switch to insertion sort */
 >
 >As Akira Wada points out, this eliminates the benefit of the optimization
 >in the first place, which is to let isort take over if the data is largely
 >sorted in the first place.
 >
 >Akira Wada's alternative mod:
 >
 >| +       pl = (char *)a; pn = (char *)a + (n - 1) * es;
 >| +       while (pl < pn && cmp(pl, pl + es) <= 0) pl += es;
 >| +       if (pl >= pn) return;
 >
 >This is a little more comprehensive, but does throw in an extra pass
 >of comparisons, which (I'm sure) someone would complain about.
 >
 >The alteration that I've tried and tested is to have the isort bail
 >back to qsort if it does more than N swaps.  I put N at 1024, which
 >worked for my data :).    This is almost guaranteed to do no more work
 >than the current logic, because it it only gives up on the isort when
 >it has evidence that the isort is doing too much work.
 >
I think it would be difficult, not impossible, to decide the threshold
(Christopher Seiwald said, "# swaps >= 1024", and Tim Vanderhoek suggested,
"a ratio: #comparisons : # swaps") and to what point of qsort, to make
isort bail back, effectively, without troubles, and covering every situations.

I considered the cases (swap_cnt == 0) is true, and they could be
illustrated by figure [1],[2],[3] below.

[1] |            :************   ( 2 subarrays are randomized )
    |            :************   This is the worst case pointed out
    |            :************   by Christopher Seiwald, and so
    |            :************   isort has to work twice for random
  k |            :************   arrays of n/2 elements, although
  e +------------*------------   probability of this case might be
  y |************:               very very small in real life.
    |************:               Because of this low probability, 
    |************:               I guess, FreeBSD qsort had been left
    |************:               as it is, for about 10 years since the
    |************:               1'st issue, even though someone wuold
    +-------------+-----------   have been already aware of the thing.
                position

[2] |            :       *****  (sorted almost)
    |            :     *****    This is the case isort is expected to
    |     (X)    :   *****      work efficiently, but if any elements
    |            : *****        exist in region (X) and/or (Y),
  k |            :****          "swap_cnt" could not detect the advantages.
  e +------------*------------  And probability (X) and (Y) are empty,
  y |        ****:              might be very small, at least, in the
    |      ***** :              practical applications. This case might
    |    *****   :    (Y)       occur when a lower significant key is 
    |  *****     :              added for the dataset sorted already
    |*****       :              with some higher keys (same keys exsist),    
+------------+------------      etc..
              position

[3] |            :         *   (sorted completely)
    |            :       *     Therefor, usefulness of "swap_cnt" is
    |            :     *       guaranteed only for this case.
    |            :   *         And so, it would be worth while to detect
  k |            : *           the array sorted completely, without risk
  e +------------*------------ to degenerate, instead of "swap_cnt and isort", 
  y |          * :             and the test loop is not so expensive for 
    |        *   :             usual arrangements because of breaking
    |      *     :             out from loop at the 1'st pair out of order,
    |   *        :             and at worst n - 2 comparisons.
    | *          :
    +------------+------------
              position

Then an improvement would be :
  (1) delete the switching to "isort by swap_cnt",
     and add the routine detecting sorted completely.
  (2) adding (1), make the partitioning logic symmetrical,
     for dataset sorted in reversed order to be processed efficiently,
     e.g. malloc(size) for pivot instead of "swap(a, pm)"
     and some modifications required by this.

By the way, I am doubting yet the efficiency of isort for small subarrays
in quicksort, so I'd made some experiments for this and the results at ;
   http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.c
   http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.txt
so that I didn't use isort in my qsorts.c at my 1'st post,
and in the other codes for practical usages, at ;
   http://www.mars.dti.ne.jp/~a-wada/qsortlib.html
--
Akira Wada



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199908251432.AA00055>