Date:      Tue, 1 Apr 2008 20:05:25 -0700
From:      "Martin Fouts" <mfouts@danger.com>
To:        <freebsd-arch@freebsd.org>, <arch@freebsd.org>
Subject:   RE: Flash disks and FFS layout heuristics
Message-ID:  <B95CEC1093787C4DB3655EF330984818051D26@EXCHANGE.danger.com>
In-Reply-To: <200804020103.m3213JEt043506@apollo.backplane.com>
References:  <20080330231544.A96475@localhost> <200803310135.m2V1ZpiN018354@apollo.backplane.com> <B95CEC1093787C4DB3655EF330984818051D03@EXCHANGE.danger.com> <200803312125.29325.qpadla@gmail.com> <200803311915.m2VJFSoR027593@apollo.backplane.com> <B95CEC1093787C4DB3655EF330984818051D09@EXCHANGE.danger.com> <200803312006.m2VK6Aom028133@apollo.backplane.com> <B95CEC1093787C4DB3655EF330984818051D0A@EXCHANGE.danger.com> <200803312254.m2VMsPqZ029549@apollo.backplane.com> <B95CEC1093787C4DB3655EF330984818051D0D@EXCHANGE.danger.com> <200804011733.m31HXF6e039649@apollo.backplane.com> <B95CEC1093787C4DB3655EF330984818051D17@EXCHANGE.danger.com> <200804012014.m31KEvTJ041049@apollo.backplane.com> <B95CEC1093787C4DB3655EF330984818051D1E@EXCHANGE.danger.com> <200804012325.m31NPwM1042551@apollo.backplane.com> <B95CEC1093787C4DB3655EF330984818051D22@EXCHANGE.danger.com> <200804020103.m3213JEt043506@apollo.backplane.com>

To summarize, so that it's all in one place:

1) NAND flash is sufficiently different from both NOR flash and
rotational media that filesystem design optimizations aimed at either
NOR or rotational media tend to be inefficient on NAND, and NAND offers
opportunities for optimization not present on either. It also presents
challenges that don't exist for NOR or rotational media.  In particular,
seek and rotational latency are not present, but the bit error rate is
high, the erase unit is large compared to the write unit, and the extra
storage in the spare area makes optimizations possible that are not
available on the other media, with the caveat that small page NAND
devices cannot take advantage of the same degree of optimization as
large page NAND devices.

2) It is *possible* to use a flash translation layer to hide the
complexity of flash from a filesystem implementation, and commercial
file systems exist which do this, most notably the FATFS implementation
used on most NAND based USB devices, on the M-Systems parts, and
commercially from Datalight.

3) On consumer electronics "convergent" devices it is not possible to
use the usual caching techniques for performance improvement that are
available on systems with relatively large amounts of NAND. A CE device
with an included NAND part does not optimize in the same way as an SSD
built from NAND parts.

4) Power management on battery powered devices makes for different
optimization trade-offs than on wall-powered devices. Most notably, it
is often desirable to turn off power to RAM when the system is inactive,
which has a design impact on robustness and performance.

5) The reduction in BOM and the increase in performance due to
customized filesystem design has proven the usefulness of NAND-aware
filesystems, at least in the commercial marketplace.

6) There are good reasons for exposing transactional semantics to the
users of NAND file systems, having to do with robustness.

7) These are the well-known approaches to NAND-aware file systems, each
with different strengths and weaknesses:
   A) File system completely unaware of NAND, FTL takes care of the
differences. This is used in USB devices, and has the advantage of being
able to support those devices as if they were FATFS devices without
changes to the host filesystem software. It has the disadvantage of
performance and robustness penalties due to the filesystem making
excessive writes to what it believes are fixed location datablocks.
   B) File systems aware of NAND, with an FTL. Datalight's RelianceFS
and FFX products combine to provide this sort of approach. The advantage
is that they tend to be much more robust than systems without the
knowledge and even have higher performance. The disadvantage is the
complexity of the translation layer, and the interfaces between it and
the filesystem layer and the device layer.
   C) File systems that manage the NAND directly without an FTL. These
fall into two camps:
      i) filesystems that treat NAND like NOR using a flash adaptation
layer. JFFS and JFFS2, combined with MTD are the canonical examples.
      ii) filesystems that optimize for NAND properties. YAFFS2 direct
is the canonical example.
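
To make the penalty described in 7(A) concrete, here is a minimal Python
sketch of an FTL-style remapping layer. Everything in it (the TinyFTL
class, its fields, the page counts) is hypothetical and for illustration
only. It does no garbage collection, wear leveling, ECC, or bad block
handling, and it is not meant to describe how any particular commercial
FTL works.

```python
class TinyFTL:
    """Hypothetical toy FTL: maps logical block numbers to flash pages.

    NAND pages cannot be overwritten in place, so every write of a
    logical block lands on a fresh physical page and the old page
    becomes stale until its erase unit is reclaimed.
    """

    def __init__(self, physical_pages):
        self.flash = [None] * physical_pages  # None stands for an erased page
        self.map = {}        # logical block number -> physical page index
        self.next_free = 0   # next unwritten physical page
        self.stale = 0       # pages holding superseded data

    def write(self, lba, data):
        if self.next_free >= len(self.flash):
            # A real FTL would garbage collect here; this sketch gives up.
            raise RuntimeError("no free pages left")
        if lba in self.map:
            self.stale += 1  # the previous copy is now garbage
        self.flash[self.next_free] = data
        self.map[lba] = self.next_free
        self.next_free += 1

    def read(self, lba):
        return self.flash[self.map[lba]]
```

A host filesystem that believes logical block 0 is a fixed location and
rewrites it five times consumes five physical pages in this model, four
of which are stale and must eventually be reclaimed by erasing whole
erase units. That is the kind of write pressure the unaware filesystem
generates without knowing it.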

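Because NAND provides no guaranteed good block, there is no fixed place
to anchor filesystem state, and it must be found by scanning. The
performance issues are therefore largely a matter of how much has to be
scanned to find that state.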

JFFS2 failed in this area because of the nature of its embedded b-tree
data structures, which are expensive to maintain robustly, difficult to
garbage collect, and prone to needing frequent scanning and rewriting.
It is conjectured that any filesystem which embeds a block renaming
scheme into NAND will suffer the same fate. I for one would be
interested in seeing a refutation of that conjecture, but there are now
four different projects which have attempted to do so with no luck that
I'm aware of. The issue is one of locality in the b-tree versus
robustness. Sufficiently frequent updates of the structure to NAND to
meet robustness requirements tend to put a great deal of write pressure
on the device, as well as frequent garbage collection.

At PalmSource, Mike Chen and I took the NetBSD version of LFS and
modified it sufficiently to produce a working log-structured file system
that was used in the unshipped PalmOS Cobalt product. The conversion was
relatively easy, taking somewhat less than 1.5 man-years, and the
resulting filesystem benchmarked favorably against other commercial
products, but never saw field trial, so robustness is indeterminate. A
key to the modification was reducing the amount of state that had to be
read during mount scan to a single block per erase unit, and being very
careful about block selection for garbage collection.

Charles Manning had already taken that approach one step further in
YAFFS2, where he was able to reduce the amount of information needing
scanning to a single spare area per erase unit, greatly reducing the
mount scan time.
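
The difference between these scanning strategies is easy to put rough
numbers on. The Python sketch below counts only the bytes transferred at
mount for three strategies: reading every page, reading one block per
erase unit (the modified LFS approach), and reading one spare area per
erase unit (the YAFFS2 approach). The geometry used in the example is an
assumed, illustrative large page part, not a figure from any datasheet,
and the model ignores page load time, ECC, and bad blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandGeometry:
    """Hypothetical device description used only for this estimate."""
    erase_units: int     # number of erase units on the device
    pages_per_unit: int  # pages in each erase unit
    page_bytes: int      # main data area per page
    spare_bytes: int     # spare area per page


def mount_scan_bytes(geom):
    """Bytes transferred at mount under three scanning strategies."""
    full_page = geom.page_bytes + geom.spare_bytes
    return {
        # Read every page of every erase unit.
        "every_page": geom.erase_units * geom.pages_per_unit * full_page,
        # Read a single summary page per erase unit.
        "one_page_per_unit": geom.erase_units * full_page,
        # Read a single spare area per erase unit.
        "one_spare_per_unit": geom.erase_units * geom.spare_bytes,
    }


# Assumed example geometry: 2048 erase units of 64 pages,
# 2048-byte pages with 64-byte spare areas.
example = NandGeometry(erase_units=2048, pages_per_unit=64,
                       page_bytes=2048, spare_bytes=64)
costs = mount_scan_bytes(example)
```

Under those assumptions, each step down the list cuts the data read at
mount by a factor equal to pages_per_unit and then by the ratio of page
size to spare size, which is why the spare-area approach pays off so
well on large page parts.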

Both the modified LFS and YAFFS2 take advantage of other properties of
the NAND to reduce metadata write frequency and both relax timestamp
semantics to do so. YAFFS2 goes further than we did by providing a
checkpoint facility which is used to further speed mount time and
reconstruction. Both take advantage of spare area writing to determine
write transaction completion.
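
As a rough illustration of using the spare area to decide whether a
write completed, here is a small Python sketch. It is a model of the
general idea only: the Page class, the (sequence, checksum) tag layout,
and the power_fails_before_spare flag are all hypothetical, and it does
not reproduce the actual tag format or write ordering of either YAFFS2
or the modified LFS.

```python
import zlib

ERASED = None  # stand-in for an area that still reads as erased


class Page:
    """Hypothetical NAND page with a main data area and a spare area."""

    def __init__(self):
        self.data = ERASED
        self.spare = ERASED


def write_page(page, data, seq, power_fails_before_spare=False):
    """Program the data area, then record a tag in the spare area.

    The tag (sequence number plus checksum of the data) acts as the
    commit record: without it the write is treated as never completed.
    """
    page.data = data
    if power_fails_before_spare:
        return  # simulate losing power before the tag is written
    page.spare = (seq, zlib.crc32(data))


def is_committed(page):
    """A page counts only if its tag exists and matches its data."""
    if page.spare is ERASED or page.data is ERASED:
        return False
    _seq, crc = page.spare
    return crc == zlib.crc32(page.data)


def mount_scan(pages):
    """Return (sequence, data) for committed pages, oldest first."""
    return sorted((p.spare[0], p.data) for p in pages if is_committed(p))
```

In this model a write interrupted before the tag lands is simply
invisible after the next mount, so the filesystem never has to guess
whether a partially written page is valid.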



