Date:      Thu, 27 Mar 1997 10:13:56 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        rssh@cki.ipri.kiev.ua
Cc:        terry@lambert.org, leisner@sdsp.mc.xerox.com, msmith@atrad.adelaide.edu.au, johnp@lodgenet.com, se@freebsd.org, spaz@u.washington.edu, jkh@time.cdrom.com, hackers@freebsd.org
Subject:   Re: MSWord docs...
Message-ID:  <199703271713.KAA01589@phaeton.artisoft.com>
In-Reply-To: <333A6507.3FB7@cki.ipri.kiev.ua> from "Ruslan Shevchenko" at Mar 27, 97 03:16:06 pm

> > Nothing public.  You can obtain documentation under NDA from
> > Microsoft, provided you agree not to implement anything useful
> > with the information (like a word processor).
> > 
> 
>  Hm, do I have the right to reverse-engineer the Word file format
> from files created with my own copy of Word?

You are only bound to not build a word processor if Microsoft tells
you the file format.  If you find out on your own, I think you are
free to do what you want (you can't copyright a file format, only
a document describing it or a program that implements it).

Microsoft may have attempted to patent the file format; I doubt it,
since even the GIF format is only in trouble because of the patents
on the LZW compression it uses, not because of the format itself.

There may be similar patents, however, under Microsoft's belt, if
they have patented their tiny modifications to LZ77 for their
"compress/expand" technique, and if they use this technique on the
data stored in the files.

They may also have a patent on the encryption algorithm (a friend
of mine, while employed at WordPerfect, actually cracked Word's
encryption).


The MS-Word format is actually documented in:

	The File Formats Handbook
	Gunter Born
	International Thomson Computer Press
	ISBN 0-442-01995-5

But WinWord format (which is what we are really discussing here)
is not documented in the book, though some gross hints are given:

o	It's in three sections which are, in order, a header,
	text data, and formatting data

o	The header and format structure depend on the version of
	WinWord [1.0, 2.0, 6.0]

o	The total header size is 384 bytes

o	The text is stored as DOS ANSI

o	The first 36 (0x24) bytes are (offsets in hex, sizes in bytes):

	Offset	Size	Field
	00	2	Signature (0x9BA5=1.0, 0x9DA5=2.0, 0xD0CF=6.0)
	02	2	Version (major)
	04	2	Version (minor)
	06	2	Language ID
	08	2	Next page number
	0A	1	Flags
	0B	1	Encryption (1=Yes)
	0C	6	Internal use (hah -- yeah, right)
	12	1	Platform (0=Windows, 1=Mac)
	13	1	Reserved
	14	2	Character set (0 = ANSI)
	16	2	Internal character set
	18	4	Absolute offset of the first character of text
	1C	4	Absolute offset of the last character of text + 1
	20	4	Offset to end of file
	...		Other file pointers

	I'd guess that most of the files following the header are
	what are called "Internal files" or are "Index files", and
	are probably stored in BTREE format, the same as the .HLP
	(Help) files.
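For anyone who does want to hack at this, the 36-byte table above can
be read with a few lines of Python. This is a minimal sketch that has
not been tested against real WinWord files. It assumes the multi-byte
fields are little-endian (as on x86), treats the signature values in the
table as the first two bytes in file order, and guesses that "ANSI"
means Windows code page 1252. Note that D0 CF is also how the OLE2
compound-file signature starts, so for 6.0 files the other fields may
well not sit at these offsets. All names (parse_header, WinWordHeader,
extract_text, PREFIX) are made up for illustration.

```python
import struct
from typing import NamedTuple

# Signatures from the table, ASSUMED to be the first two bytes in file order.
SIGNATURES = {
    b"\x9b\xa5": "1.0",
    b"\x9d\xa5": "2.0",
    b"\xd0\xcf": "6.0",  # also the start of the OLE2 magic; other fields unverified
}

HEADER_SIZE = 384  # total header size given in the book
PREFIX_SIZE = 36   # the documented prefix, offsets 0x00..0x23

# Layout of the 36-byte prefix. "<" = little-endian (assumed), no padding.
# 2s = signature, 6x = skip "internal use" bytes, x = skip reserved byte.
PREFIX = struct.Struct("<2sHHHHBB6xBxHHIII")


class WinWordHeader(NamedTuple):
    version: str           # WinWord version guessed from the signature
    major: int
    minor: int
    language_id: int
    next_page: int
    flags: int
    encrypted: bool
    platform: int          # 0 = Windows, 1 = Mac
    charset: int           # 0 = ANSI
    internal_charset: int
    text_start: int        # absolute offset of the first text character
    text_end: int          # absolute offset of the last text character + 1
    eof_offset: int


def parse_header(data: bytes) -> WinWordHeader:
    """Decode the documented 36-byte prefix of a WinWord file."""
    if len(data) < PREFIX_SIZE:
        raise ValueError("file too short for a WinWord header")
    (sig, major, minor, lang, next_page, flags, enc, platform,
     charset, icharset, text_start, text_end, eof) = PREFIX.unpack_from(data, 0)
    version = SIGNATURES.get(sig)
    if version is None:
        raise ValueError(f"unknown signature {sig.hex()}")
    return WinWordHeader(version, major, minor, lang, next_page, flags,
                         enc == 1, platform, charset, icharset,
                         text_start, text_end, eof)


def extract_text(data: bytes, hdr: WinWordHeader) -> str:
    """Slice out the text section; cp1252 for "ANSI" is an assumption."""
    raw = data[hdr.text_start:hdr.text_end]
    return raw.decode("cp1252", errors="replace")
```

To try it on a real file, pass open(path, "rb").read() to parse_header()
and then to extract_text(); if the text comes out as garbage, the first
suspects are the byte-order and character-set assumptions.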

I'm not really interested in hacking this out; I don't own a copy
of Word to generate test data sets of known content, and doing that
is probably prohibited by the license if I were to go buy a copy.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


