Date:      Wed, 23 Jan 2002 18:24:41 +0000 (GMT)
From:      "Philip M. Gollucci" <philip@sduwebship.student.umd.edu>
To:        <freebsd-questions@FreeBSD.ORG>
Subject:   MS+HTML -> Unix 
Message-ID:  <20020123182426.E63410-100000@sduwebship.student.umd.edu>

Say I have a web page where I want to offer people the ability to upload
either a .txt or a .html file.  Now these people are basically computer
illiterate, and don't even know that UNIX is different from Microsh$t.

At any rate, they will use "Save as (HTML)" from MS Word 97/2000, "Save as
(txt)", or worse yet, "Save as RTF", and then upload that.

Big surprise: Word gets it really wrong, meaning the file doesn't format
correctly in any browser, before or after they use the site.
tidy told me one file had over 300 errors, and that was just against
HTML 4.01, not XHTML 1.0.

Is there any way I can, on the fly, take the messed-up HTML file I get and
convert it to what they meant to give me?

Important cases:
  Parallel columns not in a table
  Bullets
  <DIR> tags
  actually closing <u> tags so the whole page isn't underlined.
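
Two of the cases above (the <DIR> tags and the unclosed <u> tags) could be
handled with a small filter run before a tool like tidy.  What follows is
only a rough sketch in Python using the standard library's html.parser.
The names repair, Repair, INLINE, BLOCK and RENAME are made up for
illustration, and the rule "close any open <u>, <b> or <i> when the next
block element starts" is an assumption about what the uploader meant, not
something Word or tidy guarantees.  Comments and the doctype are dropped.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: inline tags that should never leak past a block boundary.
INLINE = {"u", "b", "i"}
# Assumption: elements whose start or end ends any open inline formatting.
BLOCK = {"p", "div", "li", "td", "ul", "ol", "dir", "h1", "h2", "h3"}
# <DIR> is deprecated in HTML 4.01; rewrite it as a plain unordered list.
RENAME = {"dir": "ul"}


class Repair(HTMLParser):
    def __init__(self):
        # Keep entities such as &amp; untouched in text content.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_inline = []  # stack of inline tags still open

    def _render(self, tag, attrs):
        parts = [RENAME.get(tag, tag)]
        for key, value in attrs:
            if value is None:
                parts.append(key)
            else:
                parts.append('%s="%s"' % (key, escape(value, quote=True)))
        return "<%s>" % " ".join(parts)

    def close_inline(self):
        # Emit end tags for every inline element still open.
        while self.open_inline:
            self.out.append("</%s>" % self.open_inline.pop())

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK:
            self.close_inline()
        self.out.append(self._render(tag, attrs))
        if tag in INLINE:
            self.open_inline.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are written back as plain <br>.
        self.out.append(self._render(tag, attrs))

    def handle_endtag(self, tag):
        if tag in INLINE:
            if tag in self.open_inline:
                # Close anything opened after it, then the tag itself.
                while self.open_inline:
                    closed = self.open_inline.pop()
                    self.out.append("</%s>" % closed)
                    if closed == tag:
                        break
            return  # a stray end tag with no matching start is dropped
        if tag in BLOCK:
            self.close_inline()
        self.out.append("</%s>" % RENAME.get(tag, tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def repair(html_text):
    parser = Repair()
    parser.feed(html_text)
    parser.close()
    parser.close_inline()  # close whatever is still open at end of file
    return "".join(parser.out)
```

Calling repair() on the uploaded text and then handing the result to tidy
would be one way to wire it in.  Parallel columns and bullets typed as
plain characters need layout guessing and are not covered by this sketch.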

I've seen the demoronizer port, but I don't know that much about it, and I
don't think it's quite what I want.

Basically, I have to take the HTML given to me and make the HTML they mean.


Any great ideas?


END
------------------------------------------------------------------------------
Philip M. Gollucci (p6m7g8) philip@p6m7g8.com 301.314.3118

Science, Discovery, & the Universe (UMCP)
        Webmaster & Webship Teacher
        URL: http://www.sdu.umd.edu

EJPress.com
        Database/PERL Programmer & System Admin
        URL : http://www.ejournalpress.com

Resume      : http://www.p6m7g8.com/resume.txt




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message



