Question: converting Word docs to HTML

welshTerrier2 Donating Member (1000+ posts) Wed Dec-22-04 10:27 PM
Original message
Question: converting Word docs to HTML
i have a user who provided a bunch of web content to me in the form of Word documents ... the web pages i'm building will have a bunch of navigational elements on them and will include the text from the Word docs as their main content ...

I tried using the "Save As" option in Word to save the document out as HTML ... it works but it embeds a whole lot of ugly tags in the code and it doesn't do the greatest job building the page ... it seems to have embedded extra lines between the paragraphs ... i removed these manually (using an HTML editor) but it's very time consuming and i have lots more pages to build like this ...

is there a better way to load Word docs onto a web page ???
kcr Donating Member (1000+ posts) Wed Dec-22-04 11:08 PM
Response to Original message
1. Well
Depending on the amount of formatting, saving them as text and wrapping them in pre tags might suffice. Other than that, I am not aware of any tools that convert Word docs to html. Try, if you haven't already, www.freshmeat.net.
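A minimal sketch of the save-as-text-and-wrap-in-pre idea, in Python. The function name `wrap_in_pre` is made up for illustration. The one step that is easy to forget is escaping `&`, `<` and `>`, so that stray characters in the text are not read as markup.

```python
import html

def wrap_in_pre(text: str) -> str:
    """Escape HTML-special characters and wrap the text in <pre> tags."""
    # Normalise Windows line endings and trim blank lines at either end.
    text = text.replace("\r\n", "\n").strip("\n")
    return "<pre>\n" + html.escape(text, quote=False) + "\n</pre>"

# Hypothetical sample text standing in for a document saved as plain text.
sample = "Fish & chips <daily>\r\n  indented line"
print(wrap_in_pre(sample))
```

Whitespace and line breaks survive as typed, which is the whole appeal of pre tags; anything like bold or tables from the Word doc is lost.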
 
welshTerrier2 Donating Member (1000+ posts) Wed Dec-22-04 11:17 PM
Response to Reply #1
2. interesting ...
i started using the <pre> tags after dumping the word file into notepad ... this seems like one of those "you can't get there from here" scenarios ... everything works but nothing works well ...

i'll check out the link you provided ... thanks ...
 
Xithras Donating Member (1000+ posts) Tue Dec-28-04 06:47 PM
Response to Original message
3. Do you have Dreamweaver available?
If not, I believe that you can download a 30-day demo from the Macromedia website. The simplest way to convert a Word doc to HTML is to open it in Word, choose Save As HTML, open the resulting HTML file in Dreamweaver, and choose "Clean Up Word HTML" from one of the options menus (can't remember which off the top of my head). Follow it up with the "Format HTML" command and you'll end up with a reasonably formatted webpage.

If that doesn't work for you, there are other tools like ClicktoConvert (http://www.clicktoconvert.com/Features/features.html), WordCleaner (http://www.wordcleaner.com/), or Doc2HTML (http://business.downloadatoz.com/cz-doc2htm/) which will do the same thing. I've never used them personally, but they all have demos available so you can test them out to see if they'll satisfy your requirements.

Personally, I prefer to just cut and paste the Word text into a new HTML page and recreate it from scratch. It's less troublesome and time consuming, and I know in advance what the finished page will look like.
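For the cut-and-paste route, most of the repetitive work is turning the pasted text into paragraphs. A rough Python sketch, assuming the pasted text has one paragraph per line (which is what you usually get from Word via Notepad); `text_to_paragraphs` is a made-up name. Blank lines are skipped, so the extra empty paragraphs never appear.

```python
import html

def text_to_paragraphs(text: str) -> str:
    """Turn pasted text into <p> elements, one per non-empty line."""
    paragraphs = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines instead of emitting empty paragraphs
            paragraphs.append("<p>" + html.escape(line, quote=False) + "</p>")
    return "\n".join(paragraphs)

# Hypothetical pasted content.
pasted = "First paragraph.\n\nSecond paragraph, with R&D in it.\n"
print(text_to_paragraphs(pasted))
```

The result can be dropped into the main content area of the page template; headings, lists and links still have to be marked up by hand.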
 
ixion Donating Member (1000+ posts) Mon Jan-03-05 05:06 PM
Response to Original message
4. yeah, MS Word writes lousy HTML
I would pull the text out and format it yourself, if that's at all feasible.

 
welshTerrier2 Donating Member (1000+ posts) Mon Jan-03-05 06:49 PM
Response to Reply #4
5. tell me about it ...
"I would pull the text out and format it yourself, if that's at all feasible."

yup ... that's exactly what i did ... someone told me there are some third party tools that do a decent job converting Word docs ... but who's got time and money for that ...

a little cut, a little paste and we're done ...
 
danostuporstar Donating Member (147 posts) Mon Jan-10-05 12:29 PM
Response to Original message
6. too late now it seems...
but tidy does a good job cleaning up Word's disgusting dialect of HTML

http://tidy.sourceforge.net/
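If there are lots of pages, tidy can be scripted. A hedged sketch in Python that builds and runs a tidy command line; the helper names and file names are made up, and the option names (`word-2000`, `bare`, `clean`, `wrap`) are the ones I know from the tidy documentation, so check `tidy -help-config` against your installed version before relying on them.

```python
import shutil
import subprocess

def tidy_command(src: str, dst: str) -> list:
    """Build a tidy command line aimed at Word-exported HTML."""
    return [
        "tidy",
        "-quiet",
        "--word-2000", "yes",  # strip the surplus markup Word 2000 inserts
        "--bare", "yes",       # drop Microsoft-specific HTML, use plain spaces
        "--clean", "yes",      # replace presentational tags with style rules
        "--wrap", "0",         # do not re-wrap long lines
        "-output", dst,
        src,
    ]

def clean_word_html(src: str, dst: str) -> int:
    """Run tidy if it is installed and return its exit status."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    # tidy exits with 0 for success, 1 for warnings, 2 for errors.
    return subprocess.run(tidy_command(src, dst)).returncode
```

Calling `clean_word_html("page.html", "page_clean.html")` on a file saved from Word would write the cleaned copy next to it; an exit status of 1 only means tidy printed warnings, which is normal for Word output.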
 
welshTerrier2 Donating Member (1000+ posts) Mon Jan-10-05 03:10 PM
Response to Reply #6
7. not too late ... thanks ...
i'll check it out ...

this might be a way for the user to provide content for the site in the future ...
 
demnan Donating Member (1000+ posts) Fri Jan-14-05 10:47 AM
Response to Original message
8. I have to do this all the time
I use Dreamweaver because the point and click is quicker than hand coding. You can also use Dreamweaver to check links and ftp and other cool stuff.
 
WoodrowFan Donating Member (1000+ posts) Thu Feb-03-05 09:08 AM
Response to Original message
9. I save it as a TXT file
WITHOUT the line breaks.

But then, I'm just starting to learn Dreamweaver for work.
 
FormerDittoHead Donating Member (1000+ posts) Mon Feb-07-05 11:45 AM
Response to Original message
10. Look for "Microsoft Office HTML Filter"...
off of the Microsoft website - strips most of the BS code out.

http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN

I have to deal with one of these docs every month from a client who HEAVILY formats this table, but then wants it on their website. Takes me almost a whole hour to "unscrew" the formatting, starting with the free utility I refer to above.
Printer Friendly | Permalink | Reply | Top
 
FormerDittoHead Donating Member (1000+ posts) Mon Feb-07-05 03:55 PM
Response to Reply #10
11. Follow up
I had to do that chore for my client today.

I found out that OPENOFFICE does a nice job of stripping away these codes if you simply "select all" and remove all the formatting accordingly.

The file sent to me was saved as an RTF. I don't know if that makes any difference; all those MS styles were still in there, but OpenOffice did save a few steps I ordinarily have to take.
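The same "remove all the formatting" idea can be roughed out in a script. This is not what OpenOffice or the Microsoft filter actually does, just a sketch using Python's standard `html.parser`: keep a short whitelist of structural tags, throw away every attribute (the `class=MsoNormal` and `style=...` clutter), and drop everything else, including `<o:p>`, `<span>`, `<font>` and the contents of `<style>`. The class name, function name and the `KEEP` list are my own assumptions; adjust the list to taste.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist of tags worth keeping; everything else is dropped.
KEEP = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li",
        "table", "tr", "td", "th", "h1", "h2", "h3", "h4", "a"}
SKIP_CONTENT = {"style", "script"}  # drop these elements and their contents

class WordStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP:
            if tag == "a":
                # Links keep their href; all other attributes are discarded.
                href = dict(attrs).get("href")
                self.out.append(f'<a href="{escape(href)}">' if href else "<a>")
            else:
                self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

def strip_word_formatting(source: str) -> str:
    """Return the markup with only whitelisted, attribute-free tags left."""
    parser = WordStripper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Comments, including Word's conditional `<!--[if gte mso 9]>` blocks, are dropped because the parser never emits them. A heavily formatted table will still need hand work afterwards, since column widths and cell shading are gone along with everything else.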