* right aligned some #defines
* finish adding version var, use purify to find problems with adding entries to TT table (debug only i believe)
* modify justification so as to call wvExpand again to get the full string
* create an abiword config, got document start and finish and paragraph start and finish working as well.
* we can now output good html and abiword format docs with basic paragraph alignment, yippee.
* converted most of the U8 name:s to U32 name:s (non critical), i never knew that using anything less than an int was not technically correct, well what d'ya know, some other minor stylistic changes.
* wrote tiny stub of an abiword importer.
* modify OLEdecode to take a FILE * rather than a filename.
* standardized ret codes from OLEdecode.
* added an error explanation table.

Changes up to 0.5.17

* added clx.c, pcd.c, prm.c
* clx.c is the successor to piecetable.c.
* debugged clx
* added GetPHE, fkp.c, bte.c, bx.c
* debugged decode.c, all ok now.
* paragraph begin and end marks now found for full saved files.
* added codepage-1252.c, iso-5589-15.c & text.c
  if you want to add your own font encoding conversion do...
    1) add the language name to the charsets enum in wv.h
    2) create a function like wvConvert1252Toiso8859_15 which converts cp1252 into your language
    3) add to text.c in wvOutputFromCP1252 an extra case statement to call wvConvert1252To[YourEncoding] if outputtype == YourEncoding
    4) create a function like U16 wvConvertUnicodeToiso8859_15 which converts unicode into your language.
    5) add to text.c in wvOutputFromUnicode an extra case statement to call wvConvertUnicodeTo[YourEncoding] if outputtype == YourEncoding
  Be warned that converting from unicode to your language, which is the most likely scenario, will only work out correctly if the unicode actually maps to your charset, so obviously converting unicode that was japanese characters into russian koi-8 is only going to give a page of ?, so watch out for that.
  Later on i'll add in some ability to check the language.
* added wvSimpleCLX program which determines if a file is complex (fast-save) or simple (full-save)
* basic character handling, converted windows "compressed unicode" into html as far as possible.
* fixed size mistake in PCD PLCF.
* tested wvSimpleCLX on all word docs, made a mod or two to the ole code to avoid segfaults identified by the test.
* moved decode to decode_simple
* added decode_complex
* debugged the decode_complex para begin code, and extended it to find the para end, though this might be a little wrong, but we'll see.
* added the wvText program, primarily for testing the new mechanisms, but it can be a useful program in its own right to get the main document text from a word document in its raw form. obviously its not going to handle tables or any kind of complex word artifact, only the text in the correct order. Considering the whole complex file format question, that still makes it a very sophisticated little program.
* wvSummary bugfix.
* debugged wvText so that it doesn't crash on any of the 3735 sample files.
* added the ability to the text code to remove field codes and just output the previous results of the fields.
* added some changes to the error output code, now use wvTrace to output debugging messages, its a macro that will disappear when compiled normally, unlike the old sillier mechanism.
* changed the FKP code to pull in the total data
* created wvAssembleSimplePAP
* release the FKP on each cycle in decode_simple
* fixed a few sprms from doc investigation that were wrong or dodgy in the spec.
* stupid bug in EatSprm.
* debugged wvAssembleSimplePAP and FKP code for crashes.
* fixed bugs in sprm.c and numrm.c, changed a few constants to the cb equivalents.
* applied the PAPX to the PAP correctly (simple mode, i havent even tried complex yet).
* confirmed that code does the right thing, and gets the right properties for the simple pap.
* reran checks.
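The five font encoding steps in the 0.5.17 entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in text.c: unicode_to_iso8859_15, cp1252_to_iso8859_15, out_charset and convert_unicode are hypothetical stand-ins for wvConvertUnicodeToiso8859_15, wvConvert1252Toiso8859_15, the charsets enum in wv.h and the case statements in wvOutputFromUnicode. It relies on ISO-8859-15 differing from ISO-8859-1 in only eight positions, and on CP1252 matching ISO-8859-1 outside 0x80-0x9F.

```c
#include <stddef.h>
#include <stdint.h>

/* The eight slots where ISO-8859-15 differs from ISO-8859-1. */
static const struct { uint16_t ucs; uint8_t latin9; } latin9_diff[] = {
    {0x20AC, 0xA4},                 /* euro sign */
    {0x0160, 0xA6}, {0x0161, 0xA8}, /* S/s with caron */
    {0x017D, 0xB4}, {0x017E, 0xB8}, /* Z/z with caron */
    {0x0152, 0xBC}, {0x0153, 0xBD}, /* OE/oe ligatures */
    {0x0178, 0xBE},                 /* Y with diaeresis */
};
#define N_DIFF (sizeof latin9_diff / sizeof latin9_diff[0])

/* Step 4 (hypothetical name): one UCS-2 character to ISO-8859-15, '?' if unmappable. */
uint16_t unicode_to_iso8859_15(uint16_t c)
{
    for (size_t i = 0; i < N_DIFF; i++)
        if (latin9_diff[i].ucs == c)
            return latin9_diff[i].latin9;
    if (c > 0xFF)
        return '?';                 /* e.g. japanese or cyrillic text */
    for (size_t i = 0; i < N_DIFF; i++)
        if (latin9_diff[i].latin9 == c)
            return '?';             /* latin-1 char whose slot was reused */
    return c;
}

/* Step 2 (hypothetical name): CP1252 byte to ISO-8859-15. CP1252 equals
   ISO-8859-1 outside 0x80-0x9F, so only that range needs a lookup. */
uint8_t cp1252_to_iso8859_15(uint8_t b)
{
    static const struct { uint8_t cp; uint16_t ucs; } c1[] = {
        {0x80, 0x20AC}, {0x8A, 0x0160}, {0x8C, 0x0152}, {0x8E, 0x017D},
        {0x9A, 0x0161}, {0x9C, 0x0153}, {0x9E, 0x017E}, {0x9F, 0x0178},
    };
    if (b < 0x80 || b > 0x9F)
        return (uint8_t)unicode_to_iso8859_15(b);
    for (size_t i = 0; i < sizeof c1 / sizeof c1[0]; i++)
        if (c1[i].cp == b)
            return (uint8_t)unicode_to_iso8859_15(c1[i].ucs);
    return '?';                     /* smart quotes, dashes, unused codes */
}

/* Steps 1 and 5 (step 3 has the same shape for CP1252 input):
   an output charset enum and one case per encoding. */
enum out_charset { OUT_ISO8859_15 /* , OUT_YOURENCODING */ };

uint16_t convert_unicode(enum out_charset cs, uint16_t c)
{
    switch (cs) {
    case OUT_ISO8859_15:
        return unicode_to_iso8859_15(c);
    }
    return '?';
}
```

Anything without a slot in the target charset comes out as '?', which is the page-of-question-marks effect the warning in that entry describes.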
* create a test with wvHtml to output some of the interesting paragraph properties in the correct place.
* added expat, the xml parser, to the tree. im going to use xml for my config file, which may or may not be a good idea, but seeing as my lex code created *such* problems on different implementations i'm well and truly sick of it, so im going to try xml instead.
* reran autoconf with the latest version
* wvConfig changes...
    1) created a release for the config list table
    2) malloced correctly
    3) created an append for <title/>
    4) pass the userData into wvConfig.c
    5) convert main into an ordinary call
    6) moved wvText to wvConvert, and made wvText a link

Changes up to 0.5.16

* added anld.c, changed over from old ANLD to new ANLD. added wvGetANLD and wvGetANLDFromBucket.
* cleaned up bad chp entries. allowedfont removed, may cause problems in the future.
* added some stylesheet definitions.
* trivially added version.c, and modified it to become wv rather than mswordview.
* added wvGetSTSHI, wvGetSTD, wvReleaseSTD, wvGetSTSH, wvReleaseSTSH
* short tests show that the new stylesheet code appears stable.
* added dcs.c, shd.c, numrm.c, asumy.c
* defined TAP, TLP, TC and PAP
* added lspd.c, phe.c, tlp.c, tc.c, tap.c
* added InitPAP, and all dependencies, for istdNIL stylesheet.
* added ANLV, OLST, SEP
* ive completed the new set of PAP sprm handlers and support, this consists of wvGetSprmFromU16, wvEatSprm, wvApplySprmFromBucket, and a myriad of wvApplysprm* functions, with the exception of one or two old sprms that have no documentation, and the hugesprm, which ive left until i get an example of it.
* added wvCopyCHP & wvAddCHPXFromBucket, and most of CHP in sprm handling.
* added wvApplysprmCMajority + wvApplysprmCMajority50, but i really don't like the look of them, im very unsure as to whether or not they are right.
* finished CHP in sprm code
* confirmed correct para style basics, started into char style code.
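A rough sketch of how the size of a sprm's operand falls out of the opcode itself, which is what wvEatSprm has to work out before it can skip to the next sprm. It assumes the Word 97 16-bit sprm layout (ispmd in bits 0-8, fSpec in bit 9, sgc in bits 10-12, spra in bits 13-15). sprm_sgc and sprm_operand_size are hypothetical names, and the special length cases that wvEatSprm handles separately (such as sprmTDefTable and sprmPChgTabs) are deliberately left out.

```c
#include <stdint.h>

/* sgc: which property the sprm modifies. */
enum sgc { SGC_PAP = 1, SGC_CHP = 2, SGC_PIC = 3, SGC_SEP = 4, SGC_TAP = 5 };

unsigned sprm_sgc(uint16_t sprm)
{
    return (sprm >> 10) & 0x7;
}

/* Operand bytes following the 2-byte opcode, including the length byte of
   variable-length sprms. 'operand' points just past the opcode.
   Does NOT handle the special cases whose length is encoded differently. */
int sprm_operand_size(uint16_t sprm, const uint8_t *operand)
{
    switch (sprm >> 13) {          /* spra */
    case 0:                        /* toggle, stored in one byte */
    case 1:
        return 1;
    case 2:
    case 4:
    case 5:
        return 2;
    case 3:
        return 4;
    case 7:
        return 3;
    case 6:                        /* variable: first byte is the length */
        return 1 + operand[0];
    }
    return -1;                     /* unreachable: spra is only 3 bits */
}
```

With this, "eating" a sprm is just advancing by 2 + sprm_operand_size(), and the few exceptions can be special-cased before falling back to it.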
* complex merged CHPX done, only found one trivial example so far, so uncertain as to whether it works.
* modified wvEatSprm to ret the len.
* modified wvEatSprm to handle the three special len cases in it as well.
* got wvReleaseSTSH to release its grupes and subcomponents as well.
* temporarily nailed the new stylesheet struct in as part of the old one, so that i can experiment with the new one in conjunction with the old one.

Changes up to 0.5.15

* made yet more changes to the configure script, maybe itll all be in the right order now (hah i doubt it!)
* added wvWideStrToMB, wvGetFontnameFromCode
* added small patch from Barry D Benowitz <b.benowitz@telesciences.com> who noted an uninitialized pointer.
* fixed a bug where a $ showing up in a title would shaft the whole thing.
* fixed the default value for the html font string, unlikely to have ever been noticed.
* a parser.lex and man page fix from garyjohn@spk.hp.com
* removed references to the ffn struct, and replaced them with the appropriate FFN ones.
* added fld.c, wvGetFLD, wvGetFLD_PLCF, wvWarning, wvFree.
* added wvGetDOP, wvGetDTTM, wvCreateDTTM, wvGetCOPTS, wvGetDOPTYPOGRAPHY, wvGetDOGRID, wvGetASUMYI & dttm.c.
* modified dop.c with new interface.
* added wvGetSTTBF, wvGetBKF_PLCF, wvGetBKF, bkf.c, sttbf.c
* added xst.c, fspa.c. Modified wvWhichTableStream, added wvGetFSPA, wvGetFSPA_PLCF, wvGetXst, wvFreeXst.
* correct STTBF handling, and sorted out decode_bookmarks ala new form.
* added lex problems to the install file/faq.
* added lfo.c, lst.c, lvl.c, wvGetLSTF, wvGetLSTF_PLCF, wvGetLVLF, wvGetLVL, wvReleaseLVL, wvGetLST, wvReleaseLST, wvGetLFO, wvGetLFO_PLF, wvGetLFOLVL, wvGetLFO_records & wvReleaseLFO_records. These are all to do with parsing lists, possibly the second most complex part of word documents to understand (the first being fastsaved files of course).
* added wvSearchLST, began converting list code over to new cleaner "by the spec" code.
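Most of the *_PLCF readers listed above (wvGetFLD_PLCF, wvGetBKF_PLCF, wvGetFSPA_PLCF, and the PCD PLCF) share one layout: n+1 character positions followed by n fixed-size structures. A minimal sketch of the entry count and the two accessors, assuming little-endian 4-byte CPs; plcf_count, plcf_cp and plcf_struct are hypothetical helpers, not wv functions.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value (CPs are stored this way). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* lcb = 4*(n+1) + cbStruct*n, so n = (lcb - 4) / (4 + cbStruct).
   Returns 0 for an absent PLCF, -1 if lcb does not fit that shape. */
long plcf_count(uint32_t lcb, uint32_t cbStruct)
{
    if (lcb == 0)
        return 0;
    if (lcb < 4 || (lcb - 4) % (4 + cbStruct) != 0)
        return -1;
    return (long)((lcb - 4) / (4 + cbStruct));
}

/* i-th CP, 0 <= i <= n. */
uint32_t plcf_cp(const uint8_t *plcf, uint32_t i)
{
    return read_le32(plcf + 4 * i);
}

/* i-th struct, 0 <= i < n; the structs start after the n+1 CPs. */
const uint8_t *plcf_struct(const uint8_t *plcf, uint32_t n,
                           uint32_t cbStruct, uint32_t i)
{
    return plcf + 4 * (n + 1) + cbStruct * i;
}
```

For example, a piece table PLCF with three 8-byte PCDs has lcb = 4*4 + 3*8 = 40; passing the wrong cbStruct is exactly the kind of size mistake that makes the remainder check fail.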
* wvGetListInfo will probably be the workhorse function which will sort out lists given a correct pap.
* added the slightly silly ordinal.c file along with nfc.c.
* changed references to mswordview.h to wv.h, to get the changeover moving.
* ok, i can currently get a lot of the simple list stuff correct the new way.
* most of the list string is now done, as is the nfc and starting position.
* added another entry to the list stuff, to keep track of the current no for the list entry, would work for at least simple lists.
* figured out how to correlate the appropriate lfolvl with the correct lfo.
* i now use the character and paragraph properties linked to the list text.
* the new list code is now integrated, but it is still new and probably flaky. I'll do bug testing and so on and work that out in a short while.

Changes up to 0.5.14

* i had to make changes to the configure script to link -lXpm in the correct place.
* scream, i had to put back in part of the signal configure script, bear with me, why does *everything* work on my machine but nowhere else :-)

Changes up to 0.5.13

* a mad person reports that it can be compiled under vms !, im awaiting patches.
* changed doc version testing to follow the knowledge base article on the matter.
* removed duplicate fib code from mswordview.c
* added wvGetEmpty_PLCF, wvGetFRD, wvGetFRD_PLCF.
* added wvGetFFN, wvGetFFN_STTBF, wvReleaseFFN_STTBF, wvGetFONTSIGNATURE & wvGetPANOSE.
* removed the reinstall handlers from the configure script, that should sort out the configure problems on some systems, irix being a case in point.

Changes up to 0.5.12

* patch from Cliff Miller <cbm@research.bell-labs.com> to fix TTF_CFLAGS in configure and Makefile.
* small bug with ending tables. Seeing as you cant place text tags like bold and italic between cell elements in html and expect them to do the right thing, you have to do a little dance where character properties are stopped and restarted for each cell.
  I had forgotten to reenable the ordinary nontable mechanism immediately after the end of the table.

Changes up to 0.5.11

* we now extract the document title and display it in the title field, using the default config.
* added bold and italic element handling, you can change these html tags to your heart's content now.
* I confirmed that $title works fine.
* I ported over Somar Software's summaryinfo stream stuff, so now wvSummary can print the title and last saved date of an ole document according to the summaryinfo stream.
* added bit shifting to awk script.
* added warning for duplicate offset in script.
* i have a spiffy logo.
* added more stuff to the summary info thing, it might very well be complete. the previews of summary info are stored as a wmf file, so in conjunction with libwmf you can get all of this.
* added a wv-incconfig and wv-libconfig and installed the appropriate include and lib files, so as to start making the process of using mswordview as a lib more possible. this still needs quite a bit of work.
* allowed optional sections in element string, use [] for them.
* worked font config into the main code.
* bw wanted and got ...
    1) $title fix
    2) element support (bold & italic & font)
    3) --configfile switch
* fixed an amazingly stupid bug that crept in with the introduction of wvGetFIB.
* noticed that new doc start code wasnt occurring in fastsaved files.
* aaaaaagh!!!, i had forgotten to munge the weird long offsets into their correct halved form, no wonder so much weirdness crept into fast saved files, its amazing how well it worked nonetheless. this should at the least make parsing fastsaved files with tables much shorter!

Changes up to 0.5.10

* added document headers and footers to the config file.
* added pixels per twip to the config file.
* allowed " as part of a string if escaped.
* added code to use the beginning and ending tags.
* allowed multiline strings in config file.
* use the two twip values.

Changes up to 0.5.9

* i never reran autoconf !
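The pixels per twip setting added to the config file in 0.5.10 comes down to one multiplication and a rounding rule. A minimal sketch, assuming a single factor applied to a twip measurement and, for the example, a 96 dpi screen (1 inch = 1440 twips, so 15 twips per pixel); twips_to_pixels is a hypothetical name.

```c
/* 1 inch = 1440 twips, so at an assumed 96 dpi one pixel is 15 twips
   and pixels_per_twip would be 1.0 / 15. */
long twips_to_pixels(long twips, double pixels_per_twip)
{
    double px = (double)twips * pixels_per_twip;

    /* round half away from zero so negative indents round symmetrically */
    return (long)(px < 0 ? px - 0.5 : px + 0.5);
}
```

Rounding rather than truncating keeps a one-inch indent at exactly 96 pixels even when the floating point product lands a hair below 96.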
* added a patch i got ages ago and forgot to add, dos/windows support for .exe extension in the configure thing
* added some deep magic to blip handling.
* added check for wmf record sizes < 3 in libwmf.
* fixed BSE record to eat empty space, and resync.
* fixed Makefile.in in oledecod dir.
* many purify related thingies found.
* removed the last bug to fix the last buggy file of the current run.

Changes up to 0.5.8

* blip code changed, new one looks much better.
* would you believe that i was always one out when decoding styles, great bullet proof code though :-), it kept on trucking and resynced itself with the data again for the most part, that bug must have been in there for months at this stage !
* new blip code now in operation, appears to do at least the old blip code's functionality for 0x08 blips, how did i get 0x01 blips ?
* made configure script get heroic when searching for components, checks for includes and libs both below a --with-stuff dir, and also inside it as well.
* finished 0x01, checked offsets.
* had to add guessing code to figure out whether to use a delay_stream or not.
* allow resized images (well let netscape do it) for 0x01 graphics.
* tested wmf's with text with readonly font dir, no problem there.

Changes up to 0.5.7

* fixed bug that causes crashes on tables.

Changes up to 0.5.6

* variable handling, added a subst function that substitutes real things for variables in the config file.
* updated my homepage, god i love the gimp. All i have to do to change the graphics on my page is to load a different set of text files into the scheme interpreter in the gimp and ta-da out pop my new pages, in the bad old days i'd have been at it for days.
* have a mechanism to expand variables in place, the only recognized variable is patterndir, will have more later of course :-).
* some magic dohickying to get the libz in /usr/lib to be tested before ending up with the possibly crap one that some systems stick in /usr/X11R6/lib.
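One way a subst function like the 0.5.6 one could expand variables such as $title or patterndir in a config string. This is a hypothetical sketch: the $name syntax for patterndir, the struct var table and the subst_vars name are assumptions, not taken from wv's lex or config code. Unknown names are copied through unchanged, so a stray "$" (the kind that used to shaft titles) survives as literal text.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct var { const char *name; const char *value; };

/* Copy src to dst, replacing each $name found in vars with its value.
   Returns 0, or -1 if dst (dstlen bytes) is too small, in which case
   dst is left unterminated. */
int subst_vars(const char *src, const struct var *vars, size_t nvars,
               char *dst, size_t dstlen)
{
    size_t out = 0;

    if (dstlen == 0)
        return -1;
    while (*src) {
        const char *piece = src;   /* default: copy one character */
        size_t plen = 1, consumed = 1;

        if (*src == '$') {
            size_t n = 1;          /* length of "$name" */
            while (isalnum((unsigned char)src[n]) || src[n] == '_')
                n++;
            for (size_t i = 0; i < nvars; i++)
                if (strlen(vars[i].name) == n - 1 &&
                    strncmp(vars[i].name, src + 1, n - 1) == 0) {
                    piece = vars[i].value;
                    plen = strlen(piece);
                    consumed = n;
                    break;
                }
        }
        if (out + plen >= dstlen)  /* keep room for the terminator */
            return -1;
        memcpy(dst + out, piece, plen);
        out += plen;
        src += consumed;
    }
    dst[out] = '\0';
    return 0;
}
```

Matching on the full name length means $title does not accidentally match a longer variable such as $titles.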
* do a for loop to install the graphics now, should sort out some people's broken install scripts, gagh!
* cleaned up the config file code with purify, all systems are go for first public release with basic config file support.
* remembered to add ttf support to mswordview as well.
* added support for variables in the lex code.
* fixed zlib configure script again.

Changes up to 0.5.5

* added in support for an external config file. The external file allows a start and end to a style to be user defined, i.e h1 for the start of a heading 1 style. Its possible to disable or enable handling of bold, italic and font size/face changes inside of a style, this is only started now, so its far from finished. Please *dont* use this file for the moment, im working on it.
* this is an interim release to fix the configure script problem that i had, and to add to the documentation about the libwmf stuff.

Changes up to 0.5.4

* well now, ive been away for a while working on libwmf, which is now complete enough to use. download it from http://www.csn.ul.ie/~caolan/docs/libwmf.html, install it, run mswordview's configure and compile and ta-da, mswordview can now handle wmf files.
* added a fallback from a failure to find -lz to -lgz, a problem on SuSE linux im told.
* found that old redhats appear to have a libz in the X lib dir that is old and crappy and doesnt link to my thing, didnt put in a workaround, but mentioned it in the documentation.
* created file with h1 to h9, verified that the lex code and so on works together fine with mswordview.

Changes up to 0.5.3

* began adding all fields to structures, and marking them implemented or not.
* strikethrough and underline for revision text
* found the bounds of the comment in the main document, i put name tags on them, and place comment begin and end graphics around them. at this stage remember that the -a option to remove comments exists, as even one comment in a doc can make the whole thing pretty unreadable :-), but the support is in there if you need it.
* revisions are given underline for added text, and the strikethrough color for deleted text, the same way word does it.
* begin and end for deleted and added revision text is shown with graphic tags, added a -r --norevisions option to ignore that stuff.
* names for revision text
* put revision authors' names in yellow text.
* i dont even *pretend* that im outputting good html btw, just working html under netscape. once everything is working i might go back through and work out correctly the dependencies between all the html outputting code, that'll be part of the overall cleanup im doing to make this modular enough to be used with abiword as a word97 importer.
* time and date of the revisions are included as well.
* think that ive completed revision text, but i need more tests before ill be sure.
* in comments theres always a pagenum field that word itself doesnt show, so ive stuck code in that disables this field if its at the beginning of a comment. also verified that comments work in fastsaved mode, though what is the story with that page number in annotations, hmm its bothering me somewhat.
* titchy bug where i included the wrong end of comment graphic.
* put square brackets around comment links, i believe this completes comment support.
* titchy bug in the time field for revisions.
* properties of text that change during a revision are listed as well.
* found the location of what sets the footnote & endnote styles of numbering and other settings for endnotes and footnotes in the DOP, these were missing from the copy that www.wotsit.org has, ive sent them the added section.
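The revision time and date mentioned above are stored as a DTTM, a packed 32-bit date that wvGetDTTM unpacks. A minimal sketch, assuming the Word 97 layout (minutes in bits 0-5, hour in 6-10, day of month in 11-15, month in 16-19, years since 1900 in 20-28, weekday in 29-31 with 0 = Sunday); struct dttm, dttm_decode and dttm_encode are hypothetical names.

```c
#include <stdint.h>

struct dttm { int minute, hour, day, month, year, weekday; };

struct dttm dttm_decode(uint32_t v)
{
    struct dttm d;

    d.minute  = (int)(v & 0x3F);
    d.hour    = (int)((v >> 6) & 0x1F);
    d.day     = (int)((v >> 11) & 0x1F);
    d.month   = (int)((v >> 16) & 0x0F);
    d.year    = 1900 + (int)((v >> 20) & 0x1FF);
    d.weekday = (int)((v >> 29) & 0x07);    /* 0 = Sunday */
    return d;
}

/* Inverse of dttm_decode, handy for round-trip checks. */
uint32_t dttm_encode(const struct dttm *d)
{
    return (uint32_t)d->minute
         | (uint32_t)d->hour << 6
         | (uint32_t)d->day << 11
         | (uint32_t)d->month << 16
         | (uint32_t)(d->year - 1900) << 20
         | (uint32_t)d->weekday << 29;
}
```

For example, 0x26358B6D decodes to Monday 17 May 1999, 13:45.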
* extracted the DOP fully.
* footnotes and endnotes now get the correct formatting of the numbers, i.e lettered, roman or arabic etc, damn missing page of the spec, i was searching for that for ages.
* i have some old code that gives the correct starting point for endnotes and footnotes so im leaving it in for now, but i can now use the DOP instead for this info.
* endnotes should now be put either at the end of the doc, or at the end of the section depending on what word does, needs testing.

Changes up to 0.5.2

* implemented auto text colour check for table cells, no more black on black, or black on blue. i must look closely at what other auto changes word makes, and where else i might have to put that code.
* some uber-simple greyscaling code when table look says no-color.
* verified it works under AIX, made a few changes that showed up due to its stricter malloc, theres probably a few more malloc related issues hiding in there.
* column breaks show up as well now.
* the various types of section breaks are distinguishable from each other, and from page breaks.
* a few changes to make sure formatting and tables get on better together.
* sequence field supported, i.e caption numbering, i just use the last fields that msword left in there.
* changed hyperlinking so that it works with bookmarks that are in comments (annotations).
* i now support multiple bookmarks that end on the same location.
* multiple bookmarks that start on the same location should be supported, but no examples yet.
* the comment author initials are extracted and used in the main document when referencing comments.
* comments now end when they are supposed to, only the correct comments get included, should work for fastsave, not tested.
* removed unused variables, sorted out a few other warnings, maybe itll squeak by the irix compiler now ?
* names and initials info for comments is extracted as well, and stuck in a table at the end of the document.
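For the auto text colour check on table cells, one simple approach is to pick black or white depending on how bright the cell background is. This is only a sketch of one heuristic, not Word's documented rule: it assumes 0xRRGGBB colours and uses the W3C perceived-brightness weights (299, 587, 114) with a midpoint threshold of 128; auto_text_color is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical heuristic: white text on dark backgrounds, black otherwise. */
uint32_t auto_text_color(uint32_t bg_rgb)
{
    unsigned r = (bg_rgb >> 16) & 0xFF;
    unsigned g = (bg_rgb >> 8) & 0xFF;
    unsigned b = bg_rgb & 0xFF;
    unsigned brightness = (299 * r + 587 * g + 114 * b) / 1000;  /* 0..255 */

    return brightness < 128 ? 0xFFFFFFu : 0x000000u;
}
```

Weighting green most heavily is what keeps a navy cell getting white text while a yellow cell of similar channel sum gets black.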
* fixed the <a name= for comments, should work in fast saved.
* custom graphics for annotations.

Changes up to 0.5.1

* forgot to change the version no in the source.
* damn sunsite broke connection half way through uploading.

Changes up to 0.5.0

* Martin Kalms <kalms@lysator.liu.se>, configure fix for sunos 4.1 in relation to strerror.
* added option where you can ignore table widths.
* custom graphics for comments.
* endnote autonumbering now works, now defaults to roman numerals.
* fast save footnote problem fixed, though i think things might be even more complex than i thought, so keep an eye on that area.
* footnotes are in a colour of their own.
* symbols as footnotes, required a change to the 4a30 sprm that might fix a few other char formatting issues.
* restarting footnotes on each page, and each section works, this is encoded in the number itself it appears, a href and a name, and some invalid html code fixed in the footnote area as well, footnotes are now in a colour of their own *but* the location of whatever sets the footnote & endnote styles of numbering is unknown, i havent figured it out.
* all endnotes are listed at the end of the section rather than optionally at the end of the document, i dont know how this is done, doesnt appear documented.
* textmarks / bookmarks and explicit hyperlinking supported, bugs in old code removed hopefully, and internal hyperlinks put in via insert hyperlink are supported.
* support for bookmarks, i.e they are converted to <a name>[text]</a> html code.
* converted cross-referenced textmarks/bookmarks into hyperlinks.
* wmf files can now be decompressed thanks to peter.brandstrom@ericsson.com, now i need a wmf --> something useful converter. i see that theres a new one available off the gimp plugin page, with some uberhacking it might do the trick, the notes/wmf dir has a goodly chunk of info on the format if anyone wants to do it for me.
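Footnote and endnote numbers (including the roman numeral default above) are driven by an nfc number format code. A minimal sketch, assuming the Word 97 nfc values 0-4 (arabic, upper roman, lower roman, upper letter, lower letter) and assuming Word-style lettering that repeats the letter past z (aa, bb, ...); format_number and the NFC_* constants are hypothetical, not the contents of nfc.c or ordinal.c.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed Word 97 nfc codes (subset). */
enum { NFC_ARABIC = 0, NFC_UCROMAN = 1, NFC_LCROMAN = 2,
       NFC_UCLETTER = 3, NFC_LCLETTER = 4 };

/* Format n (>= 1) as nfc into out. Returns 0, or -1 if n is 0, a roman
   numeral is requested past 3999, or the result does not fit in outlen. */
int format_number(unsigned n, int nfc, char *out, size_t outlen)
{
    static const unsigned val[] = {1000, 900, 500, 400, 100, 90, 50, 40,
                                   10, 9, 5, 4, 1};
    static const char *const sym[] = {"M", "CM", "D", "CD", "C", "XC", "L",
                                      "XL", "X", "IX", "V", "IV", "I"};
    char buf[64] = "";

    if (n == 0)
        return -1;
    switch (nfc) {
    case NFC_UCROMAN:
    case NFC_LCROMAN:
        if (n > 3999)
            return -1;
        for (size_t i = 0; i < sizeof val / sizeof val[0]; i++)
            for (; n >= val[i]; n -= val[i])
                strcat(buf, sym[i]);
        break;
    case NFC_UCLETTER:
    case NFC_LCLETTER: {
        /* assumed Word style: a..z, then aa..zz, aaa.. (same letter repeated) */
        unsigned reps = (n - 1) / 26 + 1;
        if (reps >= sizeof buf)
            return -1;
        memset(buf, 'A' + (int)((n - 1) % 26), reps);
        buf[reps] = '\0';
        break;
    }
    default:                    /* NFC_ARABIC and anything not handled here */
        snprintf(buf, sizeof buf, "%u", n);
        break;
    }
    if (nfc == NFC_LCROMAN || nfc == NFC_LCLETTER)
        for (char *p = buf; *p; p++)
            *p = (char)tolower((unsigned char)*p);
    if (strlen(buf) >= outlen)
        return -1;
    strcpy(out, buf);
    return 0;
}
```

Falling back to arabic for unknown codes means an unrecognised nfc still produces a readable number rather than nothing.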
* when bookmarks are embedded in bookmarks something odd appears to occur, but nonetheless the ms save as html does the same, so im assuming that its ok
* added bookmark support to fastsaved, should work fine, not tested.
* pagebreak gifs are correctly centered if the next para is a centered etc one.
* author field supported.
* proper positioning of page numbers, general layout of headers appears to be fine, except that tab stops are used in headers to center, left and right align headers, which doesnt work so well in html mode.
* added defensive code around some sort of list bug.
* mimic strike-through and double st by setting the text color to either #ed32ff or #ff7332
* disallow height commands inside tables, as the model of paragraph heights doesnt fit well with the architecture for tables, so im ignoring them in tables, hopefully noone will notice :-)
* fixed a small bug in sprm which was causing errors later in lists.
* tables and paragraph formatting were misaligned across td boundaries. so now i clear specials and fonts on entry to a table, and on exit of each cell, hopefully i broke nothing else in doing so.
* at least one really bad conversion with a file called RESUME.doc, but in my defence i looked at the msword conversion of this to html, and its just as buggered up so rasp ;-P
* added credits file
* found problem in decompress code, i didnt make it good enough for real world usage, i now use mmapping to make my life easier, dont know if this is fully portable, works on linux and solaris.
* oledecod had bugs on cleanup, so sent the filters group wmf.doc and Contribu.doc to demo the problems.
* i now use oledecod 0.0.4 which fixes cleanup problems, but Contribu.doc style problems continue, they return 5 but laola can extract the streams nonetheless while oledecode cannot, i modified the original laolareplace.c to handle this as well.
* oledecod 0.0.4 has a bug in relation to 1812bb.doc, laolareplace.old.c hasnt this bug, so im back to using that again.
* those ffffffff's in lists that haunted me in earlier releases are *back* grrrrr!!, anyway ive another massive nasty workaround that im using that hasnt crashed any docs, and appears to do the right thing, at least in propos~s.doc
* wmf decompression code changed to use mmap, replaces the original code that ate memory. if mmapping doesnt work try looking at the zlib docs and change the code to fixed buffer incremental decompression.
* added a bailout to ignore encrypted documents, wonder how id decrypt them if i had the correct password, anyone know ?
* added a bug fix for crossreference parsing.
* beginnings of tables of contents included, doesnt always work yet.
* fixed a bug where, if the word file ends on a table, the table wasnt closed off.
* bug where non built in graphic types were causing hangs.
* im now often happily (if slowly) converting 90 and 100 page documents, the only thing i really am unhappy with is table handling, which is also one of the reasons the conversion is *soooo* slow sometimes, the other reason is those godforsaken fastsaved files.
* fixed some other mem related bugs, successfully converted the last two problem docs without crashes.
* table looks are somewhat supported, though theres no support for last row and last column different from the rest of the cells as of yet, this will have to wait until multi pass on tables is implemented.
* the foregrounds and character attributes in general for tables appear to be set correctly, but i believe i have to look into how the "auto" text color selects its final colour, as ive been assuming that it gets set to black, which is a fairly valid assumption most of the time, but not always, so a few docs will have black text on black backgrounds in table cells, but the situation is much improved.
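Both the encrypted-document bailout and the complex/simple test done by wvSimpleCLX can be answered from the flags word in the FIB at the start of the WordDocument stream. A minimal sketch, assuming a Word 97 format FIB (wIdent 0xA5EC at offset 0, flags at offset 0x0A with fComplex = 0x0004, fEncrypted = 0x0100, fWhichTblStm = 0x0200); fib_read_flags and struct fib_flags are hypothetical names. fWhichTblStm is presumably also what wvWhichTableStream consults to choose between the 0Table and 1Table streams.

```c
#include <stddef.h>
#include <stdint.h>

#define FIB_WIDENT_WORD97  0xA5EC
#define FIB_F_COMPLEX      0x0004  /* fast-saved (complex) file */
#define FIB_F_ENCRYPTED    0x0100
#define FIB_F_WHICHTBLSTM  0x0200  /* 1 = table data lives in "1Table" */

struct fib_flags { int is_complex, is_encrypted, table_stream; };

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* hdr: first bytes of the WordDocument stream. Returns 0 on success,
   -1 if the buffer is too short or the magic is not Word 97's. */
int fib_read_flags(const uint8_t *hdr, size_t len, struct fib_flags *out)
{
    if (len < 0x0C || read_le16(hdr) != FIB_WIDENT_WORD97)
        return -1;

    uint16_t f = read_le16(hdr + 0x0A);
    out->is_complex   = (f & FIB_F_COMPLEX) != 0;
    out->is_encrypted = (f & FIB_F_ENCRYPTED) != 0;
    out->table_stream = (f & FIB_F_WHICHTBLSTM) != 0;
    return 0;
}
```

A converter would check is_encrypted first and bail out, then pick the simple or complex decode path from is_complex.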
* ran purify over mswordview, removed a load of dodgy code out of it, theres still a bug or two hiding in the list code, which i believe is the reason that lists are sometimes missing in complex documents, e.g meeting.doc. i think i love purify, its the bee's knees.
* dib's are now extracted as well, though i dont do anything with them yet, this fixes yet more crashes.
* fixed laolareplace.old.c, which is the version im going to use for this release, to work on 64bit platforms, a few longs had crept into the code there which shagged the whole thing up. I havent done extensive tests on 64bit yet, but im confident that itll work.
* fixed defines to make it work if theres no zlib present.
* no crashes after running mswordview on 300 megs of uploaded files.
* good enough to upload to sunsite, version number reflects this.

Changes up to 0.4.9

--This is an interim release while im in scotland until later this november--
added features are that the gateway is included, endnotes are supported, pagebreaks that split tables are supported and some more bugs are fixed, especially in relation to graphics.

* added -o - option to gateway, like i should have about 4 releases ago.
* fixed graphics again, forgot to reset the extra amount that some have before the graphic data begins, means more jpgs and pngs should work.
* endnote text done in simple saved
* cleaned up beginning whitespace from footnotes/endnotes/comments.
* endnotes in complex mode are in, needs testing.
* changed url code to match the other field code, fixes a big bug there.
* header and footer colours were wrong again, fixed.
* indent drift is fixed again, moved do_indent into decode_?_specials
* pagebreaks can occur in the middle of a table, this sort of confusion is fixed for full saved files, and is probably fixed for fastsaved files
* pagebreaks now look like they occur after footers, footnotes and endnotes.
* custom graphics replace <hr>'s as there were too many of them at the bottom of a page to figure out what was what.
* custom graphics for footnotes, and comments

Changes up to 0.4.8

* this has a slew of bug fixes related to graphics and a new option to put images in a certain directory
* fixed f006 code in blip handling, removing a slew of hangs.
* ignore every graphic that isnt an understood type, removes hangs.
* figured out when theres an extra 16 bytes to delete from the beginning of a blip, and where one of my magical 17s were coming from
* got a bug fix off Harry Shamansky (shamansky@adinc.com) as to why the default make wouldnt work under irix.
* the current spid handling was mismatching spids and the graphics involved.
* i cant handle forms, or ole data, so ive added a check to avoid doing them, removes crashes.
* also ive added some other code to watch out for unsupported graphic features.
* msword can include wmf and emf files, these are stored in compressed form, using lz encoding in a fashion supposedly compatible with the zlib library, but i havent been able to decompress them yet and even if i could i dont know of any source to convert wmf/emf files to anything usable under linux
* ive changed blip handling so that it works better, well i believe its more crash resistant, but im still not 100% happy with 0x01 handling.
* if you insert a bmp via insert->picture->from file, it appears to be converted to png for you, handy.
* paragraph indentation is back in, lists and tables were confusing the indentation code.
* fixed titchy bug so that space at beginning of lists isnt underlined.
* support paragraphs whose first line's indentation is greater than the rest of it
* support vertical space between paragraphs.
* sorted out end_para for the first paragraph found in complex mode, i think i have it right now. in passing i reckon a load of those pap searches in complex mode are unneeded, but i dont want to rock a working boat, if it aint broke dont fix it as an uncle of mine used to say, though we did seem to spend an awful amount of time frantically fixing things that broke dramatically after years of neglect.
* finally settled on dirs for left indentation, blockquotes indent from both sides automatically
* added an option to put graphics in a specified dir.
* added an option to find the graphics at a specified url.
* updated man page.
* made another change to blip handling, fixes some problems.

Changes up to 0.4.7

* warning !, in this release mswordview no longer outputs by default to the screen. use -o - for this behaviour. This is an interim release to reassure people that im still working on it, its got quite a few new features and bug fixes since 0.4.4, read down for them all.
* implemented tabbing with trans gif, optionally use hardspaces or dont do it at all.
* added some support for borders such that the vertical space between paragraphs due to width of borders is retained through the use of vertical trans gif space.

Changes up to 0.4.6

* indentation of paragraphs dithered to <blockquote>'s is out again as it was doing strange things on long complicated documents.
* table cell shading done, fully supported i believe.
* drew all the available table patterns in all available colors, made small transparent gifs out of them, if someone wants to do better copies of the ms ones go ahead, use the convert.sh script in the patterns dir to generate pics in all necessary colors.
* text color support is in
* word underline, which is where whitespace isnt underlined, is supported.
* courier as an alternative to courier new, times as an alternative to times new roman font face, helvetica as an alternative for everything else.
* all caps supported, Small caps supported, though i want full tests of those two babies in all modes. Similar to the font faces, these two babies are only supported in ascii languages, as i dont really know how to convert utf-8 unicode into upper case !
* text animations supported by converting them to blink :-)
  features-examples dir added, supported-font-features.doc has what i believe is all the font features that word supports demonstrated in it. id be happy to have omissions noted. mswordview now supports
    1) font size
    2) colored text (in headers and footers as well)
    3) font face in ascii based languages
    4) underline, including word underline, where whitespace isnt underlined
    5) super and sub script
    6) All caps and small caps (ascii based languages only)
    7) text animations dithered to blink tag
  mswordview doesnt support, due to html limitations (at least i dont think i can do them)
    strikethrough, double strikethrough, shadowed and outlined text, embossed or engraved text.
    "hidden text" is shown, coz i dont know the purpose of it yet
    all caps, small caps and font face for non ascii languages.
    character spacing
* centralized pap initialization code
* fixed a crash causing blip bug
* fixed a crash due to sep sprms showing up in a papx !!, i ignored them, im sure that will bite me hard in the future, but ive documented it here so i wont forget.
- Problem: now we have a problem with paragraph properties which is only making a difference now that i want to use the paragraph justification codes. there exist pieces which have fc's greater than the maximum one listed in the plcfbtePapx !, ive been pushing them around for the last 2 days to no avail, im beginning to think that maybe this means that they have no native formatting of their own. the catch is to find the paragraph that they belong to, the spec says to find that by taking the smallest fc in fkp tables that is bigger than the current fc, but there *is none* thats bigger.
my thought is to remember if this piece is the beginning of a paragraph, and if not, inherit the previous piece's formatting, and keep going backward until we get one. If it is, then either im supposed to default to a new one or go forward to find one.
+ Solution: Ah-ha, i believe i have it:
+ firstly, variant 1 gpprls have to be supported, and i had some offsetting in them wrong.
+ secondly, i had a very subtle bug where i changed the value of the avalrgfc, from when i didnt know why they were sometimes +400000000. of course i now use it to determine whether the end of the piece is twice the distance of its reported character len or not, and with the value reset i occasionally had the piece recorded as being too long, so the paragraph properties of the wrong paragraph were being used.
* added paragraph formatting information; supported well are 1) centering (center) 2) right justification (div align=right).
* made a closing paragraph thing like the closing chp for the blurb at the bottom, to avoid having the version info centered or justified.
* 0x01 fSpec graphics are now supported in addition to 0x08 graphics. while both of these are draw objects, only non-vector graphics are supported, and only partially at that, i.e png and jpg. as with the 0x08 graphics theres a lot of magic empirically derived offsets being used to put it together, so dont be too surprised at getting corrupt images. though i *have* fixed a bug in png handling for 0x08 graphics, i believe, which was the previous subset i supported.

changes up to 0.4.5
* i now open graphic and doc files in binary mode to support platforms where this makes a difference.
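The lookup described in the problem note above (take the smallest fc that is bigger than the current fc, and cope with the case where there is none) is an upper-bound binary search over a sorted array. A sketch in C, assuming the fc values have already been gathered into one ascending array (how they are collected from the FKPs is not shown); first_fc_greater is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the smallest entry of the ascending array fcs[0..n-1]
 * that is strictly greater than fc, or -1 when no entry is bigger. */
long first_fc_greater(const uint32_t *fcs, size_t n, uint32_t fc)
{
    size_t lo = 0, hi = n;            /* the answer lies in [lo, hi] */

    assert(fcs != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fcs[mid] > fc)
            hi = mid;                 /* mid qualifies; look for a smaller one */
        else
            lo = mid + 1;             /* fcs[mid] <= fc; look further right */
    }
    return lo < n ? (long)lo : -1;
}
```

A return of -1 is exactly the "there *is none* thats bigger" case, which the caller then has to resolve some other way (for instance by inheriting the previous piece's formatting, as considered above).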
* replaced laola, perl no longer required, thanks to the mighty Andrew Scriven who replaced the OLE functionality i needed with C.
* got a bug fix off the above to handle files with more blocks.
* optional support for font face if the text is an ascii based one, i.e if were guaranteed that this is a western european language then we do font faces. fastsaves will probably confuse this test and mean we wont get faces even when we can handle them correctly.
* changed indent method for outline lists to multiple hard spaces rather than <dir>'s. in the future ill make an optional proper html conversion, but it wont look like the original, so its a TO-DO.
* indentation of paragraphs dithered to <blockquote>'s is in, alpha support.
* absolute width and height of tables is in as well.
* i now default to outputting to a file whose name is the same as the input file, with .html appended. graphics are output to files with the same prefix as the .html file. use -o - to output to stdout.
* new ole code was broken on a few files ( 1 :-) ), fixed this.

changes up to 0.4.4
* a good few bug reports in, crashes and what not. i got the use of purify on a sun box (thanks to martin mellody et al) and sorted out *all* the uninitialized mem reads there (3000 of them in the course of a typical conversion!!). it still leaks memory like a sieve, but thats not important for mswordview, though i will sort that out. purify is a wonderful piece of work i have to say.
* changed ffffffff handling for lists; i think it means that the list in question isnt actually there, so skip it.
* changed blockquotes to dir, looks neater and word itself does it. biggest software company in the world cant be wrong, can it? :-)

changes up to 0.4.3
* oops, i shafted the inclusion of getopt for systems that need it.

changes up to 0.4.2
* fixed broken simple mode footnotes (doh!)
* fixed bug in blip where having drawings none of which was a picture caused a crash.

changes up to 0.4.1
* did some tweaking to remove a crash.

changes up to 0.4.0
* and big breaking news, preliminary graphic support is now in!! yes, gifs/pngs/jpgs added to a document through the insert->picture->from file mechanism now convert correctly. They are stored in the office draw format, which ive just cracked the rough layout of (through the handy ms spec on the msdn site). graphic support is messy for now, as the files are generated in the cwd of mswordview and named graphic*mswv.*; ill tidy it up later, this news is too good to not get an announcement.

changes up to 0.3.0
* added -m --mainonly option if you dont want headers and footers.
* added a few more places to look for lls-mswordview. the search order is now 1) in the path 2) the same dir as lls was run from, if run absolutely 3) the current dir 4) a dir called laola off the absolute path 5) a dir called laola off the current dir. stuff like ../../mswordview isnt in there though, coz folk should just put lls-mswordview into their path dammit!
* different numbering formats for page numbering are in, a vs i vs 1 etc.
* gpprls for sep's work now, complex sections are in.
* found some strange code in clx_headers and clx_footers so i blew it away.
* section support in for simple saved files.
* sections that restart page numbering work now.
* sections that have no footers/headers at the beginning work now.
* complex support for sections is in as well; should work, hopefully, needs extensive testing.
* TO-DO text color, eventually font faces, but no sleep lost on that i have to say.
* TO-DO shaded cells in a table, think up a better table handling method.
* i now stick a space into an empty cell so that it shows up.
* another U8 wraparound bug removed.
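The "a vs i vs 1" page number formats (and the roman and lettered list numbering mentioned under 0.0.26 further down) come down to formatting one integer several ways. A sketch in C; the enum and function names are hypothetical, roman numerals above 3999 fall back to arabic, and continuing the letters past z as "aa", "bb", ... is an assumption about Word's behaviour rather than something confirmed here:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum numfmt { NUM_ARABIC, NUM_UPPER_ROMAN, NUM_LOWER_ROMAN,
              NUM_UPPER_LETTER, NUM_LOWER_LETTER };

static const struct { unsigned value; const char *sym; } roman[] = {
    {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"}, {100, "C"},
    {90, "XC"}, {50, "L"}, {40, "XL"}, {10, "X"}, {9, "IX"},
    {5, "V"}, {4, "IV"}, {1, "I"}
};

/* Format n (n >= 1) into buf (at least 32 bytes) in the given style. */
void format_number(unsigned n, enum numfmt fmt, char *buf, size_t size)
{
    assert(buf != NULL && size >= 32 && n >= 1);
    buf[0] = '\0';

    if ((fmt == NUM_UPPER_ROMAN || fmt == NUM_LOWER_ROMAN) && n <= 3999) {
        unsigned v = n;
        size_t i, len = 0;
        for (i = 0; i < sizeof roman / sizeof roman[0]; i++) {
            while (v >= roman[i].value) {      /* greedy: largest symbol first */
                size_t k = strlen(roman[i].sym);
                memcpy(buf + len, roman[i].sym, k);
                len += k;
                v -= roman[i].value;
            }
        }
        buf[len] = '\0';
        if (fmt == NUM_LOWER_ROMAN)
            for (i = 0; i < len; i++)
                buf[i] = (char)tolower((unsigned char)buf[i]);
    } else if (fmt == NUM_UPPER_LETTER || fmt == NUM_LOWER_LETTER) {
        unsigned reps = (n - 1) / 26 + 1;       /* assumption: 27 -> "aa" */
        char c = (char)((fmt == NUM_UPPER_LETTER ? 'A' : 'a') + (n - 1) % 26);
        if (reps < size) {
            memset(buf, c, reps);
            buf[reps] = '\0';
        }
    }

    if (buf[0] == '\0')                         /* arabic, or any fallback */
        snprintf(buf, size, "%u", n);
}
```

For example, 4 in lower roman gives "iv", 1994 in upper roman gives "MCMXCIV", and 3 in lower letters gives "c".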
* i now use the piecetable for simple docs, so as to skip over sections that arent to be processed, i.e the simple format is just as complex as the complex format :-). i think ive done this right and it wont break anything, ill have to wait and see though.
* changed slightly the portions of a field that dont get printed, to make some html ones work; hope i havent shafted anything else.
* hmm, really need to clean up character handling, unicode & special reserved ms symbols and so on; im just plinking at them for the moment.
* aghh, found another U8 overflow. what possessed me to put them in in the first place? i should have guessed that there would be hundreds of pieces in a file.
* received report that it compiles and runs with Sparc solaris 2.5.1 - sparcworks compiler & Intel x86 solaris 2.5.1 - gcc compiler.
* added patch from diakka <diakka@staff.sinanet.com> to run create_bins on a make rather than make install.

changes up to 0.2.2
* compiled it on a solaris account i got, and its fine; got confirmation that it works from Will Renkel <renkel@cig.mot.com>.
* changed fastsaved chpnextfc check to be >= rather than >, hope that i dont break anything coz of it.
* foolish error, U8 used for number of pieces, extended to U16.
* changed embedded link handling to not end character properties in the middle of a URL!
* changed embedded link handling so as to *not* place "" around urls, as sometimes they are there already, and not having them doesnt hurt, though it offends my sense of how they should be done.
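The U8 wraparound and overflow bugs above come from unsigned arithmetic wrapping modulo 2^N: an 8-bit piece counter cannot go past 255, so a document with 300 pieces reads as having 44. A small demonstration in C; U8 and U16 are defined locally from <stdint.h> here, as an assumption about how such typedefs look, and count_pieces is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  U8;    /* assumed equivalent of the project's U8 */
typedef uint16_t U16;   /* assumed equivalent of the project's U16 */

/* Count npieces pieces with an 8-bit and a 16-bit counter. Unsigned
 * arithmetic wraps modulo 2^N, so the 8-bit count ends at npieces % 256. */
void count_pieces(unsigned npieces, U8 *narrow, U16 *wide)
{
    unsigned i;

    assert(narrow != NULL && wide != NULL);
    *narrow = 0;
    *wide = 0;
    for (i = 0; i < npieces; i++) {
        (*narrow)++;
        (*wide)++;
    }
}
```

The same wrap also means a loop like `for (U8 i = 0; i < npieces; i++)` never terminates once npieces exceeds 255, because i returns to 0 before the condition can fail. A U16 only moves the limit to 65535.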
* would you *believe* these ms guys, now they are hitting me with file offsets that are past the end of the file!! so now i have to watch out for that; the complex format is *such* a collection of hacks. ah-ha, ive just checked in word, and this file crashes word :-) so this is the first reported case of mswordview being better than msword, though i have to say that in recovery mode word pulled loads of text out of it that i didnt get :-(. still, its a corrupt file, so doing anything at all is a success.
* i forgot to reset the higher list levels when changing a lower one; fixed now, i think i have it right.
* added a define of SA_RESTART to 0 if it isnt there. bash does it so i should get away with it; sunos seems to need it.
* added a little patch from Zachariah Baum <zack@studioarchetype.com> that should help folk who run mswordview absolutely and dont stick lls-mswordview in their path, ie make and then dont make install.
* fixed yet more bugs: for some reason i thought that the order of evaluation was from right to left!!!! i.e i was doing if ((*p == 'a') && (p!=NULL)) doh!
* changed web interface so that utf-8 is always on.
* font characteristics turn off when going into tables now, and turn back on when inside; gets rid of some odd look and feel.
* checked out corel's wordperfect import functionality with office 97 files; conversion isnt as good as mswordview, i think. missing header numbers, and one or two didnt convert at all. though of course corel retains layout, which mswordview cant do with html, and does shading. ill check pictures at some stage.
* have a report that sun's pcfileviewer similarly covers about 50% of mswordview's functionality and vice versa.
* gzipped uploaded word file collection has just hit 120megs :-)
* i now look at the section table so i know whether its a section break or a page break. If its a section break, then the header/footers revert to the beginning again.
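Three of the fixes above are small enough to show directly: the SA_RESTART fallback define, the corrected operand order for the NULL test (&& evaluates left to right and stops at the first false operand), and a bounds check that rejects a file offset past the end of the file before seeking to it. A sketch in C; starts_with_a and range_in_file are hypothetical names, not functions from mswordview:

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Some systems lack SA_RESTART; as a flag value, 0 simply adds nothing. */
#ifndef SA_RESTART
#define SA_RESTART 0
#endif

/* The NULL test must come first: (*p == 'a') && (p != NULL) would
 * dereference p before checking it. */
int starts_with_a(const char *p)
{
    return p != NULL && *p == 'a';
}

/* Return 1 if the byte range [offset, offset + len) lies inside the file,
 * 0 otherwise, so a corrupt offset can be rejected before seeking to it.
 * The current file position is restored. */
int range_in_file(FILE *f, long offset, long len)
{
    long here, size;

    assert(f != NULL);
    if (offset < 0 || len < 0)
        return 0;
    here = ftell(f);
    if (here < 0 || fseek(f, 0L, SEEK_END) != 0)
        return 0;
    size = ftell(f);
    if (fseek(f, here, SEEK_SET) != 0 || size < 0)
        return 0;
    return offset <= size && len <= size - offset;
}
```

Writing the comparison as `len <= size - offset` instead of `offset + len <= size` avoids overflow when a corrupt offset is close to LONG_MAX.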
TO-DO, add a space to empty cells to make them look reasonable in netscape.
TO-DO check page numbering with sections.
TO-DO, do endnotes, should be easy. make a new pic to replace hr lines; theres too many hrs now at the bottom of a page to make sense to anyone anymore. if theres no footers, then dont do the lines.
TO-DO, continue with the sent files since 0.1.0, and the rest of them.

changes up to 0.2.1
* removed bug that caused lists to drift further and further right.
1. checked out the blockquote indentation for lists, doesnt appear to be right for srom*.doc. fixed now. took a closer look at font scanning in decode_letter, in particular special chars; the < 39 wasnt precise enough, being in a wingding/symbol font seems to make you automatically a special char.
2. something not fully right with lists that take their text as special chars (i.e sectionnumber), not done by ms in an obvious fashion. edit doc down to just the 2 headers and then see what happens.
3. AHA!!!, 1 and 2 are wrong, as were previous ideas to ignore lists that appear to have nothing in them. they are there to artificially bump lists up to a different starting number without requiring a separate list definition for each one; ms shoves in dummy elements to get the list up to the right number. the section id just before one of them threw me entirely, i thought the section number should have been the text of the list. ive got it now!
* 3 above is *rubbish*, thats not it at all. i was right originally: ignore those 0 len lists. the problem was with my list restarting mechanism, which didnt work if there was more than 1 list between list sections that had to continue numbering.
* numerical outline list sublevels will retain the prefix of the above levels. this required a change of the number figuring-out code; its now rather heavy on silliness, but it works. i dont love it and im sure lists will be back to get me again at some stage, but outline lists now work, in particular the 1 1.1 1.1.1 style.
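Outline numbering of the 1 / 1.1 / 1.1.1 kind needs one counter per level: advancing a level resets every deeper level (our reading of the "reset the higher list levels" fix in the 0.2.2 notes above), and each label carries the counters of all shallower levels as its prefix. A sketch in C; the struct and function names are hypothetical, and the nine levels follow Word's list levels 0 to 8:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEVELS 9   /* Word list levels 0..8 */

struct outline {
    unsigned count[MAX_LEVELS];   /* current number at each level */
};

void outline_reset(struct outline *o)
{
    memset(o, 0, sizeof *o);
}

/* Move to the next item at `level` and reset every deeper level, so a
 * new "2" is followed by "2.1" rather than continuing the old "1.x". */
void outline_next(struct outline *o, int level)
{
    int i;

    assert(o != NULL && level >= 0 && level < MAX_LEVELS);
    o->count[level]++;
    for (i = level + 1; i < MAX_LEVELS; i++)
        o->count[i] = 0;
}

/* Write the label for `level` (e.g. "1.2.1") into buf, using the
 * counters of all shallower levels as the prefix. */
void outline_label(const struct outline *o, int level, char *buf, size_t size)
{
    size_t len = 0;
    int i;

    assert(o != NULL && buf != NULL && size > 0);
    assert(level >= 0 && level < MAX_LEVELS);
    buf[0] = '\0';
    for (i = 0; i <= level && len < size; i++) {
        int n = snprintf(buf + len, size - len, i ? ".%u" : "%u", o->count[i]);
        if (n < 0)
            break;
        len += (size_t)n;   /* exceeds size only on truncation; loop then stops */
    }
}
```

Calling outline_next for levels 0, 1, 1, 2 yields the label "1.2.1"; a following outline_next(0) and outline_next(1) yields "2.1".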
* TO-DO sections, srom*.doc has them, check them out.
* TO-DO change web interface so that utf-8 can kick in if need be.
* fixed bug where the new piecetable check in simple saved files fell apart after hitting a footer (tempcp = tempcp, rather than realcp = tempcp, doh!)

changes up to 0.2.0
* well arse again, ive revised my ideas as to what constitutes the end of a piece. rather than the beginning of the next piece, as i was doing, i now believe that its the beginning of the piece + the twiddled cp len. makes more sense, and removes crashes from the latest doc i was given.
* distinguishes between odd & even page footers.
* TO-DO odd & even headers.
* added the tm symbol as a special case. theres quite a large range of unicode that ms is using that is part of the customizable section, i.e theres loads of glyphs that ms can use that are not part of the standard unicode set; the tm appears to be one of hundreds. eventually ill have to get a table of them.
* woweee, is ms an evil designer of data formats. they have two types of simple saved docs, i thought: those in 8 bit (basically ascii) and those in 16 bit (unicode). hah bloody hah, ive been given one which is a mixture of both, and i have to use the damn piecetable to shove it together. and its not as if the document shifted into a different language or anything. if this was fastsaved id not blink an eye, but simple saved, come *on*, why bother calling it simple saved? so i have to keep an eye on the piecetable to determine what exact offset to use after all.
* added a huge filthy hack for more list twiddlings. the previously mentioned unknown 4 byte sequence now rears its head as an optional 8 byte sequence!! but always ffffffff; it might be some kind of flag or summat. anyhow i now chew up any 4 bytes consisting of this if they show up in the place that they might appear. this removes a large crash that occurs otherwise, as all the counters get thrown off course by them.
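Mixed 8-bit and 16-bit text in one simple-saved document, and the piece length "twiddle" above, both follow from how (in our reading of the Word 97 spec) a piece descriptor's fc works: if bit 30 (0x40000000) is set, the piece holds 8-bit cp1252 text at byte offset (fc & ~0x40000000) / 2, one byte per character; otherwise fc is the byte offset of 16-bit text, two bytes per character. A sketch in C; struct piece_span and decode_piece are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define FC_COMPRESSED 0x40000000UL   /* bit 30: piece holds 8-bit text */

struct piece_span {
    uint32_t offset;      /* byte offset of the piece's first character */
    uint32_t byte_len;    /* bytes occupied by the piece's characters */
    int      compressed;  /* 1 = 8-bit text, 0 = 16-bit text */
};

/* Work out where the characters cp_start..cp_end-1 of a piece live. */
struct piece_span decode_piece(uint32_t fc, uint32_t cp_start, uint32_t cp_end)
{
    struct piece_span s;
    uint32_t nchars;

    assert(cp_end >= cp_start);
    nchars = cp_end - cp_start;
    s.compressed = (fc & FC_COMPRESSED) != 0;
    if (s.compressed) {
        s.offset = (uint32_t)((fc & ~FC_COMPRESSED) / 2);
        s.byte_len = nchars;          /* one byte per character */
    } else {
        s.offset = fc;
        s.byte_len = nchars * 2;      /* two bytes per character */
    }
    return s;
}
```

Because the flag travels with each piece, one document can switch between 8-bit and 16-bit text from piece to piece, which is why the piecetable has to be consulted even for simple-saved files.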
changes up to 0.1.1
* added Makefile patch from Pavel.Roskin@ecsoft.co.uk (says it works on hpux).
* well the good news is that the unicode utf-8 is working for taiwanese, and im sure other languages; the bad news is that everyones telling me that noone in their language group is actually using unicode :-) so i suppose i require a huge unicode --> JIS/EUC/KSC/Big5/GB converter :-)
* rudimentary support for annotations. i havent too many examples of these but i think theyll work fairly well.
* rudimentary support for all special ascii codes for time, page no etc. p.s. by rudimentary support i mean that if asked for e.g the current date in a particular format i output the date, maybe in the correct format, maybe not. i.e the meaning is the same, though the look might be different.
* added a supported sprm that changes chp information totally to the chp of a different style.
* added support for custom footnotes. had to do a bit of a hack to get the <a name> stuff right; hopefully itll always work, and even if it doesnt itll still be readable.
* twiddled the char formatting dependencies about again, really ill have to redesign that a bit.
* broke the mswordview.c file down a bit into other files.

changes up to 0.1.0
* hell, ive enough done to warrant a new numbering system. so from now on, in x.y.z: x is a stable bug free (hah) release. folk packaging for commercial unices probably should wait for these releases (none yet, i know). y is a new feature or enough bugs fixed that you better use this version if you want to keep up with the joneses. z is some small bug or change that is small enough that i wont upload it to sunsite et al automatically; itll be mostly for me.
* added a default font size option, so that if you think the output is too big or small, you can shrink or enlarge it.
* added a horizontal padding option: you have the option of 3 different ways to handle a run of multiple line breaks, though the default is probably the best.
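The character formatting dependencies mentioned here (and the "little stack" the 0.1.0 notes call for) arise because HTML inline tags must close in reverse order of opening: turning bold off while italic is still on means closing italic, closing bold, then reopening italic. A sketch of such a tag stack in C; the struct, function names and buffer sizes are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TAGS 16

/* Open inline HTML tags, innermost last, plus the HTML emitted so far. */
struct tagstack {
    const char *tag[MAX_TAGS];
    int depth;
    char out[512];
    size_t len;
};

/* Append "<name>" or "</name>" to the output buffer. */
static void put_tag(struct tagstack *s, int closing, const char *name)
{
    int n = snprintf(s->out + s->len, sizeof s->out - s->len,
                     closing ? "</%s>" : "<%s>", name);
    assert(n > 0 && (size_t)n < sizeof s->out - s->len);
    s->len += (size_t)n;
}

void tag_open(struct tagstack *s, const char *name)
{
    assert(s->depth < MAX_TAGS);
    s->tag[s->depth++] = name;
    put_tag(s, 0, name);
}

/* Close `name`. Tags opened after it are closed first and reopened
 * afterwards, so the emitted HTML stays properly nested. */
void tag_close(struct tagstack *s, const char *name)
{
    int i, j;

    for (i = s->depth - 1; i >= 0 && strcmp(s->tag[i], name) != 0; i--)
        ;                                  /* find the innermost match */
    if (i < 0)
        return;                            /* not open: nothing to do */
    for (j = s->depth - 1; j >= i; j--)    /* unwind down to `name` */
        put_tag(s, 1, s->tag[j]);
    for (j = i + 1; j < s->depth; j++) {   /* reopen the tags above it */
        s->tag[j - 1] = s->tag[j];
        put_tag(s, 0, s->tag[j - 1]);
    }
    s->depth--;
}
```

Opening b, opening i, then closing b and closing i emits `<b><i></i></b><i></i>`, which stays valid HTML even though the formatting runs overlap.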
* tweaked char formatting system. TO-DO overhaul all of that; theres quite a few dependencies between the tags, and its becoming a little too difficult to do by hand. a little stack is called for methinks.
* added some support for a type of holdover list format found in docs converted to word8 from older versions. works on the one i have so far, though theres more testing to be done with it. missing bullets and incorrect numbering may be related to this; pass them on to me.
* battered LFO's into submission, this time theyll stay down (i hope). found a 4 byte field that i cant figure out where it came from. *shrug* wouldnt be the first time that happened though.
* changed footer and header handling: i now take notice if the first page's headers and footers are different from all the others. i still dont get section breaks, which i think impact on this; i dont have any examples of this to work against. Theres a discrepancy between the header/footer documentation and what i see before me in the hex, maybe im missing something.
* ok, theres some difficulty with tables. ive implemented this baby as a one pass parser (later ill have to add multipass, or backpatching, to figure out the number of pages so as to get that field right), but with ms tables you can start off with 2 cols then go to e.g 4 in the same table. you dont know in advance how many rows and cols there are at maximum, or which ones span which, which is a pain in the butt. really, as far as word is concerned each row is a table unto itself, so ive done it this way: each table has the cols of the first row counted and the widths figured out in % of the page width; if a subsequent row has a different number of cols or different widths than the previous row, a new table will be begun. the % width will cause netscape to line them up correctly. itll do for now. not perfect i know, but hey, what is. Itll do the job for the primary task, which is making word readable as close to the original layout as possible within html.
- to get the tap that tells me all the above we have to scan forward until we find a rowend char, and get the pap of that to get the tap. and with fastsaved theres the usual complexity.
- The problem will be that netscape and other browsers dont take the width% as their primary factor in determining the actual width of a cell; if the text in it cannot be broken on a space, then the cell is expanded to fit, breaking the lining up. Im considering a somewhat more sophisticated (and questionable) technique where i stick the tables together by dithering the cells to a (max 64 cell (ms defined)) cell grid, using colspan and so on to do it.
* TO-DO theres something called a header text box that i have to figure out, and some companion of it for the main doc. i have to implement something to handle these beasts.
* TO-DO more testing for bugs and stuff.
* TO-DO code overhaul to simplify it.
* TO-DO support all fields. ive some supported: page no, date and time, but not perfectly in the same format that word has them in.
* TO-DO figure out how to extract ole embedded msoffice draw and equation editor data, and see if i can get them converted as well.
* TO-DO provide alternative outputs, tex/rtf and friends. ive a load of formatting information that i think i can get into those formats.
* TO-DO provide basic formatting for html, i.e centering.
* TO-DO think about writing word docs :-), now that would be a hunk of work. so to all of you asking me about it, i recommend you dont even bother with it; just write rtf files and get on with it. thats even what ms did for word 8: saving as word 6/95 just creates an rtf file. if its good enough for them, its good enough for us.
* TO-THINK-ABOUT i dont keep very much information in memory really. i just work out what i need for any given instant and drag it out of the file, and then dump it, often only to get it again in a few seconds. this leads to an impressive amount of seeking back and forth across the streams.
theres a groove burnt in my hd where im working; its not really optimum behaviour (works though :-) )
* NEED_HELP-ON, can this compile and work under sgi? have success reports from linux, solaris, hpux, aix, freebsd and one failure to compile under sgi. ive one message that it compiles under os/2, though it needs some work to do that.

changes up to 0.0.27
* know how to do the right thing with embedded sprm list, gets rid of a few wild bugs.
* found the list documentation after all; maybe i forgot to download it the last time (doh!), or it wasnt there when i downloaded it. so i removed all of my rather good but unnecessary hex determined code.
* added a special case for "*" in lists, make it a bullet point instead; seems to be the right thing to do (?)
* changed laola commands' names to append -mswordview to avoid overwriting newer lls commands etc.
* changed the INC in perl files to reflect final install dir.
* TO-WORRY-ABOUT, quite a few ??'s displayed in netscape when dealing with those utf-8 docs. dont know if thats my lack of correct fonts, or a great big dirty bug. also ive a few special cases in decode_letter to translate letters into what *i* think they should be; its rather questionable and very empirically based.
* added some hook code to protect lists from pagebreaks. in doing so i notice that my complex code is a wee bit confused, but it works, so im leaving it alone for now. the added code doesnt make for readability, but hey, neither does any of the rest of the code :-)
* fiddled list interpretation so that ilfo isnt looked at until the last pap and chp sprms have changed it. fixes difficulties in fast saved files.
* TO-DO (list stuff) LFO override not implemented correctly, may cause crashes. this is surely the last major list related thing to do.
restarts are probably incorrect, as are a few other minor list related bits and pieces.

changes up to 0.0.26
* changed laola lib to a subdir of mswordview and changed laola program names to custom mswordview ones, to avoid clashing with newer versions or the original version of laola, as ive doctored things slightly for my own needs.
* applied Martin Schultze's patch to add lib path to perl include path, though i twiddled it to make a nice tree in my lib.
* lists start on the correct number (well, ones that are simple numerals do anyway).
* understand list continuing and restarting now.
* added a defensive patch from Peter Silva <Peter.Silva@ec.gc.ca>
* lists now get the char formatting that they should get.
* yes!, sorted lists out: have bulleted lists, arabic & roman numerals, lowercase and uppercase lettering systems done. multilevel also works i believe, works on all examples i have anyway.
* fixed bug that made mswordview fail on files without an extension.
* TO-DO look at list indentation. if they are true multilevel then i blockquote them (for now), but if they have a set indentation value then, like all the other layout constructs, i dont preserve this into html.
* TO-DO fields; table of contents should be easier with lists done.
* TO-DO find out if my unicode (utf-8) support actually works for anyone except me. What fonts do various people need? this is a general netscape question.
* middleterm TO-DO, reorganize tags into external data files, to make it extensible to other formats, i.e raw ascii, an attempt at latex, rtf.

changes up to 0.0.25
* changed list handling slightly, removes a bug where you get too many list levels inserted.
* i believe that most lists will now be handled correctly as to whether they are numbers or not. I have isolated the undocumented section and have a handle on the situation, so its just a matter of comparing theory with practice again.
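The one-pass table strategy in the 0.1.0 notes above (cell widths as a percentage of the page width, and a new HTML table whenever a row's cell count or widths differ from the previous row) can be sketched as two small C functions. This assumes cell boundaries arrive in twips, as we understand the TAP's rgdxaCenter array to hold them, and that the page width is known in the same unit; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CELLS 64   /* the 64-cell maximum mentioned in the notes above */

/* A row's layout: ncells cells whose edges are given in twips. */
struct row_layout {
    int ncells;
    int edge[MAX_CELLS + 1];   /* ncells + 1 boundaries, left to right */
};

/* Width of each cell as a whole-number percentage of page_width. */
void cell_percentages(const struct row_layout *r, int page_width, int *pct)
{
    int i;

    assert(r != NULL && pct != NULL && page_width > 0);
    assert(r->ncells >= 0 && r->ncells <= MAX_CELLS);
    for (i = 0; i < r->ncells; i++)
        pct[i] = (r->edge[i + 1] - r->edge[i]) * 100 / page_width;
}

/* Rows with the same cell count and the same cell widths can share one
 * HTML table; anything else starts a new table. */
int same_layout(const struct row_layout *a, const struct row_layout *b)
{
    int i;

    if (a->ncells != b->ncells)
        return 0;
    for (i = 0; i < a->ncells; i++)
        if (a->edge[i + 1] - a->edge[i] != b->edge[i + 1] - b->edge[i])
            return 0;
    return 1;
}
```

Integer division truncates, so three equal cells on a 9000-twip page come out as 33% each; the browser absorbs the missing 1%.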
* removed bug where header pap gets used in the main document following a header.
* finished checking all uploaded files beginning with a, yippee. now theres quite a few elements not addressed yet in those files, but i understand whats involved. in short: section support, proper list support, justification support (centering anyway), and decoding of the DATE and TIME fields. would you believe that the TIME field can encode the DATE, despite the fact that theres a DATE field whose job this is! gagh, what can you do with people who do this to you. but anyhow the uploaded files all convert without crashing, all text is in the right place, and in the right language ( i think :-) ). all bold, italic, font sizes, underline, manual page breaks, and the content of footnotes, footers and headers is all shown, albeit not always the way they appear in word. yeah, we're getting there.
* changed utf conversion code, as the original code i was using wasnt quite gpl compatible. anyhow the new code is better designed for my needs.
* TO-DO, grr!! is someone reading this log? as after my week's holidays i note that theres a huge amount of files beginning with a to go through again. i never did make it to b.

changes up to 0.0.24
* fixed NULL complex pap bug.
* supports underline tag now as well :-)
* footnotes supported: all the ones referenced before a pagebreak get listed at the manual pagebreaks and document end (thats a <hr> in my current output; splitting word docs into different files is a challenge id rather not accept for now, as itd just be guesswork and mess). not checked in fastsave yet though.
* TO-DO support sections, so as to know what pages get headers and which dont, etc.
* TO-DO proper table of contents. the text is now listed, but theres no link between the table of contents and the text it purports to describe, for the moment.
* TO-DO differentiate between different types of underline, i.e word for word etc.
* EVENTUALLY-TO-DO, i have come across one case where a symbol used in a footnote isnt working! if i create one of my own it works fine, but when i alter the given one it still occurs, strange.

changes up to 0.0.23
* verified it works on linux, aix and solaris.
* fixed a very silly overflow byte vs int bug.
* overhauled unicode conversion, fixed my sprm size detection.
* changed table handling so that tables dont end prematurely.
* fixed img insertion dummying of wingding font support.
* massively changed my paragraph end detection for complex files. i had the idea all wrong, but close enough that it worked on fairly uniformly formatted files.
* works with all uploaded files beginning with A and a. theres soooo many to go through :-), im looking forward to getting to b soon.
* TO-DO, continue checking against uploaded files, verify header and footer support, start on list information (dum de dum dum dummmm)

changes up to 0.0.22
* check for errno.
* fix list related crash bug, found by Wayne Roberts <milcom@netcom.com>
* TO-DO, go through the 50 megs of uploaded word files and see do they convert fairly correctly :-) lists need to be done better. i need to confirm language conversion, and check out the table of contents field.

changes up to 0.0.21
* for simple format i now decode to utf-8, when appropriate. on viewing many docs with windows netscape 4 it works fine. i dont have the X fonts to do half of the languages under my own X, but hopefully those in the various language blocks can figure out fonts for themselves?
* complex format non-western-european docs might still be shagged; id love to hear from an asian language group as to whether or not the utf8 works for them.
* some bug fixes by Pavel Machek <pavel@Elf.ucw.cz>

changes up to 0.0.20
* headers are fairly correct now. the spec and i are confused as to headers and footers though, so while i *can* do headers and footers, it might require a bit of fine tuning. so i need docs with all sorts of header and footer types in them until im sure im right, but its close enough.
* docs with subdocs in them should return the output of the main doc now.
* to do: from the veritable deluge of documents in languages i cant read :-), id better handle the non-standard (well, non-standard to me anyway!) russian and one or two others that i hope fall out in the process; asian would be wonderful.

changes up to 0.0.19
* header support added to complex format.
* wingding font hack added like symbol font.
* headers are still not right; footers and headers are all appearing at the top of the document. ive more work to do on that next.
* ive shagged up the parsing of lls output, so docs with ole inside ole will not work even though theres no good reason they shouldnt; bear with me on this.
* mswordview.wrapper added to allow inline viewing of word docs.

changes up to 0.0.18
* new option to not change msword headings to html headings, to support those dodgy people who dont use them correctly.
* fixed what looks like a specialized case for recognizing tables.
* fixed the lack of - sign.
* have a new group of files that convert correctly.
* these are minor changes, ill add header handling to complex format tomorrow.

changes up to 0.0.17
* lack of getopt.h on some systems taken into account now.
* sub and super scripting now in for simple format.
* laola.pl changed to continue even if it thinks the file is the wrong length.
* added option to not attempt to dummy up formatting done with whitespace.
* using gifs for symbols; this will do for html output, for other output in the future we'll have to organize something a little more sophisticated.
* i have some alpha support for headers in at the moment; if you have headers you "might" see them in russet text.

Changes up to 0.5.44
* Fix summary information bug.

ALL TODO
* extend escher to read the wierd.doc; extract background image for html; find the text id and return it so as to find the break table for it; fiddle wvDecodeSimple to be able to handle arbitrary subdocuments; stick each textid's range of text into a subdocument handler which can be put in an html layer; extend to handle headers and footers, and feed that to abiword.
* more graphic checks, more escher records etc etc.
* promote old graphics into escher format.
* decompress wmf etc.
* get libwmf back into action and get picts together as well.
* wvSetCharHandler(proc,NATIVE|UNICODE)
* attempt to remove completely empty para's with the IsEmpty code.
* Got to think of some way to keep the list stuff in sync with the margins of the enclosed paragraphs in html output.
* need to add a color Auto thing that figures out what color goes against what fore or background.
* put the default text and char stuff into one single "style", and then try and figure out a way to save an arbitrary number of styles into some nice structure for later searching and stylesheet implementation.
* style overrides for each word style, example idea in wvHtml.xml.
* comment/footnote and endnote support.
* fully implement tablelooks, the flags in particular; and maybe the colors for the bg should be farmed out to separate config options rather than piggybacking on the other colors. also, why did one of the text foregrounds not work on its own, while the others did? and is there one case or two where our grid will not handle every case?
* add another option for table width, i.e. tableabswidth.
* we need character formatting for the

itself, we can do this I suppose, but i'll hold off on that for a while.
* fontface and size for char runs.
* what about a collision between underline and revision added? and also strike and revision deleted? what if a mad user created his own collisions? In the future there will be problems with links being broken by references; this is a similar problem.
* install the cygwin thingy at home and test the configure mechanism for searching for and installing strcasecmp/rint.
* i figured out what the story is with ole embedded files. ill have to modify my ole code so as to be able in the future to parse embedded docs and splice them together, which could be a wee bit of a challenge.
* modify configure script so its possible to link against a different expat lib, or to disable it or something.
* test that continuous sections, endnotes at the end of a section, and other things like that do what i think they should do.
* placement of footnotes: what does "treat like endnotes" really mean?
* make sure captions are alright, especially formatting.
* bits for anld are wrong.
* bookmarks embedded in html tags break them; constructs such as e.g stuf f are being output even though thats well wrong in html.
* convert the cross-referenced "above/below" into hyperlinked above and below.
* optional support for specifying special fonts. not recommended for use on publishing for internet sites, but useful for internal use for those of you who have done the funky chicken dance with unix netscape to work with ms wingding etc fonts, or are using ie/netscape on windows.
* all the fields, document background colour.
* gnome canvas wysiwyg viewer, output to ps from this.
* use incremental zlib functions to do decompressing rather than use mmap; someone who doesnt have mmap on their system can send me a patch for this one ;-)
* doesnt compile under neXt & needs to use gcc for hpux 10.20?
* put code thats in both simple and complex together.
* do an autoconf check for mman.h and dont do compression if not there.
* maybe someday we should use #pragma pack(1) if we are being compiled with gcc on a little endian platform. That might gain a speed up?

Changes up to 0.5.43
* an improved wvLaTeX.xml by mv@liisa.pp.fi
* added some of the older 0x08 word 6 stuff to it.
* marvellous set of patches from brian.ewins@bt.com to do a load of speedups, including some chpx and papx page caching, and some replacement of unneeded byte by byte reads and element by element copies. Plus a very spiffy token table lookup scheme which speeds things up a lot.
* some fixes to parse the old word graphic file format. I cannot use very much of it, but at least I don't crash on it anymore.
* added --dir option to wvHtml so that pictures can be placed in a separate directory.
* removed some more unnecessary element by element copies.
* found the lengths for word7 sprms 111, 112 and 113, but i dont know what they do; nonetheless they are now defused and made safe.
* make configure.in test for memcpy as well and use bcopy if not.
* define ssize_t in config.h if unistd.h is not available.
* mem leaks removed.
* made expat use the byteordering results for faster working; will default correctly to nothing if this cannot be determined due to cross compiling.
* implemented TIME field.
* remove unnecessary expat subdirs.
* title done in the correct charset.
* implemented HYPERLINK field.
* implemented PAGEREF field.
* added bookmarks to wvParseStruct.
* wmf files are decompressed and extracted correctly.
* unicode in fields is supported ok now again.
* field thing split into two parts, the command part and the "argument", or last outputted text from the field part. this allows hyperlinked fields to do the right thing, and unrecognized fields to output their original default contents.
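The split between a field's command part and its last output (and the nesting support noted under 0.5.42 below) follows the Word 97 layout, where a field is marked by 0x13 (begin), 0x14 (separator between instructions and result) and 0x15 (end). A sketch in C that keeps only field results, working on 8-bit characters for simplicity; strip_field_codes and the nesting limit are hypothetical, not wv's implementation:

```c
#include <assert.h>
#include <stddef.h>

#define FIELD_BEGIN 0x13   /* field begin mark */
#define FIELD_SEP   0x14   /* separates instructions from the result */
#define FIELD_END   0x15   /* field end mark */
#define MAX_NEST    32     /* arbitrary nesting limit for this sketch */

/* Copy n bytes from in to out, dropping field marks and field instructions
 * but keeping field results. A byte is kept only while no enclosing field
 * is still in its instruction part, so fields nested in instructions vanish
 * and fields nested in results show their own results.
 * Returns the number of bytes written; out needs room for n bytes. */
size_t strip_field_codes(const unsigned char *in, size_t n, unsigned char *out)
{
    int in_result[MAX_NEST];  /* per open field: 0 = instructions, 1 = result */
    int depth = 0;            /* number of currently open fields */
    int hidden = 0;           /* open fields still in their instructions */
    size_t i, len = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c == FIELD_BEGIN) {
            assert(depth < MAX_NEST);
            in_result[depth++] = 0;
            hidden++;
        } else if (c == FIELD_SEP) {
            if (depth > 0 && !in_result[depth - 1]) {
                in_result[depth - 1] = 1;
                hidden--;
            }
        } else if (c == FIELD_END) {
            if (depth > 0) {
                depth--;
                if (!in_result[depth])
                    hidden--;     /* field closed without a result part */
            }
        } else if (hidden == 0) {
            out[len++] = c;
        }
    }
    return len;
}
```

For example, `a` followed by a PAGE field whose last result was `7`, then `b`, comes out as "a7b".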
* changed pap defaults to include correct widow/orphan defaults.
* fixed sprm handling of tab stops.
* added and to xml config, handy for html and necessary for latex I'm told.
* added a lastcell entry which can be used to handle the last cell in each row separately; put lastcell.begin *before* cell.begin and lastcell.end *before* cell.end to use it. It is defined the same way as cell.
* moved the percent sign back into the config file, so that wvLaTeX.xml can use \textpercent itself.
* made my own changes to table handling in wvLaTeX.xml from discussion with David C Sterratt.
* added cellrelpagewidth to tokens, which appears to work a charm with the latex conversion; cheers to David C Sterratt for pointing me on the way.
* added dop to wvParseStruct so abi can get the default tab distance.
* remodeled the html entity lookup, and added the basics of a latex entity lookup.
* removed the now unnecessary codepage 1252 html entity lookup.
* I had left wvGetTC out of sync with the new WORD8 version numbering.
* horrific table support munged in latex. Latex people should have a better suggestion for doing what I want with them; I *don't* want to hear the endless whining about form over content, I still need to support tables that have both vertically merged cells and horizontally merged cells together in one table.
* dop reader takes account of the version for word 6/7 compatibility; needs to be tested.
* allow latex or html or raw text conversion options from the config file; this is not the same as charset. The latex option is pretty empty at the moment, I will take submissions for it.
* wmf's are decompressed again and dumped out to disk, so right now you can manually use libwmf to convert them to gif.
* hdd should be hdr, I reckon really.
* began to make some changes which will allow subdocuments to be handled individually.
Changes to 0.5.42
* temporary bmp for older word6/7 documents and legacy structures appears to be working.
* sprmCPicLocation was one too short for word6/7, strange.
* picf modified to use the older word6/7 version as well.
* some modifications so that it can handle documents with incomplete bte tables. This is only in fullsave, because I doubt the logic behind whatever program is creating them! It's bloody insane, but I'm going to support it 'coz word can do it.
* only put in the paraborders if we need them; this makes the html output smaller and, more importantly, works around a netscape bug where, if the para indent to the right is x, the first line indent to the left is x, and there is a border (even if of type none!), the para is indented too far to the left. This is bug 1524.43 Rules.doc
* supported 0x01 graphics ala broken/001-TETMEI.doc; 0x01 graphics are making their way back in, and are looking better than the old code already.
* fields can be embedded in each other, so the field ignorer is now capable of realizing this.
* all 0x01 bitmap formats are looking good.
* some 0x08 bitmap formats are coming through correctly as well.
* bug in Huge handling.
Changes to 0.5.41
* attempting to support 8 bit russian cp1251 docs as well.
* there is an extra argument to the character handler: the lid of the character, i.e. the Language identifier.
* made some changes to the build so that it will build correctly outside the source tree.
* added a small iconv implementation which follows the same syntax as the ordinary iconv. We *must* be able to convert from windows codepages into unicode; the reverse direction doesn't matter at all. If the native iconv can do this then we use that; if the native iconv cannot, or does not exist, we use our own iconv, which can only handle a conversion from windows codepages into unicode.
* So currently we can always output in utf-8 from just about whatever input charset word hits us with.
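The direction that must always work, windows codepage to unicode to utf-8, is small enough to sketch for cp1252: a 32-entry table covers the only byte range where cp1252 differs from Latin-1, and utf-8 is fixed bit-packing (RFC 3629). This is not wv's iconv replacement; all names are hypothetical, and mapping the five undefined cp1252 bytes to U+FFFD is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Unicode values for cp1252 bytes 0x80..0x9F. The five positions the
   codepage leaves undefined map to U+FFFD (an assumption of this sketch).
   Every other byte maps to the code point with the same value. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
};

uint16_t cp1252_to_unicode(unsigned char c)
{
    if (c >= 0x80 && c <= 0x9F)
        return cp1252_high[c - 0x80];
    return c;
}

/* Encodes one BMP code point as UTF-8 into out (room for 3 bytes).
   Returns the number of bytes written, or 0 for a lone surrogate. */
size_t utf8_encode_bmp(uint16_t u, unsigned char *out)
{
    if (u < 0x80) {
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800) {
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    if (u >= 0xD800 && u <= 0xDFFF)
        return 0; /* surrogate halves must be paired before encoding */
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Converts n bytes of cp1252 text to UTF-8; out needs 3*n bytes.
   Returns the number of bytes written. */
size_t cp1252_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += utf8_encode_bmp(cp1252_to_unicode(in[i]), out + len);
    return len;
}
```

For example, the cp1252 bytes `'A' 0x80 0xE9` become `41 E2 82 AC C3 A9` (A, euro sign, e-acute).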
* removed the unnecessary symbolfont dir.
* made some more mods so that we convert into 16bit unicode from all the codepages; we also must convert from 16bit unicode into all the current outputs, such as tis and koi and iso-5589 and also utf-8.
* I have had the wrong name for my own charset all along :-), a bit dyslexic of me: iso-8859-15, NOT iso-5589-15!
* changed the charset all the way through the system to a string so that we can use everything that works with a system's iconv.
* removed an unnecessary parameter to wvOutputText.
* hooked up all the output system through output, i.e. the title gets printed the same way as the body text.
* changes to Makefiles to make it build outside of its own dir.
Changes to 0.5.40
* took a patch from Mitch Davis to change PAGESIZE to WV_PAGESIZE, as this define already exists under HPUX (oops), and to change -I./ to -I. which supposedly makes a difference.
* output title in the same output charset as the rest of the document.
* inserted a hack to force lists to end before , rather than after the
* made a fix to setting the chp istd correctly after an initialization.
* the style 10 (Normal) is generated first if possible, as other styles (illegally I think) depend on it in the style generation code.
* tables and lists were interacting badly with each other to create invalid html and incorrect numbering; fixed this.
* doubled up the alignment tag with div align as well as the style assignment, as netscape is having problems with short paragraph alignment.
* made some changes so that the first list start no is always 1 rather than programmer 0 :-).
* add a
as a section break to wvHtml.xml, sometimes a heading starts after a section break, but because of no

it ends up in a bad position.
* hacked in some sanity checks to swap between unicode and 8bit in the stylesheet names; some mac docs are using 8bit names in word8 files.
* hacked in a mechanism to fake a section the size of the document if there are no sections in the section listing, as there always are except in some strange mac word8 docs that I received.
* an attempt to make nfc's more like liststartnos so that sublists that start > 1 levels below the last list entry have the correct nfc code.
* forced a paraend in html mode to close off any open lists.
* I wasted a *lot* of time getting multilevel lists to do exactly the right thing, and to get them html compliant. I now submit that the problem is really actually quite a toughy without scanning the entire list before printing it (which I do do with tables). The interpretation of html lists doesn't help the matter; it's *close* to what I want but just far enough away to be useless, i.e. This

test
1. test
test

gives 1 test 1 test 2 test and this gives

1. test
test

1 1 test 2 test I reckon it should be 1 test 2 test What we are currently using is the incorrect

test

test

Which gives 1 test 1 test, which is not optimal but the best we can do without scanning through the entire list before printing a single entry: attempting to see if a list entry will ever be used, and if not then bumping up the start value by 1. No one will notice the incorrect values for the most part. I may at a later date sidestep the issue by allowing the list entries to be output as ordinary text and be damned with html list limitations.
* It became necessary to duplicate the paraending code for the end of a piece in the simple mode as well as complex. The simple code is now almost exactly the same as the complex, ah well.
* I believe I have correctly worked out how to determine when word 6 and 7 files use unicode characters.
Changes to 0.5.39
* made a new wvHtml conversion page, looks nice to me, with an online bug listing; it's hardly a bugzilla but it serves my needs better.
* added placeholder.png and wvOnline.xml to cvs, neither of which are of any real importance except for the interim.
* added variable, handy for the online converter.
* added three sprms of (now) known length and unknown purpose to the word7 sprm list.
* NONE of the word documents that I have (4747 of them, 556Megs) now crash with the current version. This is not to say that there are no serious crashable bugs, or that the output is sane, just that it is now quite reliable.
* versioning enum extended and renumbered to handle all word formats in the future; hardcoded 0 and 1 changed to WORD8 and WORD6.
* finally hacked in preliminary stylesheet code to get the dependencies in the correct order; it's a bit crufty (!), but it does the trick for now.
Changes to 0.5.38
* added the symbol mapping to unicode as best as I could; I made one or two mods from the proper unicode so as to get a few more to work with the current generation of web browsers. Very bad behaviour I know, and the sort of stuff that got the world into this mess, but at least you can recompile wv at a later date to fix it; replace the commented out bits of symbol.c to do it.
* added messages for conversion table requests for special fonts (the spawn of the devil as far as I am concerned).
* added a character property end and start at the beginning of a new paragraph; this is necessary in many cases, funny I never noticed it before.
* figured out some rules to handle placement of graphics; abandoned stylesheet placement as netscape is too much of a mess to be of any use there, and that's the target audience.
* the CHP code didn't work for word 7 and 8 sprms. Ironically this means that rather than falling through to the default case and being ignored, each chp sprm is now parsed, certainly leading to more crashes and bugs as we find the differences one by one between the word 8 character property sprms and those of previous versions.
* fixed sprmCSymbol for pre word 8; there might be problems with fonts not named "Symbol", like wingdings.
* due to serious oddities I have added a TABLEOVERRIDES option in wvHtml.xml which allows the margins before and after a paragraph, and the first line indent, to be turned off inside tables, as having them on creates a real mess in netscape. In the future, when this ability is supported by browsers, you can just remove tableoverrides and ta-da, all will work fine.
* fixed table row scanner bug.
* fix for last para scan in complex mode.
* made mods to table.c to allow cells within 3 units of each other to be considered the same.
* hmm, added a workaround for missing the beginning of a para in complex mode under certain conditions.
* some incredible hackery to differentiate between 16 and 8 bit character modes in word 95 and 6; real dodgy stuff, but it's working so far. Though it's certainly a point of failure in the future.
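The table.c tolerance can be read as "two cell boundaries match if they are at most 3 units apart". A minimal sketch under that reading; the changelog does not name the units, so twip (dxa) boundary positions are assumed, and both helper names are hypothetical.

```c
#include <stdlib.h>

/* Tolerance from the changelog: boundaries within 3 units match.
   Units assumed to be dxa (twips), as for Word cell boundaries. */
#define CELL_EDGE_SLOP 3

/* Returns nonzero if two cell boundary positions should be treated as
   the same column edge. */
int same_cell_edge(int a, int b)
{
    return abs(a - b) <= CELL_EDGE_SLOP;
}

/* Returns the index of the first edge in edges[0..n) that matches pos
   within the tolerance, or -1 if none does. */
int find_matching_edge(const int *edges, int n, int pos)
{
    for (int i = 0; i < n; i++)
        if (same_cell_edge(edges[i], pos))
            return i;
    return -1;
}
```

A fuzzy match like this lets rows whose boundaries drifted by a twip or two still line up into the same columns when computing colspan and rowspan.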
* fix for table colspan mistakes.
* modified sprmTTableBorder to work with the smaller word 6 BRC's; also fixed a bug where I thought the sprm was variable.
* had to fix sprmTTableBorder again, because it *is* variable under word 8 despite the docs to the contrary!!, gagh.
* aaaargh!, wvGetLFOLVL and that wvInvalidLFOLVL have struck again; this time I think I have it sorted out once and for all (but I bet not). This new layout fixes quite a number of crashes.
* incredibly hard to find overflow in a U8 in wvGetPAPX, silly me, must really pay more attention to these things; you tend to forget that U8s are a really small type. Left to my own devices I'd use int, but for this program I slavishly follow the types in the spec, and overlooked the workarounds that are obvious in the struct definitions for PAPX.
* fixed the rather ugly empty paragraph skipping code to only go to the next cell when a para level check is done. I'm having terrible problems with sprmDefTableShd: it always follows sprmDefTable, and there is something wrong with Shd; maybe it's me, maybe it's word. Either way I'm working around the problem.
* I had broken the word97 decryption; fixed again.
* cleaned things up to create a version enum and associated obvious names with the versions, so that it's easier to read and more extendable; encryption is marked in the version by the base version ORed with 0x8000.
* some mods to the old list conversion to the new list format; removes at least one crash and might solve others, possibly not a full solution.
* added html names for umlaut characters.
* found a sprm 0x6646 which appears to be like 0x6645 HugePAPX, where the papx is stored in the data stream; it only occurs for PAPs and only for FKP papx's. Nonetheless it has required the addition of a data file stream argument to many sprm-related functions, nearly always NULL except for fkp PAP papx's.
* sprmPHugePapx implemented; another nasty bug fixed because of this implementation.
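Marking encryption by ORing 0x8000 into the base version keeps one integer carrying both facts. A minimal sketch of that encoding; the enum names and their numeric values are hypothetical, only the 0x8000 flag comes from the entry above.

```c
/* Hypothetical base versions; the values are illustrative only. */
typedef enum {
    VER_WORD6 = 6,
    VER_WORD7 = 7,
    VER_WORD8 = 8
} base_version;

#define VER_ENCRYPTED 0x8000 /* ORed into the base version */

/* Builds a version value, setting the flag for encrypted documents. */
int make_version(base_version base, int encrypted)
{
    return encrypted ? ((int)base | VER_ENCRYPTED) : (int)base;
}

int is_encrypted(int version)
{
    return (version & VER_ENCRYPTED) != 0;
}

/* Strips the flag so code can switch on the plain format version. */
base_version base_of(int version)
{
    return (base_version)(version & ~VER_ENCRYPTED);
}
```

Code that only cares about the format calls `base_of` first, so adding the flag does not disturb existing version checks.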
Changes to 0.5.37
* para indentation, first line indentation, top bottom left and right margins.
* border code started, mountain of tags included.
* border color added for paragraphs.
* we can handle individual sides of the border rather than just taking the top for all sides.
* supporting brcBetween required that we repeat the table style lookahead for brc's as well. This is very annoying, and seeing as netscape doesn't even allow margin support correctly I hate putting it in, as no one can use it; it makes me feel more complete to support it myself though, maybe mozilla will sort this out.
Changes to 0.5.36
* mem leaks plugged, word 6 and 7 section sprms added correctly.
* crushed a few off-by-ones, twos and threes :-), flattened a few more pesky buglets and leaks.
* purify now reports no problems of any kind with any of the examples and feature-examples.
* modified simple and complex saves to only do the *main* body text, so no comments and footnotes etc are put in when they shouldn't be.
* made some headway into understanding undocumented version information. There are 22 not-understood bytes per version.
* added wvSetSpecialCharHandler; special chars now have their own separate callback which feeds you the char and the associated CHP. This might require some more work.
* doh!, made a silly mistake with ending the doc at the ccpText limit.
* added wvGetGrpXst, which converts string groups to nice STTBF's.
* added wvGetBKL_PLCF.
* implemented the COMMENT BEGIN and END, and applied it to the "simple code"; the actual comment itself and so on is part of the subdocuments, which are not implemented yet, but will be soon.
* overlooked complex sep properties, added them in.
* added a dirty tag to the elehandler: 1 means that the property might be (more than likely it is) modified from the original style as indicated by the istd. This is implemented in simple and complex for PAP, CHP and SEP.
Changes to 0.5.35
* tables.
* wvToggle was still in todo; it was fixed a while ago.
* made decode_complex stsh into a ps->stsh; I obviously missed it before. This is the problem with having both decode_complex and decode_simple with unshared components, gagh!
* complex mode tables might even work now.
* table relative widths now in as well, as a percentage of screen. This uses the sep, so sep has been put into the expand_data struct, which needs to be cleaned up; I propose putting chp, pap and sep into wvParseStruct rather than props, and making expand_data have a pointer to wvParseStruct.
* fixed section code begin and end crash.
* complex mode colspan and rowspan, and various support functions.
* wvHtml can find the config file on its own, and has a command line option (-x) to find it.
* some fixes to wvConvert, which I haven't looked at in a while, so as to get it to work, and to include the password and other changes that made their way into wvHtml.
Changes to 0.5.34
* added cellwidth percentage as well to wvHtml.xml to make the cells the same ratio width in html as in the original word.
* changed Dk Colors to Dark Colors in wvHtml.xml to get the right colors.
* tested rowspan lots and lots on examples.
* tested colspan.
* wow my god!, there can be either no tc's in sprmTDefTable, word 6 ones, or word 8 ones; you have to work it out from the length of the parameter.
* small doc addition by Karl F. Larsen.
* tweaked wvCharBegin to ignore empty rowend paragraphs; squeaks us past the html validation service :-)
* basic tablelooks implemented, and basic background color for cells.
* changed a few more colours in wvHtml.xml to get ones that work in netscape; must change them all to #?????? values.
* removed signal and wait stuff from configure.
* added searching for wvHtml.xml; I also install it, so you can wrap this in an rpm and it will work fine for the average user.
Changes to 0.5.33
* tested row and col span with fullsave and fixed many many bugs; sprmTDefTable is not as simple as it looks.
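The "same ratio width" idea comes down to dividing each cell's width by the row's width. A minimal sketch, assuming the row is described by itc_mac+1 boundary positions in the style of a TAP's rgdxaCenter array; `cell_width_percent` is a hypothetical helper, and the page-relative variant in the table-widths entry would divide by a width taken from the SEP instead.

```c
/* Returns cell i's width as a whole-number percentage of the full row
   width, rounded to nearest. dxa_center holds itc_mac+1 boundary
   positions (left edge of each cell, then the right edge of the last).
   Returns 0 for an out-of-range cell or a degenerate row. */
int cell_width_percent(const short *dxa_center, int itc_mac, int i)
{
    if (itc_mac <= 0 || i < 0 || i >= itc_mac)
        return 0;
    long row = (long)dxa_center[itc_mac] - dxa_center[0];
    if (row <= 0)
        return 0;
    long cell = (long)dxa_center[i + 1] - dxa_center[i];
    return (int)((cell * 100 + row / 2) / row);
}
```

For boundaries `{0, 1440, 2880, 5760}` the three cells come out as 25, 25 and 50 percent, which would then be emitted as the html cell width.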
Changes to 0.5.32
* multilevel word6/95 lists appear to work fine; needs verification.
* use new cellwidth thing in wvHtml, wvConvert and wvConfig.
* colspan probably works in general at least.
* AHA!!!!, sprmTDefTable contains some TC structs, *but* only 10 bytes are allocated for each one. A word 8 TC is 20 bytes long and a word 6 TC is 10 bytes long, so we can point out another location where word 8 is disconnected from its own spec completely, gagh!!!!
* completed rowspan support. Now wvParseStruct and expand_data are exceedingly messy, and there's a stack of statics in wvConfig; it might be a good idea to move them all into one location and be done with this nonsense of stuffing pointers from the parsestruct into the expand_data struct.
Changes to 0.5.31
* wvSummary bug fix.
* word 95 decrypting from the password added as well!, there's no stopping me some days :-). Though I have to verify that, as it's a bit messy and some bits of it might be unnecessary, and I have no idea how non-English languages might affect it. And maybe it's based on the peculiarities of one particular word95 program that I have. Also it would be reasonably easy to make a password cracker for word95 instead of requiring a password to be supplied.
Changes to 0.5.30
* removed crypt dir and references to it.
* removed crypt from configure script.
* made a check so as not to close a NULL FILE * in decrypt.c.
* modified decrypt.c to be big-endian safe. In this vein, and in an attempt to make it more readable, I have used the standard md5 code snarfed from the rfc instead of the original md5 code; it's all the same in function, just endian safe.
* modified the SetPassword and password string promotion to be a utf-8 to unicode conversion. This of course will only work if the input, like an xterm, supports utf-8; in any other case it's exactly the same as an ascii to unicode conversion, so it's the same as ever, except I feel a lot better at least in theory supporting the full unicode suite.
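Endian safety of this kind usually comes down to assembling multi-byte values from individual bytes instead of casting a buffer pointer to an integer type. The sketch mirrors what the RFC 1321 reference code does in its Decode routine, where MD5 reads its 64-byte input blocks as little-endian 32-bit words; the function names here are hypothetical, not decrypt.c's.

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value from a byte buffer without
   depending on the host's byte order or alignment. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Fills words[0..16) from a 64-byte block, the way an MD5 transform
   interprets its input. */
void decode_md5_block(const unsigned char block[64], uint32_t words[16])
{
    for (int i = 0; i < 16; i++)
        words[i] = read_le32(block + 4 * i);
}
```

Because only shifts and ORs are used, the same source gives the same words on big- and little-endian machines.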
Changes to 0.5.29
* NUMBER 1 CHANGE: we now have the ability to decrypt word 97 documents, yippee!!
* more koi8.c changes from Sergey V. Udaltsov.
* removed all the lex rubbish, and took mswordview itself out of the default Makefile, dammit.
* some changes to semisupport word 95/6 lists; it does appear that word 95 lists are exactly the same as word 6 lists.
* word 6 and 95 lists were different, and there are supposedly cases where word 97 can use word 6 style lists, though it's supposedly unlikely.
* We have a problem with word6/95 lists: while we have the information about each list entry, I cannot figure out how to tell if one particular entry belongs to a particular list, i.e. I can quite happily pump out lists where every entry is a separate list consisting of a single entry, which is very annoying. As a temporary measure I have done a checksum on the list information, and if the checksum is the same as another entry's, I assume that it is a member of the same list. It works so far on very very simple lists, and I imagine that it will explode when I investigate more complex word6/95 lists.
* now lists... lists come with number information, with character formatting which applies to the number text itself, and with paragraph information that applies to the paragraph that is the list entry itself. Every list entry is a paragraph. So if we are not interested in the character properties of the number text itself we can quite happily convert the list into html with numbering correct and so on. If we want the char formatting of the number text we have to lose the html correctness of list handling. The other final case is those weird windows symbols that might be used; we cannot do them in correct html, so they must either use the three symbols available to us, or just become bullets.
We can apply the para stuff to the actual paragraph, and some checking shows that a div is a valid element to put in a list, so that's what I have done.
* with the word6 list problem, I have been unable in word95 to create a list underneath another list with the exact same formatting without putting a space between; I have also been unable to create a list that continues from another list. In short, I cannot create a list that can break the admittedly insanely hacked mechanism I have devised to leverage word6 lists into the word97 model used internally by the wv library.
* some mods to make multilevel word6/95 lists work correctly; completely mad stuff entirely, dragons be here and so on.
* minor change to summary.c to allow slightly dodgy but ok docs through the system; happens with msword version 6.0.1 (a mac version?)
* explicit ul end ala ol end, if the para is the last para of the doc.
Changes to 0.5.28
* added sprmPNLvlAnm into sprm.c for compatibility with word6/95.
* sorted out the case where there are two lists under each other at the same level but of different types.
* Now the list code has become very tied down to being html output; I have been keeping things reasonably flexible with the config file, ah well, it's not a serious problem at all.
* well now, interesting: supported-list-features.doc is now a very bloody awkward set of lists, and it's encouraging to note that word97 makes a real mess out of it. While an argument can be made that there should not be a separate para for each

element, compare the word97 output against the wvHtml output: word97 restarts each of the lists from scratch, hur hur.
* removed lex dependencies from the Makefile, and split some of the older stuff into temporary old* files, which will all be removed one by one.
* make does not make mswordview by default; time to wean everyone off that one.
* mswordview itself probably doesn't work anymore; use the stable version if you want this program.
Changes to 0.5.27
* expanded the list info in wvParseStruct to include all of the structures.
* made the stylesheet code safe, but it's a fix until I do the out-of-sequence istd initializing correctly.
* removed blank line from expat Makefile, Keith Wear.
* get list info extracted, make ul vs ol decision, get entry begin.
* continued with lists; maybe change the struct to include chp and pap simultaneously, as I might need it for the lists. Extract the start value for html, and the number nfc to use as well; for the case of symbols (nfc tells me, I think?) swap to ul rather than ol, thus we need a reciprocal mechanism in the config file.
* lists look good, releasing to the world.
Changes to 0.5.26
* some checking showed that I had the wrong name for the koi encoding; koi8-r is the correct name, and I've changed it to that.
* wvHtml dumps graphics, and wvGraphicConvert is a standalone little app for hacking purposes to open up graphics to external hackers.
Changes to 0.5.25
* added date and author id to revisions, found a bug in DTTM. Added wvDTTMtoUnix to dttm.c.
* added animations to config file as blink (hur hur).
* I added (even though I have no idea what it is) DispFldRMark everywhere relevant.
* that basically completes everything in the chp that makes any kind of sense in html except for font face and size.
* well, seeing as the output passes the w3c validator test, the html output by default announces this fact.
* added charset option to wvHtml, documented in the new wvHtml.1 manpage.
* added koi-8 from Sergey V. Udaltsov and added a howto in notes/internationalization/Charsets-HOWTO.
* changed lists to be html 4 compliant.
Changes to 0.5.24
* righteo, I made some (hopefully) final changes to fast saved handling, and it looks a lot better now. Char attributes are correct, and the issue of para begins and ends being missing from paras that begin in a fastsave section appears to be cleared up. There are still spurious character runs being created in this location, but they appear within paragraph blocks, not outside them, and they have no contents, so they only create redundant tags in the html output, or in the case of the lib make it more inefficient. So it's not 100%, but it's close enough that it'll make absolutely no difference in the case of an abiword-like app, and only someone looking at the source of the html output will make rude noises about how crap and inefficient wv is because it outputs empty tags. So the bottom line is that it is a known misfeature that, for fastsaved files, empty char attributes are duplicated in a small limited number of cases. If you really dislike this, then set options in msword to only create fullsaved files, which you should do anyway, because that's the major reason your word documents are so huge if you ever wondered about that, and it's also a huge security hole: e.g. if you edited a confidential document to remove the confidential bits, you can still open the doc with a hex editor and read all the deleted confidential material! At some stage I believe I might add a feature to show the original document that a fastsaved document was based on; it can sometimes scare you to death.
* my resetting of char properties at a new para was slightly out; I wasn't fully regenerating the exception run limits.
Changes to 0.5.23
* added RMarkDel & strike & outline to wvHtml support; empty tags are handled correctly now as well.
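A converter like wvDTTMtoUnix first has to unpack the packed DTTM date. A minimal sketch, assuming the bit layout given for DTTM in the Word 97 format documentation (low bits first: minutes 6, hour 5, day of month 5, month 4, years since 1900 9, weekday 3); `dttm_to_tm` is a hypothetical name, not wv's function.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Unpacks a 32-bit DTTM value into a struct tm, using the assumed
   layout described above. Seconds are not stored in a DTTM. */
void dttm_to_tm(uint32_t dttm, struct tm *out)
{
    memset(out, 0, sizeof *out);
    out->tm_min  = (int)(dttm & 0x3F);
    out->tm_hour = (int)((dttm >> 6) & 0x1F);
    out->tm_mday = (int)((dttm >> 11) & 0x1F);
    out->tm_mon  = (int)((dttm >> 16) & 0x0F) - 1; /* struct tm months are 0-11 */
    out->tm_year = (int)((dttm >> 20) & 0x1FF);    /* already years since 1900 */
    out->tm_wday = (int)((dttm >> 29) & 0x07);
    out->tm_isdst = -1;                            /* let mktime decide */
}
```

Passing the result to the standard `mktime` then yields a `time_t`, interpreted in local time.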
* added lowercase, shadow, vanish, rmark, caps, outline and smallcaps to wvHtml, though many are empty, and caps, smallcaps and lowercase need further code to actually do the deed.
* added includedir to the mkinstalldir list, 'coz of (Marko Rauhamaa).
* the toggle (cases 128 and 129 for fBold and loads of others) works by taking a look at the original style that the current one is based on. Until now it was not actually looking at the original one at all, but at the current one; that's fixed now.
* another one was that if we were based upon a char style we weren't getting initialized correctly at all; this too is fixed.
* changes have also been made to sprmCMajority and sprmCMajority50 along similar lines. These three or four changes together make a huge difference to the output, so this should clear up a *mountain* of mismatched output, I'm so proud. The best way to track down these differences is to take a fastsaved file, save it as fullsave, and compare the wv output for the two.
* colour in html output.
* hmm, real real stupid thing in fastsaved mode where I was completely fecking up the fcLim by changing it in a subfunc and then thinking that it was the original and using it as that again!
Changes to 0.5.22
* new development release.
Changes to 0.5.21
* fix for bad sprm handlers so font changes now occur.
* fix for having no summary stream in wvSummary.
* added protection support for istd out of sequence; we should in the future handle them correctly.
* added simple word95 file support: it gets all text correctly and at least pretends to get the paragraph properties, but needs much much checking. I treated them exactly the same as word6 and that appears to work reasonably ok.
* I have added a sample import filter for abiword in the abi dir; basically it's up to the abi folk to integrate that at their leisure.
* added contents to sep.c, anlv.c & olst.c.
* fixed the length of sprmTDefTable; solves some word6 crashes.
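The toggle fix amounts to resolving operands 128 and 129 against the base style's value rather than the current run's. A minimal sketch, assuming the Word 97 toggle semantics (0 off, 1 on, 128 same as the style, 129 the opposite of the style); `apply_toggle` is a hypothetical helper, not wv's wvToggle.

```c
/* Applies a toggle sprm operand to a boolean character property such
   as fBold. style_value is the property's value in the style the run is
   based on; current is the value before this sprm. Assumed operands:
   0 = off, 1 = on, 128 = same as style, 129 = opposite of style. */
int apply_toggle(unsigned char operand, int current, int style_value)
{
    switch (operand) {
    case 0:
        return 0;
    case 1:
        return 1;
    case 128:
        return style_value != 0;
    case 129:
        return style_value == 0;
    default:
        return current; /* unknown operand: leave the property alone */
    }
}
```

Passing the current CHP's value as `style_value` here reproduces the old bug: 129 on an already-bold run based on a non-bold style would wrongly switch bold off.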
* finally noticed that the BRC has a different length and layout in word 6.
* note to self: EatSprm only works for true word97 features; ones that differ in word6 and 95 have to be implemented or things will crash. This is not a real problem, as all these sprms should be implemented one by one.
* found two TAP sprms that differ from 6 to 8 and have updated them.
* implemented sprmCLid, which doesn't exist in word97 but does in older versions.
* added ole code to viewer.
* the program named mswordview is deprecated; it still does far more than wvHtml, but this is a warning that wvHtml is the new html converter for msword docs. wvConvert is a generic converter that currently defaults to abiword xml so that I can examine a richer set of properties; I wonder how generic I have actually made it. A tex converter would be nice, wouldn't it.
* wvHtml now uses html output, so < & > will work now; I had overlooked that aspect (whoops), as my focus was on other types of properties. wvHtmlOutputChar might need more work, keep an eye on it.
* stuck a stack of structs that I haven't used yet into the header files, and some implementations of readers that I might need someday :-)
* added char properties (Justin did all of this one, and good stuff it is too).
* merged together two versions.
* finished SEP and friends; added a mountain of structs, the remainder of what was not already in the header file, and added some stub files for them all.
* added simple file support for Section begins and ends; moved the char handling code around a slight bit so as to be in a nice looking order to me.
* continued sections in complex mode; brought my standalone abiword converter up to speed with sections.
* implemented all of the SEP sprms; word 6 conflicts not fully checked yet.
* Jeff@abisource made it more portable by modifying the wvError/wvTrace macros and putting in defs for rint and strcasecmp.
* purified sep code.
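Escaping for html output only has to catch the characters that are markup-significant in text content. A minimal sketch; wvHtmlOutputChar is wv's own function and its real signature is not shown here, so `html_entity_for` and `html_output_char` are hypothetical names.

```c
#include <stdio.h>

/* Returns the html entity for a markup-significant character, or NULL
   if c can be written out unchanged. */
const char *html_entity_for(int c)
{
    switch (c) {
    case '<':
        return "&lt;";
    case '>':
        return "&gt;";
    case '&':
        return "&amp;";
    default:
        return NULL;
    }
}

/* Writes one character of document text to out, escaped for html. */
void html_output_char(FILE *out, int c)
{
    const char *entity = html_entity_for(c);
    if (entity)
        fputs(entity, out);
    else
        fputc(c, out);
}
```

Splitting the lookup from the writing keeps the escaping rule testable and lets a charset-aware writer replace `fputc` later.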
* fixed fastsaved chp init from pap istd (I think).
* fixed finding first para bounds in complex mode if the first para is a new fast saved chunk (I think).
* ffn sttbf was wrong for word95 & word6; it is fixed now.
* Squashed one of the bugs that was causing one of my annoying problems with complex files and incorrect para fcLims. This one was driving me completely mad; I don't know if I have fixed it fully correctly, but I think so.
* changed laolareplace.old.c to put the isprint test at the end.
* added bold and italic char prop handling to simple mode wvHtml.
* added bold and italic char prop handling to complex mode wvHtml.
Changes to 0.5.20
* the checking for the end of a piece was all wrong: I was looking at the beginning of the next piece for that information, which, while always correct, failed horribly in the case of the last piece.
* fixed some more bugs.
* fixed wvConvertCPToFC ala end of piece.
* fixed text *after* the final para in simple mode, related to the above.
* fixed oversight in len of UPX stuff in stylesheet.
* fixed some style eating problems.
* cleaned up some bits and pieces with pointers and styles.
* added strcasecmp check and inclusion route.
* more bugfixes throughout chp and friends.
* added a simple fib6 reader that reads into a fib8 struct.
* word 6 doesn't appear to have a sep table stream, so we'll have to look closely at that sort of thing.
* modified STSHI handler to allow word6, modified STD to allow word6.
* put in a word6 to word8 sprm converter, which might even work; we won't know for quite a while. Implemented for pap and chp.
* reran purify, reworked the binary tree code section for that really complex chp sprm.
* made the complex pap search start with the current piece rather than the next one. Seems to be the right approach.
* fixed a small offset problem in word 6 sprm translations.
* clx can now load a word 6 complex piecetable (in theory anyway).
* identify word 7 files.
* word 6 thing appears shafted.
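A CP-to-FC conversion walks the piece table to find the piece holding the character position, then offsets into that piece's text. A minimal sketch, assuming the Word 97 piece descriptor convention that bit 30 of the fc marks 8-bit "compressed" text stored at (fc without that bit) / 2, with 16-bit text otherwise; the `piece_table` type and `cp_to_fc` are hypothetical, not wv's structures. The final-boundary case is the "end of piece" situation the entries above describe.

```c
#include <stdint.h>

/* Minimal piece table: cp[0..n] are piece boundaries, fc[0..n) the raw
   fc field of each piece descriptor. */
typedef struct {
    const uint32_t *cp;
    const uint32_t *fc;
    int n;
} piece_table;

#define FC_COMPRESSED 0x40000000u /* bit 30: 8-bit text (assumed layout) */

/* Returns the file offset of character position cp, or -1 if cp lies
   outside the table. cp equal to the last boundary maps to the end of
   the last piece instead of looking for a following piece. */
long cp_to_fc(const piece_table *pt, uint32_t cp)
{
    for (int i = 0; i < pt->n; i++) {
        int in_piece = cp >= pt->cp[i] && cp < pt->cp[i + 1];
        int at_end = (i == pt->n - 1) && cp == pt->cp[i + 1];
        if (in_piece || at_end) {
            uint32_t raw = pt->fc[i];
            uint32_t delta = cp - pt->cp[i];
            if (raw & FC_COMPRESSED)
                return (long)((raw & ~FC_COMPRESSED) / 2 + delta);
            return (long)(raw + 2 * delta); /* 2 bytes per character */
        }
    }
    return -1;
}
```

With boundaries `{0, 10, 20}`, a compressed first piece at fc 2048 and a 16-bit second piece at 4096, CP 5 maps to 1029 and CP 20 (the very end) to 4116.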
* prm complex option was the wrong way around!
* fixed all bugs that cause crashes on doc collection.
* word 6 had to have a separate BX and fkp and so on for itself, but now I believe fullsaved word6 files are as well supported as word97 files.
* can extract raw text of fastsaved word 6 files.
* and now we can get the para properties of word 6 fast saved files (I think).
* basically brought fastsaved up to fullsave quality, though I'm not 100% happy with them.
* some more purify-found problems.
* implemented chpx in stylesheet for word 6.
* did some nasty hackery to munge word 6 chp sprms into word 8 ones; appears to work.
Changes to 0.5.19
* renamed libwv, and stuck it in abiword cvs.
* this version probably doesn't work, and almost certainly doesn't do what it says on the tin; don't use this until I get to at least the next version, this is basically a cvs test.
* use ./configure --without-zlib --without-ttf --without-xpm --without-wmf --without-x, change gcc to g++ in the Makefile, and make a libwv.a suitable for abiword (yeah I know I know, but I'm working on it) to get a simple -lwv.
* whoopee, nearly working fine as an abiword filter.
* moved fib into the parsestruct, changed over existing programs to use wvInitParse rather than handcode it for each one.
* mad mods to make it compile cleanly under c++.
* changed over the simple decoding to use the parsestruct and propagated the changes throughout the system.
* right, use wvSetCharHandler to set what function will be called with each character of document text.
* found my word 5 spec, which is a bit of a relief, 'coz I don't think I could replace it if I lost it. Made a few copies of it; I need some good ocr software though, as I got it sent to me in scanned-in tiff files!, and the original docs were obviously a bit crumpled.
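The wvSetCharHandler idea is a plain callback registration: the decoder calls one user-supplied function per character of document text. A minimal sketch of that pattern; `parse_state`, `char_handler`, `set_char_handler`, `emit_text` and `count_chars` are hypothetical and are not wv's actual signatures.

```c
#include <stddef.h>

/* Callback type: receives the user pointer and one character; a nonzero
   return asks the decoder to stop. */
typedef int (*char_handler)(void *user, unsigned int ch);

/* Hypothetical slice of parser state holding the registered handler. */
typedef struct {
    char_handler on_char;
    void *user;
} parse_state;

void set_char_handler(parse_state *ps, char_handler fn, void *user)
{
    ps->on_char = fn;
    ps->user = user;
}

/* Feeds each character of text to the registered handler, if any.
   Returns the first nonzero handler result, or 0. */
int emit_text(parse_state *ps, const unsigned int *text, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ps->on_char) {
            int rc = ps->on_char(ps->user, text[i]);
            if (rc != 0)
                return rc;
        }
    }
    return 0;
}

/* Example handler: counts characters into an int passed as user data. */
static int count_chars(void *user, unsigned int ch)
{
    (void)ch;
    ++*(int *)user;
    return 0;
}
```

The `void *user` pointer is what lets an app like abiword route characters into its own document model without wv knowing anything about it.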
* we can now read the text of simple Word files in AbiWord.
* finalized paragraph element handling.
* made wvConvert and wvHtml use the new paragraph element handling.
* got the plugin to do the same.
* compiles fine with g++ as well, which is a bonus.
* created a hook into the charcode in wvOutputText for AbiWord and other lib users.
* created an AbiWord filter with what we have already; need the ability to register handlers for events and so on.
* got rid of most of the compile warnings.
* we can now do para props of complex files, though we have to confirm this as it's always a bit flaky (also in old mswordview btw).

Changes up to 0.5.18
* made a release to show off the devel version to the AbiWord folk.
* modified the XML code to unexpand < etc., so that I can defer processing of some of the tags until later. I'm probably making a complete arse of the whole thing, but at least it gives me something to do, and keeps me out of trouble, neh?
* created a variable expansion mechanism using the XML parser, seems OK.
* make wvHtml load up wvHtml.xml and confirm that document begin works completely fine, and that the title is being expanded.
* do end as well.
* attempt the paragraph stuff, and call wvHtml a basic wrap.
* so now we can output simple files in very basic HTML with paras noted correctly and the title supported; we can do the same for AbiWord with document begin/end and para begin/end.
* charset supported as well.
* variables (?!) are now &.
* right-aligned some #defines.
* finish adding version var, use Purify to find problems with adding entries to the TT table (debug only, I believe).
* modify justification so as to call wvExpand again to get the full string.
* create an AbiWord config; got document start and finish and paragraph start and finish working as well.
* we can now output good HTML and AbiWord format docs with basic paragraph alignment, yippee.
* converted most of the U8 name:s bitfields to U32 name:s (non-critical); I never knew that using anything less than an int for a bitfield was not technically correct, well what d'ya know. Some other minor stylistic changes.
* wrote a tiny stub of an AbiWord importer.
* modified OLEdecode to take a FILE * rather than a filename.
* standardized return codes from OLEdecode.
* added an error explanation table.

Changes up to 0.5.17
* added clx.c, pcd.c, prm.c.
* clx.c is the successor to piecetable.c.
* debugged clx.
* added GetPHE, fkp.c, bte.c, bx.c.
* debugged decode.c, all OK now.
* paragraph begin and end marks are now found for full-saved files.
* added codepage-1252.c, iso-5589-15.c & text.c. If you want to add your own font encoding conversion, do...
  1) add the language name to the charsets enum in wv.h
  2) create a function like wvConvert1252Toiso8859_15 which converts cp1252 into your language
  3) add to text.c in wvOutputFromCP1252 an extra case statement to call wvConvert1252To[YourEncoding] if outputtype == YourEncoding
  4) create a function like U16 wvConvertUnicodeToiso8859_15 which converts Unicode into your language
  5) add to text.c in wvOutputFromUnicode an extra case statement to call wvConvertUnicodeTo[YourEncoding] if outputtype == YourEncoding
  Be warned that converting from Unicode to your language, which is the most likely scenario, will only work out correctly if the Unicode actually maps to your charset; so obviously converting Unicode that was Japanese characters into Russian KOI8 is only going to give a page of ?, so watch out for that. Later on I'll add in some ability to check the language.
* added the wvSimpleCLX program, which determines if a file is complex (fast-save) or simple (full-save).
* basic character handling, converted Windows "compressed unicode" into HTML as far as possible.
* fixed size mistake in PCD PLCF.
* tested wvSimpleCLX on all Word docs, made a mod or two to the OLE code to avoid segfaults identified by the test.
* moved decode to decode_simple.
* added decode_complex.
* debugged the decode_complex para begin code, and extended it to find the para end, though this might be a little wrong, but we'll see.
* added the wvText program, primarily for testing the new mechanisms, but it can be a useful program in its own right to get the main document text from a Word document in its raw form. Obviously it's not going to handle tables or any kind of complex Word artifact, only the text in the correct order, which, considering the whole complex file format question, still makes it a very sophisticated little program.
* wvSummary bugfix.
* debugged wvText so that it doesn't crash on any of the 3735 sample files.
* added the ability to the text code to remove field codes, and just output the previous results of the fields.
* added some changes to the error output code: now use wvTrace to output debugging messages; it's a macro that will disappear when compiled normally, unlike the old sillier mechanism.
* changed the FKP code to pull in the total data.
* created wvAssembleSimplePAP.
* release the FKP on each cycle in decode_simple.
* fixed a few sprms from doc investigation that were wrong or dodgy in the spec.
* stupid bug in EatSprm.
* debugged wvAssembleSimplePAP and FKP code for crashes.
* fixed bugs in sprm.c and numrm.c, changed a few constants to the cb equivalents.
* applied the PAPX to the PAP correctly (simple mode, I haven't even tried complex yet).
* confirmed that the code does the right thing, and gets the right properties for the simple pap.
* reran checks.
* created a test with wvHtml to output some of the interesting paragraph properties in the correct place.
* added expat, the XML parser, to the tree. I'm going to use XML for my config file, which may or may not be a good idea, but seeing as my lex code created *such* problems on different implementations I'm well and truly sick of it, so I'm going to try XML instead.
* reran autoconf with the latest version.
* wvConfig changes...
  1) created a release for the config list table
  2) malloced correctly
  3) created an append for <title/>
  4) pass the userData into wvConfig.c
  5) convert main into an ordinary call
  6) moved wvText to wvConvert, and made wvText a link

Changes up to 0.5.16
* added anld.c, changed over from old ANLD to new ANLD. Added wvGetANLD and wvGetANLDFromBucket.
* cleaned up bad chp entries. allowedfont removed, may cause problems in the future.
* added some stylesheet definitions.
* trivially added version.c, and modified it to become wv rather than mswordview.
* added wvGetSTSHI, wvGetSTD, wvReleaseSTD, wvGetSTSH, wvReleaseSTSH.
* short tests show that the new stylesheet code appears stable.
* added dcs.c, shd.c, numrm.c, asumy.c.
* defined TAP, TLP, TC and PAP.
* added lspd.c, phe.c, tlp.c, tc.c, tap.c.
* added InitPAP, and all dependencies, for the istdNIL stylesheet.
* added ANLV, OLST, SEP.
* I've completed the new set of PAP sprm handlers and support. This consists of wvGetSprmFromU16, wvEatSprm, wvApplySprmFromBucket, and a myriad of wvApplysprm* functions, with the exception of one or two old sprms that have no documentation, and the hugesprm, which I've left until I get an example of it.
* added wvCopyCHP & wvAddCHPXFromBucket, and most of CHP in sprm handling.
* added wvApplysprmCMajority + wvApplysprmCMajority50, but I really don't like the look of them; I'm very unsure as to whether or not they are right.
* finished CHP in sprm code.
* confirmed correct para style basics, started into char style code.
* complex merged CHPX done; only found one trivial example so far, so uncertain as to whether it works.
* modified wvEatSprm to return the len.
* modified wvEatSprm to handle the three special len cases in it as well.
* got wvReleaseSTSH to release its grupe's and subcomponents as well.
* temporarily nailed the new stylesheet struct in as part of the old one, so that I can experiment with the new one in conjunction with the old one.
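The five-step font-encoding recipe in the 0.5.17 notes boils down to one converter per source direction (cp1252 and Unicode) plus a switch on the output charset. Below is a minimal, self-contained C sketch of that shape for ISO-8859-15. The enum, the function names and the use of ? as the fallback are assumptions made for illustration; they are not wv's actual declarations in wv.h or text.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for the charsets enum in wv.h (step 1). */
typedef enum {
    CHARSET_ISO8859_15
} charset_t;

/* Step 2: map one cp1252 byte to ISO-8859-15, or '?' if there is no slot. */
static uint16_t convert_1252_to_iso8859_15(uint8_t c)
{
    switch (c) {
    case 0x80: return 0xA4; /* euro sign */
    case 0x8A: return 0xA6; /* S with caron */
    case 0x9A: return 0xA8; /* s with caron */
    case 0x8E: return 0xB4; /* Z with caron */
    case 0x9E: return 0xB8; /* z with caron */
    case 0x8C: return 0xBC; /* OE ligature */
    case 0x9C: return 0xBD; /* oe ligature */
    case 0x9F: return 0xBE; /* Y with diaeresis */
    /* Latin-1 symbols whose slots ISO-8859-15 reassigned */
    case 0xA4: case 0xA6: case 0xA8: case 0xB4:
    case 0xB8: case 0xBC: case 0xBD: case 0xBE:
        return '?';
    default:
        /* other 0x80-0x9F bytes (smart quotes, dashes, ...) have no slot;
         * everything else is identical in both charsets */
        return (c >= 0x80 && c <= 0x9F) ? '?' : c;
    }
}

/* Step 4: map one UTF-16 code unit to ISO-8859-15, or '?'. */
static uint16_t convert_unicode_to_iso8859_15(uint16_t u)
{
    switch (u) {
    case 0x20AC: return 0xA4;
    case 0x0160: return 0xA6;
    case 0x0161: return 0xA8;
    case 0x017D: return 0xB4;
    case 0x017E: return 0xB8;
    case 0x0152: return 0xBC;
    case 0x0153: return 0xBD;
    case 0x0178: return 0xBE;
    case 0x00A4: case 0x00A6: case 0x00A8: case 0x00B4:
    case 0x00B8: case 0x00BC: case 0x00BD: case 0x00BE:
        return '?';
    default:
        /* the rest of U+0000-U+00FF maps straight through */
        return (u <= 0xFF) ? u : '?';
    }
}

/* Steps 3 and 5: one case per supported output charset. */
static uint16_t output_from_cp1252(uint8_t c, charset_t out)
{
    switch (out) {
    case CHARSET_ISO8859_15: return convert_1252_to_iso8859_15(c);
    }
    return '?';
}

static uint16_t output_from_unicode(uint16_t u, charset_t out)
{
    switch (out) {
    case CHARSET_ISO8859_15: return convert_unicode_to_iso8859_15(u);
    }
    return '?';
}
```

Adding another target would mean one more enum value, one more converter pair, and one more case in each dispatch function. Note that something like U+0416 (Cyrillic Zhe) falls through to ?, which is exactly the "page of ?" warning in the recipe.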
Changes up to 0.5.15
* made yet more changes to the configure script, maybe it'll all be in the right order now (hah, I doubt it!).
* added wvWideStrToMB, wvGetFontnameFromCode.
* added a small patch from Barry D Benowitz <b.benowitz@telesciences.com> who noted an uninitialized pointer.
* fixed a bug where a $ showing up in a title would shaft the whole thing.
* fixed the default value for the HTML font string, unlikely to have ever been noticed.
* a parser.lex and man page fix from garyjohn@spk.hp.com.
* removed references to the ffn struct, and replaced them with the appropriate FFN ones.
* added fld.c, wvGetFLD, wvGetFLD_PLCF, wvWarning, wvFree.
* added wvGetDOP, wvGetDTTM, wvCreateDTTM, wvGetCOPTS, wvGetDOPTYPOGRAPHY, wvGetDOGRID, wvGetASUMYI & dttm.c.
* modified dop.c with new interface.
* added wvGetSTTBF, wvGetBKF_PLCF, wvGetBKF, bkf.c, sttbf.c.
* added xst.c, fspa.c. Modified wvWhichTableStream, added wvGetFSPA, wvGetFSPA_PLCF, wvGetXst, wvFreeXst.
* corrected STTBF handling, and sorted out decode_bookmarks according to the new form.
* added lex problems to the install file/FAQ.
* added lfo.c, lst.c, lvl.c, wvGetLSTF, wvGetLSTF_PLCF, wvGetLVLF, wvGetLVL, wvReleaseLVL, wvGetLST, wvReleaseLST, wvGetLFO, wvGetLFO_PLF, wvGetLFOLVL, wvGetLFO_records & wvReleaseLFO_records, which are all to do with parsing lists, possibly the second most complex part of Word documents to understand (the first being fastsaved, of course).
* added wvSearchLST, began converting list code over to new cleaner "by the spec" code.
* wvGetListInfo will probably be the workhorse function which will sort out lists given a correct pap.
* added the slightly silly ordinal.c file along with nfc.c.
* changed references to mswordview.h to wv.h, to get the changeover moving.
* OK, I can currently get a lot of the simple list stuff correct the new way.
* most of the list string is now done, as is the nfc and starting position.
* added another entry to the list stuff to keep track of the current number for the list entry; this should work for at least simple lists.
* figured out how to correlate the appropriate lfolvl with the correct lfo.
* I now use the character and paragraph properties linked to the list text.
* the new list code is now integrated into the code, but it is still new and probably flaky. I'll do bug testing and so on and work that out in a short while.

Changes up to 0.5.14
* I had to make changes to the configure script to link -lXpm in the correct place.
* scream, I had to put back in part of the signal configure script. Bear with me; why does *everything* work on my machine but nowhere else :-)

Changes up to 0.5.13
* a mad person reports that it can be compiled under VMS!, I'm awaiting patches.
* changed doc version testing to follow the knowledge base article on the matter.
* removed duplicate fib code from mswordview.c.
* added wvGetEmpty_PLCF, wvGetFRD, wvGetFRD_PLCF.
* added wvGetFFN, wvGetFFN_STTBF, wvReleaseFFN_STTBF, wvGetFONTSIGNATURE & wvGetPANOSE.
* removed the reinstall handlers from the configure script; that should sort out the configure problems on some systems, IRIX being a case in point.

Changes up to 0.5.12
* patch from Cliff Miller <cbm@research.bell-labs.com> to fix TTF_CFLAGS in configure and Makefile.
* small bug with ending tables. Seeing as you can't place text tags like bold and italic between cell elements in HTML and expect them to do the right thing, you have to do a little dance where character properties are stopped and restarted for each table cell. I had forgotten to re-enable the ordinary non-table mechanism immediately after the end of the table.

Changes up to 0.5.11
* we now extract the document title and display it in the title field, using the default config.
* added bold and italic element handling; you can change these HTML tags to your heart's content now.
* I confirmed that $title works fine.
* I ported over Somar Software's summaryinfo stream stuff, so now wvSummary can print the title and last saved date of an OLE document according to the summaryinfo stream.
* added bit shifting to the awk script.
* added a warning for duplicate offsets in the script.
* I have a spiffy logo.
* added more stuff to the summary info thing; it might very well be complete. The previews in summary info are stored as a WMF file, so in conjunction with libwmf you can get all of this.
* added a wv-incconfig and wv-libconfig and installed the appropriate include and lib files, so as to start making the process of using mswordview as a lib more possible. This still needs quite a bit of work.
* allowed optional sections in element strings, use [] for them.
* worked font config into the main code.
* bw wanted and got...
  1) $title fix
  2) element support (bold & italic & font)
  3) --configfile switch
* fixed an amazingly stupid bug that crept in with the introduction of wvGetFIB.
* noticed that the new doc start code wasn't occurring in fastsaved files.
* aaaaaagh!!!, I had forgotten to munge the weird long offsets into their correct halved form; no wonder so much weirdness crept into fast-saved files. It's amazing how well it worked nonetheless. This should at the least make parsing fast-saved files with tables much shorter!

Changes up to 0.5.10
* added document headers and footers to the config file.
* added pixels per twip to the config file.
* allowed " as part of a string if escaped.
* added code to use the beginning and ending tags.
* allowed multiline strings in the config file.
* use the two twip values.

Changes up to 0.5.9
* I never reran autoconf!
* added a patch I got ages ago and forgot to add: DOS/Windows support for the .exe extension in the configure thing.
* added some deep magic to blip handling.
* added a check for WMF record sizes < 3 in libwmf.
* fixed BSE record to eat empty space, and resync.
* fixed Makefile.in in the oledecod dir.
* many Purify-related thingies found.
* removed the last bug to fix the last buggy file of the current run.

Changes up to 0.5.8
* blip code changed, the new one looks much better.
* would you believe that I was always one out when decoding styles? Great bulletproof code though :-), it kept on trucking and resynced itself with the data again for the most part. That bug must have been in there for months at this stage!
* new blip code now in operation, appears to do at least the old blip code's functionality for 0x08 blips. How did I get 0x01 blips?
* made the configure script get heroic when searching for components: it checks for includes and libs both below a --with-stuff dir, and also inside it as well.
* finished 0x01, checked offsets.
* had to add guessing code to figure out whether to use a delay_stream or not.
* allow resized images (well, let Netscape do it) for 0x01 graphics.
* tested WMFs with text with a read-only font dir, no problem there.

Changes up to 0.5.7
* fixed bug that causes crashes on tables.

Changes up to 0.5.6
* variable handling: added a subst function that substitutes real things for variables in the config file.
* updated my homepage, god I love the GIMP. All I have to do to change the graphics on my page is load a different set of text files into the Scheme interpreter in the GIMP and ta-da, out pop my new pages; in the bad old days I'd have been at it for days.
* have a mechanism to expand variables in place; the only recognized variable is patterndir, will have more later of course :-).
* some magic doohickeying to get the libz in /usr/lib to be tested before ending up with the possibly crap one that some systems stick in /usr/X11R6/lib.
* do a for loop to install the graphics now; should sort out some people's broken install scripts, gagh!
* cleaned up the config file code with Purify; all systems are go for the first public release with basic config file support.
* remembered to add TTF support to mswordview as well.
* added support for variables in the lex code.
* fixed the zlib configure script again.
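The 0.5.6 notes describe a subst function that expands config-file variables in place, with patterndir as the only variable at first and $title arriving by 0.5.11. A minimal C sketch of that idea follows, assuming a $name syntax where a name is a run of letters, digits and underscores. The struct var table and expand_vars are hypothetical names for illustration, not the original lex-based code; the 0.5.18 notes record a later switch of the variable marker to &.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable table entry, e.g. {"patterndir", "/some/dir"}. */
struct var {
    const char *name;
    const char *value;
};

/* Copy src into dst, replacing each known $name with its value.
 * Unknown names (and a bare $) are copied through unchanged.
 * Returns 0 on success, -1 if dst is too small (dst stays NUL-terminated). */
static int expand_vars(const char *src, char *dst, size_t dstlen,
                       const struct var *vars, size_t nvars)
{
    size_t out = 0;

    if (dstlen == 0)
        return -1;
    while (*src != '\0') {
        const char *piece = src; /* text to emit for this step */
        size_t piecelen = 1;     /* default: one literal character */
        size_t consumed = 1;     /* how much of src this step eats */

        if (*src == '$') {
            const char *end = src + 1;
            while (isalnum((unsigned char)*end) || *end == '_')
                end++;
            size_t namelen = (size_t)(end - (src + 1));
            for (size_t i = 0; i < nvars && namelen > 0; i++) {
                if (strlen(vars[i].name) == namelen &&
                    strncmp(vars[i].name, src + 1, namelen) == 0) {
                    piece = vars[i].value;
                    piecelen = strlen(piece);
                    consumed = namelen + 1; /* the $ plus the name */
                    break;
                }
            }
        }
        if (out + piecelen >= dstlen) { /* keep room for the NUL */
            dst[out] = '\0';
            return -1;
        }
        memcpy(dst + out, piece, piecelen);
        out += piecelen;
        src += consumed;
    }
    dst[out] = '\0';
    return 0;
}
```

Copying unknown $names through verbatim is one way to avoid the kind of breakage the 0.5.15 notes mention, where a stray $ in a title shafted the whole thing.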
Changes up to 0.5.5
* added in support for an external config file. The external file allows a start and end to a style to be user defined, i.e. h1 for the start of a heading 1 style. It's possible to disable or enable handling of bold, italic and font size/face changes inside a style. This is only started now, so it's far from finished. Please *don't* use this file for the moment, I'm working on it.
* this is an interim release to fix the configure script problem that I had, and to add to the documentation about the libwmf stuff.

Changes up to 0.5.4
* well now, I've been away for a while working on libwmf, which is now complete enough to use. Download it from http://www.csn.ul.ie/~caolan/docs/libwmf.html, install it, run mswordview's configure and compile, and ta-da, mswordview can now handle WMF files.
* added a fallback from a failure to find -lz to -lgz, a problem on SuSE Linux I'm told.
* found that old Red Hats appear to have a libz in the X lib dir that is old and crappy and doesn't link to my thing. Didn't put in a workaround, but mentioned it in the documentation.
* created a file with h1 to h9, verified that the lex code and so on works together fine with mswordview.

Changes up to 0.5.3
* began adding all fields to structures, and marking them implemented or not.
* strikethrough and underline for revision text.
* found the bounds of the comments in the main document; I put name tags on them, and place comment begin and end graphics around them. At this stage remember that the -a option to remove comments exists, as even one comment in a doc can make the whole thing pretty unreadable :-), but the support is in there if you need it.
* revisions are given underline for added text, and the strikethrough color for deleted text, the same as Word does it.
* begin and end for deleted and added revision text are shown with graphic tags; added a -r --norevisions option to ignore that stuff.
* names for revision text.
* put revision authors' names in yellow text.
* I don't even *pretend* that I'm outputting good HTML btw, just working HTML under Netscape. Once everything is working I might go back through and work out correctly the dependencies between all the HTML outputting code; that'll be part of the overall cleanup I'm doing to make this modular enough to be used with AbiWord as a Word 97 importer.
* time and date of the revisions are included as well.
* think that I've completed revision text, but I need more tests before I'll be sure.
* in comments there's always a pagenum field that Word itself doesn't show in comments, so I've stuck in code that disables this field if it's at the beginning of a comment. Also verified that comments work in fastsaved mode, though what is the story with that page number in annotations? Hmm, it's bothering me somewhat.
* titchy bug where I included the wrong end-of-comment graphic.
* put square brackets around comment links; I believe this completes comment support.
* titchy bug in the time field for revisions.
* properties of text that change during a revision are listed as well.
* found the location of what sets the footnote & endnote styles of numbering and other settings for endnotes and footnotes in the DOP; they were missing from the copy that www.wotsit.org has, and I've sent them the added section.
* extracted the DOP fully.
* footnotes and endnotes now get the correct formatting of the numbers, i.e. lettered, roman or arabic etc. Damn missing page of the spec, I was searching for that for ages.
* I have some old code that gives the correct starting point for endnotes and footnotes, so I'm leaving it in for now, but I can now use the DOP instead for this info.
* endnotes should now be put either at the end of the doc or at the end of the section, depending on what Word does; needs testing.

Changes up to 0.5.2
* implemented an auto text colour check for table cells, no more black on black, or black on blue.
  I must look closely at what other auto changes Word makes, and where else I might have to put that code.
* some uber-simple greyscaling code when the table look says no-color.
* verified it works under AIX, made a few changes that showed up due to its stricter malloc; there are probably a few more malloc-related issues hiding in there.
* column breaks show up as well now.
* the various types of section breaks are distinguishable from each other, and from page breaks.
* a few changes to make sure formatting and tables get on better together.
* sequence field supported, i.e. caption numbering; I just use the last fields that MS Word left in there.
* changed hyperlinking so that it works with bookmarks that are in comments (annotations).
* I now support multiple bookmarks that end on the same location.
* multiple bookmarks that start on the same location should be supported, but no examples yet.
* the comment author initials are extracted and used in the main document when referencing comments.
* comments now end when they are supposed to; only the correct comments get included. Should work for fastsave, not tested.
* removed unused variables, sorted out a few other warnings, maybe it'll squeak by the IRIX compiler now?
* names and initials info for comments is extracted as well, and stuck in a table at the end of the document.
* fixed the <a name= for comments, should work in fast-saved.
* custom graphics for annotations.

Changes up to 0.5.1
* forgot to change the version number in the source.
* damn sunsite broke the connection halfway through uploading.

Changes up to 0.5.0
* Martin Kalms <kalms@lysator.liu.se>, configure fix for SunOS 4.1 in relation to strerror.
* added option where you can ignore table widths.
* custom graphics for comments.
* endnote autonumbering now works, now defaults to roman numerals.
* fast save footnote problem fixed, though I think things might be even more complex than I thought, so keep an eye on that area.
* footnotes are in a colour of their own.
* symbols as footnotes; required a change to the 4a30 sprm that might fix a few other char formatting issues.
* restarting footnotes on each page and each section works; it appears this is encoded in the number itself. Added a href and a name links, and fixed some invalid HTML code in the footnote area as well. Footnotes are now in a colour of their own, *but* the location of whatever sets the footnote & endnote styles of numbering is unknown; I haven't figured it out.
* all endnotes are listed at the end of the section rather than optionally at the end of the document; I don't know how this is done, it doesn't appear to be documented.
* textmarks / bookmarks and explicit hyperlinking supported; bugs in old code removed, hopefully, and internal hyperlinks put in via insert hyperlink are supported.
* support for bookmarks, i.e. they are converted to <a name>[text]</a> HTML code.
* converted cross-referenced textmarks/bookmarks into hyperlinks.
* WMF files can now be decompressed thanks to peter.brandstrom@ericsson.com; now I need a WMF --> something useful converter. I see that there's a new one available off the GIMP plugin page; with some uberhacking it might do the trick. The notes/wmf dir has a goodly chunk of info on the format if anyone wants to do it for me.
* when bookmarks are embedded in bookmarks something odd appears to occur, but nonetheless MS's save-as-HTML does the same, so I'm assuming that it's OK.
* added bookmark support to fastsaved, should work fine, not tested.
* pagebreak GIFs are correctly centered if the next para is a centered etc. one.
* author field supported.
* proper positioning of page numbers; general layout of headers appears to be fine, except that tab stops are used in headers to center, left and right align headers, which doesn't work so well in HTML mode.
* added defensive code for some sort of list bug.
* mimic strike-through and double strike-through by setting the text color to either #ed32ff or #ff7332.
* disallow height commands inside tables, as the model of paragraph heights doesn't fit well with the architecture for tables, so I'm ignoring them in tables; hopefully no one will notice :-)
* fixed a small bug in sprm which was causing errors later in lists.
* tables and paragraph formatting were misaligned across td boundaries, so now I clear specials and fonts on entry to a table, and on exit of each cell; hopefully I broke nothing else in doing so.
* at least one really bad conversion with a file called RESUME.doc, but in my defence I looked at the MS Word conversion of this to HTML, and it's just as buggered up, so rasp ;-P
* added credits file.
* found a problem in the decompress code: I didn't make it good enough for real-world usage. I now use mmapping to make my life easier; don't know if this is fully portable, works on Linux and Solaris.
* oledecod had bugs on cleanup, so I sent the filters group wmf.doc and Contribu.doc to demo the problems.
* I now use oledecod 0.0.4, which fixes the cleanup problems, but the Contribu.doc style problems continue: they return 5, but laola can extract the streams nonetheless while oledecode cannot. I modified the original laolareplace.c to handle this as well.
* oledecod 0.0.4 has a bug in relation to 1812bb.doc; laolareplace.old.c hasn't this bug, so I'm back to using that again.
* those ffffffff's in lists that haunted me in earlier releases are *back*, grrrrr!! Anyway, I've another massive nasty workaround that I'm using that hasn't crashed any docs, and appears to do the right thing, at least in propos~s.doc.
* WMF decompression code changed to use mmap; this replaces the original code that ate memory. If mmapping doesn't work, try looking at the zlib docs and change the code to fixed-buffer incremental decompression.
* added a bailout to ignore encrypted documents; wonder how I'd decrypt them if I had the correct password, anyone know?
* added a bug fix for cross-reference parsing.
* beginnings of tables of contents included, doesn't always work yet.
* bug where, if the Word file ends on a table, the table wasn't closed off is fixed.
* bug where non-built-in graphic types were causing hangs.
* I'm now often happily (if slowly) converting 90 and 100 page documents. The only thing I'm really unhappy with is table handling, which is also one of the reasons the conversion is *soooo* slow sometimes; the other reason is those godforsaken fastsaved files.
* fixed some other mem-related bugs, successfully converted the last two problem docs without crashes.
* table looks are somewhat supported, though there's no support for last row and last column different from the rest of the cells as of yet; this will have to wait until multi-pass on tables is implemented.
* the foregrounds and character attributes in general for tables appear to always be set correctly, but I believe I have to look into how the "auto" text color selects its final colour, as I've been assuming that it gets set to black, which is a fairly valid assumption most of the time, but not always. So a few docs will have black text on black backgrounds in table cells, but the situation is much improved.
* ran Purify over mswordview, removed a load of dodgy code out of it; there's still a bug or two hiding in the list code, which I believe is the reason that lists are sometimes missing in complex documents, e.g. meeting.doc. I think I love Purify, it's the bee's knees.
* DIBs are now extracted as well, though I don't do anything with them yet; this fixes yet more crashes.
* fixed laolareplace.old.c, which is the version I'm going to use for this release, to work on 64-bit platforms; a few longs had crept into the code there which shagged the whole thing up. I haven't done extensive tests on 64-bit yet, but I'm confident that it'll work.
* fixed defines to make it work if there's no zlib present.
* no crashes after running mswordview on 300 megs of uploaded files.
* good enough to upload to sunsite; the version number reflects this.

Changes up to 0.4.9
--This is an interim release while I'm in Scotland until later this November--
Added features are that the gateway is included, endnotes are supported, pagebreaks that split tables are supported, and some more bugs are fixed, especially in relation to graphics.
* added -o - option to the gateway, like I should have about 4 releases ago.
* fixed graphics again: forgot to reset the extra amount that some have before the graphic data begins, which means more JPGs and PNGs should work.
* endnote text done in simple saved.
* cleaned up beginning whitespace from footnotes/endnotes/comments.
* endnotes in complex mode are in, need testing.
* changed URL code to match the other field code, fixes a big bug there.
* header and footer colours were wrong again, fixed.
* indent drift is fixed again, moved do_indent into decode_?_specials.
* pagebreaks can occur in the middle of a table; this sort of confusion is fixed for full-saved files, and is probably fixed for fastsaved files.
* pagebreaks now look like they occur after footers, footnotes and endnotes.
* custom graphics replace <hr>'s, as there were too many of them at the bottom of a page to figure out what was what.
* custom graphics for footnotes and comments.

Changes up to 0.4.8
* this has a slew of bug fixes related to graphics and a new option to put images in a certain directory.
* fixed f006 code in blip handling, removing a slew of hangs.
* ignore every graphic that isn't an understood type, removes hangs.
* figured out when there's an extra 16 bytes to delete from the beginning of a blip, and where one of my magical 17s were coming from.
* got a bug fix off Harry Shamansky (shamansky@adinc.com) as to why the default make wouldn't work under IRIX.
* the current spid handling was mismatching spids and the graphics involved.
* I can't handle forms or OLE data, so I've added a check to avoid doing them; removes crashes.
* also I've added some other code to watch out for unsupported graphic features.
* MS Word can include WMF and EMF files; these are stored in compressed form, using LZ encoding in a fashion supposedly compatible with the zlib library, but I haven't been able to decompress them yet, and even if I could I don't know of any source to convert WMF/EMF files to anything usable under Linux.
* I've changed blip handling so that it works better; well, I believe it's more crash-resistant, but I'm still not 100% happy with 0x01 handling.
* if you insert a BMP via insert->picture->from file, it appears to be converted to PNG for you, handy.
* paragraph indentation is back in; lists and tables were confusing the indentation code.
* fixed a titchy bug so that space at the beginning of lists isn't underlined.
* support paragraphs whose first line's indentation is greater than the rest of it.
* support vertical space between paragraphs.
* sorted out end_para for the first paragraph found in complex mode; I think I have it right now. In passing, I reckon a load of those pap searches in complex mode are unneeded, but I don't want to rock a working boat. "If it ain't broke don't fix it," as an uncle of mine used to say, though we did seem to spend an awful amount of time panickily fixing things that broke dramatically after years of neglect.
* finally settled on dirs for left indentation; blockquotes indent from both sides automatically.
* added an option to put graphics in a specified dir.
* added an option to find the graphics at a specified URL.
* updated man page.
* made another change to blip handling, fixes some problems.

Changes up to 0.4.7
* warning!, in this release mswordview no longer outputs to the screen by default. Use -o - for this behaviour. This is an interim release to reassure people that I'm still working on it; it's got quite a few new features and bug fixes since 0.4.4, read down for them all.
* implemented tabbing with a transparent GIF; optionally use hard spaces or don't do it at all.
* added some support for borders such that the vertical space between paragraphs due to the width of borders is retained through the use of vertical transparent GIF space.

Changes up to 0.4.6
* indentation of paragraphs dithered to <blockquote>'s is out again, as it is doing strange things on long complicated documents.
* table cell shading done, fully supported I believe.
* drew all the available table patterns in all available colors and made small transparent GIFs out of them. If someone wants to do better copies of the MS ones go ahead; use the convert.sh script in the patterns dir to generate pics in all necessary colors.
* text color support is in.
* word underline, which is where whitespace isn't underlined, is supported.
* courier as an alternative to courier new, times as an alternative to times new roman font face, helvetica as an alternative for everything else.
* all caps supported, small caps supported, though I want full tests of those two babies in all modes. Similar to the font faces, these two babies are only supported in ASCII languages, as I don't really know how to convert UTF-8 Unicode into upper case!
* text animations supported by converting them to blink :-)
features-examples dir added; supported-font-features.doc has what I believe is all the font features that Word supports demonstrated in it. I'd be happy to have omissions noted. mswordview now supports
  1) font size
  2) colored text (in headers and footers as well)
  3) font face in ASCII-based languages
  4) underline, including word underline, where whitespace isn't underlined
  5) super and subscript
  6) all caps and small caps (ASCII-based languages only)
  7) text animations dithered to the blink tag
mswordview doesn't support, due to HTML limitations (at least I don't think I can do them):
  strikethrough, double strikethrough, shadowed and outlined text, embossed or engraved text
  "hidden text" is shown, 'coz I don't know the purpose of it yet
  all caps, small caps and font face for non-ASCII languages
  character spacing
* centralized pap initialization code.
* fixed a crash-causing blip bug.
* fixed a crash due to sep sprms showing up in a papx!! I ignored them; I'm sure that will bite me hard in the future, but I've documented it here so I won't forget.
- Problem: now we have a problem with paragraph properties which is only making a difference now that I want to use the paragraph justification codes. There exist pieces which have fc's greater than the maximum one listed in the plcfbtePapx! I've been pushing them around for the last 2 days to no avail, and I'm beginning to think that maybe this means that they have no native formatting of their own. The catch is to find the paragraph that they belong to: the spec says to find that by taking the smallest fc in the fkp tables that is bigger than the current fc, but there *is none* that's bigger. My thought is to remember if this piece is the beginning of a paragraph mark and, if not, inherit the previous piece's formatting, and keep going backward until we get one. If it is, then either I'm supposed to default to a new one or go forward to find one.
+ Solution: Ah-ha, I believe I have it.
  + firstly, variant 1 gpprls have to be supported, and I had some offsetting in them wrong.
  + secondly, I had a very subtle bug where I changed the value of the avalrgfc, from when I didn't know why they were sometimes +400000000. Of course I now use it to determine whether or not the end of the piece is twice the distance of its reported character len, and with the val reset I occasionally had the piece recorded as being too long, so the paragraph properties of the wrong paragraph were being used.
* added in paragraph formatting information; well supported are 1) centering (center) and 2) right justification (div align=right).
* made a closing paragraph thing, like the closing chp, for the blurb at the bottom, to avoid having the version info centered or justified.
* 0x01 fSpec graphics are now supported in addition to 0x08 graphics. While both of these are draw objects, only non-vector graphics are supported, and only partial support of those, i.e. png and jpg. As with the 0x08 graphics, there are a lot of magic empirically derived offsets being used to put it together, so don't be too surprised at getting corrupt images, though I *have* fixed a bug in png handling, I believe, for 0x08 graphics, which was the previous subset I supported.

changes up to 0.4.5
* I now open graphic and doc files in binary mode to support platforms where this makes a difference.
* replaced laola, perl no longer required, thanks to the mighty Andrew Scriven who replaced the OLE functionality I needed with C.
* got a bug fix off the above to handle files with more blocks.
* optional support for font face if the text is an ascii based one, i.e. if we're guaranteed that this is a western european language then we do font faces. Fastsaves will probably confuse this test and mean we won't get faces even when we can handle them correctly.
* changed indent method for outline lists to multiple hard spaces rather than <dir>'s. In the future I'll make an optional proper html conversion, but it won't look like the original, so it's a TO-DO.
* indentation of paragraphs dithered to <blockquote>'s is in, alpha support.
* absolute width and height of tables is in as well.
* I now default to outputting to a file whose name is the same as the input file with .html appended. Graphics are output to files with the same prefix as the .html file. Use -o - to output to stdout.
* new ole code was broken on a few files (1 :-) ), fixed this.
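The Problem entry under 0.4.6 quotes the spec's rule for finding the paragraph a piece belongs to: take the smallest fc in the fkp tables that is bigger than the current fc. That is an upper-bound search, and "there is none" is simply the case where the search runs off the end of the table. A minimal C sketch, assuming the fkp boundary fcs have already been read into an ascending array; the function name is hypothetical, and -1 stands for the troublesome "none bigger" case.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the smallest entry of the ascending array fcs[0..n-1]
   that is strictly greater than fc, or -1 when no entry is bigger. */
static long first_fc_greater(const uint32_t *fcs, size_t n, uint32_t fc)
{
    size_t lo = 0, hi = n;            /* the answer lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fcs[mid] <= fc)
            lo = mid + 1;             /* mid is too small, look right */
        else
            hi = mid;                 /* mid is a candidate, keep it */
    }
    return lo < n ? (long)lo : -1;    /* lo == n: nothing bigger exists */
}
```

With boundaries {0x400, 0x800, 0xC00}, an fc of 0x500 maps to index 1, while an fc of 0xC00 or beyond returns -1, which is exactly where a fallback such as inheriting the previous piece's formatting would have to kick in.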
changes up to 0.4.4
* a good few bug reports in, crashes and whatnot. I got the use of purify on a sun box (thanks to martin mellody et al) and sorted out *all* the uninitialized mem reads there (3000 of them in the course of a typical conversion!!). It still leaks memory like a sieve, but that's not important for mswordview, though I will sort that out. Purify is a wonderful piece of work, I have to say.
* changed ffffffff handling for lists; I think it means that the list in question isn't actually there, so skip it.
* changed blockquotes to dir, looks neater and word itself does it; the biggest software company in the world can't be wrong, can it? :-)

changes up to 0.4.3
* oops, I shafted the inclusion of getopt for systems that need it.

changes up to 0.4.2
* fixed broken simple mode footnotes (doh!)
* fixed bug in blip where having drawings, none of which was a picture, caused a crash.

changes up to 0.4.1
* did some tweaking to remove a crash.

changes up to 0.4.0
* and big breaking news, preliminary graphic support is now in!! Yes, gifs/pngs/jpgs added to a document through the insert->picture->from file mechanism now convert correctly. They are stored in the office draw format, which I've just cracked the rough layout of (through the handy ms spec on the msdn site). Graphic support is messy for now, as the files are generated in the cwd of mswordview and named graphic*mswv.*; I'll tidy it up later, this news is too good not to get an announcement.

changes up to 0.3.0
* added -m --mainonly option if you don't want headers and footers.
* added a few more places to look for lls-mswordview. The search order is now
  1. in the path.
  2. the same dir as lls was run from, if run absolutely.
  3. the current dir.
  4. a dir called laola off the absolute path.
  5. a dir called laola off the current dir.
  But stuff like ../../mswordview isn't in there, coz folk should just put lls-mswordview into their path, dammit!
* different numbering formats for page numbering are in, a vs i vs 1 etc.
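The "a vs i vs 1" page numbering entry boils down to turning a page count into one of three labels. A minimal C sketch under stated assumptions: the enum and function names are hypothetical, and the letter style assumes the a..z, aa, bb, cc continuation that Word appears to use, which should be checked against real documents.

```c
#include <stdio.h>
#include <string.h>

enum page_style { PAGE_ARABIC, PAGE_LOWER_ROMAN, PAGE_LOWER_LETTER };

/* Lowercase roman numerals by greedy subtraction (4 -> "iv", 9 -> "ix"). */
static void to_roman(unsigned n, char *out, size_t size)
{
    static const unsigned vals[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char *const syms[] = {"m", "cm", "d", "cd", "c", "xc", "l",
                                       "xl", "x", "ix", "v", "iv", "i"};
    size_t len = 0;
    for (size_t i = 0; i < sizeof vals / sizeof vals[0]; i++) {
        while (n >= vals[i]) {
            for (const char *s = syms[i]; *s != '\0' && len + 1 < size; s++)
                out[len++] = *s;
            n -= vals[i];
        }
    }
    out[len] = '\0';
}

/* Letters, assumed scheme: a..z, then aa, bb, cc ... (27 -> "aa"). */
static void to_letters(unsigned n, char *out, size_t size)
{
    size_t len = 0;
    if (n > 0) {
        char letter = (char)('a' + (n - 1) % 26);
        unsigned reps = (n - 1) / 26 + 1;      /* how many times to repeat it */
        while (reps-- > 0 && len + 1 < size)
            out[len++] = letter;
    }
    out[len] = '\0';
}

static void format_page_number(unsigned n, enum page_style style, char *out, size_t size)
{
    if (size == 0)
        return;
    switch (style) {
    case PAGE_LOWER_ROMAN:  to_roman(n, out, size);       break;
    case PAGE_LOWER_LETTER: to_letters(n, out, size);     break;
    default:                snprintf(out, size, "%u", n); break;
    }
}
```

Keeping each style in its own helper makes uppercase variants a small wrapper (toupper over the result) rather than another branch in the numbering code.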
* gpprls for sep's work now, complex sections are in.
* found some strange code in clx_headers and clx_footers, so I blew it away.
* section support in for simple saved files.
* sections that restart page numbering work now.
* sections that have no footers/headers at the beginning work now.
* complex support for sections is in as well; should work, hopefully, but needs extensive testing.
* TO-DO text color, eventually font faces, but no sleep lost on that, I have to say.
* TO-DO shaded cells in a table, think up a better table handling method.
* I now stick a space into an empty cell so that it shows up.
* another U8 wraparound bug removed.
* I now use the piecetable for simple docs, so as to skip over sections that aren't to be processed, i.e. the simple format is just as complex as the complex format :-). I think I've done this right and it won't break anything; I'll have to wait and see, though.
* changed slightly the portions of a field that don't get printed, to make some html ones work; hope I haven't shafted anything else.
* hmm, really need to clean up character handling, unicode & special reserved ms symbols and so on; I'm just plinking at them for the moment.
* aghh, found another U8 overflow. What possessed me to put them in in the first place? I should have guessed that there would be hundreds of pieces in a file.
* received report that it compiles and runs with Sparc solaris 2.5.1 - sparcworks compiler & Intel x86 solaris 2.5.1 - gcc compiler.
* added patch from diakka <diakka@staff.sinanet.com> to run create_bins on a make rather than a make install.

changes up to 0.2.2
* compiled it on a solaris account I got, and it's fine; got confirmation that it works from Will Renkel <renkel@cig.mot.com>.
* changed fastsaved chpnextfc check to be >= rather than >; hope that I don't break anything coz of it.
* foolish error, U8 used for number of pieces, extended to U16.
* changed embedded link handling to not end character properties in the middle of a URL!
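The U8 wraparound and overflow entries all come from the same trap: an 8-bit counter silently wraps modulo 256, so a file with hundreds of pieces looks like it has only a few. A small C illustration; the U8/U32 typedefs are assumed to mirror the code's type names, and the two functions are hypothetical. The log's own fix widened the count to U16; U32 is used here.

```c
#include <stdint.h>

/* Assumed typedefs, mirroring the U8/U32 names used in the code base. */
typedef uint8_t  U8;
typedef uint32_t U32;

/* Counting pieces in a U8: the counter wraps modulo 256, so a file with
   300 pieces appears to have 44. */
static U8 count_pieces_u8(U32 npieces)
{
    U8 count = 0;
    for (U32 i = 0; i < npieces; i++)
        count++;                  /* 255 + 1 silently becomes 0 */
    return count;
}

/* The fix: a counter wide enough for any realistic piece count. */
static U32 count_pieces_u32(U32 npieces)
{
    U32 count = 0;
    for (U32 i = 0; i < npieces; i++)
        count++;
    return count;
}
```

The wrap is well defined for unsigned types, which is exactly why it never crashes on the spot: the damage shows up later, when the bogus count is used to walk the piece table.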
* changed embedded link handling so as to *not* place "" around urls, as sometimes they are there already, and not having them doesn't hurt, though it offends my sense of how they should be done.
* would you *believe* these ms guys, now they are hitting me with file offsets that are past the end of the file!! So now I have to watch out for that; the complex format is *such* a collection of hacks. Ah-ha, I've just checked in word, and this file crashes word :-), so this is the first reported case of mswordview being better than msword, though I have to say that in recovery mode word pulled loads of text out of it that I didn't get :-(. Still, it's a corrupt file, so doing anything at all is a success.
* I forgot to reset the higher list levels when changing a lower one; fixed now, I think I've got it right.
* added a define of SA_RESTART to 0 if it isn't there. bash does it, so I should get away with it; sunos seems to need it.
* added a little patch from Zachariah Baum <zack@studioarchetype.com> that should help folk who run mswordview absolutely and don't stick lls-mswordview in their path, i.e. make and then don't make install.
* fixed yet more bugs; for some reason I thought that the order of evaluation was from right to left!!!! i.e. I was doing if ((*p == 'a') && (p!=NULL)) doh!
* changed web interface so that utf-8 is always on.
* font characteristics turn off when going into tables now, and turn back on when inside; gets rid of some odd look and feel.
* checked out corel's wordperfect import functionality with office 97 files; conversion isn't as good as mswordview, I think: missing header numbers, and one or two didn't convert at all. Though of course corel retains layout, which mswordview can't do with html, and does shading. I'll check pictures at some stage.
* have a report that sun's pcfileviewer similarly covers about 50% of mswordview's functionality, and vice versa.
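The order-of-evaluation entry is about C's short-circuit rule: && always evaluates its left operand first and skips the right one when the left is false, so the NULL test has to come first. A tiny C illustration; the function name is hypothetical.

```c
#include <stddef.h>

/* Buggy order from the log: *p is read before p is checked, so a NULL p
   crashes before the NULL test ever runs:
       if ((*p == 'a') && (p != NULL)) ...                                  */

/* Fixed order: the right operand of && is only evaluated when the left one
   is true, so *p is never read for a NULL p. */
static int starts_with_a(const char *p)
{
    return (p != NULL) && (*p == 'a');
}
```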
* gzipped uploaded word file collection has just hit 120megs :-)
* I now look at the section table so I know whether it's a section break or a page break. If it's a section break, then the headers/footers revert to the beginning again.
TO-DO, add a space to empty cells to make them look reasonable in netscape.
TO-DO check page numbering with sections.
TO-DO, do endnotes, should be easy.
make new pic to replace hr lines; there are too many hrs now at the bottom of a page to make sense to anyone anymore. If there are no footers, then don't do the lines.
TO-DO, continue with the sent files since 0.1.0, and the rest of them.

changes up to 0.2.1
* removed bug that caused lists to drift further and further right.
1. checked out the blockquote indentation for lists, doesn't appear to be right for srom*.doc, fixed now. Took a closer look at font scanning in decode_letter, in particular special chars; the < 39 wasn't precise enough, as being in a wingding/symbol font seems to make you automatically a special char.
2. something not fully right with lists that take their text as special chars (i.e. sectionnumber), not done by ms in an obvious fashion. Edit doc down to just the 2 headers and then see what happens.
3. AHA!!!, 1 and 2 are wrong, as were previous ideas to ignore lists that appear to have nothing in them. They are there to artificially bump lists up to a different starting number without requiring a separate list definition for each one; ms shoves in dummy elements to get the list up to the right number. The section id just before one of them threw me entirely, as I thought the section number should have been the text of the list. I've got it now!
* 3 above is *rubbish*, that's not it at all. I was right originally: ignore those 0 len lists. The problem was with my list restarting mechanism, which didn't work if there was more than 1 list between list sections that had to continue numbering.
* numerical outline list sublevels will retain the prefix of the above levels. This required a change to the number figuring-out code; it's now rather heavy on silliness, but it works. I don't love it, and I'm sure lists will be back to get me again at some stage, but outline lists now work, in particular the 1 1.1 1.1.1 style.
* TO-DO sections, srom*.doc has them, check them out.
* TO-DO change web interface so that the utf-8 can kick in if need be.
* fixed bug where the new piecetable check in simple saved files fell apart after hitting a footer (tempcp = tempcp, rather than realcp = tempcp, doh!).

changes up to 0.2.0
* well arse again, I've revised my ideas as to what constitutes the end of a piece. Rather than the beginning of the next piece, as I was doing, I now believe that it's the beginning of the piece + the twiddled cp len. Makes more sense, and removes crashes from the latest doc I was given.
* distinguishes between odd & even page footers.
* TO-DO odd & even headers.
* added the tm symbol as a special case. There's quite a large range of unicode that ms is using that is part of the customizable section, i.e. there are loads of glyphs that ms can use that are not part of the standard unicode set; the tm appears to be one of hundreds. Eventually I'll have to get a table of them.
* woweee, is ms an evil designer of data formats. They have two types of simple saved docs, I thought: those in 8 bit (basically ascii) and those in 16 bit (unicode). Hah bloody hah, I've been given one which is a mixture of both, and I have to use the damn piecetable to shove it together. And it's not as if the document shifted into a different language or anything. If this was fastsaved I'd not blink an eye, but simple saved, come *on*, why bother calling it simple saved? So I have to keep an eye on the piecetable to determine what exact offset to use after all.
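The 0.2.0 entries (a piece ends at its start plus the twiddled cp length, and one simple saved doc can mix 8 bit and 16 bit pieces) amount to computing each piece's byte range from its piece descriptor. A minimal C sketch, assuming the Word 97 convention as I understand it: bit 30 (0x40000000) of a piece's fc marks 8-bit text, whose real byte offset is the fc with that bit cleared, divided by 2. The struct, macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: bit 30 of a piece descriptor's fc flags 8-bit text. */
#define FC_COMPRESSED 0x40000000u

struct piece_range {
    uint32_t start;      /* byte offset of the piece's text in the stream */
    uint32_t end;        /* one past the piece's last byte */
    int compressed;      /* 1 = 8-bit (cp1252) text, 0 = 16-bit unicode */
};

/* cp_start/cp_end bound the piece in character positions; raw_fc is the
   fc read from its piece descriptor. */
static struct piece_range piece_bytes(uint32_t cp_start, uint32_t cp_end, uint32_t raw_fc)
{
    struct piece_range r;
    uint32_t ncp = cp_end - cp_start;              /* characters in this piece */
    r.compressed = (raw_fc & FC_COMPRESSED) != 0;
    if (r.compressed) {
        r.start = (raw_fc & ~FC_COMPRESSED) / 2;   /* 8-bit: clear flag, halve */
        r.end = r.start + ncp;                     /* 1 byte per char */
    } else {
        r.start = raw_fc;
        r.end = r.start + ncp * 2;                 /* 2 bytes per char */
    }
    return r;
}
```

Because the width is decided per piece, a reader built on this never has to guess a document-wide encoding: each piece says for itself whether its bytes are cp1252 or UTF-16.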
* added a huge, big, filthy hack for more list twiddlings: the previously mentioned unknown 4 byte sequence now rears its head as an optional 8 byte sequence!! But it's always ffffffff, so it might be some kind of flag or summat. Anyhow, I now chew up any 4 bytes consisting of this if they show up in the place where they might appear. This removes a large crash that occurs otherwise, as all the counters get thrown off course by them.

changes up to 0.1.1
* added Makefile patch from Pavel.Roskin@ecsoft.co.uk (says it works on hpux).
* well, the good news is that the unicode utf-8 is working for taiwanese and, I'm sure, other languages; the bad news is that everyone's telling me that no one in their language group is actually using unicode :-), so I suppose I require a huge unicode --> JIS/EUC/KSC/Big5/GB converter. :-)
* rudimentary support for annotations; I haven't too many examples of these, but I think they'll work fairly well.
* rudimentary support for all special ascii codes for time, page no etc. p.s. by rudimentary support I mean that if asked for, e.g., the current date in a particular format, I output the date, maybe in the correct format, maybe not, i.e. the meaning is the same, though the look might be different.
* added a supported sprm that changes chp information totally to the chp of a different style.
* added support for custom footnotes; had to do a bit of a hack to get the <a name> stuff right. Hopefully it'll always work; even if it doesn't, it'll still be readable.
* twiddled the char formatting dependencies about again; really I'll have to redesign that a bit.
* broke the mswordview.c file down a bit into other files.

changes up to 0.1.0
* hell, I've enough done to warrant a new numbering system, so from now on x.y.z:
  x is a stable bug free (hah) release. Folk packaging for commercial unices probably should wait for these releases (none yet, I know).
  y is a new feature or enough bugs fixed that you'd better use this version if you want to keep up with the joneses.
  z is some bug or change that is small enough that I won't upload it to sunsite et al automatically; it'll be mostly for me.
* added a default font size option, so that if you think the output is too big or small, you can shrink or enlarge it.
* added a horizontal padding option: you have the choice of 3 different ways to handle a run of multiple line breaks, though the default is probably the best.
* tweaked char formatting system. TO-DO overhaul all of that; there are quite a few dependencies between the tags, and it's becoming a little too difficult to do by hand. A little stack is called for, methinks.
* added some support for a type of holdover list format found in docs converted to word8 from older versions. Works on the one I have so far, though there's more testing to be done with it. Missing bullets and incorrect numbering may be related to this; pass them on to me.
* battered LFO's into submission, this time they'll stay down (I hope). Found a 4 byte field that I can't figure out where it came from. *shrug*, wouldn't be the first time that happened, though.
* changed footer and header handling: I now take notice if the first page's headers and footers are different from all the others. I still don't get section breaks, which I think impact on this, and I don't have any examples of this to work against. There's a discrepancy between the header/footer documentation and what I see before me in the hex; maybe I'm missing something.
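The "little stack is called for" idea for character formatting can be sketched like this: open tags are remembered in order, and turning one off closes everything above it, closes it, then reopens the survivors, so the html always stays properly nested. A minimal C sketch; the struct, the fixed-size buffer and all function names are hypothetical, not how mswordview actually tracks its tags.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TAGS 16

/* Hypothetical formatting stack plus the html produced so far. */
struct fmt_stack {
    const char *tags[MAX_TAGS];
    int depth;
    char out[256];
    size_t len;
};

static void emit(struct fmt_stack *st, const char *pre, const char *tag, const char *post)
{
    int n = snprintf(st->out + st->len, sizeof st->out - st->len, "%s%s%s", pre, tag, post);
    if (n > 0)
        st->len += (size_t)n;
    if (st->len >= sizeof st->out)     /* output was truncated */
        st->len = sizeof st->out - 1;
}

static void fmt_on(struct fmt_stack *st, const char *tag)
{
    if (st->depth >= MAX_TAGS)
        return;
    st->tags[st->depth++] = tag;
    emit(st, "<", tag, ">");
}

/* Turning off a tag that may not be innermost: close everything above it
   and the tag itself, then reopen the tags that were above it. */
static void fmt_off(struct fmt_stack *st, const char *tag)
{
    int i, j;
    for (i = st->depth - 1; i >= 0; i--)
        if (strcmp(st->tags[i], tag) == 0)
            break;
    if (i < 0)
        return;                                /* tag is not open */
    for (j = st->depth - 1; j >= i; j--)
        emit(st, "</", st->tags[j], ">");
    for (j = i + 1; j < st->depth; j++) {
        emit(st, "<", st->tags[j], ">");
        st->tags[j - 1] = st->tags[j];         /* shift down over the removed tag */
    }
    st->depth--;
}
```

For example, bold on, italic on, bold off produces <b><i></i></b><i>, with italic still open; the dependency between the two tags is handled by the stack instead of by hand.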
* ok, there's some difficulty with tables. I've implemented this baby as a one pass parser; later I'll have to add multipass (or backpatching) to figure out the number of pages, so as to get that field right. But with ms tables you can start off with 2 cols then go to e.g. 4 in the same table; you don't know in advance how many rows and cols there are at maximum, or which ones span which, which is a pain in the butt. Really, as far as word is concerned each row is a table unto itself, so I've done it this way:
  - each table has the cols of the first row counted and the widths figured out in % of the page width; if a subsequent row has a different number of cols or different widths than the previous row, a new table will be begun. The % width will cause netscape to line them up correctly. It'll do for now; not perfect, I know, but hey, what is? It'll do the job for the primary task, which is making word readable as close to the original layout as possible within html.
  - to get the tap that tells me all the above, we have to scan forward until we find a rowend char, and get the pap of that to get the tap; and with fastsaved there's the usual complexity.
  - the problem will be that netscape and other browsers don't take the width% as their primary factor in determining the actual width of a cell: if the text in it cannot be broken on a space, then the cell is expanded to fit, breaking the lining up. I'm considering a somewhat more sophisticated (and questionable) technique where I stick the tables together by dithering the cells to a cell grid (max 64 cells, ms defined), using colspan and so on to do it.
* TO-DO there's something called a header text box that I have to figure out, and some companion of it for the main doc. I have to implement something to handle these beasts.
* TO-DO more testing for bugs and stuff.
* TO-DO code overhaul to simplify it.
* TO-DO support all fields. I've some supported (page no, date and time), but not perfectly in the same format that word has them in.
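The table strategy above (cell widths as a percentage of the page width, and a new table whenever a row's layout differs from the previous row) can be sketched in a few lines of C. Assumptions: a row's cell boundaries are available as an array of ncells+1 positions in twips, modelled loosely on the tap's cell-boundary array, and the page's text width is known; both function names are hypothetical.

```c
#include <stddef.h>

/* bounds[0..ncells] are one row's cell boundary positions in twips;
   page_width is the usable text width in twips. Each cell's width goes
   into pct[] as a whole-number percentage, rounded to nearest. */
static void row_widths_percent(const int *bounds, size_t ncells, int page_width, int *pct)
{
    for (size_t i = 0; i < ncells; i++) {
        int w = bounds[i + 1] - bounds[i];
        pct[i] = (w * 100 + page_width / 2) / page_width;
    }
}

/* A new html table is begun when a row's boundaries differ from the
   previous row's, in count or in position. */
static int same_layout(const int *a, size_t na, const int *b, size_t nb)
{
    if (na != nb)
        return 0;
    for (size_t i = 0; i <= na; i++)   /* na cells means na+1 boundaries */
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

A row with boundaries {0, 2880, 8640} on an 8640-twip page comes out as 33% and 67%; rounding each cell independently means the total can drift a point from 100, which browsers tolerate.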
* TO-DO, figure out how to extract ole embedded msoffice draw and equation editor data, and see if I can get them converted as well.
* TO-DO provide alternative outputs, tex/rtf and friends. I've a load of formatting information that I think I can get into those formats.
* TO-DO provide basic formatting for html, i.e. centering.
* TO-DO think about writing word docs :-), now that would be a hunk of work. So to all you asking me about it, I recommend you don't even bother with it; just write rtf files and get on with it. That's even what ms did for word 8: saving as word 6/95 just creates an rtf file, and if it's good enough for them, it's good enough for us.
* TO-THINK-ABOUT I don't keep very much information in memory, really; I just work out what I need for any given instant, drag it out of the file, and then dump it, often only to get it again in a few seconds. This leads to an impressive amount of seeking back and forth across the streams; there's a groove burnt in my hd where I'm working. It's not really optimum behaviour (works though :-) ).
* NEED_HELP-ON, can this compile and work under sgi? I have success reports from linux, solaris, hpux, aix and freebsd, and one failure to compile under sgi; I've one message that it compiles under os/2, though it needs some work to do that.

changes up to 0.0.27
* know how to do the right thing with the embedded sprm list now, gets rid of a few wild bugs.
* found the list documentation after all; maybe I forgot to download it the last time (doh!), or it wasn't there when I downloaded it. So I removed all of my rather good but unnecessary hex-determined code.
* added a special case for "*" in lists, make it a bullet point instead, seems to be the right thing to do (?)
* changed the laola commands' names to append -mswordview, to avoid overwriting newer lls commands etc.
* changed the INC in perl files to reflect the final install dir.
* TO-WORRY-ABOUT, quite a few ??'s displayed in netscape when dealing with those utf-8 docs; don't know if that's my lack of correct fonts, or a great big dirty bug. Also, I've a few special cases in decode_letter to translate letters into what *I* think they should be; it's rather questionable and very empirically based.
* added some hook code to protect lists from pagebreaks. In doing so I notice that my complex code is a wee bit confused, but it works, so I'm leaving it alone for now. The added code doesn't make for readability, but hey, neither does any of the rest of the code :-)
* fiddled list interpretation so that ilfo isn't looked at until the last pap and chp sprms have changed it; fixes difficulties in fast saved files.
* TO-DO (list stuff) LFO override not implemented correctly, may cause crashes. This is surely the last major list-related thing to do. Restarts are probably incorrect, as are a few other minor list-related bits and pieces.

changes up to 0.0.26
* changed laola lib to a subdir of mswordview and changed laola program names to custom mswordview ones, to avoid clashing with newer versions or the original version of laola, as I've doctored things slightly for my own needs.
* applied Martin Schultze's patch to add the lib path to the perl include path, though I twiddled it to make a nice tree in my lib.
* lists start on the correct number (well, ones that are simple numerals do, anyway).
* understand list continuing and restarting now.
* added a defensive patch from Peter Silva <Peter.Silva@ec.gc.ca>.
* lists now get the char formatting that they should get.
* yes!, sorted lists out: have bulleted lists, arabic & roman numerals, lowercase and uppercase lettering systems done.
  multilevel also works, I believe; works on all examples I have, anyway.
* fixed bug that made mswordview fail on files without an extension.
* TO-DO look at list indentation: if they are true multilevel then I blockquote them (for now), but if they have a set indentation value then, like all the other layout constructs, I don't preserve this into html.
* TO-DO fields, table of contents should be easier with lists done.
* TO-DO find out if my unicode (utf-8) support actually works for anyone except me. What fonts do various people need? This is a general netscape question.
* middleterm TO-DO, reorganize tags into external data files, to make it extensible to other formats, i.e. raw ascii, an attempt at latex, rtf.

changes up to 0.0.25
* changed list handling slightly, removes a bug where you get too many list levels inserted.
* I believe that most lists will now be handled correctly as to whether they are numbers or not. I have isolated the undocumented section and have a handle on the situation, so it's just a matter of comparing theory with practice again.
* removed bug where header pap gets used in the main document following a header.
* finished checking all uploaded files beginning with a, yippee. Now, there are quite a few elements not addressed yet in those files, but I understand what's involved; in short: section support, proper list support, justification support (centering anyway), and decoding of the DATE and TIME fields (would you believe that the TIME field can encode the DATE, despite the fact that there's a DATE field whose job this is! gagh, what can you do with people who do this to you). But anyhow, the uploaded files all convert without crashing, all text is in the right place, and in the right language (I think :-) ). All bold, italic, font sizes, underline, manual page breaks, and the content of footnotes, footers and headers is shown, albeit not always the way it appears in word; yeah, we're getting there.
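Multilevel numbering with restarts (the "reset the higher list levels when changing a lower one" fix and the 1 / 1.1 / 1.1.1 outline style from the entries above) can be sketched with one counter per level: bumping a level restarts every deeper level, and the label carries the counters of all shallower levels as its prefix. A minimal C sketch; the struct and function names are hypothetical, and it only produces arabic numerals (other styles would format each counter differently).

```c
#include <stdio.h>
#include <string.h>

#define MAX_LEVELS 9      /* word 97 lists have levels 0 to 8 */

struct outline {
    int count[MAX_LEVELS];
};

/* Advance the given level and write its outline label ("1.2.1") to label. */
static void outline_next(struct outline *o, int level, char *label, size_t size)
{
    size_t len = 0;
    if (size == 0)
        return;
    label[0] = '\0';
    if (level < 0 || level >= MAX_LEVELS)
        return;
    o->count[level]++;
    for (int d = level + 1; d < MAX_LEVELS; d++)
        o->count[d] = 0;                          /* restart the deeper levels */
    for (int d = 0; d <= level && len < size; d++) {
        /* each shallower level contributes its counter as a prefix */
        int n = snprintf(label + len, size - len, d == 0 ? "%d" : ".%d", o->count[d]);
        if (n < 0)
            break;
        len += (size_t)n;
    }
}
```

Calling it for levels 0, 1, 1, 2, 0, 1 yields 1, 1.1, 1.2, 1.2.1, 2, 2.1; forgetting the reset loop is precisely the bug that makes the second "1.1" come out as "2.3".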
* changed utf conversion code, as the original code I was using wasn't quite gpl compatible; anyhow, the new code is better designed for my needs.
* TO-DO, grr!! Is someone reading this log? After my week's holidays I note that there's a huge amount of files beginning with a to go through again; I never did make it to b.

changes up to 0.0.24
* fixed NULL complex pap bug.
* supports underline tag now as well :-)
* footnotes supported; all the ones referenced before a pagebreak get listed at the manual pagebreaks and at the document end (that's an <hr> in my current output; splitting word docs into different files is a challenge I'd rather not accept for now, as it'd just be guesswork and mess). Not checked in fastsave yet, though.
* TO-DO support sections, so as to know what pages get headers and which don't, etc.
* TO-DO proper table of contents; the text is now listed, but there's no link between the table of contents and the text it purports to describe, for the moment.
* TO-DO differentiate between different types of underline, i.e. word for word etc.
* EVENTUALLY-TO-DO, I have come across one case where a symbol used in a footnote isn't working! If I create one of my own it works fine, but when I alter the given one it still occurs, strange.

changes up to 0.0.23
* verified it works on linux, aix and solaris.
* fixed a very silly overflow byte vs int bug.
* overhauled unicode conversion, fixed my sprm size detection.
* changed table handling so that tables don't end prematurely.
* fixed img insertion dummying of wingding font support.
* massively changed my paragraph end detection for complex files; I had the idea all wrong, but close enough that it worked on fairly uniformly formatted files.
* works with all uploaded files beginning with A and a; there are soooo many to go through :-), I'm looking forward to getting to b soon.
* TO-DO, continue checking against uploaded files, verify header and footer support, start on list information (dum de dum dum dummmm)

changes up to 0.0.22
* check for errno.
* fix list related crash bug, found by Wayne Roberts <milcom@netcom.com>.
* TO-DO, go through the 50 megs of uploaded word files and see do they convert fairly correctly :-) Lists need to be done better, I need to confirm language conversion, and check out the table of contents field.

changes up to 0.0.21
* for simple format I now decode to utf-8 when appropriate. On viewing many docs with windows netscape 4 it works fine; I don't have the X fonts to do half of the languages under my own X, but hopefully those in the various language blocks can figure out fonts for themselves?
* complex format non-west-european docs might still be shagged; I'd love to hear from an asian language group as to whether or not the utf8 works for them.
* some bug fixes by Pavel Machek <pavel@Elf.ucw.cz>.

changes up to 0.0.20
* headers are fairly correct now. The spec and I are confused as to headers and footers, though, so while I *can* do headers and footers, it might require a bit of fine tuning; I need docs with all sorts of header and footer types in them until I'm sure I'm right, but it's close enough.
* docs with subdocs in them should return the output of the main doc now.
* to do, from the veritable deluge of documents in languages I can't read :-), I'd better handle the non-standard (well, non-standard to me anyway!) russian and one or two others that I hope fall out in the process; asian would be wonderful.

changes up to 0.0.19
* header support added to complex format.
* wingding font hack added, like the symbol font one.
* headers are still not right; footers and headers are all appearing at the top of the document. I've more work to do on that next.
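Decoding to utf-8 means writing each Unicode character as one to four bytes. A minimal C sketch of the standard UTF-8 encoding, not mswordview's actual conversion code; the function name is hypothetical, and it rejects surrogates and values above 0x10FFFF.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and values
   above 0x10FFFF, which are not valid characters. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 0xxxxxxx: plain ascii */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* utf-16 surrogate halves */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, the tm symbol (U+2122) comes out as the three bytes E2 84 A2, and 16-bit Word text only ever needs the one-, two- and three-byte forms unless surrogate pairs are first combined into a single code point.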
* I've shagged up the parsing of lls output, so docs with ole inside ole will not work, even though there's no good reason they shouldn't; bear with me on this.
* mswordview.wrapper added to allow inline viewing of word docs.

changes up to 0.0.18
* new option to not change msword headings to html headings, to support those dodgy people who don't use them correctly.
* fixed what looks like a specialized case for recognizing tables.
* fixed the lack of - sign.
* have a new group of files that convert correctly.
* these are minor changes; I'll add header handling to complex format tomorrow.

changes up to 0.0.17
* lack of getopt.h on some systems taken into account now.
* sub and super scripting now in for simple format.
* laola.pl changed to continue even if it thinks the file is the wrong length.
* added option to not attempt to dummy up formatting done with whitespace.
* using gifs for symbols; this will do for html output, but for other outputs in the future we'll have to organize something a little more sophisticated.
* I have some alpha support for headers in at the moment; if you have headers you "might" see them in russet text.