
List:       mhonarc
Subject:    MS Word Filter
From:       Frank Ronny Larsen <frankrl () nhn ! no>
Date:       2000-07-26 11:29:17

Hi all.

At work we receive lots of mail from Windows-using employees who send
documents of various kinds as MS Word files. This is annoying for those
of us using Linux/BSD, and somewhat annoying in the mail archives as
well. I therefore wrote this little filter, which uses wvHtml from the
wv package.

The filter converts Word documents with wvHtml and includes the
resulting HTML directly in the archived copy of the e-mail, in much the
same way as the HTML filter does. It also creates a link to the original
Word document so that readers of the archive can download the .doc too.
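
The "include directly" step comes down to keeping only what sits between
the <body> tags of the converter's output. Here is a minimal sketch of
that step on its own, assuming the converter emits a single <body>
element. The name extract_body is made up for this illustration and is
not part of MHonArc or wv:

```perl
use strict;
use warnings;

# Hypothetical helper: return the markup between <body ...> and </body>.
# If no body element is found, the input is returned unchanged.
sub extract_body {
    my ($html) = @_;
    # /s lets . match newlines; /i accepts <BODY> as well as <body>.
    if ($html =~ m{<body\b[^>]*>(.*)</body\s*>}is) {
        return $1;
    }
    return $html;
}

# Sample input standing in for converter output (not real wvHtml output).
my $page = "<html><head><title>t</title></head>\n"
         . "<BODY bgcolor=\"white\">\n<p>Hello from Word</p>\n</BODY></html>\n";
print extract_body($page);
```

With the /s modifier there is no need to flatten newlines first, which
keeps any preformatted text in the converted document intact.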

Requirements for the filter are:
 - MHonArc (duh)
 - Perl (duh2, since MHonArc already needs it)
 - wvHtml (URL:http://www.wvWare.com/)
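
Before enabling the filter it may be worth checking that the converter
can actually be found. A small sketch, assuming wvHtml is installed
somewhere on the PATH (the attached script instead hard-codes
/usr/local/bin/wvHtml). The helper find_in_path is hypothetical, written
only for this example:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# called $name in the directories listed in $ENV{PATH}, or undef if none.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return undef;
}

my $wvhtml = find_in_path('wvHtml');
print defined $wvhtml
    ? "wvHtml found at $wvhtml\n"
    : "wvHtml not found in PATH; Word documents cannot be converted\n";
```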

Images in the Word file do not work. This is because the version of wv
that I have doesn't support wmf->image conversion (I couldn't get the
required libs to compile properly). If someone wants to add that, feel
free to do so.

To use it in an archive, add the following to the archive's rcfile:
<MIMEFilters>
application/msword; m2h_application_msword::filter; /path_to_filter/mha_msword.pl
</MIMEFilters>

Hope this may be useful to others. It has been very useful to me. :)

-- 
Frank Ronny Larsen
Nordnorsk Helsenett


["mha_msword.pl" (TEXT/PLAIN)]

#!/usr/bin/perl
#
# Converting MSWord to HTML for use with MHonArc.
# Written by Frank Ronny Larsen June 2000
#
# Uses wvHtml from the wv package.
# Supports: Converting MSWord to HTML
#           Downloading of original MSWORD
#
# TODO?: Images in Word-doc. Currently the version of wvHtml that I use
#        does not convert the images, for lack of the required libraries.
#        Therefore this is not implemented here either.
#

package m2h_application_msword;
require 'mhmimetypes.pl';

$wvHtml = "/usr/local/bin/wvHtml -c iso-8859-15";

sub filter {
  local($header, *fields, *data, $decoded, $args) = @_;
  my $txt = "";

  # Require MHonArc to decode the data. 
  if(!$decoded) { 
    return ("<b>MHonArc did not manage to decode the MSWord data.</b>
Probably weird encoding of the e-mail transmission."); 
  };

  ## Get content-type
  my ($ctype) = split ';', $fields{'content-type'};
  $ctype =~ tr/A-Z/a-z/;

  # Write file so users can download the Worddoc itself.
  my $filename = mhonarc::write_attachment( $ctype, \$data );

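  # Run wvHtml on the .doc file and capture its HTML output.
  # If the pipe cannot be opened, still return the attachment's filename
  # so that MHonArc keeps track of the file written above.
  open C, "$wvHtml $mhonarc::OUTDIR/$filename |"
    or return ("<b>Could not run wvHtml on the MSWord attachment.</b>",
               $filename);
  my @Html = <C>;
  close C;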

  # Strip HTML header. (Maybe use MHonArcs mh_text_html code for this?)
  $txt = join ' ', @Html;
  $txt =~ s|\n| |g;
  $txt =~ s|^.*<body.*?>(.*)</body>.*$|$1|i;

  # Add a link to the Word-file, so people may download it.
  $txt = "<p>
<a href=\"$filename\">$filename</a>
<hr>
<table width='100%'>
<tr><td bgcolor='white'>$txt</td></tr>
</table>";

  # Return array with 1. HTML and 2. files
  ($txt, $filename);
}

## True. stupid construct.
1;

