
List:       cyrus-info
Subject:    Re: How to remove attachments from many mails on Cyrus-IMAP Store?
From:       Bron Gondwana <brong () fastmailteam ! com>
Date:       2017-10-19 21:04:29
Message-ID: 1508447069.989306.1144680672.78C9F2F8 () webmail ! messagingengine ! com

Hi Walter,

The only way to remove attachments is to replace the message with an
edited copy of the message.  You can't alter emails in an IMAP mailstore
once they are delivered.

At FastMail we're using the attached Perl module to strip attachments
from emails for our "remove attachments" feature.  It's not pretty, but
it gets the job done.  It needs to be hooked up to some code that
actually connects via IMAP and does the work.

Cheers,

Bron.


On Thu, 19 Oct 2017, at 22:10, Walter H. via Info-cyrus wrote:
> Hello,
> 
> my energy supplier sends a daily mail about the electricity power
> consumption of the previous day;
> these mails have two attachments - one .csv and one .xml
> 
> I'd like to remove the .xml attachments from the mails already stored in
> the cyrus database, as these are bigger and really not needed;
> how would I achieve this?
> (how to delete these from each file in the database is not a problem)
> 
> why I would like to do this:  saving storage ...
> 
> the directory has these files:
> 
> 1.
> 2.
> 3.
> ...
> 10007.
> 10008.
> cyrus.cache
> cyrus.header
> cyrus.index
> cyrus.squat
> 
> my system: CentOS 6, Cyrus v2.3.16-Fedora-RPM-2.3.16-15.el6
> 
> Thanks,
> Walter
> 
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

--
  Bron Gondwana, CEO, FastMail Pty Ltd
  brong@fastmailteam.com




["StripAttachments.pm" (StripAttachments.pm)]

#!/usr/bin/perl -cw
package MIME::StripAttachments;

use strict;
use warnings;

# Avoid UTF-8 regexp issues. Treat everything as pure
#  binary data
no utf8;
use bytes;

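# Copy the message read from $infh to $outfh, replacing each MIME part
# whose dotted part number (e.g. "2" or "1.2") is listed in @parts with
# a short text placeholder. Returns the number of parts stripped.
sub StripAttachments {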
  my $infh = shift;
  my $outfh = shift;
  my @parts = @_;

  my %todel = map { $_ => 1 } @parts;

  Process($infh, $outfh, sub {
    my $part = shift;
    my $headers = shift;
    return $todel{$part} ? 'delete' : 'keep';
  });
}

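# Generic driver. $decider is called once per part as
#   $decider->($part_number, \%headers)
# and must return 'keep' or 'delete', optionally followed by a text
# subtype to use for the placeholder (default 'x-me-removed').
sub Process {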
  my $infh = shift;
  my $outfh = shift;
  my $decider = shift;

  # Main processing loop
  my $InHeader = 1;
  my $HeadBuffer = '';
  my @Boundaries;
  my @nums;
  my $delbytes = 0;
  my $current_part = '';
  my %HeaderMap;
  my $deleting = 0;
  my $stripped = 0;
  my $deldepth;
  my $deltype;
  my $deldispos;

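  # Explicit defined(): a method call gets no implicit defined() test,
  # so a final unterminated line of "0" would otherwise end the loop early
  while (defined(my $line = $infh->getline())) {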
    # Processing a header
    if ($InHeader) {
      $HeadBuffer .= $line;

      # End of headers
      if ($line =~ m/^\r*\n$/) {
        $HeaderMap{$current_part} ||= {};
        $InHeader = ProcessHeaders(\$HeadBuffer, \@Boundaries, $HeaderMap{$current_part});
        unless ($deleting) {
          my ($res, $type) = $decider->($current_part, $HeaderMap{$current_part});
          if ($res eq 'delete') {
            $deleting = 1;
            $delbytes = 0;
            $deldepth = $#Boundaries;
            $deltype = $type || 'x-me-removed';
            $deldispos = $HeaderMap{$current_part}{'content-disposition'}[2];
          }
          elsif ($res eq 'keep') {
            $outfh->print($HeadBuffer);
          } else {
            die "Odd response $res";
          }
        }
        $HeadBuffer = '';
      }
    }

    # In 'body' type section
    else {
      # Found boundary string?
      if (@Boundaries && $line =~ $Boundaries[-1]->[1]) {
        my $Tail = $1;
        if ($deleting) {
          if ($#Boundaries == $deldepth) {
            # we're replacing this one
            $outfh->print("Content-Type: text/$deltype; charset=\"us-ascii\"\r\n");
            $outfh->print("Content-Disposition: $deldispos\r\n") if $deldispos;
            $outfh->print("\r\n");
            $outfh->print("Removed an attachment of $delbytes bytes with the following headers:\r\n");
            foreach my $hn (sort keys %{$HeaderMap{$current_part}}) {
              $outfh->print(join('', @{$HeaderMap{$current_part}{$hn}}) . "\r\n");
            }
            $stripped++;
            $deleting = 0;
          }
          else {
            $delbytes += length($line);
          }
        }
        # Use previous boundary match
        if ($Tail) {
          pop @Boundaries;
        }
        else {
          $nums[$#Boundaries]++;
          $InHeader = 1;
        }
        $#nums = $#Boundaries;
        $current_part = join('.', @nums);
        $outfh->print($line) unless $deleting;
      }
      else {
        if ($deleting) {
          $delbytes += length($line);
        } else {
          $outfh->print($line);
        }
      }
    }
  }

  return $stripped;
}
  
sub ProcessHeaders {
  my ($HeadBuffer, $Boundaries, $HeaderMap) = @_;

  my $rc = 0;

  # Loop through and list all headers (minus \r\n)
  my @Headers;
  pos($$HeadBuffer) = 0;
  while ($$HeadBuffer =~ m/\G([^\s:]+)(:[ \t]*)([^\r\n]*(?:\r?\n[ \t]+[^\r\n]*)*)\r?\n/gc) {
    push @Headers, [ $1, $2, $3 ];
  }
  my ($Remainder) = $$HeadBuffer =~ m/\G(.*)$/s;

  # Build map (prefer earlier headers). Save refs
  my %Headers = map { lc($_->[0]) => $_ } reverse @Headers;

  # Extract new MIME boundary details in content-type headers
  if (my $ContentType = $Headers{'content-type'}) {
    HandleContentTypeHeader($Boundaries, $ContentType->[2]);

    # Return true if message/rfc822 attachment
    if ($ContentType->[2] =~ m{^message/rfc822}i) {
      # We're inside a message now
      $Boundaries->[-1]->[2]++ if @$Boundaries;
      $rc = 1;
    }
  }

  %$HeaderMap = %Headers;

  return $rc;
}

sub HandleContentTypeHeader {
  my ($Boundaries, $HeaderValue) = @_;

  # Put current mime type string into boundary details
  my ($MimeType) = $HeaderValue =~ /^([^;\s]+)/;
  $Boundaries->[-1]->[4] = $MimeType if @$Boundaries;

  # Get boundary string - first try quoted
  my ($Boundary) = ($HeaderValue =~ /boundary="([^"]+)"/i);
  unless ($Boundary) {
    # fall back to pretty restricted
    ($Boundary) = ($HeaderValue =~ /boundary=([^\s;]+)/i);
  }
  return unless $Boundary;

  # Skip boundary if not a multipart/* content type
  return if !defined($MimeType) || $MimeType !~ m{^multipart/}i;

  $Boundary = "--" . quotemeta($Boundary);

  # Track how deep we are in attached messages
  my $MessageDepth = @$Boundaries ? $Boundaries->[-1]->[2] : 0;

  # Create match regexp
  push @$Boundaries, [ $Boundary, qr/^$Boundary(--)?\s*$/, $MessageDepth, $MimeType, '' ];
}

1;
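
For the case in the quoted question (dropping only the .xml attachments),
Process() can be given a callback that decides per part instead of a fixed
list of part numbers. What follows is only a sketch, not something FastMail
is known to run: the name xml_decider and the sample header values are made
up for illustration, and it assumes the attachment's filename appears in a
name= or filename= parameter.

```perl
use strict;
use warnings;

# Hypothetical decider for MIME::StripAttachments::Process.
# $headers maps a lowercased header name to [ name, separator, value ],
# which is the shape ProcessHeaders builds in the module above.
sub xml_decider {
  my ($part, $headers) = @_;

  for my $name ('content-type', 'content-disposition') {
    my $header = $headers->{$name} or next;
    # Matches name="x.xml" and filename="x.xml", quoted or bare
    return 'delete' if $header->[2] =~ /name="?[^";]*\.xml"?\s*(?:;|$)/i;
  }
  return 'keep';
}

# Sample header maps, invented for illustration
my %xml_part = (
  'content-type'        => [ 'Content-Type', ': ', 'application/xml; name="usage.xml"' ],
  'content-disposition' => [ 'Content-Disposition', ': ', 'attachment; filename="usage.xml"' ],
);
my %csv_part = (
  'content-type' => [ 'Content-Type', ': ', 'text/csv; name="usage.csv"' ],
);

print xml_decider('2', \%xml_part), "\n";
print xml_decider('1', \%csv_part), "\n";
```

With the module loaded, this would be passed as
MIME::StripAttachments::Process($infh, $outfh, \&xml_decider), where both
handles support getline() and print() (for example IO::File objects). The
edited copy still has to be appended to the mailbox over IMAP and the
original deleted, as described at the top of this message.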


