
List:       postfix-users
Subject:    Re: Network difficulties with some senders
From:       "James B. Byrne" <byrnejb () harte-lyne ! ca>
Date:       2018-07-20 19:08:20
Message-ID: 96e6ac56ef21f867b0145fa16bc5b042.squirrel () webmail ! harte-lyne ! ca


On Thu, July 19, 2018 22:23, Viktor Dukhovni wrote:
> On Thu, Jul 19, 2018 at 02:30:52PM -0400, James B. Byrne wrote:
>
>> > You really need to show more of the (non-verbose) logging for this
>> > session and the below.  You're cutting out critical context.
>>
>> Jul 19 13:40:39 mx31 postfix-p25/smtpd[96635]: NOQUEUE:
>>   client=mail.rosedale.ca[66.135.118.147]
>> Jul 19 13:40:39 mx31 postfix-p25/smtpd[96635]: lost connection after
>>   DATA (0 bytes) from mail.rosedale.ca[66.135.118.147]
>> Jul 19 13:40:39 mx31 postfix-p25/smtpd[96635]: disconnect from
>>   mail.rosedale.ca[66.135.118.147] ehlo=1 mail=1 rcpt=1 data=0/1
>>   commands=3/4
>
> The sending client is encountering networking issues.  Unrelated
> to Postfix or TLS.  Your new systems may have enabled TCP window
> scaling or ECN, or other TCP features that are giving the sending
> system indigestion.  Or just a vanilla MTU issue.  You could try
> to disable window scaling, which makes data transfer slower for
> senders with a high bandwidth delay product, but email is generally
> tolerant of a few seconds to minutes of extra delay.
>
> Staring carefully at PCAP files might help.
>
>> Jul 19 13:40:39 mx31 postfix-p25/smtpd[40715]: disconnect from
>> mail.rosedale.ca[66.135.118.147] ehlo=1 mail=1 rcpt=1 data=0/1
>> commands=3/4
>
> Much the same, no TLS in sight.
>

We have resolved this issue, but the fix is not what one would
describe as intuitive.  The local caching DNS service on the MX host
is local_unbound, with a reference to 127.0.0.1 in resolv.conf.  I set
resolv.conf to check only our forwarding DNS hosts and removed the
reference to 127.0.0.1.  The 'lost connection after DATA (0
bytes)' problem immediately disappeared and did not return.

The clue which unlocked this puzzle came when I attempted to
ssh into the MX host directly, rather than into the underlying host
and then using a local console connection.  There was a noticeable
delay logging on via ssh.  Turning off UseDNS in sshd_config allowed
immediate ssh logons, which told me that the issue was with the
resolver.
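
For anyone wanting to check for the same symptom without touching
sshd_config, here is a minimal sketch that times a reverse lookup
through the system resolver, which is roughly the kind of lookup
UseDNS triggers.  The function name and the injectable `resolve`
argument are my own illustration, not anything from Postfix or sshd.

```python
import socket
import time


def time_reverse_lookup(ip, resolve=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup.

    `resolve` defaults to the system resolver; it is a parameter only
    so that the timing logic can be exercised without network access.
    """
    start = time.monotonic()
    try:
        name = resolve(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        name = None
    return name, time.monotonic() - start


# Example use, run inside the jail:
#   name, elapsed = time_reverse_lookup("66.135.118.147")
#   print(name, round(elapsed, 2))
```

An elapsed time of several seconds, compared with near-instant results
from a host with a working resolver, would point at the resolver
configuration rather than at the network path.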

The MX host in question is running in a FreeBSD jail.  FreeBSD ships
with a default DNS service named local_unbound.  This was set up in
accordance with previous experience and appeared to be working when
tested from the command line.

A feature of FreeBSD jails is that each requires a dedicated cloned
loopback interface with its own unique IP address, usually
127.0.0.[cloned index number].  Jails are supposed to automatically
map references to 127.0.0.1 to whatever IP address is assigned to that
jail's lo interface.  I had previously discovered that this was not
the case with Postfix's inet_interfaces setting.  It turns out that
the same thing happens with the resolver (or rather, what is expected
does not happen) when 127.0.0.1 is placed in /etc/resolv.conf.

I replaced 127.0.0.1 with the address actually assigned to the cloned
lo interface used by that jail and the problem did not resurface.  If I
return it to 127.0.0.1 then we get the problem behaviour.
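
As a rough sketch of that change, the following rewrites any
"nameserver 127.0.0.1" line in resolv.conf text to point at the jail's
own loopback address.  The function name is hypothetical, and the
addresses 127.0.0.2 and 192.0.2.53 are placeholders for illustration;
substitute whatever is actually assigned to the jail's cloned lo
interface and to your forwarders.

```python
def rewrite_loopback(resolv_conf_text, jail_loopback):
    """Replace 'nameserver 127.0.0.1' with the jail's loopback address.

    All other lines (search, options, other nameservers) pass through
    unchanged.
    """
    out = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if (len(fields) >= 2 and fields[0] == "nameserver"
                and fields[1] == "127.0.0.1"):
            out.append(f"nameserver {jail_loopback}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Placeholder input; 127.0.0.2 stands in for the jail's cloned lo address.
sample = "search example.org\nnameserver 127.0.0.1\nnameserver 192.0.2.53\n"
fixed = rewrite_loopback(sample, "127.0.0.2")
```

This only transforms text; writing the result back to /etc/resolv.conf
is left as a deliberate manual step.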

Evidently the issue is caused by DNS timeouts.  Why it affected only a
few senders I cannot explain, but we have had no further disconnects
attributable to this cause since the changes were made to resolv.conf.
We changed nothing else, so this is either the remedy or we have a
massively improbable coincidence.  Not impossible but very unlikely.

Having written that, no doubt I have called upon the fates to disabuse
me of my hubris.

Thanks for all the help.  It was greatly appreciated.

-- 
***          e-Mail is NOT a SECURE channel          ***
        Do NOT transmit sensitive data via e-Mail
 Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3
