'Re: [asterisk-dev] res_fax_spandsp segfaults during fax detection'


List:       asterisk-dev
Subject:    Re: [asterisk-dev] res_fax_spandsp segfaults during fax detection
From:       Michal_Rybárik <michal () rybarik ! sk>
Date:       2014-01-28 22:03:46
Message-ID: 52E82942.40708 () rybarik ! sk

Hi Pavel,

> Hi Matthew,
>    thanks for another part of the mosaic, it seems to be more and more
> complete and starts forming a picture :-).
>    Would it be possible that the peer causing the segfault uses 10ms
> packetization time instead of the 20ms which libspandsp requires for
> correct operation? What does Asterisk do when the peer allows 10ms
> packetization only, while we are using 20ms? Will it do a conversion,
> catching two packets and presenting them as one bigger packet, or will it
> just forward the short packets to the application? If the second answer is
> correct (which I believe is true), I think we have a candidate case for the
> crash - it's caused by someone who sends 10ms packets instead of 20ms ones.

My setup (which has segfaults) is VERY simple; 90% of calls flow
between two machines, this way:
(E1 / DAHDI) --- Asterisk11 --- (SIP+RTP) --- Asterisk11 --- (E1 / DAHDI)
Configuration is very simple too - only alaw is allowed in SIP (DAHDI is
also alaw in our country, so there shouldn't be any transcoding, except
for the initial t38gateway period when transcoding to sln is needed, AFAIK).
Every frame in my setup should be 20ms long; I shouldn't have any 10ms
frames here. If I look at the frame which caused the last segfault, I see
len=20, and this is the frame length in milliseconds. This
corresponds to samples=160 which was set in this frame, because with
an 8000Hz sampling frequency we should have exactly 160 samples in
20ms - so I think the bad frame was also 20ms long.

[Jan 27 14:00:32] NOTICE[30694][C-000006cb] res_fax_spandsp.c: frame={
frametype=2, datalen0, samples=160, mallocd=1, mallocd_hdr_lenV2,
offsetd, src=RTP, flags=1, ts‘40, len=20, seqno89,
data.ptr=0xb4ef4f30  }
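
To make the arithmetic explicit (160 samples in 20ms at 8000Hz, so a 10ms frame would carry only 80), here is a minimal sketch in C. The helper names samples_per_frame and frame_duration_ms are made up for illustration and are not part of Asterisk or libspandsp.

```c
/* Hypothetical helpers, not Asterisk or libspandsp API. */

/* Number of samples in a frame of frame_ms milliseconds
 * at a sampling rate of sample_rate_hz. */
static int samples_per_frame(int sample_rate_hz, int frame_ms)
{
    return sample_rate_hz * frame_ms / 1000;
}

/* Frame duration in milliseconds for a given sample count
 * at a sampling rate of sample_rate_hz. */
static int frame_duration_ms(int sample_rate_hz, int samples)
{
    return samples * 1000 / sample_rate_hz;
}
```

With these, samples_per_frame(8000, 20) gives 160 and samples_per_frame(8000, 10) gives 80, so a frame reporting 160 samples at 8000Hz is a 20ms frame, not a 10ms one.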

>    So, yes, the pcap of such a failure would be really great! As I wrote,
> I can't generate one, because in my environment the crash is really VERY
> rare and the pcap files would probably fill the whole disk and still not
> catch anything :-). Maybe Michal will have more luck with this?

Ouch :) Capturing all RTP traffic won't be easy if I don't want to
affect machine operation... The best way would be to mirror traffic at the
switch to another port, and then do the pcap on another dedicated machine...
It is possible, but my devices are at the opposite side of the country, and
this is not trivial to manage ;) I'll keep this for later, in case we
don't succeed with less brute-force ways :)

BTW, I did a quick & ugly patch yesterday and I'm testing it now.. I
know that it was (probably) the first RTP frame in the call which
caused the last segfault. And only this one frame has datalen0 and
mallocd=1. So I am skipping frames which have such values set - I'm not
passing them into libspandsp. I see from the logs that I was right, and
only the first frame in a call is filtered from V21 detection, so this
doesn't break normal operation. In a few days I'll see whether this helped
or not. For sure, this is not a proper fix, but I hope it'll help while
debugging....
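
The idea of the filter can be sketched roughly as below. This is not the actual patch: struct sketch_frame is a hypothetical stand-in holding only a few of the fields printed in the log (it is not the real struct ast_frame), should_skip_frame is a made-up name, and the suspicious datalen value is passed in as a parameter rather than hardcoded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a few of the frame fields shown in the log.
 * This is NOT the real struct ast_frame. */
struct sketch_frame {
    int datalen;   /* payload length in bytes */
    int samples;   /* number of audio samples in the frame */
    int mallocd;   /* allocation flags, as printed in the log */
};

/* Returns true when the frame matches the suspicious pattern and should
 * not be passed on to tone detection. suspect_datalen is supplied by the
 * caller; the value to use is an assumption of this sketch. */
static bool should_skip_frame(const struct sketch_frame *f, int suspect_datalen)
{
    if (f == NULL) {
        return true;   /* nothing to process */
    }
    return f->datalen == suspect_datalen && f->mallocd == 1;
}
```

The caller would check should_skip_frame() before handing the frame to the detector and simply return early on a match, which drops only that one frame from V21 detection and leaves the rest of the call untouched.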

--
Michal Rybarik
