'Re: [ossec-dev] ossec 2.7 agent disconnected'


List:       ossec-dev
Subject:    Re: [ossec-dev] ossec 2.7 agent disconnected
From:       J-B Cheng <jjoobbcc () gmail ! com>
Date:       2012-09-27 1:33:51
Message-ID: CAHAEmvBOwiSQrb=DjuxMOocz_q=dwSzAxL-LoZLG8sV_iSe0Xw () mail ! gmail ! com

Thank you!
By the way, interested users can find 2.7-beta1 at
http://www.ossec.net/?page_id=19 .

On Wed, Sep 26, 2012 at 9:41 AM, <regis.houssin@gmail.com> wrote:

> Hi,
>
> All is OK after the 2.7-beta1 update.
>
> Thank you,
> great job!
>
>
> On Tuesday, September 25, 2012 22:33:41 UTC+2, Daniel Cid wrote:
>>
>> It seems a case of premature optimization without checking what else
>> would break.
>>
>> In the original code, we were initializing "command" as null:
>> <             logff[i].command = NULL; (line 166)
>>
>> And in the new code we are not. That explains the issue...
>>
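To illustrate the point above for readers following along: a minimal sketch of putting every field of a log entry into a known state before it is used. The struct `log_entry_sketch` and the helper `log_entry_init` are hypothetical names made up for this example and are not the actual OSSEC definitions; only the `command = NULL` assignment comes from the message above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one logff[] entry.
 * This is NOT the real OSSEC struct, only an illustration. */
typedef struct {
    FILE *fp;          /* open stream, or NULL when the file is unavailable */
    const char *file;  /* path being monitored */
    char *command;     /* optional command, NULL when unused */
} log_entry_sketch;

/* Put every field into a known state so that later NULL checks
 * are meaningful instead of reading leftover memory. */
static void log_entry_init(log_entry_sketch *e, const char *file)
{
    assert(e != NULL);
    memset(e, 0, sizeof(*e));
    e->fp = NULL;
    e->file = file;
    e->command = NULL; /* the assignment the thread reports as missing */
}
```

Initializing the whole entry in one place makes it harder for a later patch to drop a single assignment unnoticed.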
>> Also, I would really recommend that everyone sending patches try to
>> separate one feature/fix per patch. When you mix multiple unrelated
>> changes into one, it becomes very hard for the reviewer to make sure
>> the code is safe and nothing else breaks.
>>
>> Thanks,
>>
>> --
>> Daniel B. Cid
>> http://dcid.me
>>
>>
>>
>>
>>
>> On Tue, Sep 25, 2012 at 1:58 PM, dan (ddp) <ddp...@gmail.com> wrote:
>> > Has anyone had any luck tracking down the core issue?
>> >
>> > On Thu, Sep 20, 2012 at 3:53 PM, JB Cheng <jjoo...@gmail.com> wrote:
>> >> Using the 2.7-beta0 build, I could reproduce it every time after
>> >> restarting logcollector, usually in under 10 minutes.
>> >> Prior to the segfault, in ossec.log I see
>> >>    2012/09/20 12:00:21 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/httpd/error_log'.
>> >>
>> >> It does not have to be httpd/error_log; it seems any "File not
>> >> available" situation will trigger the segfault, shortly after the log
>> >> entry is printed.
>> >>
>> >> The stack trace is similar to what PAL reported:
>> >> #0  0x0000003c43c63c4c in fgetpos64@@GLIBC_2.2.5 () from /lib64/libc.so.6
>> >> #1  0x0000000000408fed in read_syslog (pos=3, drop_it=0) at read_syslog.c:37
>> >> #2  0x00000000004033c4 in LogCollectorStart () at logcollector.c:349
>> >> #3  0x0000000000404966 in main (argc=4, argv=0x7fffffffe898) at main.c:184
>> >>
>> >> (gdb) p logff[3]
>> >> $2 = {flags = 109537, size = 1348167623, ign = 999, fd = 35324880, fp = 0x0, ffile = 0x0,
>> >>   file = 0x6402d0 "/var/log/httpd/error_log", logformat = 0x6401f0 "apache", read = 0x408f90 <read_syslog>, {
>> >>     djb_program_name = 0xd1 <Address 0xd1 out of bounds>, lines = 209, {command = 0xd1 <Address 0xd1 out of bounds>,
>> >>       alias = 0x0}, {timeout = 209, window = 0}, {start_regex = 0xd1 <Address 0xd1 out of bounds>, end_regex = 0x0}},
>> >>   private_data = 0x3c43f52a38}
>> >>
>> >> As you said, PAL's patch should prevent this segfault from
>> >> happening.  I have the core file if you still need it.
>> >> I am, however, curious about how it got here in the first place, since
>> >> 2.6 did not have this issue.
>> >>
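The trace and the gdb dump quoted above show fgetpos being reached while fp is 0x0. A minimal sketch of a defensive check at that point follows; `safe_get_position` is a hypothetical helper written for this illustration, and it is only an assumption that a guard of this shape resembles PAL's patch, which is not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: record the current stream position only when the
 * stream is actually open.
 * Returns 0 on success, -1 if fp is NULL or fgetpos() fails. */
static int safe_get_position(FILE *fp, fpos_t *pos)
{
    assert(pos != NULL);
    if (fp == NULL) {
        /* File was reported as "not available"; there is nothing to read. */
        return -1;
    }
    return (fgetpos(fp, pos) == 0) ? 0 : -1;
}
```

A guard like this only hides the symptom; the entry still has to be initialized correctly so that an unavailable file is never handed to the reader in the first place.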
>> >> On Thursday, September 20, 2012 7:14:04 AM UTC-7, JB Cheng wrote:
>> >>>
>> >>> On ossec-list, PAL posted "OSSEC 2.7-beta0. Logcollector segfaults
>> >>> dirty fix.":
>> >>> https://groups.google.com/forum/?fromgroups=#!topic/ossec-list/NAJ_Nzd6T7w
>> >>> Take a look at the post to see partial answers.
>> >>> I will gather more information later today.
>> >>>
>> >>> On Wednesday, September 19, 2012 10:53:55 PM UTC-7, sgros wrote:
>> >>>>
>> >>>> That seems like it could be my mistake....
>> >>>>
>> >>>> Can you reproduce this segmentation fault? If so, could you start it
>> >>>> with core files enabled (ulimit -c unlimited) and fetch a stack trace?
>> >>>> Alternatively, can you describe the steps that lead to the segmentation
>> >>>> fault (and that can be reproduced on a single machine)?
>> >>>>
>> >>>> On Thursday, September 20, 2012 2:09:33 AM UTC+2, JB Cheng wrote:
>> >>>>>
>> >>>>> I investigated the original issue "ossec-logcollector not
>> >>>>> running..." and found out it had a segmentation fault.
>> >>>>> To narrow down the search, I backed out the logcollector code change
>> >>>>> in 2.7-beta0 (adding the "linux_auditd" and "multi-line" formats) and
>> >>>>> the seg fault no longer happened. If someone is willing to test the
>> >>>>> snapshot from https://bitbucket.org/jbcheng/ossec-hids/, it will be
>> >>>>> really appreciated.
>> >>>>>
>> >>>>> Who knows, maybe the issue with 'rsyslog' "File not available" is
>> >>>>> also related to the new change.
>> >>>>>
>> >>>>> On Wednesday, September 19, 2012 8:09:49 AM UTC-7, Kat wrote:
>> >>>>>>
>> >>>>>> You missed my other posts saying that the perms did NOT fix the
>> >>>>>> problem. It is rsyslog. 2.6 works fine, so some code change in
>> >>>>>> logcollector is causing this, and the distro you are using uses
>> >>>>>> rsyslog as well. You can flip to syslog-ng and that seems to fix it,
>> >>>>>> but obviously a bug in logcollector needs to be resolved.
>> >>>>>>
>> >>>>>> On Wednesday, September 19, 2012 2:32:01 AM UTC-7,
>> >>>>>> regis....@gmail.com wrote:
>> >>>>>>>
>> >>>>>>> I changed /var/log/maillog to 644 root:root, but the problem is the same:
>> >>>>>>>
>> >>>>>>> 2012/09/19 11:28:59 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/maillog'.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tuesday, September 18, 2012 20:20:42 UTC+2, Kat wrote:
>> >>>>>>>>
>> >>>>>>>> I think I found the problem in 2.7 --
>> >>>>>>>>
>> >>>>>>>> and to others seeing this problem - please check the perms on the
>> >>>>>>>> files in question. My logs are locked down with
>> >>>>>>>> rw-------  root root
>> >>>>>>>>
>> >>>>>>>> And with 2.6 I don't see this problem, but with 2.7 I am getting:
>> >>>>>>>>
>> >>>>>>>> 2012/09/18 10:18:33 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/secure'.
>> >>>>>>>> 2012/09/18 10:18:33 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/maillog'.
>> >>>>>>>>
>> >>>>>>>> So it seems to be related to perms.  The initial startup of
>> >>>>>>>> logcollector says it sees the files, but during the rootcheck scan,
>> >>>>>>>> just before it finishes, it generates the errors seen above.
>> >>>>>>>> If I change the files to rw-r--r-- root root, it seems to resolve.
>> >>>>>>>> But I still need to do some more testing with other perms. Obviously
>> >>>>>>>> having read across the board is not something I want, even on a
>> >>>>>>>> locked down system.
>> >>>>>>>>
>> >>>>>>>> -Kat
>> >>>>>>>>
>> >>>>>>>> PS - sorry for the confusion in my first post - I should have done
>> >>>>>>>> some more debugging.
>>
>


-- 
JB Cheng
