List:       kde-devel
Subject:    Re: KLogTool
From:       Jan Kneschke <jan () kneschke ! de>
Date:       2002-10-24 9:39:01

On Thu, Oct 24, 2002 at 09:49:53AM +0200, Michael Goffioul wrote:
> > > - each log entry is a single line
> > 
> > For very long lines this is NOT the case for apache 1.3.x.
> 
> I would appreciate some example files (in private of course).

Don't you have a webserver that gets hit by Nimda and friends?

p3EE1DBEA.dip.t-dialin.net - - [25/Oct/2001:14:24:51 +0000] "GET \
/html/admin.php?name%5B0%5D=Pasta+-+programs+-+window.php&filename%5B0%5D=% \
2Fpages%2Fprojects%2Fpasta%2Fwindow.php&edit%5B0%5D=edit&name%5B1%5D=Pasta+-+programs+-+index.php&filename%5B1%5D=%2Fpages%2Fprojects%2Fpast
 a%2Findex.php&name%5B2%5D=Pasta+-+programs+-+wm.php&filename%5B2%5D=%2Fpages%2Fprojects%2Fpasta%2Fwm.php&name%5B3%5D=Pasta+-+programs+-+kons
 ole.php&filename%5B3%5D=%2Fpages%2Fprojects%2Fpasta%2Fkonsole.php&name%5B4%5D=Phpezant+-+listcreate.php&filename%5B4%5D=%2Fpages%2Fprojects%
 2Fphpezant%2Flistcreate.php&name%5B5%5D=Modlogan+-+mlaconfiggen.php&filename%5B5%5D=%2Fpages%2Fprojects%2Fmodlogan%2Fmlaconfiggen.php&name%5
 B6%5D=Pasta+-+lib+-+wm.inc&filename%5B6%5D=%2Flib%2Fphp%2Fpasta%2Fwm.inc&name%5B7%5D=Pasta+-+lib+-+themedwm.inc&filename%5B7%5D=%2Flib%2Fphp
 %2Fpasta%2Fthemedwm.inc&name%5B8%5D=Pasta+-+lib+-+window.inc&filename%5B8%5D=%2Flib%2Fphp%2Fpasta%2Fwindow.inc&name%5B9%5D=Pasta+-+lib+-+obj
 ect.inc&filename%5B9%5D=%2Flib%2Fphp%2Fpasta%2Fobject.i
nc&name%5B10%5D=Pasta+-+lib+-+view.inc&filename%5B10%5D=%2Flib%2Fphp%2Fpasta%2Fview.inc&name%5B11%5D=Pasta+-+lib+-+viewcollection.inc&filena
 me%5B11%5D=%2Flib%2Fphp%2Fpasta%2Fviewcollection.inc&name%5B12%5D=Pasta+-+lib+-+box.inc&filename%5B12%5D=%2Flib%2Fphp%2Fpasta%2Fbox.inc \
HTTP /1.1" 200 15675 "http://jan.kneschke.de/html/admin.php?op=showsource" \
"Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)"

The real line break is after the "object.i". All the other line breaks are just virtual.


p3EE1DBEA.dip.t-dialin.net - - [25/Oct/2001:14:25:20 +0000] "GET \
/html/admin.php?op=showsource HTTP/1.1" 200 18793 "http://jan.kneschke.de/h \
tml/admin.php?name%5B0%5D=Pasta+-+programs+-+window.php&filename%5B0%5D=%2Fpages%2Fprojects%2Fpasta%2Fwindow.php&edit%5B0%5D=edit&name%5B1%5
 D=Pasta+-+programs+-+index.php&filename%5B1%5D=%2Fpages%2Fprojects%2Fpasta%2Findex.php&name%5B2%5D=Pasta+-+programs+-+wm.php&filename%5B2%5D
 =%2Fpages%2Fprojects%2Fpasta%2Fwm.php&name%5B3%5D=Pasta+-+programs+-+konsole.php&filename%5B3%5D=%2Fpages%2Fprojects%2Fpasta%2Fkonsole.php&n
 ame%5B4%5D=Phpezant+-+listcreate.php&filename%5B4%5D=%2Fpages%2Fprojects%2Fphpezant%2Flistcreate.php&name%5B5%5D=Modlogan+-+mlaconfiggen.php
 &filename%5B5%5D=%2Fpages%2Fprojects%2Fmodlogan%2Fmlaconfiggen.php&name%5B6%5D=Pasta+-+lib+-+wm.inc&filename%5B6%5D=%2Flib%2Fphp%2Fpasta%2Fw
 m.inc&name%5B7%5D=Pasta+-+lib+-+themedwm.inc&filename%5B7%5D=%2Flib%2Fphp%2Fpasta%2Fthemedwm.inc&name%5B8%5D=Pasta+-+lib+-+window.inc&filena
 me%5B8%5D=%2Flib%2Fphp%2Fpasta%2Fwindow.inc&name%5B9%5D
=Pasta+-+lib+-+object.inc&filename%5B9%5D=%2Flib%2Fphp%2Fpasta%2Fobject.inc&name%5B10%5D=Pasta+-+lib+-+view.inc&filename%5B10%5D=%2Flib%2Fph
 p%2Fpasta%2Fview.inc&name%5B11%5D=Pasta+-+lib+-+viewcollection.inc&filename%5B11%5D=%2Flib%2Fphp%2Fpasta%2Fviewcollection.inc&name%5B12%5D=P
 asta+-+lib+-+box.inc&filename%5B12%5D=%2Flib%2Fphp%2Fpasta%2Fbox.inc" "Mozilla/5.0 \
(compatible; Konqueror/2.2.1; Linux)"

Here it is after the "%5B9%5D".
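
A minimal sketch of how a parser could glue such split entries back
together, assuming every real entry starts with "host ident user
[timestamp]" as in the two samples above. The names RECORD_START and
join_split_records are made up for this illustration; this is a heuristic,
not what modlogan or klogtool actually do.

```python
import re

# Assumption: a genuine record begins with three whitespace-free tokens
# followed by a bracketed timestamp, as in the samples above.
RECORD_START = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] ')


def join_split_records(lines):
    """Yield logical log entries, appending continuation lines to the
    entry they belong to."""
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if RECORD_START.match(line):
            # A new entry begins: emit the one collected so far.
            if current is not None:
                yield current
            current = line
        elif current is not None:
            # No record header, so treat it as the tail of a split entry.
            current += line
        # A continuation line before any record start is silently dropped.
    if current is not None:
        yield current
```

The pieces are joined without a separator because the break falls in the
middle of the URL. A continuation line that happens to look like a record
start would still fool this check.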
 
> > > - fields with spaces are quoted (otherwise it seems impossible
> > > to parse)
> > 
> > You should use regexes to parse them. If you want you should take a look at
> > modlogan (http://jan.kneschke.de/projects/modlogan/) which is a very
> > flexible log-file parser for multiple logfile-types (webserver, ftp-server,
> > mail-server, streaming-server, ...)
> 
> klogtool is already based on regexp's. However, even with regexp's, I
> don't see how you can parse reliably a line that contains successively
> 2 unquoted fields including spaces. Imagine a log format like:
> %r %r. Or each of these fields must have a known format.

This is a user problem: whoever configures the log format has to keep it
parseable. That's why %r is always surrounded by '"'.

In the latest release we have regex generators for CustomLog (Apache), MSIIS,
Netscape, Squid and Realserver, as their logfile formats are configurable.
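
To give a rough idea of what such a regex generator does, here is a small
sketch that turns a CustomLog-style format string into a regex. The
directive table, the group names and the function format_to_regex are
assumptions for this illustration and cover only a handful of directives;
the real parser is more complete. Note how the literal quotes around %r
give the lazy ".*?" something to stop at.

```python
import re

# Hypothetical directive table: only a few Apache LogFormat directives,
# each mapped to an illustrative group name and a regex fragment.
FIELD_PATTERNS = {
    "h": ("host", r"\S+"),
    "l": ("ident", r"\S+"),
    "u": ("user", r"\S+"),
    "t": ("time", r"\[[^\]]+\]"),
    "r": ("request", r".*?"),   # may contain spaces; relies on the quotes
    "s": ("status", r"\d{3}"),
    "b": ("bytes", r"\d+|-"),
}

# Matches simple directives such as %h or %>s (no %{...}x forms).
DIRECTIVE = re.compile(r"%>?([a-zA-Z])")


def format_to_regex(fmt):
    """Build a compiled regex from a log format string.

    Literal text between directives is escaped and kept as-is.
    Unknown directives raise KeyError; a directive used twice fails
    because the group name would be duplicated.
    """
    parts = []
    pos = 0
    for m in DIRECTIVE.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))
        name, pattern = FIELD_PATTERNS[m.group(1)]
        parts.append("(?P<%s>%s)" % (name, pattern))
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")
```

Used with a format like '%h %l %u %t "%r" %>s %b', the resulting regex
splits an entry into named fields. With two adjacent unquoted %r fields
there would be no literal to separate them, which is exactly the ambiguity
discussed above.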

You should take a look at the source to get an impression of what has to be
done:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/modlogan/modlogan/src/input/clf/plugin_config.c?rev=1.34&content-type=text/vnd.viewcvs-markup


parse_clf_field_info() is the parser for the apache CustomLog directive.

> Michael.

  Jan

-- 
http://jan.kneschke.de - localizer, modlogan, pxtools
mailto:jan@kneschke.de - Jan Kneschke
 

