List:       clamav-users
Subject:    [clamav-users] Filetype determination
From:       Maarten Broekman via clamav-users <clamav-users () lists ! clamav ! net>
Date:       2019-04-26 21:12:32
Message-ID: CAMnu9k6DF4-Vm4sW0YsWDLyERdrPRP6H6b_vPGmO0en4aVP8ng () mail ! gmail ! com

One problem that we're running into is that we encounter web pages and CGI
scripts that are "inconsistently" normalized. I put "inconsistently" in
quotes because, without fully knowing how ClamAV normalizes files, it is
sometimes difficult to understand why two similar files are normalized
differently. For example, a PHP script that doesn't contain HTML tags will
be normalized using 'ascii-normalise', while the exact same PHP code will
be normalized with 'html-normalise' if it happens to be tacked on to some
HTML.

This seems to be particularly prevalent with phishing kits, where we want
to write a signature based on the PHP code, not necessarily on the HTML. As
a result, we end up having to write two signatures, because HTML
normalization seems to remove the spaces around equal signs, while ASCII
normalization leaves them in. Additionally, HTML normalization replaces
single-quotes (') with double-quotes ("), while ASCII normalization
leaves them unchanged.

Example:
$ip = getenv("REMOTE_ADDR");
$password = $_POST['password'];

ASCII normalized:
$ip = getenv("remote_addr");
$password = $_post['password'];

HTML normalized:
$ip=getenv("remote_addr");
$password=$_post["password"];
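To make the difference concrete, here is a minimal Python sketch that reproduces only the two outputs shown above. It is not ClamAV's normalizer: the function names (ascii_like, html_like, tolerant_pattern) are made up for illustration, and the rules are inferred purely from this example (lowercasing in both cases; dropped whitespace around '=' and single-to-double quote replacement in the HTML case). The last helper builds one regular expression that accepts either form, which can be handy for checking that a candidate pattern covers both variants before writing the real signatures.

```python
import re

def ascii_like(line: str) -> str:
    """Toy stand-in for the ASCII-normalized output above: lowercase only."""
    return line.lower()

def html_like(line: str) -> str:
    """Toy stand-in for the HTML-normalized output above (assumed rules)."""
    out = line.lower()
    out = re.sub(r"\s*=\s*", "=", out)  # drop whitespace around '='
    return out.replace("'", '"')        # single quotes become double quotes

def tolerant_pattern(line: str) -> re.Pattern:
    """Build one regex that matches both normalized forms of a line."""
    parts = []
    for ch in line.lower():
        if ch in "'\"":
            parts.append("['\"]")       # either quote style
        elif ch == "=":
            parts.append(r"\s*=\s*")    # optional whitespace around '='
        elif ch.isspace():
            parts.append(r"\s*")        # whitespace may have been removed
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

if __name__ == "__main__":
    for src in ('$ip = getenv("REMOTE_ADDR");', "$password = $_POST['password'];"):
        pat = tolerant_pattern(src)
        for form in (ascii_like(src), html_like(src)):
            print(form, bool(pat.fullmatch(form)))
```

This only helps when testing patterns offline; the signatures themselves still have to be expressed in ClamAV's own signature syntax.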

So, my question is this:
How can we get files containing PHP tags (<? and <?php) detected as the
'HTML' file type, so that they are normalized the same way as other 'web'
files?


Also, there are more than a few HTML files that browsers render 'properly'
but that contain none of the following tags:

'<html>'
'<head>'
'<a*href'
'<img'
'<script'
'<object'
'<iframe'
'<table'


A few other tags, such as <style, <!doctype, <meta, <title, and <form, might
help, as would relaxing the html and head tags to require only the leading <
(<html instead of <html>).
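As a rough illustration of the suggestion, the sketch below compares the tag list quoted above with the proposed, looser list. It is a Python approximation for discussion only, not how ClamAV detects file types: the names CURRENT_TAGS, PROPOSED_TAGS and looks_like_html are hypothetical, and the '*' in '<a*href' is assumed to mean "any characters in between".

```python
import re

# Tags as listed in this message; '<a*href' is read as a wildcard (assumption).
CURRENT_TAGS = [r"<html>", r"<head>", r"<a.*?href", r"<img", r"<script",
                r"<object", r"<iframe", r"<table"]

# Proposed list: leading '<' only for html/head, plus the extra tags suggested.
PROPOSED_TAGS = [r"<html", r"<head", r"<a.*?href", r"<img", r"<script",
                 r"<object", r"<iframe", r"<table",
                 r"<style", r"<!doctype", r"<meta", r"<title", r"<form"]

def looks_like_html(text: str, patterns: list) -> bool:
    """Return True if any tag pattern occurs in the text, ignoring case."""
    flags = re.IGNORECASE | re.DOTALL
    return any(re.search(p, text, flags) for p in patterns)

if __name__ == "__main__":
    samples = [
        '<html lang="en"><body>hi</body></html>',
        '<!DOCTYPE html><title>x</title><form action="p.php"></form>',
        '<?php echo 1; ?>',
    ]
    for s in samples:
        print(looks_like_html(s, CURRENT_TAGS), looks_like_html(s, PROPOSED_TAGS), s)
```

The first two samples are pages a browser renders without complaint, yet they match nothing in the quoted list; with the looser list both would be picked up, while the bare PHP snippet still would not.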


--Maarten
