'Re: libvmod-dns (super alpha)'

List:       varnish-dev
Subject:    Re: libvmod-dns (super alpha)
From:       Kenneth Shaw <kenshaw () gmail ! com>
Date:       2013-06-24 8:54:32
Message-ID: CAAyX=LEQLXD82zVuhHcfQTqi7GuUNK04HWHsha2rFm2W0YovVg () mail ! gmail ! com

I updated the tests on libvmod-dns -- 'make check' should now work as
expected.

-Ken

On Mon, Apr 8, 2013 at 9:40 PM, Kenneth Shaw <kenshaw@gmail.com> wrote:

> Hi All,
>
> This has been successfully deployed in production, and the code (as-is) is
> handling "many thousands" of connections per second from fake and
> legitimate bots advertising themselves as Googlebot/Bingbot/etc. with no
> apparent problems. The configuration we've deployed is essentially
> the same as provided here (and in the code base).
>
> Anyway, if anyone else ends up finding libvmod-dns helpful, please
> consider it "emailware" -- i.e., drop me an email and let me know
> (off-the-record, of course) how you're making use of it. I'm curious more
> than anything!
>
>
>
> -Ken
>
>
> On Mon, Apr 1, 2013 at 6:21 PM, Kenneth Shaw <kenshaw@gmail.com> wrote:
>
>> Hi,
>>
>> I spent a bit of time today developing a DNS module for Varnish.
>>
>> It is available here:
>>
>> https://github.com/kenshaw/libvmod-dns/
>>
>> The reason for this development is to cut off bots that abuse the
>> User-Agent string (i.e., claiming to be Googlebot/bingbot/etc.) by doing a
>> reverse and then a forward DNS lookup against the client.ip/X-Forwarded-For
>> header and comparing the resulting domain against a regex.
>>
>> The logic is meant to work something like this:
>>
>> # load the vmod (assumes libvmod-dns is built and installed)
>> import dns;
>>
>> sub vcl_recv {
>>     # do a DNS check on "good" crawlers
>>     if (req.http.user-agent ~ "(?i)(googlebot|bingbot|slurp|teoma)") {
>>         # do a reverse lookup on the client.ip (X-Forwarded-For) and
>>         # check that it's in the allowed domains
>>         set req.http.X-Crawler-DNS-Reverse =
>>             dns.rresolve(req.http.X-Forwarded-For);
>>
>>         # check that the RDNS points to an allowed domain -- 403 error
>>         # if it doesn't
>>         if (req.http.X-Crawler-DNS-Reverse !~
>>             "(?i)\.(googlebot\.com|search\.msn\.com|crawl\.yahoo\.net|ask\.com)$") {
>>             error 403 "Forbidden";
>>         }
>>
>>         # do a forward lookup on the reverse-resolved name
>>         set req.http.X-Crawler-DNS-Forward =
>>             dns.resolve(req.http.X-Crawler-DNS-Reverse);
>>
>>         # if the client.ip/X-Forwarded-For doesn't match, then the
>>         # user-agent is fake
>>         if (req.http.X-Crawler-DNS-Forward != req.http.X-Forwarded-For) {
>>             error 403 "Forbidden";
>>         }
>>     }
>> }
>>
>> While this is not being used in production (yet), I plan to do so later
>> this week against a production system receiving ~10,000+ requests/sec. I
>> will report back afterwards.
>>
>> I realize the code currently has issues (memory, documentation, etc.),
>> which will be fixed in the near future.
>>
>> I also realize there are better ways to head malicious bots off at the
>> pass through DNS, etc. (which we are doing as well). The largest issue here
>> for my purposes is that it is difficult / impossible to identify all
>> traffic. Additionally, it is nice to be able to monitor the actual traffic
>> coming through rather than drop it completely at the edge.
>>
>> Any input/comments on what I've written so far would be greatly
>> appreciated! Thanks!
>>
>> -Ken
>>
>
>
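
For anyone who wants to try the reverse-then-forward check from the quoted
post outside of Varnish, here is a minimal Python sketch of the same idea.
It is not libvmod-dns code: the function names (reverse_lookup,
forward_lookup, is_verified_crawler) are made up for illustration, only the
allowed-domain regex is taken from the VCL above, and the lookups use the
standard socket module (IPv4 only). Unlike the VCL, which compares a single
forward result, this version accepts a match against any address the name
resolves to.

```python
import re
import socket

# Same allowed-domain pattern as the VCL in the quoted post.
ALLOWED = re.compile(
    r"\.(googlebot\.com|search\.msn\.com|crawl\.yahoo\.net|ask\.com)$",
    re.IGNORECASE,
)


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses for hostname (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_verified_crawler(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup.

    The resolver functions are parameters (an assumption of this sketch) so
    the logic can be exercised without real DNS traffic.
    """
    hostname = reverse(ip)
    if not hostname or not ALLOWED.search(hostname):
        return False  # no PTR record, or not one of the allowed domains
    return ip in forward(hostname)  # the name must map back to the same IP
```

Calling is_verified_crawler(ip) with just a client address string uses real
DNS; a False result corresponds to the two "error 403" branches in the VCL.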

_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev
