List: varnish-dev
Subject: Re: libvmod-dns (super alpha)
From: Kenneth Shaw <kenshaw@gmail.com>
Date: 2013-06-24 8:54:32
Message-ID: CAAyX=LEQLXD82zVuhHcfQTqi7GuUNK04HWHsha2rFm2W0YovVg@mail.gmail.com
I updated the tests on libvmod-dns -- 'make check' should now work as
expected.
-Ken
On Mon, Apr 8, 2013 at 9:40 PM, Kenneth Shaw <kenshaw@gmail.com> wrote:
> Hi All,
>
> This has been successfully deployed in production, and the code (as-is) is
> handling "many thousands" of connections per second from fake and
> legitimate bots advertising themselves as Googlebot/Bingbot/etc., with no
> apparent issues. The configuration we've deployed is essentially the same
> as the one provided here (and in the code base).
>
> Anyway, if anyone else ends up finding libvmod-dns helpful, please
> consider it "emailware" -- i.e., drop me an email and let me know
> (off the record, of course) how you're making use of it. I'm curious more
> than anything!
>
>
>
> -Ken
>
>
> On Mon, Apr 1, 2013 at 6:21 PM, Kenneth Shaw <kenshaw@gmail.com> wrote:
>
>> Hi,
>>
>> I spent a bit of time today developing a DNS module for Varnish.
>>
>> It is available here:
>>
>> https://github.com/kenshaw/libvmod-dns/
>>
>> The reason for this development is to cut off bots that abuse the
>> User-Agent string (i.e., claiming to be Googlebot/bingbot/etc.) by doing a
>> reverse and then a forward DNS lookup on the client.ip/X-Forwarded-For
>> address and matching the resulting domain against a regex.
>>
>> The logic is meant to work something like this:
>>
>> sub vcl_recv {
>>     # do a DNS check on "good" crawlers
>>     if (req.http.user-agent ~ "(?i)(googlebot|bingbot|slurp|teoma)") {
>>         # do a reverse lookup on the client.ip (X-Forwarded-For) and
>>         # check that it's in the allowed domains
>>         set req.http.X-Crawler-DNS-Reverse = dns.rresolve(req.http.X-Forwarded-For);
>>
>>         # check that the RDNS points to an allowed domain -- 403 error if it doesn't
>>         if (req.http.X-Crawler-DNS-Reverse !~ "(?i)\.(googlebot\.com|search\.msn\.com|crawl\.yahoo\.net|ask\.com)$") {
>>             error 403 "Forbidden";
>>         }
>>
>>         # do a forward lookup on the reverse DNS name
>>         set req.http.X-Crawler-DNS-Forward = dns.resolve(req.http.X-Crawler-DNS-Reverse);
>>
>>         # if it doesn't match the client.ip/X-Forwarded-For, the user-agent is fake
>>         if (req.http.X-Crawler-DNS-Forward != req.http.X-Forwarded-For) {
>>             error 403 "Forbidden";
>>         }
>>     }
>> }
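>> For anyone who wants to experiment with the same check outside Varnish,
>> here is a minimal Python sketch of the forward-confirmed reverse DNS
>> (FCrDNS) logic the VCL above implements. The function name and the
>> injectable resolver parameters are illustrative only (they are not part
>> of libvmod-dns); the resolvers default to the stdlib calls but can be
>> swapped out for testing:

```python
import socket

def is_verified_crawler(ip, allowed_suffixes,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    # 1. reverse lookup (PTR): IP -> host name
    try:
        host = reverse(ip)[0]
    except OSError:
        return False
    # 2. the reverse name must end in an allowed crawler domain
    if not host.lower().endswith(tuple(allowed_suffixes)):
        return False
    # 3. forward lookup (A): host name -> IPs, which must
    #    include the original IP, otherwise the PTR is spoofed
    try:
        addrs = forward(host)[2]
    except OSError:
        return False
    return ip in addrs
```

>> The point of doing both lookups is that anyone controlling their own
>> reverse zone can make a PTR record claim to be googlebot.com; only the
>> forward lookup back to the original IP proves the name is genuine.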
>>
>> While this is not being used in production (yet), I plan to do so later
>> this week against a production system receiving ~10,000+ requests/sec. I
>> will report back afterwards.
>>
>> I realize the code currently has issues (memory, documentation, etc.),
>> which will be fixed in the near future.
>>
>> I also realize there are better ways to head malicious bots off at the
>> pass via DNS, etc. (which we are doing as well). The largest issue for my
>> purposes is that it is difficult or impossible to identify all such
>> traffic that way. Additionally, it is nice to be able to monitor the
>> actual traffic coming through rather than dropping it entirely at the edge.
>>
>> Any input/comments on what I've written so far would be greatly
>> appreciated! Thanks!
>>
>> -Ken
>>
>
>
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev