'Re: mod_perl output filter and mod_proxy, mod_cache'


List:       apache-modperl
Subject:    Re: mod_perl output filter and mod_proxy, mod_cache
From:       Tim Watts <tw () dionic ! net>
Date:       2011-07-15 5:53:28
Message-ID: 4E1FD5D8.10206 () dionic ! net

Hi Andre,

Thanks for such a detailed reply:

On 14/07/11 21:07, André Warnier wrote:

>
> Back to the main issue.
>
> See this as just a bit more generic information, as to what/how you
> could think of solving your problem, apart from the other suggestions
> already submitted.
>
> 1) I am not sure about mod_perl I/O filters, because I never used them. (*)
> But in order to (conditionally/unconditionally) insert/delete/modify
> request/response headers, you can also write your own perl handler, and
> by choosing the appropriate type of PerlHandler, you can have it run at
> just about any point in the request/response cycle.
>
> The real power of mod_perl (if you haven't yet discovered that aspect),
> is that it allows you to insert your own code at just about any point of
> the Apache request processing cycle, and to do just about anything you
> want with any aspect of the request/response.
> That includes "interfering" with anything that other, non-perl, Apache
> modules do.

I've written auth handlers in mod_perl before - I did get the impression
then that the possibilities for doing other things were extensive.

> See the following page for a good overview of the Apache request
> processing cycle, and what you can do with such PerlHandlers :
> http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories
>
> You are probably more interested in the "HTTP Protocol" section. By
> clicking on each item in that list, you get an explanation of /when/
> that type of handler runs.
> (It's also indirectly a very good introduction to how Apache itself works).
>
> Such handlers are usually easy to write and configure, and the code to
> play with HTTP headers is also quite simple, if you know what to put in
> the header(s).

ah - that is very useful - I shall read that.

> 2) about mod_headers and mod_proxy playing together :
> The trouble is that (contrary to the mod_perl documentation above) it
> is usually not at all clear from an Apache module's documentation
> during which exact phase of the Apache request processing each
> module runs.
>
> But I seem to remember something in mod_headers about an "early"
> attribute or parameter.
> Maybe that tells you more of when it runs (or can run), compared to
> mod_proxy.

Hmm - I did read the web page several times, must have missed that - I 
was nearly at the point of reading the source.

> 3) In the documentation of mod_proxy, there should be a possibility to
> configure it inside of a <Location(Match)> section, instead of
> "globally" (outside of any section).
> That forces you to decide more finely which URLs should or should not be
> proxied/forwarded to Tomcat, but it also (in my view) makes it more
> evident to combine the proxying instruction with other modules, like
> perl filters or handlers.
>
> In effect, from Apache's point of view, mod_proxy must be the equivalent
> of a "content-generating handler" (like a PerlResponseHandler), because
> for Apache, passing a request to mod_proxy for processing is not much
> different than passing it to any other internal response-generating
> handler.
> Apache in fact knows nothing of Tomcat. It passes a request to
> mod_proxy, and expects the response (or an error status) back from
> mod_proxy. It has no idea that behind mod_proxy is another server.

It is an interesting possibility that is also worth playing with.

Most of our servers are set up to redirect everything to the proxy
*except* a couple of URLs which are either handled locally or sent to a
different proxy.

This is quite typical:

RewriteEngine on
# Local (comments must sit on their own line in Apache config)
RewriteRule "^/media"  - [L]
RewriteRule "^/django" - [L]
# Otherwise proxy
RewriteRule "^/(.*)$" "http://tomcat.server:8180/webapp/$1" [P,L]
ProxyPassReverse   / http://tomcat.server:8180/webapp
ProxyPassReverseCookiePath /webapp /


Previously, this had been done with ProxyPass directives, including 
negative ones. This did not work well with some Rewrite rules that were 
also needed in some cases. So I tend to handle the whole thing with an
ordered list of rewrite rules like the above, using the proxy flag on
those that need it. It makes the ordering more obvious.
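
For comparison, the ProxyPass form of the same layout would look roughly
like the sketch below. The "!" exclusions have to come before the
catch-all, because ProxyPass rules are checked in configuration order
and the first match wins. This is only a sketch reusing the host and
paths from above, not the configuration we actually ran.

```apache
# Sketch only: ProxyPass equivalent of the rewrite rules above.
# Exclusions ("!") must precede the catch-all rule.
ProxyPass        /media  !
ProxyPass        /django !
ProxyPass        /       http://tomcat.server:8180/webapp/
ProxyPassReverse /       http://tomcat.server:8180/webapp/
ProxyPassReverseCookiePath /webapp /
```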

I have not yet tried a system of building the website with sets of
Location directives, which might be interesting - though I do use
Location sections to enforce redirects to SSL and to require
authentication. Apache is like Perl: there is more than one way to do it.
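
A Location-based layout might look something like the sketch below.
Inside a <Location> block, ProxyPass and ProxyPassReverse take only the
target URL, since the local path comes from the enclosing section. The
/webapp/ path is a hypothetical choice for illustration; the rules above
proxy everything under / instead.

```apache
# Sketch only: proxying scoped to one section (the path is hypothetical).
<Location /webapp/>
    ProxyPass        http://tomcat.server:8180/webapp/
    ProxyPassReverse http://tomcat.server:8180/webapp/
</Location>
```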

>
> 4) strictly according to the HTTP protocol, a "GET" request should be
> "idempotent", which means (roughly) that running it twice or more should
> always give the same answer.
> Which in theory means that even if the GET request goes to a database,
> the response should be cacheable under most circumstances.
> Unfortunately, the practice is such that the GET request is much
> overused, and it is not always that way.
> But if caching the response creates problems, you can always tell your
> application developers that it is their fault because they are misusing
> the protocol..
>
> (In really strict terms, a GET /could/ provide a different response; but
> it should not modify the state of the server).

I do recall that.

> 5) despite what I am saying in (4), a GET response can very validly be
> different from a previous GET response with the same URL (for example,
> if in-between the data has been modified by a POST). So if you are
> forcing headers on the responses, you should at least be a bit careful
> not to do this indiscriminately.
>
> That is also why I personally have a doubt about the effectiveness of
> another caching proxy front-end, like the couple mentioned earlier. If
> the Tomcat web applications themselves do not provide headers to
> indicate whether their response can be cached or not, how is the
> front-end going to determine that this response /is/ the same as a
> previous one ?
> It seems to me that such a determination would require elements that
> such a proxy does not have, no ?

I agree - the tomcat apps *should* be declaring what is the correct
caching scenario. But they don't. So this is very much a workaround.
However, for any given case, the dev folk usually remember enough about
a project to say "the content of the database does not change, and GETs
will be invariant as a result" (or not). It's on that basis I'm happy to
proceed with a kludge, just to save my poor servers from melting(!).
Well, the servers are all VMs, so it is more to stop old projects
stealing resources that could be better used on new projects.
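
As a sketch of the kludge: for a path the developers have confirmed is
invariant, mod_headers could force a Cache-Control header on the way
out. The /archive path and the one-hour lifetime are made-up values for
illustration, and I have not verified that mod_cache sees the header
before it decides whether to store the response.

```apache
# Sketch only: hypothetical path and lifetime; requires mod_headers.
<Location /archive>
    # Overwrite whatever Cache-Control (if any) the backend sent.
    Header set Cache-Control "public, max-age=3600"
</Location>
```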

I feel I understand Cache-Control (vs Expires) a lot better since I 
optimised my own website with mod_cache on top of HTML::Mason/mod_perl 
(which do play nice) - and my Mason bits do send sensible Cache-Control 
lines. So I plan to give a small lunchtime seminar on that topic with 
some demos of using Google's Page Speed Firebug plugin (very useful for
this stuff).

The stupid thing is, it is probably trivial at design time to wedge 
extra HTTP headers in (maybe JSP has a framework level TTL/expires 
control - I don't know) but one has to know one *should* be doing it...

>
> Now if you are still there, one more question :
> Are we talking here of a configuration where one front-end Apache
> front-ends for several Tomcats possibly on different machines ?
> or does each Tomcat have its own personal Apache front-end on the same
> machine ?
> or something in-between ?

Mix. Older projects sent 3 different VHOSTS to 3 different remote tomcat 
servers, each of which was handling a dozen+ webapps for a dozen+ 
different apache servers.

This was a disaster as one bad webapp could take out the tomcat farm and 
the bloody logs are so useless it was impossible to find out which one.

These days, we have 3 different tomcat instances on the front machine 
(dev, staging, live/production) and one apache with 3 VHOSTs mapping to 
each tomcat. We may also blend in some django on the same machine. 
Apache may mix in static content itself for efficiency (CSS/JS).

At least then, the development tomcat can be killed and restarted 
without breaking the live one (and no, "touching" the web.xml file to
trigger a single webapp reload is about as reliable as asking a robber
to drop your cash off at the bank!).

They used to use a lot of perl - but I think perl lost it a bit with 
forms handling and Ajax (until recently perhaps) which is why everyone 
went off playing with jsp and now django.

I must admit django does seem well designed and I object to python a lot 
less than java. Disadvantage - django likes to write your SQL for you 
leading to a lack of thinking there - eg, one I caught the other day:

5 JOINs with a SELECT DISTINCT over all. Bloke wondered why the MySQL 
server took 40 seconds to compute the result!

>
> (*) considering the name of "filter" however, I would think that
> - an "input filter" should always run /before/ any module which
> generates content (of which mod_proxy is one)
> - an "output filter" should always run /after/ any modules which
> generate content.
> So, it is probably difficult to have a filter which runs /in-between/
> other Apache modules.

I'm still going to have a look at mod_perl filters - I have a feeling 
they could be useful here and there.

Thanks :)

Tim

-- 
Tim Watts
Personal Blog: http://www.dionic.net/tim/