'Re: [Wikitech-l] ZERO architecture'


List:       wikitech-l
Subject:    Re: [Wikitech-l] ZERO architecture
From:       Adam Baso <abaso () wikimedia ! org>
Date:       2013-05-31 18:25:01
Message-ID: CAB74=NpQEqNQ06KdMy3+Tqa7OLz1xLXRycqxbeKJV0oGRt=OXw () mail ! gmail ! com

Sorry to reply on a thread that will probably not sort nicely in the
mailman web interface or in threading mail clients. Does anybody know
an easy way to reply to digest email in Gmail so that mailman retains
threading?

I'll be working on the conversion of Wikipedia Zero banners to ESI.
I think there will be good lessons here for dynamic loading in other
parts of the interface.

Arthur, to address your questions in the parent to Mark's reply:

"is there any reason you'd need to serve different HTML than what is
already being served by MobileFrontend?"

Currently, some languages are not whitelisted at carriers, meaning
that users may get billed when they hit <language>.m.wikipedia.org or
<language>.zero.wikipedia.org. Thus, a number of <a href>s are
rewritten to include interstitials if the link is going to result in a
charge. By the way, we see some shortcomings in the existing rewrites
that need to be corrected (e.g., some URLs don't have interstitials,
but should), but that's a separate bug.

My thinking is that we should start intercepting all clicks: with
JavaScript on JavaScript-capable browsers, and otherwise with a default
interceptor URL on the subdomain that corresponds to the language of
the current <language>.(m|zero).wikipedia.org host. In either case, if
the destination link is on a Wikipedia domain that is not whitelisted
for the carrier, or if it points outside Wikipedia, the user should
land on an interstitial page hosted on the same whitelisted subdomain
the user came from.
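A minimal sketch of that interstitial decision, in Python. It assumes the carrier's whitelist is available as a set of language codes and that Wikipedia hosts look like <lang>.wikipedia.org, <lang>.m.wikipedia.org or <lang>.zero.wikipedia.org. The /interstitial path and the function names are hypothetical, not existing ZeroRatedMobileAccess code.

```python
import re
from urllib.parse import quote, urljoin, urlsplit

# Hypothetical host pattern: <lang>.wikipedia.org, <lang>.m.wikipedia.org
# or <lang>.zero.wikipedia.org; group 1 captures the language code.
WIKIPEDIA_HOST = re.compile(r"^([a-z][a-z-]*)\.(?:(?:m|zero)\.)?wikipedia\.org$")


def needs_interstitial(href: str, current_url: str, whitelisted_langs: set[str]) -> bool:
    """True if following href may result in a charge for this carrier."""
    target = urlsplit(urljoin(current_url, href))  # resolve relative links first
    host = (target.hostname or "").lower()
    match = WIKIPEDIA_HOST.match(host)
    if match is None:
        return True  # external to Wikipedia: always warn
    return match.group(1) not in whitelisted_langs


def rewrite_link(href: str, current_url: str, whitelisted_langs: set[str]) -> str:
    """Return the original href, or a hypothetical interstitial URL on the
    whitelisted subdomain the user is currently on."""
    if not needs_interstitial(href, current_url, whitelisted_langs):
        return href
    origin = urlsplit(current_url)
    destination = quote(urljoin(current_url, href), safe="")
    return f"{origin.scheme}://{origin.netloc}/interstitial?to={destination}"
```

The same decision could run in a JavaScript click handler or in the server-side default interceptor; keeping it a pure function of (link, current page, whitelist) is what would let both paths agree.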

"Out of curiosity, is there WAP support in Zero? I noticed some
comments like '# WAP' in the varnish acls for Zero, so I presume so.
Is the Zero WAP experience different than the MobileFrontend WAP
experience?"

No special WAP considerations in ZeroRatedMobileAccess above and
beyond MobileFrontend, as I recall. The "# WAP" comments are just there
to remind us about that particular carrier in case a support case
involving it comes up.

We'll also want to keep the impact on USSD/SMS support in mind. I
think Jeremy had some good conversations at the Wikimedia Hackathon in
Amsterdam that will help him refine how his middleware receives and
transforms content.

Mark Bergsma mark at wikimedia.org
Fri May 31 09:44:48 UTC 2013


________________________________

>> * feature phones -- HTML only, the banner is inserted by the ESI
>> ** for carriers with free images
>> ** for carriers without free images
>>
>
> What about including ESI tags for banners for smart devices as well as
> feature phones, then either use ESI to insert the banner for both device
> types or, alternatively, for smart devices don't let Varnish populate the
> ESI chunk and instead use JS to replace the ESI tags with the banner? That
> way we can still serve the same HTML for smart phones and feature phones
> with images (one less thing for which to vary the cache).

I think the jury is still out on whether it's better to insert
banners with ESI in Varnish or client-side with JS. I guess we'll have
to test and see.
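To make the ESI option concrete, here is a rough Python model of what the cache does with an ESI include: each <esi:include src="..."/> tag is replaced by the fragment fetched from its src. The banner URL and helper names are hypothetical, and real Varnish ESI processing is enabled in VCL rather than written like this.

```python
import re
from typing import Callable

# Matches self-closing ESI include tags such as <esi:include src="/banner"/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def process_esi(html: str, fetch: Callable[[str], str]) -> str:
    """Replace every ESI include with the fragment returned by fetch(src).

    Using a function as the replacement means the fragment is inserted
    literally, with no backslash or group-reference interpretation."""
    return ESI_INCLUDE.sub(lambda m: fetch(m.group(1)), html)
```

If ESI processing is skipped for smartphone requests, the tag reaches the browser unchanged, and client-side JS could fetch the same src and swap the result in; that is what would let both device types share one cached HTML body.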

> Are there carrier-specific things that would result in different HTML for
> devices that do not support JS, or can you get away with providing the same
> non-js experience for Zero as MobileFrontend (aside from the
> banner, presumably handled by ESI)? If not currently, do you think it's
> feasible to do that (eg make carrier-variable links get handled via special
> pages so we can always rely on the same URIs)? Again, it would be nice if
> we could just rely on the same HTML to further reduce cache variance. It
> would be cool if MobileFrontend and Zero shared buckets and they were
> limited to:
>
> * HTML + images
> * HTML - images
> * WAP

That would be nice.
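The three shared buckets can be sketched as a cache key that varies only on the bucket. This is a hypothetical Python illustration: it assumes WAP detection and the carrier's image policy are already known as booleans, and that requests to <lang>.m and <lang>.zero for the same language are normalized to one language code so they can share objects.

```python
from enum import Enum


class Bucket(Enum):
    """The three shared cache buckets proposed in the thread."""
    HTML_WITH_IMAGES = "html+images"
    HTML_WITHOUT_IMAGES = "html-images"
    WAP = "wap"


def cache_bucket(is_wap: bool, images_allowed: bool) -> Bucket:
    """Pick the bucket from two (assumed) request properties."""
    if is_wap:
        return Bucket.WAP
    if images_allowed:
        return Bucket.HTML_WITH_IMAGES
    return Bucket.HTML_WITHOUT_IMAGES


def cache_key(lang: str, path: str, bucket: Bucket) -> str:
    """Hypothetical cache key: language + path + bucket only, so carrier,
    device model and m/zero host do not fragment the cache further."""
    return f"{lang}:{path}#{bucket.value}"
```

Two carriers with the same image policy then map to the same object: cache_key("en", "/wiki/Cat", cache_bucket(False, True)) is "en:/wiki/Cat#html+images" for both.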

> Since we improved MobileFrontend to no longer vary the cache on X-Device,
> I've been surprised to not see a significant increase in our cache hit
> ratio (which warrants further investigation but that's another email). Are
> there ways we can do a deeper analysis of the state of the varnish cache to
> determine just how fragmented it is, why, and how much of a problem it
> actually is? I believe I've asked this before and was met with a response
> of 'not really' - but maybe things have changed now, or others on this list
> have different insight. I think we've mostly approached the issue with a
> lot more assumption than informed analysis, and if possible I think it
> would be good to change that.

Yeah, we should look into that. We've already flagged a few possible
culprits, and we're also working on the migration of the desktop wiki
cluster from Squid to Varnish, which has some of the same issues with
variance (sessions, XVO, cookies, Accept-Language...) as
MobileFrontend does. After we've finished migrating that and confirmed
that it's working well, we want to unify the two clusters'
configurations a bit more, which by itself should give us more
opportunities to compare strategies.
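As a starting point for the fragmentation analysis Arthur asks about, one could count how many distinct variants the cache holds per URL. A minimal Python sketch, assuming a sample of (URL, variant key) pairs can be extracted from cache logs; the variant key is a stand-in for whatever varies the object beyond the URL (sessions, cookies, Accept-Language, ...), and the input format is an assumption.

```python
from collections import defaultdict
from typing import Iterable


def variants_per_url(samples: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count the distinct variant keys seen for each URL."""
    seen: dict[str, set[str]] = defaultdict(set)
    for url, variant_key in samples:
        seen[url].add(variant_key)
    return {url: len(keys) for url, keys in seen.items()}


def fragmentation_report(samples: Iterable[tuple[str, str]], top: int = 3) -> dict:
    """Summarize how fragmented the sampled cache is."""
    counts = variants_per_url(samples)
    if not counts:
        return {"urls": 0, "mean_variants": 0.0, "worst": []}
    # Most-fragmented URLs first; ties broken alphabetically for stable output.
    worst = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return {
        "urls": len(counts),
        "mean_variants": sum(counts.values()) / len(counts),
        "worst": worst,
    }
```

A mean close to 1 would suggest variance is not the main problem; a long tail of URLs with many variants would point at whichever header or cookie produces them.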

We've since also figured out that the way we've been calculating cache
efficiency with Varnish is not ideal: unlike with Squid, cache purges
are sent to Varnish as HTTP requests, so their cache lookups are
counted in the cache hit rate, which isn't very helpful. To make things
worse, the few hundred purges per second weigh much more heavily
against actual client traffic on the mobile cluster (which has much
less traffic but a large content set) than on our other clusters. So
until we can factor purges out of the Varnish counters (which might be
possible in Varnish 4.0), we'll have to rely on other metrics.
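The purge distortion is easy to see with a little arithmetic. The Python sketch below assumes purge lookups could be counted separately as hits and misses (the stock counters do not separate them, which is the problem described above) and compares the raw hit rate with the rate over client lookups only. The counter names and the numbers in the example are illustrative, not real measurements.

```python
from dataclasses import dataclass


@dataclass
class LookupCounters:
    hits: int          # all lookups that hit, purge lookups included
    misses: int        # all lookups that missed, purge lookups included
    purge_hits: int    # hypothetical: purge lookups that found an object
    purge_misses: int  # hypothetical: purge lookups that found nothing


def raw_hit_rate(c: LookupCounters) -> float:
    """What a naive hits / lookups ratio reports."""
    return c.hits / (c.hits + c.misses)


def client_hit_rate(c: LookupCounters) -> float:
    """Hit rate over client lookups only, with purge lookups subtracted."""
    client_hits = c.hits - c.purge_hits
    client_lookups = (c.hits + c.misses) - (c.purge_hits + c.purge_misses)
    return client_hits / client_lookups
```

With LookupCounters(hits=600, misses=400, purge_hits=50, purge_misses=250), 300 purge lookups out of 1000 pull the reported rate down from about 79% (550/700) to 60%; the smaller the client traffic relative to purges, the larger the gap.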

A more useful metric is therefore the number of actual backend fetches
("backend_req"), which appears to have gone down somewhat. Annoyingly,
every time we restart a Varnish instance we get a spike in the Ganglia
graphs, making the long-term graphs pretty much unusable. To fix that
we'll either need to patch Ganglia itself or move to some other stats
engine (statsd?). So we have a bit of work to do there on the Ops
front.
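Restart spikes like these typically come from a monotonically increasing counter such as backend_req dropping back to zero when the instance restarts. A common workaround, sketched here in Python as an illustration rather than what Ganglia or statsd actually do, is to treat any decrease between two samples as a restart and count from zero.

```python
def counter_rates(samples: list[tuple[float, int]]) -> list[float]:
    """Per-second rates between consecutive (timestamp, counter) samples,
    treating any decrease as a restart that began counting from zero."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta < 0:
            # Counter went backwards: assume the instance restarted, so
            # everything counted since then is just the new value.
            delta = v1
        rates.append(delta / (t1 - t0))
    return rates
```

For samples [(0, 1000), (60, 1600), (120, 300), (180, 900)] this yields [10.0, 5.0, 10.0]. A naive difference would report about -21.7 requests per second at the restart, or, if the tool assumes the counter wrapped around, a huge positive value; either outlier is enough to flatten a long-term graph.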

Note that we're about to replace all Varnish caches in eqiad with
fewer, newer, much bigger boxes, and we've decided to also upgrade the
4 mobile boxes to the same specs. We're doing the same in our new West
Coast caching data center and in esams. This will increase the mobile
cache size a lot, and will hopefully help by throwing resources at the
problem.

--
Mark Bergsma <mark at wikimedia.org>
Lead Operations Architect
Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l