
List:       cassandra-user
Subject:    Re: Poor performance; PHP & Thrift to blame
From:       Julian Simon <jsimon () jules ! com ! au>
Date:       2010-03-30 21:27:42
Message-ID: dfe1a2941003301427m44e29695t347674bd82cd0900 () mail ! gmail ! com

Well, the app is written in PHP, and in order to use Cassandra for the
(small) aspect of the app which could make use of its benefits, the
client code will need to be in PHP and run fairly speedily.

Hence my testing with PHP.

I suppose another question for me is: Are there any alternative
interfaces to Cassandra that don't involve the Thrift layer?
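
For anyone following along, here is a minimal sketch (in Python rather
than PHP, purely for illustration) of the kind of translation work
described in the quoted message below. The Thrift binary protocol sends
an i64 as eight big-endian bytes, and a runtime without native 64-bit
integers has to rebuild the value from two 32-bit halves. The function
names are hypothetical and this is not the Thrift library's own code.

```python
import struct


def encode_i64(value: int) -> bytes:
    """Pack a signed 64-bit integer as eight big-endian bytes."""
    return struct.pack(">q", value)


def decode_i64_native(data: bytes) -> int:
    """Decode in one step, as a runtime with native 64-bit integers can."""
    return struct.unpack(">q", data)[0]


def decode_i64_split(data: bytes) -> int:
    """Decode from two unsigned 32-bit words, as a 32-bit runtime must."""
    hi, lo = struct.unpack(">2I", data)
    value = (hi << 32) | lo
    # A set top bit means the number is negative in two's complement.
    if hi & 0x80000000:
        value -= 1 << 64
    return value
```

Both decoders return the same value for any 64-bit input; the split
version simply does more work per value, and that work is repeated for
every i64 field (such as a column timestamp) in a large slice.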



On Tue, Mar 30, 2010 at 11:15 PM, David Timothy Strauss
<david@fourkitchens.com> wrote:
> This sounds like the sort of analysis that shouldn't be done in PHP.
> Have you tried Hadoop + Cassandra 0.6?
>
> -----Original Message-----
> From: Julian Simon <jsimon@jules.com.au>
> Date: Tue, 30 Mar 2010 22:21:22
> To: <user@cassandra.apache.org>
> Subject: Re: Poor performance; PHP & Thrift to blame
> 
> Yes I tested it with and without APC - it had a negligible impact on
> performance.
> 
> This didn't surprise me - most of the optimization that APC offers is
> in the parsing of PHP code; seeing as the benchmark is a single PHP
> process the code parsing overhead occurs outside the benchmark loop.
> 
> Does anyone have any benchmarks for larger Cassandra queries from PHP
> similar to what I'm trying to do?  The performance bottlenecks don't
> show up on 1,5,10, or even 100 column query sets - only for larger
> sets or query loops.
> 
> Anyone doing time series analysis?  This is the sort of use case where
> I'd expect to see much larger query sets.
> 
> I suppose Facebook and Digg are only pulling out small column sets, so
> they wouldn't necessarily notice this issue.
> 
> 
> 
> On Tue, Mar 30, 2010 at 8:00 PM, David Timothy Strauss
> <david@fourkitchens.com> wrote:
> > Without APC, there should be even more of an improvement with the
> > Thrift PHP extension.
> >
> > ----- "Rauan Maemirov" <rauan@maemirov.com> wrote:
> > 
> > > What about APC? Did you turn it on?
> > > 
> > > 2010/3/30 Julian Simon <jsimon@jules.com.au>:
> > > > Hi,
> > > > 
> > > > I've been trying to benchmark Cassandra for our use case and have been
> > > > seeing poor performance on both writes and (extremely) poor
> > > > performance on reads.
> > > > 
> > > > Using Cassandra 0.5.1 stable & thrift-0.2.0.
> > > > 
> > > > It turns out all the CPU time is going to the PHP client process - the
> > > > JVM operating the Cassandra server isn't breaking much of a sweat.
> > > > 
> > > > For reads the latency is often up to 1 second to fetch a row
> > > > containing ~2000 columns, or around 300ms to fetch a 500-column wide
> > > > row.  This is with get_slice(), and a predicate specifying the start &
> > > > finish range.
> > > > 
> > > > Using cachegrind and inspecting the code inside the Thrift bindings
> > > > makes it pretty clear why the performance is so bad, particularly on
> > > > reads. The biggest culprit is the translation code which casts data
> > > > back and forth into binary representations for sending over the wire
> > > > to the Cassandra server.
> > > > 
> > > > There seems to be some 32-bit specific code which iterates heavily,
> > > > apparently due to a limitation in PHP's implementation of LONGs.
> > > > 
> > > > However, testing on a 64-bit host doesn't yield any performance
> > > > improvement.
> > > > 
> > > > More surprisingly, if I compile and enable the PHP native thrift
> > > > bindings (following this guide
> > > > https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
> > > > read performance actually degrades by another 50%.  I have verified
> > > > that the Thrift code is recognizing and using the native PHP functions
> > > > provided by the library.
> > > > 
> > > > I've tested all of this on both 32-bit and 64-bit installations of
> > > > both PHP 5.1 & 5.2.  Results are the same in all cases.
> > > > 
> > > > My environment is on vanilla CentOS 5.4 server installations inside
> > > > VMWare on a 4 core 64bit host with plenty of RAM and fast disks.
> > > > 
> > > > Has anyone been able to produce decent performance with PHP &
> > > > Cassandra?  If so, how have you done it?
> > > > 
> > > > Thanks,
> > > > Jules
> > > > 
> > 
> > --
> > David Strauss
> >   david@fourkitchens.com
> >   +1 512 577 5827 [mobile]
> > Four Kitchens
> >   http://fourkitchens.com
> >   +1 512 454 6659 [office]
> >   +1 512 870 8453 [direct]
> > 
> 

