
List:       pypy-dev
Subject:    Re: [pypy-dev] Great experience with PyPy
From:       Maciej Fijalkowski <fijall () gmail ! com>
Date:       2013-02-07 12:00:27
Message-ID: CAK5idxSgXMffeR2+0PZ=GihEfQKJ2opKnRKXk7--_Oo0bf7A_w () mail ! gmail ! com

On Thu, Feb 7, 2013 at 1:55 PM, Marko Tasic <mtasic85@gmail.com> wrote:
> Hi,
>
> I would like to share a short story with you about what we have
> accomplished with PyPy and its friends so far.
>
> The company I have worked for over the last 7 months (intentionally
> unnamed) gave me complete freedom to pick the technologies on which we
> based our solution. What we do is: crawl for PDFs and newspaper
> articles, download them, translate them if needed, OCR them if needed,
> do extensive analysis of the downloaded PDFs and articles, store them
> in more organized structures for faster querying, search them, and
> generate a bunch of complex reports.
>
> From the very beginning I decided to go with PyPy no matter what. What
> we picked is the following:
> * Flask as the web framework, plus a few of its extensions such as
> Flask-Login, Flask-Principal, Flask-WTF, Flask-Mail, etc.
> * Cassandra as the database, because of its features and great
> experience with it. PyCassa is used as the client to talk to the
> Cassandra server.
> * ElasticSearch as the distributed search engine, with its client
> library pyes.
> * Whoosh as a search engine, with some modifications to support
> Cassandra as storage and distributed locking.
> * Redis, with its client library redis-py, for caching and to speed up
> common auto-completion patterns.
> * ZooKeeper, with its client library Kazoo, for distributed locking,
> which plays an essential role in the system by providing
> transaction-like behavior across many services at once.
> * Celery in conjunction with RabbitMQ for task distribution.
> * Sentry for error logging.
>
> What we have developed on our own are wrappers and clients for:
> * Moses, a machine translation system
> * Tesseract, an OCR engine
> * a Cassandra store for Whoosh
> * wkhtmltopdf and wkhtmltoimage, which convert HTML to PDF and images
> * etc.
>
> Now that the product is finished and in the final testing phase, I can
> say that we do not regret using PyPy and the stack around it. The
> typical speed improvement is 2x-3x over CPython in our case, although
> we are mostly IO and memory bound, except for the Celery workers, where
> we do analysis consisting of many small CPU-intensive tasks exchanged
> via RabbitMQ. Another reason we don't see a larger speedup is that we
> depend on external software (servers) written in Erlang and Java.
>
> I'm already planning to write Python ports of Cassandra (as a
> distributed key/value-only database without index features),
> ZooKeeper, Redis and ElasticSearch for upcoming projects, and hopefully
> open-source them.
>
> Regards,
> Marko Tasic
> _______________________________________________
> pypy-dev mailing list
> pypy-dev@python.org
> http://mail.python.org/mailman/listinfo/pypy-dev

Awesome!

I'm glad people can make PyPy work for non-trivial tasks that require
a lot of dependencies. We're trying to lower the bar; however, it takes
time.

Cheers,
fijal