
List:       ruby-talk
Subject:    Re: rb_str_new with custom subclass of rb_cString
From:       Samuel Williams <space.ship.traveller () gmail ! com>
Date:       2017-03-20 4:27:12
Message-ID: CAHkN8V8oH2=My2_P3FLLNijdC+=q8MeGsxv+1zT4iCVyfq41CA () mail ! gmail ! com

> Anyways, why do you need to subclass String?  You seem to care
> about performance, and you may lose a lot of it when subclassing.

I have quite a few benchmarks showing that my code gains significant
performance, but it's still a work in progress. I'm not claiming it's
good or the right way to do it; I'm just looking at how I can improve
real-world performance for our use case.

I'm tracking whether strings need to be escaped on output (e.g. with
something similar to CGI.escape_html). Tracking this is almost
zero-cost during parsing of the source markup, so I can often avoid
escaping entirely. It's a big performance win: page render time
improves by 50%. The marking has to be low-cost, though; instantiating
a custom subclass of String blows away any performance gains.

You can see a bit more how it works here:
https://github.com/ioquatix/trenni

So, when parsing Markup (approximately HTML), we track if any entities
were seen:
https://github.com/ioquatix/trenni/blob/342e7fe50bf35a937437e1ce706d72af808ac2b6/ext/trenni/markup.rl#L72


If no entities are seen, we don't need to escape the string on
output, so we mark it with Trenni_markup_safe:
https://github.com/ioquatix/trenni/blob/342e7fe50bf35a937437e1ce706d72af808ac2b6/ext/trenni/markup.rl#L155
https://github.com/ioquatix/trenni/blob/342e7fe50bf35a937437e1ce706d72af808ac2b6/ext/trenni/trenni.h#L71-L78
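
As a rough pure-Ruby sketch of the marking idea (the names SafeString,
ENTITIES and decode_text are made up for illustration and are not
Trenni's API; the real code does this in C inside the parser, and the
entity table here is deliberately tiny):

```ruby
# Hypothetical marker class: a String known to need no escaping on output.
class SafeString < String; end

# Assumed, minimal entity table; a real parser handles many more.
ENTITIES = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"' }.freeze

# Decode one run of text taken from the markup source.
def decode_text(raw)
  if raw.include?('&')
    # Entities were seen: the decoded text may contain special characters,
    # so return a plain String that must be escaped again on output.
    raw.gsub(/&(?:amp|lt|gt|quot);/, ENTITIES)
  else
    # No entities seen: the text is already safe to emit verbatim.
    SafeString.new(raw)
  end
end
```

Note that SafeString.new allocates a subclass instance on every call,
which is exactly the cost this thread is about; the sketch only shows
the decision, not a fast way to make it.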


On output, we use this information, e.g. when generating tags:
https://github.com/ioquatix/trenni/blob/342e7fe50bf35a937437e1ce706d72af808ac2b6/ext/trenni/tag.c#L52-L53
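
In Ruby terms, the output side could look roughly like this. Again,
this is only a sketch under assumptions: SafeString, ESCAPES,
escape_for_output and tag_attribute are hypothetical names, and the
escape table is a simplified stand-in for CGI.escape_html:

```ruby
# Hypothetical marker class: a String known to need no escaping on output.
class SafeString < String; end

# Assumed, simplified escape table for illustration.
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape a value for output unless it was marked safe at parse time.
def escape_for_output(value)
  return value if value.is_a?(SafeString) # fast path: no scan, no copy
  value.to_s.gsub(/[&<>"]/, ESCAPES)
end

# Build a single attribute for a generated tag.
def tag_attribute(name, value)
  %(#{name}="#{escape_for_output(value)}")
end
```

The point of the fast path is that a marked string is returned
untouched, so the cost of escaping is only paid for strings that might
actually contain special characters.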


Additionally, I found that CGI.escape_html is a bit slow; it can be
made about 20% faster. I haven't finished my implementation, since I'm
still exploring the various available optimisations. But the tangible
result is that render time for a complex page on my laptop went from
about 15ms down to 6-7ms with native C code and avoiding escape_html
where possible.

Thanks for your interest.


On 20 March 2017 at 13:27, Eric Wong <e@80x24.org> wrote:
> Samuel Williams <space.ship.traveller@gmail.com> wrote:
> > Doing
> > 
> > ```
> > class MyStr < String; end
> > MyStr.new('Hello, World!')
> > ```
> > 
> > makes this hot code path too inefficient.
> > 
> > I found I can use rb_obj_reveal to change the class; it worked
> > perfectly and performance was maintained. I'm not sure if this is
> > a good idea: according to the docs it's okay, but the source code
> > says not to use it.
> 
> rb_obj_reveal is intended for hidden objects (klass == 0);
> I don't think ruby-core can guarantee it will remain working
> for changing non-hidden objects.
> 
> Anyways, why do you need to subclass String?  You seem to care
> about performance, and you may lose a lot of it when subclassing.
> 
> Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
> <http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>
