
List:       avro-dev
Subject:    [jira] [Created] (AVRO-946) GenericData.resolveUnion() performance improvement
From:       "Hernan Otero (Created) (JIRA)" <jira () apache ! org>
Date:       2011-10-27 21:30:32
Message-ID: 195985141.27168.1319751032275.JavaMail.tomcat () hel ! zones ! apache ! org

GenericData.resolveUnion() performance improvement
--------------------------------------------------

                 Key: AVRO-946
                 URL: https://issues.apache.org/jira/browse/AVRO-946
             Project: Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.6.0
            Reporter: Hernan Otero


Due to the sequential nature of today's implementation of GenericData.resolveUnion()
(used when serializing an object):

{code}
  public int resolveUnion(Schema union, Object datum) {
    int i = 0;
    for (Schema type : union.getTypes()) {
      if (instanceOf(type, datum))
        return i;
      i++;
    }
    throw new UnresolvedUnionException(union, datum);
  }
{code}

it showed up when we were doing some serialization performance analysis.  A simple
optimization is to keep a map from each branch schema to its index within the
UnionSchema object (in fact, this could be a perfect hash map, since the keys that
can appear in the map are known in advance).  The optimization is most noticeable
when a union within the schema contains many types (more than 40 in some of our use
cases).  In this scenario, we observed a 25% improvement by using an identity hash
map.
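
To make the idea concrete, here is a minimal, self-contained sketch of replacing the sequential scan with an identity-map lookup. It is not the Avro code or the forthcoming patch: BranchType is a hypothetical stand-in for Schema, UnionIndex is a hypothetical helper, and the sketch assumes the datum can hand back the same schema instance that appears in the union's type list.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an Avro Schema; only object identity matters here. */
final class BranchType {
    final String name;

    BranchType(String name) {
        this.name = name;
    }
}

/** Hypothetical helper holding a precomputed branch index for one union. */
final class UnionIndex {
    private final List<BranchType> types;
    private final Map<BranchType, Integer> indexByIdentity = new IdentityHashMap<>();

    UnionIndex(List<BranchType> types) {
        this.types = types;
        // Built once per union, so each later lookup avoids walking the list.
        for (int i = 0; i < types.size(); i++) {
            indexByIdentity.put(types.get(i), i);
        }
    }

    /** Sequential lookup: cost grows with the number of branches. */
    int linearIndexOf(BranchType type) {
        for (int i = 0; i < types.size(); i++) {
            if (types.get(i) == type) {
                return i;
            }
        }
        return -1;
    }

    /** Identity-map lookup: a single hash probe; -1 means "not a known branch". */
    int indexOf(BranchType type) {
        Integer index = indexByIdentity.get(type);
        return index == null ? -1 : index;
    }
}
```

A return value of -1 would let a caller fall back to the existing sequential instanceOf() scan, for example when the datum carries an equal but not identical schema instance.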

Even though an identity map provides a significant boost, we observed a further
improvement (an extra 15% on top of that in some cases) by using a perfect hash map
keyed on the schema names, which also removes some of the restrictions of relying on
object identity.  Unfortunately, we cannot contribute this implementation at this
point, but we thought it would be a good idea to allow users to provide alternative
implementations of the indexing behavior, for example by adding the following static
method to Schema:

{code}
public static void setUnionTypeIndexCacheFactory(UnionIndexCacheFactory factory)
{
  unionIndexCacheFactory = factory;
}
{code}

This is what the interface and an identity hash map-based implementation would look
like:

{code}
  /**
   * A factory interface for creating UnionIndexCache instances.
   */
  public static interface UnionIndexCacheFactory
  {
      UnionIndexCache createUnionIndexCache(List<Schema> types);

      /**
       * Used for caching schema indices within a union.
       */
      public static interface UnionIndexCache
      {
          void setTypeIndex(Schema schema, int index);

          int getTypeIndex(Schema schema);
      }

  }

  private static class IdentityMapUnionIndexCacheFactory implements UnionIndexCacheFactory
  {
      @Override
      public UnionIndexCache createUnionIndexCache(List<Schema> types)
      {
          return new UnionIndexCache()
          {
              private final IdentityHashMap<Schema, Integer> schemaToIndex =
                  new IdentityHashMap<Schema, Integer>();

              @Override
              public void setTypeIndex(Schema schema, int index)
              {
                  schemaToIndex.put(schema, index);
              }

              @Override
              public int getTypeIndex(Schema schema)
              {
                  Integer index = schemaToIndex.get(schema);
                  return index == null ? -1 : index;
              }
          };
      }
  }
{code}
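
For comparison, here is a rough sketch of what a name-keyed alternative plugged into the same factory shape might look like. It uses an ordinary HashMap rather than the perfect hash map mentioned above (that implementation is not available), and it is written against hypothetical stand-ins so that it compiles on its own: NamedSchema stands in for Schema, and NameIndexCache / NameIndexCacheFactory mirror the proposed UnionIndexCache / UnionIndexCacheFactory interfaces. It assumes every branch of the union has a distinct full name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for Schema; only the full name is modeled. */
final class NamedSchema {
    private final String fullName;

    NamedSchema(String fullName) {
        this.fullName = fullName;
    }

    String getFullName() {
        return fullName;
    }
}

/** Mirrors the shape of the proposed UnionIndexCache, over the stand-in type. */
interface NameIndexCache {
    void setTypeIndex(NamedSchema schema, int index);

    int getTypeIndex(NamedSchema schema);
}

/** Mirrors the shape of the proposed UnionIndexCacheFactory. */
interface NameIndexCacheFactory {
    NameIndexCache createUnionIndexCache(List<NamedSchema> types);
}

/** Name-keyed variant: equal names resolve to the same index, whatever the instance. */
final class NameKeyedCacheFactory implements NameIndexCacheFactory {
    @Override
    public NameIndexCache createUnionIndexCache(List<NamedSchema> types) {
        // Sized from the branch count; the caller fills it in via setTypeIndex().
        final Map<String, Integer> nameToIndex = new HashMap<>(types.size() * 2);
        return new NameIndexCache() {
            @Override
            public void setTypeIndex(NamedSchema schema, int index) {
                nameToIndex.put(schema.getFullName(), index);
            }

            @Override
            public int getTypeIndex(NamedSchema schema) {
                Integer index = nameToIndex.get(schema.getFullName());
                return index == null ? -1 : index;
            }
        };
    }
}
```

Keying on names means a schema instance that is equal but not identical to the one stored in the union still resolves, which is the restriction of the identity-based cache that this variant relaxes.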

I will attach a patch later today or early tomorrow.

Thanks in advance,

Hernan Otero


        

