List:       avro-user
Subject:    Re: Nested schema issue (with "munged" invalid schema)
From:       Nick Palmer <palmer () cs ! vu ! nl>
Date:       2012-05-30 21:14:28
Message-ID: 32F0970E-BFE7-40C4-BDD2-A172CD894B3A () cs ! vu ! nl

You cannot define the same named type twice within one schema, so you need to
change your "munge" step to define the type in full at its first use and refer
to it by its full name afterwards, producing the following:

{
    "name": "address2",
    "type": "record",
    "namespace" : "some.domain",
    "fields" : 
    [
        {
            "name": "street", 
            "type": "string"
        },
        {
            "name": "city", 
            "type": "string"
        },
        {
            "name": "position1",
            "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
        },
        {
            "name": "position2",
            "type": "some.domain.location"
        }
    ]
}
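
A minimal sketch of such a "munge" step, in Python and working on the plain
JSON rather than through the Avro library. The names munge, full_name and
registry are hypothetical, and the sketch assumes that every reference uses
the full (namespace-qualified) name and that registry maps full names to
already-loaded schema JSON:

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(schema):
    """Return the namespace-qualified name of a named schema (a dict)."""
    name = schema["name"]
    namespace = schema.get("namespace")
    if "." in name or not namespace:
        return name
    return namespace + "." + name


def munge(schema, registry, defined=None):
    """Inline each registry type at its first use; keep later uses as names."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema in PRIMITIVES or schema in defined or schema not in registry:
            return schema  # primitive, already defined, or unknown: keep the name
        return munge(registry[schema], registry, defined)  # first use: inline it
    if isinstance(schema, list):  # a union: munge every branch
        return [munge(branch, registry, defined) for branch in schema]
    out = dict(schema)  # shallow copy so the input is not modified
    kind = schema.get("type")
    if kind in ("record", "error", "enum", "fixed"):
        defined.add(full_name(schema))  # record the definition before descending
    if kind in ("record", "error"):
        out["fields"] = [
            dict(field, type=munge(field["type"], registry, defined))
            for field in schema["fields"]
        ]
    elif kind == "array":
        out["items"] = munge(schema["items"], registry, defined)
    elif kind == "map":
        out["values"] = munge(schema["values"], registry, defined)
    return out


location = {
    "name": "location", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "latitude", "type": "float"},
               {"name": "longitude", "type": "float"}],
}
address = {
    "name": "address", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "street", "type": "string"},
               {"name": "city", "type": "string"},
               {"name": "position1", "type": "some.domain.location"},
               {"name": "position2", "type": "some.domain.location"}],
}

munged = munge(address, {full_name(location): location})
print(json.dumps(munged, indent=2))
```

Here position1 receives the full location definition and position2 keeps the
name "some.domain.location". References by short name (relative to the
enclosing namespace) are not handled in this sketch.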

~ Nick

On May 1, 2012, at 6:55 PM, Peter Cameron wrote:

> I'm having a problem with nesting schemas. A very brief overview of why we're
> using Avro (successfully so far):
> o code generation not required 
> o small binary format 
> o dynamic use of schemas at runtime 
> 
> We're doing a flavour of RPC. We're not using Avro's IDL and its flavour of
> RPC because the endpoint is not necessarily a Java platform (C# and Java for
> our purposes), and only the Java implementation of Avro has RPC. Hence no
> Avro RPC for us.
>
> I'm aware that Avro doesn't import nested schemas out of the box. We need
> that functionality because we're exposed to schemas over which we have no
> control and, in the interests of maintainability, these schemas are nicely
> partitioned and are referenced as types from within other schemas. So, for
> example, an address schema refers to a some.domain.location object by having
> a field of type "some.domain.location". Note that our runtime has no
> knowledge of any some.domain package (e.g. address or location objects);
> only the endpoints know about some.domain. (A layer at our endpoint runtime
> serialises any unknown, i.e. non-primitive, objects as bytestreams.)
>
> I implemented a schema cache which intelligently imports schemas on the fly,
> so adding the address schema to the cache automatically adds the location
> schema that it refers to. The cache uses Avro to parse an added schema,
> catches parse exceptions, looks at the exception message to see whether the
> error is due to a missing or undefined type, and if so goes off to import the
> needed schema. Brittle, I know, but there is no other way for us. We need
> this functionality, and nothing else comes close to Avro.
>
> So far so good, until today when I hit a corner case. 
> 
> Say I have an address object that has two fields, called position1 and
> position2. If position1 and position2 are both of the same non-primitive
> type, then the address schema doesn't parse, so presumably it is an invalid
> Avro schema. The error concerns redefining the location type. Here's the
> example:
>
> location schema 
> ============== 
> 
> { 
> "name": "location", 
> "type": "record", 
> "namespace" : "some.domain", 
> "fields" : 
> [ 
> { 
> "name": "latitude", 
> "type": "float" 
> }, 
> { 
> "name": "longitude", 
> "type": "float" 
> } 
> ] 
> } 
> 
> address schema 
> ============== 
> 
> { 
> "name": "address", 
> "type": "record", 
> "namespace" : "some.domain", 
> "fields" : 
> [ 
> { 
> "name": "street", 
> "type": "string" 
> }, 
> { 
> "name": "city", 
> "type": "string" 
> }, 
> { 
> "name": "position1", 
> "type": "some.domain.location" 
> }, 
> { 
> "name": "position2", 
> "type": "some.domain.location" 
> } 
> ] 
> } 
> 
> 
> Now, an answer of having a list of positions as a field is not an answer for
> us, as we need to solve the general issue of a schema with more than one
> instance of the same nested type, i.e. my problem is not specific to an
> address or location schema.
>
> The problematic schema constructed by my schema cache is:
> 
> {
> "name": "address2",
> "type": "record",
> "namespace" : "some.domain",
> "fields" : 
> [
> {
> "name": "street", 
> "type": "string"
> },
> {
> "name": "city", 
> "type": "string"
> },
> {
> "name": "position1",
> "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
> },
> {
> "name": "position2",
> "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
> }
> ]
> }
> 
> 
> Can this be done? This is potentially a blocker for us. 
> 
> cheers, 
> Peter 
> 
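
The import-on-the-fly cache described in the quoted message finds missing
types by reading parser exception messages. As a rough alternative sketch,
the dependencies could be collected by walking the schema JSON before it is
parsed. This is Python on plain JSON, not the Avro API; referenced_names,
load_with_dependencies and the fetch callback are hypothetical names, and the
sketch assumes one named type per schema source and references by full name:

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def referenced_names(schema, found=None):
    """Collect every non-primitive type name that a schema refers to by string."""
    if found is None:
        found = set()
    if isinstance(schema, str):
        if schema not in PRIMITIVES:
            found.add(schema)
    elif isinstance(schema, list):  # a union
        for branch in schema:
            referenced_names(branch, found)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in ("record", "error"):
            for field in schema["fields"]:
                referenced_names(field["type"], found)
        elif kind == "array":
            referenced_names(schema["items"], found)
        elif kind == "map":
            referenced_names(schema["values"], found)
        elif kind not in ("enum", "fixed"):
            referenced_names(kind, found)  # e.g. {"type": "some.domain.location"}
    return found


def load_with_dependencies(name, fetch, registry=None):
    """Load `name` and, recursively, every type it references.

    `fetch` is a caller-supplied function mapping a full type name to its
    schema JSON text (an assumption of this sketch).
    """
    if registry is None:
        registry = {}
    if name in registry:
        return registry  # already loaded; also stops reference cycles
    schema = json.loads(fetch(name))
    registry[name] = schema
    for dependency in referenced_names(schema):
        load_with_dependencies(dependency, fetch, registry)
    return registry


sources = {
    "some.domain.location": json.dumps({
        "name": "location", "type": "record", "namespace": "some.domain",
        "fields": [{"name": "latitude", "type": "float"},
                   {"name": "longitude", "type": "float"}]}),
    "some.domain.address": json.dumps({
        "name": "address", "type": "record", "namespace": "some.domain",
        "fields": [{"name": "street", "type": "string"},
                   {"name": "position1", "type": "some.domain.location"},
                   {"name": "position2", "type": "some.domain.location"}]}),
}

registry = load_with_dependencies("some.domain.address", sources.__getitem__)
print(sorted(registry))
```

The resulting registry holds each schema once, keyed by full name, so a later
step can decide where to inline a definition and where to refer to it by name.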