
List:       xmlrpc-dev
Subject:    [jira] [Updated] (XMLSCHEMA-65) Make XmlSchema return schemas in a collection in a predictable order
From:       Bjørn_Mølgård_Vester_(Jira) <jira () apache ! org>
Date:       2023-09-04 15:58:20
Message-ID: JIRA.13549581.1693840826000.53053.1693843200054 () Atlassian ! JIRA


     [ https://issues.apache.org/jira/browse/XMLSCHEMA-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bjørn Mølgård Vester updated XMLSCHEMA-65:
------------------------------------------
    Description: 
When XmlSchema returns schemas in a collection, they are in an unpredictable order \
that depends on the hash code of, among other things, their absolute path on the \
file system. This is a problem for reproducible builds, as the result will differ \
depending on where the schemas are stored.

For example, I use CXF's "wsdl2java" tool to generate Java classes from my schemas. \
The schemas are in a source control system, which is checked out by a CI server in \
different folders depending on the branch name among other things.

CXF keeps schemas in an XmlSchemaCollection instance and asks for all schemas during \
type generation. It then uses these to generate an "ObjectFactory" class that \
contains elements from these schemas. As the schemas are returned in a different \
order depending on the checkout folder path, the generated source code will contain \
the elements in a different order from build to build.

This in turn invalidates our build cache, which fingerprints the schema files as \
inputs and the generated code as output. The cache misses then propagate to later \
stages of the build pipeline and lead to long build times.

While this could be fixed in CXF by sorting the schemas first, or by not generating a \
systemId that includes the full path, I think it would be beneficial to make \
XmlSchemaCollection return schemas in a predictable order for all clients, not just \
CXF. The order could be the order in which the schemas were added (which requires \
the client to add them in a predictable order).
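
To illustrate the client-side workaround mentioned above (sorting the schemas first), here is a minimal sketch in plain Java. It does not use the XmlSchema or CXF APIs: SchemaRef, fileName and stableOrder are hypothetical names invented for this example, and the choice of sort key (target namespace, then file name without its directory) is an assumption about what stays constant across checkout folders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StableSchemaSort {

    /** Hypothetical stand-in for a loaded schema: only the fields needed for sorting. */
    static final class SchemaRef {
        final String targetNamespace; // may be null for a no-namespace schema
        final String systemId;        // may contain the absolute checkout path

        SchemaRef(String targetNamespace, String systemId) {
            this.targetNamespace = targetNamespace;
            this.systemId = systemId;
        }
    }

    /** Returns the last path segment, so the checkout folder does not influence the order. */
    static String fileName(String systemId) {
        int slash = systemId.lastIndexOf('/');
        return slash < 0 ? systemId : systemId.substring(slash + 1);
    }

    /** Returns "namespace|fileName" labels sorted by a key that ignores the directory. */
    static List<String> stableOrder(List<SchemaRef> schemas) {
        List<SchemaRef> copy = new ArrayList<>(schemas);
        copy.sort(Comparator
                .comparing((SchemaRef s) -> s.targetNamespace,
                        Comparator.nullsFirst(Comparator.<String>naturalOrder()))
                .thenComparing(s -> fileName(s.systemId)));
        List<String> labels = new ArrayList<>();
        for (SchemaRef s : copy) {
            labels.add(s.targetNamespace + "|" + fileName(s.systemId));
        }
        return labels;
    }

    public static void main(String[] args) {
        List<SchemaRef> schemas = new ArrayList<>();
        schemas.add(new SchemaRef("urn:example:order", "file:/ci/main/order.xsd"));
        schemas.add(new SchemaRef("urn:example:customer", "file:/ci/main/customer.xsd"));
        System.out.println(stableOrder(schemas));
    }
}
```

Such a sort only helps the one client that applies it, and it only gives a total order if the chosen key is unique per schema, which is part of why a fix in the collection itself seems preferable.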

Technically, the XmlSchemaCollection class keeps schemas in a HashMap where the key \
is a SchemaKey, containing a "systemId" field that includes the full file path.
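
The effect can be sketched in isolation with plain Java collections. The Key class below is a hypothetical stand-in for SchemaKey (it is not the library's class), and the namespaces and checkout roots are made up. The point is only that a HashMap iterates in an order derived from key hash codes, which here include the path, whereas a LinkedHashMap iterates in insertion order regardless of the path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class SchemaOrderSketch {

    /** Hypothetical stand-in for SchemaKey: a namespace plus a path-bearing systemId. */
    static final class Key {
        final String namespace;
        final String systemId;

        Key(String namespace, String systemId) {
            this.namespace = namespace;
            this.systemId = systemId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return namespace.equals(other.namespace) && systemId.equals(other.systemId);
        }

        @Override
        public int hashCode() {
            // The absolute path is part of the hash, so it influences HashMap bucket order.
            return Objects.hash(namespace, systemId);
        }
    }

    /** Adds three schemas under the given checkout root and returns the iteration order. */
    static List<String> iterationOrder(Map<Key, String> store, String checkoutRoot) {
        for (String name : new String[] {"customer", "order", "invoice"}) {
            String namespace = "urn:example:" + name;
            store.put(new Key(namespace, "file:" + checkoutRoot + "/" + name + ".xsd"), namespace);
        }
        return new ArrayList<>(store.values());
    }

    public static void main(String[] args) {
        // HashMap: the two orders may or may not match, depending on the hash codes.
        System.out.println(iterationOrder(new HashMap<>(), "/ci/main"));
        System.out.println(iterationOrder(new HashMap<>(), "/ci/feature-x"));
        // LinkedHashMap: always insertion order, independent of the checkout root.
        System.out.println(iterationOrder(new LinkedHashMap<>(), "/ci/main"));
        System.out.println(iterationOrder(new LinkedHashMap<>(), "/ci/feature-x"));
    }
}
```

Whether switching the backing map would be an acceptable change inside XmlSchemaCollection is for the maintainers to judge; the sketch only shows the ordering guarantee the two map types provide.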

  was:
When XmlSchema returns schemas in a collection, they are in an unpredictable order \
that depends on the hashcode for, among other things, the absolute path to them on \
the file system. This is a problem for reproducible builds, as the result will differ \
depending on where you store the schemas you work on.

For example, I use CXF's "wsimport" tool to generate Java classes from my schemas. \
The schemas are in a source control system, which is checked out by a CI server in \
different folders depending on the branch name among other things.

CXF keeps schemas in an XmlSchemaCollection instance and asks for all schemas during \
type generation. It then uses these to generate an "ObjectFactory" class that \
contains elements from these schemas. As the schemas are returned in a different \
order depending on the checkout folder path, the generated source code will contain \
the elements in different order from build to build.

This in turn breaks our build cache that fingerprints the schema files as inputs and \
generated code as output, which breaks the cache further in the build pipeline and \
leads to long build times.

While this could be fixed in CXF by sorting the schemas first, or by not generating a \
systemId that includes the full path, I think it will be beneficial to make that \
class return schemas in a predictable order for all clients, and not just CXF. The \
order could be the same as when schemas were added to begin with (requiring the \
client to iterate over them in a predictable order).

Technically, the XmlSchemaCollection class keeps schemas in a HashMap where the key \
is a SchemaKey, containing a "systemId" field that includes the full file path.


> Make XmlSchema return schemas in a collection in a predictable order
> --------------------------------------------------------------------
> 
> Key: XMLSCHEMA-65
> URL: https://issues.apache.org/jira/browse/XMLSCHEMA-65
> Project: XmlSchema
> Issue Type: Improvement
> Affects Versions: 2.3.0
> Reporter: Bjørn Mølgård Vester
> Priority: Minor
> 
> When XmlSchema returns schemas in a collection, they are in an unpredictable order \
> that depends on the hash code of, among other things, their absolute path on the \
> file system. This is a problem for reproducible builds, as the result will differ \
> depending on where the schemas are stored.
> 
> For example, I use CXF's "wsdl2java" tool to generate Java classes from my schemas. \
> The schemas are in a source control system, which is checked out by a CI server in \
> different folders depending on the branch name among other things.
> 
> CXF keeps schemas in an XmlSchemaCollection instance and asks for all schemas during \
> type generation. It then uses these to generate an "ObjectFactory" class that \
> contains elements from these schemas. As the schemas are returned in a different \
> order depending on the checkout folder path, the generated source code will contain \
> the elements in a different order from build to build.
> 
> This in turn invalidates our build cache, which fingerprints the schema files as \
> inputs and the generated code as output. The cache misses then propagate to later \
> stages of the build pipeline and lead to long build times.
> 
> While this could be fixed in CXF by sorting the schemas first, or by not generating a \
> systemId that includes the full path, I think it would be beneficial to make \
> XmlSchemaCollection return schemas in a predictable order for all clients, not just \
> CXF. The order could be the order in which the schemas were added (which requires \
> the client to add them in a predictable order).
> 
> Technically, the XmlSchemaCollection class keeps schemas in a HashMap where the key \
> is a SchemaKey, containing a "systemId" field that includes the full file path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ws.apache.org
For additional commands, e-mail: dev-help@ws.apache.org

