
List:       hadoop-dev
Subject:    [DISCUSS] new repo/sub-project, fs-api-shim
From:       Steve Loughran <stevel () cloudera ! com ! INVALID>
Date:       2023-05-31 19:47:36
Message-ID: CAL7CpJxk8ph8snRfcXg6PC7eNGyJyE2Q6Mt5q936HMqBy_hBoQ () mail ! gmail ! com


I want to create a new repository for a shim library that lets code built
against older Hadoop releases use the more recent Hadoop filesystem APIs.
Currently the open source implementations of Parquet and ORC can't use
vectored IO, in particular, even though we can in Cloudera. Providing a shim
opens these APIs up to all *and* gets them more broadly stressed/tested.
This needs to be in its own repository, not just for rapid initial release,
but because it is designed to be built against as old a version of Hadoop as
we can reasonably support, which IMO means Hadoop 3.1.0+. I know Parquet still
wants to build against 2.8.x, but to claim support for Hadoop 2 means
"build and test on Java 7", which is unrealistic in 2023.

Initial WiP implementation, which works with 3.1.0 and tests against others:
https://github.com/steveloughran/fs-api-shim
The complexity is in testing this: I have contract tests which then need
to be executed on every supported Hadoop release, which will need a
separate module for each one.
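
For readers unfamiliar with the approach, here is a minimal sketch of one
common way to build such a shim: compile against the oldest API, look up the
newer method by reflection at runtime, and fall back when it is absent. This
is not the code in the repository above and it does not use any Hadoop
classes. OldStream, NewStream, RangeReader and readRanges are all
hypothetical stand-ins invented for the illustration.

```java
import java.lang.reflect.Method;

final class ShimSketch {

    /** Hypothetical stand-in for a stream class in an older release: single reads only. */
    public static class OldStream {
        public int read(long position) {
            return (int) (position % 256);
        }
    }

    /** Hypothetical stand-in for a newer release that adds a bulk-read method. */
    public static class NewStream extends OldStream {
        public int[] readRanges(long[] positions) {
            int[] out = new int[positions.length];
            for (int i = 0; i < positions.length; i++) {
                out[i] = read(positions[i]);
            }
            return out;
        }
    }

    /** Hypothetical shim: compiled only against OldStream, probes for the newer method. */
    public static final class RangeReader {
        private final OldStream stream;
        private final Method bulkRead; // null when the runtime class lacks the method

        public RangeReader(OldStream stream) {
            this.stream = stream;
            Method m;
            try {
                // Look the method up by name so there is no compile-time dependency on it.
                m = stream.getClass().getMethod("readRanges", long[].class);
            } catch (NoSuchMethodException e) {
                m = null;
            }
            this.bulkRead = m;
        }

        /** Lets callers check whether the newer method was found. */
        public boolean hasBulkRead() {
            return bulkRead != null;
        }

        public int[] readRanges(long[] positions) {
            if (bulkRead != null) {
                try {
                    return (int[]) bulkRead.invoke(stream, (Object) positions);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("bulk read failed", e);
                }
            }
            // Fallback for older releases: emulate the bulk call with single reads.
            int[] out = new int[positions.length];
            for (int i = 0; i < positions.length; i++) {
                out[i] = stream.read(positions[i]);
            }
            return out;
        }
    }
}
```

A caller wraps whatever stream it is handed in a RangeReader and calls
readRanges() without caring which release is on the classpath. Both the
reflective path and the fallback path need exercising, which is one reason
the same tests have to run against more than one release.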

I can create the repo easily enough, just would like approval. And is the
name OK?

steve

