List: hadoop-dev
Subject: [DISCUSS] new repo/sub-project, fs-api-shim
From: Steve Loughran <stevel () cloudera ! com ! INVALID>
Date: 2023-05-31 19:47:36
Message-ID: CAL7CpJxk8ph8snRfcXg6PC7eNGyJyE2Q6Mt5q936HMqBy_hBoQ () mail ! gmail ! com
I want to create a new repository for a shim library which lets code built
against older Hadoop releases access the more recent Hadoop filesystem APIs.
Currently the open source implementations of Parquet and ORC can't use
vectored IO in particular, even though we can in Cloudera. Providing a shim
opens the APIs up to everyone *and* gets them more broadly stressed/tested.
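To illustrate the general idea, here is a minimal sketch of the reflection
technique such a shim could use: probe for a method once, then either invoke
it or report that it is unavailable so the caller can fall back. This is not
code from fs-api-shim; the class name ShimSketch and its methods are
hypothetical, and it probes java.lang.String purely so that it runs without
Hadoop on the classpath. A real shim would probe the Hadoop filesystem
classes instead.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/**
 * Hypothetical illustration of a reflection-based shim.
 * Assumption: this is a sketch of the technique, not the fs-api-shim code.
 */
public final class ShimSketch {

    /** Resolved once at construction; null if the method does not exist. */
    private final Method method;

    public ShimSketch(Class<?> target, String name, Class<?>... parameterTypes) {
        Method found;
        try {
            found = target.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            // The library on the classpath predates this method.
            found = null;
        }
        this.method = found;
    }

    /** True if the probed method exists in the library on the classpath. */
    public boolean available() {
        return method != null;
    }

    /** Invoke the probed method, or fail clearly if it was not found. */
    public Object invoke(Object instance, Object... args) {
        if (method == null) {
            throw new UnsupportedOperationException("method not available in this release");
        }
        try {
            return method.invoke(instance, args);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        } catch (InvocationTargetException e) {
            // Surface the underlying failure rather than the reflection wrapper.
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // String.length() stands in for an API present in every release.
        ShimSketch present = new ShimSketch(String.class, "length");
        // A made-up name stands in for an API only added in a later release.
        ShimSketch absent = new ShimSketch(String.class, "methodAddedInALaterRelease");

        System.out.println("length available: " + present.available());
        System.out.println("later method available: " + absent.available());
        if (present.available()) {
            System.out.println("length of \"hadoop\": " + present.invoke("hadoop"));
        }
    }
}
```

Callers would check available() first and take their existing code path when
it returns false, so the same binary can run against old and new releases.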
This needs to be in its own repository, not just for rapid initial release,
but because it is designed to be built against as old a version of Hadoop as
we can reasonably support, which IMO means Hadoop 3.1.0+. I know Parquet
still wants to build against 2.8.x, but to claim support for Hadoop 2 means
"build and test on java7", which is unrealistic in 2023.
Initial WiP implementation, which works with 3.1.0 and is tested against
other releases:
https://github.com/steveloughran/fs-api-shim
The complexity is in testing this: I have contract tests which then need
to be executed against every supported Hadoop release, which will need a
separate module for each one.
I can create the repo easily enough; I would just like approval. And is the
name OK?
steve