'[Bioc-devel] Question relating to extending a class and inclusion of data'

List:       bioc-devel
Subject:    [Bioc-devel] Question relating to extending a class and inclusion of data
From:       Vilhelm Suksi <vksuks () utu ! fi>
Date:       2024-05-21 8:58:24
Message-ID: 4e830205fa0c43b5a9ea83a9cfb53818 () utu ! fi

Hi!

Excuse the long email, but there are a number of things to be clarified in
preparation for submitting the notame package which I have been developing
to meet Bioconductor guidelines. As of now it passes almost all of the
automatic checks, with the exception of formatting and some functions that
are over 50 lines long.

Background 1:
The notame package already has a significant following, and was published
in 2020 with an associated protocol article published in the "Metabolomics
Data Processing and Data Analysis—Current Best Practices" special issue of
the Metabolites journal (https://www.mdpi.com/2218-1989/10/4/135). The
original package relies on the MetaboSet container class, which extends
ExpressionSet with three slots, namely group_col, time_col and subject_col.
These slots are used to store the names of the corresponding sample data
columns, and are used as default arguments to most functions. This makes
for a more streamlined experience. However, the submission guidelines state
that existing classes should be preferred, such as SummarizedExperiment. We
will be implementing support for SummarizedExperiment over the summer. We
have included a MetaboSet - SummarizedExperiment converter for
interoperability.

Q1: Can an initial Bioconductor submission rely on the MetaboSet container
class? Support for MetaboSet would do well to be included anyway for
existing users until it is phased out.

Q2: Is it ok to extend the SummarizedExperiment class to utilize the three
aforementioned slots? It could be called MetaboExperiment. Or should the
functions be modified such that said columns are specified explicitly, using
SummarizedExperiment?

Background 2:
The notame package caters to data analysis in untargeted LC-MS metabolic
profiling experiments, encompassing data pretreatment (quality control,
normalization, imputation and other steps leading up to feature selection)
and feature selection (univariate analysis and supervised learning). Raw
data preprocessing is not supported. Instead, the package offers utilities
for flexibly reading peak tables from Excel files produced by various
point-and-click software such as MS-DIAL. As such, data in Excel format
needs to be included, but is not available in any Bioconductor package,
although such Excel data could be procured from existing data in
Bioconductor. However, existing untargeted LC-MS data in Bioconductor cannot
be used, as is, to demonstrate the full functionality of the notame package.
With regard to feature data, there need to be several analytical modes.
Sample data needs to include study group, time point, subject ID and several
batches. Blank samples would be good as well. Packages I have checked for
data with the above specifications include FaahKO, MetaMSdata, msdata,
msqc1, mtbls2, pmp, PtH2O2lipids, and ropls. As of now, the example data is
not realistic in that it is scrambled, and I have not yet been informed of
the origin and modification of the data.

Q3: If I get access to information about the origin and modification of the
data currently used, can I further modify it to satisfy the needs of the
package for an initial Bioconductor release? Or does it need to be
realistic? Consider this the explicit pre-approval inquiry for including
data in the notame package.

Q4: Do you think a separate ExperimentData package satisfying the
specifications laid out in Background 2 is warranted? This could be included
in a future version with SummarizedExperiment/MetaboExperiment support.

Q5: The instructions state that the data needs to be documented
(https://contributions.bioconductor.org/docs.html#doc-inst-script). Is the
availability of the original data strictly necessary? I notice many packages
don't include documentation on how the data was procured.

Thanks,
Vilhelm Suksi
Turku Data Science Group
vksuks@utu.fi

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel