List: bioc-devel
Subject: [Bioc-devel] Question relating to extending a class and inclusion of data
From: Vilhelm Suksi <vksuks () utu ! fi>
Date: 2024-05-21 8:58:24
Message-ID: 4e830205fa0c43b5a9ea83a9cfb53818 () utu ! fi
Hi!
Excuse the long email, but there are a number of things to be clarified in preparation for submitting the notame package, which I have been developing to meet Bioconductor guidelines. As of now it passes almost all of the automatic checks, with the exception of formatting and some functions that are over 50 lines long.
Background 1:
The notame package already has a significant following. It was published in 2020, with an associated protocol article in the "Metabolomics Data Processing and Data Analysis – Current Best Practices" special issue of the Metabolites journal (https://www.mdpi.com/2218-1989/10/4/135). The original package relies on the MetaboSet container class, which extends ExpressionSet with three slots: group_col, time_col and subject_col. These slots store the names of the corresponding sample data columns and are used as default arguments to most functions, which makes for a more streamlined experience. However, the submission guidelines state that existing classes, such as SummarizedExperiment, should be preferred. We will be implementing support for SummarizedExperiment over the summer, and we have included a MetaboSet-SummarizedExperiment converter for interoperability.
Q1: Can an initial Bioconductor submission rely on the MetaboSet container class? Support for MetaboSet would be worth including anyway for existing users until it is phased out.
Q2: Is it OK to extend the SummarizedExperiment class with the three aforementioned slots? The extended class could be called MetaboExperiment. Or should the functions be modified so that these columns are specified explicitly, using plain SummarizedExperiment?
Background 2:
The notame package caters to the data analysis of untargeted LC-MS metabolic profiling experiments, encompassing data pretreatment (quality control, normalization, imputation and other steps leading up to feature selection) and feature selection (univariate analysis and supervised learning). Raw data preprocessing is not supported. Instead, the package offers utilities for flexibly reading peak tables from an Excel file, as exported by various point-and-click software such as MS-DIAL. As such, data in Excel format needs to be included. Such data is not available in any Bioconductor package, although it could be produced from existing data in Bioconductor. However, existing untargeted LC-MS data in Bioconductor cannot be used as is to demonstrate the full functionality of the notame package. The feature data needs to span several analytical modes. The sample data needs to include study group, time point, subject ID and several batches; blank samples would be good as well. Packages I have checked for data with the above specifications include FaahKO, MetaMSdata, msdata, msqc1, mtbls2, pmp, PtH2O2lipids, and ropls. As of now, the example data is not realistic in that it is scrambled, and I have not yet been informed of the origin and modification of the data.
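The data requirements above can be written down as a checklist. A minimal Python sketch, assuming a hypothetical layout: the sample table is a dict of lists, each feature carries an analytical-mode label such as "HILIC_pos", and blank samples are marked "Blank" in the group column. The column names used as defaults are illustrative, not notame's actual conventions.

```python
from typing import Dict, List


def check_example_data(sample_data: Dict[str, List[str]],
                       feature_modes: List[str],
                       group_col: str = "Group",
                       time_col: str = "Time",
                       subject_col: str = "Subject_ID",
                       batch_col: str = "Batch",
                       blank_label: str = "Blank") -> List[str]:
    """Return a list of unmet requirements (empty if all are met).

    sample_data: column name -> one value per sample (hypothetical layout).
    feature_modes: analytical mode of each feature (hypothetical labels).
    """
    problems = []
    # Study group, time point, subject ID and batch must all be present.
    for col in (group_col, time_col, subject_col, batch_col):
        if col not in sample_data:
            problems.append(f"missing sample column: {col}")
    # Several batches are needed.
    if batch_col in sample_data and len(set(sample_data[batch_col])) < 2:
        problems.append("need several batches")
    # Feature data should span several analytical modes.
    if len(set(feature_modes)) < 2:
        problems.append("need several analytical modes")
    # Blank samples are desirable rather than strictly required.
    if group_col in sample_data and blank_label not in sample_data[group_col]:
        problems.append("no blank samples (optional)")
    return problems


samples = {
    "Group": ["A", "B", "A", "B", "Blank"],
    "Time": ["1", "1", "2", "2", "NA"],
    "Subject_ID": ["s1", "s2", "s1", "s2", "NA"],
    "Batch": ["1", "1", "2", "2", "2"],
}
modes = ["HILIC_pos", "HILIC_neg", "RP_pos", "RP_neg"]
print(check_example_data(samples, modes))  # []
```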
Q3: If I get access to information about the origin and modification of the currently used data, can I further modify it to satisfy the needs of the package for an initial Bioconductor release? Or does it need to be realistic? Consider this the explicit pre-approval inquiry for including data in the notame package.
Q4: Do you think a separate ExperimentData package satisfying the specifications laid out in Background 2 is warranted? This could be included in a future version with SummarizedExperiment/MetaboExperiment support.
Q5: The instructions state that the data needs to be documented (https://contributions.bioconductor.org/docs.html#doc-inst-script). Is the availability of the original data strictly necessary? I notice many packages don't include documentation on how the data was procured.
Thanks,
Vilhelm Suksi
Turku Data Science Group
vksuks@utu.fi
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel