[prev in list] [next in list] [prev in thread] [next in thread] 

List:       r-devel
Subject:    Re: [Rd] What is the best way to loop over an ALTREP vector?
From:       Romain Francois <romain () rstudio ! com>
Date:       2019-09-24 8:39:13
Message-ID: 46FC2D32-15D1-45D8-8666-4C101D0FCDAD () rstudio ! com
[Download RAW message or body]

Thanks for these comments. I should alter the blog post or write some follow up. 

This was a weekend blog post that only benefited from a short time of research \
research. I'm glad people find it useful, but I'm sure a detailed documentation of \
the features from the authors would be more useful. 

Romain

> Le 24 sept. 2019 à 07:48, Gabriel Becker <gabembecker@gmail.com> a écrit :
> 
> Hi Bob,
> 
> Thanks for sending around the link to that. It looks mostly right and looks
> like a useful onramp. There are a few things to watch out for though (I've
> cc'ed Romain so he's aware of these comments). @romain I hope you taake the
> following comments as they are intended, as help rather than attacks.
> 
> The largest issue I see is that the contract for Get_region is that it
> *populates the
> provided buffer with a copy of the data. *That buffer is expected to be
> safe to destructively modify, shuffle, etc though I don't know if we are
> actually doing that anywhere. As such, if I understand his C++ correctly,
> that Get_region method  is not safe and shouldn't be used.
> 
> The other point is that Dataptr_or_null is not actually *guaranteed *not to
> allocate. The default method returns NULL, but we have no way of preventing
> an allocation in a user-defined method, and probably (?) no easy way of
> detecting that it is occurring before it causes a bug. That said, Romain is
> correct that when you are writing Dataptr_or_null methods you should write
> them so that they don't allocate, generally. Basically your methods for
> Dataptr_or_null shouldn't allocate, but you also should not write code that
> relies on hard assumptions that no one's ever will.
> 
> Also, a small nitpick, R's internal mean function doesn't hit Dataptr, it
> hits either INTEGER_ELT (which really should probably be a
> ITERATE_BY_REGION) or ITERATE_BY_REGION.
> 
> Anyway, I hope that helps.
> ~G
> 
> 
> 
> 
> 
> 
> On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <bob@rud.is <mailto:bob@rud.is>> wrote:
> 
> > Not sure if you're using just C++ or Rcpp for C++ access but
> > https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on
> > using C++ w/ALTREP.
> > 
> > > On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwjf08@gmail.com> wrote:
> > > 
> > > Sorry for post a lot of things, for the first part of code, I copied my
> > C++
> > > iter macro by mistake(and you can see an explicit type casting). Here is
> > > the macro definition from R_exts/Itermacros.h
> > > 
> > > #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
> > > 
> > > strt, nfull, expr) do {         \
> > > 
> > > *       const** etype *px = DATAPTR_OR_NULL(sx);           *
> > \
> > > 
> > > if (px != NULL) {                                      \
> > > 
> > > R_xlen_t __ibr_n__ = strt + nfull;                        \
> > > 
> > > R_xlen_t nb = __ibr_n__;                                  \
> > > 
> > > for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
> > > 
> > > expr                                            \
> > > 
> > > }                                                 \
> > > 
> > > }                                                      \
> > > 
> > > else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
> > > \
> > > 
> > > strt, nfull, expr);        \
> > > 
> > > } while (0)
> > > 
> > > 
> > > Best,
> > > 
> > > Jiefei
> > > 
> > > On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08@gmail.com> wrote:
> > > 
> > > > Hi Gabriel,
> > > > 
> > > > I have tried the macro and found a small issue, it seems like the macro
> > is
> > > > written in C and does an implicit type conversion(const void * to const
> > int
> > > > *), see below. While it is allowed in C, C++ seems not happy with it.
> > Is it
> > > > possible to add an explicit type casting so that it can be compatible
> > with
> > > > both language?
> > > > 
> > > > 
> > > > #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
> > > > 
> > > > strt, nfull, expr) do {         \
> > > > 
> > > > *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx);  *
> > > > \
> > > > 
> > > > if (px != NULL) {                                      \
> > > > 
> > > > R_xlen_t __ibr_n__ = strt + nfull;                        \
> > > > 
> > > > R_xlen_t nb = __ibr_n__;                                  \
> > > > 
> > > > for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
> > > > 
> > > > expr                                            \
> > > > 
> > > > }                                                 \
> > > > 
> > > > }                                                      \
> > > > 
> > > > else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype,
> > > > vtype,       \
> > > > 
> > > > strt, nfull, expr);        \
> > > > 
> > > > } while (0)
> > > > 
> > > > 
> > > > Also, I notice that the element type(etype) and vector type(vtype) has
> > > > to be specified in the macro. Since the SEXP is the first argument in
> > the
> > > > macro, it seems redundant to define etype and vtype for they have to
> > match
> > > > the type of the SEXP. I'm wondering if this is intentional? Will there
> > be a
> > > > type-free macro in R in the future? Here is a simple type-free macro I'm
> > > > using.
> > > > 
> > > > #define type_free_iter(sx, ptr, ind, nbatch,expr)\
> > > > 
> > > > switch(TYPEOF(sx)){\
> > > > 
> > > > case INTSXP:\
> > > > 
> > > > ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
> > > > 
> > > > break; \
> > > > 
> > > > case REALSXP:\
> > > > 
> > > > ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
> > > > 
> > > > break; \
> > > > 
> > > > case LGLSXP:\
> > > > 
> > > > ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
> > > > 
> > > > break; \
> > > > 
> > > > default:\
> > > > 
> > > > Rf_error("Unknow data type\n"); \
> > > > 
> > > > break; \
> > > > 
> > > > }
> > > > 
> > > > 
> > > > 
> > > > // [[Rcpp::export]]
> > > > 
> > > > double sillysum(SEXP x) {
> > > > 
> > > > double s = 0.0;
> > > > 
> > > > type_free_iter(x, ptr, ind, nbatch,
> > > > 
> > > > {
> > > > 
> > > > for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
> > > > 
> > > > });
> > > > 
> > > > return s;
> > > > 
> > > > }
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Best,
> > > > 
> > > > Jiefei
> > > > 
> > > > On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08@gmail.com> wrote:
> > > > 
> > > > > Thank you, Gabriel. The loop macro is very helpful. It is also exciting
> > > > > to see that there are lots of changes in ALTREP in R devel version. I
> > > > > really appreciate your help!
> > > > > 
> > > > > Best,
> > > > > Jiefei
> > > > > 
> > > > > On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker@gmail.com>
> > > > > wrote:
> > > > > 
> > > > > > Jiefei,
> > > > > > 
> > > > > > I've been meaning to write up something about this so hopefully this
> > > > > > will be an impetus for me to actually do that, but until then,
> > responses
> > > > > > inline.
> > > > > > 
> > > > > > 
> > > > > > On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08@gmail.com> wrote:
> > > > > > 
> > > > > > > Hi devel team,
> > > > > > > 
> > > > > > > I'm working on C/C++ level ALTREP compatibility for a package. The
> > > > > > > package
> > > > > > > previously used pointers to access the data of a SEXP, so it would
> > not
> > > > > > > work
> > > > > > > for some ALTREP objects which do not have a pointer. I plan to
> > rewrite
> > > > > > > the
> > > > > > > code and use functions like get_elt, get_region, and get_subset to
> > > > > > > access
> > > > > > > the values of a vector, so I have a few questions for ALTREP:
> > > > > > > 
> > > > > > > 1. Since an ALTREP do not have to define all of the above
> > > > > > > functions(element, region, subset), is there any way to check which
> > > > > > > function has been defined for an ALTREP class? I did a search on
> > > > > > > RInternal.h and altrep.c but did not find a solution for it. If not,
> > > > > > > will
> > > > > > > it be added in the future?
> > > > > > > 
> > > > > > 
> > > > > > Element and region are guaranteed to always be defined and work (for
> > > > > > altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently
> > don't
> > > > > > have region for STRSXP or VECSXP, I believe). If the altrep class
> > does not
> > > > > > provide them then default methods will be used, which may be
> > inefficient in
> > > > > > some cases but will work. Subset is currently a forward looking stub,
> > but
> > > > > > once implimented, that will also be guaranteed to work for all valid
> > ALTREP
> > > > > > classes.
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > 2. Given the diversity of ALTREP classes, what is the best way to
> > loop
> > > > > > > over
> > > > > > > an ALTREP object? I hope there can be an all-in-one function which
> > can
> > > > > > > get
> > > > > > > the values from a vector as long as at least one of the above
> > functions
> > > > > > > has
> > > > > > > been defined, so package developers would not be bothered by tons of
> > > > > > > `if-else` statement if they want their package to work with ALTREP.
> > > > > > > Since
> > > > > > > it seems like there is no such function exist, what could be the best
> > > > > > > way
> > > > > > > to do the loop under the current R version?
> > > > > > > 
> > > > > > 
> > > > > > The best way to loop over all SEXPs, which supports both altrep and
> > > > > > nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R
> > for a
> > > > > > number of released versions, at least since 3.5.0 I think) and the
> > much
> > > > > > newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
> > > > > > R_exts/Itermacros.h
> > > > > > 
> > > > > > The meaning of the arguments is as follows for
> > ITERATE_BY_REGION_PARTIAL
> > > > > > are as follows (ITERATE_BY_REGION is the same except no strt, and
> > nfull).
> > > > > > 
> > > > > > 
> > > > > > - sx - C level variable name of the SEXP to iterate over
> > > > > > - px - variable name to use for the pointer populated with data from
> > > > > > a region of sx
> > > > > > - idx - variable name to use for the "outer", batch counter in the
> > > > > > for loop. This will contain the 0-indexed start position of the
> > batch
> > > > > > you're currently processing
> > > > > > - nb - variable name to use for the current batch size. This will
> > > > > > always either be GET_REGION_BUFFSIZE (512), or the number of
> > elements
> > > > > > remaining in the vector, whichever is smaller
> > > > > > - etype - element (C) type, e.g., int, double, of the data
> > > > > > - vtype - vector (access API) type, e.g, INTEGER, REAL
> > > > > > - strt - the 0-indexed position in the vector to start iterating
> > > > > > - nfull - the total number oif elements to iterate over from the
> > > > > > vector
> > > > > > - expr - the code to process a single batch (Which will do things to
> > > > > > px, typically)
> > > > > > 
> > > > > > 
> > > > > > So code to perform badly implemented not good idea summing of REALSXP
> > > > > > data might look like
> > > > > > 
> > > > > > double sillysum(SEXP x) {
> > > > > > 
> > > > > > double s = 0.0;
> > > > > > 
> > > > > > ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
> > > > > > {
> > > > > > 
> > > > > > for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
> > > > > > })
> > > > > > 
> > > > > > return s;
> > > > > > }
> > > > > > 
> > > > > > For meatier examples of ITERATE_BY_REGION's use in practice you can
> > grep
> > > > > > the R sources. I know it is used in the implementations of the various
> > > > > > C-level summaries (summary.c), print and formatting functions, and
> > anyNA.
> > > > > > 
> > > > > > Some things to remember
> > > > > > 
> > > > > > - If you have an inner loop like the one above, your total position
> > > > > > in the original vector is ind + i
> > > > > > - ITERATE_BY_REGION always processes the whole vector, if you need
> > > > > > to only do part of it yo'll either need custom breaking for both
> > inner and
> > > > > > outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
> > > > > > - Don't use the variants ending in 0, all they do is skip over
> > > > > > things that are a good idea in the case of non-altreps (and some
> > very
> > > > > > specific altreps).
> > > > > > 
> > > > > > Hope that helps.
> > > > > > 
> > > > > > Best,
> > > > > > ~G
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Best,
> > > > > > > Jiefei
> > > > > > > 
> > > > > > > [[alternative HTML version deleted]]
> > > > > > > 
> > > > > > > ______________________________________________
> > > > > > > R-devel@r-project.org mailing list
> > > > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > > > > > 
> > > > > > 
> > > 
> > > [[alternative HTML version deleted]]
> > > 
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > 
> > 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel \
> <https://stat.ethz.ch/mailman/listinfo/r-devel>

	[[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic