[prev in list] [next in list] [prev in thread] [next in thread] 

List:       r-devel
Subject:    Re: [Rd] Milestone: 5000 packages on CRAN
From:       Spencer Graves <spencer.graves () prodsyse ! com>
Date:       2013-11-09 2:13:49
Message-ID: 527D9A5D.6040406 () prodsyse ! com
[Download RAW message or body]

       CRAN size has grown almost exponentially at least since 2001.  R 
history was discussed by John Fox (2009) Aspects of the Social 
Organization and Trajectory of the R Project, R Journal 
(http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Fox.pdf). 
Below please find his data plus 5 additional points I added and R script 
I used to fit a models.


       I won't defend the models fit in the script below.  However, 
unless CRAN management changes dramatically in the next 5 years, it 
seems likely that CRAN will have 10,000 packages some time in 2018.


       By the way, if you don't already use the sos package routinely, I 
encourage you to consider it.  For me, it's by far the fastest 
literature search for anything statistical.  In a very few minutes, I 
get an Excel file with a summary by package of the matches to almost any 
combination of search terms.  (Shameless plug by the lead author of the 
package ;-)


       Best Wishes,
       Spencer Graves


date    packages
2001-06-21    110
2001-12-17    129
2002-06-12    162
2003-05-27    219
2003-11-16    273
2004-06-05    357
2004-10-12    406
2005-06-18    548
2005-12-16    647
2006-05-31    739
2006-12-12    911
2007-04-12    1000
2007-11-16    1300
2008-03-18    1427
2008-10-18    1614
2009-09-17    1952
2012-06-12    3786
2012-11-01    4082
2012-12-14    4210
2013-10-28    4960
2013-11-08    5000

library(gdata)

(CRANfile <- dir(pattern='s\\.xls$'))
#readLines(CRANfile)
str(CRANhist. <- read.xls(CRANfile, stringsAsFactors=FALSE,
                            header=TRUE))
tail(CRANhist., 11)
CRANhist <- CRANhist.[1:20, 1:2]

(dt. <- as.Date(CRANhist$date))
CRANhist$date <- dt.

(day1 <- min(CRANhist$date)) # 2001-06-21
str(ddate <- CRANhist$date-day1)
# difftime in days

CRANhist$CRANdays <- as.numeric(ddate)
(growth <- lm(log(packages)~CRANdays, CRANhist))

CRANhist$pred <- exp(predict(growth))
plot(packages~date, CRANhist, log='y')
lines(pred~date, CRANhist, pch='.')

fitLogLogis <- nls(log(packages) ~ a+b*CRANdays + log(1+exp(d+b*CRANdays)),
                    CRANhist, start=c(a=4.9, b=0.0009, d=0))
# Error ... singular gradient

library(drc)
CRANlogLogis <- drm(packages~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis, log='y') # very poor through 2005

CRANlogLogis. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis., log='y') # terrible:  far worse than CRANlogLogis

CRANlogLogis4 <- drm(packages~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4, log='y') # poor for 2001 but great otherwise

CRANlogLogis4. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4., log='y') # best I've found so far.
abline(h=c(4200, 8400))

sapply(CRANhist, range)
pred.dTimes <- seq(0, 6000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
data.frame(Date=as.Date(day1+pred.dTimes), nPkgs=exp(CRANpred))

plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)

pred.dTimes <- seq(0, 10000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))

plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)
abline(h=c(4200, 8400))
abline(v=as.Date('2012-12-14'))
abline(v=as.Date('2017-09-30'))

#########################

abline(h=20000)
abline(h=70000)

pred.dTimes <- seq(0, 1000000, 10000)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)


On 11/8/2013 4:43 PM, William Dunlap wrote:
> > "Currently, the CRAN package repository features 5001 available packages."
> > 
> > Going from 4000 to 5000 packages took 14.5 months - that's one new package
> > every 10.5 hours. Behind every package there are real people. These
> > user-contributed packages are maintained by ~2900 people [2] - that's 350
> > new maintainers and many more contributors. More people to thank than ever
> > before - don't forget about them, e.g. cite properly when publishing.
> Congratulations!
> 
> I have often wondered about the natural history of R packages: how often they
> are created and shared, how long they are used, how many people use them,
> how long they are maintained, etc.  The usage numbers are hard to get, but the
> "Last modified" dates in the CRAN archives do give some information on how
> often new packages are shared and how long they are maintained.
> 
> Here are some summaries of derived from those dates.  The code to get the
> data and calculate (and plot) the summaries follows.
> 
> > newPkgsByYear
> 1997-01-01 1998-01-01 1999-01-01 2000-01-01
> 2         12         56         41
> 2001-01-01 2002-01-01 2003-01-01 2004-01-01
> 65         66        101        144
> 2005-01-01 2006-01-01 2007-01-01 2008-01-01
> 209        280        329        374
> 2009-01-01 2010-01-01 2011-01-01 2012-01-01
> 502        546        702        809
> 2013-01-01
> 439
> > table(nUpdatesSinceSep2011) # number of recent updates (not including original \
> > submission)
> nUpdatesSinceSep2011
> 0    1    2    3    4    5    6    7    8    9
> 2079  963  528  332  238  166   75   79   50   43
> 10   11   12   13   14   15   16   17   18   19
> 23   22   13   14    8    9   12    5    4    1
> 20   21   22   24   26   27   31   32   34
> 1    3    1    2    1    2    1    1    1
> 
> The code I used is:
> 
> library(XML)
> getArchiveList <- function(site = "http://cran.r-project.org/src/contrib/Archive/") \
> { retval <- readHTMLTable(site, stringsAsFactors=FALSE)[[1]]
> retval <- retval[!is.na(retval$Name) & grepl("/$", retval$Name), ]
> retval$Name <- gsub("/$", "", retval$Name)
> retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
> retval
> }
> getArchiveEntry <- function(Name, site = \
> "http://cran.r-project.org/src/contrib/Archive/") { retval <- \
> readHTMLTable(paste0(site, Name), stringsAsFactors=FALSE)[[1]] retval <- \
> retval[!is.na(retval$Name) & retval$Name != "Parent Directory", ] retval$"Last \
> modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y") retval
> }
> 
> al <- getArchiveList()
> # The next may bog down the CRAN archive server - do not do it often
> # ae <- lapply(structure(al$Name, names=al$Name),
> #              function(Name)tryCatch(getArchiveEntry(Name),
> #                                     error=function(e)data.frame(Name=character(), \
> "Last Modified" = as.Date(character())))) 
> initialSubmissionDate <- as.Date(vapply(ae, function(e)min(e[["Last modified"]]), \
> 0), origin=as.Date("1970-01-01")) lastSubmissionDate <- as.Date(vapply(ae, \
> function(e)max(e[["Last modified"]]), 0), origin=as.Date("1970-01-01")) 
> mths <- seq(as.Date("1997-10-01"), as.Date("2014-01-01"), by="months")
> yrs <- seq(as.Date("1997-01-01"), as.Date("2014-01-01"), by="years")
> 
> par(ask=TRUE)
> 
> newPkgsByMonth <-  table(cut(initialSubmissionDate, mths))
> newPkgsByYear <-  table(cut(initialSubmissionDate, yrs))
> plot(mths[-1], newPkgsByMonth, log="y", ylab="# New Pkgs", main="New packages by \
> month") # number of additions each month 
> yearsOfMaintainanceActivity <- as.numeric(lastSubmissionDate - \
> initialSubmissionDate, units="days")/365.25 hist(yearsOfMaintainanceActivity, \
> xlab="Years", main="Maintainance Duration") 
> newPkgsByYear
> table(floor(yearsOfMaintainanceActivity))
> 
> nUpdatesSinceSep2011 <- vapply(ae, function(e){
> Lm <- e[["Last modified"]]
> sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))}, 0L)
> table(nUpdatesSinceSep2011) # number of recent updates (not including original \
> submission) 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> 
> > -----Original Message-----
> > From: r-devel-bounces@r-project.org [mailto:r-devel-bounces@r-project.org] On \
> > Behalf Of Henrik Bengtsson
> > Sent: Friday, November 08, 2013 1:59 PM
> > To: R Development Mailing List
> > Subject: [Rd] Milestone: 5000 packages on CRAN
> > 
> > Here we go again...
> > 
> > Today (2011-11-08) on The Comprehensive R Archive Network (CRAN) [1]:
> > 
> > "Currently, the CRAN package repository features 5001 available packages."
> > 
> > Going from 4000 to 5000 packages took 14.5 months - that's one new package
> > every 10.5 hours. Behind every package there are real people. These
> > user-contributed packages are maintained by ~2900 people [2] - that's 350
> > new maintainers and many more contributors. More people to thank than ever
> > before - don't forget about them, e.g. cite properly when publishing.
> > 
> > Milestones:
> > 
> > 2013-11-08: 5000 packages [this post]
> > 2012-08-23: 4000 packages [7]
> > 2011-05-12: 3000 packages [6]
> > 2009-10-04: 2000 packages [5]
> > 2007-04-12: 1000 packages [4]
> > 2004-10-01: 500 packages [3,4]
> > 2003-04-01: 250 packages [3,4]
> > 
> > [1] http://cran.r-project.org/web/packages/
> > [2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html
> > [3] Private data.
> > [4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
> > [5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
> > [6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
> > [7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
> > 
> > /Henrik
> > 
> > PS. These data are for CRAN only. There are more packages elsewhere, e.g.
> > R-Forge, Bioconductor, Github etc.
> > 
> > 	[[alternative HTML version deleted]]
> > 
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic