'Re: [R] Dput Help in R'


List:       r-help
Subject:    Re: [R] Dput Help in R
From:       David Winsemius <dwinsemius () comcast ! net>
Date:       2015-12-31 17:30:03
Message-ID: D6D40AEC-6158-45BC-A52F-B0D32EDF4379 () comcast ! net

> On Dec 30, 2015, at 11:26 PM, SHIVI BHATIA <shivi.bhatia@safexpress.com> wrote:
> 
> Hi Duncan,
> Please find the dput from the data.
> 
> ab<-read.csv("collection_last.csv",header=TRUE)
> y<-ab[1:10,]
> 

This looks like partial output from a dput call. I am unable to repair it at any rate.
> 
> ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468",
> "2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477",
> "2,478", "2,479", "2,480", "2,483", "2,484,267", "2,485",
> 
snipped
> "99,581", "99,834", "990", "992", "992,489", "993", "994",
> "994,195", "995", "996", "998", "999"), class = "factor"),

It is useful in showing that these items (presumably the column named "Final") are
factors. Notice the commas in the values you might think were numeric. You will need
to remove the commas (probably with `gsub`) before using `as.numeric`.

I haven't quite figured out how a dataframe could have a factor column that was so
much longer than the adjacent columns named "Month" and "Year". I would suggest
redoing the read.csv with stringsAsFactors=FALSE so that you can then work on "pure"
text before the coercion to numeric.

-- David.
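
A minimal sketch of the cleanup suggested above, assuming the amount columns arrive as quoted text with thousands separators. The inline CSV and its values are made up to stand in for collection_last.csv, and `amount_cols` and `to_number` are hypothetical names used only for illustration.

```r
# Hypothetical stand-in for a few rows of collection_last.csv (values invented).
csv_text <- 'DOC_AMOUNT,RECEIPT_AMT,Final
"2,458","1,000","2,458"
"99,581","581","99,000"
"990","90","900"'

# Read the amounts as plain character strings rather than factors.
kp <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Strip the thousands separators, then coerce the cleaned text to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))
amount_cols <- c("DOC_AMOUNT", "RECEIPT_AMT", "Final")
kp[amount_cols] <- lapply(kp[amount_cols], to_number)

str(kp)

# If a column has already been read as a factor, convert via character first:
# as.numeric() applied directly to a factor returns the internal level codes,
# not the values printed on screen.
f <- factor(c("2,458", "990"))
f_values <- to_number(as.character(f))
```

Once the columns are numeric, arithmetic such as DOC_AMOUNT - RECEIPT_AMT no longer goes through Ops.factor, so the "not meaningful for factors" warnings should not appear.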

> Month = structure(c(11L, 11L, 7L, 2L, 2L, 12L, 11L, 11L,
> 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
> 11L,
> 11L, 11L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L),
> .Label = c("Apr",
> 
> "Aug", "Dec", "Feb", "Jan", "Jul", "Jun", "Mar", "May", "Nov",
> 
> "Oct", "Sep"), class = "factor"), Year = c(2010L, 2010L,
> 
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 
> 2011L)), .Names = c("DOC_TYPE", "DOC_NO", "DOC_DT", "SFX_CODE",
> 
> "CUSTOMER", "DOC_AMOUNT", "OS_ASON_RPT_DT", "OS_DAYS", "BILLING_BRANCH",
> 
> "COLL_BR", "RECEIPT_NO", "RECEIPT_DT", "Applied.Date", "RECEIPT_AMT",
> 
> "TDS_AMT", "REBATE", "Final", "Month", "Year"), row.names = c(NA,
> 
> 30L), class = "data.frame")
> 
> 
> Not sure if this would help.
> > 
> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.duncan@gmail.com]
> Sent: Wednesday, December 30, 2015 10:23 PM
> To: SHIVI BHATIA <shivi.bhatia@safexpress.com>; r-help@r-project.org
> Subject: Re: [R] Dput Help in R
> 
> On 30/12/2015 5:56 AM, SHIVI BHATIA wrote:
> > Dear Team,
> > 
> > 
> > 
> > I am facing an error while performing a manipulation using a dplyr
> package.
> > In the code below, I am using mutate to build a new calculated column:
> > 
> > 
> > 
> > kp<-read.csv("collection_last.csv",header=TRUE)
> > 
> > mutate(kp,dif=DOC_AMOUNT-RECEIPT_AMT+TDS_AMT+REBATE)
> > 
> > 
> > 
> > However it gives an error:-
> > 
> > Warning messages:
> > 
> > 1: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
> > 
> > '-' not meaningful for factors
> > 
> > 2: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
> > 
> > '+' not meaningful for factors
> > 
> > 3: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
> > 
> > '+' not meaningful for factors
> > 
> > 
> > 
> > This is an error when some of my variables are factors hence I have
> > tried to change these to numeric so used the expression as:
> > 
> > kp$DOC_TYPE=as.numeric(kp$DOC_TYPE).
> > 
> > 
> > 
> > this now shows as variable type of as "double". So expedite help on
> > this one i was trying to create a reproducible example and i am highly
> > struggling to
> > 
> > create one. the data i have is approx. around 1 million rows with 21
> > columns hence when i use a dput option it does not capture the entire
> > detailing and row level info required to share and even
> > dput(head(kp$DOC_TYPE)) does not help either.
> > 
> > I have seen many stack overflow & r help column before composing this
> email.
> > Hence i need help to create this reproducible example to share with
> > the experts in the community. Apologies if this is a repeat.
> > 
> > 
> > 
> > PLEASE HELP AS I AM HIGHLY STRUGGLING TO BUILD ANY OUTCOME.
> 
> If you are working with a dataframe or matrix named x, just use
> 
> y <- x[1:10,]
> 
> to extract the first 10 rows.  The error will probably occur with this
> subset as well, and dput() will give you a reasonably sized amount of
> output.  If the error doesn't happen, just take a bigger subset, and
> possibly leave off the beginning, e.g.
> 
> y <- x[101:110,]
> 
> for 10 lines starting at line 101.
> 
> Duncan Murdoch
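
The subsetting advice above can be sketched as follows. One caveat worth knowing: a factor keeps all of its levels when rows are subset, so dput() on a small slice of a large data frame can still print every level of the full column. Calling droplevels() first keeps the output short. The data frame `x` below is a made-up stand-in, not the poster's data.

```r
# Hypothetical small data frame standing in for the large one (values invented).
x <- data.frame(DOC_AMOUNT = c("2,458", "990", "99,581"),
                Month      = c("Nov", "Jun", "Aug"),
                stringsAsFactors = TRUE)

# Take the first rows, as suggested above.
y_raw <- x[1:2, ]

# Subsetting alone keeps every level of each factor column ...
nlevels(y_raw$DOC_AMOUNT)

# ... so drop the unused levels before sharing.
y <- droplevels(y_raw)
nlevels(y$DOC_AMOUNT)

# Print a compact, pasteable representation of the subset.
dput(y)
```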
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
