List: sas-l
Subject: Re: How does setting length save storage?
From: Roger DeAngelis <rogerjdeangelis () GMAIL ! COM>
Date: 2021-11-18 23:28:10
Message-ID: CAOUdXL_UCT5ezP1jTOchVpf0BEXN=iNLeD9Ac=WAS0p1uVMsKg () mail ! gmail ! com
Timings and sizes for creating the datasets below

                  Size    Create time (sec)   Read back   Notes
No compression    800mb   0.57                0.76
Length=3          272mb   1.2                 0.95        Best; use utl_optlen macro (you make the decisions)
Compress=Binary   272mb   2.1                 1.67        (has to make decisions)

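* no compression - default numeric length of 8 bytes;
data ones;
  * 100,000 numeric variables, each initialized to 1;
  array ones[100000] x1-x100000 (100000*1);
  * write 1,000 identical observations;
  do i=1 to 1000;
    output;
  end;
run;quit;

* 800mb;
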
* length=3 - each numeric is stored in 3 bytes instead of 8;
data ones_lenght3;
  array ones[100000] 3 x1-x100000 (100000*1);
  do i=1 to 1000;
    output;
  end;
  drop i;
run;quit;

* 1.2 seconds;
* 272mb;

* compress=binary - lengths are 8 bytes;
data ones_compress(compress=binary);
  array ones[100000] x1-x100000 (100000*1);
  do i=1 to 1000;
    output;
  end;
run;quit;

* 2.1 seconds;
* 272mb;

* read each dataset back (read-back times follow each step);
data _null_;
  set ones;
run;quit;

* 0.76;

data _null_;
  set ones_lenght3;
run;quit;

* 0.95;

data _null_;
  set ones_compress;
run;quit;

* 1.67;

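To check these figures on your own system, here is a minimal sketch, assuming the three datasets above are still in WORK. PROC CONTENTS reports the observation length and whether each dataset is compressed; the file size it shows is host-dependent, so the numbers may not match mine exactly.

```sas
* Sketch only: assumes ones, ones_lenght3 and ones_compress exist in WORK;
* compare Observation Length, Compressed and the file size in each report;
proc contents data=ones;
run;

proc contents data=ones_lenght3;
run;

proc contents data=ones_compress;
run;
```
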
On Thu, Nov 18, 2021 at 1:51 PM wireplay CO <savian.net@gmail.com> wrote:
> Agreed on length. I also set attribs as the first statement in a data
> step. If you have a variable like remarks set at $1024, compress helps a
> lot. In my experience, numerics do not benefit from it. I am curious
> whether anyone can show an improvement using numerics only.
>
> On Nov 18, 2021, at 9:34 AM, Mark Keintz <mkeintz@outlook.com> wrote:
>>
>> "In an uncompressed state. Only 1 map is needed"
>>
>>
>>
>> Which is why the length attribute is required, whether set by a LENGTH
>> statement or implied.
>>
>>
>>
>> regards,
>>
>> Mark
>>
>>
>>
>> *From:* wireplay CO <savian.net@gmail.com>
>> *Sent:* Thursday, November 18, 2021 11:27 AM
>> *To:* Mark Keintz <mkeintz@outlook.com>
>> *Cc:* sas-l@listserv.uga.edu
>> *Subject:* Re: How does setting length save storage?
>>
>>
>>
>> I think it depends on the makeup of the variables (char vs num). The
>> COMPRESS option switches the storage from fixed position to variable but
>> has to store the mapping (var to col pos) per record. In an uncompressed
>> state, only 1 map is needed.
>>
>> In general, the more char vars with wasted space, the better compression
>> works. However, because of this per-record mapping, compress can increase
>> the size.
>>
>> On Nov 17, 2021, at 10:52 AM, Mark Keintz <mkeintz@outlook.com> wrote:
>>
>> " If we don't compress them, then length has no effect on the size of the
>> files?"
>>
>>
>>
>> Not true. Length DOES have an effect on the size of uncompressed data set
>> files. It's just not a linear function of the anticipated savings.
>> It's sort of a step function.
>>
>>
>>
>> For instance, if you had applied the LENGTH 3 to multiple variables, then
>> the *rounding-up-of-record-length* SAS behavior described by Joe would
>> likely find a rounded value less than the original record length.
>>
>>
>>
>> Mark
>>
>> -----Original Message-----
>> From: SAS(r) Discussion <SAS-L@LISTSERV.UGA.EDU> On Behalf Of Sphinx
>> Verdant
>> Sent: Wednesday, November 17, 2021 12:02 PM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: How does setting length save storage?
>>
>>
>>
>> Thanks Allan! If we don't compress them, then length has no effect on the
>> size of the files?
>>
>>