
List:       sas-l
Subject:    Re: How does setting length save storage?
From:       Roger DeAngelis <rogerjdeangelis () GMAIL ! COM>
Date:       2021-11-18 23:28:10
Message-ID: CAOUdXL_UCT5ezP1jTOchVpf0BEXN=iNLeD9Ac=WAS0p1uVMsKg () mail ! gmail ! com

Timings and sizes for creating the datasets below and for reading them back

                                Create      Read
                       Size     Time Sec    Back    Notes

   No compression      800mb    0.57        0.76
   Length=3            272mb    1.2         0.95    Best; use utl_optlen macro (you make the decisions)
   Compress=Binary     272mb    2.1         1.67    (has to make decisions)


* no compression;

data ones;
 array ones[100000] x1-x100000  (100000*1);
 do i=1 to 1000;
   output;
 end;
run;quit;

800mb

data ones_lenght3;
 array ones[100000] 3 x1-x100000  (100000*1);
 do i=1 to 1000;
   output;
 end;
 drop i;
run;quit;

1.2 seconds

272mb;
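
"You make the decisions" includes deciding whether the values still fit. Length 3 is safe here because every value is 1. As far as I know, on Windows and Unix hosts a 3-byte numeric holds integers exactly only up to 8,192. The sketch below is a hypothetical check, not part of the benchmark above; the dataset name len3_check and the variables x3, x8, v and diff are made up for illustration.

```sas
* Hypothetical check: store the same values at length 3 and length 8;
data work.len3_check;
  length x3 3 x8 8;
  do v = 8191, 8192, 8193;
    x3 = v;
    x8 = v;
    output;          * truncation happens when x3 is written to disk;
  end;
run;

* Read back and compare; a nonzero diff means length 3 lost precision;
data _null_;
  set work.len3_check;
  diff = x8 - x3;
  put v= x3= x8= diff=;
run;
```

If diff is nonzero for any row, that variable needs a longer length.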

* compress=binary - lengths are 8 bytes;

data ones_compress(compress=binary);
 array ones[100000] x1-x100000  (100000*1);
 do i=1 to 1000;
   output;
 end;
run;quit;

2.1 seconds
* 272mb;


data _null_;
  set ones;
run;quit;

    .76

data _null_;
 set ones_lenght3;
run;quit;

    .95


data _null_;
 set ones_compress;
run;quit;

    1.67
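
If you want to check the sizes yourself, one way (a sketch, assuming the three datasets above are in WORK) is to query dictionary.tables after running the steps. The column choice is mine; filesize is reported in bytes.

```sas
* Compare observation length, compression and file size for the three datasets;
proc sql;
  select memname,
         nobs,
         obslen,
         compress,
         filesize format=comma20.
    from dictionary.tables
   where libname = 'WORK'
     and memname in ('ONES', 'ONES_LENGHT3', 'ONES_COMPRESS');
quit;
```

For the timings, setting options fullstimer; before the steps writes more detailed real and CPU times to the log. Your numbers will differ with hardware and buffering.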

On Thu, Nov 18, 2021 at 1:51 PM wireplay CO <savian.net@gmail.com> wrote:

> Agreed on length. I also set attribs as the first statement in a data
> step. If you have a variable like remarks set at $1024, compress helps a
> lot. In my experience, numerics do not benefit from it. I am curious
> whether anyone can show an improvement using numerics only.
>
> On Nov 18, 2021, at 9:34 AM, Mark Keintz <mkeintz@outlook.com> wrote:
>>
>> "In an uncompressed state, only 1 map is needed"
>>
>>
>>
>> Which is why the length attribute is required, whether set by a LENGTH
>> statement or implied.
>>
>>
>>
>> regards,
>>
>> Mark
>>
>>
>>
>> *From:* wireplay CO <savian.net@gmail.com>
>> *Sent:* Thursday, November 18, 2021 11:27 AM
>> *To:* Mark Keintz <mkeintz@outlook.com>
>> *Cc:* sas-l@listserv.uga.edu
>> *Subject:* Re: How does setting length save storage?
>>
>>
>>
>> I think it depends on the makeup of the variables (char vs num). The
>> compress option switches the storage from fixed position to variable but
>> has to store the mapping (var to col pos) per record. In an uncompressed
>> state, only 1 map is needed.
>>
>> In general, the more char vars with wasted space, the better. However,
>> due to the mapping proliferation for compress, it can increase the size.
>>
>> On Nov 17, 2021, at 10:52 AM, Mark Keintz <mkeintz@outlook.com> wrote:
>>
>> " If we don't compress them, then length has no effect on the size of the
>> files?"
>>
>>
>>
>> Not true.  Length DOES have an effect on the size of uncompressed data
>> set files.  It's just not a linear function of the anticipated savings.
>> It's sort of a step function.
>>
>>
>>
>> For instance, if you had applied the LENGTH 3 to multiple variables, then
>> the *rounding-up-of-record-length* SAS behavior described by Joe would
>> likely find a rounded value less than the original record length.
>>
>>
>>
>> Mark
>>
>> -----Original Message-----
>> From: SAS(r) Discussion <SAS-L@LISTSERV.UGA.EDU> On Behalf Of Sphinx
>> Verdant
>> Sent: Wednesday, November 17, 2021 12:02 PM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: How does setting length save storage?
>>
>>
>>
>> Thanks Allan! If we don't compress them, then length has no effect on the
>> size of the files?
>>
>>
