
List:       rrd-users
Subject:    Re: [rrd-users] [unsure]  max DS per rrd file
From:       Ryan Kubica <kubicaryan () yahoo ! com>
Date:       2013-04-25 1:18:23
Message-ID: 1366852703.61386.YahooMailNeo () web122601 ! mail ! ne1 ! yahoo ! com

Hi Mikel,

I've personally never found a good reason to store more than one datasource per RRD
datafile, and I run very large rrdtool data servers (multiple millions of datasources
per server, across many servers).

There are far too many edge cases, latency issues and join overhead in trying to
consolidate datasources into a single datafile. Yes, rrdtool itself is more
efficient with an insert like that, but:

1) What if the datapoints are collected at different times?
2) What if they use different steps?
3) What if you want to add a datasource?
4) What if you simply have too many datasources to order/consolidate from a
   queue into the datafile?

There is also non-trivial complexity and overhead in indexing into an RRD
datafile for specific datasources.
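To make the one-datasource-per-file layout concrete, here is a minimal Python sketch that builds an `rrdtool create` command line per metric, so each metric can have its own step and heartbeat, and adding a metric just means creating a new file. The metric names, the `/var/lib/rrd` directory, the heartbeat rule (twice the step) and the 1440-row RRA are illustrative assumptions, not a description of my actual setup.

```python
import os

# Hypothetical metrics: name -> step in seconds. Each gets its own RRD file,
# so differing collection intervals never have to share one file.
METRICS = {"cpu_load": 60, "disk_io": 300, "if_octets": 60}


def create_command(directory: str, name: str, step: int, rows: int = 1440) -> list[str]:
    """Build an `rrdtool create` argument list for a single-DS RRD file."""
    heartbeat = 2 * step  # assumption: tolerate one missed update
    return [
        "rrdtool", "create", os.path.join(directory, f"{name}.rrd"),
        "--step", str(step),
        # DS:<name>:<type>:<heartbeat>:<min>:<max>; U means unknown/unbounded
        f"DS:value:GAUGE:{heartbeat}:U:U",
        # A single base-resolution RRA: one primary data point per row
        f"RRA:AVERAGE:0.5:1:{rows}",
    ]


if __name__ == "__main__":
    for metric, metric_step in METRICS.items():
        print(" ".join(create_command("/var/lib/rrd", metric, metric_step)))
```

Because every file holds exactly one DS, the "different times" and "different steps" cases above simply never arise: each file is created and updated on its own schedule.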

Linux is extremely efficient at block updates, caching, opens/closes, etc. rrdtool
on a low-end (4-CPU) server with limited memory can easily update 160 thousand
datasources per minute; on a better server, a whole lot more than that.
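For a rough sense of scale, assuming each of those 160 thousand datasources receives one update per minute and the updates are spread evenly, the sustained rate and per-update time budget work out to:

$$
\frac{160\,000\ \text{updates}}{60\ \text{s}} \approx 2\,667\ \text{updates/s},
\qquad
\frac{60\ \text{s}}{160\,000\ \text{updates}} = 375\ \mu\text{s per update}
$$

If the work is spread across all four CPUs (again an assumption), each CPU has roughly 4 × 375 µs = 1.5 ms per update.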

Wanting a 'distributed cluster' isn't a good reason not to send all your time-series
data to one server or a small set of servers. The latency/request time incurred in
having to fetch data from those servers is usually not worth the trade-off.

Graphs of many hundreds of datasources over multi-day or multi-week time ranges are
generated in tens of milliseconds, not seconds. rrdtool is quite capable of producing
hundreds of on-demand graphs per second from one server.
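With one datasource per file, a multi-datasource graph is simply one `DEF` per file, joined at graph time. The following Python sketch builds such an `rrdtool graph` argument list; the DS name `value`, the colour palette and the one-week `--start` are assumptions for illustration. Paths or labels containing colons would need escaping, which this sketch does not handle.

```python
import os

# Hypothetical palette; rrdtool accepts #RRGGBB colour specifications.
COLOURS = ["#1F77B4", "#FF7F0E", "#2CA02C", "#D62728"]


def graph_command(output: str, rrd_files: list[str], start: str = "-1w") -> list[str]:
    """Build an `rrdtool graph` argument list with one line per RRD file.

    Each file is assumed to hold a single DS named "value"; the files are
    combined at graph time with one DEF per file.
    """
    args = ["rrdtool", "graph", output, "--start", start]
    for i, path in enumerate(rrd_files):
        vname = f"m{i}"  # short, unique variable name per file
        label = os.path.splitext(os.path.basename(path))[0]
        # DEF:<vname>=<rrdfile>:<ds-name>:<CF>
        args.append(f"DEF:{vname}={path}:value:AVERAGE")
        # LINE1:<vname><colour>:<legend>
        args.append(f"LINE1:{vname}{COLOURS[i % len(COLOURS)]}:{label}")
    return args
```

The resulting list can be passed directly to `subprocess.run(..., check=True)`; for hundreds of files the command line just grows by two arguments per datasource.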

I suggest you write a little test script that writes rrd data to individual RRD
datafiles to see how quickly your servers can do it. Some OS tuning and rrdtool
RRA sizing will help; in particular, don't keep hourly or daily rollups, because
the server has to hold on to those blocks in memory to make the consolidation quick
and avoid a read from disk.
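A minimal sketch of such a test script in Python, assuming the `rrdtool` command-line binary is on `PATH`. Following the advice above, each file gets only a base-resolution RRA (no hourly or daily rollups); the file count, step and RRA length are arbitrary assumptions.

```python
import os
import shutil
import subprocess
import tempfile
import time
from typing import Optional


def create_args(path: str, step: int = 60, rows: int = 1440) -> list[str]:
    """Single-DS RRD with only a base-resolution RRA (no hourly/daily rollups)."""
    return [
        "rrdtool", "create", path, "--step", str(step),
        f"DS:value:GAUGE:{2 * step}:U:U",
        f"RRA:AVERAGE:0.5:1:{rows}",
    ]


def updates_per_second(n_files: int = 1000, step: int = 60) -> Optional[float]:
    """Create n_files RRDs, update each once, and return the update rate.

    Returns None if the rrdtool binary is not on PATH.
    """
    if shutil.which("rrdtool") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"metric{i}.rrd") for i in range(n_files)]
        for path in paths:
            subprocess.run(create_args(path, step), check=True)
        started = time.perf_counter()
        for i, path in enumerate(paths):
            # "N" means "now"; one value per update because each file has one DS
            subprocess.run(["rrdtool", "update", path, f"N:{i}"], check=True)
        elapsed = time.perf_counter() - started
    return n_files / elapsed
```

For example, `print(updates_per_second(10_000))`. Note that this spawns one process per update, so process start-up overhead is included in the measurement; a long-running collector that calls rrdtool through a library binding would likely see a considerably higher rate, so treat the result as a pessimistic lower bound.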

rrdtool scales rather simply (and without rrdcached, as I don't use that either).

HTH
-Ryan


________________________________
 From: mikel <infoeuskadi@gmail.com>
To: rrd-users@lists.oetiker.ch 
Sent: Saturday, April 20, 2013 4:48 AM
Subject: Re: [rrd-users] [unsure]  max DS per rrd file
 


Thanks for your fast reply again.

> Maybe I don't understand what you say here. Some metrics, or all metrics
> are queried? Both statements cannot be true at the same time?

Yes, it is a tricky case. Apologies, I was not clear enough.

In most cases all metrics are queried at the same time, because we want to
know what value they had at a given time and classify them.

Only occasionally would we query for just one metric.

> Anyway, if you query only once in a while, maybe you should think about 
> reducing the number of RRAs in each RRD, and just let it consolidate at 
> graph time. Yes, this will mean you will have to wait longer for your graph 
> to be made, but you save processing time at every update.

This is interesting; I had not thought about that. Thanks for the hint.
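As a simplified illustration of "consolidating at graph time": instead of storing a coarser AVERAGE RRA, you keep only base-resolution samples and average them into coarser buckets when reading. This pure-Python sketch ignores rrdtool's xff threshold and time-alignment rules; it shows the idea only and is not rrdtool's implementation (rrdtool graph performs its own consolidation when the requested resolution is coarser than the stored data).

```python
import math


def consolidate(samples: list[float], factor: int) -> list[float]:
    """Average consecutive groups of `factor` samples, skipping NaN (unknown).

    A group whose samples are all unknown yields NaN. Trailing samples that
    do not fill a complete group are dropped.
    """
    out = []
    for start in range(0, len(samples) - factor + 1, factor):
        known = [v for v in samples[start:start + factor] if not math.isnan(v)]
        out.append(sum(known) / len(known) if known else math.nan)
    return out
```

For example, `consolidate([1, 2, 3, 4, 5, 6], 3)` returns `[2.0, 5.0]`. The trade-off is the one described in the quote: less work at every update, more work per graph.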

Thanks for your help again.
m



--
View this message in context:
http://rrd-mailinglists.937164.n2.nabble.com/max-DS-per-rrd-file-tp7580966p7580971.html
Sent from the RRDtool Users Mailinglist mailing list archive at Nabble.com.

_______________________________________________
rrd-users mailing list
rrd-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users






