
List:       collectd
Subject:    Re: [collectd] collectd with rrdcached, not reconnecting when rrdcached gets stopped/restarted
From:       Ulf Zimmermann <ulf () openlane ! com>
Date:       2010-12-31 0:07:52
Message-ID: 867B874CEC2101449E81092D4B3DB7B4BD9DDB35 () msmpk02 ! corp ! autc ! com

Yes, they all go to a central server. I tried the collectd network plugin a long
time ago and the result was too many lost updates. Because of that I used to run
collectd locally and copy the RRD files to a central location every 30 minutes,
which was not very efficient. Then we started moving to VMware, and with the
older versions of collectd and rrdtool the many small writes were destroying our
SAN. At that point I switched to rrdcached. It works mostly great and allows
flushing when I graph, etc. There have just been some memory issues/bugs, which
should be fixed now. Because of those bugs I had brought up that the collectd
rrdcached plugin isn't reconnecting, and was told it should be. But it doesn't
reconnect for me, even with the latest versions.


From: XANi [mailto:xani666@gmail.com]
Sent: Thursday, December 30, 2010 10:25 AM
To: Ulf Zimmermann
Cc: 'collectd@verplant.org'
Subject: Re: [collectd] collectd with rrdcached, not reconnecting when rrdcached gets stopped/restarted






So I am working on upgrading a number of things right now to collectd 4.10.2 and
rrdtool 1.4.5. Unfortunately, one of my major problems is still around: when
rrdcached dies or is stopped and then gets restarted, collectd will not
reconnect. There was a discussion of this on IRC at some point, but I never got
back to do more testing. Even with the latest version of rrdtool on client and
server, collectd will go into a spin with messages like:



Dec 29 23:45:40 appbuild01 collectd: collectd startup succeeded

Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-user.rrd, [1293695340:49451049], 1) failed with status -3.

Dec 29 23:49:00 appbuild01 collectd[10338]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.

Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-nice.rrd, [1293695340:28472], 1) failed with status -3.

Dec 29 23:49:00 appbuild01 collectd[10338]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.

Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-system.rrd, [1293695340:10117087], 1) failed with status -3.



At this point I have to restart collectd and everything is fine again. The pain
is having to do this on more than 300 machines.

Hi,

Are you directing all 300 machines to the same rrdcached server? Wouldn't it be
better to run collectd with the rrdcached and network plugins (the latter set up
to act as a server) on the machine that is collecting the data, and just use the
network plugin on all the other machines to send data to that machine?
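
A rough sketch of what that layout could look like in collectd.conf, in case it
helps. The host name collector.example.com, the socket path and the data
directory are placeholders I made up, not values from this thread; 25826 is the
network plugin's default port. Treat it as a starting point under those
assumptions, not a tested setup:

```
# --- central server: receives values over the network, writes via rrdcached ---
LoadPlugin network
LoadPlugin rrdcached

<Plugin network>
  # Accept values from the other machines (address and port are assumptions)
  Listen "0.0.0.0" "25826"
</Plugin>

<Plugin rrdcached>
  # Local rrdcached socket and RRD base directory (placeholder paths)
  DaemonAddress "unix:/var/run/rrdcached.sock"
  DataDir "/var/lib/rrdcached/db/collectd"
</Plugin>

# --- every other machine: no rrdcached plugin, just forward values ---
LoadPlugin network

<Plugin network>
  # Hypothetical name of the central collectd server
  Server "collector.example.com" "25826"
</Plugin>
```

With this layout only the central collectd holds a connection to rrdcached, so a
restart of rrdcached would mean restarting one collectd instead of one per
machine. It does not address the lost updates over the network plugin mentioned
above.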

Regards



--

Mariusz Gronczewski (XANi) <xani666@gmail.com>

GnuPG: 0xEA8ACE64

http://devrandom.pl





_______________________________________________
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd
