List:       rhq-devel
Subject:    New Storage Cluster - Sizing Guidelines
From:       Stefan Negrea <snegrea@redhat.com>
Date:       2013-12-19 23:24:01
Message-ID: 684154007.9493997.1387495441165.JavaMail.root@redhat.com

Hello Everybody,

We've been working on sizing guidelines for the new Storage Cluster. We
focused on two aspects: disk usage and compute power. We condensed all of our
findings into a Google Spreadsheet and a wiki page (more about both below).
While we are still capturing data and refining results, we wanted to get this
into the hands of users to gather feedback.

Data size on disk was our primary focus since it depends on the number of
metric schedules and their collection intervals. The underlying data storage
is not trivial, but thanks to John Sanda's hard work we now have an algorithm
that gives a baseline. In terms of compute power, we wanted to see how far we
could scale a deployment with the new Storage Cluster. While we collected some
early numbers via simulations, this time we wanted to use a real-life
deployment. We are still collecting data for the compute part, and we are
still far from the upper limit.
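To give a rough idea of the shape of that baseline calculation, here is a
minimal sketch in Java. The constants (bytes per raw sample, raw retention)
and the even-distribution assumption are illustrative placeholders of ours,
not the actual figures from the algorithm; use the spreadsheet below for real
numbers.

    // Minimal sketch of a baseline disk-size estimate. BYTES_PER_SAMPLE and
    // RAW_RETENTION_DAYS are illustrative placeholders, not the constants the
    // spreadsheet uses; replication is ignored and even distribution assumed.
    public class StorageSizeEstimate {

        static final double BYTES_PER_SAMPLE = 50.0;  // assumed on-disk cost per raw sample
        static final int RAW_RETENTION_DAYS = 7;      // assumed raw data retention

        // Rough bytes of raw metric data kept on each storage node.
        static double bytesPerNode(int schedules, int intervalSeconds, int storageNodes) {
            double samplesPerDay = schedules * (86400.0 / intervalSeconds);
            double totalBytes = samplesPerDay * RAW_RETENTION_DAYS * BYTES_PER_SAMPLE;
            return totalBytes / storageNodes;  // data spread evenly across the cluster
        }

        public static void main(String[] args) {
            // e.g. 10,000 schedules collected every 60 seconds on a 3-node cluster
            System.out.printf("~%.2f GB per node%n", bytesPerNode(10000, 60, 3) / 1e9);
        }
    }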

The Google Spreadsheet:  
URL: http://goo.gl/kWI29e
Sheets:
 1. Disk Size Algorithm - An interactive algorithm to calculate the baseline
for data size on disk. Either update rows 6 through 9, columns A & B, or add
new rows in that section to match your installation. Once all the data is
entered, you will get a total at the bottom of the spreadsheet with an
estimated size per node. The number of storage nodes is also configurable
(row 28, column B). If you have more storage nodes, the data is distributed
across the cluster, so the amount of disk required per node decreases (a
quick illustration follows the usage notes below).
 2. Discrete Data - Data collected from simulation runs that collect metrics
and run aggregation. The amount of disk used was captured at the end of each
run without forcing compaction.
 3. Long Run Charts & Long Run Data - During the 60-day simulation, data size
on disk was collected on an hourly basis. That gives an indication of the
patterns for how the data grows (along with occasional spikes due to
compaction).
 4. Metrics Collection testbed - Data captured during the compute power
testing. The data here is very preliminary and we will augment it as we
collect more.
 5. Changelog - We will keep a change log as we update the content.

How to use it:
 1. If you just want to explore our findings, follow the link above.
 2. If you want an estimate for your particular installation, current or
future, make a copy of the spreadsheet on your Google Drive. Then update the
columns/rows marked in bright blue on the "Disk Size Algorithm" sheet. Make
sure to check the original spreadsheet periodically, as we are still working
on refinements.
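As a quick illustration of the node-count effect described for the "Disk Size
Algorithm" sheet: with the sketch above, 10,000 schedules at a 60-second
interval work out to roughly 1.68 GB of raw data per node on 3 nodes, and
half that on 6 nodes (again using our placeholder 50 bytes per sample, not
the spreadsheet's actual constants).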

Wiki with additional information:
https://docs.jboss.org/author/display/RHQ49/Storage+Sizing+Analysis


Please feel free to reach out to us if you have any questions or comments; we
will be glad to answer and help. Any feedback is greatly appreciated.


Thank you,
Stefan Negrea

Software Engineer

_______________________________________________
rhq-devel mailing list
rhq-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/rhq-devel

