List: ceph-devel
Subject: Performance drop on Ubuntu 14.04 LTS for 4K/8K workload
From: Somnath Roy <Somnath.Roy () sandisk ! com>
Date: 2015-01-29 1:19:28
Message-ID: 755F6B91B3BE364F9BCA11EA3F9E0C6F28280B95 () SACMBXIP02 ! sdcorp ! global ! sandisk ! com
Hi,
I have a two-node cluster with 32 OSDs on each node (one per drive). It was working fine until we spotted a severe performance degradation for 4K/8K workloads. One node is consuming ~5x more CPU than the other while serving the same amount of inbound requests. This is not related to the disks, since it also happens for smaller workloads served entirely out of memory. Running perf top on both servers reveals the following.
Server A (consuming more cpu):
--------------------------------------
16.06% [kernel] [k] read_hpet
5.85% [vdso] [.] 0x0000000000000dd7
3.62% [kernel] [k] _raw_spin_lock
2.76% ceph-osd [.] crush_hash32_3
1.97% libtcmalloc.so.4.1.2 [.] operator new(unsigned long)
1.87% libc-2.19.so [.] 0x0000000000161f0b
1.34% [kernel] [k] _raw_spin_lock_irqsave
1.14% libtcmalloc.so.4.1.2 [.] operator delete(void*)
1.06% ceph-osd [.] 0x00000000007f3b26
0.99% perf [.] 0x0000000000056584
0.96% libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
0.77% [kernel] [k] futex_wake
0.69% libstdc++.so.6.0.19 [.] 0x000000000005b644
Server B (the good one):
----------------------------
3.47% ceph-osd [.] crush_hash32_3
2.73% [kernel] [k] _raw_spin_lock
2.30% libtcmalloc.so.4.1.2 [.] operator new(unsigned long)
2.24% libc-2.19.so [.] 0x0000000000098e13
1.33% libtcmalloc.so.4.1.2 [.] operator delete(void*)
1.32% [kernel] [k] futex_wake
1.21% [kernel] [k] __schedule
1.20% libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
1.14% ceph-osd [.] 0x00000000007f3e5f
1.13% [kernel] [k] _raw_spin_lock_irqsave
0.97% libstdc++.so.6.0.19 [.] 0x000000000005b651
0.87% [kernel] [k] futex_requeue
0.87% [kernel] [k] __copy_user_nocache
0.80% perf [.] 0x000000000005659e
0.72% [kernel] [k] __d_lookup_rcu
0.69% libpthread-2.19.so [.] pthread_mutex_trylock
0.68% [kernel] [k] futex_wake_op
0.67% libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose(std::allocator<char> const&)
0.66% libc-2.19.so [.] vfprintf
0.62% libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
0.61% ceph-osd [.] Mutex::Lock(bool)
0.57% [kernel] [k] tcp_sendmsg
So it seems that gettimeofday() going through the vDSO and ending up in read_hpet() is the primary reason for the extra CPU usage. Both servers are identical, and I couldn't figure out why read_hpet() is consuming so much more CPU on Server A. Restarting the Ceph services didn't help, and I couldn't find any abnormal messages in syslog either. My last resort was to reboot, and as always that helped ☺… Now both nodes are behaving similarly.
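
In case anyone wants to check this on their own nodes, something like the small program below (a rough, untested sketch assuming Linux/glibc and the standard sysfs clocksource path) prints the active clocksource and times a burst of gettimeofday() calls, so the per-call cost difference between tsc and hpet becomes visible directly:

/* Sketch: print the active clocksource and measure the cost of gettimeofday().
 * With "tsc" selected, each call stays cheap; with "hpet" every read hits the
 * much slower HPET, which is what read_hpet in the profile above corresponds to. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    char source[64] = "unknown";
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
    if (f) {
        if (fgets(source, sizeof(source), f))
            source[strcspn(source, "\n")] = '\0';   /* strip trailing newline */
        fclose(f);
    }
    printf("current clocksource: %s\n", source);

    const long iterations = 1000000;   /* one million gettimeofday() calls */
    struct timespec start, end;
    struct timeval tv;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
    printf("gettimeofday(): %.1f ns per call\n", elapsed_ns / iterations);
    return 0;
}

On a healthy node with tsc selected I'd expect this to report a few tens of nanoseconds per call, and noticeably more once the kernel has fallen back to hpet. If a node has indeed fallen back and tsc is still listed in available_clocksource, writing tsc back into current_clocksource might avoid the reboot, though I haven't verified that on 14.04.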
Has anybody seen something similar? Am I hitting an Ubuntu (14.04 LTS, 3.13.0-32-generic) bug here?
Any help/suggestions would be very welcome.
Thanks & Regards
Somnath