
List:       ceph-devel
Subject:    Performance drop on Ubuntu 14.04 LTS for 4K/8K workload
From:       Somnath Roy <Somnath.Roy@sandisk.com>
Date:       2015-01-29 1:19:28
Message-ID: 755F6B91B3BE364F9BCA11EA3F9E0C6F28280B95@SACMBXIP02.sdcorp.global.sandisk.com


Hi,
I have a two-node cluster with 32 OSDs on each node (one per drive). It was working fine until we spotted a severe performance degradation for 4K/8K workloads. One node is consuming ~5 times more CPU than the other while serving the same amount of inbound requests. This is not related to the disks, since it also happens for smaller workloads served entirely out of memory. Running perf top on both servers reveals the following.

Server A (consuming more cpu):
--------------------------------------

 16.06%  [kernel]              [k] read_hpet
  5.85%  [vdso]                [.] 0x0000000000000dd7
  3.62%  [kernel]              [k] _raw_spin_lock
  2.76%  ceph-osd              [.] crush_hash32_3
  1.97%  libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
  1.87%  libc-2.19.so          [.] 0x0000000000161f0b
  1.34%  [kernel]              [k] _raw_spin_lock_irqsave
  1.14%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
  1.06%  ceph-osd              [.] 0x00000000007f3b26
  0.99%  perf                  [.] 0x0000000000056584
  0.96%  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
  0.77%  [kernel]              [k] futex_wake
  0.69%  libstdc++.so.6.0.19   [.] 0x000000000005b644


Server B (the good one):
----------------------------

  3.47%  ceph-osd              [.] crush_hash32_3
  2.73%  [kernel]              [k] _raw_spin_lock
  2.30%  libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
  2.24%  libc-2.19.so          [.] 0x0000000000098e13
  1.33%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
  1.32%  [kernel]              [k] futex_wake
  1.21%  [kernel]              [k] __schedule
  1.20%  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
  1.14%  ceph-osd              [.] 0x00000000007f3e5f
  1.13%  [kernel]              [k] _raw_spin_lock_irqsave
  0.97%  libstdc++.so.6.0.19   [.] 0x000000000005b651
  0.87%  [kernel]              [k] futex_requeue
  0.87%  [kernel]              [k] __copy_user_nocache
  0.80%  perf                  [.] 0x000000000005659e
  0.72%  [kernel]              [k] __d_lookup_rcu
  0.69%  libpthread-2.19.so    [.] pthread_mutex_trylock
  0.68%  [kernel]              [k] futex_wake_op
  0.67%  libstdc++.so.6.0.19   [.] std::string::_Rep::_M_dispose(std::allocator<char> const&)
  0.66%  libc-2.19.so          [.] vfprintf
  0.62%  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
  0.61%  ceph-osd              [.] Mutex::Lock(bool)
  0.57%  [kernel]              [k] tcp_sendmsg

So it seems that gettimeofday(), going through the vDSO into read_hpet(), is the primary reason for the higher CPU usage. Both servers are identical, and I couldn't figure out why read_hpet() consumes so much more CPU on Server A. Restarting the Ceph services didn't help, and I couldn't find any abnormal messages in syslog either. My last resort was to reboot, and as always that helped ☺… Now both nodes are behaving similarly.
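For anyone who wants to check a suspect box, here is a minimal C sketch; it assumes the standard clocksource sysfs path on a 3.13-era kernel (not taken from this setup), prints the active clocksource, and times gettimeofday() calls through the vDSO:

/* clkcheck.c -- a rough sketch, not a polished tool.
 * Build: gcc -O2 -o clkcheck clkcheck.c
 * Prints the kernel's active clocksource (assumed sysfs path) and the
 * average cost of a gettimeofday() call, which goes through the vDSO.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

#define CALLS 1000000

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    char buf[64];
    struct timeval tv;
    uint64_t start, end;
    int i;

    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
    if (f) {
        if (fgets(buf, sizeof(buf), f))
            printf("current clocksource: %s", buf);  /* e.g. "tsc" or "hpet" */
        fclose(f);
    }

    start = now_ns();
    for (i = 0; i < CALLS; i++)
        gettimeofday(&tv, NULL);                     /* the call being profiled */
    end = now_ns();

    printf("gettimeofday: %.1f ns/call over %d calls\n",
           (double)(end - start) / CALLS, CALLS);
    return 0;
}

With the tsc clocksource this typically stays in the tens of nanoseconds per call; with hpet every call has to read the HPET's MMIO registers, which is far slower and would match the read_hpet samples in the profile above.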

Has anybody had a similar experience? Am I hitting an Ubuntu (14.04 LTS, 3.13.0-32-generic) kernel bug here?
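For what it's worth, if the kernel dropped from tsc to hpet at runtime (dmesg usually logs a "Clocksource tsc unstable" line when that happens), a reboot shouldn't be the only way back: the clocksource can be switched through sysfs. A hedged sketch along those lines, assuming root and the standard sysfs layout, with a hypothetical file name; note the kernel removes tsc from available_clocksource once it has marked the TSC unstable, in which case this will refuse to proceed:

/* set_tsc.c -- a sketch (hypothetical helper), run as root.
 * Switches the kernel clocksource back to tsc via sysfs, assuming the
 * standard layout; if tsc is no longer offered in available_clocksource
 * (e.g. the kernel marked the TSC unstable), it bails out.
 */
#include <stdio.h>
#include <string.h>

#define CS_DIR "/sys/devices/system/clocksource/clocksource0/"

int main(void)
{
    char avail[256];
    FILE *f = fopen(CS_DIR "available_clocksource", "r");

    if (!f || !fgets(avail, sizeof(avail), f)) {
        perror("read " CS_DIR "available_clocksource");
        return 1;
    }
    fclose(f);

    if (!strstr(avail, "tsc")) {
        fprintf(stderr, "tsc not offered by the kernel: %s", avail);
        return 1;
    }

    f = fopen(CS_DIR "current_clocksource", "w");
    if (!f || fputs("tsc\n", f) == EOF || fclose(f) == EOF) {
        perror("write " CS_DIR "current_clocksource");
        return 1;
    }

    puts("clocksource switched to tsc");
    return 0;
}

Whether switching back is actually safe depends on why the kernel distrusted the TSC in the first place, so treat this as a diagnostic aid rather than a fix.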

Any help or suggestions would be much appreciated.

Thanks & Regards
Somnath







