{"id":3604,"date":"2017-05-24T11:23:36","date_gmt":"2017-05-24T07:23:36","guid":{"rendered":"https:\/\/nayarweb.com\/blog\/?p=3604"},"modified":"2017-05-24T11:23:36","modified_gmt":"2017-05-24T07:23:36","slug":"high-load-on-one-of-elasticsearch-node-on-ceph","status":"publish","type":"post","link":"https:\/\/nayarweb.com\/blog\/2017\/high-load-on-one-of-elasticsearch-node-on-ceph\/","title":{"rendered":"High Load on One of the Elasticsearch Nodes on Ceph"},"content":{"rendered":"<p>I had one Elasticsearch node with a consistently higher load than its peers.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3605\" src=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.TJ4438.png\" alt=\"\" width=\"676\" height=\"739\" srcset=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.TJ4438.png 676w, https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.TJ4438-274x300.png 274w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/p>\n<p>Previously, they were all living in perfect harmony.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3606\" src=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.kn4438.png\" alt=\"\" width=\"677\" height=\"345\" srcset=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.kn4438.png 677w, https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.kn4438-300x153.png 300w\" sizes=\"auto, (max-width: 677px) 100vw, 677px\" \/><\/p>\n<p>It all started when I rebooted the hypervisor the elastic node was on.<\/p>\n<p>I looked through the node&#8217;s KVM domain XML (virsh) to see if it had differences with the others. I noticed that only this node wasn&#8217;t using the `virtio` driver for network and disk, so I changed the disk driver from `ide` and the network driver from `e1000` to `virtio`. 
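<\/p>\n<p>For reference, this is roughly what the check looked like; the domain name `elastic-node` is a placeholder and the grep output is only illustrative of the pre-change state:<\/p>\n<pre>root@hypervisor0:~# virsh dumpxml elastic-node | grep -E '&lt;target dev|&lt;model type'\r\n      &lt;target dev='hda' bus='ide'\/&gt;\r\n      &lt;model type='e1000'\/&gt;\r\n<\/pre>\n<p>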
I rebooted the node, but it still couldn&#8217;t match the performance of its counterparts.<\/p>\n<p>This problem had to be solved: an Elasticsearch cluster&#8217;s performance is directly affected by its slowest node, and if a node is slow, it is better off out of the cluster. The 75th percentile request time was above 1.5s, when it usually sits around 400ms even at peak hours, and the 99.9th percentile exceeded 50 seconds. That was really dangerous for a cluster that receives 1 million documents per minute.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3607\" src=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.TJ4498.png\" alt=\"\" width=\"667\" height=\"552\" srcset=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.TJ4498.png 667w, https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.TJ4498-300x248.png 300w\" sizes=\"auto, (max-width: 667px) 100vw, 667px\" \/><\/p>\n<p>`iotop -a` showed the same processes running, but with high accumulated IO on `[jbd2\/vdb-8]`, the ext4 journaling thread for the `vdb` disk. It confirmed the problem. 
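<\/p>\n<p>For anyone wanting to reproduce that check (the hostname is a placeholder): `-a` makes `iotop` show IO accumulated since it started, `-o` lists only processes actually doing IO, and `-b -n 2` takes two non-interactive samples:<\/p>\n<pre>root@elastic-node:~# iotop -a -o -b -n 2 | grep jbd2\r\n<\/pre>\n<p>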
But it gave no solution as of yet.<\/p>\n<p>I noticed on the network graph that the node was never able to send more than 600MB per 5 minutes, whereas previously it could.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3608\" src=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.dT4498-1024x199.png\" alt=\"\" width=\"640\" height=\"124\" srcset=\"https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.dT4498-1024x199.png 1024w, https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.dT4498-300x58.png 300w, https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.dT4498-768x149.png 768w, https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.dT4498-1272x247.png 1272w, https:\/\/nayarweb.com\/blog\/wp-content\/uploads\/2017\/05\/Spectacle.dT4498.png 1370w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<p>There had to be some kind of restriction on the network. Most likely, when the hypervisor rebooted, the link speed negotiation went wrong. Comparing the two hypervisors confirmed the hypothesis:<\/p>\n<pre>root@hypervisor0:~# mii-tool eth0\r\neth0: negotiated 100baseTx-FD, link ok\r\n\r\nroot@hypervisor1:~# mii-tool eth0\r\neth0: negotiated 1000baseT-FD flow-control, link ok\r\n<\/pre>\n<p>The speed difference is major: the affected hypervisor&#8217;s NIC had negotiated 100Mbit\/s instead of 1Gbit\/s. The VM reported high disk IO because Ceph relies on the network to read\/write data.<\/p>\n<div>\n<h3>Monitor your cluster farm with <a href=\"https:\/\/nayarweb.com\/collectiva\"><span style=\"color: #ff9900;\">Collectiva (Beta)<\/span><\/a>, a product of nayarweb.com<\/h3>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>I had one Elasticsearch node with a consistently higher load than its peers. Previously, they were all living in perfect harmony. It all started when I rebooted the hypervisor the elastic node was on. I looked through the node&#8217;s KVM domain XML (virsh) to see if it had differences with the others. 
I noticed only this &hellip; <a href=\"https:\/\/nayarweb.com\/blog\/2017\/high-load-on-one-of-elasticsearch-node-on-ceph\/\" class=\"continue-reading\">Continue reading <span class=\"screen-reader-text\">High Load on One of the Elasticsearch Nodes on Ceph<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[210],"tags":[230,225,209],"class_list":["post-3604","post","type-post","status-publish","format-standard","hentry","category-technology","tag-ceph","tag-elasticsearch","tag-virtualisation"],"_links":{"self":[{"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/posts\/3604","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/comments?post=3604"}],"version-history":[{"count":2,"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/posts\/3604\/revisions"}],"predecessor-version":[{"id":3610,"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/posts\/3604\/revisions\/3610"}],"wp:attachment":[{"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/media?parent=3604"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/categories?post=3604"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nayarweb.com\/blog\/wp-json\/wp\/v2\/tags?post=3604"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}