High Load on One Elasticsearch Node on Ceph

I had one Elasticsearch node with a higher load than its peers.

Previously, they were all living in perfect harmony.

It all started when I rebooted the hypervisor the elastic node was on.

I looked in the KVM virsh files to see if the node differed from the others. I noticed that only this node wasn’t using the `virtio` driver for network and disk. I changed the disk driver from `ide` to `virtio` and the network driver from `e1000` to `virtio`, then rebooted the node, but it still couldn’t match the performance of its counterparts.
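A quick way to compare drivers across guests is to filter the domain XML for the disk bus and NIC model lines. This is only a sketch, assuming the guests are libvirt domains; `show_drivers` is a hypothetical helper name and `es-node-1` is a placeholder domain name, neither of which comes from my setup.

```shell
# Hypothetical helper: read libvirt domain XML on stdin and print the
# lines that carry the disk bus (ide/virtio) and device model (e1000/virtio).
# Other devices with a <model type=...> line, such as video, show up too.
show_drivers() {
    grep -E "<target dev=.* bus=|<model type=" | sed 's/^[[:space:]]*//'
}

# Usage on the hypervisor (the domain name is a placeholder):
#   virsh dumpxml es-node-1 | show_drivers
```

Running it against each guest and comparing the output side by side makes an odd one out easy to spot.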

This problem had to be solved because the performance of an Elasticsearch cluster is directly affected by its slowest node. If a node is slow, it’s better off out of the cluster. The 75th percentile of request time was more than 1.5s; usually it was around 400ms in peak hours. My 99.9th percentile exceeded 50 seconds. It was really dangerous. The cluster receives 1 million documents per minute.

`iotop -a` showed the same processes running, but with high IO on `[jbd2/vdb-8]`. That only confirmed the problem; it gave no solution yet.

I noticed on the network graph that the node could no longer send more than 600MB per 5 minutes, whereas previously it could.

There had to be some kind of restriction on the network. Perhaps the network negotiation had gone wrong when the hypervisor rebooted. Comparing values from 2 hypervisors confirmed the hypothesis:

root@hypervisor0:~# mii-tool eth0
eth0: negotiated 100baseTx-FD, link ok

root@hypervisor1:~# mii-tool eth0
eth0: negotiated 1000baseT-FD flow-control, link ok

The speed difference is major: hypervisor0 negotiated 100 Mbit/s while hypervisor1 negotiated 1000 Mbit/s. The VM reported high disk IO because Ceph relies on the network to read and write data.
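This kind of check is easy to automate after a reboot. The following is a minimal sketch that parses the `mii-tool` output format shown above; `negotiated_speed` and `check_link` are hypothetical helper names, and the 1000 Mbit/s threshold is an assumption about what the link should negotiate.

```shell
# Hypothetical helper: extract the negotiated speed in Mbit/s from one
# line of mii-tool output, e.g. "eth0: negotiated 100baseTx-FD, link ok".
negotiated_speed() {
    printf '%s\n' "$1" | sed -n 's/.*negotiated \([0-9][0-9]*\)base.*/\1/p'
}

# Warn and return 1 when the link came up below 1000 Mbit/s (assumed target).
# Output that cannot be parsed is silently treated as OK in this sketch.
check_link() {
    speed=$(negotiated_speed "$1")
    if [ -n "$speed" ] && [ "$speed" -lt 1000 ]; then
        echo "WARNING: link negotiated at only ${speed} Mbit/s"
        return 1
    fi
    return 0
}

# Usage on a hypervisor:
#   check_link "$(mii-tool eth0)"
```

Hooking something like this into monitoring would have flagged the bad negotiation right after the reboot.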


LXC, why you no have single executable?

Decided to play with LXC again.

Installation on Ubuntu
$ sudo apt-get install lxc

I wanted to know what features the software had once installed. Usually I just type the command in my console and it gives me the help, a description, or something.

root@nayar-laptop:~$ lxc
No command 'lxc' found, did you mean:
Command 'llc' from package 'llvm' (universe)
Command 'lc' from package 'mono-devel' (main)
Command 'axc' from package 'afnix' (universe)
Command 'lpc' from package 'lprng' (universe)
Command 'lpc' from package 'lpr' (universe)
Command 'lpc' from package 'cups-bsd' (main)
lxc: command not found

To find the appropriate commands, I had to look for the executables in /usr/bin.

$ ls /usr/bin/ | grep lxc

There I got all the executables I needed to know about.

lxc-attach
lxc-autostart
lxc-cgroup
lxc-checkconfig
lxc-clone
lxc-config
lxc-console
lxc-create
lxc-destroy
lxc-device
lxc-execute
lxc-freeze
lxc-info
lxc-ls
lxc-monitor
lxc-snapshot
lxc-start
lxc-start-ephemeral
lxc-stop
lxc-unfreeze
lxc-unshare
lxc-usernsexec
lxc-wait

Or I could have just read the man page. Totally forgot about that.
man lxc

It gives all the details about the software. But I am used to having a single executable with the actions passed as parameters. Wouldn’t it be better if `lxc-info` were `lxc info`?
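A tiny shell function can fake that single-executable feel by dispatching to the `lxc-*` tools. This is only a sketch of the idea, not something LXC ships; the function name `lxc` is my own choice here and would shadow any real `lxc` command that happens to be installed.

```shell
# Hypothetical wrapper: "lxc info -n web" runs "lxc-info -n web".
lxc() {
    if [ "$#" -eq 0 ]; then
        echo "usage: lxc <subcommand> [args...]" >&2
        return 1
    fi
    cmd="lxc-$1"   # build the real executable name from the subcommand
    shift          # drop the subcommand, keep the remaining arguments
    "$cmd" "$@"
}

# Usage (container name is a placeholder):
#   lxc info -n web
```

Dropped into `~/.bashrc`, it would make every `lxc-something` tool reachable as `lxc something`.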

Docker has a single executable command to run. Anyways. Gotta play with LXC now.