To start, here is a little context about the ELK setup I was working on and how it was being used, so that the issue I ran into is easier to follow.
I currently use the ELK stack for logging: my code publishes all log messages to RabbitMQ, and from there multiple Logstash servers consume them and write them to Elasticsearch. Those Logstash servers are set up to push to different indexes in Elasticsearch, mainly to keep the different kinds of log messages apart.
So imagine there is an error log queue and an audit log queue in RabbitMQ. A Logstash server consumes from both queues and, depending on the queue a message came from, pushes it to either the error index or the audit index.
So each Logstash server had two config files, one for each queue it consumed from. Every config file contained the RabbitMQ input plugin, a custom transformation for some fields, and the Elasticsearch output plugin.
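To make that layout concrete, here is a rough sketch of what one of those files could look like, written out from the shell. The host names, the queue name `error_log`, the index pattern and the `mutate` step are all placeholders I made up for illustration; the real transformation was specific to my log fields.

```shell
# Write a hypothetical pipeline file for the error queue into a scratch directory
conf_dir=$(mktemp -d)
cat > "$conf_dir/error-queue.conf" <<'EOF'
input {
  rabbitmq {
    host    => "localhost"     # assumed broker address
    queue   => "error_log"     # assumed queue name
    durable => true
  }
}
filter {
  mutate {
    # stand-in for the custom field transformation
    add_field => { "log_type" => "error" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]        # assumed Elasticsearch address
    index => "error-%{+YYYY.MM.dd}"    # assumed index naming scheme
  }
}
EOF
echo "wrote $conf_dir/error-queue.conf"
```

The audit queue would get a second file of the same shape, with its own queue name and index.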
Imagine you are running RabbitMQ in production and everything is fine, until you introduce a new product or feature that increases the number of queues and connections you have. Suddenly your system drops in performance at peak time and you have no idea what's going on. You log into the RabbitMQ management website and, under “Overview”, the File Descriptors figure shows very few descriptors still available. Quite possibly you are still running with the default limit. If you are lucky you started with 4096; if not, you might have started with 1024, which is incredibly low unless you only run a few queues and connections. In my case I was running over 10k queues and over 500 connections.
So is that number affecting the performance of your RabbitMQ? It certainly is. It tells you how many file handles and network sockets RabbitMQ has available, and you can imagine what happens when it runs out of them, like mine did on a Saturday afternoon. The good news is that you can increase it!
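To put a number on how close you are to the ceiling, a tiny helper like the one below turns the used and total descriptor counts from the “Overview” page into remaining headroom. The function name `fd_headroom` and the sample figures are mine, purely for illustration.

```shell
# Print free descriptors and percentage used, given "used" and "total" counts
fd_headroom() {
  used=$1
  total=$2
  echo "free=$((total - used)) used_pct=$((used * 100 / total))"
}

# Example: 900 descriptors in use against a limit of 1024
fd_headroom 900 1024
```

With a limit of 1024, a few hundred connections alone eat most of the budget before any queue has touched a file.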
First, let's check the limits.
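One way to do that, assuming you are on Linux and logged in as the user that runs RabbitMQ, is `ulimit`. This is a sketch of the check rather than the exact command from my terminal; the variable names are only there to make the output readable.

```shell
# Show every limit for the current shell; look for the "open files" row
ulimit -a

# Soft and hard limits on open files only
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)
echo "open files: soft=$soft_limit hard=$hard_limit"
```

Keep in mind that these are the limits of your shell. A broker started by an init system may run with different ones, which you can read from the "Max open files" row in `/proc/<pid>/limits` for the broker's process ID.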
Under “open files” you might see 4096. Let's change that.
We start by increasing the system-wide maximum number of open files in sysctl.
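On Linux the relevant key is `fs.file-max`. The snippet below reads the current value and shows, commented out because they need root, the commands that would raise it. The target of 65536 is only an example value I picked for this sketch; choose one that fits your own queue and connection count.

```shell
# Read the current system-wide ceiling on open file handles
current_max=$(cat /proc/sys/fs/file-max 2>/dev/null || echo unknown)
echo "fs.file-max is currently $current_max"

# Example target value (an assumption, not a recommendation)
NEW_MAX=65536

# Apply immediately (needs root; lost on reboot):
# sudo sysctl -w fs.file-max="$NEW_MAX"

# Persist across reboots (needs root), then reload the settings:
# echo "fs.file-max = $NEW_MAX" | sudo tee -a /etc/sysctl.conf
# sudo sysctl -p
```

This only lifts the ceiling for the whole machine. The per-process "open files" limit seen earlier is a separate setting.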