To start with, I will give you a little context about the ELK setup I was working on and how it was being used, so the issue I ran into is easier to follow.
I’m currently using the ELK stack for logging purposes, so I dump all log messages from my code onto RabbitMQ, and from there they get consumed by multiple Logstash servers that push the messages into Elasticsearch. Those Logstash servers are set up to push to different indexes in Elasticsearch, mainly to differentiate the log messages.
So imagine there is an error log queue and an audit log queue in RabbitMQ; a Logstash server consumes from both queues and, depending on the message/queue, pushes to either the error index or the audit index.
So each Logstash server had two config files, one for each queue to consume from. Every config file had the RabbitMQ input plugin, a custom transformation for some fields and the Elasticsearch output plugin.
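As a rough sketch (hostnames, queue names, the exact filter and index names are all my guesses, not the real config), each of the two files looked something like this:

```conf
# error.conf (hypothetical sketch; audit.conf was the same
# except for the queue, routing key and index name)
input {
  rabbitmq {
    host     => "rabbitmq.example.com"
    exchange => "logs"
    key      => "log.error"        # routing key for the error topic
    queue    => "error-log-queue"
  }
}
filter {
  mutate {
    # set the name of the Logstash server processing the event
    add_field => { "MachineName" => "Logstash1" }
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch.example.com:9200"]
    index => "error-logs-%{+YYYY.MM.dd}"
  }
}
```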
With this setup Logstash was consuming fine and all messages got pushed to either the error or the audit index. But while figuring out a different issue, something seemed off in the logging: it looked like every message was being logged twice. So I pulled some messages directly out of RabbitMQ with an extra queue; since I’m using topics, I could simply bind another queue with the same routing key and receive all the messages to check what they were. It turned out there were no duplicates in the queue, so the problem had to be somewhere in the ELK stack.
One thing that was weird in the log messages was that the computed field MachineName (basically the name of the Logstash server processing the message) was populated twice, so the value was “Logstash1Logstash1” where it should only be “Logstash1”. The only place where I set that field was in the Logstash config, which meant the config was somehow running twice.
After checking the configs over and over again, changing a little bit here and there and even turning off the second Logstash server, nothing seemed to help, so I started googling. I searched for duplicate messages and all the obvious things, but couldn’t really find anything helpful.
But then I found this:
“Unless you use the multi-pipeline feature, Logstash configuration files aren’t independent. Logstash merges them together so that all events from all inputs will reach all filters and outputs unless you use conditionals.”
https://discuss.elastic.co/t/multiple-logstash-config-files/130471/2
Well, remember I said I have multiple configurations because of the different indexes and inputs? I can tell you now: that’s the issue.
Both configurations had the Elasticsearch output plugin and the transformation for the MachineName field. What Logstash does is merge the different config files together, so in the end it was pushing every message twice to Elasticsearch and transforming the MachineName field twice.
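Conceptually, Logstash behaved as if the two files had been concatenated into a single pipeline, so every event from either input went through both filters and both outputs (a simplified, conceptual sketch with hypothetical names; the `...` stands for the remaining plugin options):

```conf
# effective merged pipeline (conceptual, not a literal config)
input {
  rabbitmq { queue => "error-log-queue" ... }   # from error.conf
  rabbitmq { queue => "audit-log-queue" ... }   # from audit.conf
}
filter {
  mutate { add_field => { "MachineName" => "Logstash1" } }  # from error.conf
  mutate { add_field => { "MachineName" => "Logstash1" } }  # from audit.conf -> field populated twice
}
output {
  elasticsearch { index => "error-logs" ... }   # every event lands here...
  elasticsearch { index => "audit-logs" ... }   # ...and here -> duplicates
}
```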
After deleting the output plugin and the transformation in one of the two configs, everything suddenly worked fine and there were no more duplicate messages.
So if you are running multiple configurations because you have multiple inputs, even if they all push to the same index, remember not to add the same output plugins in all your configs, or you will end up with nice duplicate messages.
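If you do need a different output per input, the approach the quoted forum answer hints at is conditionals: tag events at the input and wrap each output in a condition. A sketch under the same hypothetical names as above:

```conf
input {
  rabbitmq {
    host  => "rabbitmq.example.com"
    queue => "error-log-queue"
    tags  => ["error"]            # mark events from this input
  }
  rabbitmq {
    host  => "rabbitmq.example.com"
    queue => "audit-log-queue"
    tags  => ["audit"]
  }
}
output {
  # route each event to exactly one index based on its tag
  if "error" in [tags] {
    elasticsearch {
      hosts => ["http://elasticsearch.example.com:9200"]
      index => "error-logs-%{+YYYY.MM.dd}"
    }
  } else if "audit" in [tags] {
    elasticsearch {
      hosts => ["http://elasticsearch.example.com:9200"]
      index => "audit-logs-%{+YYYY.MM.dd}"
    }
  }
}
```

This still merges into one pipeline, but the conditionals make sure no event is pushed twice.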
I also read about the multiple pipelines feature, which would have prevented this issue by keeping each configuration in its own isolated pipeline.
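For reference, a `pipelines.yml` along these lines keeps the two configs fully separate, so events from one never reach the filters or outputs of the other (file paths are hypothetical):

```yaml
# /etc/logstash/pipelines.yml (hypothetical paths)
- pipeline.id: error-logs
  path.config: "/etc/logstash/conf.d/error.conf"
- pipeline.id: audit-logs
  path.config: "/etc/logstash/conf.d/audit.conf"
```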