‘drain now’ mechanism

My OPW project has several parts (I talked about them in my previous post). I started my work with the first part, which is called the “drain now” mechanism. This mechanism aims to drain the slaves when a SIGUSR1 signal is sent to them.

‘Draining’ in this context means killing an entity (a process) and everything running underneath it (all the processes that were started by that entity). So ‘draining the slave’ means killing the slave process and all the jobs/tasks the slave is running.

For the first part of the mechanism, I implemented a signal handler that is triggered when the slave receives a SIGUSR1 signal. The handler kills everything running underneath the slave and then shuts the slave down.
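To make the idea a bit more concrete, here is a minimal Python sketch of a SIGUSR1-triggered drain. It is not the actual Mesos slave code (Mesos itself is written in C++); the `Slave` class and its `launch_task`, `drain` and `run` methods are hypothetical. It assumes each task is launched in its own session, so the task and every process it forks share one process group, and "killing everything underneath" becomes one `os.killpg` per task. The handler only records the request and the main loop does the real work, following the usual advice to keep signal handlers minimal.

```python
import os
import signal
import subprocess
import sys
import time


class Slave:
    """Toy stand-in for a slave that runs tasks as child processes (hypothetical)."""

    def __init__(self):
        self.tasks = []               # Popen objects for the running tasks
        self.drain_requested = False
        # Install the SIGUSR1 handler; it only records that a drain was asked for.
        signal.signal(signal.SIGUSR1, self._on_sigusr1)

    def _on_sigusr1(self, signum, frame):
        self.drain_requested = True

    def launch_task(self, argv):
        # start_new_session=True makes the task the leader of a new process
        # group, which the processes it forks inherit by default.
        proc = subprocess.Popen(argv, start_new_session=True)
        self.tasks.append(proc)
        return proc

    def drain(self):
        """Kill every task together with everything it started."""
        for proc in self.tasks:
            try:
                # The process group id equals the pid of the group leader.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the whole group is already gone
        for proc in self.tasks:
            proc.wait()  # reap the direct children so no zombies are left

    def run(self, poll_interval=0.05):
        """Main loop: idle until a drain is requested, then drain and return."""
        while not self.drain_requested:
            time.sleep(poll_interval)
        self.drain()
        # A real slave would now tear down its own state and exit.
```

From a shell, `kill -USR1 <slave-pid>` would trigger the drain. One caveat of the process-group approach: a descendant that calls `setsid()` or `setpgid()` itself leaves the group and escapes the kill, which is one reason production systems often track tasks with other mechanisms such as Linux cgroups.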

For the second part of the mechanism, I extended the first part so that the slave sends an unregister request message to the master before shutting down. This is needed because normally the master waits up to the health-checking delay (~75 seconds) before it considers the tasks that were running on the slave as lost. With this change, the master marks the tasks as lost (and does everything necessary when a task becomes lost) as soon as it receives the unregister request message. In addition, the master removes the slave from its lists.
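The difference the unregister message makes on the master side can be modeled with a small toy class. Everything here (the `ToyMaster` class, its method names, and a single fixed 75-second timeout) is a hypothetical simplification, not the real Mesos master API; only the state names `TASK_RUNNING` and `TASK_LOST` follow Mesos naming. The other work a real master does for lost tasks, such as sending status updates to frameworks, is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model of the master's bookkeeping (hypothetical, not the Mesos API)."""
    slaves: dict = field(default_factory=dict)      # slave id -> set of task ids
    task_state: dict = field(default_factory=dict)  # task id -> state name
    last_ping: dict = field(default_factory=dict)   # slave id -> time of last ping

    def register_slave(self, slave_id, now):
        self.slaves[slave_id] = set()
        self.last_ping[slave_id] = now

    def launch_task(self, slave_id, task_id):
        self.slaves[slave_id].add(task_id)
        self.task_state[task_id] = "TASK_RUNNING"

    def on_ping(self, slave_id, now):
        # A successful health check refreshes the slave's timestamp.
        if slave_id in self.last_ping:
            self.last_ping[slave_id] = now

    def _remove_slave(self, slave_id):
        # Mark every task on the slave as lost, then forget the slave.
        for task_id in self.slaves.pop(slave_id):
            self.task_state[task_id] = "TASK_LOST"
        del self.last_ping[slave_id]

    def on_unregister(self, slave_id):
        """Explicit unregister request from a draining slave: act immediately."""
        if slave_id in self.slaves:
            self._remove_slave(slave_id)

    def on_health_check(self, now, timeout=75.0):
        """Without an unregister, a slave is removed only after `timeout` seconds of silence."""
        expired = [s for s, t in self.last_ping.items() if now - t >= timeout]
        for slave_id in expired:
            self._remove_slave(slave_id)
```

In this model `on_unregister` removes the slave at once, while a slave that simply disappears keeps its tasks in `TASK_RUNNING` until `on_health_check` runs at least `timeout` seconds after its last ping.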

At first glance all this seems pretty easy to do, but things tend to get a bit complicated when you’re working on a big project and lots of companies rely on it for their infrastructure (including Twitter!!). Every piece of code that you add has to be as close to perfect as possible so that the application stays efficient and free of bugs. So my patch went through a couple of review iterations before it was ready to be committed. My mentor (Ben Mahler) helped me a lot by reviewing all my code and giving me tips.

So, lesson learned: ‘keep it simple’. When you have something to do, even if it looks easy, think about it twice; maybe there’s an even easier way to do it.

Thanks for reading, and have a great day 🙂!

