The project I will be working on is called “Slave unregistration in Mesos”. Before going into more detail about it, I will briefly explain Mesos’ architecture, so that my project will make more sense.
It is said that a picture is worth a thousand words, so here’s a picture which describes Mesos’ architecture (source here).
As you can see in the picture, Mesos has a distributed architecture, with several entities:
- Mesos master: there is only one leading master per cluster; for high availability there can be more masters, but only one is active at a time.
- Mesos slaves: these are hosts in the datacenter, with one slave daemon running on each host; all the tasks run on these hosts.
- ZooKeeper: it is responsible for leader election among the masters and is also used by Mesos slaves to discover the leading master.
- Frameworks: these are applications doing analytics work (Hadoop, Spark, Aurora, MPI) which run their tasks on top of Mesos.
Multiple frameworks can run on the same Mesos cluster because Mesos provides good resource isolation through Linux containers.
A framework consists of two parts:
– Scheduler: responsible for scheduling jobs/tasks. It receives resource offers (memory, CPU, disk) from the master describing what is available in the cluster, and uses them to launch tasks.
– Executor: responsible for executing the tasks the scheduler wants to launch on the slave. It is started by the slave when the scheduler launches a task there.
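To make the offer cycle above concrete, here is a minimal sketch of how a master advertises slave resources and a framework scheduler uses an offer to launch a task. This is a toy Python model with invented names (`Master`, `Scheduler`, `resource_offers`), not the actual Mesos API, which is C++ and protobuf-based.

```python
# Toy model of the Mesos resource-offer cycle (names are invented for
# illustration; the real implementation uses C++ and protobuf messages).

class Master:
    def __init__(self):
        self.slaves = {}  # slave_id -> unused resources on that slave

    def register_slave(self, slave_id, resources):
        self.slaves[slave_id] = resources

    def make_offers(self):
        # The master advertises each slave's unused resources to frameworks.
        return [{"slave_id": sid, "resources": res}
                for sid, res in self.slaves.items()]

class Scheduler:
    def resource_offers(self, offers):
        # A framework scheduler inspects the offers and decides what to launch.
        launched = []
        for offer in offers:
            res = offer["resources"]
            if res["cpus"] >= 1 and res["mem"] >= 128:
                launched.append(("my-task", offer["slave_id"]))
        return launched

master = Master()
master.register_slave("slave-1", {"cpus": 4, "mem": 2048})
tasks = Scheduler().resource_offers(master.make_offers())
print(tasks)  # the offer from slave-1 is big enough, so a task lands there
```

The key design point is that the master never decides what to run; it only reports what is free, and each framework's scheduler makes its own placement decisions.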
Initial motivation of the project
Sometimes a Mesos cluster has hundreds or thousands of slave machines. From time to time, some of them need maintenance (e.g. a system upgrade). Until now, when an operator wanted to do maintenance work on a slave, they manually connected to that slave and killed the slave process. This approach, however, leaves all the tasks running on the machine. So a mechanism is needed that will drain the slave (kill all the tasks and the slave daemon).
This is where the first part of my project comes in. It consists of two pieces:
- the ability to kill all the tasks running on the slave and then shut down the slave daemon, on demand.
- the problem with the first item is that the master would wait up to the health-check delay (~75 seconds) before notifying the frameworks that the tasks were lost. This is why, before shutting down, the slave will send an unregistration message to the master.
Together, the two items above make up the ‘drain now’ mechanism: the slave is drained immediately when the signal is triggered.
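The ‘drain now’ sequence can be sketched as follows. Again this is a toy Python model with invented names (`drain_now`, `unregister`), not the real Mesos code, which implements this as C++ message handlers between the slave and master daemons.

```python
# Toy model of the 'drain now' sequence: kill tasks, unregister, shut down.

class Master:
    def __init__(self):
        self.lost = []  # (slave_id, task) pairs reported lost to frameworks

    def unregister(self, slave_id, killed_tasks):
        # Because the slave unregisters explicitly, frameworks learn about
        # the lost tasks immediately instead of waiting for the ~75 s
        # health-check timeout to expire.
        self.lost.extend((slave_id, t) for t in killed_tasks)

class Slave:
    def __init__(self, slave_id, master):
        self.id = slave_id
        self.master = master
        self.tasks = []
        self.running = True

    def drain_now(self):
        # 1. Kill every task running on this slave.
        killed = list(self.tasks)
        self.tasks.clear()
        # 2. Send an unregistration message to the master.
        self.master.unregister(self.id, killed)
        # 3. Shut the slave daemon down.
        self.running = False

master = Master()
slave = Slave("slave-1", master)
slave.tasks = ["task-a", "task-b"]
slave.drain_now()
print(master.lost)    # both tasks reported lost right away
print(slave.running)  # False: the daemon has shut down
```

The ordering matters: the unregistration message goes out before the daemon exits, which is exactly what removes the health-check delay from the picture.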
The problem with the ‘drain now’ mechanism is that it kills all the jobs running on the slave, which is sometimes not desired: some of them are long-running jobs that take a lot of time and resources to be rescheduled and to get back to the state they were in before the slave was shut down. Therefore, the following will also be implemented:
- the ability to deactivate slaves. This means that the master will no longer send resource offers for that slave, so frameworks can no longer launch new tasks on it. Tasks that were launched before the slave was deactivated will continue running.
- the ability to send ‘inverse offers’ to frameworks, which means that the master asks the frameworks to return the resources they are using, within a given amount of time; if the resources are not returned in time, they will be forcibly revoked.
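Deactivation and inverse offers can be sketched together like this. The class and method names (`deactivate`, `inverse_offer`, `handle_inverse_offer`) are invented for illustration and do not reflect the actual Mesos interfaces.

```python
# Toy model of slave deactivation and inverse offers (invented names).

class Master:
    def __init__(self):
        self.active = {"slave-1", "slave-2"}  # slaves eligible for offers

    def deactivate(self, slave_id):
        # Stop offering this slave's resources; tasks already running
        # on it keep running.
        self.active.discard(slave_id)

    def make_offers(self):
        # Only active slaves' resources are offered to frameworks.
        return sorted(self.active)

    def inverse_offer(self, framework, slave_id, deadline_s):
        # Ask the framework to give back its resources on slave_id within
        # deadline_s seconds; if it does not comply, revoke them forcibly.
        if framework.handle_inverse_offer(slave_id, deadline_s):
            return "returned"
        return "revoked"

class PoliteFramework:
    def handle_inverse_offer(self, slave_id, deadline_s):
        return True  # this framework complies and hands resources back

master = Master()
master.deactivate("slave-2")
print(master.make_offers())  # slave-2 no longer appears in offers
print(master.inverse_offer(PoliteFramework(), "slave-2", 3600))
```

The two mechanisms complement each other: deactivation stops new tasks from arriving, and inverse offers give running frameworks a deadline to vacate gracefully before revocation.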
So this is the description of my project. Please stay tuned for updates.
Thanks for reading and have a great day 🙂 !