What's the best way to know when it's necessary to add a new machine to a Spark/Hadoop cluster? How do I monitor the data load in this case?

July 16, 2018
Apache Debian

I'm automating the deployment of a new Spark/YARN/HDFS cluster, and I have that working in Ansible. Now I'm looking to automate monitoring of the cluster: I need to add a new machine automatically when it's needed (when CPU usage exceeds what the cluster has available, or when more disk space is needed), and to remove a machine from the cluster when it's no longer needed.
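For the CPU/memory side, my current idea is to poll the YARN ResourceManager REST API (`/ws/v1/cluster/metrics`) and compare allocated against total resources. Here's a rough sketch of what I mean; the host name, port, and thresholds are placeholders for my setup, not recommendations:

```python
import requests

# ResourceManager host is a placeholder; 8088 is the default RM web UI port.
RM_METRICS_URL = "http://resourcemanager.example.com:8088/ws/v1/cluster/metrics"

# Assumed thresholds: scale up above 85% usage, scale down below 30%.
SCALE_UP_THRESHOLD = 0.85
SCALE_DOWN_THRESHOLD = 0.30

def cluster_pressure():
    """Return (memory usage, vcore usage, pending apps) from the YARN RM."""
    metrics = requests.get(RM_METRICS_URL, timeout=10).json()["clusterMetrics"]
    mem_usage = metrics["allocatedMB"] / metrics["totalMB"]
    cpu_usage = metrics["allocatedVirtualCores"] / metrics["totalVirtualCores"]
    return mem_usage, cpu_usage, metrics["appsPending"]

def scaling_decision():
    """Decide whether the cluster should grow, shrink, or stay as-is."""
    mem, cpu, pending = cluster_pressure()
    if max(mem, cpu) > SCALE_UP_THRESHOLD or pending > 0:
        return "scale-up"    # e.g. trigger the Ansible playbook to add a node
    if max(mem, cpu) < SCALE_DOWN_THRESHOLD:
        return "scale-down"  # e.g. decommission an idle node
    return "steady"

if __name__ == "__main__":
    print(scaling_decision())
```

Is polling the ResourceManager like this (say, from cron) a reasonable approach, or is there a better signal to watch?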

Example scenario

1) A client requests a cluster with an initial size of 10 machines, each with 8 GB of RAM and 40 GB of disk.

2) We then detect that the cluster is receiving more load than expected, so we need to add a new machine automatically. How can this situation be detected? (A disk-side sketch of what I have in mind follows below.)
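For the disk side of this scenario, I was thinking of reading DFS capacity from the NameNode's JMX servlet. A minimal sketch, assuming the Hadoop 2.x default web port 50070 (it's 9870 on Hadoop 3.x) and a hypothetical 80% threshold:

```python
import requests

# NameNode host is a placeholder; 50070 is the Hadoop 2.x default web port.
NN_JMX_URL = ("http://namenode.example.com:50070/jmx"
              "?qry=Hadoop:service=NameNode,name=NameNodeInfo")

DISK_SCALE_UP_THRESHOLD = 80.0  # assumed: grow the cluster above 80% DFS usage

def hdfs_needs_capacity():
    """Check DFS usage percentage via the NameNode JMX endpoint."""
    bean = requests.get(NN_JMX_URL, timeout=10).json()["beans"][0]
    percent_used = bean["PercentUsed"]
    print(f"HDFS used: {percent_used:.1f}%")
    return percent_used > DISK_SCALE_UP_THRESHOLD

if __name__ == "__main__":
    if hdfs_needs_capacity():
        print("scale-up: add a DataNode (e.g. trigger the Ansible playbook)")
```

Would combining this with the YARN metrics check above be enough to drive the add/remove decision, or do people usually put a proper monitoring stack in front of it?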

Thanks
