What's the best way to know when a new machine needs to be added to a Spark/Hadoop cluster? How do I monitor the data load in this case?

Posted on July 16, 2018

I’m automating the deployment of a new Spark/YARN/HDFS cluster, and I got it done in Ansible. Now I’m looking to automate cluster monitoring as well: I need to automatically add a new machine to the cluster when it’s needed (when CPU usage exceeds what’s available in the cluster, or when more disk space is needed), and remove a machine from the cluster when it’s no longer needed.

Scenario Example

1 ) A client requests a cluster with an initial size of 10 machines, each with 8 GB of RAM and 40 GB of disk.

2 ) Then we detect that the cluster is receiving more requests than expected, so we need to add a new machine automatically. How do we detect this situation?

Thanks




Hi there,

Here are a few strategies you can employ:

  1. Resource Monitoring Tools: Tools like Ganglia, Prometheus, or Datadog can be used to monitor the CPU, memory, disk usage, and network traffic of your cluster. When these metrics approach a certain threshold, it could indicate the need to add more nodes.

  2. Spark and Hadoop Metrics: Both Spark and Hadoop expose a number of metrics that can be useful for monitoring the performance of your cluster. These include metrics like the number of active tasks, the data read/write rate, and the task execution time. A sudden increase in these metrics could indicate the need to add more nodes.

  3. YARN Resource Manager UI: YARN’s Resource Manager UI provides a view of the cluster resources and application details. If you see that resource allocation is consistently high, it might be time to add more nodes.

  4. HDFS Disk Usage: HDFS also provides metrics on disk usage. If the disk usage is consistently high, it might be time to add more nodes.
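As a concrete starting point for points 2–4, here is a minimal sketch that reads the cluster-wide metrics YARN exposes through the ResourceManager REST API (`/ws/v1/cluster/metrics`, default port 8088) and the `FSNamesystem` bean from the NameNode JMX endpoint, and flags when a scale-up looks warranted. The threshold values are arbitrary examples, not recommendations — tune them to your workload:

```python
import json
import urllib.request

def should_scale_up(metrics, mem_threshold=0.85, pending_threshold=5):
    """Decide from YARN cluster metrics whether to add a node.

    `metrics` is the "clusterMetrics" object returned by the
    ResourceManager REST API (/ws/v1/cluster/metrics).
    """
    mem_used = metrics["allocatedMB"] / max(metrics["totalMB"], 1)
    # Scale up when memory is nearly exhausted or apps are queuing.
    return mem_used >= mem_threshold or metrics["appsPending"] >= pending_threshold

def hdfs_capacity_low(fsns, threshold=0.80):
    """Decide from the NameNode's FSNamesystem JMX bean
    (/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem)
    whether HDFS disk usage is getting too high.
    """
    total = max(fsns["CapacityTotal"], 1)
    used = (fsns["CapacityTotal"] - fsns["CapacityRemaining"]) / total
    return used >= threshold

def fetch_yarn_metrics(rm_host="localhost", rm_port=8088):
    """Fetch cluster-wide metrics from the ResourceManager REST API."""
    url = f"http://{rm_host}:{rm_port}/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["clusterMetrics"]

if __name__ == "__main__":
    metrics = fetch_yarn_metrics()
    if should_scale_up(metrics):
        print("Cluster is under memory/queue pressure -- consider adding a node")
```

You can run a script like this on a schedule (cron, or a loop in a systemd service) and trigger your provisioning automation when either check fires.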

For more information on how to get started with Terraform on DigitalOcean, I would recommend this tutorial:

https://www.digitalocean.com/community/tutorials/how-to-use-terraform-with-digitalocean

Then you can also use Ansible to do the configuration management:

https://www.digitalocean.com/community/tutorials/how-to-use-ansible-to-automate-initial-server-setup-on-ubuntu-18-04
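To close the loop, here is a sketch (not a drop-in solution) of a small wrapper that shells out to Terraform to add a machine and then to Ansible to configure it once your monitoring check fires. The `worker_count` variable, the `inventory` path, and the `add-worker.yml` playbook are hypothetical names — substitute whatever your own Terraform/Ansible setup uses:

```python
import subprocess

def build_scale_commands(new_count):
    """Build the command lines to grow the cluster to `new_count` workers.

    The Terraform variable `worker_count` and the playbook
    `add-worker.yml` are hypothetical placeholders.
    """
    terraform = ["terraform", "apply", "-auto-approve",
                 f"-var=worker_count={new_count}"]
    ansible = ["ansible-playbook", "-i", "inventory", "add-worker.yml"]
    return [terraform, ansible]

def scale_out(new_count):
    """Run the provisioning and configuration steps in order,
    failing fast if either command errors out."""
    for cmd in build_scale_commands(new_count):
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # e.g. grow the example cluster from 10 to 11 machines
    scale_out(11)
```

The same pattern works in reverse for scaling down — lower the count, then run a playbook that decommissions the node from YARN and HDFS before destroying it.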

Best,

Bobby
