How to start a Spark cluster on Kubernetes on DigitalOcean step-by-step
On the web I found the following video that explains how to set up a Kubernetes cluster (including kubectl) on DigitalOcean: https://www.youtube.com/watch?v=_waZw9jiyhQ.
Starting from there, i.e. assuming that I have set up kubectl on my local machine, I am interested in the following questions:
1) How to prepare a base Spark Docker image for Kubernetes on DigitalOcean?
2) How to tell Kubernetes which image to use for Spark?
3) Does this image have to be deployed somehow to DigitalOcean? If yes, how?
4) What is the best way to transfer a large CSV file (~100 GB) to the DigitalOcean cloud so that my Spark jobs can access and read it?
5) Can I use the Parquet data format on DigitalOcean? If yes, how can one store it?
6) How to save the results of the Spark jobs on the DigitalOcean cloud? As the simplest option, I would prefer the results to be written to persistent storage that is not deleted after the Kubernetes cluster has finished the jobs and ceased to exist.
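For context, here is my rough understanding of how the pieces might fit together, based on Spark's "Running Spark on Kubernetes" documentation. The registry name, image tag, Spaces bucket, region endpoint, and job script path below are placeholders I made up, not a working configuration:

```shell
# 1) Build and push a base Spark image using the helper script that ships
#    with the Spark distribution (run from the unpacked Spark directory).
#    "registry.example.com/myuser" and the tag are placeholders.
./bin/docker-image-tool.sh -r registry.example.com/myuser -t v3.5.0 build
docker push registry.example.com/myuser/spark:v3.5.0

# 2)+3) Point spark-submit at the Kubernetes API server and at that image.
# 4)-6) DigitalOcean Spaces is S3-compatible, so the large CSV could live in
#       a Space and be read via the s3a connector; results could be written
#       back to the same Space as Parquet, which survives cluster teardown.
./bin/spark-submit \
  --master k8s://https://<k8s-api-server>:6443 \
  --deploy-mode cluster \
  --name csv-to-parquet \
  --conf spark.kubernetes.container.image=registry.example.com/myuser/spark:v3.5.0 \
  --conf spark.executor.instances=3 \
  --conf spark.hadoop.fs.s3a.endpoint=https://fra1.digitaloceanspaces.com \
  --conf spark.hadoop.fs.s3a.access.key=$SPACES_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=$SPACES_SECRET_KEY \
  local:///opt/spark/work-dir/my_job.py
```

Inside the job itself I imagine something like `spark.read.csv("s3a://my-space/big-file.csv", header=True)` followed by `df.write.parquet("s3a://my-space/output/")`, but I would appreciate confirmation that this is the right approach on DigitalOcean.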