Question

How to access files on DO Spaces via PySpark

I'm struggling to connect PySpark to DO Spaces to load data. Any help is welcome. This is my current code, inspired by the AWS S3 connection, but unfortunately it doesn't work.

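```python
from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)

# Hadoop settings copied from an AWS S3 (s3n) example;
# DOAccessKey and DOSecretKey hold my Spaces key pair
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoop_conf.set("fs.s3n.awsAccessKeyId", DOAccessKey)
hadoop_conf.set("fs.s3n.awsSecretAccessKey", DOSecretKey)

# Read the JSON file from the Space by its HTTPS URL
df = spark.read.json("https://...../sampleData/file.json")
df.show()
```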
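For S3-compatible stores like Spaces, a common approach is Hadoop's S3A connector rather than the older s3n one, which newer Hadoop releases no longer ship. You point S3A at the Space's regional endpoint and read with an `s3a://` path instead of an HTTPS URL. The sketch below is one way to set that up, not a confirmed fix for this exact setup. It assumes the `hadoop-aws` jar matching your Spark build's Hadoop version is on the classpath (for example via `--packages`). The helper names `spaces_s3a_options` and `read_json_from_spaces`, the region `nyc3`, and the Space name `YOUR-SPACE-NAME` are hypothetical placeholders.

```python
def spaces_s3a_options(region, access_key, secret_key):
    """Return Spark settings that point the S3A connector at a Spaces region.

    The "spark.hadoop." prefix makes Spark copy each key into the Hadoop
    configuration, so there is no need to touch sc._jsc directly.
    """
    return {
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        # Spaces endpoints are region-specific, e.g. nyc3.digitaloceanspaces.com
        "spark.hadoop.fs.s3a.endpoint": f"{region}.digitaloceanspaces.com",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


def read_json_from_spaces(path, region, access_key, secret_key):
    """Hypothetical helper: build a local session with S3A settings and read JSON."""
    # Imported here so the options helper works even without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName("spaces-demo")
    for key, value in spaces_s3a_options(region, access_key, secret_key).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    # Use an s3a:// path so the read goes through the S3A connector configured above.
    return spark.read.json(path)
```

With your own key pair, a call such as `read_json_from_spaces("s3a://YOUR-SPACE-NAME/sampleData/file.json", "nyc3", DOAccessKey, DOSecretKey).show()` should read the same `sampleData/file.json` object, assuming the keys have access to that Space. The settings are kept in a plain dict so they can be inspected without starting Spark. Note that `spark.hadoop.*` options only take effect when the session is first created, not on an already running `SparkContext`.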

