Question

How to access files on DO Spaces via PySpark

I'm struggling to connect PySpark to DO Spaces in order to load data; any help is welcome. This is my current code, adapted from the AWS S3 connection examples, but unfortunately it is not working.

sc = SparkContext('local')
spark = SparkSession(sc)

# Spaces is S3-compatible, so use the maintained s3a connector
# (s3n is deprecated and removed in Hadoop 3) and point it at the
# Spaces endpoint -- the region (nyc3 here) must match your Space
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.endpoint", "https://nyc3.digitaloceanspaces.com")
hadoop_conf.set("fs.s3a.access.key", DOAccessKey)
hadoop_conf.set("fs.s3a.secret.key", DOSecretKey)

# read through the s3a:// filesystem, not a plain https URL
df = spark.read.json("s3a://...../sampleData/file.json")
df.show()
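One common reason this kind of setup fails is that the `hadoop-aws` connector jar is not on the classpath at all, so Spark never reaches Spaces. As a minimal sketch, the connector can instead be pulled in and configured when the session is built; the `hadoop-aws` version (3.3.4 here, which must match your Spark build's Hadoop version), the `nyc3` endpoint, and the placeholder credentials are all assumptions to adapt to your setup:

```python
from pyspark.sql import SparkSession

# Sketch: fetch the S3 connector and set the s3a options up front,
# rather than mutating the Hadoop configuration after the JVM starts.
spark = (
    SparkSession.builder
    .appName("spaces-read")
    # version is an assumption -- match it to your Spark/Hadoop build
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    # endpoint region is an assumption -- use your Space's region
    .config("spark.hadoop.fs.s3a.endpoint", "https://nyc3.digitaloceanspaces.com")
    .config("spark.hadoop.fs.s3a.access.key", "DO_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "DO_SECRET_KEY")
    .getOrCreate()
)
```

With a session built this way, the file would be read with an `s3a://` path, e.g. `spark.read.json("s3a://<bucket>/sampleData/file.json")`.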

