How Hachyderm leveraged DigitalOcean Spaces to scale their Mastodon community

Author: Ado Kukic

Director, Developer Advocacy

Posted: December 14, 2022 | 5 min read

What is the Mastodon social network?

Mastodon is a free, open-source, self-hosted social network. It launched in 2016 but has surged in popularity over the last few months as users explore alternatives to Twitter and other traditional social media sites. What sets Mastodon apart from a traditional social network like Twitter or Reddit is that it is decentralized and federated: no single organization or user owns or controls it. Users join one server but can still interact with users on other Mastodon instances (servers). Anybody can set up and self-host a Mastodon server, and DigitalOcean even offers a 1-click Mastodon image in its Marketplace to get you up and running in no time.

Image of the Fediverse from Axbom

Mastodon is part of the Fediverse, a network of interconnected applications built on ActivityPub, a decentralized social networking protocol. On the technology side, Mastodon consists of a Ruby on Rails backend, a JavaScript frontend, and PostgreSQL as its primary database. In addition, the Sidekiq job queue handles many background jobs, such as upload processing and federation. The Mastodon documentation goes into much greater detail on how these processes fit together and how to set them up yourself.

Mastodon growth presents scalability challenges

In the past few months, Mastodon server communities have grown at an accelerated rate. Eugen Rochko, founder of Mastodon, shared a post on November 6th, showing the growth of Mastodon communities and users:

“Hey, so, we’ve hit 1,028,362 monthly active users across the network today. 1,124 new Mastodon servers since Oct 27, and 489,003 new users. That’s pretty cool.”

At DigitalOcean, we have also seen a six-fold increase in active Mastodon instances from October 2022 to December 2022. This sudden surge of new users has created challenges for Mastodon server operators. One example is the Hachyderm.io Mastodon community. Hachyderm is a Mastodon server that aims to build a curated network of respectful professionals in the tech industry around the globe. It is a community of developers, hackers, industry professionals, and enthusiasts.

In a recent blog post, Kris Nóva, founder of Hachyderm, detailed the growth of her Mastodon instance from 720 users on November 3rd to over 25,000 users as of November 25th, and it continues to grow. Throughout November, the server gained roughly one new user every 90 seconds. This influx of users, who joined to share their thoughts, insights, and memes, created a number of scalability challenges and eventually led to intermittent downtime, as the existing infrastructure could not handle all the new activity on the server.
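
As a back-of-the-envelope check, the two data points quoted above imply an average gap between sign-ups of about 78 seconds from November 3rd to November 25th, the same ballpark as the "roughly one every 90 seconds" figure. The sketch below uses only those quoted numbers (and assumes the year 2022); because the end count is "over 25,000", the real average gap was, if anything, a little shorter.

```python
from datetime import date

# Figures quoted in the paragraph above (year assumed to be 2022).
START_USERS = 720        # users on November 3rd
END_USERS = 25_000       # "over 25,000" users on November 25th, so a lower bound
START = date(2022, 11, 3)
END = date(2022, 11, 25)


def average_signup_interval_seconds(start_users, end_users, start, end):
    """Average number of seconds between new sign-ups over the window."""
    new_users = end_users - start_users
    elapsed_seconds = (end - start).days * 24 * 60 * 60
    return elapsed_seconds / new_users


interval = average_signup_interval_seconds(START_USERS, END_USERS, START, END)
print(f"~{interval:.0f} seconds per new user")  # prints "~78 seconds per new user"
```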

One likely cause of the slowdown that Nóva and her team identified was their use of ZFS on a local server to house both the media storage and the Postgres database that held references to all of this data. Nóva explained it as “…the more we could correlate slow disks to slow database responses, and slow media storage. Eventually our compute servers and web servers would max out our connection pool against the database and timeout. Eventually our web servers would overload the media server and timeout.”
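
One general way to see why slow disks can exhaust a database connection pool is Little's law from queueing theory: the average number of queries in flight equals the arrival rate multiplied by how long each query holds a connection. This is a generic model, not an analysis the Hachyderm team published, and the pool size and query rate below are hypothetical. Holding the request rate fixed, a rise in query latency alone pushes the number of busy connections past the pool size, at which point new requests wait for a free connection and eventually time out.

```python
def connections_needed(requests_per_second: float, query_latency_s: float) -> float:
    """Little's law: average in-flight queries = arrival rate x time each holds a connection."""
    return requests_per_second * query_latency_s


def pool_is_saturated(requests_per_second: float, query_latency_s: float, pool_size: int) -> bool:
    """True when the average demand for connections reaches the pool size."""
    return connections_needed(requests_per_second, query_latency_s) >= pool_size


# Hypothetical numbers for illustration only, not Hachyderm's real traffic or config.
POOL_SIZE = 25
RATE = 200  # database queries per second

# Same traffic, increasingly slow disks: latency per query grows.
for latency in (0.02, 0.10, 0.25):
    busy = connections_needed(RATE, latency)
    saturated = pool_is_saturated(RATE, latency, POOL_SIZE)
    print(f"latency {latency * 1000:.0f} ms -> ~{busy:.0f} connections busy, saturated={saturated}")
```

At 20 ms per query the hypothetical pool is mostly idle; at 250 ms the same traffic needs about twice as many connections as the pool has.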

Getting media off the local disk was a top priority for keeping the platform stable. Nóva connected with DigitalOcean’s Chief Product Officer Gabe Monroy and, after discussing the challenges and potential solutions, chose DigitalOcean Spaces Object Storage, a highly scalable cloud storage service, to store Hachyderm’s media. There was a major concern, though: Hachyderm was already running in production with close to 1.4TB of data to migrate, and taking the server down for a prolonged period was not an option. The solution?

The brilliant solution came from Hachyderm infrastructure volunteer Malte Janduda, and it was built on NGINX’s try_files directive. Kris explained Malte’s proposal well:

  1. We begin writing data that is cached in our edge nodes directly to the DigitalOcean Spaces object store instead of the local filesystem.

  2. As users access data, we can ensure that it will be taken off of our local server and delivered to the user.

  3. We can then leverage Mastodon’s S3 feature to write the “hot” data directly back to DigitalOcean Spaces using a reverse Nginx proxy.

This meant that when a user requested a particular asset, it would be served from the local filesystem only once, because as soon as it passed through an edge node it would be written to DigitalOcean Spaces. The more users accessed Hachyderm, the faster the data would be replicated to DigitalOcean Spaces, with the useful side effect that the most frequently accessed data migrated first.
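
The read-through behavior described above can be modeled in a few lines. This is a toy Python sketch, not Hachyderm's NGINX try_files and reverse-proxy configuration: the class name, the `local` and `spaces` dictionaries, and the paths are all hypothetical stand-ins for the ZFS media directory and the Spaces bucket. It shows why each asset touches the local disk at most once, and why assets that are requested early (typically the popular ones) are the first to migrate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReadThroughMigrator:
    """Toy model of serving media while it migrates to object storage.

    `local` stands in for the old on-disk media store and `spaces` for the
    object storage bucket; both map a media path to its bytes (hypothetical).
    """
    local: dict[str, bytes]
    spaces: dict[str, bytes] = field(default_factory=dict)
    migration_order: list[str] = field(default_factory=list)

    def get(self, path: str) -> bytes:
        # Already migrated: serve straight from object storage.
        if path in self.spaces:
            return self.spaces[path]
        # First request for this asset: read it from local disk once
        # (raises KeyError, i.e. "not found", if it exists nowhere)...
        data = self.local[path]
        # ...and copy it to object storage so later requests skip the disk.
        self.spaces[path] = data
        self.migration_order.append(path)
        return data


# Usage: "b.png" is popular and requested first, so it migrates first.
m = ReadThroughMigrator(local={"a.png": b"A", "b.png": b"B", "c.png": b"C"})
for p in ["b.png", "b.png", "a.png", "b.png"]:
    m.get(p)
print(m.migration_order)  # ['b.png', 'a.png']
```

Assets nobody requests, like `c.png` here, stay on the local disk until a separate background pass copies them.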

The remaining data would be migrated from the local filesystem to DigitalOcean Spaces using Rclone, a slow background process expected to take a couple of days to copy everything over. This was an excellent real-world example of a distributed system enabling better scalability: the more users accessed Hachyderm, the faster the migration would complete.
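
The background pass only has to copy what the read-through path has not already moved. Rclone's `copy` command behaves along these lines by skipping files that are already identical at the destination; the Python function below is a simplified, hypothetical model of that idea (it checks only whether a path exists, not size or checksum), using plain dictionaries in place of the local disk and the Spaces bucket. Because already-present objects are skipped, the sweep can be stopped and re-run safely.

```python
from __future__ import annotations


def background_sweep(local: dict[str, bytes], spaces: dict[str, bytes]) -> list[str]:
    """Copy every object that still exists only on local disk; return what was copied.

    Hypothetical model: objects already in `spaces` (for example, ones moved by
    the read-through path) are skipped, so re-running the sweep is harmless.
    """
    copied = []
    for path in sorted(local):
        if path in spaces:
            continue  # already migrated by an earlier request or an earlier run
        spaces[path] = local[path]
        copied.append(path)
    return copied


# Usage: "b.png" was already pulled over by user traffic.
local = {"a.png": b"A", "b.png": b"B", "c.png": b"C"}
spaces = {"b.png": b"B"}
print(background_sweep(local, spaces))  # ['a.png', 'c.png']
print(background_sweep(local, spaces))  # [] (nothing left to copy)
```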

Currently, Nóva and her team are still in the process of migrating all of the media over to DigitalOcean Spaces and are actively working on migrating the rest of the infrastructure to the cloud. Follow the journey on their community hub.

DigitalOcean is the home for your Mastodon server

DigitalOcean provides a simple way to host a Mastodon server: the DigitalOcean Marketplace offers a 1-click image to deploy your own Mastodon server, and we recently updated this image to support the latest version of Mastodon. You can also set up DigitalOcean Spaces with this image from the start and get all the benefits of offloading media assets to a dedicated object storage solution.
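
Pointing Mastodon at Spaces comes down to its S3-compatible storage settings. The sketch below builds those environment variables for a Spaces bucket; the variable names follow Mastodon's object storage configuration, while the bucket name, region slug, keys, and the helper functions themselves are hypothetical placeholders. Check the current Mastodon and Spaces documentation before using the output in a real `.env.production` file.

```python
from typing import Dict, Optional


def spaces_env(bucket: str, region: str, access_key_id: str,
               secret_access_key: str, alias_host: Optional[str] = None) -> Dict[str, str]:
    """Build Mastodon S3 settings that target a DigitalOcean Spaces bucket (hypothetical helper)."""
    hostname = f"{region}.digitaloceanspaces.com"  # e.g. nyc3.digitaloceanspaces.com
    env = {
        "S3_ENABLED": "true",
        "S3_BUCKET": bucket,
        "S3_PROTOCOL": "https",
        "S3_HOSTNAME": hostname,
        "S3_ENDPOINT": f"https://{hostname}",
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
    }
    if alias_host:
        # Optional: serve media from a CDN or custom domain instead of the bucket URL.
        env["S3_ALIAS_HOST"] = alias_host
    return env


def render_dotenv(env: Dict[str, str]) -> str:
    """Render settings as KEY=value lines for an env file."""
    return "\n".join(f"{key}={value}" for key, value in env.items())


# Usage with placeholder values; never commit real keys.
print(render_dotenv(spaces_env("my-mastodon-media", "nyc3", "YOUR_KEY_ID", "YOUR_SECRET")))
```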

We are also working on documentation on how you can leverage DigitalOcean Managed PostgreSQL as the database for your Mastodon server as well as a scalability guide to hosting your Mastodon community as it grows into the thousands, tens of thousands, and hopefully millions of users. Stay tuned for much more to come!
