HarperDB, a globally distributed data and application platform, is now available on the DigitalOcean Marketplace, giving DigitalOcean users a fast way to bootstrap HarperDB.
HarperDB is unique because it combines a high-performance database, user-built custom applications, and real-time data streaming in a single platform. It was built with a focus on performance and ease of use. With flexible user-defined APIs, a simple HTTP/S interface, and a high-performance single-model data store that handles both NoSQL and SQL workloads, HarperDB scales with your application from proof of concept to production.
HarperDB’s replication engine replicates data between HarperDB instances using a high-performance, bi-directional pub/sub model on a per-table basis. Replication can span many nodes, which lets a deployment scale out globally. According to HarperDB, it also delivers upwards of 20K writes per second and 120K reads per second per node.
In this tutorial, we’ll set up HarperDB in multiple regions using DigitalOcean Droplets and build a simple API layer on top of it with Custom Functions.
After logging into DigitalOcean, navigate to the marketplace and click “Create HarperDB Droplet”:
We will create two instances: one in New York and one in San Francisco. Choose the Droplet type (I’ll use the Basic tier for demo purposes) and attach a volume. You can choose the automatic format and mount option unless you want to manage the LVM configuration yourself.
Wait for the Droplets to be created in both regions and note their IP addresses. We also need to configure a firewall so that HarperDB Studio can reach our instances.
Navigate to the Networking > Firewall section and open ports 9925-9926 (the Operations API and Custom Functions) and 9932 (clustering traffic between the nodes):
Finally, SSH into each Droplet and inspect the contents of ~/.harperdb for the credentials that were created automatically. You’ll need them in the next section.
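Before moving on to Studio, you can confirm from your own machine that the firewall ports opened above accept connections. The following is a minimal Node.js sketch (CommonJS, Node 16 or later assumed); checkPort and checkDroplet are hypothetical helper names. A successful TCP connection only shows that a port is reachable, not that HarperDB is healthy behind it, and a port can also appear closed simply because nothing is listening on it yet.

```javascript
const net = require("net");

// Ports opened in the firewall step (Operations API, Custom Functions, clustering).
const HARPERDB_PORTS = [9925, 9926, 9932];

// Try a plain TCP connection to host:port. Resolves true if it connects
// within timeoutMs, false on any error or timeout. No HTTP or TLS is spoken.
function checkPort(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    // The socket timeout also covers the connection attempt itself.
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Check every HarperDB port on one Droplet and return { port: reachable }.
async function checkDroplet(host) {
  const results = {};
  for (const port of HARPERDB_PORTS) {
    results[port] = await checkPort(host, port);
  }
  return results;
}
```

Usage, with your Droplet’s IP address in place of the documentation address: checkDroplet("203.0.113.10").then(console.log).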
HarperDB Studio is an online portal for managing HarperDB instances. HarperDB offers a generous free tier, with the option to upgrade to a paid plan as needed; see the HarperDB pricing page for details.
Navigate to studio.harperdb.io, click “Create New HarperDB Cloud Instance”, and choose the “Enterprise” option. Then fill out the instance information:
The username and password come from the ~/.harperdb file mentioned above, and the host is your Droplet’s IP address. Be sure to enable the SSL option, because the Marketplace installation has SSL enabled.
After you click Instance Details, Studio may warn you about self-signed certificates. Open the link it shows in your browser and accept the certificate. After a few seconds, your instance will appear on the dashboard.
Now that our databases are set up, we can create schemas and tables. Let’s create a schema called dev and a table called dog with the hash attribute (HarperDB’s primary key) id:
We can then use curl to add some data through the Operations API on port 9925 (-k tells curl to accept the instance’s self-signed certificate):
curl -k --location 'https://<my-ip>:9925' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Basic <YourBase64EncodedInstanceUser:Pass>' \
  --data-raw '{
    "operation": "insert",
    "schema": "dev",
    "table": "dog",
    "records": [
      {
        "dog_name": "Charlie",
        "age": 2
      }
    ]
  }'
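The Authorization header above uses HTTP Basic authentication: the word Basic followed by the Base64 encoding of username:password. If you would rather script these steps than click through Studio and run curl, the sketch below sends the same kind of JSON operations from Node.js (CommonJS, Node 16 or later assumed). It uses HarperDB’s create_schema, create_table, and insert operations; basicAuthHeader, tutorialOperations, postOperation, and runTutorialOperations are hypothetical helper names, and rejectUnauthorized: false mirrors curl -k for the self-signed certificate, so keep it to testing against your own instance.

```javascript
const https = require("https");

// HTTP Basic auth value: "Basic " + base64("username:password").
function basicAuthHeader(username, password) {
  return "Basic " + Buffer.from(`${username}:${password}`, "utf8").toString("base64");
}

// Operations matching the steps above: create dev, create dev.dog with
// "id" as its hash attribute, then insert one record.
function tutorialOperations() {
  return [
    { operation: "create_schema", schema: "dev" },
    { operation: "create_table", schema: "dev", table: "dog", hash_attribute: "id" },
    { operation: "insert", schema: "dev", table: "dog", records: [{ dog_name: "Charlie", age: 2 }] },
  ];
}

// POST one JSON operation to the Operations API and resolve with the status
// code and the parsed (or raw, if not JSON) response body.
function postOperation(host, authHeader, body, port = 9925) {
  const payload = JSON.stringify(body);
  return new Promise((resolve, reject) => {
    const req = https.request(
      {
        host,
        port,
        path: "/",
        method: "POST",
        rejectUnauthorized: false, // like curl -k: accept the self-signed cert
        headers: {
          "Content-Type": "application/json",
          "Content-Length": Buffer.byteLength(payload),
          Authorization: authHeader,
        },
      },
      (res) => {
        let raw = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => (raw += chunk));
        res.on("end", () => {
          let parsed = raw;
          try {
            parsed = JSON.parse(raw);
          } catch {
            // Leave non-JSON responses as plain text.
          }
          resolve({ status: res.statusCode, body: parsed });
        });
      }
    );
    req.on("error", reject);
    req.end(payload);
  });
}

// Send the operations one at a time and log each status and response.
async function runTutorialOperations(host, username, password) {
  const auth = basicAuthHeader(username, password);
  for (const op of tutorialOperations()) {
    const { status, body } = await postOperation(host, auth, op);
    console.log(op.operation, status, body);
  }
}
```

Usage: runTutorialOperations("203.0.113.10", "<username>", "<password>"), with your Droplet’s IP and the credentials from ~/.harperdb. If you already created dev.dog in Studio, expect the two create operations to report an error while the insert still goes through.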
One of the nice features of HarperDB is its first-class replication support, built into the database. Navigate to the replication tab in HarperDB Studio and create a cluster user to enable clustering.
Do this on both Droplets and wait for each database to restart. Then add each instance under the clustering tab with publish/subscribe capabilities:
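HarperDB replicates data with an asynchronous pub/sub model: a node publishes changes to a table, and nodes subscribed to that table receive them shortly afterward. In our example, we want a multi-region application in which either instance can serve the data, so enable both publish and subscribe.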
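The same publish/subscribe choice can also be expressed as an Operations API request rather than through Studio. The sketch below assumes HarperDB 4.x, whose add_node operation takes the remote node’s name plus a list of per-table subscriptions with publish and subscribe flags; please verify the operation against the documentation for the HarperDB version on your Droplets before relying on it. addNodeOperation and the node name sfo-node are hypothetical.

```javascript
// Build an add_node request asking this instance to replicate the given
// tables with a remote node. Assumes the HarperDB 4.x add_node shape:
//   publish   = send this node's changes to the remote node
//   subscribe = receive the remote node's changes
function addNodeOperation(remoteNodeName, tables, { publish = true, subscribe = true } = {}) {
  return {
    operation: "add_node",
    node_name: remoteNodeName,
    subscriptions: tables.map(({ schema, table }) => ({ schema, table, subscribe, publish })),
  };
}

// Hypothetical name for the San Francisco node; use the name Studio shows.
const request = addNodeOperation("sfo-node", [{ schema: "dev", table: "dog" }]);
console.log(JSON.stringify(request, null, 2));
```

The resulting JSON would be POSTed to port 9925 of the New York instance with the same headers as the curl insert command shown earlier.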
Try adding another record; it should show up under the browse tab on both instances.
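Because replication is asynchronous, a new record can take a moment to appear on the other node. If you prefer checking from a script rather than the browse tab, one approach is to run the same sql operation (for example SELECT * FROM dev.dog) against both instances and compare the results by hash attribute. The sketch below covers only the comparison step, with made-up records; fetching the two result arrays is left to whichever HTTP client you use, and missingOnEachSide is a hypothetical helper.

```javascript
// Compare two result sets by hash attribute ("id" by default) and report
// which records exist on only one of the two nodes.
function missingOnEachSide(resultsA, resultsB, key = "id") {
  const idsA = new Set(resultsA.map((r) => r[key]));
  const idsB = new Set(resultsB.map((r) => r[key]));
  return {
    onlyOnA: [...idsA].filter((id) => !idsB.has(id)),
    onlyOnB: [...idsB].filter((id) => !idsA.has(id)),
  };
}

// Made-up example: "b2" has not yet replicated to the second node.
const newYork = [{ id: "a1", dog_name: "Charlie" }, { id: "b2", dog_name: "Coco" }];
const sanFrancisco = [{ id: "a1", dog_name: "Charlie" }];
console.log(missingOnEachSide(newYork, sanFrancisco));
// { onlyOnA: [ 'b2' ], onlyOnB: [] }
```

Empty arrays on both sides mean the two nodes hold the same set of records; a non-empty result right after an insert usually just means replication has not caught up yet.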
HarperDB lets users define a light API layer via a feature called Custom Functions. Custom Functions combine serverless functions with the underlying database, collapsing the stack into a single solution: you define custom API endpoints that have direct access to HarperDB core operations. HarperDB’s Custom Functions are powered by Fastify and are similar in spirit to AWS Lambda functions or stored procedures. They are low maintenance and easy to develop: you define the logic and choose when it executes.
We’ll use Custom Functions to deploy a simple function that fetches data along with region information.
To do so, navigate to the /opt/hdb/custom_functions directory on each HarperDB Droplet, create a new directory called digitalocean, and initialize an npm project inside it:
npm init -y
npm install @fastify/env
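We use the @fastify/env plugin to load environment variables; the Droplet’s region will be stored in one of them. Because the route file below uses ES module syntax (import/export), Node.js also needs the package marked as an ES module. With npm 7.20 or later you can run the following (or add "type": "module" to package.json by hand):
npm pkg set type=module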
Create an index.js file in a new routes directory:
mkdir routes
touch routes/index.js
Then paste the following code:
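// routes/index.js: returns every dev.dog record plus this Droplet's region
import fastifyEnv from '@fastify/env'
import { fileURLToPath } from 'url'
import path from 'path'

// DO_REGION is required; @fastify/env validates it against this schema
const schema = {
  type: 'object',
  required: ['DO_REGION'],
  properties: {
    DO_REGION: {
      type: 'string'
    },
  }
}

// ES modules have no __filename/__dirname, so derive them from import.meta.url
const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)

const options = {
  schema,
  // Load variables from the .env file that sits next to this file (routes/.env)
  dotenv: {
    path: `${__dirname}/.env`,
    debug: true
  },
  data: process.env
}

export default async (server, { hdbCore, logger }) => {
  await server.register(fastifyEnv, options);

  server.route({
    url: '/',
    method: 'GET',
    handler: async () => {
      // Run a SQL operation through HarperDB core. requestWithoutAuthentication
      // skips the credential check, so this endpoint is publicly readable.
      const body = {
        operation: 'sql',
        sql: 'SELECT * FROM dev.dog ORDER BY dog_name',
      };
      const results = await hdbCore.requestWithoutAuthentication({ body });

      // Return the rows together with the region configured for this Droplet
      const response = {
        region: process.env.DO_REGION,
        results
      }
      return response
    },
  });
};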
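Then create a .env file next to index.js in the routes directory (the code loads it from ${__dirname}/.env) and set DO_REGION for that Droplet, for example on the New York Droplet:
DO_REGION=new-york
On the San Francisco Droplet, use a value such as san-francisco instead.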
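Before relying on the deployed endpoint, it can help to see the route file’s contract in isolation: HarperDB calls the exported function with a Fastify server and an object containing hdbCore, and the function registers routes on that server. The sketch below is a hypothetical local harness in CommonJS; fakeServer and fakeHdbCore are stand-ins written for illustration, not HarperDB or Fastify APIs, and regionRoutes re-creates the handler above without the @fastify/env registration.

```javascript
// Hypothetical re-creation of the route above (without @fastify/env),
// so the handler can be exercised without HarperDB or Fastify installed.
function regionRoutes(server, { hdbCore }) {
  server.route({
    url: "/",
    method: "GET",
    handler: async () => {
      const body = { operation: "sql", sql: "SELECT * FROM dev.dog ORDER BY dog_name" };
      const results = await hdbCore.requestWithoutAuthentication({ body });
      return { region: process.env.DO_REGION, results };
    },
  });
}

// Fake server: records each route definition instead of serving it.
const registered = [];
const fakeServer = { route: (definition) => registered.push(definition) };

// Fake hdbCore: remembers the request body and returns fixed rows.
let lastBody = null;
const fakeHdbCore = {
  requestWithoutAuthentication: async ({ body }) => {
    lastBody = body;
    return [
      { age: 2, dog_name: "Charlie" },
      { age: 4, dog_name: "Coco" },
    ];
  },
};

process.env.DO_REGION = "new-york";
regionRoutes(fakeServer, { hdbCore: fakeHdbCore });
registered[0].handler().then((response) => console.log(JSON.stringify(response)));
// {"region":"new-york","results":[{"age":2,"dog_name":"Charlie"},{"age":4,"dog_name":"Coco"}]}
```

The fakes only check the wiring (route registration, the SQL body, and the response shape); they say nothing about how HarperDB executes the query.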
Finally, we’re ready to test our application, which fetches all records from the dev.dog table ordered by the dog_name attribute.
By default, Custom Functions are served at <ip-address>:9926/<project-name>, where the project name is the directory we created. In our case, the endpoint is <ip-address>:9926/digitalocean.
Choose a Droplet IP that’s close to you. Once we curl that endpoint, we get back:
{
  "region": "new-york",
  "results": [
    {
      "age": 2,
      "dog_name": "Charlie"
    },
    {
      "age": 4,
      "dog_name": "Coco"
    }
  ]
}
In this case, the request went to the New York Droplet (note that I had also added a second record, Coco, earlier). Querying the San Francisco Droplet returns the same replicated records, but with its own region value (for example, san-francisco).
If you want to route traffic based on geolocation, you can put these endpoints behind a global load balancer or use a DNS service that supports geolocation-based routing.
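For a quick experiment before setting up a load balancer, a client can approximate “closest region” by timing a request to each regional endpoint and using the fastest one. This is a different technique from geolocation routing: it measures latency from the client rather than looking up the client’s location. The sketch assumes Node 16 or later (CommonJS); timeGet, pickFastest, and fastestEndpoint are hypothetical helpers, and TLS verification is disabled for https URLs on the assumption that the endpoint uses the instance’s self-signed certificate.

```javascript
const http = require("http");
const https = require("https");

// Time one GET request; resolves to elapsed milliseconds, or Infinity if the
// request fails or times out. rejectUnauthorized only matters for https URLs.
function timeGet(url, timeoutMs = 5000) {
  const client = url.startsWith("https:") ? https : http;
  const start = process.hrtime.bigint();
  return new Promise((resolve) => {
    const req = client.get(url, { rejectUnauthorized: false, timeout: timeoutMs }, (res) => {
      res.resume(); // Drain the body; only the timing matters here.
      res.on("end", () => resolve(Number(process.hrtime.bigint() - start) / 1e6));
    });
    req.on("timeout", () => req.destroy(new Error("timeout")));
    req.on("error", () => resolve(Infinity));
  });
}

// Return the URL with the lowest finite latency, or null if none responded.
function pickFastest(timings) {
  let best = null;
  for (const t of timings) {
    if (Number.isFinite(t.ms) && (best === null || t.ms < best.ms)) best = t;
  }
  return best ? best.url : null;
}

// Time all endpoints concurrently and pick the fastest one.
async function fastestEndpoint(urls) {
  const timings = await Promise.all(urls.map(async (url) => ({ url, ms: await timeGet(url) })));
  return pickFastest(timings);
}
```

Usage: pass the two regional Custom Functions URLs, for example fastestEndpoint(["https://<new-york-ip>:9926/digitalocean", "https://<san-francisco-ip>:9926/digitalocean"]).then(console.log). A single measurement is noisy, so a real client would repeat it or cache the result.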
In this article, we saw how to set up a multi-region HarperDB configuration with replication on DigitalOcean. With the new Marketplace template, spinning up new HarperDB instances on demand is straightforward. We also enabled replication and added a sample Custom Function to show how an API layer can be added with little overhead.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.