Question

How to use JuiceFS to store data on DigitalOcean

Posted August 30, 2021
DigitalOcean Spaces, DigitalOcean Managed Redis, DigitalOcean Droplets

I want to create a JuiceFS file system that uses DigitalOcean Managed Redis and Spaces. Does anyone know how to do this?

1 answer

When your system needs to store and share large amounts of unstructured data in a distributed computing environment, the open-source JuiceFS file system is worth considering.

What is JuiceFS

JuiceFS is a cloud-native distributed file system designed for large-scale data storage. It is released under the AGPLv3 license, so any enterprise or individual can use it freely under the terms of that license.

[Figure: JuiceFS architecture]

JuiceFS keeps everything in the cloud: file data is stored mainly in object storage, while the corresponding metadata is stored separately in a database. It supports almost all object storage services, and on the database side it supports Redis, TiKV, PostgreSQL, MySQL, MariaDB, and others, with support for more databases planned.

Local storage has well-known limitations: capacity runs out, the disk is a single point of failure, and the data is hard to share. These problems become especially obvious when storing large-scale data, and JuiceFS avoids them. First, it stores all data in object storage, which removes the capacity ceiling of local disks and makes the available space practically unlimited. Second, because all data and metadata live in the cloud, the file system is well suited to being mounted and shared by multiple hosts at the same time. In addition, the failure of any single host that mounts JuiceFS affects neither the stored data nor the other hosts.

JuiceFS is designed for the cloud. Using a cloud platform's ready-made storage and database services, you can complete the installation in a few minutes. This article focuses on DigitalOcean and shows how to quickly install and use JuiceFS on that platform.

Requirements

JuiceFS is built on a combination of object storage and a database, so you need to prepare:

1. Cloud Server

A cloud server on DigitalOcean is called a Droplet. You don't need to buy a new Droplet to use JuiceFS: if you already have a Droplet that needs JuiceFS storage, simply install the JuiceFS client on it.

Hardware

JuiceFS has no special hardware requirements and runs reliably on Droplets of any size. However, it is recommended to choose a better-performing SSD and to reserve at least 1 GB of disk space for the JuiceFS local cache.
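As a rough pre-flight check for the 1 GB cache recommendation, you can test the free space on the disk that will hold the cache before mounting. This is a minimal sketch, assuming the cache lives under /var (where the mount log later in this article shows it) and a df that accepts the POSIX -P flag together with -m for MiB units; the function name has_free_mib is hypothetical, not part of JuiceFS.

```shell
# Hypothetical helper: succeed if the file system holding DIR has at least MIN_MIB MiB free.
has_free_mib() {
  local dir="$1" min_mib="$2" avail
  # With -P (one line per file system) and -m (1 MiB blocks), field 4 of the
  # second output line is the available space in MiB.
  avail=$(df -Pm "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$min_mib" ]
}

# Usage: check for the recommended 1 GiB (1024 MiB) under /var.
# has_free_mib /var 1024 && echo "enough space for the cache" || echo "less than 1 GiB free"
```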

Operating System

JuiceFS supports Linux, BSD, macOS, and Windows. In this article, we use Ubuntu Server 20.04.

2. Object Storage

JuiceFS stores all data in object storage, and on DigitalOcean the easiest option is Spaces, an S3-compatible object storage service that works out of the box. When creating a Space, choose the same region as your Droplet to get the best access speed and avoid additional traffic charges.

Of course, you can also use object storage services from other platforms, or deploy Ceph or MinIO yourself on a Droplet. In short, you are free to choose whichever object storage you like, as long as the JuiceFS client can reach its API.

Here, I created a Space named juicefs in the Singapore region (sgp1). Its access address is:

https://juicefs.sgp1.digitaloceanspaces.com

You also need to create a Spaces access key in the API section of the control panel; JuiceFS uses it to access the Spaces API.
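The Space's access address follows a fixed pattern of Space name plus region slug, and it is the value later passed to --bucket. A small sketch, assuming the https://<space>.<region>.digitaloceanspaces.com form used in this article; spaces_endpoint is a hypothetical helper name.

```shell
# Hypothetical helper: build a Spaces endpoint URL from a Space name and a region slug.
spaces_endpoint() {
  local space="$1" region="$2"
  if [ -z "$space" ] || [ -z "$region" ]; then
    echo "usage: spaces_endpoint SPACE REGION" >&2
    return 2
  fi
  printf 'https://%s.%s.digitaloceanspaces.com\n' "$space" "$region"
}

# Usage: pass the result to juicefs format as --bucket "$(spaces_endpoint juicefs sgp1)"
```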

3. Database

Unlike a local file system, JuiceFS stores all of the metadata for your data in a separate database, and the performance advantage of this design becomes more pronounced as the amount of stored data grows.

Currently, JuiceFS supports common databases such as Redis, TiKV, MySQL/MariaDB, PostgreSQL, and SQLite, and support for more databases is under development. If the database you need is not yet supported, please open an issue to give feedback.

In terms of performance, scale, and reliability, each database has its own advantages and disadvantages, and you should choose according to actual scenarios.

Don't worry too much about this choice: the JuiceFS client supports metadata migration, so you can easily export metadata from one database and import it into another.
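The migration goes through the dump and load subcommands listed in the client's help output. The wrapper below is a sketch, not an official procedure: migrate_meta is a hypothetical name, and it assumes the target database is empty and that nothing writes to the file system while the dump runs.

```shell
# Hypothetical wrapper around `juicefs dump` and `juicefs load`.
# Assumes an empty target database and a file system that is idle during the dump.
migrate_meta() {
  if [ "$#" -ne 3 ]; then
    echo "usage: migrate_meta SRC_META_URL DST_META_URL DUMP_FILE" >&2
    return 2
  fi
  local src="$1" dst="$2" file="$3"
  # Export all metadata from the source database to a JSON file...
  juicefs dump "$src" "$file" || return 1
  # ...then import that JSON file into the target database.
  juicefs load "$dst" "$file"
}

# Usage (placeholder URLs):
# migrate_meta "rediss://default:your-password@HOST:25061/1" "redis://NEW-HOST:6379/1" meta-dump.json
```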

In this article, we use DigitalOcean's managed Redis 6 service, choose the Singapore region, and select the same VPC private network as the existing Droplet. Creating the Redis cluster takes about 5 minutes, and we follow the setup wizard to initialize it.

By default, the Redis cluster accepts inbound connections from anywhere. For security, in the security settings step of the setup wizard, add the Droplets that need access under Add trusted sources, so that only the selected hosts can connect to the Redis cluster.

For the eviction policy, noeviction is recommended: when memory is exhausted, Redis only reports errors and does not evict any data.

Note: To ensure the safety and integrity of metadata, do not select allkeys-lru or allkeys-random as the eviction policy.
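If you want a setup script to check the policy you selected, a case statement is enough. This sketch encodes the rule above: noeviction passes, allkeys-* policies (which can evict any key, metadata included) fail, and anything else is flagged for review. check_eviction_policy is a hypothetical name, and you supply the policy string yourself, for example as shown in the control panel.

```shell
# Hypothetical helper: classify a Redis eviction policy name for JuiceFS metadata.
# Returns 0 for noeviction, 1 for allkeys-* policies, 2 for anything else.
check_eviction_policy() {
  case "$1" in
    noeviction) echo "ok: $1"; return 0 ;;
    allkeys-*)  echo "unsafe: $1 can evict metadata keys" >&2; return 1 ;;
    *)          echo "review: $1 is not the recommended noeviction" >&2; return 2 ;;
  esac
}

# Usage: check_eviction_policy noeviction
```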

The address of the Redis cluster is shown under Connection Details in the console. If all of your computing resources are in DigitalOcean, prefer connecting over the VPC private network, which is the most secure option.

Installation and Use

1. Install JuiceFS client

I am using Ubuntu Server 20.04. Run the following commands in order to install the latest version of the client.

Query the latest release version and store it in a temporary shell variable:

$ JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')

Download the latest client package for 64-bit x86 (amd64) Linux:

$ wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"

Extract the package:

$ mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice

Install the client to /usr/local/bin:

$ sudo install juice/juicefs /usr/local/bin
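The download step above hardcodes linux-amd64, so an ARM-based machine would need a different package. The sketch below picks the architecture from uname -m. It assumes the release assets follow the juicefs-<version>-linux-<arch>.tar.gz naming used above and that an arm64 build is published, which you should confirm on the Releases page; jfs_asset_name is a hypothetical helper.

```shell
# Hypothetical helper: print the release tarball name for VERSION and a machine type
# (defaults to `uname -m`). Only x86_64 and aarch64 are mapped; others are rejected.
jfs_asset_name() {
  local version="$1" machine="${2:-$(uname -m)}" arch
  case "$machine" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  printf 'juicefs-%s-linux-%s.tar.gz\n' "$version" "$arch"
}

# Usage, with JFS_LATEST_TAG set as shown above:
# asset=$(jfs_asset_name "$JFS_LATEST_TAG")
# wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/${asset}"
# mkdir -p juice && tar -zxvf "$asset" -C juice
```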

Run juicefs without arguments. If it prints help information like the following, the client was installed successfully.

$ juicefs

NAME:
   juicefs - A POSIX file system built on Redis and object storage.

USAGE:
   juicefs [global options] command [command options] [arguments...]

VERSION:
   0.16.2 (2021-08-25T04:01:15Z 29d6fee)

COMMANDS:
   format   format a volume
   mount    mount a volume
   umount   unmount a volume
   gateway  S3-compatible gateway
   sync     sync between two storage
   rmr      remove directories recursively
   info     show internal information for paths or inodes
   bench    run benchmark to read/write/stat big/small files
   gc       collect any leaked objects
   fsck     Check consistency of file system
   profile  analyze access log
   stats    show runtime stats
   status   show status of JuiceFS
   warmup   build cache for target directories/files
   dump     dump metadata into a JSON file
   load     load metadata from a previously dumped JSON file
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --verbose, --debug, -v  enable debug log (default: false)
   --quiet, -q             only warning and errors (default: false)
   --trace                 enable trace log (default: false)
   --no-agent              Disable pprof (:6060) and gops (:6070) agent (default: false)
   --help, -h              show help (default: false)
   --version, -V           print only the version (default: false)

COPYRIGHT:
   AGPLv3

You can also download other versions manually from the JuiceFS GitHub Releases page.

2. Create a file system

To create a file system, use the format subcommand with the following syntax:

juicefs format [command options] META-URL NAME

The following command creates a file system named mystor:

juicefs format \
--storage space \
--bucket https://juicefs.sgp1.digitaloceanspaces.com \
--access-key <your-access-key-id> \
--secret-key <your-access-key-secret> \
rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1 \
mystor

Parameter Description:

  • --storage: Specifies the data storage engine; here it is space. See the JuiceFS documentation for the full list of supported storage types.
  • --bucket: Specify the bucket access address.
  • --access-key and --secret-key: Specify the secret key for accessing the object storage API.
  • META-URL: The Redis cluster managed by DigitalOcean must be accessed over TLS/SSL, so the URL uses the rediss:// scheme. The /1 at the end selects Redis database number 1.
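Because the same META-URL is reused by format, mount, status, and fstab, it can help to assemble it once from the values in the console's Connection Details. A minimal sketch, assuming the user:password@host:port/db layout above and a password with no characters that need URL-encoding; make_meta_url is a hypothetical name.

```shell
# Hypothetical helper: assemble a rediss:// META-URL; DB defaults to 1 as in the example.
make_meta_url() {
  local user="$1" password="$2" host="$3" port="$4" db="${5:-1}"
  if [ -z "$user" ] || [ -z "$password" ] || [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: make_meta_url USER PASSWORD HOST PORT [DB]" >&2
    return 2
  fi
  printf 'rediss://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$db"
}

# Usage (placeholder password):
# META_URL=$(make_meta_url default your-password private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com 25061)
# sudo juicefs mount -d "$META_URL" mnt
```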

If you see output similar to the following, it means that the file system is created successfully.

2021/08/23 16:36:28.450686 juicefs[2869028] <INFO>: Meta address: rediss://default@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:36:28.481251 juicefs[2869028] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly.
2021/08/23 16:36:28.481763 juicefs[2869028] <INFO>: Ping redis: 331.706µs
2021/08/23 16:36:28.482266 juicefs[2869028] <INFO>: Data uses space://juicefs/mystor/
2021/08/23 16:36:28.534677 juicefs[2869028] <INFO>: Volume is formatted as {Name:mystor UUID:6b0452fc-0502-404c-b163-c9ab577ec766 Storage:space Bucket:https://juicefs.sgp1.digitaloceanspaces.com AccessKey:7G7WQBY2QUCBQC5H2DGK SecretKey:removed BlockSize:4096 Compression:none Shards:0 Partitions:0 Capacity:0 Inodes:0 EncryptKey:}

3. Mount a file system

To mount a file system, use the mount subcommand, and use the -d parameter to mount it as a daemon. The following command mounts the newly created file system to the mnt directory under the current directory:

$ sudo juicefs mount -d \
rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1 mnt

sudo is used for the mount so that juicefs has permission to create its cache directory under /var/. Note that when mounting a file system, you only need to specify the database address and the mount point, not the file system name.

If you see an output similar to the following, it means that the file system is mounted successfully.

2021/08/23 16:39:14.202151 juicefs[2869081] <INFO>: Meta address: rediss://default@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:39:14.234925 juicefs[2869081] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly.
2021/08/23 16:39:14.235536 juicefs[2869081] <INFO>: Ping redis: 446.247µs
2021/08/23 16:39:14.236231 juicefs[2869081] <INFO>: Data use space://juicefs/mystor/
2021/08/23 16:39:14.236540 juicefs[2869081] <INFO>: Disk cache (/var/jfsCache/6b0452fc-0502-404c-b163-c9ab577ec766/): capacity (1024 MB), free ratio (10%), max pending pages (15)
2021/08/23 16:39:14.738416 juicefs[2869081] <INFO>: OK, mystor is ready at mnt

Use the df command to see the mounting status of the file system:

$ df -Th
Filesystem     Type          Size  Used Avail Use% Mounted on
JuiceFS:mystor fuse.juicefs  1.0P   64K  1.0P   1% /home/herald/mnt
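To check the mount from a script rather than by reading df, you can look for the fuse.juicefs type shown above in the kernel's mount table. This is a Linux-only sketch: is_juicefs_mount is hypothetical, it reads /proc/mounts by default, and it expects an absolute mount point with no trailing slash (paths containing spaces are escaped in /proc/mounts and are not handled).

```shell
# Hypothetical helper: succeed if MOUNT_POINT is mounted with type fuse.juicefs.
# Reads /proc/mounts by default; pass another mounts-format file as $2 for testing.
is_juicefs_mount() {
  local mount_point="$1" mounts_file="${2:-/proc/mounts}"
  # Field 2 is the mount point and field 3 the file system type.
  awk -v mp="$mount_point" '
    $2 == mp && $3 == "fuse.juicefs" { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$mounts_file"
}

# Usage: is_juicefs_mount /home/herald/mnt && echo "JuiceFS is mounted"
```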

As the mount output shows, JuiceFS uses 1024 MB of local cache by default. A larger cache can improve JuiceFS performance. You can set the cache size (in MiB) with the --cache-size option when mounting a file system. For example, to set a local cache of roughly 20 GB:

$ sudo juicefs mount -d --cache-size 20000 \
rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1 mnt
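Since --cache-size is given in MiB, 20000 is slightly under 20 GiB; a whole 20 GiB is 20 × 1024 = 20480 MiB, the value used in the fstab entry later in this article. A tiny conversion sketch (gib_to_mib is a hypothetical name):

```shell
# Hypothetical helper: convert a whole number of GiB to the MiB value for --cache-size.
gib_to_mib() {
  case "$1" in
    # Reject empty input, non-digits, and leading zeros (which $(( )) would read as octal).
    ''|*[!0-9]*|0?*) echo "usage: gib_to_mib WHOLE_GIB" >&2; return 2 ;;
  esac
  echo $(( $1 * 1024 ))
}

# Usage (placeholder URL):
# sudo juicefs mount -d --cache-size "$(gib_to_mib 20)" "rediss://default:your-password@HOST:25061/1" mnt
```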

Once the file system is mounted, you can store data in the ~/mnt directory just as you would on a local disk.

4. File system status

Use the status subcommand to view the basic information and connection status of a file system. You only need to specify the database URL.

$ juicefs status rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:48:48.567046 juicefs[2869156] <INFO>: Meta address: rediss://default@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:48:48.597513 juicefs[2869156] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly.
2021/08/23 16:48:48.598193 juicefs[2869156] <INFO>: Ping redis: 491.003µs
{
  "Setting": {
    "Name": "mystor",
    "UUID": "6b0452fc-0502-404c-b163-c9ab577ec766",
    "Storage": "space",
    "Bucket": "https://juicefs.sgp1.digitaloceanspaces.com",
    "AccessKey": "7G7WQBY2QUCBQC5H2DGK",
    "SecretKey": "removed",
    "BlockSize": 4096,
    "Compression": "none",
    "Shards": 0,
    "Partitions": 0,
    "Capacity": 0,
    "Inodes": 0
  },
  "Sessions": [
    {
      "Sid": 1,
      "Heartbeat": "2021-08-23T16:46:14+08:00",
      "Version": "0.16.2 (2021-08-25T04:01:15Z 29d6fee)",
      "Hostname": "ubuntu-s-1vcpu-1gb-sgp1-01",
      "MountPoint": "/home/herald/mnt",
      "ProcessID": 2869091
    },
    {
      "Sid": 2,
      "Heartbeat": "2021-08-23T16:47:59+08:00",
      "Version": "0.16.2 (2021-08-25T04:01:15Z 29d6fee)",
      "Hostname": "ubuntu-s-1vcpu-1gb-sgp1-01",
      "MountPoint": "/home/herald/mnt",
      "ProcessID": 2869146
    }
  ]
}
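With several hosts mounted, the Sessions array is a quick way to see how many clients are connected. The sketch below simply counts "Sid" fields, assuming the JSON layout shown above (the log lines do not contain that string); count_sessions is hypothetical, and jq would be a more robust choice if it is installed.

```shell
# Hypothetical helper: count client sessions in `juicefs status` output read from stdin.
# grep -c counts matching lines; each session has one "Sid" line in the output above.
count_sessions() {
  grep -c '"Sid":'
}

# Usage (placeholder URL):
# juicefs status "rediss://default:your-password@HOST:25061/1" | count_sessions
```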

5. Unmount a file system

Use the umount subcommand to unmount a file system, for example:

$ sudo juicefs umount ~/mnt

Note: Forcibly unmounting a file system that is in use may cause data corruption or loss, so proceed with caution.
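One cautious approach is to check whether any process is still using the mount point before unmounting, rather than forcing it. This sketch assumes the fuser tool from the psmisc package (available on Ubuntu) and never forces the unmount; safe_umount is a hypothetical name.

```shell
# Hypothetical helper: unmount a JuiceFS mount point only if no process is using it.
safe_umount() {
  local mount_point="$1"
  if [ -z "$mount_point" ]; then
    echo "usage: safe_umount MOUNT_POINT" >&2
    return 2
  fi
  # fuser -m exits 0 when at least one process is accessing that file system;
  # sudo lets it see processes owned by other users.
  if sudo fuser -m "$mount_point" >/dev/null 2>&1; then
    echo "busy: processes are still using $mount_point:" >&2
    sudo fuser -vm "$mount_point"
    return 1
  fi
  sudo juicefs umount "$mount_point"
}

# Usage: safe_umount ~/mnt
```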

6. Auto-mount at boot

If you don’t want to manually remount JuiceFS every time you restart the system, you can set up automatic mounting.

First, copy the juicefs client to the /sbin/ directory under the name mount.juicefs:

$ sudo cp /usr/local/bin/juicefs /sbin/mount.juicefs

Edit the /etc/fstab configuration file and add a new record:

rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1    /home/herald/mnt       juicefs     _netdev,cache-size=20480     0  0

The mount option cache-size=20480 allocates 20 GiB of local disk space as the JuiceFS local cache. Choose the cache size according to your actual hardware, and adjust the FUSE mount options in the entry above as needed.
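To avoid adding a duplicate line when rerunning a setup script, the entry can be appended only if the mount point is not already listed. A sketch under these assumptions: add_fstab_entry is hypothetical, it writes the same field layout as the example above, and it falls back to sudo tee when the file is not writable. Keep in mind that /etc/fstab is normally world-readable, so the Redis password in the URL is visible to local users.

```shell
# Hypothetical helper: append a JuiceFS entry to an fstab-format file (default /etc/fstab)
# unless a non-comment entry already uses the same mount point.
add_fstab_entry() {
  local meta_url="$1" mount_point="$2" cache_mib="$3" fstab="${4:-/etc/fstab}" line
  if [ -z "$meta_url" ] || [ -z "$mount_point" ] || [ -z "$cache_mib" ]; then
    echo "usage: add_fstab_entry META_URL MOUNT_POINT CACHE_MIB [FSTAB]" >&2
    return 2
  fi
  if awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit (found ? 0 : 1) }' "$fstab"; then
    echo "already present: $mount_point" >&2
    return 0
  fi
  # Fields: device (META-URL), mount point, type, options, dump, pass.
  line=$(printf '%s    %s    juicefs    _netdev,cache-size=%s    0  0' \
    "$meta_url" "$mount_point" "$cache_mib")
  if [ -w "$fstab" ]; then
    printf '%s\n' "$line" >> "$fstab"
  else
    printf '%s\n' "$line" | sudo tee -a "$fstab" >/dev/null
  fi
}

# Usage (placeholder password), then test the entry with: sudo mount -a
# add_fstab_entry "rediss://default:your-password@HOST:25061/1" /home/herald/mnt 20480
```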

7. Multi-host shared

A JuiceFS file system can be mounted by multiple cloud servers at the same time, with no restriction on where those servers are located. This makes it easy to share data in real time between servers on the same platform, across different cloud platforms, and between public and private clouds.

Shared mounts of JuiceFS also provide strong data consistency: when multiple servers mount the same file system, a write that has been confirmed on the file system is immediately visible on all hosts.

To use shared mounts, every host that mounts the file system must be able to reach both the database and the object storage behind it. In the demonstration environment of this article, Spaces is reachable from the entire Internet and can be read and written through its API by anyone holding the correct access key. For the DigitalOcean managed Redis cluster, however, you need to configure trusted sources appropriately so that hosts outside the platform are allowed to connect; such hosts cannot use the private VPC address and must connect through the cluster's public hostname.

To mount the same file system on multiple hosts, create the file system on any one host, then install the JuiceFS client on every host and mount it with the mount command using the same database address. Note that the file system only needs to be created once; do not run format again on the other hosts.
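Before mounting on a new host, it is worth confirming that the host can actually reach the Redis cluster. The sketch below extracts the host and port from a META-URL and attempts a TCP connection; it assumes bash (for /dev/tcp) and the coreutils timeout command, and meta_host_port and can_reach are hypothetical names. A successful TCP connection does not prove that TLS or the password is correct.

```shell
# Hypothetical helper: print "HOST PORT" from a META-URL such as
# rediss://user:password@host:port/db.
meta_host_port() {
  local rest="${1#*://}"   # drop the scheme
  rest="${rest##*@}"       # drop user:password@ (up to the last @)
  rest="${rest%%/*}"       # drop /db
  printf '%s %s\n' "${rest%:*}" "${rest##*:}"
}

# Hypothetical helper: succeed if a TCP connection to HOST PORT opens within 3 seconds.
can_reach() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage (placeholder URL; the unquoted $(...) intentionally splits into HOST and PORT):
# META_URL="rediss://default:your-password@HOST:25061/1"
# can_reach $(meta_host_port "$META_URL") && echo "Redis port reachable"
```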

Summary

This article introduces the basics of installing and using JuiceFS on DigitalOcean, using Spaces object storage, and the platform-managed Redis database cluster to create and mount a file system.

If you are interested, you can also try creating file systems with object storage and cloud databases from other platforms. And if you are concerned about the reliability of Redis, you can try other databases such as MySQL, TiKV, or PostgreSQL; each offers a noticeably different experience.