By Adrien Payong and Shaoni Mukherjee
Enterprises, cloud providers, and data centers need high-performance networking and low-latency data transfer to operate efficiently.
Traditional networking methods show limitations because they cannot handle the growing data demands of AI/ML workloads.
That’s where Remote Direct Memory Access (RDMA) comes into play. RDMA allows direct data transfers between two computers’ memories without involving the CPU, operating system, or most network stack components.
In this article, we will provide an overview of RDMA technology, including how it works, its main protocols, how it compares with TCP/IP networking, practical applications, and common pitfalls. Systems engineers, cloud architects, and anyone interested in networking technology will find this guide a useful resource for understanding how RDMA can transform network infrastructure.
RDMA enables direct data transfers from one computer’s memory to another without involving the remote system’s CPU or operating system.
Traditional networking systems, like TCP/IP, process data packets by moving them through each layer of the OS networking stack, copying data between buffers at every stage and consuming CPU resources for each packet.
In a traditional transfer:
- The application hands data to the kernel through the socket API, and the kernel copies it into its own buffers.
- The TCP/IP stack segments the data, adds headers, and handles sequencing, congestion control, and retransmission.
- Interrupts and context switches occur as packets move between the application, the kernel, and the NIC.
- On the receiving side, the same work happens in reverse before the data reaches the application's memory.
The following describes the internal workflow of an RDMA data transfer between two systems, the Requester (initiator) and the Responder (receiver):
- Both sides register memory regions with their RDMA-capable NICs; registration pins the memory and produces keys that authorize local and remote access.
- Each side creates queue pairs (a send queue and a receive queue) along with completion queues, and the two sides exchange addressing information and memory keys to connect their queue pairs.
- The Requester posts a work request (for example, an RDMA Write or RDMA Read) that references the registered buffers.
- The NICs move the data directly between the registered memory regions, bypassing both hosts' CPUs and operating-system network stacks.
- Completion entries are placed in the completion queues, which the applications poll (or receive events for) to learn that the transfer has finished.
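On a Linux node with RDMA hardware, you can observe some of these objects (queue pairs and completion queues) on a live system with the iproute2 rdma tool. This is an optional sanity check rather than part of the transfer itself, and the device names and counts will differ on your hardware:

```bash
# List RDMA devices known to the kernel
rdma dev show

# Show currently allocated queue pairs (QPs) and completion queues (CQs)
rdma resource show qp
rdma resource show cq
```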
Understanding RDMA protocols will allow you to choose the optimal solution for your environment. RDMA protocols offer unique benefits and are suited for specific operational requirements.
InfiniBand is the highest-performing of the RDMA protocols, offering the lowest latency and highest throughput. High-performance computing environments rely heavily on InfiniBand, which powers many of the world's leading supercomputers.
Key characteristics of InfiniBand:
- Very low latency (on the order of a microsecond) and very high bandwidth, with recent generations reaching hundreds of Gbps per link.
- A lossless, switched fabric that uses credit-based flow control, with a subnet manager handling addressing and routing.
- Transport and RDMA processing offloaded to the host channel adapter (HCA), keeping CPU involvement minimal.
- Requires dedicated InfiniBand adapters, switches, and cabling, which raises cost compared to Ethernet-based alternatives.
RoCE (RDMA over Converged Ethernet) delivers RDMA functionality over standard Ethernet networks. This results in lower costs and improved accessibility compared to InfiniBand. There are two versions of RoCE:
- RoCE v1 runs directly on the Ethernet link layer, so its traffic cannot be routed and both endpoints must be in the same Layer 2 domain.
- RoCE v2 encapsulates RDMA traffic in UDP/IP, making it routable across Layer 3 networks; it is the version most commonly deployed today.
RoCE provides RDMA benefits for existing Ethernet setups, which creates an attractive solution for data centers wanting to enhance their network performance without replacing their existing infrastructure.
The Internet Wide Area RDMA Protocol (iWARP) provides RDMA functionality over standard Ethernet networks using TCP/IP. Because it wraps RDMA traffic inside TCP, it runs over ordinary, routable networks and does not require lossless switch configuration, although it still needs iWARP-capable NICs.
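If you are unsure which protocol a given adapter uses, ibv_devinfo (from ibverbs-utils) reports both the transport and the link layer. As a rough guide: an InfiniBand transport with an Ethernet link layer indicates RoCE, an InfiniBand link layer indicates native InfiniBand, and an iWARP transport indicates iWARP:

```bash
# Print the device name, transport type, and link layer for each RDMA device
ibv_devinfo | grep -E 'hca_id|transport|link_layer'
```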
RDMA differentiates itself from traditional TCP/IP networking through its performance features, making it ideal for high-performance applications that require low latency. The following table contrasts their main attributes:
Characteristic | Traditional TCP/IP | RDMA |
---|---|---|
Latency | Multiple context switches, protocol processing, higher latency (typically tens of microseconds or more) | Direct memory access; ultra-low latency (as low as 1–2 microseconds) |
Throughput | Requires multiple threads and high CPU usage to approach maximum bandwidth | Single-threaded throughput up to 40 Gbps with minimal CPU usage |
CPU Utilization | Significant CPU resources required, especially at high data rates | Near-zero CPU usage for data transfer operations |
Protocol Overhead | High overhead for error correction, congestion control, packet sequencing | Hardware-level management reduces protocol overhead |
Scalability | Bottlenecks emerge as node count increases due to CPU and protocol limitations | Scales efficiently with consistent performance even as nodes increase |
Below are the essential hardware components:
- RDMA-capable network adapters (RNICs): InfiniBand host channel adapters, or Ethernet NICs with RoCE or iWARP support.
- Switches: InfiniBand switches for native InfiniBand fabrics, or Ethernet switches that support Priority Flow Control (PFC) and ECN for lossless RoCE operation.
- Cabling and transceivers rated for the intended link speed.
RDMA has broad OS support, with Linux leading in high-performance environments:
Because Linux provides strong RDMA support, it is the operating system of choice for high-performance network environments. RDMA functionality is enabled through the rdma-core package and specific kernel modules that establish direct communication with RDMA hardware.
The Linux RDMA stack is built from these essential components:
- Kernel modules such as ib_core, ib_uverbs, and rdma_cm, plus vendor-specific drivers (for example, mlx5_ib for NVIDIA/Mellanox adapters).
- User-space libraries from the rdma-core project, chiefly libibverbs (the verbs API) and librdmacm (connection management).
- Management and diagnostic tools such as ibverbs-utils, infiniband-diags, and perftest.
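To see these pieces on a running system, you can check which modules are loaded and which devices the stack has registered. This is a quick sanity check; the module and device names shown will depend on your adapter:

```bash
# RDMA-related kernel modules currently loaded
lsmod | grep -E 'ib_core|ib_uverbs|rdma'

# RDMA devices registered with the kernel
ls /sys/class/infiniband/

# Link state as reported by the iproute2 rdma tool
rdma link show
```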
Implementing RDMA on Linux involves several essential stages.
In the following example, we will show how to establish RDMA communication between two Linux nodes. We will assume both servers (let’s call them nodeA and nodeB) have RDMA-capable network adapters and are running Ubuntu 22.04.
Step 1: Install RDMA Core Libraries and Drivers: We can install the RDMA libraries, drivers, and debugging utilities using our distribution’s package manager. These packages include drivers and tools to configure, manage, and test RDMA. For Ubuntu, this can be done by running:
sudo apt update
sudo apt install rdma-core ibverbs-utils infiniband-diags
Step 2: Enable and Start the RDMA Management Service: Next, enable and start the RDMA management service, which manages RDMA devices and is essential for RDMA operations:
sudo systemctl enable rdma
sudo systemctl start rdma
The enable command also ensures the RDMA service starts automatically after a system reboot.
Step 3: Configure Memory Limits: RDMA applications must pin (lock) the memory regions they register with the NIC, so users or groups running RDMA transfers need a high locked-memory (memlock) limit for stability and performance. Raise the limit by editing /etc/security/limits.d/rdma.conf:
@rdma soft memlock unlimited
@rdma hard memlock unlimited
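The new limits take effect at the next login session. After logging back in, a user who belongs to the rdma group (the group referenced by the @rdma entries above) can confirm the setting with ulimit:

```bash
# Should report "unlimited" for members of the rdma group
ulimit -l
```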
Step 4: Configure RDMA Network Interfaces: Assign IP addresses to the RDMA-capable network interfaces on both nodes. For example, if the name of your InfiniBand or RoCE interface is ib0, run the following commands:
On nodeA:
sudo ip addr add 192.168.100.1/24 dev ib0
sudo ip link set ib0 up
On nodeB:
sudo ip addr add 192.168.100.2/24 dev ib0
sudo ip link set ib0 up
This step prepares the interface for RDMA traffic.
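Before moving on, it can help to confirm that the RDMA port itself is up. ibstat (installed with infiniband-diags in Step 1) reports the port state and link layer; replace the device name below with the one shown by ibv_devinfo on your system:

```bash
# Example device name; check for "State: Active" and "Physical state: LinkUp"
ibstat mlx5_0
```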
Step 5: Test and Validate RDMA Connectivity: Once the setup is complete, you must ensure RDMA is working as expected.
Check for available RDMA devices:
ibv_devinfo
This command displays information about RDMA devices present on the system.
Test basic connectivity with ping:
ping 192.168.100.2 # From nodeA to nodeB
This ensures there is basic network connectivity between the nodes.
Test RDMA communication with rping:
On nodeB (as server):
rping -s -a 192.168.100.2
On nodeA (as client):
rping -c -a 192.168.100.2
Successful completion confirms the RDMA path is operational and performing as expected.
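For a rough performance check beyond rping, the perftest package provides RDMA bandwidth and latency microbenchmarks. A minimal sketch, assuming the same two nodes and addresses as above and that perftest is available in your distribution's repositories:

```bash
# Install the benchmark suite on both nodes
sudo apt install perftest

# On nodeB (server): wait for an incoming test
ib_write_bw

# On nodeA (client): run an RDMA Write bandwidth test against nodeB
ib_write_bw 192.168.100.2

# Latency can be measured the same way with ib_send_lat
```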
Microsoft Windows Server 2012 and subsequent versions incorporate RDMA support. Windows uses the NetworkDirect interface for native RDMA support. SMB Direct is a feature of SMB 3.0 that uses RDMA for faster file transfers.
Features:
- SMB Direct accelerates file shares, Storage Spaces Direct, and Hyper-V live migration traffic over RDMA.
- The NetworkDirect interface exposes RDMA to applications such as HPC workloads using MS-MPI.
- Supported fabrics include iWARP, RoCE, and InfiniBand adapters.
VMware vSphere 7.0 and later support RDMA for virtualization with RoCE v2 (RDMA over Converged Ethernet v2) adapters and NVMe over RDMA storage. This allows you to use RDMA inside virtualized environments to:
- expose paravirtual RDMA (PVRDMA) adapters to virtual machines for low-latency VM-to-VM communication,
- connect hosts to NVMe over Fabrics storage targets over RDMA with low CPU overhead, and
- preserve near-native network performance for latency-sensitive workloads running in VMs.
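On an ESXi host, you can check from the ESXi shell whether RDMA-capable adapters were detected. The command below is the esxcli namespace VMware documents for RDMA devices; the device names it lists (vmrdma0 and so on) depend on the host's hardware:

```bash
# List RDMA devices detected by the ESXi host
esxcli rdma device list
```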
RDMA technology has improved performance, efficiency, and scalability in various fields and use cases. Here are the most common use cases where RDMA can provide tangible value:
Use Case | How RDMA Adds Value | Example Technologies / Scenarios |
---|---|---|
High-Performance Computing (HPC) Clusters | Enables ultra-low-latency message passing and shared storage between nodes. Essential for parallel scientific computing, simulations, and applications where every microsecond counts. | InfiniBand networks for supercomputers; MPI (Message Passing Interface); Storage systems like Lustre and GPFS. |
AI/ML Distributed Training | Enables fast parameter synchronization and data transfer between GPUs across nodes to improve scaling and speed up the training of large deep learning and AI models. | TensorFlow, PyTorch, NVIDIA NCCL, GPUDirect RDMA; RoCE/InfiniBand networks in multi-GPU clusters. |
Cloud Storage & Data Services | Provides high-throughput, low-latency access to remote disks for fast storage services, improved data replication, and better performance for cloud databases and distributed caches. | NVMe over Fabrics (NVMe-oF); Microsoft SMB Direct & Storage Spaces Direct; iSER for block storage, MySQL. |
Private Cloud & Virtualized Data Centers | Provides nearly native network speeds for virtual machines and containers, supporting low-latency applications and enabling scalable, high-performance cloud environments within virtualized setups. | VMware ESXi with PVRDMA or SR-IOV; Azure RDMA-enabled VMs; Low-latency trading |
Big Data Analytics & In-Memory Computing | Accelerates data transfers and inter-process communications in analytics and streaming frameworks, minimizing tail latencies and assisting real-time, large-scale data processing. | Apache Spark, Apache Kafka, Apache Ignite; RDMA-accelerated microservices and market data feeds. |
Caching & Database Systems | Reduces latency for distributed cache operations and database replication, ensuring swift access to in-memory data across cluster nodes—ideal for cloud-native and SaaS solutions. | DigitalOcean Managed Valkey with RDMA support; RDMA-enabled Memcached. |
Below is a table of the most common RDMA mistakes and misconceptions that occur during deployment, with a brief explanation for each one.
Mistake or Misconception | Explanation |
---|---|
Assuming RDMA works on any network card | RDMA requires RDMA-capable NICs (RNICs) on both ends. Attempting RDMA with non-capable hardware or without enabling the RDMA feature (e.g., through the drivers/firmware) will fail. |
Misconfiguring the network for RoCE | When using RoCE, the network switches must be configured for PFC (Priority-based Flow Control) and ECN (Explicit Congestion Notification) to avoid packet drops and achieve optimal performance. |
Confusing RDMA protocols and compatibility | InfiniBand and Ethernet are not directly compatible; RoCE and iWARP are different protocols and not interoperable. Mixing hardware or assuming “RDMA = InfiniBand” can cause compatibility issues. |
Neglecting the need for application support | Enabling RDMA in hardware and network does not speed up every workload. Applications must be RDMA-aware and specifically configured to leverage RDMA transports for benefits. |
Expecting RDMA to solve all performance problems | While RDMA reduces network overhead, bear in mind that it won’t solve issues arising from other bottlenecks like disk I/O, CPU limitations, or software inefficiencies. Conducting thorough profiling is essential to ensure RDMA addresses actual pain points. |
Believing RDMA is unreliable or risky | Modern RDMA technologies such as InfiniBand and RoCE are open standards, enjoy large support, and are highly reliable when configured correctly. Security features and interoperability have seen considerable improvements. |
Ignoring maintenance and tuning | Neglecting essential maintenance tasks like firmware updates, buffer tuning, and network monitoring can lead to performance drops or system errors. Regular checks and monitoring are key to ensure RDMA operates smoothly and efficiently. |
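As a concrete illustration of the application-support point above, a distributed training job only benefits from RDMA if its communication library is configured to use it. NVIDIA NCCL, for example, exposes environment variables for this; the device name below is an assumption about your hardware:

```bash
# Keep NCCL's InfiniBand/RoCE transport enabled (0 = do not disable it)
export NCCL_IB_DISABLE=0

# Point NCCL at a specific RDMA-capable adapter (example device name)
export NCCL_IB_HCA=mlx5_0
```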
Knowledge of these mistakes can help you get the most out of your RDMA architecture. This way, you can avoid significant troubleshooting down the road.
What is RDMA used for?
RDMA technology is used in high-performance computing, cloud storage, AI/ML clusters, virtualization, and financial systems for ultra-low latency and high-performance data transfers.
How is RDMA different from TCP/IP?
Unlike traditional network protocols such as TCP/IP, RDMA bypasses the standard network stack and lets one machine read from or write to another machine's registered memory directly, which speeds up data transfers between applications.
Does RDMA require special hardware?
Yes, to use RDMA, you’ll need network interface cards (NICs) that support RDMA. For some protocols, you might also need switches that support lossless data transmission to maximize performance.
What OS supports RDMA?
RDMA support is available across multiple platforms, including Linux (via rdma-core), Windows Server 2012 and later (via NetworkDirect), and various UNIX-based systems. In addition, VMware vSphere offers RDMA functionality for virtualized environments.
Is RDMA only for supercomputers?
No. While RDMA was originally developed for high-performance computing, it is now widely adopted in cloud services, enterprise data centers, AI and machine learning workloads, and virtualization setups.
Enterprises, cloud providers, and data centers can use RDMA to satisfy the demands of AI, ML, and other data-intensive applications for low latency, high throughput, and CPU-efficient networking.
With RDMA, you bypass the TCP/IP stack and instead use protocols such as InfiniBand or RoCE, which can deliver lower latency, higher throughput, and better CPU efficiency. RDMA is supported on Linux, Windows Server, and VMware platforms, provided the underlying hardware and network infrastructure are compatible.
If your organization is considering a data center upgrade, it is essential to understand RDMA’s architecture, practical considerations, and use cases.
You can further improve your understanding by exploring the following articles:
Integrating RDMA with these solutions will ensure a scalable, future-ready network foundation for your most demanding workloads.