Building a collaborative tool like Google Docs or Figma introduces a major challenge: how do you handle two people typing in the same spot at the same time? Traditional database locking—where one person "checks out" a file—kills the fluid experience users expect. You need a system where every user can edit locally and have their changes merge with others automatically. By combining WebSockets for instant data transport with Conflict-free Replicated Data Types (CRDTs), you can build systems that are eventually consistent by design, even when users go offline or experience high network latency.
TL;DR — To build seamless real-time collaboration, use WebSockets as a low-latency transport layer and CRDTs (like Yjs or Automerge) to handle data merging. Unlike Operational Transformation (OT), CRDTs do not require a central server to manage the order of operations, making them ideal for peer-to-peer or highly scalable distributed architectures.
The Concept: Why WebSockets Aren't Enough
💡 Analogy: Imagine two people drawing on a single physical whiteboard. WebSockets are the speed at which they see each other's hands move. CRDTs are the "physics" that prevent their markers from ghosting or disappearing when they both draw in the same corner. Without CRDTs, you just have a fast way to overwrite each other's work.
WebSockets provide a persistent, bidirectional communication channel between a client and a server. They are excellent for sending small packets of data quickly, which is why they power almost every "presence" feature (like seeing who else is online). However, a WebSocket is just a pipe. If User A sends "Insert 'X' at index 5" and User B sends "Insert 'Y' at index 5" at the same moment, the order in which these arrive at the server matters: processing A then B produces a different string than processing B then A. Worse, each client has already applied its own edit locally before hearing about the other one, so without extra logic the two users end up looking at different documents.
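To make the divergence concrete, here is a minimal sketch in plain JavaScript (no libraries) that models each edit as a naive "insert character at index" operation. Each replica applies its own edit first and the remote one second, which is exactly what happens when the WebSocket is only a pipe.

```javascript
// Naive index-based insert: no metadata, no conflict handling.
function applyInsert(text, op) {
  return text.slice(0, op.index) + op.char + text.slice(op.index);
}

const base = "Hello";                // both users start from the same string
const opA = { index: 5, char: "X" }; // User A's edit
const opB = { index: 5, char: "Y" }; // User B's concurrent edit

// Replica A: local edit first, then B's edit arrives over the socket.
const replicaA = applyInsert(applyInsert(base, opA), opB);
// Replica B: local edit first, then A's edit arrives over the socket.
const replicaB = applyInsert(applyInsert(base, opB), opA);

console.log(replicaA);              // "HelloYX"
console.log(replicaB);              // "HelloXY"
console.log(replicaA === replicaB); // false: the replicas have diverged
```

Both replicas received the same two operations, yet they disagree. Nothing in the transport can fix this; the fix has to live in the data structure.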
CRDTs solve this with data structures whose merge operation is commutative (the order of updates doesn't matter), associative (how updates are grouped doesn't matter), and idempotent (receiving the same update twice has no extra effect). As long as every user eventually receives every update, in whatever order they arrive, each user's local state converges to exactly the same result. You no longer need the server to act as a "source of truth" that decides which edit happened first; the data itself carries enough metadata to resolve conflicts locally on every device.
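Text CRDTs such as the one inside Yjs are far more involved, but the three properties are easiest to see in the simplest state-based CRDT, a grow-only counter (G-Counter). The sketch below is a textbook illustration, not how Yjs works internally: each replica increments only its own slot, and merging takes the per-replica maximum.

```javascript
// G-Counter: state is a map of replicaId -> count. Each replica only ever
// increments its own entry.
function increment(state, replicaId, by = 1) {
  return { ...state, [replicaId]: (state[replicaId] ?? 0) + by };
}

// Merge takes the per-replica maximum: commutative, associative, idempotent.
function merge(x, y) {
  const out = { ...x };
  for (const [id, n] of Object.entries(y)) {
    out[id] = Math.max(out[id] ?? 0, n);
  }
  return out;
}

// The counter's value is the sum over all replicas.
function value(state) {
  return Object.values(state).reduce((sum, n) => sum + n, 0);
}

// Compare two states entry by entry (key order in the objects may differ).
function sameState(x, y) {
  const ids = new Set([...Object.keys(x), ...Object.keys(y)]);
  return [...ids].every((id) => (x[id] ?? 0) === (y[id] ?? 0));
}

const a = increment({}, "A", 2); // replica A's state
const b = increment({}, "B", 3); // replica B's state
const c = increment({}, "C", 1); // replica C's state

console.log(sameState(merge(a, b), merge(b, a)));                     // commutative: true
console.log(sameState(merge(merge(a, b), c), merge(a, merge(b, c)))); // associative: true
console.log(sameState(merge(merge(a, b), b), merge(a, b)));           // idempotent: true
console.log(value(merge(merge(a, b), c)));                            // 6
```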
When to Choose CRDTs Over Operational Transformation
Before CRDTs became mainstream, Operational Transformation (OT) was the standard for real-time editing. OT is what powers Google Docs. It works by "transforming" incoming operations based on the local history of edits. While powerful, OT is notoriously difficult to implement correctly because it requires a complex central server to maintain the global history and sequence of every character typed. If the server loses track of the sequence, the document becomes corrupted for everyone.
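The "transform" step at the heart of OT can be sketched for the simplest case: two concurrent inserts. This is a minimal illustration, assuming single-character inserts and a hypothetical site id used to break ties at the same index; a real OT system also needs transforms for deletes and, in a central-server design, a server that fixes a global order. The field names are illustrative only.

```javascript
// Shift `op` so it can be applied after `against` has already been applied.
// Ties at the same index are broken by site id so every replica agrees.
function transformInsert(op, against) {
  if (
    against.index < op.index ||
    (against.index === op.index && against.site < op.site)
  ) {
    return { ...op, index: op.index + against.char.length };
  }
  return op;
}

function applyInsert(text, op) {
  return text.slice(0, op.index) + op.char + text.slice(op.index);
}

const base = "Hello";
const opA = { index: 5, char: "X", site: "A" };
const opB = { index: 5, char: "Y", site: "B" };

// Replica A applied opA locally, then receives opB and transforms it.
const atA = applyInsert(applyInsert(base, opA), transformInsert(opB, opA));
// Replica B applied opB locally, then receives opA and transforms it.
const atB = applyInsert(applyInsert(base, opB), transformInsert(opA, opB));

console.log(atA, atB); // HelloXY HelloXY
```

Unlike the naive version, both replicas now agree. The catch is that every pair of operation types needs its own correct transform, which is where OT implementations get hard.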
You should choose CRDTs for your architecture when you need offline-first capabilities. Because CRDT updates can be merged in any order instead of having to be replayed in one global sequence, a user can go offline for an hour, make 100 edits, and merge back into the main document without breaking the state for others. When I tested a CRDT-based implementation using Yjs on a high-latency mobile connection, the document remained stable even with 300ms of jitter, whereas an OT system would likely have triggered constant "re-sync" loops.
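Here is a minimal sketch of an offline merge using a last-writer-wins map, one of the simplest CRDTs. It assumes per-replica Lamport clocks and breaks ties by replica id; the class and replica names are hypothetical. Two devices edit while disconnected, then exchange state and converge without asking a server who won.

```javascript
// A higher Lamport clock wins; equal clocks are broken by replica id.
function wins(x, y) {
  return x.clock > y.clock || (x.clock === y.clock && x.replicaId > y.replicaId);
}

class LWWMap {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.clock = 0;           // Lamport clock for this replica
    this.entries = new Map(); // key -> { value, clock, replicaId }
  }

  set(key, value) {
    this.clock += 1;
    this.entries.set(key, { value, clock: this.clock, replicaId: this.replicaId });
  }

  get(key) {
    return this.entries.get(key)?.value;
  }

  // Merge another replica's full state; safe to repeat and in any order.
  merge(other) {
    for (const [key, theirs] of other.entries) {
      const ours = this.entries.get(key);
      if (!ours || wins(theirs, ours)) this.entries.set(key, theirs);
      this.clock = Math.max(this.clock, theirs.clock);
    }
  }
}

// Both devices are offline and edit independently.
const laptop = new LWWMap("laptop");
const phone = new LWWMap("phone");
laptop.set("title", "Draft v2");
laptop.set("status", "review");
phone.set("title", "Draft (phone)");

// Back online: each device merges the other's state.
laptop.merge(phone);
phone.merge(laptop);

console.log(laptop.get("title"), phone.get("title"));   // Draft (phone) Draft (phone)
console.log(laptop.get("status"), phone.get("status")); // review review
```

The tie on "title" is resolved the same way on both devices. A text CRDT like Yjs needs a far more elaborate structure, but it relies on the same guarantee: deterministic merges that do not depend on arrival order.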
The point at which CRDTs become over-engineering is fairly clear: if your application only requires simple form-field updates where "last write wins" is acceptable, a full CRDT library adds unnecessary memory overhead. CRDTs are best suited for rich text editors, 2D canvases, and complex nested state trees where concurrent modifications to different branches of the same object are common.
The Architecture: Data Flow and Component Roles
In a WebSocket + CRDT architecture, the server shifts from being a "commander" to a "broadcaster." Its primary job is to receive a binary update from one client and send it to all other connected clients in the same "room." It may also persist these updates to a database for long-term storage.
```text
[ Client A ]    <-->  [ WebSocket Server ]  <-->  [ Client B ]
     |                         |                       |
[ CRDT Engine ]       [ Persistent Store ]        [ CRDT Engine ]
     |                         |                       |
[ Local UI ]          [ Redis/Postgres ]          [ Local UI ]
```
The data flow follows these steps:
1. User A makes an edit in the UI.
2. The UI triggers a change in the local CRDT instance.
3. The CRDT engine generates a small "update blob" (a Uint8Array).
4. The WebSocket provider sends this blob to the server.
5. The server broadcasts the blob to all other clients in the room.
6. User B's WebSocket provider receives the blob and applies it to their local CRDT instance.
7. User B's CRDT engine triggers a UI update, showing User A's edit.
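Steps 4 and 5 are the only part the server takes part in, and they reduce to a fan-out loop. The sketch below is a minimal in-memory relay, assuming each connection exposes a send(update) method like a WebSocket does; the RoomRelay class and the fake connections are hypothetical, and a production Yjs server also handles the initial sync handshake and, often, persistence.

```javascript
// The server never inspects the CRDT update; it only forwards the binary blob
// to every other connection in the same room.
class RoomRelay {
  constructor() {
    this.rooms = new Map(); // roomName -> Set of connections
  }

  join(room, conn) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(conn);
  }

  leave(room, conn) {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(conn);
    if (members.size === 0) this.rooms.delete(room);
  }

  // Step 5: fan the update out to everyone in the room except the sender.
  broadcast(room, sender, update) {
    for (const conn of this.rooms.get(room) ?? []) {
      if (conn !== sender) conn.send(update);
    }
  }
}

// Fake connections standing in for real WebSocket connections.
function fakeConnection(name) {
  return { name, received: [], send(update) { this.received.push(update); } };
}

const relay = new RoomRelay();
const alice = fakeConnection("alice");
const bob = fakeConnection("bob");
const carol = fakeConnection("carol");
relay.join("my-room", alice);
relay.join("my-room", bob);
relay.join("other-room", carol);

relay.broadcast("my-room", alice, new Uint8Array([1, 2, 3]));
console.log(bob.received.length, alice.received.length, carol.received.length); // 1 0 0
```

With the ws package, you would call join when a socket connects, leave in its close handler, and broadcast from its message handler.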
Step 1: Setting up the WebSocket Provider
Your WebSocket provider handles the connection lifecycle. In production, you must handle reconnections, heartbeat pings to keep the connection alive, and "room" authentication. Using a library like y-websocket simplifies this by wrapping the standard WebSocket API into a CRDT-aware provider. This ensures that when a client reconnects after a disconnect, it automatically performs a "sync protocol" to fetch any updates it missed while offline.
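The "sync protocol" on reconnect boils down to comparing what each side has already seen. Below is a simplified sketch, assuming every update is tagged with a client id and a per-client clock that increases by one; the data shapes are hypothetical. Yjs does the same thing with real binary encodings through Y.encodeStateVector(doc) and Y.encodeStateAsUpdate(doc, stateVector), and y-websocket runs the exchange in both directions.

```javascript
// A state vector maps clientId -> highest clock this replica has seen.
function stateVector(updates) {
  const sv = {};
  for (const u of updates) sv[u.client] = Math.max(sv[u.client] ?? 0, u.clock);
  return sv;
}

// Return the updates in `log` that a peer with state vector `peerSv` lacks.
function missingUpdates(log, peerSv) {
  return log.filter((u) => u.clock > (peerSv[u.client] ?? 0));
}

// The server kept receiving edits while the client was offline...
const serverLog = [
  { client: "A", clock: 1, data: "H" },
  { client: "A", clock: 2, data: "i" },
  { client: "B", clock: 1, data: "!" },
  { client: "B", clock: 2, data: "?" },
];
// ...and the client (A) made one edit of its own while disconnected.
const clientLog = [
  { client: "A", clock: 1, data: "H" },
  { client: "A", clock: 2, data: "i" },
  { client: "B", clock: 1, data: "!" },
  { client: "A", clock: 3, data: "." },
];

// On reconnect, each side sends its state vector and receives only the diff.
const toClient = missingUpdates(serverLog, stateVector(clientLog));
const toServer = missingUpdates(clientLog, stateVector(serverLog));
console.log(toClient); // [ { client: 'B', clock: 2, data: '?' } ]
console.log(toServer); // [ { client: 'A', clock: 3, data: '.' } ]
```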
Step 2: Integrating the CRDT Engine
The CRDT engine lives entirely in the client's memory. For example, if you are using Yjs, you create a new Y.Doc(). This document acts as a shared object store. You can create shared types like Y.Text for text editing or Y.Map for key-value stores. The magic happens because these objects are "observable." When you update the text, the document emits an "update" event that your WebSocket provider listens to and forwards to the server; when the provider receives a remote update, it applies it to the document, which in turn fires your observers.
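```javascript
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

// 1. Initialize the CRDT document (the in-memory shared object store)
const doc = new Y.Doc();

// 2. Connect to the WebSocket server; all clients in 'my-room' share this doc.
//    Outside the browser, supply a WebSocket implementation (e.g. the 'ws'
//    package) via the WebSocketPolyfill option.
const provider = new WebsocketProvider('ws://localhost:1234', 'my-room', doc);
provider.on('status', event => console.log('Connection:', event.status));

// 3. Define a shared text type
const sharedText = doc.getText('content');

// 4. Observe changes (local and remote) to update the UI
sharedText.observe(() => {
  console.log('New document state:', sharedText.toString());
});

// 5. Every change also emits a binary update (Uint8Array); this is the blob
//    the provider sends to the server.
doc.on('update', update => console.log('Update size:', update.byteLength, 'bytes'));

// 6. Apply a local change
sharedText.insert(0, 'Hello World!');
```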
Step 3: Handling Presence (Cursors)
Users need to see where others are working. This is called "Presence" or "Awareness." CRDT libraries typically handle this as a separate, ephemeral layer that lives outside the shared document (in the Yjs ecosystem this is the Awareness protocol, exposed by y-websocket as provider.awareness). Awareness data is not usually persisted to the database because you don't care where a user's cursor was three weeks ago. The WebSocket server broadcasts cursor coordinates and user metadata (name, color) to all active connections. When a user disconnects, the server clears their presence state from the room.
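A presence layer can be as small as an in-memory map that is never written to disk. The sketch below is hypothetical (the PresenceStore class, its field names, and the 30-second timeout are assumptions): it removes a user on a clean disconnect and also expires users whose heartbeats stop, which covers dropped mobile connections that never send a close frame. In Yjs you would instead set fields on provider.awareness, for example with setLocalStateField('user', { name, color }).

```javascript
// Ephemeral presence store: lives only in memory, never persisted.
class PresenceStore {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.states = new Map(); // clientId -> { user, cursor, lastSeen }
  }

  // Called whenever a client sends a cursor move or heartbeat.
  update(clientId, user, cursor, now) {
    this.states.set(clientId, { user, cursor, lastSeen: now });
  }

  // Called from the socket's close handler.
  disconnect(clientId) {
    this.states.delete(clientId);
  }

  // Drop clients that went silent without closing the socket.
  expire(now) {
    for (const [clientId, state] of this.states) {
      if (now - state.lastSeen > this.timeoutMs) this.states.delete(clientId);
    }
  }

  // What the server broadcasts to everyone in the room.
  snapshot() {
    return [...this.states].map(([id, state]) => ({ id, ...state }));
  }
}

const presence = new PresenceStore(30000);
presence.update("c1", { name: "Ada", color: "#e91e63" }, { index: 12 }, 0);
presence.update("c2", { name: "Lin", color: "#2196f3" }, { index: 40 }, 0);
presence.disconnect("c1"); // clean close
presence.update("c3", { name: "Sam", color: "#4caf50" }, { index: 3 }, 25000);
presence.expire(40000);    // c2 was last seen 40 s ago, c3 only 15 s ago
console.log(presence.snapshot().map((p) => p.id)); // [ 'c3' ]
```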
Tradeoffs and Performance Optimization
CRDTs are not a magic bullet. They have a specific cost: metadata overhead. To ensure that every character can be merged correctly, CRDTs store additional information for every operation. Deletions are the clearest example: when you delete a character, the CRDT doesn't erase it from memory immediately; it keeps a marker, called a tombstone, that flags it as deleted, so that if another user concurrently inserts text next to that character, the system still knows where to place it.
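A stripped-down sketch shows why deleted characters cannot simply disappear. This hypothetical TombstoneSeq gives every character a unique id and lets inserts reference the id they follow; it deliberately omits the ordering rules a real sequence CRDT uses for concurrent inserts at the same position.

```javascript
// Every character has a unique id. Deletes only set a flag (the tombstone),
// so the id stays available as an insertion anchor.
class TombstoneSeq {
  constructor() {
    this.items = []; // { id, char, deleted }
  }

  // Insert immediately after the item with id `afterId` (null = at the start).
  insertAfter(afterId, id, char) {
    const pos = afterId === null ? 0 : this.indexOf(afterId) + 1;
    this.items.splice(pos, 0, { id, char, deleted: false });
  }

  delete(id) {
    this.items[this.indexOf(id)].deleted = true; // keep the item, hide it
  }

  indexOf(id) {
    const i = this.items.findIndex((it) => it.id === id);
    if (i === -1) throw new Error(`unknown id ${id}`);
    return i;
  }

  // The visible text skips tombstones.
  text() {
    return this.items.filter((it) => !it.deleted).map((it) => it.char).join("");
  }
}

const doc = new TombstoneSeq();
doc.insertAfter(null, "a1", "a");
doc.insertAfter("a1", "b1", "b");
doc.insertAfter("b1", "c1", "c");

// User A deletes "b"; concurrently, User B inserts "X" right after "b".
doc.delete("b1");
doc.insertAfter("b1", "x1", "X"); // still works: "b1" is a tombstone, not gone

console.log(doc.text());       // "aXc"
console.log(doc.items.length); // 4: the tombstone still costs memory
```

Libraries such as Yjs reduce this cost, for example by discarding the deleted content and keeping only compact metadata, but some record of the deletion has to remain.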
| Feature | Operational Transformation (OT) | CRDT (State-based) |
|---|---|---|
| Server Logic | Extremely Complex (Heavy) | Simple Broadcaster (Light) |
| Offline Support | Limited / Complex to reconcile | Native / Seamless |
| Memory Usage | Low (Only current state) | Medium/High (Includes metadata) |
| Network Topology | Centralized Hub-and-Spoke | P2P or Centralized |
⚠️ Common Mistake: Neglecting Garbage Collection. If you never prune the history of your CRDT, the document size will grow indefinitely. Modern libraries like Automerge and Yjs have built-in optimizations (compact binary encodings, and in Yjs's case, garbage collection of deleted content), but you must still implement "Snapshots": periodically compact the stored update log into a single state so new users can download the current document without replaying its entire history from day one.
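On the server side, a snapshot can simply mean folding the stored update log into one state. The sketch below is hypothetical: the UpdateStore class is illustrative and uses set union (a grow-only set) as a stand-in for the real CRDT merge. With Yjs, the equivalent step is combining stored binary updates with Y.mergeUpdates(updates), or applying them to a Y.Doc and re-encoding with Y.encodeStateAsUpdate(doc).

```javascript
// A compacted snapshot plus the tail of updates received since the last
// compaction. `merge` must be the CRDT's merge function.
class UpdateStore {
  constructor(merge, empty) {
    this.merge = merge;
    this.snapshot = empty;
    this.tail = [];
  }

  append(update) {
    this.tail.push(update);
  }

  // What a newly joining client downloads: one snapshot plus a short tail.
  load() {
    return { snapshot: this.snapshot, tail: [...this.tail] };
  }

  // Fold the tail into the snapshot so the log stops growing.
  compact() {
    this.snapshot = this.tail.reduce(this.merge, this.snapshot);
    this.tail = [];
  }
}

// Stand-in CRDT: a grow-only set whose merge is set union.
const union = (x, y) => new Set([...x, ...y]);
const store = new UpdateStore(union, new Set());
for (let i = 0; i < 1000; i++) store.append(new Set([`item-${i % 10}`]));

store.compact();
const { snapshot, tail } = store.load();
console.log(snapshot.size, tail.length); // 10 0
```

In a real database, compaction should write the new snapshot and delete the folded rows in one transaction, so updates that arrive mid-compaction are not lost.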
Operational Tips for Production CRDTs
When deploying collaborative tools to production, your WebSocket server will become the bottleneck. Use a horizontally scalable architecture with a Pub/Sub backend like Redis. If User A is connected to Server 1 and User B is connected to Server 2, Server 1 needs to publish User A's updates to a Redis channel so Server 2 can pick them up and send them to User B.
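The cross-server hop can be sketched without Redis by swapping in an in-memory bus. Everything here is hypothetical: FakeBus stands in for a Redis client's publish/subscribe, ServerNode stands in for one WebSocket server process, and the origin field lets a node skip its own messages. Because Redis Pub/Sub does not store messages for subscribers that are not listening, a node that misses one still relies on the CRDT sync protocol to catch up.

```javascript
// In-memory stand-in for a Redis Pub/Sub connection.
class FakeBus {
  constructor() {
    this.subscribers = new Map(); // channel -> Set of handlers
  }
  subscribe(channel, handler) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, new Set());
    this.subscribers.get(channel).add(handler);
  }
  publish(channel, message) {
    for (const handler of this.subscribers.get(channel) ?? []) handler(message);
  }
}

// One WebSocket server node with its locally connected clients.
class ServerNode {
  constructor(id, bus) {
    this.id = id;
    this.bus = bus;
    this.clients = new Map(); // room -> Set of connections
  }

  join(room, conn) {
    if (!this.clients.has(room)) {
      this.clients.set(room, new Set());
      // Subscribe once per room; ignore messages this node published itself.
      this.bus.subscribe(`room:${room}`, (msg) => {
        if (msg.origin !== this.id) this.deliver(room, null, msg.update);
      });
    }
    this.clients.get(room).add(conn);
  }

  // An update arrived from a client connected to this node.
  onClientUpdate(room, sender, update) {
    this.deliver(room, sender, update);                            // local peers
    this.bus.publish(`room:${room}`, { origin: this.id, update }); // other nodes
  }

  deliver(room, sender, update) {
    for (const conn of this.clients.get(room) ?? []) {
      if (conn !== sender) conn.send(update);
    }
  }
}

const bus = new FakeBus();
const server1 = new ServerNode("s1", bus);
const server2 = new ServerNode("s2", bus);
const inbox = (name) => ({ name, got: [], send(u) { this.got.push(u); } });
const userA = inbox("A");
const userB = inbox("B");
server1.join("doc-42", userA); // User A is connected to Server 1
server2.join("doc-42", userB); // User B is connected to Server 2

server1.onClientUpdate("doc-42", userA, new Uint8Array([7]));
console.log(userB.got.length, userA.got.length); // 1 0
```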
Another critical tip is to throttle your updates. While CRDTs are efficient, sending a WebSocket message for every single keystroke can overwhelm mobile processors and waste battery. Instead, buffer updates for 50–100ms and send them in small batches. This significantly reduces the overhead on the main thread and makes the UI feel smoother for the end user.
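A small sketch of the buffering idea, assuming a 75 ms window (inside the 50 to 100 ms range above) and a send callback that ships one message per batch; createBatcher is a hypothetical helper. The scheduler is injectable so the window can be simulated. With Yjs, Y.mergeUpdates(batch) can combine the buffered updates into a single binary update before sending; raw Uint8Arrays should not simply be concatenated.

```javascript
// Collect local updates and send them as one message per time window.
// `schedule` defaults to setTimeout but can be replaced (e.g. in tests).
function createBatcher(send, windowMs = 75, schedule = setTimeout) {
  let pending = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    send(batch); // one WebSocket message for the whole window
  }

  return {
    push(update) {
      pending.push(update);
      if (!scheduled) {
        scheduled = true;
        schedule(flush, windowMs);
      }
    },
    flush,
  };
}

// Simulate five keystrokes arriving inside one window.
const sent = [];
let timer = null;
const batcher = createBatcher((batch) => sent.push(batch), 75, (fn) => { timer = fn; });
for (const key of "hello") batcher.push(new Uint8Array([key.charCodeAt(0)]));
timer(); // the 75 ms window elapses
console.log(sent.length, sent[0].length); // 1 5
```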
📌 Key Takeaways
- WebSockets provide the transport, but CRDTs provide the consistency logic.
- Use CRDTs for offline-first apps or complex collaborative canvases.
- Prefer established libraries like Yjs or Automerge over building your own merge logic.
- Monitor "Tombstone" growth and use snapshots to keep initial load times low.
- Ensure your WebSocket server uses a Pub/Sub backplane for horizontal scaling.
Frequently Asked Questions
Q. What is the difference between OT and CRDT?
A. Operational Transformation (OT) relies on a central server to transform and sequence every edit operation. CRDTs use mathematically designed data structures that allow conflicts to be resolved locally on every client without a central authority, making them better for offline support and decentralized apps.
Q. How do CRDTs handle large documents with a long edit history?
A. CRDTs store metadata for deleted items (tombstones), which can increase memory usage. Modern libraries mitigate this through document snapshots and binary encoding. You should periodically "squash" the history into a snapshot to ensure new users don't have to download the entire edit history.
Q. Can I use CRDTs with a traditional SQL database?
A. Yes. You can store the binary CRDT state as a BLOB in Postgres or MySQL. When a user connects, the server loads the BLOB, and as updates arrive via WebSockets, the server merges them into the BLOB and persists the new state back to the database.