Building Truly Offline Apps Isn’t Magic: Local-First with Next.js, Flutter, and On-Device AI

Starting a new app, most of us wire up the backend before we even sketch the core screens—then hope our offline mode “just works later.” It rarely does. If you’ve ever patched cache misses, handled sync bugs, or watched your app freeze when Wi-Fi drops, you know why local-first development matters.
I decided to do it right this time, from the start. Here’s how I set up local-first in Next.js and Flutter, built a sync layer that’s optional (not required), and ran Gemma 3n for offline inference right on the device.
What Does “Local-First” Actually Mean?
Data on the device is always the source of truth. Your UI only ever talks to local storage. Everything else—cloud sync, cross-device sharing, even AI inference—is an add-on. If users lose connection, the app keeps working. If sync fails, data never disappears. If a user wants AI features, they get them instantly and privately, no backend calls.
Core requirements:
- All app logic must work 100% offline (CRUD, search, inference)
- Every mutation logs as an immutable operation (WAL pattern)
- Deterministic reducers merge concurrent edits
- Schema migrations never require cloud connectivity
- Sync over any transport is optional; nothing in the app functionally depends on it
Next.js Implementation
1. Local Storage: IndexedDB
I use IndexedDB via Dexie. There are two stores:
- `ops`: Write-ahead log (append-only)
- `docs`: Materialized current state
2. Reducer Pattern
Every change is an operation object, appended to WAL, then folded with a deterministic reducer to update docs.
```ts
// IndexedDB setup
import Dexie from "dexie";

const db = new Dexie("localfirst");
db.version(1).stores({
  ops: "++id,op_id,lamport_ts",
  docs: "[coll+id],lamport_ts"
});

// Example op
const op = {
  op_id: crypto.randomUUID(),
  lamport_ts: Date.now(),
  actor: "device_123",
  kind: "todo.edit",
  payload: { id: "a", title: "updated" }
};

await db.ops.add(op);
// Then update docs via reducer
```
Reducers must be deterministic and order-independent where possible. For basic docs, I use “last writer wins” with per-field timestamps; for lists, I use CRDT sequences if I want rich merges.
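To make that concrete, here is a minimal sketch of a per-field last-writer-wins reducer. The `Doc`/`Op` shapes and the per-field timestamp bookkeeping are assumptions about one way to structure it, not the exact code behind this post.

```ts
type Doc = {
  coll: string;
  id: string;
  fields: Record<string, unknown>;
  fieldTs: Record<string, number>; // lamport timestamp of the last write per field
};

type Op = {
  op_id: string;
  lamport_ts: number;
  actor: string;
  kind: string;
  payload: { id: string; [field: string]: unknown };
};

// Each field keeps the write with the highest lamport_ts, so replaying the same
// ops in any order yields the same doc. (A real implementation would also break
// timestamp ties deterministically, e.g. by actor id.)
function reduce(doc: Doc, op: Op): Doc {
  const next: Doc = { ...doc, fields: { ...doc.fields }, fieldTs: { ...doc.fieldTs } };
  for (const [field, value] of Object.entries(op.payload)) {
    if (field === "id") continue; // identity, not a mutable field
    if (op.lamport_ts > (next.fieldTs[field] ?? -1)) {
      next.fields[field] = value;
      next.fieldTs[field] = op.lamport_ts;
    }
  }
  return next;
}
```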
3. Real-Time Reactivity
Hooks subscribe to batched WAL writes. Queries read straight from docs, so the UI is always up to date and works instantly, not sometimes.
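One way to wire this up is Dexie's `useLiveQuery` from `dexie-react-hooks`: the hook re-runs whenever the underlying IndexedDB tables change, so reducer writes to `docs` show up in the UI without manual invalidation. The exported `db`, the `coll` field, and the "todos" collection name below are assumptions following the earlier setup.

```ts
import { useLiveQuery } from "dexie-react-hooks";
import { db } from "./db"; // the Dexie instance from the setup above (assumed export)

// Reactive read from the materialized docs store.
export function useTodos() {
  return useLiveQuery(
    () => db.table("docs").filter((d) => d.coll === "todos").toArray(),
    [],  // no external deps
    []   // default value while the first query resolves
  );
}
```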
4. Schema Migrations
On version bump, replay ops from WAL, transform with new reducer, write updated docs, and checkpoint the migration to avoid repeated work.
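A hedged sketch of what that replay can look like. The `meta` checkpoint store, `reduceV2`, `keyOf`, and `emptyDoc` are assumptions, not part of the earlier schema; the `ops`/`docs` tables match the Dexie setup above.

```ts
async function migrate(toVersion: number) {
  // Checkpoint: skip if this migration already ran (meta store keyed by "key" is assumed).
  const checkpoint = await db.table("meta").get("migratedTo");
  if (checkpoint && checkpoint.value >= toVersion) return;

  // Fold the entire op log with the new reducer.
  const ops = await db.table("ops").orderBy("lamport_ts").toArray();
  const docs = new Map<string, any>();
  for (const op of ops) {
    const key = keyOf(op); // e.g. `${coll}:${id}` (hypothetical helper)
    docs.set(key, reduceV2(docs.get(key) ?? emptyDoc(op), op));
  }

  // Atomically swap in the re-materialized state and record the checkpoint.
  await db.transaction("rw", db.table("docs"), db.table("meta"), async () => {
    await db.table("docs").clear();
    await db.table("docs").bulkPut([...docs.values()]);
    await db.table("meta").put({ key: "migratedTo", value: toVersion });
  });
}
```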
Flutter Implementation
1. Local Storage: Isar
On mobile, Isar fits perfectly:
- `Op` and `Doc` collections
- ACID transactions, fast lookups
- Can migrate by replaying WAL and updating docs as needed
2. Operations and Reducers
Each user interaction generates a serialized Op. Reducers update the current Doc record for that entity.
```dart
import 'package:isar/isar.dart';

part 'op.g.dart'; // generated by Isar's build_runner (assumes this file is op.dart)

@collection
class Op {
  Id id = Isar.autoIncrement;

  late String opId;
  late int lamportTs;
  late String kind;

  // Isar has no Map field type, so the payload is stored as a JSON string
  // and decoded in the reducer.
  late String payloadJson;
}
```
Reducer functions match op kind and materialize fields, using per-field timestamps for LWW or sequence CRDT logic for ordered updates.
3. UI and Isolate Sync
UI code subscribes to changes in docs via Isar queries. WAL writes and state reduction can run in a background isolate for performance, so rendering is never blocked by DB work.
Optional Sync Layer
If the user has multiple devices or wants backup, sync comes into play. My rule is: if sync fails, users never notice; everything keeps working.
Mechanism at a glance:
- Client advertises vector clock of known ops
- Pushes new ops in batches when online
- Pulls missing ops from server (or other devices)
- Server is just an append-only relay; no state reconciliation required
All actual "merging" is local—code assumes nothing about server order or reliability.
Example Sync Client (Next.js)
```ts
async function sync(endpoint: string) {
  const known = await indexedOpsVectorClock();
  const localOps = await getUnsentOps();

  // Push local ops
  await fetch(endpoint + "/push", {
    method: "POST",
    body: JSON.stringify(localOps)
  });

  // Pull remote ops
  const res = await fetch(endpoint + "/pull", {
    method: "POST",
    body: JSON.stringify({ known })
  });
  const remoteOps = await res.json();
  // Append to WAL, replay as usual
}
```
It’s as stateless and boring as possible. No conflict dialogs, no server arbitration.
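For completeness, here is one sketch of the relay side as a single Next.js App Router route. It folds push and pull into one `route.ts` (PUT to push, POST to pull) rather than the separate `/push` and `/pull` endpoints above, and `appendOps`/`opsNotIn` are hypothetical helpers standing in for whatever append-only store you use.

```ts
// app/api/sync/route.ts — hypothetical append-only relay. The server never
// interprets, merges, or reorders ops; all merging stays on the client.
import { NextResponse } from "next/server";
import { appendOps, opsNotIn } from "@/lib/ops-store"; // assumed persistence layer

// Push: append incoming ops verbatim, deduplicating on op_id.
export async function PUT(req: Request) {
  const incoming = await req.json();
  await appendOps(incoming);
  return NextResponse.json({ ok: true });
}

// Pull: return every op the client's vector clock doesn't already cover.
export async function POST(req: Request) {
  const { known } = await req.json();
  const missing = await opsNotIn(known);
  return NextResponse.json(missing);
}
```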
Edge Inference with Gemma 3n
Most “AI features” these days are just SaaS vendors piping your data through a remote API. Here, the model runs locally, on the device.
Packing Gemma 3n
- For Next.js: Quantize the model, bundle it or fetch it on demand, and load it with ONNX Runtime Web (WebGPU when available).
- For Flutter: Quantized `.tflite` file, run with the TFLite Flutter plugin using NNAPI/CoreML delegates.
Workflow:
- Download or bundle the model
- Tokenize inputs locally
- Run inference call directly on-device
- Use result as part of normal app flow (summarizing, recommending, etc.)
Web Example:
```ts
import { InferenceSession } from "onnxruntime-web";

const session = await InferenceSession.create("gemma3n_quant.onnx");
// Prepare input tensors and call session.run()
```
Flutter Example:
```dart
final interpreter = await Interpreter.fromAsset("gemma3n_quant.tflite");
interpreter.run(input, output); // run() fills the output buffer in place
// Use output as needed
```
No network calls, no PII leaves the device, and inference latency is entirely predictable.
Testing and Reliability
With local-first, you get new test strategies:
- Use property-based tests to shuffle ops, replay on multiple platforms, and assert convergence (see the sketch after this list)
- Simulate random crashes (write half an op, crash, restart and verify state)
- Schema migrations are tested by replaying millions of WAL steps with generated data
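Here is a stripped-down version of that convergence check: plain random shuffles rather than a full property-based framework, with `randomOps`, `reduce`, and `emptyDoc` standing in for your own generator and reducer (assumptions, not code from this post).

```ts
import assert from "node:assert";

// Fisher–Yates shuffle for generating alternative replay orders.
function shuffled<T>(xs: T[]): T[] {
  const out = [...xs];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const ops = randomOps(200); // hypothetical generator; assume unique lamport timestamps
const baseline = ops.reduce(reduce, emptyDoc()); // one canonical order

// Every shuffled replay must materialize the exact same doc.
for (let run = 0; run < 50; run++) {
  const other = shuffled(ops).reduce(reduce, emptyDoc());
  assert.deepStrictEqual(other, baseline, "replay order changed the result");
}
```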
Conclusion
Local-first isn’t a library or a toggle. It requires building your logic, storage, and even AI features around the principle that the user’s device always comes first. Sync, sharing, and cloud inference are all nice—but never required.
With Next.js, Flutter, optional WAL replication, and device-resident Gemma 3n, you can build apps that:
- Don’t lose data
- Never block the UI
- Run the latest AI models offline
- Sync only as a convenience
If you’re tired of patching last-minute offline bugs or watching your AI features give up at the wrong time, building this way makes your app (and life) much more dependable.