Building Truly Offline Apps Isn’t Magic: Local-First with Next.js, Flutter, and On-Device AI

Starting a new app, most of us wire up the backend before we even sketch the core screens—then hope our offline mode “just works later.” It rarely does. If you’ve ever patched cache misses, handled sync bugs, or watched your app freeze when Wi-Fi drops, you know why local-first development matters.
I decided to do it right this time, from the start. Here’s how I set up local-first in Next.js and Flutter, built a sync layer that’s optional (not required), and ran Gemma 3n for offline inference right on the device.
What Does “Local-First” Actually Mean?
Data on the device is always the source of truth. Your UI only ever talks to local storage. Everything else—cloud sync, cross-device sharing, even AI inference—is an add-on. If users lose connection, the app keeps working. If sync fails, data never disappears. If a user wants AI features, they get them instantly and privately, no backend calls.
Core requirements:
- All app logic must work 100% offline (CRUD, search, inference)
- Every mutation logs as an immutable operation (WAL pattern)
- Deterministic reducers merge concurrent edits
- Schema migrations never require cloud connectivity
- Sync over any transport is optional; nothing in the app functionally depends on it
Next.js Implementation
1. Local Storage: IndexedDB
I use IndexedDB via Dexie. There are two stores:
- `ops`: Write-ahead log (append-only)
- `docs`: Materialized current state
2. Reducer Pattern
Every change is an operation object, appended to WAL, then folded with a deterministic reducer to update docs.
```ts
// IndexedDB setup
import Dexie from "dexie";

const db = new Dexie("localfirst");
db.version(1).stores({
  ops: "++id,op_id,lamport_ts",
  docs: "[coll+id],lamport_ts"
});

// Example op
const op = {
  op_id: crypto.randomUUID(),
  lamport_ts: Date.now(),
  actor: "device_123",
  kind: "todo.edit",
  payload: { id: "a", title: "updated" }
};

await db.ops.add(op);
// Then update docs via reducer
```
Reducers must be deterministic and order-independent where possible. For basic docs, I use “last writer wins” with per-field timestamps; for lists, I use CRDT sequences if I want rich merges.
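To make that concrete, here is a minimal sketch of a per-field last-writer-wins reducer. The `Doc`/`Op` shapes and the per-field timestamp bookkeeping are assumptions about one way to structure it, not the exact code behind this post.

```ts
type Doc = {
  coll: string;
  id: string;
  fields: Record<string, unknown>;
  fieldTs: Record<string, number>; // lamport timestamp of the last write per field
};

type Op = {
  op_id: string;
  lamport_ts: number;
  actor: string;
  kind: string;
  payload: { id: string; [field: string]: unknown };
};

// Each field keeps the write with the highest lamport_ts, so replaying the same
// ops in any order yields the same doc. (A real implementation would also break
// timestamp ties deterministically, e.g. by actor id.)
function reduce(doc: Doc, op: Op): Doc {
  const next: Doc = { ...doc, fields: { ...doc.fields }, fieldTs: { ...doc.fieldTs } };
  for (const [field, value] of Object.entries(op.payload)) {
    if (field === "id") continue; // identity, not a mutable field
    if (op.lamport_ts > (next.fieldTs[field] ?? -1)) {
      next.fields[field] = value;
      next.fieldTs[field] = op.lamport_ts;
    }
  }
  return next;
}
```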
3. Real-Time Reactivity
Hooks subscribe to batched WAL writes. Queries read straight from docs, so the UI is always up to date and works instantly, not sometimes.
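One way to wire this up is Dexie's `useLiveQuery` from `dexie-react-hooks`: the hook re-runs whenever the underlying IndexedDB tables change, so reducer writes to `docs` show up in the UI without manual invalidation. The exported `db`, the `coll` field, and the "todos" collection name below are assumptions following the earlier setup.

```ts
import { useLiveQuery } from "dexie-react-hooks";
import { db } from "./db"; // the Dexie instance from the setup above (assumed export)

// Reactive read from the materialized docs store.
export function useTodos() {
  return useLiveQuery(
    () => db.table("docs").filter((d) => d.coll === "todos").toArray(),
    [],  // no external deps
    []   // default value while the first query resolves
  );
}
```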
4. Schema Migrations
On version bump, replay ops from WAL, transform with new reducer, write updated docs, and checkpoint the migration to avoid repeated work.
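A hedged sketch of what that replay can look like. The `meta` checkpoint store, `reduceV2`, `keyOf`, and `emptyDoc` are assumptions, not part of the earlier schema; the `ops`/`docs` tables match the Dexie setup above.

```ts
async function migrate(toVersion: number) {
  // Checkpoint: skip if this migration already ran (meta store keyed by "key" is assumed).
  const checkpoint = await db.table("meta").get("migratedTo");
  if (checkpoint && checkpoint.value >= toVersion) return;

  // Fold the entire op log with the new reducer.
  const ops = await db.table("ops").orderBy("lamport_ts").toArray();
  const docs = new Map<string, any>();
  for (const op of ops) {
    const key = keyOf(op); // e.g. `${coll}:${id}` (hypothetical helper)
    docs.set(key, reduceV2(docs.get(key) ?? emptyDoc(op), op));
  }

  // Atomically swap in the re-materialized state and record the checkpoint.
  await db.transaction("rw", db.table("docs"), db.table("meta"), async () => {
    await db.table("docs").clear();
    await db.table("docs").bulkPut([...docs.values()]);
    await db.table("meta").put({ key: "migratedTo", value: toVersion });
  });
}
```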
Flutter Implementation
1. Local Storage: Isar
On mobile, Isar fits perfectly:
- `Op` and `Doc` collections
- ACID transactions, fast lookups
- Can migrate by replaying WAL and updating docs as needed
2. Operations and Reducers
Each user interaction generates a serialized Op. Reducers update the current Doc record for that entity.
```dart
import 'package:isar/isar.dart';

part 'op.g.dart'; // generated by Isar's build_runner (assumes this file is op.dart)

@collection
class Op {
  Id id = Isar.autoIncrement;

  late String opId;
  late int lamportTs;
  late String kind;

  // Isar has no Map field type, so the payload is stored as a JSON string
  // and decoded in the reducer.
  late String payloadJson;
}
```
Reducer functions match op kind and materialize fields, using per-field timestamps for LWW or sequence CRDT logic for ordered updates.
3. UI and Isolate Sync
UI code subscribes to changes in docs via Isar queries. WAL writes and state reduction can run in a background isolate for performance, so rendering is never blocked by DB work.
Optional Sync Layer
If the user has multiple devices or wants backup, sync comes into play. My rule is: if sync fails, users never notice; everything keeps working.
Mechanism at a glance:
- Client advertises vector clock of known ops
- Pushes new ops in batches when online
- Pulls missing ops from server (or other devices)
- Server is just an append-only relay; no state reconciliation required
All actual "merging" is local—code assumes nothing about server order or reliability.
Example Sync Client (Next.js)
```ts
async function sync(endpoint: string) {
  const known = await indexedOpsVectorClock();
  const localOps = await getUnsentOps();

  // Push local ops
  await fetch(endpoint + "/push", {
    method: "POST",
    body: JSON.stringify(localOps)
  });

  // Pull remote ops
  const res = await fetch(endpoint + "/pull", {
    method: "POST",
    body: JSON.stringify({ known })
  });
  const remoteOps = await res.json();
  // Append to WAL, replay as usual
}
```
It’s as stateless and boring as possible. No conflict dialogs, no server arbitration.
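For completeness, here is one sketch of the relay side as a single Next.js App Router route. It folds push and pull into one `route.ts` (PUT to push, POST to pull) rather than the separate `/push` and `/pull` endpoints above, and `appendOps`/`opsNotIn` are hypothetical helpers standing in for whatever append-only store you use.

```ts
// app/api/sync/route.ts — hypothetical append-only relay. The server never
// interprets, merges, or reorders ops; all merging stays on the client.
import { NextResponse } from "next/server";
import { appendOps, opsNotIn } from "@/lib/ops-store"; // assumed persistence layer

// Push: append incoming ops verbatim, deduplicating on op_id.
export async function PUT(req: Request) {
  const incoming = await req.json();
  await appendOps(incoming);
  return NextResponse.json({ ok: true });
}

// Pull: return every op the client's vector clock doesn't already cover.
export async function POST(req: Request) {
  const { known } = await req.json();
  const missing = await opsNotIn(known);
  return NextResponse.json(missing);
}
```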
Edge Inference with Gemma 3n
Most “AI features” these days are just SaaS vendors piping your data through a remote API. Here, the model runs locally, on the device.
Packing Gemma 3n
- For Next.js: Quantize the model, bundle it or fetch it on demand, and load it with ONNX Runtime Web (WebGPU when available).
- For Flutter: Quantized `.tflite` file, run with the TFLite Flutter plugin using NNAPI/CoreML delegates.
Workflow:
- Download or bundle the model
- Tokenize inputs locally
- Run inference call directly on-device
- Use result as part of normal app flow (summarizing, recommending, etc.)
Web Example:
```ts
import { InferenceSession } from "onnxruntime-web";

const session = await InferenceSession.create("gemma3n_quant.onnx");
// Prepare input tensors and call session.run()
```
Flutter Example:
```dart
final interpreter = await Interpreter.fromAsset("gemma3n_quant.tflite");
interpreter.run(input, output); // run() fills the output buffer in place
// Use output as needed
```
No network calls, no PII leaves the device, and inference latency is entirely predictable.
Testing and Reliability
With local-first, you get new test strategies:
- Use property-based tests to shuffle ops, replay on multiple platforms, and assert convergence (see the sketch after this list)
- Simulate random crashes (write half an op, crash, restart and verify state)
- Schema migrations are tested by replaying millions of WAL steps with generated data
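Here is a stripped-down version of that convergence check: plain random shuffles rather than a full property-based framework, with `randomOps`, `reduce`, and `emptyDoc` standing in for your own generator and reducer (assumptions, not code from this post).

```ts
import assert from "node:assert";

// Fisher–Yates shuffle for generating alternative replay orders.
function shuffled<T>(xs: T[]): T[] {
  const out = [...xs];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const ops = randomOps(200); // hypothetical generator; assume unique lamport timestamps
const baseline = ops.reduce(reduce, emptyDoc()); // one canonical order

// Every shuffled replay must materialize the exact same doc.
for (let run = 0; run < 50; run++) {
  const other = shuffled(ops).reduce(reduce, emptyDoc());
  assert.deepStrictEqual(other, baseline, "replay order changed the result");
}
```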
Conclusion
Local-first isn’t a library or a toggle. It requires building your logic, storage, and even AI features around the principle that the user’s device always comes first. Sync, sharing, and cloud inference are all nice—but never required.
With Next.js, Flutter, optional WAL replication, and device-resident Gemma 3n, you can build apps that:
- Don’t lose data
- Never block the UI
- Run the latest AI models offline
- Sync only as a convenience
If you’re tired of patching last-minute offline bugs or watching your AI features give up at the wrong time, building this way makes your app (and life) much more dependable.