
Why Offline-First Was Our Hardest Decision

December 2025 – February 2026

The sixth problem: making the app work when there’s no internet — which, on a film set, is most of the time.

By December 2025 we had a working multi-user, real-time collaborative production app. It worked beautifully in our office. It worked at the coffee shop. It worked when the user had four bars of LTE.

It failed the moment any of those conditions weren’t true. And on a film set — the actual environment we’d been building for — they were almost never true.

Stone-walled apartments in old European cities. Soundstages built like Faraday cages. Locations an hour outside a city. Basements, parking garages, deserts, oceans, mountains. The filmmaker uses your app on the morning of a shoot in their kitchen, again at lunch in a van with no signal, and again at the end of the day on the corner of someone else’s driveway. Your app has to feel the same in all three places.

We knew this. We’d put it off because the alternative — giving the client real autonomy from the network — meant rewriting every single repository, every store, every state pipeline. The kind of architectural decision you keep deferring because you can almost get away without it.

In December, we stopped getting away without it.

What “cloud-first” actually costs

The pattern most apps default to looks like this: when the user does something, send it to the server, wait for the response, then update the UI. If you’re offline, you show an error. If the network is slow, you show a spinner.

This is fine for an inbox. It’s wrong for a tool you use on set.

A director planning the next shoot day shouldn’t see a spinner when they drag a shot from position 4 to position 7. They shouldn’t see an error when they’re walking out of a building. They shouldn’t lose what they just typed if the connection drops at exactly the wrong moment. None of this is acceptable in an environment where the network is the exception, not the norm.

The server should still be the source of truth — that part wasn’t wrong. What was wrong was making the user wait for the server before showing them a result. The fix wasn’t to demote the server. It was to stop treating the network as a precondition for the UI.

The local database becomes the working surface

The rebuild centered on a single decision: the client doesn’t ask the server for permission before showing the user a result.

Every shot, every scene, every banner, every script element is now stored in a local database on each user’s machine. The database has two layers: the canonical project tables, which hold the state the server has confirmed, and an outbox table, which holds transactions the device has performed but the server hasn’t yet acknowledged. The UI reads from both layers together — confirmed state plus pending transactions — so the user always sees the result of their own actions immediately, even though the canonical tables haven’t moved yet. The server remains the source of truth: what it ultimately says is what the canonical tables become.
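
To make the overlay read concrete, here is a minimal sketch in TypeScript. The Shot and OutboxEntry shapes and the readShots helper are hypothetical illustrations, not our actual schema:

```typescript
// Hypothetical shapes; the real schema has more tables and more fields.
interface Shot {
  id: string;
  title: string;
  position: number;
}

interface OutboxEntry {
  syncId: string;         // unique transaction id, assigned on the device
  shotId: string;
  changes: Partial<Shot>; // the fields this pending transaction touched
}

// The UI never reads the canonical tables alone: it reads them with every
// pending outbox transaction applied on top, so the user's own actions are
// visible before the server has confirmed anything.
function readShots(canonical: Shot[], outbox: OutboxEntry[]): Shot[] {
  const view = new Map<string, Shot>();
  for (const s of canonical) view.set(s.id, { ...s });
  for (const entry of outbox) {
    const shot = view.get(entry.shotId);
    if (shot) Object.assign(shot, entry.changes);
  }
  return [...view.values()].sort((a, b) => a.position - b.position);
}
```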

When you reorder a shot list:

  1. The transaction is written to the local outbox.
  2. The UI updates instantly by reading the canonical project state overlaid with the new outbox entry — the change is visible immediately even though the canonical tables haven’t been touched.
  3. When the connection is available, the outbox worker hands the transaction to the server.
  4. The server processes it (with whatever ACL and conflict-resolution rules apply) and broadcasts the resolved result to every connected device, including this one.
  5. The client applies the server-resolved result to its canonical project tables and clears the corresponding outbox entry. If the server resolved things differently than the optimistic display assumed, the UI now reflects the canonical state automatically.

If the connection isn’t available, steps 3–5 wait. The user keeps working. When the network returns, queued transactions drain in order. The server resolves whatever conflicts arise — it has the full view of what every device did — and broadcasts the resolution back.
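
In code, that flow looks roughly like the sketch below, assuming a browser-like environment and reusing the hypothetical Shot and OutboxEntry shapes from earlier. outboxDb, sendToServer, applyCanonical, and rerenderFromLocalState stand in for the real storage, transport, and rendering calls:

```typescript
// Hypothetical stand-ins for the real local database, transport, and rendering.
declare const outboxDb: {
  insert(entry: OutboxEntry): Promise<void>;
  allInOrder(): Promise<OutboxEntry[]>;
  remove(syncId: string): Promise<void>;
};
declare function sendToServer(entry: OutboxEntry): Promise<Shot>; // server applies ACL + conflict rules
declare function applyCanonical(resolved: Shot): Promise<void>;   // canonical tables take the server's answer
declare function rerenderFromLocalState(): void;                  // reads canonical + outbox together

async function reorderShot(shotId: string, newPosition: number) {
  const entry: OutboxEntry = {
    syncId: crypto.randomUUID(),
    shotId,
    changes: { position: newPosition },
  };
  await outboxDb.insert(entry); // step 1: record the transaction locally
  rerenderFromLocalState();     // step 2: the change is visible immediately
  void drainOutbox();           // steps 3-5 run whenever the network allows
}

async function drainOutbox() {
  if (!navigator.onLine) return;                // offline: entries stay queued
  for (const entry of await outboxDb.allInOrder()) {
    const resolved = await sendToServer(entry); // steps 3-4: server resolves and persists
    await applyCanonical(resolved);             // step 5: canonical state moves
    await outboxDb.remove(entry.syncId);        //         and the pending entry is cleared
  }
  rerenderFromLocalState();
}
```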

The architectural shift wasn’t from “server is source of truth” to “device is source of truth.” That part stayed the same. The shift was from “client waits on server” to “client operates autonomously and reconciles after the fact.” In practice, the rebuild meant touching everything — shot list, script editor, files, storyboard, project data, even thumbnails and PreVis images. We rewrote each repository to write locally first. We rebuilt the bootstrap process to hydrate the local database on first launch, then do incremental syncs on subsequent startups. We replaced loading spinners with optimistic UI updates.
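
The bootstrap split is easier to see in a sketch. localDb, api, and the sync cursor are hypothetical names for whatever the real hydration path uses:

```typescript
// Hypothetical local database and API surface for the bootstrap path.
declare const localDb: {
  getSyncCursor(): Promise<string | null>;
  setSyncCursor(cursor: string): Promise<void>;
  replaceAll(snapshot: unknown): Promise<void>;
  applyChanges(changes: unknown[]): Promise<void>;
};
declare const api: {
  fetchFullSnapshot(): Promise<{ data: unknown; cursor: string }>;
  fetchChangesSince(cursor: string): Promise<{ changes: unknown[]; cursor: string }>;
};

async function bootstrap() {
  const cursor = await localDb.getSyncCursor();
  if (cursor === null) {
    // First launch: hydrate the whole project into the local database.
    const { data, cursor: next } = await api.fetchFullSnapshot();
    await localDb.replaceAll(data);
    await localDb.setSyncCursor(next);
  } else {
    // Later launches: pull only what changed since the last sync point.
    const { changes, cursor: next } = await api.fetchChangesSince(cursor);
    await localDb.applyChanges(changes);
    await localDb.setSyncCursor(next);
  }
}
```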

Real-time sync evolved with it

The sync engine, fresh off its WebSocket transport upgrade, had to evolve again. The change wasn’t about who was authoritative — the server still is — but about letting the client spend long stretches of time offline before it had to talk to the server.

The sync engine now:

  • Treats every local change as a transaction with a unique sync ID, written first to the local outbox.
  • Queues outbox entries when offline; drains them in order when the connection is restored.
  • Hands transactions to the server, which processes them with ACL and conflict resolution.
  • Broadcasts resolved results to every connected device, including the originator, so each client can reconcile its local state with what the server has now decided is canonical.
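
On the server side, those bullets look roughly like the sketch below. checkAcl, resolveConflicts, persist, and broadcastToProject are illustrative names, not the sync engine's real API:

```typescript
// The envelope every device-originated change travels in (illustrative shape).
interface SyncTransaction {
  syncId: string;   // unique id assigned by the originating device
  deviceId: string; // originator, used for echo handling on the way back
  projectId: string;
  payload: unknown; // the domain change: reorder, edit, delete, ...
}

declare function checkAcl(tx: SyncTransaction): Promise<boolean>;
declare function resolveConflicts(tx: SyncTransaction): Promise<SyncTransaction>;
declare function persist(tx: SyncTransaction): Promise<void>;
declare function broadcastToProject(projectId: string, tx: SyncTransaction): void;

async function handleIncomingTransaction(tx: SyncTransaction) {
  if (!(await checkAcl(tx))) return;           // permissions are enforced server-side
  const resolved = await resolveConflicts(tx); // the server sees what every device did
  await persist(resolved);                     // canonical state changes here, and only here
  broadcastToProject(tx.projectId, resolved);  // to every connected device, including the originator
}
```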

We also added a publish-subscribe layer to fan out transactions across multiple server instances, so the system could scale horizontally without losing real-time guarantees.
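
A sketch of that layer, assuming some external pub/sub broker (the post doesn't depend on which one) and reusing the SyncTransaction shape above:

```typescript
// Hypothetical broker and per-instance WebSocket registry.
declare const broker: {
  publish(channel: string, message: string): Promise<void>;
  subscribe(channel: string, onMessage: (message: string) => void): Promise<void>;
};
declare function localWebSocketClients(projectId: string): { send(data: string): void }[];

// Each server instance publishes resolved transactions to a project channel...
async function fanOut(projectId: string, resolved: SyncTransaction) {
  await broker.publish(`project:${projectId}`, JSON.stringify(resolved));
}

// ...and relays whatever it receives to its own WebSocket clients, so a device
// connected to instance A still sees changes that arrived at instance B.
async function subscribeToProject(projectId: string) {
  await broker.subscribe(`project:${projectId}`, (message) => {
    for (const client of localWebSocketClients(projectId)) {
      client.send(message);
    }
  });
}
```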

Connection-aware UX was a separate, smaller piece that still mattered. An online/offline status badge on the phone tells you immediately whether your changes are queued or syncing. The app also suppresses intrusive network-error popups during normal app lifecycle events (the kind of “you’re offline!” message every cloud-first app pops up the instant you switch screens, and which is wrong about 90% of the time).
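
A sketch of the status plumbing, assuming a browser-like environment with online/offline events; the real app's logic is more involved:

```typescript
type ConnectionState = "online" | "syncing" | "offline";

let state: ConnectionState = navigator.onLine ? "online" : "offline";
const listeners = new Set<(s: ConnectionState) => void>();

function setState(next: ConnectionState) {
  if (next === state) return;
  state = next;
  listeners.forEach(fn => fn(state)); // the status badge subscribes here
}

// Going offline flips the badge and nothing else: changes keep landing in the
// outbox, and no error dialog interrupts the user.
window.addEventListener("offline", () => setState("offline"));

// Coming back online flips to "syncing" while the outbox drains; the outbox
// worker flips it back to "online" once the queue is empty.
window.addEventListener("online", () => setState("syncing"));
```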

What this cost us

Sixty-nine commits. Six hundred and fifty-two files changed. Roughly eighty thousand lines of new code, thirty-four thousand lines deleted. Most of the deleted code was the previous sync paths, which we tore out and replaced.

In time, it was about ten weeks of focused work by one developer, with feature work paused or slowed for the duration. The earlier real-time sync work didn’t get thrown away — the transactional foundation was the right idea — but it had to be reshaped to support disconnected operation. Everything that assumed “the client always has a working connection to the server” had to be rewritten to assume “the client might not have heard from the server for hours, and that’s fine.”

This is the kind of rewrite that’s terrifying to start. Once it’s done, it’s hard to imagine the product without it.

The fixes that came out of stress-testing

An offline-capable rebuild surfaces every assumption you didn’t know you were making. The bug list from the December–February period was long and educational:

  • A race condition where local thumbnail uploads and deletes could fire in the wrong order. Fixed by serializing them through a proper local file-handling service.
  • A crash in the script editor’s layout computation under specific document states. Fixed by adding defensive checks at the layout boundary.
  • Duplicate scene headers appearing after concurrent edits. Fixed in the sync engine’s conflict resolution.
  • Stale production list items lingering after deletion on another device. Fixed by re-validating list contents on sync.
  • A specific echo effect where a transaction broadcast back to its originating device caused a double-apply. Fixed by tracking the originator and suppressing the echo (sketched below).

Every one of these is the kind of bug that only appears in a system where the client operates partly out of step with the server. None of them existed when the client waited for the server to confirm every action. This was the cost of the architecture choice — and the reason for the long, careful test suite we built alongside the rebuild.
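
The echo fix from the last bullet is worth a concrete sketch. Each transaction carries the originating device's id, and the originator treats the broadcast as confirmation of a pending transaction rather than as a fresh remote change. The names here are illustrative, not the real implementation:

```typescript
declare const THIS_DEVICE_ID: string;
// Promote the pending transaction to canonical state and clear its outbox entry.
declare function confirmPendingTransaction(tx: SyncTransaction): Promise<void>;
// Apply a change that originated on some other device.
declare function applyRemoteTransaction(tx: SyncTransaction): Promise<void>;
declare function rerenderFromLocalState(): void;

async function onBroadcast(tx: SyncTransaction) {
  if (tx.deviceId === THIS_DEVICE_ID) {
    // Echo of our own transaction: the optimistic overlay already shows it.
    // Confirm it instead of applying it again, which is what caused the
    // double-apply.
    await confirmPendingTransaction(tx);
  } else {
    await applyRemoteTransaction(tx);
  }
  rerenderFromLocalState();
}
```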

What it looks like today

You open CineLog with no internet. Everything you did last time is still there. You can edit a scene, reorder shots, add notes, mark a production day complete. You walk back into WiFi and the app catches up silently, syncs your changes to the server, picks up anyone else’s changes. There is no “are you sure?” dialog, no “click here to retry,” no spinner where there shouldn’t be one. The app just works the same way regardless of network state.

We launched the public beta in February 2026 on this foundation. The architecture isn’t a feature — most users will never know we made the decision. It’s the floor on which everything else stands.


Next: When a Shot Planner Becomes a Production Management Platform — the moment our tool stopped serving one user and started serving the whole crew.