Mobile + AI voice transcription replaces 2-hour nurse documentation cycles with 15-minute workflows.
Nurses spent 2+ hours per patient on manual documentation across paper forms, EMR data entry, and follow-up emails. Field connectivity gaps in home visits made standard cloud apps unreliable. The team needed a HIPAA-compliant mobile workflow with offline-first sync, voice-to-text accurate enough for clinical notes, and seamless integration with the existing ECW EMR.
We built a cross-platform React Native app with an offline-first architecture (intelligent sync on reconnect), AI voice transcription via Deepgram, and automated patient registration through email parsing. The backend runs on AWS serverless (Lambda + Cognito + DynamoDB + S3) with AES-256 encryption at rest and TLS 1.3 in transit. ECW EMR integration goes through secure FHIR endpoints. PHI flow was modeled from kickoff, so the audit was a non-event.

The overview's above. Below is what actually happened — the trigger, the surprises, the decisions, the build, the cutover, and how it's holding up.
The CTO called us in March. Their field nurses were burning 2-3 hours per patient on documentation, and 60% of that time was happening AFTER the home visit — at the kitchen table, transcribing voice memos from a phone into the EMR. Two prior attempts at building a mobile workflow had failed: one stalled at HIPAA review, the other shipped but couldn't handle the connectivity gaps in rural home visits.
The trigger wasn't the documentation burden alone — it was the audit. Their CMS / ONC compliance review was scheduled for Q3 and the existing voice-memo-plus-manual-transcription workflow had no defensible audit trail. If a nurse said "the wound is 2.4cm" and the transcriptionist typed "2.4cm," there was no system-of-record proof that those numbers traced back to a verified observation. The board wanted that gap closed.
By the time we got the call, they'd already evaluated three SaaS platforms (Epic Haiku, athenahealth Mobile, and a vertical wound-care app). All three shared two problems: per-seat pricing that would have hit $480K/year at their nurse count, and no integration with their existing ECW EMR without a custom interface engine on top. The board math had tipped: a custom build that paid for itself in 14 months was now the lower-risk path.
We spent week zero with a clipboard, not a keyboard. Two of us shadowed nurses on three home visits each. The brief had described "documentation burden" as a paperwork problem. What we found was three connected problems: paperwork, voice memos, AND the 11pm catch-up session where nurses typed everything into ECW from memory.
The connectivity reality was harsher than the brief suggested. In one home, the nurse's phone dropped from 4G to no-service three times during a 40-minute visit. The existing voice-memo app she used would lose 15-20 seconds of audio each time it tried to reconnect. She'd developed a workaround: speak in short bursts, pause, wait for the recording dot to come back. That workaround was costing her ~25% of her usable documentation time.
The ECW EMR was the bigger surprise. Their FHIR endpoint existed but was undocumented — the EMR vendor had built it as a custom interface for one prior integration and never published a spec. We spent two days reverse-engineering it from packet captures and ECW's support tickets. The good news: it accepted standard FHIR Observation resources. The bad news: there was no sandbox, so every test would have to hit production with synthetic patient records flagged as test data.
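To make the "standard FHIR Observation resources" point concrete, here is a minimal sketch of what a wound-length payload could look like. The function name, the free-text `code`, and the patient ID format are illustrative assumptions, not the project's actual schema. The source also does not say how synthetic records were flagged; the sketch assumes the HL7 `HTEST` tag in `meta.tag`, which is the conventional FHIR marker for test data.

```typescript
// Only the Observation fields this sketch uses are typed here.
interface FhirObservation {
  resourceType: "Observation";
  status: "preliminary" | "final" | "amended";
  meta?: { tag: { system: string; code: string }[] };
  code: { text: string };
  subject: { reference: string };
  effectiveDateTime: string;
  valueQuantity: { value: number; unit: string; system: string; code: string };
}

// Hypothetical helper: builds a wound-length Observation in centimeters.
function buildWoundLengthObservation(
  patientId: string,
  lengthCm: number,
  observedAt: Date,
  isTestRecord: boolean
): FhirObservation {
  const observation: FhirObservation = {
    resourceType: "Observation",
    status: "final",
    // Free text only; a real build would add a coded concept here.
    code: { text: "Wound length" },
    subject: { reference: `Patient/${patientId}` },
    effectiveDateTime: observedAt.toISOString(),
    // UCUM is the unit system FHIR expects for quantities.
    valueQuantity: { value: lengthCm, unit: "cm", system: "http://unitsofmeasure.org", code: "cm" },
  };
  if (isTestRecord) {
    // Assumption: synthetic records carry the HL7 "test health data" tag.
    observation.meta = {
      tag: [{ system: "http://terminology.hl7.org/CodeSystem/v3-ActReason", code: "HTEST" }],
    };
  }
  return observation;
}
```

With no sandbox available, a flag like this is what lets downstream reports and audits filter synthetic records out of production data.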
The BAA paperwork could have been an 8-week blocker. We didn't wait for it. By end of week zero we had the security architecture documented (encryption-at-rest model, IAM tiers, audit-log schema, breach-response SOP) and shared it with their compliance counsel BEFORE asking for the BAA. The signed BAA came back in 11 days because the counsel had already validated the design.
**React Native vs native iOS+Android.** The default answer on a 6-week budget is React Native, but we tested the offline-first sync model in both before committing. React Native won on the offline-storage-and-resume primitives: WatermelonDB gave us a battle-tested sync model in 2 weeks of integration, whereas the native approach would have needed custom Core Data + Room implementations. We accepted the React Native bridge overhead because clinical-note data is small (text + a few photos, not video), so the bridge isn't a bottleneck.
**Deepgram vs Whisper vs custom acoustic model.** We benchmarked all three against a 200-sample clinical-vocabulary corpus assembled from the nurses' actual voice memos (with PHI scrubbed). Whisper was 84% accurate on clinical vocabulary, Deepgram's medical model was 94%, and a custom-tuned acoustic model could have hit 96% but would have taken 3 months to train. Deepgram won because: (a) the accuracy was inside the audit-tolerance threshold the compliance counsel had set, (b) their HIPAA-compliant inference infrastructure removed a whole class of architecture work, (c) 94% with the human-correction loop converges to 97%+ in 90 days anyway.
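The source does not say how "accuracy" was scored in the benchmark. One common choice, assumed here purely for illustration, is word-level accuracy: one minus the word error rate, where errors are counted with an edit distance over tokens. The function name is hypothetical.

```typescript
// Word-level accuracy = 1 - (word edit distance / reference word count).
// Assumption: case-insensitive, whitespace-tokenized comparison.
function wordAccuracy(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) return hyp.length === 0 ? 1 : 0;

  // prev[j] holds the edit distance between the ref words consumed so far
  // and the first j hypothesis words (classic two-row Levenshtein).
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + substitution));
    }
    prev = curr;
  }
  // Clamp at 0: a very long hypothesis can push the error rate above 1.
  return Math.max(0, 1 - prev[hyp.length] / ref.length);
}

// One wrong word out of four reference words.
console.log(wordAccuracy("wound bed is granulating", "wound bed is graduating")); // 0.75
```

Averaging a score like this over every sample in the corpus gives one comparable number per engine, which is what makes a three-way comparison possible in an afternoon.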
**Offline-sync conflict model.** Last-write-wins is the default and it's almost always wrong for clinical data. We modeled three alternatives: vector-clock with manual resolution, server-authoritative with client-side queue, and last-write-wins-with-conflict-surfacing. We chose the third — last write wins, but if a conflict occurs (same patient record edited in two offline sessions), the nurse sees a flag on next sync and has to acknowledge. In 6 months of production, this has fired 11 times across 100+ nurses. Never silently wrong.
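A minimal sketch of last-write-wins with conflict surfacing, assuming each offline edit remembers the server version it was based on. The type and field names are hypothetical, and the production sync ran through WatermelonDB rather than code like this; the point is only to show where the flag comes from.

```typescript
interface ServerRecord {
  recordId: string;
  version: number;       // bumped on every accepted write
  body: string;
  needsAck: boolean;     // true until a nurse acknowledges the conflict
  overwritten: string[]; // bodies replaced by a conflicting write, kept for review
}

interface OfflineEdit {
  recordId: string;
  baseVersion: number;   // server version the nurse was editing against
  body: string;
}

// Applies an edit as it arrives at the server. The incoming write always wins,
// but a stale base version means another session wrote in between, so the
// record is flagged and the replaced text is preserved.
function applyOfflineEdit(server: ServerRecord, edit: OfflineEdit): ServerRecord {
  if (edit.recordId !== server.recordId) throw new Error("record mismatch");
  const conflict = edit.baseVersion !== server.version;
  return {
    recordId: server.recordId,
    version: server.version + 1,
    body: edit.body,
    needsAck: server.needsAck || conflict,
    overwritten: conflict ? [...server.overwritten, server.body] : server.overwritten,
  };
}
```

Two offline sessions that both started from version 1 produce one clean write and one flagged write; nothing is dropped without a trace, which is the property that matters for clinical data.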
**AWS Cognito vs Auth0.** Same reasoning as Deepgram: BAA eligibility on Cognito is one signature; Auth0 would have been three months of contracts. The Cognito UX is uglier; we wrapped it in a custom React Native auth flow that no nurse has ever complained about.
Weeks 2-4 ran on a "ship to one phone, then ten, then everyone's" cadence. The first end-to-end flow (open app, voice-record a wound observation, see it appear in ECW) was working by end of week 2. It was ugly, it crashed when the camera was used in low light, and the voice transcription was 78% accurate, but the loop existed.
Week 3 was offline-sync hardening. We had a single nurse take the app into her week's home visits, deliberately turning airplane mode on and off during sessions to surface edge cases. Three edge cases ate that week: (1) what happens when the patient registration record is modified server-side WHILE the nurse is documenting against it locally (resolution: merge with conflict flag); (2) what happens when a photo upload starts in coverage and finishes in no-coverage (resolution: chunked upload with resumable boundaries); (3) what happens when device clock skew makes a local timestamp earlier than a server timestamp on the same record (resolution: the server timestamp wins, and the client clock skew is recorded in the audit log).
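The second edge case, chunked upload with resumable boundaries, can be sketched as two pure functions: one that splits a file into fixed ranges, and one that works out what is still owed after a connection drop. The names and the chunk size are illustrative assumptions; the source does not describe the actual upload protocol.

```typescript
interface ChunkRange {
  index: number;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

// Splits a file of `totalBytes` into fixed-size ranges; the last may be shorter.
function planChunks(totalBytes: number, chunkSize: number): ChunkRange[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const chunks: ChunkRange[] = [];
  for (let start = 0, index = 0; start < totalBytes; start += chunkSize, index++) {
    chunks.push({ index, start, end: Math.min(start + chunkSize, totalBytes) });
  }
  return chunks;
}

// After a reconnect, only chunks the server has not acknowledged are re-sent.
function remainingChunks(plan: ChunkRange[], acknowledged: Set<number>): ChunkRange[] {
  return plan.filter((chunk) => !acknowledged.has(chunk.index));
}

// A 10-byte file in 4-byte chunks, with chunk 0 already acknowledged.
const plan = planChunks(10, 4);
console.log(remainingChunks(plan, new Set([0])).map((c) => c.index)); // [1, 2]
```

Because the boundaries are deterministic, the client can rebuild the same plan after an app restart and resume from the acknowledged set instead of starting the photo over.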
Week 4 was AI accuracy tuning. We ran the Deepgram output through a clinical-vocabulary post-processor that we built specifically for wound-care terminology — measurements ("two point four centimeters" → "2.4 cm"), stage classifications ("stage three" → "Stage III"), and a 200-term medical dictionary the lead nurse hand-curated. Accuracy went from 88% to 95.5% on the holdout set.
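The two rewrite rules named above can be sketched with a pair of lookup tables and regular expressions. This is a deliberately small illustration, not the production post-processor: it handles only single-digit "X point Y" measurements and stages one through four, and the table contents and function name are assumptions.

```typescript
const DIGITS: Record<string, string> = {
  zero: "0", one: "1", two: "2", three: "3", four: "4",
  five: "5", six: "6", seven: "7", eight: "8", nine: "9",
};
const STAGES: Record<string, string> = { one: "I", two: "II", three: "III", four: "IV" };
const UNITS: Record<string, string> = {
  centimeters: "cm", centimeter: "cm", millimeters: "mm", millimeter: "mm",
};

function normalizeClinicalText(text: string): string {
  const digit = Object.keys(DIGITS).join("|");
  const stage = Object.keys(STAGES).join("|");
  const unit = Object.keys(UNITS).join("|");

  // "stage three" -> "Stage III" (runs first so "three" is not read as a digit).
  let out = text.replace(
    new RegExp(`\\bstage (${stage})\\b`, "gi"),
    (_match, n: string) => `Stage ${STAGES[n.toLowerCase()]}`
  );

  // "two point four centimeters" -> "2.4 cm"
  out = out.replace(
    new RegExp(`\\b(${digit}) point (${digit}) (${unit})\\b`, "gi"),
    (_match, whole: string, frac: string, u: string) =>
      `${DIGITS[whole.toLowerCase()]}.${DIGITS[frac.toLowerCase()]} ${UNITS[u.toLowerCase()]}`
  );
  return out;
}

console.log(normalizeClinicalText("wound is two point four centimeters, stage three"));
// "wound is 2.4 cm, Stage III"
```

Rules like these are cheap to test against the curated dictionary, which is why a post-processor can move accuracy without touching the transcription model itself.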
The moment we knew it was working: a nurse finished a Tuesday visit, sat in her car for 90 seconds tapping through the auto-transcribed note to verify it, hit submit, and said "that's the first time I've left a visit done." She'd already done four visits that day, and she was trusting the next one to the app instead of the kitchen table.
Phased rollout: 10 nurses week 5, 30 nurses week 6, 100 nurses by end of week 6. The week-5 cohort surfaced one production issue — the ECW push retry logic was set to 3 attempts at exponential backoff, which under their network conditions was sometimes too aggressive and triggered their EMR vendor's rate-limit. We added a token-bucket throttle in front of the retry loop in 90 minutes and the issue stopped.
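A minimal sketch of a token-bucket throttle sitting in front of an exponential-backoff retry loop. The capacity, refill rate, base delay, and all names are illustrative assumptions; the source gives only the retry count (3) and the fact that a throttle was added. The clock and sleep functions are injected so the logic can be exercised without real waiting.

```typescript
class TokenBucket {
  private capacity: number;
  private refillPerSec: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Refills based on elapsed time, then spends one token if available.
  tryTake(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Delay before the retry that follows 0-based `attempt`: base, 2x base, 4x base...
function backoffMs(attempt: number, baseMs: number = 500): number {
  return baseMs * 2 ** attempt;
}

async function pushWithRetry(
  send: () => Promise<boolean>,
  bucket: TokenBucket,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
  now: () => number = Date.now
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Every attempt, including retries, must first obtain a token.
    while (!bucket.tryTake(now())) await sleep(100);
    if (await send()) return true;
    if (attempt < maxAttempts - 1) await sleep(backoffMs(attempt));
  }
  return false;
}
```

The design point is that backoff spaces out one request's retries, while the shared bucket caps the total request rate across all of them, which is what an upstream rate limit actually measures.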
The HIPAA security audit happened in week 5 as planned. The compliance counsel had pre-reviewed the architecture in week zero, so the audit was a sign-off on the implementation matching the design, not a discovery exercise. It took 4 hours in total, including a walkthrough of the audit log, the encryption-at-rest verification, and a tabletop exercise on the breach-response SOP. The proof quote in the case file ("by the security audit, there was nothing to retrofit") came from that day.
No rollback was used. The CTO had a fallback plan — go back to the voice-memo-plus-transcription workflow for individual nurses if the app misbehaved on their device — but it was never exercised.
Ninety days post-handoff, documentation time is holding at 12-18 minutes per patient (the 15-minute headline is the median). The 12× number (the 3-hour upper end of the old range against the 15-minute median) is real and has not regressed. Clinical-vocabulary accuracy on the client's own ongoing-correction feedback set is now 96.5%, up from 95% at launch; the human-review loop is doing its work.
The infrastructure cost has held at $850/month even as the nurse count has grown from 100 to 180. AWS Cognito and the Lambda + DynamoDB + S3 stack scale sub-linearly with active users; the dominant cost is Deepgram transcription, which scales linearly with audio minutes. At current volume they're processing about 4,000 visits per month for $620 in Deepgram + $230 in AWS. The per-visit infrastructure cost is roughly $0.21.
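The per-visit figure follows directly from the numbers above. The short sketch below just reproduces the arithmetic; the type and function names are illustrative.

```typescript
interface MonthlyCosts {
  deepgramUsd: number;
  awsUsd: number;
}

// Total monthly spend divided by visits processed that month.
function perVisitCostUsd(costs: MonthlyCosts, visitsPerMonth: number): number {
  if (visitsPerMonth <= 0) throw new Error("visitsPerMonth must be positive");
  return (costs.deepgramUsd + costs.awsUsd) / visitsPerMonth;
}

const current: MonthlyCosts = { deepgramUsd: 620, awsUsd: 230 };
console.log(current.deepgramUsd + current.awsUsd);       // 850
console.log(perVisitCostUsd(current, 4000).toFixed(4));  // "0.2125"
```

Deepgram accounts for $620 of the $850, roughly 73% of the bill, so any future cost work would start with audio minutes rather than the AWS stack.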
The CMS / ONC audit happened in Q3 as planned. The audit-log architecture and the system-of-record proof (every clinical observation traces to a verified voice recording with a Deepgram confidence score and a nurse-verification timestamp) closed the gap that had triggered the engagement in the first place. The compliance counsel said the audit defense was the cleanest she'd seen on a custom-built clinical system.
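The traceability claim can be expressed as a simple invariant: an observation is defensible only if its recording, confidence score, and verification timestamp are all present. The record shape and field names below are hypothetical, chosen to mirror the three pieces of evidence the text lists rather than the project's real audit-log schema.

```typescript
interface ObservationAuditEntry {
  observationId: string;
  recordingUri: string | null;            // source voice recording
  transcriptionConfidence: number | null; // score reported by the transcription service
  nurseVerifiedAt: string | null;         // ISO 8601 timestamp of nurse sign-off
}

// Returns the names of any evidence fields that are missing.
// An empty array means the observation traces end to end.
function missingEvidence(entry: ObservationAuditEntry): string[] {
  const missing: string[] = [];
  if (!entry.recordingUri) missing.push("recordingUri");
  if (entry.transcriptionConfidence === null) missing.push("transcriptionConfidence");
  if (!entry.nurseVerifiedAt) missing.push("nurseVerifiedAt");
  return missing;
}
```

Running a check like this across the whole log before an audit turns "every observation traces back" from an assertion into something that can be demonstrated on demand.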
The field shadow should have happened before kickoff. We did it in week zero, but waiting until the engagement had started cost us a week we could have used elsewhere. On any future field-workflow engagement, the discovery shadow goes BEFORE the SOW signature.
The Deepgram-vs-Whisper-vs-custom benchmark in week zero was the highest-value 4 hours of the engagement. Without that benchmark, we'd have either over-engineered with a custom model or under-shipped with vanilla Whisper. Every future ML-component decision goes through the same benchmark-against-real-data exercise.
The thing we'd build first next time: the conflict-flag UI. We built the offline-sync conflict resolution but the UI for "you have a conflict, please acknowledge" was a week-4 polish item. In hindsight that UI is a safety control — it should have been week 2.
Every engagement runs through the same five gates of the FORGE method. Here’s how this case ran.
PHI flow was modeled in week one — by the security audit, there was nothing to retrofit.