Scope a similar engagement →
Skip to main content

Cloud-native multi-tenant SaaS for ed-tech analytics

Microservices platform unifies fragmented assessment data behind LTI / QTI / CASE-compliant APIs.

CASE FILE · CS-03 SHIPPED
“Term-start traffic was 80× baseline. The platform held; the old SaaS would have invoiced us six figures for the burst.”
Term-start traffic burst handled80×
ClientK-12 ed-tech analytics provider
SectorEducation
Service linesBuild · Upstream
Window10 weeks fixed
READ THE FILE

Challenge

Assessment data arrived from multiple LMS and assessment platforms in incompatible formats. Manual ingestion + normalization burned weeks of analyst time per cohort. The platform needed to support multi-tenancy across districts, scale to district-wide concurrency during exam windows, and emit standards-compliant data via LTI, QTI, and CASE APIs.

Solution

A .NET Core microservices platform on Kubernetes, with RabbitMQ for asynchronous ingestion, Azure Synapse Analytics as the warehouse, and Power BI Embedded for tenant-facing dashboards. Power Automate Desktop bridges legacy file-drop sources. Standards-compliant connectors (LTI 1.3, QTI 3.0, CASE 1.1) emit data to downstream LMS and reporting tools without custom integration per district.

Engagement

  • Sector: Education
  • Service lines: Build · Upstream
  • Client: EdTech analytics provider serving K-12 districts and higher-ed (anonymized)
Online learning analytics screen
CASE FILE · CS-03 · CLOUD-NATIVE MULTI-TENANT SAAS FOR ED-TECH ANALYTICS
Microservices platform unifies fragmented assessment data behind LTI / QTI / CASE-compliant APIs.
CASE FILE · CS-03 · LONG-FORM

The full story.

The overview's above. Below is what actually happened — the trigger, the surprises, the decisions, the build, the cutover, and how it's holding up.

TLDR audio briefing
For busy executives
~1m 9s summary · 0:00 / 1:09
01
Why we got the call.

The trigger

Term-start was a math problem. The existing SaaS analytics platform priced its burst capacity at a 6-figure annual surcharge — payable in advance, used for roughly two windows per year (August district onboarding and January mid-year assessment), idle the rest of the time. The CFO's question was straightforward: are we paying $200K/year to handle 8 weeks of traffic?

The other half of the trigger was integration pain. Each new district required a 4-6 week custom integration to ingest their LMS assessment data. The analyst team had built a manual normalization pipeline in Excel macros — impressive engineering, but unmaintainable as district count grew. The platform was on track to onboard 24 new districts in the coming year; that math was untenable.

The third pressure was standards. The state department of education had quietly published a roadmap requiring LTI 1.3 + QTI 3.0 compliance for any provider claiming "interoperability" in their procurement materials. The existing platform was LTI 1.1 — out of compliance within 18 months. Either rebuild for compliance or get cut from procurement shortlists.

02
What we found that the brief didn't say.

The week-one discovery

Week zero we mapped the actual data flows. The platform handled 12 districts, each with 2-7 LMS partners (Canvas, Schoology, PowerSchool, Brightspace, Moodle, two custom regional LMSes). That's roughly 40 distinct inbound formats. Some were clean (Canvas's LTI 1.3 endpoint was well-formed). Some were file drops via SFTP at 3am every night. One was a daily email with a CSV attachment.

The 7 different inbound file formats clustered into 4 actual data shapes underneath the format variation. The Excel macros had been doing the cluster-and-normalize work for years; we found a 4,200-line workbook with VBA that, once we deciphered it, was effectively a hand-rolled ETL engine. The analyst who built it had moved on 18 months earlier. Nobody on the current team understood it.

The standards-compliance work was a moving target. LTI 1.3 had stabilized but QTI 3.0 was still in IMS Global's "experimental" phase — meaning the spec was published but reference implementations had been changing month-to-month. CASE 1.1 was stable but only two of the 12 districts had vendors emitting CASE-formatted data; the rest would need translation layers.

The hardest discovery: 4 of the 12 target districts had hard requirements that were mutually incompatible. District A required ALL student data to never leave their AWS region. District B required all data to flow through their on-premises ETL appliance before reaching cloud. District C required full PII isolation between tenants AND prohibited any vendor-side aggregation across districts. We documented these in week 0 because they shaped the architecture.

03
The tradeoffs we made + why.

The architecture decisions

**.NET Core vs team-preferred Python.** The client's existing engineering team was .NET-strong. The maintenance hand-off mattered more than our team's productivity. .NET Core (now .NET 8) gave us battle-tested microservices ergonomics and the team can fork into the codebase post-handoff without retraining. The decision is documented in the architecture memo so future architects understand why.

**Microservices vs monolith.** Burst capacity was the deciding factor. A monolith would have required scaling the whole application at term-start; microservices let us scale the ingestion pipeline separately from the dashboard rendering. We accepted the operational complexity (5 services instead of 1) because the alternative — paying $200K/year for SaaS burst capacity — was the problem we were solving.

**RabbitMQ vs Kafka.** The team had RabbitMQ in their stack already (used elsewhere in their Microsoft-shop architecture). Kafka would have been technically defensible for the scale targets but added a new operational discipline the team would have to maintain. RabbitMQ was the right boring choice. We documented the upgrade path to Kafka in the architecture memo for the day they outgrow RabbitMQ (somewhere north of 50× current throughput).

**Synapse vs Snowflake.** The client had an Azure commitment. Synapse Analytics integrates natively with their existing Azure tenant, their existing Active Directory groups for access control, and their existing Power BI Embedded licenses. Snowflake would have been faster on some query patterns but the operational integration cost would have eaten the speed advantage at their scale.

**Power BI Embedded vs custom React dashboards.** Power BI Embedded won on time-to-ship and on the licensing economics (the client already had E5 licenses). Custom React would have given us more design control but the lift to match Power BI's standard chart library (sortable tables, drill-down on cohort comparisons, scheduled email reports) would have eaten 3 weeks. We did the design work on the Power BI side to push the dashboards as far as the platform allows.

04
How the work actually unfolded.

The build

Weeks 2-7 ran on a build-test-deploy loop with mock LMS data for every district shape. We assembled a "district simulator" that could emit any of the 7 inbound formats at configurable rates — invaluable for stress-testing the ingestion pipeline before any real district was connected.

Week 5 surprise: a QTI 3.0 experimental endpoint at one of the target districts changed mid-build. The IMS Global working group had published a minor revision and one LMS vendor had pushed it to production overnight. Our ingestion was hitting field names that no longer existed. We shipped version detection in the QTI adapter — sniff the response, route to the right parser variant — in 6 hours. The fix went in week 5; in retrospect it should have been in week 1.

Week 7 was load-testing. We hit the platform with district-simulator traffic at 80× baseline (matching the worst-case term-start spike). The dashboard rendering held; the autoscale fired as designed; one query in the Synapse warehouse took 9 seconds and broke our latency budget. We rewrote it as a materialized view in 3 hours. The day a real district uploaded a 1.2GB assessment dump as their "we want to test the platform too" trial — and the pipeline handled it in 4 minutes — the team knew the architecture was sound.

Multi-tenant isolation testing was painstaking. We ran adversarial test cases: tenant A's JWT trying to access tenant B's data, tenant A's BI query attempting cross-tenant aggregation, tenant A's user filling Power BI report URL with tenant B's report ID. All blocked at the API layer; all logged in the audit trail.

05
How launch went.

The cutover

Phased onboarding: 1 district in week 9 (the simplest data shape — Canvas LTI 1.3 only), 4 districts by week 10 (mixed shapes including SFTP file drops), all 12 districts by week 14 (post-handoff). The first district's analyst lead said the onboarding took 90 minutes — most of it was scheduling, not technical work. They had been bracing for 4-6 weeks per the old playbook.

Term-start traffic spike hit week 16, two weeks after the last district was live. Autoscale fired across the ingestion microservices as designed. Peak load was 87× baseline (we'd designed for 80×, the autoscale headroom absorbed the rest). Dashboard rendering stayed under 2-second p95 latency throughout the spike. The CFO's $200K SaaS burst-invoice never came.

06
How the work is holding up.

Ninety days later

The 80× burst-handled metric is the headline, but the better metric is the one we didn't expect: district onboarding now takes 2 days, not 6 weeks. The analyst team has run 8 new district onboardings since handoff using the runbook we left them. Each onboarding is now mostly configuration (mapping the district's LMS endpoints to our adapters) instead of custom integration work.

Standards-compliance review with the state DOE happened in Q4. LTI 1.3, QTI 3.0, and CASE 1.1 all certified. The platform stayed on procurement shortlists.

The Excel-macro engine has been decommissioned. The analyst whose VBA we'd been reverse-engineering would probably be glad to know it's been retired with honors — it shipped the company for 5 years before a proper platform was viable.

07
Honest retrospective.

What we'd do differently

The week-5 QTI-version-change surprise was avoidable. We should have negotiated direct access to all 7 LMS partners' standards-test environments in week zero, not built against the cached schema. Spec drift in experimental standards is the norm, not the exception. Next standards-implementation engagement starts with a week-zero "subscribe to the spec working group" item.

The district-simulator was the best testing investment of the engagement. We'd build it earlier — week 1 not week 3 — on any future multi-tenant platform engagement. Adversarial test data beats production data for finding architectural issues.

We'd build the standards-version-detection layer up-front, not as a week-5 hotfix. The pattern (sniff the response, route to the right parser variant) is reusable across any experimental-standards endpoint.

ENGAGEMENT TIMELINE · 10 WEEKS FIXED

Every engagement runs through the same five gates of the FORGE method. Here’s how this case ran.

FORGE GATES · CS-03SHIPPED
W0 · FRAMEMulti-tenancy requirements, standards review (LTI / QTI / CASE), burst-load capacity targets, district-onboarding model.
W1 · OUTLINE.NET Core microservices design, Synapse warehouse schema, RabbitMQ ingestion topology, Power BI tenancy model.
W2–7 · REBUILDMicroservices build, async ingestion pipeline, tenant-scoped dashboards, standards-compliant connectors (LTI / QTI / CASE).
W8 · GOVERNLoad testing for term-start spike, security review, multi-tenant data-isolation audit.
W9–10 · ENGAGEFirst-cohort district onboarding, performance tuning, dashboard refinement based on early-user feedback.
Results · Key metrics · CS-03Verified
Standards-compliant
LTI 1.3, QTI 3.0, CASE 1.1 endpoints
Multi-tenant
District-level isolation + role-based access
Burst-tolerant
Autoscale handles assessment-window load spikes
Real-time
Tenant dashboards refresh as data ingests
STACK · CS-03SHIPPED
SectorEducation
ServicesBuild · Upstream
ClientEdTech analytics provider serving K-12 districts and higher-ed (anonymized)
.NET Core Kubernetes RabbitMQ Azure Synapse Analytics Power BI Embedded Power Automate Desktop LTI 1.3 QTI 3.0 CASE 1.1
Client voice
Term-start traffic was 80× baseline. The platform held; the old SaaS would have invoiced us six figures for the burst.
Head of Platform · K-12 ed-tech analytics provider
Get a quick answer for a similar engagement · See all 10 →

Try the matching free calculator

Each calculator runs in 3 minutes and emails you an 8-page memo.

Scope a similar engagement.

A 30-min call: walk through your situation, get a fixed-price SOW within 24 hours. Tell us "I want what CS-03 did" and we'll calibrate to your specifics.

Book a 30-min call →