K-12 ed-tech: multi-tenant analytics that actually scales
Per-district isolation, COPPA / FERPA / DPDP overlays, and the data model that survives Year 3.

K-12 ed-tech analytics is one of the harder multi-tenant data problems in commercial software. The tenant count is large (often 1,000+ districts), the data sensitivity is high (student PII, often in the COPPA / FERPA / DPDP envelope), the per-tenant data volume is bursty (testing windows, term boundaries, summer dormancy), and the access pattern is unusual (district admins, school admins, teachers, parents — often the same data viewed at different aggregations).
Most ed-tech analytics architectures fail around year three, when the tenant count crosses a threshold the original schema didn't account for. This post covers what scales, what doesn't, and the architectural choices that make the difference.
The compliance envelope
Four overlapping regimes typically apply:
- COPPA (US). Parental consent for under-13s, restrictions on data collection, deletion rights.
- FERPA (US). Educational records under district control; vendor processes data on district’s behalf, not as a controller.
- State-level student-data privacy laws. California, New York, Colorado, Illinois, Texas — each has additional requirements.
- DPDP (India), if Indian operations. Children defined as under-18 (stricter than COPPA), heightened consent requirements.
The common thread: data minimisation, per-tenant isolation, deletion-on-request, and audit logging. The architecture has to support these at the schema level, not as bolt-ons.
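Deletion-on-request at the schema level can be sketched as a plan object that pairs the destructive SQL with an audit record written before execution. This is a hedged illustration only: the `district_<id>` schema-naming convention and the audit field names are assumptions for the sketch, not the shipped schema.

```python
from datetime import datetime, timezone

def district_deletion_plan(district_id: str, requested_by: str) -> dict:
    """Build the SQL and audit record for a per-district deletion request.

    Assumes one PostgreSQL schema per district, named 'district_<id>'
    (a hypothetical convention; adapt to your own naming scheme).
    """
    schema = f"district_{district_id}"
    return {
        # Dropping the schema removes every per-district table at once.
        "sql": f'DROP SCHEMA "{schema}" CASCADE;',
        # The audit record is emitted *before* execution, so the request
        # is logged even if the drop fails and has to be retried.
        "audit": {
            "action": "district_deletion",
            "district_id": district_id,
            "requested_by": requested_by,
            "requested_at": datetime.now(timezone.utc).isoformat(),
        },
    }
```

Writing the audit record first is the point: a deletion that fails mid-way is still an auditable event, which is what the regimes above require.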
Where most architectures fail
Three common failure patterns:
- Single-database multi-tenancy with `tenant_id` everywhere. Works for the first 100 districts. By 1,000 districts, query plans degrade, indexes balloon, and per-tenant operations (export, deletion) become operationally painful. By 5,000, the architecture has to be re-platformed.
- Per-tenant database with shared application. Solves isolation but creates operational complexity at scale (1,000+ database backups, schema migrations across the fleet, cross-tenant analytics impossible without a separate warehouse).
- Naïve event ingestion. Per-event pricing assumptions break when school-system testing windows produce 100× normal event volumes for two weeks. Cost spikes that surprise the CFO.
The architecture that scales
The reference pattern we ship for K-12 ed-tech multi-tenant analytics:
- Schema-per-district in a shared PostgreSQL fleet. Use PostgreSQL schemas (not separate databases) to isolate per-district data within a single instance. ~100–500 districts per instance, scaled horizontally.
- Event ingestion in object storage. Raw events land in S3 (partitioned by district + date). Hot queries hit the OLTP layer; analytical queries hit a warehouse (Athena, Snowflake, BigQuery) over the S3 layer. Per-event pricing on the warehouse, not on the OLTP.
- Materialised views for the common dashboards. The 80% of queries (district-level reports, school-level reports, teacher dashboards) hit pre-computed aggregates. Refreshed on a schedule aligned to data freshness requirements (often hourly or per-event for some metrics).
- Per-district isolation enforced at the API layer. Application code never queries cross-tenant. Cross-tenant queries (for benchmarking, anonymised aggregates) go through a separate, audited path.
- Deletion-on-request as a first-class operation. Per-district deletion drops the district's schema; per-student deletion has a defined workflow. The operational primitive exists, is tested, and has a documented runbook.
This architecture handles 5,000+ districts with predictable cost and operational burden.
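The schema-per-district and deletion bullets above rest on one primitive: a PostgreSQL schema per district, with every application connection pinned to exactly one of them. A minimal sketch, assuming a hypothetical `district_<id>` naming convention and a placeholder `students` table (not the shipped DDL):

```python
def provision_district(district_id: str) -> list[str]:
    """Return the DDL that provisions one district's schema.

    Sketch only: the table and its columns are illustrative
    placeholders, not a production schema.
    """
    schema = f"district_{district_id}"
    return [
        f'CREATE SCHEMA IF NOT EXISTS "{schema}";',
        (
            f'CREATE TABLE IF NOT EXISTS "{schema}".students ('
            "internal_id uuid PRIMARY KEY, "
            "sis_id text NOT NULL, "
            "enrolled_at date NOT NULL);"
        ),
    ]

def session_setup(district_id: str) -> str:
    # Each pooled connection pins search_path to a single district's
    # schema, so unqualified application queries physically cannot
    # read another tenant's tables.
    return f'SET search_path TO "district_{district_id}";'
```

Because isolation is a schema boundary rather than a `WHERE tenant_id = ...` predicate, per-district export is a schema dump and per-district deletion is a schema drop, which is what keeps those lifecycle operations cheap at 5,000 tenants.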
The data model decisions that matter most
Three model decisions that compound across the lifetime of the system:
- Student-level identity model. Use a stable internal student ID, not the district’s SIS ID. SIS IDs change when districts switch SIS vendors; internal IDs persist. Map at ingestion.
- Event vs state separation. Raw events (logged actions) stay immutable in S3. Derived state (current grade, current enrolment) lives in OLTP and is rebuildable from events. State changes don’t lose history.
- Anonymisation at the analytical layer, not the ingestion layer. Raw data carries identifiers; analytical queries use anonymised views. Re-identification is technically possible but operationally controlled.
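The identity-model decision above can be sketched as a resolver that mints a stable internal ID on first sight of a SIS key and relinks new SIS keys to the same internal ID when a district changes vendors. A real deployment backs this with a table keyed on (district, vendor, SIS ID); the in-memory dict and method names here are illustrative assumptions.

```python
import uuid

class IdentityMap:
    """Map volatile SIS IDs to a stable internal student ID at ingestion."""

    def __init__(self):
        # (district, sis_vendor, sis_id) -> internal_id
        self._by_sis = {}

    def resolve(self, district: str, vendor: str, sis_id: str) -> str:
        # Mint a stable internal ID the first time a SIS key is seen.
        key = (district, vendor, sis_id)
        if key not in self._by_sis:
            self._by_sis[key] = str(uuid.uuid4())
        return self._by_sis[key]

    def relink(self, district: str, old_vendor: str, old_sis_id: str,
               new_vendor: str, new_sis_id: str) -> str:
        # When a district switches SIS vendors, the new SIS key is
        # linked to the existing internal ID, so history is preserved.
        internal = self._by_sis[(district, old_vendor, old_sis_id)]
        self._by_sis[(district, new_vendor, new_sis_id)] = internal
        return internal
```

The payoff is that every event and derived-state row keys on the internal ID, so a vendor switch is a mapping update at ingestion rather than a backfill across three years of data.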
Performance at scale
For a recently shipped 1,500-district deployment, the steady-state performance envelope:
- Event ingestion: 50–200 events/sec sustained, 2,000+ events/sec during peak testing windows.
- Dashboard query latency: P95 under 500ms for the common dashboards (pre-computed). P95 under 5s for ad-hoc analytical queries (warehouse).
- Per-district data export: 30 minutes to 4 hours depending on district size, executed asynchronously with notification.
- Per-district data deletion: 5–30 minutes, with verification log.
These targets are achievable with the architecture above. They are not achievable with naïve single-table multi-tenancy at this scale.
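The burst tolerance in the ingestion numbers above comes from the object-storage landing zone described earlier: raw events partitioned by district and date, so warehouse scans prune to one district and date range. A minimal sketch of that key layout, assuming Hive-style partition names and a gzipped-JSONL batch format (both illustrative conventions, not the shipped ones):

```python
from datetime import date

def event_object_key(district_id: str, event_date: date, batch_id: str) -> str:
    """Build the S3 key for one batch of raw events.

    district + date partitioning means a testing-window burst widens
    one day's partition without touching any other district's data or
    any other day's scan cost.
    """
    return (
        f"events/district_id={district_id}/"
        f"dt={event_date.isoformat()}/"
        f"{batch_id}.jsonl.gz"
    )
```

Because the warehouse prices per byte scanned rather than per event ingested, a 100x event burst inflates storage for two weeks but leaves per-query cost roughly flat.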
What we ship
For ed-tech clients in K-12, the typical engagement is a 14–18 week Build or Upstream re-platforming. Deliverables include the data architecture, the ingestion pipelines, the OLTP and warehouse layers, the API isolation enforcement, the materialised view library, and the operational runbooks for the per-district lifecycle operations (provisioning, deletion, export).
Compliance scaffolding (consent capture, audit logging, DSR workflows) is included in the same engagement. We don’t ship K-12 ed-tech without it.
Read more: /sectors/education · /case-studies/edtech-multi-tenant-analytics · /build/
Run the matching free calculator
It runs in 3 minutes and emails you an 8-page memo.