System Design · April 18, 2026 · 15 min read

Behind the Scenes: Building a Live Cricket Streaming Platform That Doesn't Crash at Toss Time

A deep dive into the control plane behind a live cricket streaming platform: ladder-based pre-scaling, admission control, heartbeats, graceful degradation, and the infrastructure choices that keep the match running at toss-time scale.

How do you serve 15 million concurrent viewers the moment Virat Kohli walks to the crease, without dropping a single frame?

This article is about the control plane behind that experience, not the video encoding pipeline itself. It is the invisible backend that decides who gets to watch, how many can watch, how sessions stay alive, and what happens when demand outruns safe capacity.

The Problem Nobody Talks About

Architecture Overview

Layer 1: The Domain

A simplified Go model looks like this:

type Match struct {
    ID        string
    Status    MatchStatus
    StartTime time.Time
    Rungs     []LadderRung
}

type LadderRung struct {
    StartTime            time.Time
    TargetFleetSize      int
    ActiveSessionCeiling int
    DBPoolTarget         int
    RedisPoolTarget      int
    DegradeThreshold     int
}

Each rung says: at this point in the match, expect this much demand, scale to this fleet size, allow this many active sessions, and enter degrade mode once a lower threshold is crossed.

  • T-30 min: pre-match buildup, warm the fleet and raise the session ceiling.

  • Toss: jump the floor aggressively before users flood in.

  • First ball: scale further because this is where the real wave often arrives.

  • Innings break: temporarily reduce the floor.

  • Second innings / final overs: raise limits again for the next surge.

The key idea is simple: for live sports, reacting to CPU after the spike starts is already late. The platform should scale from the schedule, not from panic.

The Playback Session

Layer 2: The Admission Pipeline

Gate 1: Degradation Check

Gate 2: Match Active Validation

Gate 3: Entitlement Check

Gate 4: Capacity Admission

Gate 5: Session Creation

Layer 3: The License Renewal Loop

The device limit is enforced through device leases with a LastRenewed timestamp. Stale leases are cleaned up lazily during reads:

func activeDeviceCount(devices map[string]DeviceLease, now time.Time) int {
    count := 0
    for id, lease := range devices {
        if now.Sub(lease.LastRenewed) <= 2*time.Minute {
            count++
        } else {
            // Stale lease: the device stopped renewing, so reclaim the slot.
            delete(devices, id)
        }
    }
    return count
}

This avoids a separate cleanup daemon and still converges the system toward truth. If a user moves from phone to TV, the old device naturally expires after it stops renewing.

Layer 4: Graceful Degradation

Core-protect mode can be triggered in three ways:

  1. Automatically when active sessions cross the degrade threshold.
  2. Manually through an admin API when operators see trouble.
  3. From infrastructure through CloudWatch alarms that detect stressed dependencies such as Redis CPU saturation.

When this mode is active, the platform should drop non-essential work such as overlays, recommendation payloads, thumbnails, and extra analytics while keeping playback and license renewals sacred.

Layer 5: Infrastructure Choices

  • Aurora PostgreSQL: read replicas, fast failover, auto-scaling storage, Multi-AZ durability.

  • Redis Cluster: sharded hot keys, read replicas, automatic failover, low-latency session access.

  • ECS Fargate: no node management, fast scale-out, and good economics for event-driven traffic patterns.

The fleet is pre-scaled using scheduled actions tied to the ladder instead of waiting for CPU-based reactive scaling.

resource "aws_appautoscaling_scheduled_action" "ladder_rung_1" {
  name     = "ladder-rung-1-toss"
  schedule = "cron(0 14 * * ? *)"

  scalable_target_action {
    min_capacity = 20
    max_capacity = 100
  }
}

The important idea is not the exact numbers. It is that the system raises the floor before the toss, not after containers are already overloaded.

Network Topology

Layer 6: The Spike Simulator

A representative implementation looks like this:

func (s *Service) SimulateSpike(ctx context.Context, params SpikeParams) {
    go func() {
        // Bounded worker pool: at most params.Concurrency starts in flight.
        throttle := make(chan struct{}, params.Concurrency)
        var wg sync.WaitGroup

        for i := 0; i < params.TotalUsers; i++ {
            throttle <- struct{}{} // block until a slot frees up
            wg.Add(1)

            go func(idx int) {
                defer wg.Done()
                defer func() { <-throttle }()

                // Each simulated viewer gets a unique user and device ID.
                userID := fmt.Sprintf("user-sim-%d", idx)
                deviceID := uuid.NewString()

                // context.Background(): the spike outlives the request that triggered it.
                _, _ = s.playbackSvc.Start(context.Background(), userID, params.MatchID, deviceID)
                time.Sleep(50 * time.Millisecond) // brief hold before releasing the slot
            }(i)
        }

        wg.Wait()
    }()
}

This validates the admission pipeline, the degrade threshold, and the accuracy of concurrent metrics under pressure.

Layer 7: Metrics

Why a Monolith

API Surface

Graceful Shutdown and Containerization

What This Architecture Gets Right

What Production Evolution Would Add

Closing Thought

Filed under fieldnotes · April 18, 2026