
EduScale
A premium, all-in-one engineering learning platform featuring personalized roadmaps, real-time coding battles, and comprehensive placement preparation.
Performance & Impact
Sync latency: < 200 ms (real-time state synchronization via WebSocket)
Execution engine: sub-second (remote code execution and test validation)
Data integrity: 99.9% (atomic transactions and state persistence)
The Problem
Engineering education is often disjointed, with students moving between static roadmaps, isolated coding editors, and scattered community forums. This lack of integration leads to poor progress tracking and a higher dropout rate during self-paced learning.
The Solution
A unified Engineering Learning Platform (SaaS) that seamlessly integrates structured curriculum with interactive coding tools and real-time social competition. EduScale provides a 'single source of truth' for the student's entire technical journey.
System Architecture
A production-grade EdTech platform built around a distributed real-time engine. On the backend, @socket.io/redis-adapter scales Socket.io horizontally across multiple Node.js instances; redlock (an implementation of the Redlock algorithm) provides distributed locking to prevent race conditions in battle-state writes; opossum adds circuit-breaker protection around external services; prom-client exposes a Prometheus /metrics endpoint; and Bull queues handle background jobs reliably. The frontend is built on the Next.js 15 App Router with Redux Toolkit.
Core Engineering Achievements:
"Distributed real-time architecture: Socket.io rooms backed by Redis cluster adapter for multi-instance scaling. Redlock prevents duplicate battle-start race conditions. Circuit breaker wraps the code-execution service. Prometheus metrics on /metrics for observability."
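The lock-before-write pattern behind duplicate-battle prevention can be sketched in a single process. This is a minimal model, assuming a local Map in place of Redis. acquire() mimics setting a key only if it is absent, with a per-request token and a TTL. release() deletes the key only if the caller's token still matches. The hypothetical startBattle() wraps the room write in that lock and re-checks for an existing room while holding it. LockStore, startBattle, the key format and the 8-second TTL are illustrative rather than EduScale's code. Full Redlock also acquires the key on a majority of independent Redis nodes, which this sketch leaves out.

```typescript
// Single-process model of the lock primitive Redlock builds on. A Map stands in for
// Redis here; all names (LockStore, startBattle, key format) are hypothetical.
interface LockEntry {
  token: string;
  expiresAt: number; // time (ms) after which the lock is considered free
}

class LockStore {
  private readonly entries = new Map<string, LockEntry>();

  // Like SET key token NX PX ttl: succeed only if nobody holds an unexpired lock.
  acquire(key: string, token: string, ttlMs: number, now: number): boolean {
    const held = this.entries.get(key);
    if (held !== undefined && held.expiresAt > now) return false;
    this.entries.set(key, { token, expiresAt: now + ttlMs });
    return true;
  }

  // Owner-checked delete, so a request can never release a lock it no longer owns.
  release(key: string, token: string): void {
    if (this.entries.get(key)?.token === token) this.entries.delete(key);
  }
}

const BATTLE_LOCK_TTL_MS = 8000; // illustrative value

// Lock, re-check, write, unlock. Returns the room ID, or null if another request
// is currently creating this battle (the caller can retry or wait for a broadcast).
function startBattle(
  locks: LockStore,
  rooms: Map<string, string>,
  pairingKey: string,
  requestId: string,
  now: number,
): string | null {
  const lockKey = `lock:battle-start:${pairingKey}`;
  if (!locks.acquire(lockKey, requestId, BATTLE_LOCK_TTL_MS, now)) return null;
  try {
    // Re-check under the lock: an earlier request may already have created the room.
    const existing = rooms.get(pairingKey);
    if (existing !== undefined) return existing;
    const roomId = `room:${pairingKey}`;
    rooms.set(pairingKey, roomId);
    return roomId;
  } finally {
    locks.release(lockKey, requestId);
  }
}
```

The lock and the re-check do different jobs. The lock keeps two in-flight requests from writing at the same time. The re-check stops a request that arrives after the first one has finished from creating a second room.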
Visual Showcase
A high-fidelity walkthrough of the core interfaces and user experiences, designed with modern aesthetics.
Unified User Dashboard
A centralized hub tracking enrolled roadmaps, ongoing battle states, and overall technical progress, consistent across light and dark modes.

Interactive Career Roadmaps
Node-based curriculum visualization allowing students to track granular progress and unlock specialized technical tracks.

Technical Assessment Suite
A specialized multi-language execution environment providing integrated testing, static analysis, and time complexity benchmarking.

Real-time Battle Zone
A competitive arena powered by WebSockets, hosting real-time multiplayer coding showdowns with sub-second updates and live leaderboards.

The Engineering Challenge
The hardest problem was preventing race conditions when two users simultaneously start the same battle. The fix: redlock (Redlock algorithm over Redis) acquires a distributed lock before any battle state write, preventing duplicate battle creation. Socket.io horizontal scaling uses @socket.io/redis-adapter so any Node.js instance can broadcast to any room. opossum circuit breaker wraps the remote code-execution service — when it trips, battles degrade gracefully instead of hanging. prom-client exposes active-battle count, queue depth, and p99 latency on /metrics.
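The trip-and-degrade behaviour follows the standard closed / open / half-open circuit-breaker state machine. Below is a minimal, synchronous, dependency-free sketch of that pattern for illustration. It is not opossum's implementation or API; opossum wraps promise-returning functions and is configured through its own options. The class name, the consecutive-failure threshold and the injectable clock are assumptions chosen to keep the example testable.

```typescript
// Minimal synchronous circuit breaker (closed -> open -> half-open -> closed).
// Hypothetical sketch of the pattern; it is not opossum's implementation or API.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker<T> {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly action: () => T;
  private readonly fallback: () => T;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    action: () => T,
    fallback: () => T,
    failureThreshold: number,
    resetTimeoutMs: number,
    now: () => number = Date.now,
  ) {
    this.action = action;
    this.fallback = fallback;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  get currentState(): BreakerState {
    return this.state;
  }

  fire(): T {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        return this.fallback(); // fail fast: don't wait on a service believed to be down
      }
      this.state = "half-open"; // reset timeout elapsed: let a trial call through
    }
    try {
      const result = this.action();
      this.consecutiveFailures = 0;
      this.state = "closed";
      return result;
    } catch {
      this.consecutiveFailures += 1;
      // A failed trial call, or too many failures in a row, (re)opens the circuit.
      if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return this.fallback();
    }
  }
}
```

Graceful degradation lives in the fallback. While the breaker is open, calls return immediately instead of waiting on a service that is already failing. After the reset timeout, a trial call decides whether the circuit closes again.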
Framework: Next.js 15
Engine: PostgreSQL
Complexity: High
User Journey
Discovery
Users browse high-quality career roadmaps tailored for engineering roles.
Structured Learning
Personalized progress tracking through modules and coding tasks.
Competitive Practice
Real-time 1v1 or group coding battles to test skills under pressure.
Interview Ready
Mock assessments and AI-driven feedback for placement preparation.
Incidents & fixes
Real bugs, real fixes
The case-study sections above cover what the system does well. These are the times it didn't — what broke, what I got wrong on the first hypothesis, and how we confirmed the fix held.
Duplicate battles on tournament Saturday
Symptom
First public tournament. Within the first couple of hours we had a handful of battles where two rooms got created for the same pairing — both players saw a 'Battle starting' modal, joined two parallel games, and wondered which one was real.
First hypothesis (and where it went wrong)
My first guess was a WebSocket reconnect race on shaky mobile networks. I spent an hour on that theory before realizing reconnects were behaving correctly. The real culprit showed up in the handler-duration histogram: our start-battle path had a long tail around 2.3–2.6s when the Postgres replica was under load, and the Redlock TTL was 2 seconds. Lock was expiring mid-handler, second request would grab its own lock, two rooms.
Fix
Bumped the TTL to 8 seconds and added Redlock auto-renewal — a 500ms heartbeat that calls `lock.extend()` while the handler runs. Also added a `handler_duration_seconds` Prometheus histogram so we'd notice next time a handler started creeping toward the lock TTL. The code change was small, but the habit of 'always measure before you pick a TTL' was the real takeaway.
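The timing argument in this incident can be replayed with a small discrete-time model using 1 ms steps. This is a hypothetical sketch, not the production handler. Request A takes a TTL lock at t = 0 and runs a 2.5 s handler. Request B for the same pairing arrives at t = 2.1 s. An optional heartbeat re-extends A's lock in the way the 500 ms `lock.extend()` heartbeat does. TtlLock, simulateBattleStart and the exact arrival time are illustrative.

```typescript
// Discrete-time model of the duplicate-battle incident. Hypothetical names; times in ms.
class TtlLock {
  private owner: string | null = null;
  private expiresAt = 0;

  tryAcquire(owner: string, now: number, ttlMs: number): boolean {
    if (this.owner !== null && this.expiresAt > now) return false; // still held
    this.owner = owner;
    this.expiresAt = now + ttlMs;
    return true;
  }

  // Extends only while the caller still owns an unexpired lock, mirroring how
  // Redlock only extends a key that still holds the caller's value.
  extend(owner: string, now: number, ttlMs: number): boolean {
    if (this.owner !== owner || this.expiresAt <= now) return false;
    this.expiresAt = now + ttlMs;
    return true;
  }
}

// Returns every request that obtained the lock, i.e. every request that would go on
// to create a battle room. More than one entry means a duplicate battle.
function simulateBattleStart(
  ttlMs: number,
  handlerMs: number,
  heartbeatMs: number | null,
  secondArrivalMs: number,
): string[] {
  const lock = new TtlLock();
  const winners: string[] = [];
  if (lock.tryAcquire("A", 0, ttlMs)) winners.push("A"); // A's handler starts at t = 0
  for (let t = 1; t <= handlerMs; t++) {
    // Heartbeat while A's handler is still running.
    if (heartbeatMs !== null && t % heartbeatMs === 0) lock.extend("A", t, ttlMs);
    // B arrives for the same pairing while A is mid-handler.
    if (t === secondArrivalMs && lock.tryAcquire("B", t, ttlMs)) winners.push("B");
  }
  return winners;
}

const oldSetup = simulateBattleStart(2000, 2500, null, 2100); // ["A", "B"]: lock expired mid-handler
const fixedSetup = simulateBattleStart(8000, 2500, 500, 2100); // ["A"]: B is refused
```

In this model either change alone prevents the duplicate for a 2.5 s handler. The heartbeat is what keeps the lock valid if the handler later grows past any fixed TTL.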
Confirmed by
The duplicate-battle counter stayed at 0 through the next tournament. The new histogram also surfaced two slow paths (a missing join index and a chatty N+1 in the ranking query) — neither had been loud enough to matter on their own, but both were already halfway to the old TTL.
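Turning `handler_duration_seconds` into a "creeping toward the TTL" signal means estimating a tail quantile from the histogram's cumulative buckets. In production that is normally done with PromQL's `histogram_quantile()`. The sketch below re-implements the same idea in plain TypeScript so the arithmetic is visible: linear interpolation inside the bucket that contains the target rank. The bucket boundaries and counts are made up. The 50%-of-TTL threshold and the function names are assumptions that echo the "halfway to the old TTL" observation above, not EduScale's actual alert.

```typescript
// One cumulative Prometheus-style bucket: count of observations <= le (seconds).
interface Bucket {
  le: number;
  count: number;
}

// Estimates quantile q (0 < q <= 1) from cumulative buckets sorted by le, with the
// last bucket at +Inf. Interpolates linearly inside the bucket holding the target rank.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length === 0) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  let prevLe = 0;
  let prevCount = 0;
  for (const { le, count } of buckets) {
    if (count >= rank) {
      // Rank falls in the +Inf bucket: no upper bound to interpolate toward,
      // so report the highest finite bound.
      if (le === Infinity) return prevLe;
      return prevLe + (le - prevLe) * ((rank - prevCount) / (count - prevCount));
    }
    prevLe = le;
    prevCount = count;
  }
  return prevLe;
}

// Hypothetical alert rule: flag a handler whose p99 has used up half the lock TTL.
function creepingTowardTtl(buckets: Bucket[], lockTtlSeconds: number, ratio = 0.5): boolean {
  return estimateQuantile(0.99, buckets) > lockTtlSeconds * ratio;
}

// Made-up cumulative counts for a start-battle handler (not EduScale data).
const startBattleBuckets: Bucket[] = [
  { le: 0.5, count: 700 },
  { le: 1, count: 900 },
  { le: 2.5, count: 990 },
  { le: 5, count: 1000 },
  { le: Infinity, count: 1000 },
];
const p99 = estimateQuantile(0.99, startBattleBuckets); // 2.5 (seconds)
const alertOldTtl = creepingTowardTtl(startBattleBuckets, 2); // true
const alertNewTtl = creepingTowardTtl(startBattleBuckets, 8); // false
```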
Split rooms across Node instances
Symptom
Stress test on a 3-instance deploy. Roughly one in fifteen battles, both players could type but neither saw the other's cursor. Every packet was acked, nothing in the logs, just… silence across the wire.
First hypothesis (and where it went wrong)
I was sure it was a room-name collision or a sticky-session issue at the load balancer. Neither held up. The giveaway was tailing the Redis MONITOR during a failed battle — the pub channel got the broadcast but the sub channel on the other instance wasn't receiving it.
Fix
@socket.io/redis-adapter needs two independent ioredis clients — one for `pub`, one for `sub`. I'd wired both to the same connection, assuming Redis pipelining would handle it. Under load, a long-running write would block the sub from draining. Split into two clients (one-line diff) and it was done. Embarrassing, but exactly what the Socket.io docs say on page one.
Confirmed by
Zero split-room reports across the next ~40 tournaments. Added a pub/sub latency gauge (`redis_pubsub_roundtrip_ms`) so a regression would be visible on the dashboard instead of in a user DM.
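One way to produce a gauge like `redis_pubsub_roundtrip_ms` is a canary. Each instance periodically publishes a uniquely tagged ping on its publisher connection and times how long the message takes to arrive on its subscriber connection. The sketch below is hypothetical and library-free. The PubSub interface stands in for the two Redis clients rather than reproducing the ioredis API. Exporting the value (for example through a prom-client Gauge) is left out.

```typescript
// Minimal pub/sub surface; a stand-in for two separate Redis connections, not the ioredis API.
interface PubSub {
  publish(channel: string, message: string): void;
  subscribe(channel: string, handler: (message: string) => void): void;
}

// Hypothetical canary: publish on the pub connection, time arrival on the sub connection.
class PubSubRoundtripProbe {
  private readonly pending = new Map<string, number>(); // ping id -> send time (ms)
  private nextSeq = 0;
  private readonly pub: PubSub;
  private readonly channel: string;
  private readonly instanceId: string;
  private readonly now: () => number;
  lastRoundtripMs: number | null = null; // the value a gauge would report

  constructor(pub: PubSub, sub: PubSub, channel: string, instanceId: string, now: () => number) {
    this.pub = pub;
    this.channel = channel;
    this.instanceId = instanceId;
    this.now = now;
    sub.subscribe(channel, (message) => {
      const sentAt = this.pending.get(message);
      if (sentAt === undefined) return; // another instance's ping: only the sender can time it
      this.pending.delete(message);
      this.lastRoundtripMs = this.now() - sentAt;
    });
  }

  ping(): void {
    const id = `${this.instanceId}:${this.nextSeq++}`;
    this.pending.set(id, this.now());
    this.pub.publish(this.channel, id);
  }

  // Pings that never come back are closest to the split-room symptom, so expose them too.
  get outstanding(): number {
    return this.pending.size;
  }
}
```

Prefixing each ping with the instance ID matters on a multi-instance deploy. Every instance subscribed to the channel receives every ping, but only the sender knows when its own ping was sent.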
Interested in the full engineering breakdown?
I'm always open to discussing technical implementations, from state management strategies to infrastructure scaling.