Part 05 of 10
Before: 930 polls/sec — all characters, blind. After: ~4 polls every 5s — only when cache expires.

The Scheduler Was Broken All Along

930 polls per second on the UI thread. Nobody noticed for years.

Engineers · 12 min read

For engineers. This is the root cause of the crash from Part 1.

With the god object tamed and the codebase split into layers, I could finally isolate the scheduler — the component that fetches character data from CCP's ESI API. This is the thing that crashed the app at sixty characters. I needed to understand exactly how it worked before I could fix it.

I built a diagnostic tool. 221 lines of code: a TCP server on port 5555 that broadcasts structured JSON events in real time. Developer-only, never shipped. I turned it on, added characters, and watched.

Within sixty seconds, it was clear: the scheduler wasn't just buggy. It was architecturally broken.

01

930 Polls Per Second

Here's what the old scheduler did:

  1. A timer fires every one second on the UI thread
  2. It broadcasts to 35 subscribers
  3. Each of 31 data monitors per character checks "is it time to fetch yet?"
  4. If yes, it fires an HTTP request that blocks the calling thread until the UI processes the response

With thirty characters: 930 monitors, each running every second. Plus another 30 data orchestrators doing the same thing. That's 960 method calls per second on the UI thread — the same thread responsible for drawing the screen.

96% of those calls returned "not yet" and did nothing useful. But they still consumed CPU time on the thread that needed to be free to keep the application responsive.

With sixty characters: 1,860 monitor checks plus 60 orchestrators — over 1,900 calls per second. The UI thread couldn't keep up. The screen froze. The app crashed.

02

The Thundering Herd

On startup, every monitor was set to "force update." On the first timer tick, approximately 240 HTTP requests fired simultaneously. But the connection limit was ten. The remaining 230 queued up. Each response blocked a thread waiting for the UI to process it. The UI was busy running the next round of 960 checks. The system deadlocked.

This is why the crash happened during startup. Not after the characters loaded — during the initial fetch.

03

More Problems

The diagnostic stream revealed issue after issue:

A new HTTP connection for every request. The code created a fresh connection for every single API call, then disposed it. A closed socket lingers in the TIME_WAIT state for up to four minutes after disposal. With hundreds of requests, the operating system ran out of sockets.

No proactive rate limiting. The only protection was reactive — it waited until CCP's API returned a "slow down" response, then stopped everything. No per-character awareness. No budget tracking.

Cache headers ignored on multi-page responses. Page one of an asset list used the cache header to avoid unnecessary downloads. Pages two through fifty did not. Large asset lists downloaded in full every time.

Timer callbacks with no error handling. Three critical timer handlers had no try-catch. Any unhandled error in the async chain crashed the entire application instantly.

Each of these would be a serious bug on its own. Together, they created a system that worked for small loads and collapsed at scale.
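The multi-page cache bug has a simple fix: keep a cache validator per (endpoint, page), not just for page one. A sketch, assuming a hypothetical fetch_page that takes a stored ETag and returns (status, etag, body) — the names are illustrative, not EVEMon's API:

```python
# Cache validators kept per (endpoint, page) — not just page 1
page_cache = {}

def get_page(fetch_page, endpoint, page):
    etag, body = page_cache.get((endpoint, page), (None, None))
    status, new_etag, new_body = fetch_page(endpoint, page, etag)
    if status == 304:
        return body                            # unchanged: skip the download entirely
    page_cache[(endpoint, page)] = (new_etag, new_body)
    return new_body
```

With this shape, pages two through fifty get the same 304 short-circuit page one always had.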

04

The New Scheduler

The replacement is 973 lines across seven files. The core idea: instead of checking every character every second, maintain a priority queue sorted by when each piece of data actually expires. Sleep until the next job is due. Process it. Sleep again.

Zero wasted polling.

How it works: The UI thread sends commands into a thread-safe queue — "register this character," "this tab is now visible," "force refresh." The scheduler runs on its own background thread, processes commands, and maintains a priority queue. When a job is due, it fetches the data, processes the response, and schedules the next fetch based on the cache expiry header CCP returns.
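That command-queue-plus-priority-queue shape can be sketched in a few lines. This is illustrative Python, not EVEMon's C# code; every name here is hypothetical:

```python
import heapq
import queue

class ExpiryScheduler:
    """Sketch of an expiry-driven scheduler loop (illustrative only)."""

    def __init__(self, fetch):
        self.commands = queue.Queue()  # UI thread pushes commands here, never blocks
        self.jobs = []                 # min-heap of (due_time, job_name)
        self.fetch = fetch             # fetch(job) -> seconds until that data expires

    def submit(self, job_name, due=0.0):
        self.commands.put((due, job_name))   # e.g. "register this character"

    def run_once(self, now):
        """One turn of the loop; returns how long to sleep before the next turn."""
        while not self.commands.empty():     # drain pending UI commands first
            heapq.heappush(self.jobs, self.commands.get())
        if not self.jobs:
            return None                      # idle: block until a command arrives
        due, job_name = self.jobs[0]
        if due > now:
            return due - now                 # sleep until the next job; zero polling
        heapq.heappop(self.jobs)
        expires_in = self.fetch(job_name)    # HTTP fetch on this background thread
        # Reschedule from the server's cache-expiry header, not a fixed interval
        heapq.heappush(self.jobs, (now + expires_in, job_name))
        return 0.0
```

The key property: between jobs the thread sleeps for exactly the gap the heap reports, so no cycles are spent asking "is it time yet?"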

Concurrency: Twenty simultaneous HTTP connections, gated by a semaphore. Enough to be fast, not enough to trigger rate limits.
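The gate itself is nothing more than a semaphore around the request call. A minimal sketch — the limit of twenty comes from the text above, everything else is hypothetical:

```python
import threading

ESI_GATE = threading.BoundedSemaphore(20)   # at most 20 requests in flight

def gated_fetch(do_request):
    # Extra callers queue here instead of stampeding the API;
    # only background workers wait, never the UI thread.
    with ESI_GATE:
        return do_request()
```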

Per-character rate limiting: Each character has a token bucket — 150 tokens per fifteen-minute window with a 10% safety margin. When a character's budget is spent, its jobs are deferred until the budget refills. One character's rate limit never affects another.
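A token bucket with those numbers — 150 tokens per fifteen-minute window, minus a 10% safety margin — can be sketched like this (Python for illustration; the class name and the steady-refill policy are assumptions):

```python
class TokenBucket:
    """Per-character request budget: 150 tokens / 15 min, 10% margin held back."""

    def __init__(self, capacity=150, window=900.0, margin=0.10):
        self.capacity = capacity * (1 - margin)   # 135 usable tokens
        self.tokens = self.capacity
        self.rate = self.capacity / window        # steady refill across the window
        self.last = 0.0

    def try_acquire(self, now):
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False        # budget spent: defer this character's jobs
```

Because each character owns its own bucket, one character draining its budget defers only its own jobs.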

Startup phasing: Instead of the thundering herd, characters are phased in over several seconds:

| Phase | What fetches | When |
| --- | --- | --- |
| 1 | Skills and queue for the visible character | Immediately |
| 2 | Implants and attributes for all characters | Staggered over 2 seconds |
| 3 | Market orders, contracts, industry, mail | Staggered over 3 seconds |
| 4 | Everything else | Staggered over 5 seconds |

For a hundred characters, the full startup takes about six seconds instead of trying to fire 3,100 requests in the first tick.
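One way to express the phasing in code — a sketch only: the phase windows come from the table above, but the endpoint matchers and the even-spacing policy are assumptions:

```python
import heapq

# Phase plan: (predicate over (character, endpoint), stagger window in seconds)
PHASES = [
    (lambda c, e: c == "visible" and e in ("Skills", "SkillQueue"), 0.0),
    (lambda c, e: e in ("Implants", "Attributes"), 2.0),
    (lambda c, e: e in ("MarketOrders", "Contracts", "Industry", "Mail"), 3.0),
    (lambda c, e: True, 5.0),   # everything else
]

def phase_in(characters, endpoints, start=0.0):
    """Give every (character, endpoint) job a due time spread over its phase window."""
    buckets = [[] for _ in PHASES]
    for c in characters:
        for e in endpoints:
            i = next(i for i, (match, _) in enumerate(PHASES) if match(c, e))
            buckets[i].append((c, e))
    jobs = []
    for (_, window), bucket in zip(PHASES, buckets):
        for k, (c, e) in enumerate(bucket):
            # Space each phase's jobs evenly across its stagger window
            spread = window * k / max(len(bucket) - 1, 1)
            jobs.append((start + spread, c, e))
    heapq.heapify(jobs)
    return jobs   # a ready-made priority queue for the scheduler loop
```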

Tab awareness: When you switch to a different character's tab, the scheduler promotes that character's jobs to high priority and demotes the previous one. The character you're looking at always gets fresh data first.
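Promotion amounts to re-keying a character's queued jobs. A lazy-invalidation sketch — names are hypothetical, and duplicate fetches after rapid tab switching are possible but harmless for idempotent reads:

```python
import heapq

HIGH, NORMAL = 0, 1   # lower value = served first

class PriorityJobs:
    def __init__(self):
        self.heap = []       # (priority, due, character, endpoint)
        self.priority = {}   # character -> current priority

    def add(self, character, endpoint, due):
        pri = self.priority.get(character, NORMAL)
        heapq.heappush(self.heap, (pri, due, character, endpoint))

    def focus(self, character):
        """Called on tab switch: promote this character, demote the rest."""
        for c in list(self.priority):
            self.priority[c] = NORMAL
        self.priority[character] = HIGH
        # Re-push any entry whose recorded priority no longer matches;
        # the stale copies are skipped when popped.
        for pri, due, c, e in list(self.heap):
            want = self.priority.get(c, NORMAL)
            if pri != want:
                heapq.heappush(self.heap, (want, due, c, e))

    def pop(self):
        while self.heap:
            pri, due, c, e = heapq.heappop(self.heap)
            if self.priority.get(c, NORMAL) == pri:   # skip stale duplicates
                return c, e, due
        return None
```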

Auth isolation: If one character's ESI token expires, only that character is affected. No other characters pause. When the user re-authenticates, that character's jobs resume automatically.

05

The Diagnostic Stream

FIG 5.3 — Live view of the diagnostic stream: nc $WINHOST 5555 | jq '.msg'. A real-time TCP JSON stream on port 5555 — 221 lines that caught bugs the codebase carried for years. Debug builds only, never included in any release.

The tool that revealed all of this — TcpJsonLoggerProvider — stayed in the codebase as a developer tool. Connect from any terminal and watch the scheduler in real-time:

{"ts":"...","tag":"FETCH","msg":"char47/Skills → 200 OK (180ms), tokens=142/150"}
{"ts":"...","tag":"FETCH","msg":"char12/Assets → 304 Not Modified (23ms)"}
{"ts":"...","tag":"WARN","msg":"char12 budget exhausted — deferring 3 jobs"}

Every ESI request, every cache hit, every rate limit decision — visible in real time. The tool that cost 221 lines to build caught bugs the original codebase carried for years.
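The idea — a TCP fan-out of JSON log lines — fits in a few dozen lines in any language. A Python sketch of the shape (the real TcpJsonLoggerProvider is part of the C# codebase; this is not its implementation):

```python
import json
import socket
import threading
import time

class TcpJsonLogger:
    """Broadcast structured log events to any connected TCP client (debug only)."""

    def __init__(self, port=5555):
        self.clients = []
        self.lock = threading.Lock()
        self.server = socket.create_server(("127.0.0.1", port))
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self):
        while True:
            conn, _ = self.server.accept()
            with self.lock:
                self.clients.append(conn)   # new subscriber, e.g. `nc host 5555`

    def log(self, tag, msg):
        line = json.dumps({"ts": time.time(), "tag": tag, "msg": msg}) + "\n"
        with self.lock:
            for conn in list(self.clients):
                try:
                    conn.sendall(line.encode())
                except OSError:
                    self.clients.remove(conn)   # drop disconnected clients
```

A production version would write through a bounded background queue so a slow reader can't stall the app; the sketch keeps only the fan-out shape.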

06

The Crash Is Fixed

After the scheduler rewrite, I loaded a hundred characters. No crash. No freeze. No thundering herd. The UI stayed responsive throughout startup. Characters loaded in phases, data streamed in over a few seconds, and the priority queue kept everything orderly.

The problem that started this entire journey — a crash at sixty characters — was solved. But by the time I got here, the architecture was clean enough to do something nobody had done with EVEMon in twenty years: build a new UI.

Previous: Part 4 — Surgery on a Beating Heart

Next: Part 6 — 101,000 Lines and Zero ViewModels