coming | in planning

ai-native system design (foundations)

design perplexity on a whiteboard. inference internals, retrieval, evals, reliability, and cost engineering. first principles, paper-backed.

26 lessons | 7 modules | ~14 hours

what you’ll learn

design end-to-end ai systems like perplexity, copilot autocomplete, and enterprise search
reason about inference internals, kv cache, batching, and where latency actually lives
build retrieval pipelines that survive multi-tenant production traffic and tight acl constraints
instrument llm systems with the eval discipline that separates senior engineers from prompt-tinkerers
cut a $50k/month feature to $5k without losing quality, and defend every lever

curriculum

planning sketch

this is a rough curriculum we’re still planning. modules and lessons are likely to shift before any lesson is recorded. want to shape it? mail@karnstack.com.

module one

the mental shift

~60 min | 2 lessons

01 why classic system design assumptions break | coming soon | 30m

02 cost, latency, quality as a first-class triangle | coming soon | 30m

module two

the llm as architectural component

~125 min | 4 lessons

03 inference internals: tokens, kv cache, prefill vs decode, batching | coming soon | 35m

04 serving stacks: self-host vs hosted vs hybrid | coming soon | 30m

05 model routing and the model portfolio | coming soon | 30m

06 cost, latency, quality budgets as design artifacts | coming soon | 30m

module three

memory and retrieval

~160 min | 5 lessons

07 when to rag, when not to | coming soon | 30m

08 the retrieval pipeline: chunking, embeddings, hybrid, rerank | coming soon | 35m

09 vector dbs in production | coming soon | 30m

10 freshness, write paths, acls, multi-tenancy | coming soon | 35m

11 retrieval evals: recall@k, mrr, faithfulness | coming soon | 30m

module four

eval foundations

~90 min | 3 lessons

12 eval-driven development and error analysis | coming soon | 30m

13 offline evals: golden sets, regression suites, scoring | coming soon | 30m

14 llm-as-judge: bias, position effects, calibration | coming soon | 30m

module five

reliability, safety, observability

~125 min | 4 lessons

15 prompt injection: direct, indirect, exfiltration | coming soon | 35m

16 output validation and structured generation | coming soon | 30m

17 failure modes and fallbacks | coming soon | 30m

18 observability for non-deterministic systems | coming soon | 30m

module six

performance and cost engineering

~130 min | 4 lessons

19 caching at every layer | coming soon | 35m

20 streaming, speculative decoding, partial results | coming soon | 35m

21 routing and cascades | coming soon | 30m

22 gpu autoscaling, batching, queueing for slow jobs | coming soon | 30m

module seven

foundations case studies

~140 min | 4 lessons

23 perplexity: designing an answer engine | coming soon | 35m

24 github copilot autocomplete: low-latency completion | coming soon | 35m

25 llm gateway: litellm, portkey, cloudflare ai gateway | coming soon | 35m

26 notion ai and glean: workspace rag | coming soon | 35m

frequently asked

when does this launch?
in planning. the curriculum on this page is a sketch. modules and lessons are likely to shift before any lesson is recorded. all access includes every course as it ships.

do i need to have done the distributed-systems course?
no, but a senior backend background helps. this course assumes you've shipped real services and understand distributed systems, databases, queues, and observability at a working level.

how this course is made

the curriculum is curated by karnstack and reviewed by senior engineers in the industry before it ships. narration is an ai voice (elevenlabs) reading human-written, human-reviewed scripts. read how courses are made.