ai-native system design (foundations)
design perplexity on a whiteboard. inference internals, retrieval, evals, reliability, and cost engineering. first principles, paper-backed.
26 lessons | 7 modules | ~14 hours
what you’ll learn
- design end-to-end ai systems like perplexity, copilot autocomplete, and enterprise search
- reason about inference internals, kv cache, batching, and where latency actually lives
- build retrieval pipelines that survive multi-tenant production traffic and tight acl constraints
- instrument llm systems with the eval discipline that separates senior engineers from prompt-tinkerers
- cut a $50k/month feature to $5k without losing quality, and defend every lever
curriculum
planning sketch
this is a rough curriculum we’re still planning. modules and lessons are likely to shift before any lesson is recorded. want to shape it? mail@karnstack.com.
01
module one
the mental shift
~60 min | 2 lessons
01 why classic system design assumptions break (coming soon, 30m)
02 cost, latency, quality as a first-class triangle (coming soon, 30m)
02
module two
the llm as architectural component
~125 min | 4 lessons
03 inference internals: tokens, kv cache, prefill vs decode, batching (coming soon, 35m)
04 serving stacks: self-host vs hosted vs hybrid (coming soon, 30m)
05 model routing and the model portfolio (coming soon, 30m)
06 cost, latency, quality budgets as design artifacts (coming soon, 30m)
03
module three
memory and retrieval
~160 min | 5 lessons
07 when to rag, when not to (coming soon, 30m)
08 the retrieval pipeline: chunking, embeddings, hybrid, rerank (coming soon, 35m)
09 vector dbs in production (coming soon, 30m)
10 freshness, write paths, acls, multi-tenancy (coming soon, 35m)
11 retrieval evals: recall@k, mrr, faithfulness (coming soon, 30m)
04
module four
eval foundations
~90 min | 3 lessons
12 eval-driven development and error analysis (coming soon, 30m)
13 offline evals: golden sets, regression suites, scoring (coming soon, 30m)
14 llm-as-judge: bias, position effects, calibration (coming soon, 30m)
05
module five
reliability, safety, observability
~125 min | 4 lessons
15 prompt injection: direct, indirect, exfiltration (coming soon, 35m)
16 output validation and structured generation (coming soon, 30m)
17 failure modes and fallbacks (coming soon, 30m)
18 observability for non-deterministic systems (coming soon, 30m)
06
module six
performance and cost engineering
~130 min | 4 lessons
19 caching at every layer (coming soon, 35m)
20 streaming, speculative decoding, partial results (coming soon, 35m)
21 routing and cascades (coming soon, 30m)
22 gpu autoscaling, batching, queueing for slow jobs (coming soon, 30m)
07
module seven
foundations case studies
~140 min | 4 lessons
23 perplexity: designing an answer engine (coming soon, 35m)
24 github copilot autocomplete: low-latency completion (coming soon, 35m)
25 llm gateway: litellm, portkey, cloudflare ai gateway (coming soon, 35m)
26 notion ai and glean: workspace rag (coming soon, 35m)
frequently asked
- when does this launch?
  in planning. the curriculum on this page is a sketch. modules and lessons are likely to shift before any lesson is recorded. every plan includes every course as it ships.
- do i need to have done the distributed-systems course?
  no, but a senior backend background helps. this course assumes you've shipped real services and understand distributed systems, databases, queues, and observability at a working level.