coming soon | in planning

ai-native system design (foundations)

design perplexity on a whiteboard. inference internals, retrieval, evals, reliability, and cost engineering. first principles, paper-backed.

26 lessons | 7 modules | ~14 hours

what you’ll learn

  • design end-to-end ai systems like perplexity, copilot autocomplete, and enterprise search
  • reason about inference internals, kv cache, batching, and where latency actually lives
  • build retrieval pipelines that survive multi-tenant production traffic and tight acl constraints
  • instrument llm systems with the eval discipline that separates senior engineers from prompt-tinkerers
  • cut a $50k/month feature to $5k without losing quality, and defend every lever

curriculum

planning sketch

this is a rough curriculum we’re still planning. modules and lessons are likely to shift before any lesson is recorded. want to shape it? email mail@karnstack.com.

01
module one

the mental shift

~60 min | 2 lessons
01 | why classic system design assumptions break | coming soon | 30m
02 | cost, latency, quality as a first-class triangle | coming soon | 30m
02
module two

the llm as architectural component

~125 min | 4 lessons
03 | inference internals: tokens, kv cache, prefill vs decode, batching | coming soon | 35m
04 | serving stacks: self-host vs hosted vs hybrid | coming soon | 30m
05 | model routing and the model portfolio | coming soon | 30m
06 | cost, latency, quality budgets as design artifacts | coming soon | 30m
03
module three

memory and retrieval

~160 min | 5 lessons
07 | when to rag, when not to | coming soon | 30m
08 | the retrieval pipeline: chunking, embeddings, hybrid, rerank | coming soon | 35m
09 | vector dbs in production | coming soon | 30m
10 | freshness, write paths, acls, multi-tenancy | coming soon | 35m
11 | retrieval evals: recall@k, mrr, faithfulness | coming soon | 30m
04
module four

eval foundations

~90 min | 3 lessons
12 | eval-driven development and error analysis | coming soon | 30m
13 | offline evals: golden sets, regression suites, scoring | coming soon | 30m
14 | llm-as-judge: bias, position effects, calibration | coming soon | 30m
05
module five

reliability, safety, observability

~125 min | 4 lessons
15 | prompt injection: direct, indirect, exfiltration | coming soon | 35m
16 | output validation and structured generation | coming soon | 30m
17 | failure modes and fallbacks | coming soon | 30m
18 | observability for non-deterministic systems | coming soon | 30m
06
module six

performance and cost engineering

~130 min | 4 lessons
19 | caching at every layer | coming soon | 35m
20 | streaming, speculative decoding, partial results | coming soon | 35m
21 | routing and cascades | coming soon | 30m
22 | gpu autoscaling, batching, queueing for slow jobs | coming soon | 30m
07
module seven

foundations case studies

~140 min | 4 lessons
23 | perplexity: designing an answer engine | coming soon | 35m
24 | github copilot autocomplete: low-latency completion | coming soon | 35m
25 | llm gateway: litellm, portkey, cloudflare ai gateway | coming soon | 35m
26 | notion ai and glean: workspace rag | coming soon | 35m

frequently asked

when does this launch?
in planning. the curriculum on this page is a sketch. modules and lessons are likely to shift before any lesson is recorded. every plan includes every course as it ships.
do i need to have done the distributed-systems course?
no, but a senior backend background helps. this course assumes you've shipped real services and understand distributed systems, databases, queues, and observability at a working level.
