ai engineer pro
specialist ai systems where the agent layer alone isn't enough. voice agents, multimodal, long-context vs rag, computer use, and on-device inference.
5 lessons | 2 modules | ~3 hours
what you’ll learn
- ship a production voice agent with a real latency budget
- design multimodal pipelines that handle screenshots, charts, and image-aware support flows
- reason about long-context vs rag and pick the right tool without product-chrome arguments
- understand computer-use agents and on-device inference tradeoffs at the architectural level
curriculum
planning sketch
this is a rough curriculum we’re still planning. modules and lessons are likely to shift before any lesson is recorded. want to shape it? mail@karnstack.com.
01
module one
specialist case studies
~65 min | 2 lessons
01 | voice agent: livekit and pipecat reference | coming soon | 35m
02 | multimodal pipeline: image and text | coming soon | 30m
02
module two
open problems
~90 min | 3 lessons
03 | long context vs rag vs hybrid retrieval | coming soon | 30m
04 | computer use and browser agents | coming soon | 30m
05 | on-device and edge inference | coming soon | 30m
frequently asked
- when does this launch?
  in planning, sequenced after production-agents. the curriculum on this page is a sketch. modules and lessons are likely to shift before any lesson is recorded.
- how is this different from production-agents?
  production-agents (course 2) covers the agent layer end-to-end (loops, tools, memory, runtime, sandboxing). this course (course 3) covers specialist systems where that layer alone isn't enough: voice latency budgets, multimodal pipelines, computer use, and on-device tradeoffs.