coming|in planning

ai engineer pro

specialist ai systems for cases where the agent layer alone isn't enough: voice agents, multimodal, long-context vs rag, computer use, and on-device inference.

5 lessons|2 modules|~3 hours

what you’ll learn

ship a production voice agent with a real latency budget
design multimodal pipelines that handle screenshots, charts, and image-aware support flows
reason about long-context vs rag and pick the right tool without product-chrome arguments
understand computer-use agents and on-device inference tradeoffs at the architectural level

curriculum

planning sketch

this is a rough curriculum we’re still planning. modules and lessons are likely to shift before any lesson is recorded. want to shape it? email mail@karnstack.com.

module one

specialist case studies

~65 min|2 lessons

01|voice agent: livekit and pipecat reference|coming soon|35m

02|multimodal pipeline: image and text|coming soon|30m

module two

open problems

~90 min|3 lessons

03|long context vs rag vs hybrid retrieval|coming soon|30m

04|computer use and browser agents|coming soon|30m

05|on-device and edge inference|coming soon|30m

frequently asked

when does this launch? it's in planning, sequenced after production-agents. the curriculum on this page is a sketch. modules and lessons are likely to shift before any lesson is recorded.
how is this different from production-agents? production-agents (course 2) covers the agent layer end-to-end (loops, tools, memory, runtime, sandboxing). this course (course 3) covers specialist systems where that layer alone isn't enough: voice latency budgets, multimodal pipelines, computer use, and on-device tradeoffs.

how this course is made

the curriculum is curated by karnstack and reviewed by senior engineers in the industry before it ships. narration is an ai voice (elevenlabs) reading human-written, human-reviewed scripts. read how courses are made.