AI Video Generation Architect
Start Date: ASAP
Role Type: Full-Time, Salaried
Background: Software development
Location: Remote, Flexible (USA based)
Salary: $170,000-$200,000 per year, plus benefits
Who We Are:
The Modern Classrooms Project (MCP) is a 501(c)(3) nonprofit organization that empowers educators to build classrooms that respond to every student’s needs. Founded by two award-winning teachers, we lead a movement of educators in implementing a self-paced, mastery-based instructional model that leverages technology to foster human connection, authentic learning, and social-emotional growth.
To date, our free online course has reached more than 100,000 teachers in over 150 countries. We are an ambitious, idealistic team led by former classroom teachers, and we are passionate about what we do.
Job Description - Why we need you!
Effective instructional videos make high-quality instruction accessible to all learners, regardless of experience or background. Every day, in classrooms around the world, Modern Classroom educators replace live lectures with instructional videos so that students can learn at their own pace, in school and/or at home. Good videos enhance learning, but they are time-consuming to produce. A single high-quality lesson video can take hours to plan, record, edit, and caption.
We need an experienced, hands-on, AI-native engineer to build a brand new, state-of-the-art generative pipeline that turns specifications into high-quality instructional videos — complete with animations, synchronized AI narration, captions, and automated ground-truth quality verification. You will own the video render path end to end, from the canonical specification to the final rendered output, creating intuitive, powerful tools that will directly support educators and students every day.
Key Responsibilities
As our AI Video Generation Architect, you will be a senior individual contributor on our Engineering Team, reporting to the Head of Engineering and collaborating closely with the Chief Innovation Officer to ship features that make a real difference for students and educators.
You’ll be joining a small and growing team of talented software engineers working together to solve the problems teachers and students face every day. We’re building a world where every student can succeed, and we need you to help us make that happen.
You will:
- Architect the video generation pipeline end to end. Design the gen-AI pipeline that transforms lesson specifications into storyboards, scene graphs, scripts, and production plans. Every stage emits deterministic lessons-as-code and structured intermediate artifacts — scene specs, asset manifests, timing maps — that can be inspected, versioned, cached, diffed, and selectively re-rendered.
- Ship multiple substantial features per week. This is a minimum velocity bar, not an exaggeration. You will leverage AI and agentic coding to build incredible software, very, very quickly.
- Build the multi-agent production workflow. Develop agentic orchestration (LangGraph or equivalent) in which an orchestrator delegates to specialist agents: pedagogy analyst, instructional scriptwriter, slide designer, animator, narrator, and a panel of graders, evaluators, and LLM judges, with structured outputs and human-in-the-loop labeling and fine-tuning.
- Engineer the video generation pipeline. Build brand-consistent, design-system-driven video generation from structured content: layout engines and templates, LaTeX/KaTeX mathematical typesetting, programmatic diagrams and charts, and text-faithful image generation for illustrations with automated readability checks. Design programmatic motion to support narrated worked examples: kinetic typography, transitions, animated number lines and area models. Run parallelized rendering with generative video models (e.g. Veo / Kling / Seedance). Produce narration with TTS (e.g. ElevenLabs v3 / Gemini-TTS) using audio tags and SSML, pronunciation lexicons for mathematical vocabulary, consistent voice identities across a course, multi-voice dialogue, multilingual narration, and openly licensed music.
- Build the ground-truth quality system. Construct golden datasets of spec-to-video pairs annotated by educators. Implement rubric-based scoring with calibrated LLM- and VLM-as-judge evaluators: frame-level visual fidelity, verification of on-screen mathematics, A/V sync validation, pedagogical fidelity checks against the source spec's learning objectives, reading-level analysis, and K-12 content safety screens. Symbolically verify every worked example with a computable ground-truth verification system: if the video teaches 3/4 + 1/8, a deterministic solver should independently compute and confirm the answer before any student sees it.
- Architect resilient, high-scale media infrastructure. Design and scale the distributed backend across Python and TypeScript that carries the pipeline: render queues and job orchestration, transcoding and streaming (HLS), and provenance-aware metadata for AI-generated media. Own the systems design and ensure our foundational architecture is ready to scale.
- Raise the bar for the team. Review the work of teammates and contractors. Collaborate with teammates on architecture and implementation reviews. Write PR comments, design docs, and agent skills that make the next person faster.
You should apply if:
- You are AI-native. You are an expert in continuous multi-session development with Claude Code and/or OpenAI Codex. You are an expert at prompt engineering and context engineering. You write Agent Skills the way other engineers write unit tests. You practice Spec-Driven Development (GitHub Spec Kit or equivalent) as part of your normal workflow.
- You have built real backend AI orchestration layers that run when you're not watching. You think in graphs — shared state flowing through nodes, conditional edges, interrupts, and circuit breakers. You have shipped non-trivial agentic pipelines using LangGraph, Python, and TypeScript, or equivalent. You treat durable execution, structured outputs, human-in-the-loop checkpoints, and provider-agnostic model routing as baseline design constraints. You have built evaluation harnesses, annotated datasets, and versioned prompt chains as first-class artifacts.
- You are a programmatic media craftsperson. You have deep experience with a programmatic animation framework (e.g. Manim, Remotion, or Motion Canvas) and strong FFmpeg fundamentals: codecs, containers, color, audio streams, muxing. You understand TTS model trade-offs, expressive direction with audio tags and SSML, pronunciation lexicons, forced alignment and word-level timestamps, and loudness standards. You can hear when the pacing is wrong for a twelve-year-old learner, and you fix it in the pipeline, not the waveform.
- You treat quality as a measurable system. You build golden datasets and calibrated judges before you scale generation. You combine deterministic checks (schemas, layout constraints, symbolic math verification, A/V sync) with LLM- and VLM-as-judge evaluation validated against human labels. You catch the subtly wrong diagram, the mispronounced denominator, the worked example that's off by one — and the same eye applies to agent-generated code, which is plausible but not always right. You do not ship what you cannot measure.
- You are self-directed. You thrive in small, high-autonomy teams and startups where the surface area is broad and the context shifts constantly. You write clearly. You own a problem end-to-end without waiting for a ticket to tell you what to do next.
- You love to learn. You're actively leveraging the latest developments in AI and applying them to enhance both your own and others' work. You're also motivated by MCP's mission and vision, and eager to build teacher- and student-facing products.
- You want to shape the world. You're motivated to be part of something larger than yourself. You believe that the highest value of your talent is using it to empower others. You're ready to make a real difference in educators' and young people's lives.
It would also be helpful if:
- You have experience building edtech products.
- You have experience handling sensitive and/or confidential data, particularly in an education context (COPPA, CIPA, FERPA, PPRA, SOC 2).
Compensation and Benefits
We aim to offer a competitive compensation package, as well as the opportunity to work in a fast-growing nonprofit that is on a mission to improve education worldwide. This includes:
- Salaried position: $170,000-$200,000 gross salary per year
- Employer-sponsored health insurance through CareFirst BlueCross BlueShield
- Employer-sponsored dental and vision insurance through MetLife
- Participation in a Vanguard 403(b) deferred-compensation plan with a 3% employer match
- Paid Time Off, inclusive of: vacation/PTO (20 days), paid holidays, paid parental leave, sick and safe paid time off, "Me Days", and the ability to earn paid Comp time off
- Annual budget for MCP-funded Continuous Learning for the program(s) you request (available after 6 months of continuous full-time employment)
- FSA and Dependent Care FSA access
- Company-paid life insurance coverage at 1x salary
- Access to the Wishbone pet insurance benefit
- Ability to work remotely and to set your own hours (within reason)
____________________________________________________________________________________________________________________
STATEMENT OF NON-DISCRIMINATION: The Modern Classrooms Project is committed to equal employment opportunity. We do not discriminate on the basis of race, color, gender, disability, age, religion, sexual orientation, nationality, or ethnicity. We are strongly committed to hiring a diverse team and encourage applications from traditionally under-represented backgrounds.