Applications open until June 22

Cohort 3

The third cohort opens a free, merit-based path for people with strong technical ability, initiative, and real availability. Applications are open until June 22.

Apply to Cohort 3

Invitation from Leonardo Gonzalez

A direct invitation to technical people in Latin America who want to study modern AI through papers, code, evaluation, and engineering judgment.

Period
July 1 to September 30, 2026
Commitment
8 to 12 hours per week for three months
Focus
July-September 2026, specialized curriculum

Work focus

  • Bridge scientific research and applied AI engineering.
  • Continue using Agentic AI MOOC material where it adds value.
  • Build a specialized original path for frontier technical problems.

Activities

  • Industry updates and scientific papers explained in depth.
  • Applied exercises that ground theory in practice.
  • Cohort projects that demand a high degree of self-management.
  • Focused mentorship for people who demonstrate initiative.

Achievements and outcomes

  • Technical portfolio with implementations and documented decisions.
  • Judgment to read, adapt, and evaluate recent papers.
  • Preparation for high-level projects and opportunities.

Technical path

Cohort 3 Curriculum

Cohort 3 goes deep into the technical foundations behind modern AI systems: Transformers, modern attention mechanisms, Mamba and hybrid architectures, diffusion models, pretraining, post-training (SFT, DPO, RL), inference, providers, tool use, agent harnesses, and work orchestration.

The technical path moves from model fundamentals into system construction: tokenization, embeddings, attention, generative architectures, training, post-training, inference behavior, tool use, evaluation, benchmark reproduction, agent harnesses, and work orchestration with systems such as Symphony and OpenSymphony.

The cohort includes leveling material from critical Cohort 2 topics and a core path of reading, implementation, and evaluation. The focus is to read papers with engineering judgment: understand their mechanisms, reproduce their benchmarks where possible, implement ideas in code, and turn technical evidence into design decisions.

Evaluation will be approached through understanding and reproducing paper benchmarks: what each benchmark measures, what claim it supports, which ablations matter, how reproducible it is, what biases or contamination it may contain, and how to adapt that discipline to our own systems.

Each Fellow or team will develop a final artifact: a benchmark reproduction, an architecture or inference experiment, an agent harness, a work orchestration specification, or an applied system with rigorous evaluation.

Weekly plan

Calendar

Weekly plan for Cohort 3, from technical reading method to the final capstone.

Week 1

Papers method + technical baseline

Core question: How do we read papers as engineers?

Topics: Claims, mechanisms, assumptions, ablations, limitations, real implementation, evaluation, reproducibility, and production implications.

Deliverable: Paper Card v1: one-page structured summary with claim, mechanism, evidence, benchmark, implementation path, risks, and open questions.
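One possible way to keep Paper Cards consistent across the cohort is to treat the card as a small data structure. The sketch below is only an illustration: the class name `PaperCard`, its field names, and the `missing_sections` helper are assumptions derived from the sections listed in the deliverable, not an official template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PaperCard:
    """Hypothetical one-page card; fields mirror the deliverable's sections."""
    title: str
    claim: str = ""
    mechanism: str = ""
    evidence: str = ""
    benchmark: str = ""
    implementation_path: str = ""
    risks: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        # A section counts as missing while it is an empty string or list.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def render(self) -> str:
        # Plain-text rendering, one line per section.
        lines = [self.title]
        for f in fields(self):
            if f.name == "title":
                continue
            value = getattr(self, f.name)
            if isinstance(value, list):
                value = "; ".join(value)
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {value or '(to do)'}")
        return "\n".join(lines)


card = PaperCard(title="Example paper", claim="X improves Y")
print(card.render())
print("Still missing:", card.missing_sections())
```

A check like `missing_sections` makes it easy to see at a glance which parts of a card still need work before review.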

Week 2

Transformer and attention fundamentals

Core question: What makes modern LLMs work mechanically?

Topics: Tokenization, embeddings, decoder-only architecture, self-attention, Q/K/V, masking, RoPE, MHA, MQA, GQA, context length, KV cache, prefill vs decode, memory, latency, throughput, and inference cost.

Deliverable: Minimal Transformer attention notebook to make attention, masking, RoPE, MHA/GQA intuition, and KV cache growth concrete.
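As a rough idea of what such a notebook might start from, the sketch below implements single-head scaled dot-product attention with a causal mask in plain Python, plus a back-of-the-envelope KV cache size estimate. The function names and the example model shape (32 layers, head dimension 128, 2-byte values) are illustrative assumptions, not a reference to any particular model.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head attention; q, k, v are lists of T vectors (lists of floats)."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: position t only sees positions 0..t.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # Factor 2: keys and values are both cached in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


x = [[1.0, 0.0], [0.0, 1.0]]
print(causal_attention(x, x, x))

# Hypothetical shapes: MHA with 32 KV heads vs GQA with 8 KV heads.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
print(mha / 2**30, "GiB vs", gqa / 2**30, "GiB per sequence")
```

The cache estimate grows linearly with sequence length and with the number of KV heads, which is the intuition behind MQA and GQA reducing memory during decode.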

Week 3

Modern sequence and attention architectures

Core question: Are Transformers the only viable base, or a dominant architecture inside a broader space?

Topics: Attention scaling limits, MLA, sliding-window attention, periodic global attention, sparse attention, linear attention, DeltaNet/Kimi Delta-style attention, state-space models, Mamba, hybrid architectures, FlashAttention, PagedAttention, prefix caching, and training/inference tradeoffs.

Deliverable: Architecture and inference comparison memo: compare standard Transformer, GQA/MLA, sliding/sparse attention, linear attention, and Mamba/hybrids.
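One small quantitative input for such a memo could be the number of query-key pairs each attention pattern actually computes. The sketch below compares a full causal mask with a sliding-window mask; the helper names and the window convention (each position sees itself and the previous `window - 1` positions) are assumptions chosen for illustration.

```python
def causal_mask(seq_len):
    # mask[t][s] is True when query position t may attend to key position s.
    return [[s <= t for s in range(seq_len)] for t in range(seq_len)]


def sliding_window_mask(seq_len, window):
    # Each position attends to itself and the previous window - 1 positions.
    return [
        [(t - window < s <= t) for s in range(seq_len)]
        for t in range(seq_len)
    ]


def attended_pairs(mask):
    # Number of (query, key) pairs the pattern keeps.
    return sum(sum(row) for row in mask)


for n in (6, 64, 512):
    full = attended_pairs(causal_mask(n))
    local = attended_pairs(sliding_window_mask(n, window=4))
    print(n, full, local)
```

The full causal count grows quadratically with sequence length while the windowed count grows roughly linearly, which is the tradeoff a comparison memo would then weigh against what the window can no longer see.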

Week 4

Diffusion models and generative paradigms beyond LLMs

Core question: What do diffusion models teach that an LLM-only engineer might miss?

Topics: Denoising diffusion, score matching intuition, latent diffusion, diffusion transformers, text-to-image/video/audio systems, conditioning, guidance, and sampling.

Deliverable: Toy diffusion or guided generation walkthrough for conceptual and engineering literacy.
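A toy walkthrough could begin with only the forward (noising) process of denoising diffusion. The sketch below uses the standard closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with a linear beta schedule. The schedule endpoints and the step count are commonly used illustrative values and should be read as assumptions; nothing here trains or samples a model.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced noise variances (assumed schedule, for illustration).
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    # Cumulative product of (1 - beta_t): how much signal survives at step t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    # Closed-form forward step: scaled signal plus scaled Gaussian noise.
    signal = math.sqrt(abar[t])
    noise_scale = math.sqrt(1.0 - abar[t])
    return [signal * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
for t in (0, 250, 999):
    print(t, round(abar[t], 5), q_sample(x0, t, abar, rng))
```

Printing `abar` at a few steps shows the signal fading toward pure noise, which is the behavior the reverse (denoising) model is later trained to undo.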

Week 5

Pretraining

Core question: What does it mean to create a foundation model?

Topics: Data mixtures, token budgets, compute scaling, objective functions, data contamination, synthetic data, curriculum, filtering, and base model behavior before instruction tuning.

Deliverable: Pretraining plan critique: given a hypothetical model budget, propose data, objective, evaluations, and priority risks.
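A plan critique usually benefits from a quick sanity check on compute. The sketch below uses the widely cited approximation of about 6 FLOPs per parameter per training token for dense Transformers. The model size, token count, GPU count, peak throughput, and utilization in the example are all hypothetical numbers chosen only to show the arithmetic.

```python
def training_flops(n_params, n_tokens):
    # Rule of thumb for dense Transformers: ~6 FLOPs per parameter per token.
    return 6.0 * n_params * n_tokens


def training_days(n_params, n_tokens, n_gpus, peak_flops_per_gpu, utilization):
    # Effective throughput = hardware peak scaled by achieved utilization.
    effective = n_gpus * peak_flops_per_gpu * utilization
    seconds = training_flops(n_params, n_tokens) / effective
    return seconds / 86400.0


# Hypothetical budget: 1B parameters, 20B tokens, 8 accelerators.
flops = training_flops(1e9, 20e9)
days = training_days(1e9, 20e9, n_gpus=8, peak_flops_per_gpu=1e15, utilization=0.4)
print(f"{flops:.2e} FLOPs, about {days:.2f} days")
```

Estimates like this are coarse: they ignore evaluation runs, restarts, and data pipeline stalls, so a critique should treat them as a lower bound rather than a schedule.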

Week 6

Post-training: SFT, DPO, RLHF, RL, and verifiable rewards

Core question: How do base models become useful assistants, agents, and reasoning systems?

Topics: Supervised fine-tuning, preference data, reward models, DPO, preference optimization, RLHF/RLAIF, verifiable rewards, reasoning-oriented post-training, and safety/behavior effects.

Deliverable: Post-training decision memo: for a target behavior, choose between SFT, DPO, RL, prompting, or tooling.
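To ground the DPO option in the memo, it can help to see the per-example loss written out. The sketch below computes the standard DPO objective, -log sigmoid(beta * margin), from summed sequence log-probabilities under the policy and a frozen reference model. The function name, argument names, and the example numbers are illustrative assumptions; a real implementation would operate on batched tensors.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: loss equals log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy has shifted toward the chosen answer: loss drops below log(2).
print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

Because the loss depends only on log-probability differences relative to the reference, it needs preference pairs but no separate reward model, which is one of the tradeoffs the memo would compare against RL-based approaches.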

Week 7

Inference, providers, and tools

Core question: What changes when a model becomes a service?

Topics: Decoding, temperature, top-p, structured outputs, function/tool calling, streaming, latency, throughput, batching, quantization, speculative decoding, context management, provider APIs, hosted vs open-weight/local models, cost, reliability, data policies, and observability.

Deliverable: Provider and inference matrix: capabilities, restrictions, costs, latency, tool support, structured output reliability, context behavior, and operational risks.
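When filling in such a matrix, it helps to know what the decoding parameters do mechanically. The sketch below shows temperature scaling followed by top-p (nucleus) filtering over a toy logit vector. The function names are assumptions for illustration, and individual providers may differ in details such as tie-breaking or the order in which filters are applied.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalized


def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    dist = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


logits = [2.0, 1.0, 0.0]
print(top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.9))
print(sample(logits, temperature=0.7, top_p=0.9, rng=random.Random(0)))
```

Passing an explicit `rng` keeps sampling reproducible, which matters later when measuring run-to-run variance in evaluations.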

Week 8

Paper evaluation and benchmark reproduction I

Core question: How do papers prove their claims?

Topics: Datasets, splits, metrics, baselines, ablations, human evaluation, LLM-as-judge, contamination, variance, leaderboard gaming, and reproducibility gaps.

Deliverable: Benchmark Card: which benchmark a paper uses, what it measures, what claim it supports, what limits the measurement, and what would be needed to reproduce it.
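For the contamination part of a Benchmark Card, a first-pass check might look for n-gram overlap between benchmark items and a sample of candidate training text. The sketch below is a deliberately crude heuristic under stated assumptions: whitespace tokenization, lowercase matching, and a hypothetical `overlap_fraction` helper. Low overlap in a small sample does not prove a benchmark is clean.

```python
def ngrams(text, n):
    # Whitespace tokens, lowercased; a real check would use a proper tokenizer.
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(item, corpus_docs, n=8):
    """Fraction of the item's n-grams that also appear in the corpus sample."""
    item_grams = ngrams(item, n)
    if not item_grams:
        return 0.0  # item shorter than n tokens
    corpus_grams = set().union(*(ngrams(doc, n) for doc in corpus_docs))
    return len(item_grams & corpus_grams) / len(item_grams)


item = "the quick brown fox jumps over the lazy dog"
corpus = ["a quick brown fox jumps high", "unrelated text about something else"]
print(overlap_fraction(item, corpus, n=3))
```

The choice of `n` is a judgment call: short n-grams flag common phrases, while long ones miss paraphrased or reformatted copies, so the card should state which setting was used.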

Week 9

Paper evaluation and benchmark reproduction II

Core question: Can we implement a meaningful part of a paper's evaluation?

Topics: Benchmark subset reproduction, minimal eval harness creation, multi-run measurement, prompt sensitivity, model/provider sensitivity, error taxonomy, and honest uncertainty reporting.

Deliverable: Mini benchmark reproduction: small repo or notebook that reproduces a meaningful part of a paper's evaluation.
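Multi-run measurement and honest uncertainty reporting can start very simply: repeat the evaluation with different seeds and report the spread along with the mean. The sketch below uses a normal approximation for a 95% interval, which is rough for small numbers of runs; the function name and the example scores are made up for illustration.

```python
import math
import statistics


def summarize_runs(scores):
    """Mean, sample standard deviation, and an approximate 95% interval."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    stderr = std / math.sqrt(n)
    half_width = 1.96 * stderr  # normal approximation; crude for small n
    return {
        "n": n,
        "mean": mean,
        "std": std,
        "ci95": (mean - half_width, mean + half_width),
    }


# Hypothetical accuracies from five runs with different seeds.
print(summarize_runs([0.70, 0.72, 0.68, 0.74, 0.66]))
```

Reporting the interval next to the paper's published number makes it clearer whether a gap is a real reproduction failure or within run-to-run noise.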

Week 10

Agent harnesses

Core question: What is an agent when we analyze it as an engineering system?

Topics: Model loop, tool interface, state, memory, planning, environment, traces, retries, human approval, guardrails, sandboxing, failure modes, and the distinction between harness, model, and application.

Deliverable: Minimal agent harness with tool calls, traces, failure handling, and an evaluation task.
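The separation between harness and model can be made concrete with a loop of only a few dozen lines. In the sketch below the "model" is a scripted stand-in function, so the example runs without any provider; the action format (`type`, `tool`, `args`, `answer`) and all names are assumptions for illustration, not any provider's tool-calling API.

```python
def run_agent(model, tools, task, max_steps=5):
    """Minimal loop: ask the model for an action, run the tool, record a trace."""
    trace = []
    observation = task
    for step in range(max_steps):
        action = model(observation, trace)
        if action["type"] == "final":
            trace.append({"step": step, "final": action["answer"]})
            return action["answer"], trace
        name, args = action["tool"], action.get("args", {})
        try:
            result = tools[name](**args)
            ok = True
        except Exception as exc:
            # Failure handling: the error becomes the next observation.
            result = f"{type(exc).__name__}: {exc}"
            ok = False
        trace.append({"step": step, "tool": name, "args": args,
                      "ok": ok, "result": result})
        observation = result
    return None, trace  # step budget exhausted without a final answer


def scripted_model(observation, trace):
    # Stand-in for an LLM call: one tool call, then a final answer.
    if not trace:
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "answer": str(trace[-1]["result"])}


answer, trace = run_agent(scripted_model, {"add": lambda a, b: a + b},
                          "What is 2 + 3?")
print(answer)
for entry in trace:
    print(entry)
```

Swapping `scripted_model` for a real model call leaves the loop, trace format, and failure handling unchanged, which is what makes the harness itself testable against an evaluation task.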

Week 11

Work orchestration: Symphony, OpenSymphony, and agentic engineering

Core question: What comes after prompting and individual agents?

Topics: Work item as control surface, issue tracker as state machine, isolated workspaces, durable agent runs, review and recovery, policy files, repo-specific instructions, CI, proof of work, and human supervision at workflow level.

Deliverable: Mini orchestration specification for the capstone: issues, states, workspaces, policies, review, retries, and recovery.
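The idea of an issue tracker acting as a state machine can be sketched with a transition table and a retry budget. The states, the transition rules, and the `transition` helper below are hypothetical choices made for illustration; they are not taken from Symphony or OpenSymphony, whose actual models may differ.

```python
# Hypothetical work-item states and the transitions allowed between them.
ALLOWED = {
    "todo": {"in_progress"},
    "in_progress": {"in_review", "failed"},
    "in_review": {"done", "in_progress"},  # review can send work back
    "failed": {"in_progress"},             # recovery path
    "done": set(),
}


def transition(issue, new_state, max_retries=2):
    """Return an updated copy of the issue, or raise if the move is not allowed."""
    current = issue["state"]
    if new_state not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new_state}")
    retries = issue["retries"]
    if current == "failed":
        # Leaving the failed state consumes one retry from the budget.
        if retries >= max_retries:
            raise ValueError("retry budget exhausted; needs human review")
        retries += 1
    return {**issue, "state": new_state, "retries": retries,
            "history": issue["history"] + [new_state]}


issue = {"id": 1, "state": "todo", "retries": 0, "history": ["todo"]}
for state in ("in_progress", "failed", "in_progress", "in_review", "done"):
    issue = transition(issue, state)
print(issue)
```

Keeping the full history on the work item gives reviewers a record of each attempt, and the retry limit is one simple place to require human supervision at the workflow level.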

Week 12

Capstone build sprint

Core question: Can we build something backed by papers and measured rigorously?

Topics: Possible tracks: model fundamentals, post-training, benchmark reproduction, agent harnesses, work orchestration, AI for science/research.

Deliverable: Functional artifact + evaluation plan: prototype, notebook, repo, harness, benchmark reproduction, or applied system with a defined evaluation.

Week 13

Demo, writeup, and cohort memory

Core question: What remains as infrastructure for Cohort 4?

Topics: Closing criterion: each Fellow or team shows enough technical depth to explain the paper behind their work, its mechanism, its evidence, the implementation gap, and the engineering implications.

Deliverable: Demo, repo or notebook, Benchmark Card, eval report, failure analysis, decision memo, and note on what future cohorts should know.