Project: AI Math Tutor with Code Interpreter
Dec 30, 2025 • 30 min read
LLMs are language models, not calculators. Ask GPT-4 to multiply 8,734 × 9,127 in its head and there's a non-trivial chance it hallucinates the answer — it's predicting what the answer should look like based on training patterns, not actually computing it. For a math tutor, hallucinated calculations are catastrophic. The solution is Code Interpreter: the LLM writes Python, executes it in a sandboxed environment, and reports the exact output. Every calculation becomes a verified result.
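The fix is easy to see in miniature: one line of Python settles the multiplication above exactly, where a token predictor can only approximate it.

```python
# The multiplication from above, computed rather than predicted
print(8_734 * 9_127)  # 79715218
```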
1. The "Orchestrate, Don't Compute" Pattern
❌ Standard LLM
Question: "What is the derivative of sin(x²)?"
LLM tries to do calculus in its "head" by predicting tokens → may get it right, may hallucinate. No way to verify without checking manually.
✅ Code Interpreter
LLM writes: sympy.diff(sin(x**2), x)
Executes in Python sandbox → stdout: 2*x*cos(x**2) → guaranteed correct. LLM then explains the steps.
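The executed result is also independently checkable. As a sanity check of the chain-rule answer 2x·cos(x²), compare it against a central finite difference using nothing but the standard library (no SymPy required):

```python
import math

def f(x):
    return math.sin(x * x)

def derivative(f, x, h=1e-6):
    # Central finite difference: (f(x+h) - f(x-h)) / (2h), error O(h²)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
symbolic = 2 * x * math.cos(x * x)  # 2x·cos(x²), the chain-rule answer
numeric = derivative(f, x)
print(abs(symbolic - numeric) < 1e-6)  # True
```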
2. Creating the Persistent Assistant
# STEP 1: Create the assistant ONCE — save the ID in your .env file
# Do NOT recreate the assistant on every request — it's a persistent object
# Storage is only billed for File Search vector stores (~$0.10/GB/day); idle assistants cost nothing extra
from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

assistant = client.beta.assistants.create(
    name="Professor Pi",
    model="gpt-4o",  # Best reasoning for complex math
    instructions="""You are an expert math tutor named Professor Pi.
ALWAYS follow this workflow:
1. Understand what the student is asking
2. Write Python code to solve it precisely (use: sympy for algebra/calculus, numpy for numerics, matplotlib for graphs)
3. Execute the code and read the exact output
4. Explain the solution step-by-step in simple language
5. Output ALL math expressions in LaTeX format using $...$ for inline and $$...$$ for block equations
NEVER try to compute results mentally — always use code for calculations.
When plotting, save figures to /mnt/data/ so they can be retrieved.
Available Python libraries: sympy, numpy, matplotlib, scipy, pandas""",
    tools=[{"type": "code_interpreter"}],  # Enable the Code Interpreter tool
    # Optional: attach a PDF textbook as a knowledge file
    # tool_resources={"code_interpreter": {"file_ids": ["file_abc123"]}},
)
print(f"✅ Assistant created: {assistant.id}")
print(f"   Save this to your .env: MATH_ASSISTANT_ID={assistant.id}")
# Output: MATH_ASSISTANT_ID=asst_abc123xyz456
3. The Thread and Run Lifecycle
# STEP 2: Each conversation session gets a Thread (persistent message history)
# Threads survive across API calls — perfect for tutoring sessions
import time
ASSISTANT_ID = os.getenv("MATH_ASSISTANT_ID")
def start_session() -> str:
    """Create a new tutoring session (Thread). Returns the thread_id."""
    thread = client.beta.threads.create()
    return thread.id
def ask_math_question(thread_id: str, question: str) -> dict:
    """
    Send a message and get a response.
    Returns: {text: str, images: list[bytes], code_executed: list[dict]}
    """
    # Add the user message to the thread
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=question,
    )
    # Create a Run (triggers the assistant to process the thread)
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=ASSISTANT_ID,
        # Optional: override instructions for this specific run
        # instructions="Focus on visual explanations with matplotlib graphs.",
        max_prompt_tokens=16384,  # Prevent runaway costs on long conversations
        max_completion_tokens=4096,
    )
    # Poll until the run reaches a terminal state (typically 3-15s for math problems)
    while run.status in ('queued', 'in_progress', 'cancelling'):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
        # Surface intermediate Run Steps for better UX (show code as it executes)
        if run.status == 'in_progress':
            steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run.id)
            for step in steps.data:
                if step.type == 'tool_calls' and step.status == 'in_progress':
                    for tool_call in step.step_details.tool_calls:
                        if tool_call.type == 'code_interpreter':
                            print(f"Executing code:\n{tool_call.code_interpreter.input}")
    if run.status != 'completed':
        raise RuntimeError(f"Run ended with status {run.status}. Error: {run.last_error}")
    # Retrieve the assistant's latest message after the run
    messages = client.beta.threads.messages.list(thread_id=thread_id, order='desc', limit=1)
    last_message = messages.data[0]

    result = {"text": "", "images": [], "code_executed": []}
    for content_block in last_message.content:
        if content_block.type == 'text':
            result["text"] = content_block.text.value
        elif content_block.type == 'image_file':
            # Download the generated graph image
            file_response = client.files.content(content_block.image_file.file_id)
            result["images"].append(file_response.read())

    # Also extract the code that was executed (for showing in the UI)
    run_steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run.id)
    for step in run_steps.data:
        if step.type == 'tool_calls':
            for tool_call in step.step_details.tool_calls:
                if hasattr(tool_call, 'code_interpreter'):
                    result["code_executed"].append({
                        "input": tool_call.code_interpreter.input,
                        "output": tool_call.code_interpreter.outputs,
                    })
    return result
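The fixed 0.5 s poll above works, but a helper with capped exponential backoff is gentler on rate limits when runs take longer. A sketch (`poll_until_terminal` and the stubbed `retrieve` callable are illustrative, not part of the SDK):

```python
import time

def poll_until_terminal(retrieve, terminal=('completed', 'failed', 'cancelled', 'expired'),
                        interval=0.5, max_interval=4.0, timeout=120.0):
    """Call retrieve() until it returns a terminal status, with capped exponential backoff."""
    deadline = time.monotonic() + timeout
    while True:
        status = retrieve()
        if status in terminal:
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(f"run still {status!r} after {timeout}s")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # 0.5s → 1s → 2s → 4s → 4s …

# Stubbed status sequence; real code would return the Run's .status from runs.retrieve()
statuses = iter(['queued', 'in_progress', 'in_progress', 'completed'])
print(poll_until_terminal(lambda: next(statuses), interval=0.0))  # completed
```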
# Example usage:
thread_id = start_session()
# Algebra
resp = ask_math_question(thread_id, "Find all roots of: 3x³ - 5x² + 2x - 8 = 0")
print(resp["text"]) # LaTeX-formatted steps + exact roots from SymPy
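Answers backed by executed code are also cheap to verify on your side. For this cubic, x = 2 happens to be the only real root, which one substitution confirms:

```python
def p(x):
    # p(x) = 3x³ - 5x² + 2x - 8
    return 3 * x**3 - 5 * x**2 + 2 * x - 8

print(p(2))  # 0, so x = 2 is a root; the remaining factor 3x² + x + 4 has no real roots
```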
# Calculus
resp = ask_math_question(thread_id, "Find the integral of e^(-x²) from -∞ to +∞")
print(resp["text"]) # Should recognize this is √π (Gaussian integral)
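The Gaussian integral ∫ e^(−x²) dx = √π is exactly the kind of answer worth double-checking numerically; a midpoint Riemann sum over [−8, 8] (the tails beyond are negligible) agrees to high precision:

```python
import math

# Midpoint Riemann sum for ∫ e^(−x²) dx over [−8, 8]
n = 100_000
a, b = -8.0, 8.0
h = (b - a) / n
total = sum(math.exp(-((a + (i + 0.5) * h) ** 2)) for i in range(n)) * h
print(abs(total - math.sqrt(math.pi)) < 1e-6)  # True
```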
# Plotting
resp = ask_math_question(thread_id, "Plot the phase portrait of the Lorenz attractor")
if resp["images"]:
    with open("lorenz.png", "wb") as f:
        f.write(resp["images"][0])  # Save the generated matplotlib figure
4. Frontend: LaTeX Rendering
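One glue detail between the backend and the component below: `ask_math_question` returns raw PNG bytes, while the `<img>` tag builds its data URL from base64 text, so encode before sending. A minimal sketch (`png_bytes_to_b64` is an illustrative helper name):

```python
import base64

def png_bytes_to_b64(data: bytes) -> str:
    # Raw base64 only; the React component adds the "data:image/png;base64," prefix itself
    return base64.b64encode(data).decode("ascii")

# PNG files start with the 8-byte signature \x89PNG\r\n\x1a\n
print(png_bytes_to_b64(b"\x89PNG\r\n\x1a\n"))  # iVBORw0KGgo=
```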
npm install react-latex-next katex
// components/MathMessage.tsx
import 'katex/dist/katex.min.css';
import Latex from 'react-latex-next';
interface MathMessageProps {
  text: string;
  images?: string[]; // Base64-encoded PNG data (without the data-URL prefix)
  codeBlocks?: { input: string; output?: { type: string; logs?: string }[] }[];
}
export function MathMessage({ text, images, codeBlocks }: MathMessageProps) {
  return (
    <div style={{ padding: '1rem' }}>
      {/* Text with LaTeX rendering — handles $inline$ and $$block$$ */}
      <div style={{ lineHeight: '1.8', fontSize: '1rem' }}>
        <Latex>{text}</Latex>
      </div>
      {/* Show executed code blocks (educational — students see what Python ran) */}
      {codeBlocks?.map((block, i) => (
        <details key={i} style={{ marginTop: '1rem', cursor: 'pointer' }}>
          <summary style={{ color: 'var(--text-secondary)', fontSize: '0.85rem' }}>
            View Python code executed
          </summary>
          <pre style={{ background: '#000', padding: '1rem', borderRadius: '8px', fontSize: '0.8rem', color: '#0f0', marginTop: '0.5rem' }}>
            {block.input}
          </pre>
          {block.output?.map((out, j) => out.type === 'logs' && (
            <div key={j} style={{ fontFamily: 'monospace', fontSize: '0.8rem', color: '#22c55e', padding: '0.5rem', background: '#0a0a0a', borderRadius: '4px' }}>
              Output: {out.logs}
            </div>
          ))}
        </details>
      ))}
      {/* Rendered graphs */}
      {images?.map((imgData, i) => (
        <img
          key={i}
          src={`data:image/png;base64,${imgData}`}
          alt="Generated math graph"
          style={{ maxWidth: '100%', borderRadius: '8px', marginTop: '1rem' }}
        />
      ))}
    </div>
  );
}
Frequently Asked Questions
How much does it cost per tutoring session with the Assistants API?
Costs come from three sources: (1) Storage: plain message threads are free; File Search vector stores cost $0.10/GB/day, negligible unless you attach large files. (2) Code Interpreter: $0.03 per session, where a session is a sandbox that stays active for up to an hour per thread. It is not billed per code execution or per Run, so a 10-question conversation completed within an hour typically incurs a single $0.03 charge. (3) Tokens: GPT-4o at $2.50/1M input and $10/1M output. A typical tutoring session (10 back-and-forth questions averaging 500 tokens each, roughly 4,000 input and 1,000 output tokens) costs about $0.02 in tokens, for a total of roughly $0.05 per session. Note that input tokens grow as the thread history is re-sent on every Run; the max_prompt_tokens cap above keeps that bounded. At these rates, charging $5-10 for unlimited sessions within a subject area is economically viable.
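The cost arithmetic is easy to script so you can re-run it as pricing or usage assumptions change (token rates below are GPT-4o list prices at the time of writing; the number of Code Interpreter sessions is a usage assumption you supply):

```python
def session_cost_usd(input_tokens: int, output_tokens: int, ci_sessions: int = 1,
                     in_rate: float = 2.50, out_rate: float = 10.00) -> float:
    # Token rates are $ per 1M tokens; Code Interpreter is $0.03 per session
    tokens = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return tokens + ci_sessions * 0.03

# A 10-question tutoring session: ~4,000 input + ~1,000 output tokens
print(round(session_cost_usd(4_000, 1_000), 3))  # 0.05
```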
Can I use Assistants with streaming to show the answer character by character?
Yes — use client.beta.threads.runs.create_and_stream() with an EventHandler. This provides real-time streaming of text tokens and tool call progress. The streaming API emits events: on_text_delta for each text token, on_tool_call_created when code starts executing, on_tool_call_delta as code is written, and on_image_file_done when an image is ready. For the math tutor UI, show the code being written character-by-character as a typing animation, then show the execution output, then stream the explanation — this creates a compelling "watch the AI think" experience that students find educational and engaging.
"Don't compute. Orchestrate."
Learn more about Assistants API v2 →
Vivek
AI Engineer
Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.