Project: AI Math Tutor with Code Interpreter

Dec 30, 2025 • 30 min read

LLMs are language models, not calculators. Ask GPT-4 to multiply 8,734 × 9,127 in its head and there's a non-trivial chance it hallucinates the answer — it's predicting what the answer should look like based on training patterns, not actually computing it. For a math tutor, hallucinated calculations are catastrophic. The solution is Code Interpreter: the LLM writes Python, executes it in a sandboxed environment, and reports the exact output. Every calculation becomes a verified result.
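The arithmetic an LLM tends to fumble is trivial for a sandbox — one line of Python returns the exact product:

```python
# Exact integer arithmetic — computed, not predicted from training patterns
product = 8734 * 9127
print(product)  # → 79715218
```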

1. The "Orchestrate, Don't Compute" Pattern

❌ Standard LLM

Question: "What is the derivative of sin(x²)?"

LLM tries to do calculus in its "head" by predicting tokens → may get it right, may hallucinate. No way to verify without checking manually.

✅ Code Interpreter

LLM writes: sympy.diff(sin(x**2), x)

Executes in Python sandbox → stdout: 2*x*cos(x**2) → guaranteed correct. LLM then explains the steps.
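You can reproduce the sandbox side of this exchange locally with SymPy, one of the libraries the assistant is instructed to use:

```python
import sympy

x = sympy.symbols('x')

# Symbolic differentiation of sin(x^2) — the exact computation, not a token prediction
result = sympy.diff(sympy.sin(x**2), x)
print(result)  # → 2*x*cos(x**2)
```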

2. Creating the Persistent Assistant

# STEP 1: Create the assistant ONCE — save the ID in your .env file
# Do NOT recreate the assistant on every request — it's a persistent object
# You pay for file storage: $0.10/GB/day for File Search vector stores (first GB free)

from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

assistant = client.beta.assistants.create(
    name="Professor Pi",
    model="gpt-4o",  # Best reasoning for complex math
    instructions="""You are an expert math tutor named Professor Pi.
    
ALWAYS follow this workflow:
1. Understand what the student is asking
2. Write Python code to solve it precisely (use: sympy for algebra/calculus, numpy for numerics, matplotlib for graphs)
3. Execute the code and read the exact output
4. Explain the solution step-by-step in simple language
5. Output ALL math expressions in LaTeX format using $...$ for inline and $$...$$ for block equations

NEVER try to compute results mentally — always use code for calculations.
When plotting, save figures to /mnt/data/ so they can be retrieved.

Available Python libraries: sympy, numpy, matplotlib, scipy, pandas""",

    tools=[{"type": "code_interpreter"}],  # Enable Code Interpreter tool
    
    # Optional: attach a PDF textbook as a knowledge file
    # tool_resources={"code_interpreter": {"file_ids": ["file_abc123"]}},
)

print(f"✅ Assistant created: {assistant.id}")
print(f"   Save this to your .env: MATH_ASSISTANT_ID={assistant.id}")
# Output: MATH_ASSISTANT_ID=asst_abc123xyz456
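The `.env` file the comments refer to would look like this (placeholder values — never commit this file):

```shell
# .env — created once after the one-time assistant creation script
OPENAI_API_KEY=sk-...
MATH_ASSISTANT_ID=asst_abc123xyz456
```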

3. The Thread and Run Lifecycle

# STEP 2: Each conversation session gets a Thread (persistent message history)
# Threads survive across API calls — perfect for tutoring sessions

import time

ASSISTANT_ID = os.getenv("MATH_ASSISTANT_ID")

def start_session() -> str:
    """Create a new tutoring session (Thread). Returns thread_id."""
    thread = client.beta.threads.create()
    return thread.id

def ask_math_question(thread_id: str, question: str) -> dict:
    """
    Send a message and get a response.
    Returns: {text: str, images: list[bytes], code_executed: list[{"input": str, "output": list}]}
    """
    # Add user message to the thread
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=question
    )
    
    # Create a Run (triggers the assistant to process the thread)
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=ASSISTANT_ID,
        # Optional: override instructions for this specific run
        # instructions="Focus on visual explanations with matplotlib graphs.",
        max_prompt_tokens=16384,    # Prevent runaway costs on long conversations
        max_completion_tokens=4096,
    )
    
    # Poll until the run reaches a terminal state (typically 3-15s for math problems)
    while run.status in ('queued', 'in_progress', 'cancelling'):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)

        # Surface intermediate Run Steps for better UX (show the code being executed)
        if run.status == 'in_progress':
            steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run.id)
            for step in steps.data:
                if step.type == 'tool_calls' and step.status == 'in_progress':
                    for tool_call in step.step_details.tool_calls:
                        if tool_call.type == 'code_interpreter':
                            print(f"Executing code:\n{tool_call.code_interpreter.input}")
    
    if run.status != 'completed':
        raise RuntimeError(f"Run failed: {run.status}. Error: {run.last_error}")
    
    # Retrieve messages after the run
    messages = client.beta.threads.messages.list(thread_id=thread_id, order='desc', limit=1)
    last_message = messages.data[0]
    
    result = {"text": "", "images": [], "code_executed": []}
    
    for content_block in last_message.content:
        if content_block.type == 'text':
            result["text"] = content_block.text.value
        elif content_block.type == 'image_file':
            # Download the generated graph image
            file_response = client.files.content(content_block.image_file.file_id)
            result["images"].append(file_response.read())
    
    # Also extract the code that was executed (for showing in UI)
    run_steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run.id)
    for step in run_steps.data:
        if step.type == 'tool_calls':
            for tool_call in step.step_details.tool_calls:
                if tool_call.type == 'code_interpreter':
                    result["code_executed"].append({
                        "input": tool_call.code_interpreter.input,
                        "output": tool_call.code_interpreter.outputs,
                    })
    
    return result

# Example usage:
thread_id = start_session()

# Algebra
resp = ask_math_question(thread_id, "Find all roots of: 3x³ - 5x² + 2x - 8 = 0")
print(resp["text"])  # LaTeX-formatted steps + exact roots from SymPy

# Calculus
resp = ask_math_question(thread_id, "Find the integral of e^(-x²) from -∞ to +∞")
print(resp["text"])  # Should recognize the Gaussian integral: √π

# Plotting
resp = ask_math_question(thread_id, "Plot the phase portrait of the Lorenz attractor")
if resp["images"]:
    with open("lorenz.png", "wb") as f:
        f.write(resp["images"][0])  # Save the generated matplotlib figure
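The Gaussian integral result is easy to sanity-check locally with SymPy, the same library the assistant runs in its sandbox:

```python
import sympy

x = sympy.symbols('x')

# The classic Gaussian integral: ∫ e^(-x²) dx over (-∞, ∞) = √π
gaussian = sympy.integrate(sympy.exp(-x**2), (x, -sympy.oo, sympy.oo))
print(gaussian)  # → sqrt(pi)
```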

4. Frontend: LaTeX Rendering

npm install react-latex-next katex

// components/MathMessage.tsx
import 'katex/dist/katex.min.css';
import Latex from 'react-latex-next';

interface MathMessageProps {
    text: string;
    images?: string[];  // Base64-encoded PNG data (no "data:" prefix — added in the <img> src below)
    codeBlocks?: { input: string; output: { type: string; logs?: string }[] }[];
}

export function MathMessage({ text, images, codeBlocks }: MathMessageProps) {
    return (
        <div style={{ padding: '1rem' }}>
            {/* Text with LaTeX rendering — handles $inline$ and $$block$$ */}
            <div style={{ lineHeight: '1.8', fontSize: '1rem' }}>
                <Latex>{text}</Latex>
            </div>

            {/* Show executed code blocks (educational — students see what Python ran) */}
            {codeBlocks?.map((block, i) => (
                <details key={i} style={{ marginTop: '1rem', cursor: 'pointer' }}>
                    <summary style={{ color: 'var(--text-secondary)', fontSize: '0.85rem' }}>
                        View Python code executed
                    </summary>
                    <pre style={{ background: '#000', padding: '1rem', borderRadius: '8px', fontSize: '0.8rem', color: '#0f0', marginTop: '0.5rem' }}>
                        {block.input}
                    </pre>
                    {block.output?.map((out, j) => out.type === 'logs' && (
                        <div key={j} style={{ fontFamily: 'monospace', fontSize: '0.8rem', color: '#22c55e', padding: '0.5rem', background: '#0a0a0a', borderRadius: '4px' }}>
                            Output: {out.logs}
                        </div>
                    ))}
                </details>
            ))}

            {/* Rendered graphs */}
            {images?.map((imgData, i) => (
                <img
                    key={i}
                    src={`data:image/png;base64,${imgData}`}
                    alt="Generated math graph"
                    style={{ maxWidth: '100%', borderRadius: '8px', marginTop: '1rem' }}
                />
            ))}
        </div>
    );
}
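One glue detail between backend and frontend: `ask_math_question` returns raw PNG bytes, while this component expects base64 strings. A small helper on the API route (hypothetical name, but standard-library `base64`) covers the conversion:

```python
import base64

def png_bytes_to_base64(image_bytes: bytes) -> str:
    """Encode raw PNG bytes for the data:image/png;base64,... src used by MathMessage."""
    return base64.b64encode(image_bytes).decode("ascii")

# Every PNG file starts with the magic bytes \x89PNG — their base64 form is "iVBORw=="
print(png_bytes_to_base64(b"\x89PNG"))
```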

Frequently Asked Questions

How much does it cost per tutoring session with the Assistants API?

Costs come from three places: (1) File storage: $0.10/GB/day for File Search vector stores — zero if you attach no files, and negligible for a single PDF textbook. (2) Code Interpreter: $0.03 per session. A session is scoped to a thread and stays active for roughly an hour, so a typical tutoring conversation incurs one $0.03 fee, not one per question. (3) Tokens: GPT-4o at $2.50/1M input + $10/1M output. Note that each Run re-reads the whole thread, so input tokens compound: a 10-question session averaging 500 tokens per message can accumulate ~50,000 input tokens (≈$0.13) plus ~5,000 output tokens (≈$0.05). Total: roughly $0.20/session. At that cost structure, charging $5-10/month for unlimited sessions within a subject area is economically viable.
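A back-of-envelope estimator makes the arithmetic explicit. The rates hardcoded below are assumptions based on published GPT-4o and Code Interpreter pricing — always check the current OpenAI pricing page:

```python
def estimate_session_cost(input_tokens: int, output_tokens: int,
                          code_sessions: int = 1) -> float:
    """Rough per-session cost in USD under assumed GPT-4o + Code Interpreter rates."""
    INPUT_PER_M = 2.50    # $ per 1M input tokens (assumed GPT-4o rate)
    OUTPUT_PER_M = 10.00  # $ per 1M output tokens (assumed)
    CI_SESSION = 0.03     # $ per Code Interpreter session (assumed)
    return (input_tokens * INPUT_PER_M / 1e6
            + output_tokens * OUTPUT_PER_M / 1e6
            + code_sessions * CI_SESSION)

# A 10-question session: ~50k cumulative input tokens, ~5k output, 1 CI session
print(round(estimate_session_cost(50_000, 5_000, 1), 4))  # → 0.205
```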

Can I use Assistants with streaming to show the answer character by character?

Yes — use client.beta.threads.runs.stream() (the older create_and_stream() helper was deprecated in its favor) with an AssistantEventHandler subclass. This provides real-time streaming of text tokens and tool-call progress. The handler receives events: on_text_delta for each text chunk, on_tool_call_created when code starts executing, on_tool_call_delta as code is written, and on_image_file_done when a generated image is ready. For the math tutor UI, show the code being written character by character as a typing animation, then show the execution output, then stream the explanation — this creates a compelling "watch the AI think" experience that students find educational and engaging.

"Don't compute. Orchestrate."



Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.

GPT-4o · LangChain · Next.js · Vector DBs · RAG · Vercel AI SDK