nanochat: Building ChatGPT for $100, from Andrej Karpathy

nanochat: "The best ChatGPT that $100 can buy"#

In a world where AI is advancing at breakneck speed, training a Large Language Model (LLM) has usually been seen as the privilege of large tech corporations with multi-million-dollar budgets. nanochat, the latest project from Andrej Karpathy, challenges this paradigm by showing that you can build a complete ChatGPT-style clone for roughly $100.

What is nanochat?#

nanochat is a full-stack implementation of a ChatGPT-like LLM in a single clean, minimal, hackable, dependency-lite codebase. Instead of relying on complex frameworks, nanochat is designed to run on a single 8XH100 node via a simple script, covering the entire pipeline end to end.

The project includes:

  • Tokenization - Text preprocessing and encoding
  • Pretraining - Base language model training
  • Finetuning - Task-specific adaptation
  • Evaluation - Performance assessment
  • Inference - Model serving
  • Web UI - ChatGPT-like interface

Key Features#

🚀 End-to-End Pipeline#

nanochat provides the complete pipeline in a single codebase:

Tokenization with rustbpe#

  • BPE Tokenizer: Efficient tokenization implementation
  • Rust Performance: Fast processing with a Rust backend
  • Custom Vocabulary: Tokenizer tailored to the training data
  • Cross-language Support: Python bindings for the Rust tokenizer

Training Pipeline#

  • Base Training: Pretraining from scratch
  • Mid Training: Continued pretraining
  • Supervised Fine-tuning (SFT): Task adaptation
  • Reinforcement Learning (RL): Human preference alignment

💰 Cost-Effective Training#

nanochat's breakthrough achievement:

$100 Tier Model#

  • Training Time: ~4 hours on an 8XH100 node
  • Compute Cost: $24/hour × 4 hours ≈ $96
  • Model Capability: ~4e19 FLOPs of training compute
  • Performance: Kindergartener-level intelligence

Scaling Options#

Terminal window
# $100 tier (default)
bash speedrun.sh
# $300 tier (~12 hours, GPT-2 grade)
# Requires depth=26, more data shards
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26
# $1000 tier (~41.6 hours)
# Production-grade model

🔧 Hackable Architecture#

Designed for accessibility and customization:

Minimal Dependencies#

  • Pure PyTorch: Vanilla implementation
  • No Frameworks: No dependency on heavy frameworks
  • Readable Code: ~8K lines in 45 files
  • Single Codebase: Everything in one repository

Easy Customization#

# Model architecture in nanochat/model.py
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        # Simple, hackable architecture

# Training loop in scripts/base_train.py
for step in range(max_steps):
    # Clean training logic
    loss = model(x, y)
    loss.backward()

How to Use nanochat#

Quick Start - Speedrun Script#

1. Setup Environment#

Terminal window
# Boot up 8XH100 GPU node (Lambda, AWS, etc.)
# Recommended: $24/hour spot instances
# Clone repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat

2. Run Complete Training#

Terminal window
# Full pipeline in ~4 hours
bash speedrun.sh
# Or run inside a screen session to monitor and keep a log
screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh

3. Serve Your LLM#

Terminal window
# Activate environment
source .venv/bin/activate
# Start web server
python -m scripts.chat_web
# Access ChatGPT-like UI
# http://your-server-ip:8000/

Hardware Requirements#

Minimum Configuration#

# 8XH100 (Recommended)
gpu: 8x H100 80GB
memory: 2TB RAM
storage: 1TB NVMe SSD
# Alternative: 8XA100 (slower)
gpu: 8x A100 80GB
memory: 1TB RAM
storage: 500GB SSD

Single GPU Setup#

Terminal window
# Omit torchrun for single GPU
python -m scripts.base_train
# Automatically switches to gradient accumulation
# Same results, 8x longer training time
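
To make the gradient-accumulation trade-off concrete, here is a minimal, self-contained sketch (a toy model and random data, not nanochat's actual training loop) of how gradients from several micro-batches are summed before a single optimizer step, which is what lets a single GPU reproduce the 8-GPU effective batch size at roughly 8x the wall-clock time.

import torch
import torch.nn as nn

# Toy stand-ins for nanochat's GPT model and dataloader
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(64, 16), torch.randn(64, 1)

grad_accum_steps = 8                      # stands in for the 8 GPUs of the full node
micro = x.size(0) // grad_accum_steps     # micro-batch size per accumulation step

optimizer.zero_grad()
for i in range(grad_accum_steps):
    xb = x[i * micro:(i + 1) * micro]
    yb = y[i * micro:(i + 1) * micro]
    loss = nn.functional.mse_loss(model(xb), yb)
    (loss / grad_accum_steps).backward()  # gradients accumulate across micro-batches
optimizer.step()                          # one update over the full effective batch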

Architecture and Implementation#

🏗️ Core Components#

Model Architecture#

# GPT-style transformer in nanochat/model.py
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        # Embedding layers
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        # Transformer blocks
        self.blocks = nn.ModuleList([Block(config) for _ in range(config.n_layer)])
        # Output head
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
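
The Block modules referenced above are not shown in this excerpt; the sketch below is a generic pre-norm transformer block (LayerNorm, causal self-attention, and an MLP, each with a residual connection) to illustrate what they typically contain. The exact block in nanochat/model.py may differ in details such as normalization placement or positional encoding.

import torch
import torch.nn as nn
from types import SimpleNamespace

class Block(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = nn.MultiheadAttention(config.n_embd, config.n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        # Causal self-attention with a residual connection...
        a = self.ln_1(x)
        t = x.size(1)
        causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        attn_out, _ = self.attn(a, a, a, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        # ...followed by a position-wise MLP with a residual connection
        x = x + self.mlp(self.ln_2(x))
        return x

# Quick shape check with a tiny config
cfg = SimpleNamespace(n_embd=64, n_head=4)
print(Block(cfg)(torch.randn(2, 8, cfg.n_embd)).shape)  # torch.Size([2, 8, 64])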

Training Stages#

# Base training - Language modeling
python -m scripts.base_train
# Mid training - Continued pretraining
python -m scripts.mid_train
# Supervised fine-tuning
python -m scripts.sft_train
# Reinforcement learning (optional)
python -m scripts.rl_train

📊 Data Pipeline#

Dataset Preparation#

# Download and prepare data
python -m nanochat.dataset -n 180 # 180 shards for $100 tier
# Automatic data processing:
# 1. Download FineWeb dataset
# 2. Tokenize with the custom BPE
# 3. Create training shards
# 4. Shuffle and batch data
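
As an illustration of steps 3 and 4, here is a minimal sketch of turning a tokenized shard into fixed-length next-token-prediction batches. The on-disk uint16 format and the batches_from_shard helper are assumptions made for the example, not nanochat's actual dataloader.

import numpy as np
import torch

def batches_from_shard(shard_path, batch_size, seq_len):
    # Assumed format: a flat file of uint16 token ids produced by the tokenizer
    tokens = np.fromfile(shard_path, dtype=np.uint16)
    n = (len(tokens) - 1) // (batch_size * seq_len)
    for i in range(n):
        chunk = tokens[i * batch_size * seq_len:(i + 1) * batch_size * seq_len + 1]
        x = torch.from_numpy(chunk[:-1].astype(np.int64)).view(batch_size, seq_len)
        y = torch.from_numpy(chunk[1:].astype(np.int64)).view(batch_size, seq_len)
        yield x, y  # y is x shifted by one token (next-token prediction targets)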

Tokenizer Implementation#

// rustbpe - high-performance tokenizer (interface sketch)
pub struct Tokenizer {
    // learned merge rules and vocabulary
}

impl Tokenizer {
    pub fn encode(&self, text: &str) -> Vec<u32> {
        // Efficient BPE encoding: apply learned merges to the input bytes
        todo!()
    }

    pub fn decode(&self, tokens: &[u32]) -> String {
        // Fast decoding with caching: map token ids back to text
        todo!()
    }
}
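
For readers who want to see the algorithm the Rust crate implements, here is a minimal pure-Python illustration of the BPE training loop: repeatedly find the most frequent adjacent token pair and merge it into a new token. This is an algorithm sketch only and does not use the rustbpe API.

from collections import Counter

def train_bpe(text: str, num_merges: int):
    # Start from raw bytes; each token is initially one byte value (0-255)
    tokens = list(text.encode("utf-8"))
    merges = {}      # (left, right) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges[pair] = next_id
        # Replace every occurrence of the pair with the new token id
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        next_id += 1
    return merges

print(train_bpe("low lower lowest", num_merges=5))  # learned merge rules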

Performance Benchmarks#

📈 Model Performance#

nanochat includes a comprehensive evaluation suite:

Core Benchmarks#

Terminal window
# Evaluation metrics included
CORE: 0.2219 # Common sense reasoning
ARC-Challenge: 0.2875 # Abstract reasoning
ARC-Easy: 0.3561 # Basic reasoning
GSM8K: 0.0250 # Math word problems
HumanEval: 0.0671 # Code generation
MMLU: 0.3111 # General knowledge
ChatCORE: 0.0730 # Conversational ability

Scaling Laws#

| Tier  | Cost | Time | Depth | Parameters | Performance      |
|-------|------|------|-------|------------|------------------|
| $100  | $96  | 4h   | 12    | ~100M      | Kindergartener   |
| $300  | $288 | 12h  | 26    | ~300M      | GPT-2 level      |
| $1000 | $998 | 42h  | 36    | ~1B        | Decent assistant |

⚡ Training Efficiency#

Compute Optimization#

# Automatic mixed precision
@torch.amp.autocast(device_type='cuda', dtype=torch.bfloat16)
def forward(self, x, targets=None):
    # Efficient forward pass
    ...

# Gradient accumulation for memory efficiency
effective_batch_size = device_batch_size * gradient_accumulation_steps * world_size
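
A quick worked example of the effective batch size formula, using the default --device_batch_size=32 from the next subsection, the 8 GPUs of the node, and an assumed gradient_accumulation_steps of 1:

device_batch_size = 32             # H100 80GB default (see below)
gradient_accumulation_steps = 1    # assumption for illustration
world_size = 8                     # 8XH100 node

effective_batch_size = device_batch_size * gradient_accumulation_steps * world_size
print(effective_batch_size)        # 256 sequences per optimizer step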

Memory Management#

Terminal window
# Tune batch size for the available VRAM
--device_batch_size=32 # H100 80GB (default)
--device_batch_size=16 # For deeper models
--device_batch_size=8 # V100 32GB
--device_batch_size=4 # RTX 4090 24GB

Advanced Features#

🎯 Custom Training Recipes#

Hyperparameter Tuning#

# Configurable training parameters
config = {
    'learning_rate': 6e-4,
    'batch_size': 32,
    'sequence_length': 1024,
    'warmup_steps': 2000,
    'weight_decay': 0.1,
    'beta1': 0.9,
    'beta2': 0.95,
}
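
The betas and weight decay above suggest an AdamW-style optimizer; the snippet below shows one plausible way to wire the config dict above into PyTorch, with a linear warmup schedule. This is an assumed setup for illustration, not necessarily nanochat's exact optimizer code.

import torch

model = torch.nn.Linear(8, 8)      # stands in for the GPT model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=config['learning_rate'],
    betas=(config['beta1'], config['beta2']),
    weight_decay=config['weight_decay'],
)
# Linear warmup over the first `warmup_steps` optimizer steps
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=config['warmup_steps']
)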

Curriculum Learning#

# Progressive training stages
stages = [
    {'name': 'base', 'data': 'fineweb', 'steps': 50000},
    {'name': 'mid', 'data': 'fineweb', 'steps': 5000},
    {'name': 'sft', 'data': 'smoltalk', 'steps': 3000},
    {'name': 'rl', 'data': 'preferences', 'steps': 1000},
]
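
A tiny driver sketch of how the stages list above could map onto the training scripts from the earlier "Training Stages" section. This is illustrative only; speedrun.sh is the actual orchestrator.

import subprocess

# `stages` is the list defined above; each name matches a scripts.<name>_train module
for stage in stages:
    print(f"Stage {stage['name']}: {stage['steps']} steps on {stage['data']}")
    subprocess.run(["python", "-m", f"scripts.{stage['name']}_train"], check=True)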

🔍 Evaluation Framework#

Comprehensive testing suite:

Automated Benchmarks#

# Built-in evaluation tasks
tasks = [
    'core',           # Common sense
    'arc_challenge',  # Abstract reasoning
    'arc_easy',       # Basic reasoning
    'gsm8k',          # Math problems
    'humaneval',      # Code generation
    'mmlu',           # Knowledge
    'chatcore',       # Conversation
]

Custom Evaluation#

# Add your own evaluation tasks
class CustomTask(Task):
    def evaluate(self, model, tokenizer):
        # Custom evaluation logic
        return accuracy_score

🌐 Web Interface#

ChatGPT-like web UI:

Features#

  • Real-time Chat: Interactive conversation interface
  • Streaming Responses: Token-by-token generation
  • Multiple Conversations: Session management
  • Export Functionality: Save conversations
  • Mobile Responsive: Works on all devices

Customization#

templates/chat.html
<div class="chat-container">
  <div class="messages" id="messages"></div>
  <div class="input-container">
    <textarea id="user-input" placeholder="Ask me anything..."></textarea>
    <button id="send-btn">Send</button>
  </div>
</div>

Development and Contribution#

🛠️ Development Setup#

Local Development#

Terminal window
# Setup development environment
git clone https://github.com/karpathy/nanochat.git
cd nanochat
# Install dependencies with uv
uv venv
source .venv/bin/activate
uv pip install -e .

Testing Framework#

Terminal window
# Run tests
python -m pytest tests/test_rustbpe.py -v -s
# Test tokenizer performance
python tests/test_tokenizer_speed.py
# Validate model architecture
python tests/test_model.py

📚 Code Analysis#

nanochat is designed to be easy to understand and modify:

Code Statistics#

Terminal window
# Generated report
Characters: 333,989
Lines: 8,304
Files: 44
Tokens (approx): 83,497
Dependencies (uv.lock lines): 2,004

AI-Assisted Analysis#

Terminal window
# Package entire codebase for LLM analysis
files-to-prompt . -e py -e md -e rs -e html -e toml -e sh \
--ignore "*target*" --cxml > packaged.txt
# Use with ChatGPT, Claude, etc. for questions
# Or visit: deepwiki.com/karpathy/nanochat

Educational Value#

🎓 Learning Path#

nanochat serves as the capstone project for the LLM101n course:

Concepts Covered#

  • Transformer Architecture: Attention mechanisms, positional encoding (see the attention sketch after this list)
  • Training Dynamics: Loss landscapes, optimization
  • Scaling Laws: Parameter count vs performance
  • Data Engineering: Tokenization, batching, sampling
  • Evaluation Metrics: Benchmarking, human evaluation
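
As a concrete starting point for the first item in that list, here is a self-contained sketch of scaled dot-product attention with a causal mask, the operation at the heart of every transformer block. This is an illustration; nanochat uses the full multi-head version inside each block.

import math
import torch

def attention(q, k, v):
    # q, k, v: (batch, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # query-key similarities
    # Causal mask: a position may only attend to itself and earlier positions
    t = q.size(1)
    mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float('-inf'))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                         # weighted sum of values

out = attention(torch.randn(2, 5, 16), torch.randn(2, 5, 16), torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])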

Hands-on Experience#

# Students can modify every aspect:

# 1. Change the model architecture
class CustomGPT(GPT):
    def __init__(self, config):
        super().__init__(config)
        # Custom modifications

# 2. Implement new training techniques
def custom_training_loop():
    # Novel optimization strategies
    ...

# 3. Add new evaluation tasks
def custom_benchmark():
    # Domain-specific evaluation
    ...

📖 Research Applications#

Academic Use Cases#

  • Scaling Law Research: Study parameter vs performance relationships
  • Training Efficiency: Optimize compute utilization
  • Architecture Exploration: Test new transformer variants
  • Data Quality Impact: Analyze dataset effects on performance

Industry Applications#

  • Proof of Concepts: Rapid prototyping for LLM applications
  • Custom Domain Models: Train specialized models
  • Cost Analysis: Budget planning for LLM projects
  • Benchmarking: Compare with existing solutions

Community and Impact#

📊 Project Statistics#

  • 11.3k GitHub stars - Massive community interest
  • 🔄 1k forks - Active experimentation
  • 👨‍💻 3 contributors - Focused development team
  • 📅 Recent Release - Launched October 2025

🌟 Community Impact#

Democratizing AI#

  • Accessibility: Makes LLM training accessible to individuals
  • Education: Teaches LLM concepts through hands-on experience
  • Research: Enables academic research without massive budgets
  • Innovation: Allows experimentation with novel approaches

Industry Influence#

Terminal window
# Before nanochat: "LLMs require millions of dollars"
# After nanochat: "You can build ChatGPT for $100"
# This paradigm shift enables:
- Small companies to experiment with LLMs
- Researchers to test ideas quickly
- Students to learn hands-on
- Entrepreneurs to prototype AI products

Comparison with Commercial Solutions#

nanochat vs OpenAI GPT Training#

| Aspect        | nanochat     | OpenAI GPT-3/4 |
|---------------|--------------|----------------|
| Cost          | $100-1000    | $10M+          |
| Time          | 4-42 hours   | Months         |
| Accessibility | Open source  | Closed         |
| Customization | Full control | API only       |
| Learning      | Educational  | Black box      |

nanochat vs Other Training Frameworks#

| Feature        | nanochat | Transformers | DeepSpeed  |
|----------------|----------|--------------|------------|
| Complexity     | Minimal  | High         | Very High  |
| Dependencies   | Few      | Many         | Many       |
| Hackability    | Maximum  | Medium       | Low        |
| Learning Curve | Gentle   | Steep        | Very Steep |

Future Roadmap#

🔮 Planned Improvements#

Technical Enhancements#

  • Longer Contexts: Support for longer sequence lengths
  • Multimodal: Vision and audio capabilities
  • Efficiency: Further optimization for smaller budgets
  • Architectures: Exploration of new model designs

Tooling Improvements#

  • AutoML: Automated hyperparameter tuning
  • Monitoring: Better training visualization
  • Deployment: Production-ready serving options
  • Scaling: Multi-node training support

🌍 Community Goals#

Educational Impact#

  • Course Integration: LLM101n capstone project
  • Workshop Materials: Hands-on training materials
  • Documentation: Comprehensive learning guides
  • Video Tutorials: Step-by-step walkthroughs

Research Enablement#

  • Baseline Models: Standardized comparison points
  • Evaluation Suites: Comprehensive benchmarking
  • Research Templates: Starting points for new research
  • Collaboration Tools: Community contribution frameworks

Conclusion#

nanochat represents the democratization of LLM training. By demonstrating that you can build a complete ChatGPT-style clone for just $100, Andrej Karpathy has:

  • Broken down barriers: LLM training is no longer the exclusive privilege of big tech
  • Opened up education: Students can learn about LLMs hands-on
  • Encouraged research: Researchers can experiment on small budgets
  • Created opportunities: Entrepreneurs can prototype AI products

With a minimal yet powerful codebase, nanochat is not just a tool but a manifesto for accessible AI development. It proves that innovation does not require massive resources, only creativity, expertise, and the right approach.

Resources#

Quick Commands#

Terminal window
# Complete training pipeline
git clone https://github.com/karpathy/nanochat.git
cd nanochat
bash speedrun.sh
# Wait 4 hours...
# Chat with your LLM
source .venv/bin/activate
python -m scripts.chat_web
# Visit http://your-ip:8000/

This article has introduced nanochat, a breakthrough project demonstrating that AI development can be accessible to everyone. Start training your own ChatGPT clone today for just $100.
