nanochat: Building ChatGPT for $100, from Andrej Karpathy

nanochat: "The best ChatGPT that $100 can buy"#

In a world where AI is advancing at breakneck speed, training a Large Language Model (LLM) has usually been seen as the privilege of large tech corporations with multi-million-dollar budgets. nanochat, the latest project from Andrej Karpathy, challenges this paradigm by showing that you can build a complete ChatGPT-style clone for roughly $100.

What is nanochat?#

nanochat is a full-stack implementation of a ChatGPT-like LLM in a single clean, minimal, hackable, dependency-lite codebase. Instead of relying on complex frameworks, nanochat is designed to run on a single 8XH100 node via a simple script, covering the entire pipeline end to end.

The project includes:

  • Tokenization - Text preprocessing and encoding
  • Pretraining - Base language model training
  • Finetuning - Task-specific adaptation
  • Evaluation - Performance assessment
  • Inference - Model serving
  • Web UI - ChatGPT-like interface

Key Features#

🚀 End-to-End Pipeline#

nanochat provides the complete pipeline in a single codebase:

Tokenization with rustbpe#

  • BPE Tokenizer: Efficient tokenization implementation
  • Rust Performance: Fast processing with a Rust backend
  • Custom Vocabulary: Tokenizer tailored to the training data
  • Cross-language Support: Python bindings for the Rust tokenizer

Training Pipeline#

  • Base Training: Pretraining from scratch
  • Mid Training: Continued pretraining
  • Supervised Fine-tuning (SFT): Task adaptation
  • Reinforcement Learning (RL): Human preference alignment

💰 Cost-Effective Training#

nanochat's breakthrough achievement:

$100 Tier Model#

  • Training Time: ~4 hours on an 8XH100 node
  • Compute Cost: $24/hour × 4 hours ≈ $96
  • Model Capability: ~4e19 FLOPs of training compute
  • Performance: Kindergartener-level intelligence

Scaling Options#

Terminal window
# $100 tier (default)
bash speedrun.sh
# $300 tier (~12 hours, GPT-2 grade)
# Requires depth=26, more data shards
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26
# $1000 tier (~41.6 hours)
# Production-grade model

🔧 Hackable Architecture#

Designed for accessibility and customization:

Minimal Dependencies#

  • Pure PyTorch: Vanilla implementation
  • No Frameworks: No dependency on heavy frameworks
  • Readable Code: ~8K lines in 45 files
  • Single Codebase: Everything in one repository

Easy Customization#

# Model architecture in nanochat/model.py
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        # Simple, hackable architecture

# Training loop in scripts/base_train.py
for step in range(max_steps):
    # Clean training logic
    loss = model(x, y)
    loss.backward()

How to Use nanochat#

Quick Start - Speedrun Script#

1. Setup Environment#

Terminal window
# Boot up 8XH100 GPU node (Lambda, AWS, etc.)
# Recommended: $24/hour spot instances
# Clone repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat

2. Run Complete Training#

Terminal window
# Full pipeline in ~4 hours
bash speedrun.sh
# Or run inside a screen session to monitor and keep a log
screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh

3. Serve Your LLM#

Terminal window
# Activate environment
source .venv/bin/activate
# Start web server
python -m scripts.chat_web
# Access ChatGPT-like UI
# http://your-server-ip:8000/

Hardware Requirements#

Minimum Configuration#

# 8XH100 (Recommended)
gpu: 8x H100 80GB
memory: 2TB RAM
storage: 1TB NVMe SSD
# Alternative: 8XA100 (slower)
gpu: 8x A100 80GB
memory: 1TB RAM
storage: 500GB SSD

Single GPU Setup#

Terminal window
# Omit torchrun for single GPU
python -m scripts.base_train
# Automatically switches to gradient accumulation
# Same results, 8x longer training time
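
To make the gradient-accumulation trade-off concrete, here is a minimal, self-contained sketch (a toy model and random data, not nanochat's actual training loop) of how gradients from several micro-batches are summed before a single optimizer step, which is what lets a single GPU reproduce the 8-GPU effective batch size at roughly 8x the wall-clock time.

import torch
import torch.nn as nn

# Toy stand-ins for nanochat's GPT model and dataloader
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(64, 16), torch.randn(64, 1)

grad_accum_steps = 8                      # stands in for the 8 GPUs of the full node
micro = x.size(0) // grad_accum_steps     # micro-batch size per accumulation step

optimizer.zero_grad()
for i in range(grad_accum_steps):
    xb = x[i * micro:(i + 1) * micro]
    yb = y[i * micro:(i + 1) * micro]
    loss = nn.functional.mse_loss(model(xb), yb)
    (loss / grad_accum_steps).backward()  # gradients accumulate across micro-batches
optimizer.step()                          # one update over the full effective batch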

Architecture and Implementation#

🏗️ Core Components#

Model Architecture#

# GPT-style transformer in nanochat/model.py
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        # Embedding layers
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        # Transformer blocks
        self.blocks = nn.ModuleList([Block(config) for _ in range(config.n_layer)])
        # Output head
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
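
The Block modules referenced above are not shown in this excerpt; the sketch below is a generic pre-norm transformer block (LayerNorm, causal self-attention, and an MLP, each with a residual connection) to illustrate what they typically contain. The exact block in nanochat/model.py may differ in details such as normalization placement or positional encoding.

import torch
import torch.nn as nn
from types import SimpleNamespace

class Block(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = nn.MultiheadAttention(config.n_embd, config.n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        # Causal self-attention with a residual connection...
        a = self.ln_1(x)
        t = x.size(1)
        causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        attn_out, _ = self.attn(a, a, a, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        # ...followed by a position-wise MLP with a residual connection
        x = x + self.mlp(self.ln_2(x))
        return x

# Quick shape check with a tiny config
cfg = SimpleNamespace(n_embd=64, n_head=4)
print(Block(cfg)(torch.randn(2, 8, cfg.n_embd)).shape)  # torch.Size([2, 8, 64])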

Training Stages#

# Base training - Language modeling
python -m scripts.base_train
# Mid training - Continued pretraining
python -m scripts.mid_train
# Supervised fine-tuning
python -m scripts.sft_train
# Reinforcement learning (optional)
python -m scripts.rl_train

📊 Data Pipeline#

Dataset Preparation#

# Download and prepare data
python -m nanochat.dataset -n 180 # 180 shards for $100 tier
# Automatic data processing:
# 1. Download FineWeb dataset
# 2. Tokenize with the custom BPE
# 3. Create training shards
# 4. Shuffle and batch data
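
As an illustration of steps 3 and 4, here is a minimal sketch of turning a tokenized shard into fixed-length next-token-prediction batches. The on-disk uint16 format and the batches_from_shard helper are assumptions made for the example, not nanochat's actual dataloader.

import numpy as np
import torch

def batches_from_shard(shard_path, batch_size, seq_len):
    # Assumed format: a flat file of uint16 token ids produced by the tokenizer
    tokens = np.fromfile(shard_path, dtype=np.uint16)
    n = (len(tokens) - 1) // (batch_size * seq_len)
    for i in range(n):
        chunk = tokens[i * batch_size * seq_len:(i + 1) * batch_size * seq_len + 1]
        x = torch.from_numpy(chunk[:-1].astype(np.int64)).view(batch_size, seq_len)
        y = torch.from_numpy(chunk[1:].astype(np.int64)).view(batch_size, seq_len)
        yield x, y  # y is x shifted by one token (next-token prediction targets)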

Tokenizer Implementation#

// rustbpe - high-performance tokenizer (interface sketch)
pub struct Tokenizer {
    // learned merge rules and vocabulary
}

impl Tokenizer {
    pub fn encode(&self, text: &str) -> Vec<u32> {
        // Efficient BPE encoding: apply learned merges to the input bytes
        todo!()
    }

    pub fn decode(&self, tokens: &[u32]) -> String {
        // Fast decoding with caching: map token ids back to text
        todo!()
    }
}
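
For readers who want to see the algorithm the Rust crate implements, here is a minimal pure-Python illustration of the BPE training loop: repeatedly find the most frequent adjacent token pair and merge it into a new token. This is an algorithm sketch only and does not use the rustbpe API.

from collections import Counter

def train_bpe(text: str, num_merges: int):
    # Start from raw bytes; each token is initially one byte value (0-255)
    tokens = list(text.encode("utf-8"))
    merges = {}      # (left, right) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges[pair] = next_id
        # Replace every occurrence of the pair with the new token id
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        next_id += 1
    return merges

print(train_bpe("low lower lowest", num_merges=5))  # learned merge rules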

Performance Benchmarks#

📈 Model Performance#

nanochat includes a comprehensive evaluation suite:

Core Benchmarks#

Terminal window
# Evaluation metrics included
CORE: 0.2219 # Common sense reasoning
ARC-Challenge: 0.2875 # Abstract reasoning
ARC-Easy: 0.3561 # Basic reasoning
GSM8K: 0.0250 # Math word problems
HumanEval: 0.0671 # Code generation
MMLU: 0.3111 # General knowledge
ChatCORE: 0.0730 # Conversational ability

Scaling Laws#

| Tier  | Cost | Time | Depth | Parameters | Performance      |
|-------|------|------|-------|------------|------------------|
| $100  | $96  | 4h   | 12    | ~100M      | Kindergartener   |
| $300  | $288 | 12h  | 26    | ~300M      | GPT-2 level      |
| $1000 | $998 | 42h  | 36    | ~1B        | Decent assistant |

⚡ Training Efficiency#

Compute Optimization#

# Automatic mixed precision
@torch.amp.autocast(device_type='cuda', dtype=torch.bfloat16)
def forward(self, x, targets=None):
    # Efficient forward pass
    ...

# Gradient accumulation for memory efficiency
effective_batch_size = device_batch_size * gradient_accumulation_steps * world_size
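
A quick worked example of the effective batch size formula, using the default --device_batch_size=32 from the next subsection, the 8 GPUs of the node, and an assumed gradient_accumulation_steps of 1:

device_batch_size = 32             # H100 80GB default (see below)
gradient_accumulation_steps = 1    # assumption for illustration
world_size = 8                     # 8XH100 node

effective_batch_size = device_batch_size * gradient_accumulation_steps * world_size
print(effective_batch_size)        # 256 sequences per optimizer step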

Memory Management#

Terminal window
# Tune batch size for the available VRAM
--device_batch_size=32 # H100 80GB (default)
--device_batch_size=16 # For deeper models
--device_batch_size=8 # V100 32GB
--device_batch_size=4 # RTX 4090 24GB

Advanced Features#

🎯 Custom Training Recipes#

Hyperparameter Tuning#

# Configurable training parameters
config = {
    'learning_rate': 6e-4,
    'batch_size': 32,
    'sequence_length': 1024,
    'warmup_steps': 2000,
    'weight_decay': 0.1,
    'beta1': 0.9,
    'beta2': 0.95,
}
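
The betas and weight decay above suggest an AdamW-style optimizer; the snippet below shows one plausible way to wire the config dict above into PyTorch, with a linear warmup schedule. This is an assumed setup for illustration, not necessarily nanochat's exact optimizer code.

import torch

model = torch.nn.Linear(8, 8)      # stands in for the GPT model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=config['learning_rate'],
    betas=(config['beta1'], config['beta2']),
    weight_decay=config['weight_decay'],
)
# Linear warmup over the first `warmup_steps` optimizer steps
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=config['warmup_steps']
)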

Curriculum Learning#

# Progressive training stages
stages = [
    {'name': 'base', 'data': 'fineweb', 'steps': 50000},
    {'name': 'mid', 'data': 'fineweb', 'steps': 5000},
    {'name': 'sft', 'data': 'smoltalk', 'steps': 3000},
    {'name': 'rl', 'data': 'preferences', 'steps': 1000},
]
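
A tiny driver sketch of how the stages list above could map onto the training scripts from the earlier "Training Stages" section. This is illustrative only; speedrun.sh is the actual orchestrator.

import subprocess

# `stages` is the list defined above; each name matches a scripts.<name>_train module
for stage in stages:
    print(f"Stage {stage['name']}: {stage['steps']} steps on {stage['data']}")
    subprocess.run(["python", "-m", f"scripts.{stage['name']}_train"], check=True)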

🔍 Evaluation Framework#

Comprehensive testing suite:

Automated Benchmarks#

# Built-in evaluation tasks
tasks = [
    'core',           # Common sense
    'arc_challenge',  # Abstract reasoning
    'arc_easy',       # Basic reasoning
    'gsm8k',          # Math problems
    'humaneval',      # Code generation
    'mmlu',           # Knowledge
    'chatcore',       # Conversation
]

Custom Evaluation#

# Add your own evaluation tasks
class CustomTask(Task):
    def evaluate(self, model, tokenizer):
        # Custom evaluation logic
        return accuracy_score

🌐 Web Interface#

ChatGPT-like web UI:

Features#

  • Real-time Chat: Interactive conversation interface
  • Streaming Responses: Token-by-token generation
  • Multiple Conversations: Session management
  • Export Functionality: Save conversations
  • Mobile Responsive: Works on all devices

Customization#

templates/chat.html
<div class="chat-container">
  <div class="messages" id="messages"></div>
  <div class="input-container">
    <textarea id="user-input" placeholder="Ask me anything..."></textarea>
    <button id="send-btn">Send</button>
  </div>
</div>

Development and Contribution#

🛠️ Development Setup#

Local Development#

Terminal window
# Setup development environment
git clone https://github.com/karpathy/nanochat.git
cd nanochat
# Install dependencies with uv
uv venv
source .venv/bin/activate
uv pip install -e .

Testing Framework#

Terminal window
# Run tests
python -m pytest tests/test_rustbpe.py -v -s
# Test tokenizer performance
python tests/test_tokenizer_speed.py
# Validate model architecture
python tests/test_model.py

📚 Code Analysis#

nanochat is designed to be easy to understand and modify:

Code Statistics#

Terminal window
# Generated report
Characters: 333,989
Lines: 8,304
Files: 44
Tokens (approx): 83,497
Dependencies (uv.lock lines): 2,004

AI-Assisted Analysis#

Terminal window
# Package entire codebase for LLM analysis
files-to-prompt . -e py -e md -e rs -e html -e toml -e sh \
--ignore "*target*" --cxml > packaged.txt
# Use with ChatGPT, Claude, etc. for questions
# Or visit: deepwiki.com/karpathy/nanochat

Educational Value#

🎓 Learning Path#

nanochat serves as the capstone project for the LLM101n course:

Concepts Covered#

  • Transformer Architecture: Attention mechanisms, positional encoding (see the attention sketch after this list)
  • Training Dynamics: Loss landscapes, optimization
  • Scaling Laws: Parameter count vs performance
  • Data Engineering: Tokenization, batching, sampling
  • Evaluation Metrics: Benchmarking, human evaluation
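
As a concrete starting point for the first item in that list, here is a self-contained sketch of scaled dot-product attention with a causal mask, the operation at the heart of every transformer block. This is an illustration; nanochat uses the full multi-head version inside each block.

import math
import torch

def attention(q, k, v):
    # q, k, v: (batch, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # query-key similarities
    # Causal mask: a position may only attend to itself and earlier positions
    t = q.size(1)
    mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float('-inf'))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                         # weighted sum of values

out = attention(torch.randn(2, 5, 16), torch.randn(2, 5, 16), torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])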

Hands-on Experience#

# Students can modify every aspect:

# 1. Change the model architecture
class CustomGPT(GPT):
    def __init__(self, config):
        super().__init__(config)
        # Custom modifications

# 2. Implement new training techniques
def custom_training_loop():
    # Novel optimization strategies
    ...

# 3. Add new evaluation tasks
def custom_benchmark():
    # Domain-specific evaluation
    ...

📖 Research Applications#

Academic Use Cases#

  • Scaling Law Research: Study parameter vs performance relationships
  • Training Efficiency: Optimize compute utilization
  • Architecture Exploration: Test new transformer variants
  • Data Quality Impact: Analyze dataset effects on performance

Industry Applications#

  • Proof of Concepts: Rapid prototyping for LLM applications
  • Custom Domain Models: Train specialized models
  • Cost Analysis: Budget planning for LLM projects
  • Benchmarking: Compare with existing solutions

Community and Impact#

📊 Project Statistics#

  • 11.3k GitHub stars - Massive community interest
  • 🔄 1k forks - Active experimentation
  • 👨‍💻 3 contributors - Focused development team
  • 📅 Recent Release - Launched October 2025

🌟 Community Impact#

Democratizing AI#

  • Accessibility: Makes LLM training accessible to individuals
  • Education: Teaches LLM concepts through hands-on experience
  • Research: Enables academic research without massive budgets
  • Innovation: Allows experimentation with novel approaches

Industry Influence#

Terminal window
# Before nanochat: "LLMs require millions of dollars"
# After nanochat: "You can build ChatGPT for $100"
# This paradigm shift enables:
- Small companies to experiment with LLMs
- Researchers to test ideas quickly
- Students to learn hands-on
- Entrepreneurs to prototype AI products

Comparison with Commercial Solutions#

nanochat vs OpenAI GPT Training#

| Aspect        | nanochat     | OpenAI GPT-3/4 |
|---------------|--------------|----------------|
| Cost          | $100-1000    | $10M+          |
| Time          | 4-42 hours   | Months         |
| Accessibility | Open source  | Closed         |
| Customization | Full control | API only       |
| Learning      | Educational  | Black box      |

nanochat vs Other Training Frameworks#

| Feature        | nanochat | Transformers | DeepSpeed  |
|----------------|----------|--------------|------------|
| Complexity     | Minimal  | High         | Very High  |
| Dependencies   | Few      | Many         | Many       |
| Hackability    | Maximum  | Medium       | Low        |
| Learning Curve | Gentle   | Steep        | Very Steep |

Future Roadmap#

🔮 Planned Improvements#

Technical Enhancements#

  • Longer Contexts: Support for longer sequence lengths
  • Multimodal: Vision and audio capabilities
  • Efficiency: Further optimization for smaller budgets
  • Architectures: Exploration of new model designs

Tooling Improvements#

  • AutoML: Automated hyperparameter tuning
  • Monitoring: Better training visualization
  • Deployment: Production-ready serving options
  • Scaling: Multi-node training support

🌍 Community Goals#

Educational Impact#

  • Course Integration: LLM101n capstone project
  • Workshop Materials: Hands-on training materials
  • Documentation: Comprehensive learning guides
  • Video Tutorials: Step-by-step walkthroughs

Research Enablement#

  • Baseline Models: Standardized comparison points
  • Evaluation Suites: Comprehensive benchmarking
  • Research Templates: Starting points for new research
  • Collaboration Tools: Community contribution frameworks

Conclusion#

nanochat represents the democratization of LLM training. By demonstrating that you can build a complete ChatGPT-style clone for just $100, Andrej Karpathy has:

  • Broken down barriers: LLM training is no longer the exclusive privilege of big tech
  • Opened up education: Students can learn about LLMs hands-on
  • Encouraged research: Researchers can experiment on small budgets
  • Created opportunities: Entrepreneurs can prototype AI products

With a minimal yet powerful codebase, nanochat is not just a tool but a manifesto for accessible AI development. It proves that innovation does not require massive resources, only creativity, expertise, and the right approach.

Resources#

Quick Commands#

Terminal window
# Complete training pipeline
git clone https://github.com/karpathy/nanochat.git
cd nanochat
bash speedrun.sh
# Wait 4 hours...
# Chat with your LLM
source .venv/bin/activate
python -m scripts.chat_web
# Visit http://your-ip:8000/

This article has introduced nanochat, a breakthrough project demonstrating that AI development can be accessible to everyone. Start training your own ChatGPT clone today for just $100.
