patak
0.5B · 32k context · YaRN
| Parameters | 517M |
| Context length | 32,768 tokens (YaRN RoPE) |
| Architecture | LLaMA-style decoder-only transformer |
| Layers | 24 |
| Attention heads | 20 (MHA) |
| Hidden dim | 1,280 |
| FFN dim | 3,456 (SwiGLU) |
| Vocabulary | 32,000 (SentencePiece Unigram) |
| Normalization | RMSNorm (pre-norm) |
| Positional encoding | RoPE (θ=10000) + YaRN scale=4 |
| Activation | SwiGLU |
| Training data | ~4.5B tokens (Wikipedia + HPLT + HunSum2) |
| License | MIT |
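As a sanity check, the dimensions in the table roughly reproduce the stated parameter count and the approximate size of a 16-bit checkpoint. This is a minimal sketch, assuming no bias terms, tied input/output embeddings, and a single weight vector per RMSNorm; the helper name `count_parameters` is illustrative and not part of any released code.

```python
# Rough parameter count from the table's dimensions.
# Assumptions (not stated in the table): no bias terms, tied input/output
# embeddings, and a single weight vector per RMSNorm.
VOCAB, HIDDEN, FFN, LAYERS = 32_000, 1_280, 3_456, 24

def count_parameters(vocab=VOCAB, hidden=HIDDEN, ffn=FFN, layers=LAYERS):
    embedding = vocab * hidden        # shared with the LM head when tied
    attention = 4 * hidden * hidden   # Q, K, V and output projections (MHA)
    swiglu = 3 * hidden * ffn         # gate, up and down projections
    norms = 2 * hidden                # two RMSNorms per layer
    per_layer = attention + swiglu + norms
    return embedding + layers * per_layer + hidden  # + final RMSNorm

total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")   # 516,814,080 parameters (~517M)
print(f"~{total * 2 / 1e9:.2f} GB at 2 bytes per weight")  # ~1.03 GB
```

Under these assumptions the total lands within rounding of the 517M figure, and two bytes per weight gives roughly 1 GB.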
Architecture
┌─────────────────────────────────────────────┐
│              Patak 517M (LLM)               │
│          Hungarian · Decoder-only           │
├─────────────────────────────────────────────┤
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │        Embedding (32K → 1280)         │  │
│  │            (weight-tied ↓)            │  │
│  └───────────────────┬───────────────────┘  │
│                      │                      │
│  ┌───────────────────┼───────────────────┐  │
│  │                   ▼      ×24 layers   │  │
│  │  ┌─────────────────────────────────┐  │  │
│  │  │             RMSNorm             │  │  │
│  │  └────────────────┬────────────────┘  │  │
│  │                   ▼                   │  │
│  │  ┌─────────────────────────────────┐  │  │
│  │  │      Multi-Head Attention       │  │  │
│  │  │       20 heads · 64d/head       │  │  │
│  │  │   YaRN RoPE (θ=10K, 4× scale)   │  │  │
│  │  │           Causal mask           │  │  │
│  │  └────────────────┬────────────────┘  │  │
│  │                   │                   │  │
│  │                   ⊕ ◄── (residual)    │  │
│  │                   ▼                   │  │
│  │  ┌─────────────────────────────────┐  │  │
│  │  │             RMSNorm             │  │  │
│  │  └────────────────┬────────────────┘  │  │
│  │                   ▼                   │  │
│  │  ┌─────────────────────────────────┐  │  │
│  │  │           SwiGLU FFN            │  │  │
│  │  │    gate: 1280 → 3456 (SiLU)     │  │  │
│  │  │    up:   1280 → 3456            │  │  │
│  │  │    down: 3456 → 1280            │  │  │
│  │  └────────────────┬────────────────┘  │  │
│  │                   │                   │  │
│  │                   ⊕ ◄── (residual)    │  │
│  │                   │                   │  │
│  └───────────────────┼───────────────────┘  │
│                      ▼                      │
│  ┌───────────────────────────────────────┐  │
│  │            RMSNorm (final)            │  │
│  └───────────────────┬───────────────────┘  │
│                      ▼                      │
│  ┌───────────────────────────────────────┐  │
│  │      LM Head (1280 → 32K logits)      │  │
│  │         (tied with Embedding)         │  │
│  └───────────────────────────────────────┘  │
│                                             │
├─────────────────────────────────────────────┤
│  Params: ~517M     │  Dtype: bfloat16       │
│  Context: 2048 (→4096 via YaRN)             │
│  Tokenizer: SentencePiece Unigram 32K       │
└─────────────────────────────────────────────┘
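The feed-forward half of each layer in the diagram (RMSNorm, then SwiGLU, then the residual add) can be sketched in plain Python. This is an illustrative toy, not the model's actual implementation: the function names, the `eps` value, and the tiny 2-to-3 dimensions are assumptions chosen for readability, and a real implementation would use tensor operations at 1280-to-3456.

```python
import math

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(w, x):
    # w is a list of rows; returns w @ x
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rms_norm(x, weight, eps=1e-6):  # eps is an assumed value
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def swiglu_ffn(x, w_gate, w_up, w_down):
    # down( SiLU(gate(x)) * up(x) ), with no bias terms (assumed)
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])

def ffn_sublayer(x, norm_weight, w_gate, w_up, w_down):
    # Pre-norm residual: x + FFN(RMSNorm(x))
    h = swiglu_ffn(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [xi + hi for xi, hi in zip(x, h)]

# Toy dimensions: hidden=2, ffn=3 (the model uses 1280 and 3456).
x = [1.0, -2.0]
norm_weight = [1.0, 1.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
w_down = [[0.1, 0.2, 0.3], [0.0, 0.0, 0.0]]
y = ffn_sublayer(x, norm_weight, w_gate, w_up, w_down)
print(y)
```

Because the second row of `w_down` is all zeros in this toy setup, the second output component passes through the residual path unchanged, which makes the skip connection easy to see.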
Downloads
| patak | float16 · Transformers · 1.0 GB |
| patak-gguf | Q4_K_M / Q8_0 · llama.cpp · CPU/GPU |
| patak-mlx | bfloat16 · MLX · Apple Silicon |