csermely
| Variants | |
| --- | --- |
| csermely | float16 · Transformers · 0.2 GB |
| csermely-gguf | Q4_K_M / Q8_0 · llama.cpp · CPU/GPU |
| csermely-mlx | float16 · MLX · Apple Silicon |
| Parameters | 137.8M |
| Context length | 8,192 tokens (YaRN RoPE) |
| Architecture | LLaMA-style decoder-only transformer |
| Layers | 16 |
| Attention heads | 12 |
| Hidden dim | 768 |
| FFN dim | 2,048 (SwiGLU) |
| Vocabulary | 32,000 (SentencePiece Unigram) |
| Normalization | RMSNorm (pre-norm) |
| Positional encoding | RoPE + YaRN extension |
| Activation | SwiGLU |
| Training data | ~1B tokens of Hungarian text |
| Precision | bfloat16 (train) / float16 (published) |
| License | MIT |
| HuggingFace | emese-tech/csermely |
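
The parameter count can be cross-checked against the listed dimensions with a little arithmetic. The sketch below is a rough estimate rather than the authors' accounting. It assumes details the table does not state: full multi-head attention (no grouped-query sharing), no bias terms, and input embeddings tied to the output head. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab, hidden, layers, ffn, tied_embeddings=True):
    """Estimate parameters of a LLaMA-style decoder-only transformer.

    Assumptions (not confirmed by the model card): full multi-head
    attention, no biases, RMSNorm with a single weight vector.
    """
    embed = vocab * hidden              # token embedding matrix
    attn = 4 * hidden * hidden          # Q, K, V and output projections
    mlp = 3 * hidden * ffn              # SwiGLU: gate, up and down projections
    norms = 2 * hidden                  # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    final_norm = hidden                 # RMSNorm before the output head
    lm_head = 0 if tied_embeddings else vocab * hidden
    return embed + layers * per_layer + final_norm + lm_head


# Dimensions taken from the table above.
total = llama_param_count(vocab=32_000, hidden=768, layers=16, ffn=2_048)
head_dim = 768 // 12                    # hidden dim divided by attention heads

print(f"{total:,} parameters (~{total / 1e6:.1f}M), head dim {head_dim}")
# prints "137,847,552 parameters (~137.8M), head dim 64"
```

Under these assumptions the estimate agrees with the listed 137.8M. With untied embeddings the same formula would add another 32,000 × 768 ≈ 24.6M parameters, so the match suggests, but does not prove, that the embeddings are tied.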