emese / csermely
csermely
138M parameters · 8k context
Variants
csermely: float16 · Transformers · 0.2 GB
csermely-gguf: Q4_K_M / Q8_0 · llama.cpp · CPU/GPU
csermely-mlx: float16 · MLX · Apple Silicon
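
As a rough sanity check on download sizes, the weight footprint of a variant can be estimated from the parameter count and the bits stored per weight. The sketch below assumes the 137.8M parameter figure listed for the model, 16 bits per weight for float16, and 8.5 bits per weight for Q8_0 (blocks of 32 int8 values plus one 16-bit scale). It ignores file metadata and any tensors kept at higher precision, so real files will differ somewhat. `estimate_weight_bytes` is a hypothetical helper name, not part of any library.

```python
# Rough weight-size estimate; ignores metadata and mixed-precision tensors.
PARAMS = 137.8e6  # parameter count listed for the model

# Bits stored per weight (assumptions, see the paragraph above).
BITS_PER_WEIGHT = {
    "float16": 16.0,
    "Q8_0": 8.5,  # 32 int8 values + one fp16 scale = 34 bytes per 32 weights
}


def estimate_weight_bytes(params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in bytes."""
    return params * bits_per_weight / 8


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        size = estimate_weight_bytes(PARAMS, bits)
        print(f"{name}: {size / 1e6:.0f} MB ({size / 2**30:.2f} GiB)")
```

Q4_K_M is left out on purpose: it mixes several block types, so its average bits per weight depends on the tensor layout and is not a single fixed number.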
Parameters: 137.8M
Context length: 8,192 tokens (YaRN RoPE)
Architecture: LLaMA-style decoder-only transformer
Layers: 16
Attention heads: 12
Hidden dim: 768
FFN dim: 2,048 (SwiGLU)
Vocabulary: 32,000 (SentencePiece Unigram)
Normalization: RMSNorm (pre-norm)
Positional encoding: RoPE + YaRN extension
Activation: SwiGLU
Training data: ~1B tokens of Hungarian text
Precision: bfloat16 (train) / float16 (published)
License: MIT
Hugging Face: emese-tech/csermely
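
The listed dimensions can be cross-checked against the parameter count with a little arithmetic. The sketch below is not taken from the model's code: it assumes a standard LLaMA-style layout with no bias terms, full multi-head attention (12 key/value heads), three SwiGLU projection matrices per layer, two RMSNorm weight vectors per layer plus a final one, and tied input/output embeddings. None of these details are stated above. `SpecConfig`, `count_params`, and `kv_cache_bytes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecConfig:
    """Dimensions copied from the spec list above."""
    vocab: int = 32_000
    hidden: int = 768
    layers: int = 16
    heads: int = 12
    ffn: int = 2_048
    context: int = 8_192


def count_params(c: SpecConfig, tied_embeddings: bool = True) -> int:
    """Count weights under the assumptions stated in the lead-in."""
    attn = 4 * c.hidden * c.hidden   # Q, K, V, O projections (assumes full MHA)
    mlp = 3 * c.hidden * c.ffn       # SwiGLU: gate, up, down
    norms = 2 * c.hidden             # two RMSNorm weight vectors per layer
    embed = c.vocab * c.hidden
    head = 0 if tied_embeddings else embed
    final_norm = c.hidden
    return c.layers * (attn + mlp + norms) + embed + head + final_norm


def kv_cache_bytes(c: SpecConfig, tokens: int, bytes_per_value: int = 2) -> int:
    """K and V caches for one sequence, assuming full MHA and float16 values."""
    return 2 * c.layers * tokens * c.hidden * bytes_per_value


if __name__ == "__main__":
    cfg = SpecConfig()
    print("head dim:", cfg.hidden // cfg.heads)                    # 64
    print("params (tied):", count_params(cfg))                     # 137847552
    print("params (untied):", count_params(cfg, False))            # 162423552
    print("KV cache at 8k:", kv_cache_bytes(cfg, cfg.context) / 2**20, "MiB")  # 384.0
```

Under these assumptions the tied-embedding total comes to 137,847,552, which agrees with the listed 137.8M. The agreement suggests, but does not confirm, that the assumptions match the actual implementation.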