The first production language model built in Africa. Scores every word it generates in real time. Thinks before it answers. Calls tools natively. Runs on your hardware, offline, with no per-inference cost after deployment.
Same architecture across both tiers. The split is by scale and licensing.
Architecture, training pipeline, web interface, and the 7B through 50B checkpoints. Free to deploy and modify.
Frontier-scale checkpoints, production accuracy tooling, enterprise integrations. Distributed through Alchymia Groom.
Every frontier model today is a transformer. Regent is a Mamba-2 state-space model with grouped-query attention at selected layers. Different engine, different properties. Recurrent layers handle sequence processing with a fixed memory footprint. Attention layers are placed every eight blocks for precise recall. Two output heads share the same internal state.
Compresses the input history into a fixed-size state buffer. The buffer stays the same size whether the session runs one minute or eight hours.
Placed every eight recurrent layers for precise recall. Finds specific facts the recurrent layers compressed away. Sliding window keeps memory bounded.
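The recurrent/attention interleave can be sketched as a simple layer schedule. The every-eighth placement comes from the description above; the 24-layer depth and the type names are illustrative, not the actual checkpoint configuration.

```python
# Sketch of Regent's hybrid stack: Mamba-2 recurrent blocks, with a
# grouped-query attention block every eighth layer. Depth and type
# names are illustrative, not the real checkpoint config.
def layer_schedule(n_layers: int, attn_every: int = 8) -> list[str]:
    return [
        "gqa_attention" if (i + 1) % attn_every == 0 else "mamba2"
        for i in range(n_layers)
    ]

schedule = layer_schedule(24)
# Layers 8, 16, and 24 are attention; the remaining 21 are recurrent.
```

Because the recurrent blocks carry a fixed-size state, only the sparse attention layers need windowed caching, which is what keeps total memory bounded.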
Converts structured knowledge nodes into native model inputs. Each node carries confidence, recency, emotional weight, and category. No text conversion needed.
7 numbers injected at multiple layers during generation: mood, influence, truth bias, civility, good/evil, curiosity, self-preservation. Shapes the entire response. Not a prompt prefix.
No production language model ships with all eight of the capabilities below. Most ship with none.
Two output channels, one pass. One produces the next word. The other scores it 0 to 1: confident or guessing. Other systems check after the fact by re-running the model or sending output to a second one.
Above 0.6: writes normally. Between 0.3 and 0.6: slows, picks conservative words. Below 0.3: stops, retrieves facts, tries again from the uncertain point. Other models write at the same pace whether right or wrong.
Typed nodes as input: facts, relationships, memories, constraints, each with confidence and category. No text conversion. Other systems flatten knowledge into prompt text and re-inject it on every call.
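As a sketch, a typed node might look like the following. The field names are inferred from the description above and are not Regent's actual schema.

```python
from dataclasses import dataclass

# Hypothetical node schema inferred from the text; the real encoder's
# field names and types may differ.
@dataclass
class KnowledgeNode:
    kind: str               # "fact" | "relationship" | "memory" | "constraint"
    content: str
    confidence: float       # 0.0 to 1.0
    recency: float
    emotional_weight: float
    category: str

node = KnowledgeNode(
    kind="fact",
    content="Warfarin interacts with aspirin",
    confidence=0.95,
    recency=0.8,
    emotional_weight=0.1,
    category="pharmacology",
)
```

The point of the typed form is that confidence and category survive as structured signals instead of being flattened into prompt text and re-tokenized on every call.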
7 numbers alongside the conversation: mood, influence, truth bias, civility, good/evil, curiosity, self-preservation. Injected at multiple layers. Shapes the entire response. Other models use a prompt persona that dilutes as the response grows.
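The seven dimensions reduce to a plain vector. The ordering and the neutral default of 0.0 are assumptions for illustration, not Regent's documented encoding.

```python
# The seven behavioral dimensions as an ordered vector. Ordering and
# the 0.0 neutral default are assumptions, not the documented format.
BEHAVIOR_DIMS = (
    "mood", "influence", "truth_bias", "civility",
    "good_evil", "curiosity", "self_preservation",
)

def behavior_vector(**dims: float) -> list[float]:
    """Build the 7-number state injected at multiple layers."""
    unknown = set(dims) - set(BEHAVIOR_DIMS)
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    return [dims.get(name, 0.0) for name in BEHAVIOR_DIMS]

state = behavior_vector(truth_bias=0.9, civility=0.7)
```

Because the vector is injected at multiple layers rather than prepended as text, it conditions every token equally instead of fading as the response grows.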
Other models accumulate tens of gigabytes over a multi-hour session. Regent compresses past context into a fixed 1 GB state. It does not grow. A session running for weeks uses the same memory it used in the first second.
Reads state that already exists. No extra passes. Most safety systems run the model 5 to 20 times for a reliability signal. This one runs once.
When the question requires reasoning, the model works through it internally first, then responds. The reasoning is visible to the caller. Built into the generation loop with dedicated tokens, not a prompt engineering trick.
When the model needs external information, it emits a structured tool request, pauses, receives the result, and continues. APIs, databases, search. Built into the architecture with dedicated tokens, not a plugin layer.
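In miniature, the pause-and-resume loop looks like this. The event shape and tool registry are illustrative; Regent's actual token protocol is not shown here.

```python
# Toy driver for the emit-pause-resume tool loop. The event dicts and
# registry are illustrative, not Regent's wire format.
def run_with_tools(events, tools):
    """Walk the model's emissions; on a tool request, pause, execute
    the tool, splice the result into the transcript, and continue."""
    transcript = []
    for ev in events:
        if "tool" in ev:
            result = tools[ev["tool"]](**ev.get("args", {}))
            transcript.append(f"[{ev['tool']} -> {result}]")
        else:
            transcript.append(ev["text"])
    return " ".join(transcript)

tools = {"db_lookup": lambda key: {"port": 8080}.get(key)}
events = [
    {"text": "The configured port is"},
    {"tool": "db_lookup", "args": {"key": "port"}},
    {"text": "per the deployment record."},
]
out = run_with_tools(events, tools)
```

Dedicated tokens mean the pause point is part of generation itself, so the model resumes with the result already in state rather than restarting from a rebuilt prompt.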
The verification head reads the same internal state as the generation head and outputs an accuracy score per word. One pass. No separate model, no multiple samples, no post-hoc filtering.
Model is confident and grounded. Writes normally at the configured temperature and sampling settings. Standard generation for topics the model knows well.
Model is uncertain. Temperature drops, output shifts toward conservative and hedged language. The response signals its own uncertainty rather than asserting it.
Model is fabricating. Generation stops. The system retrieves relevant knowledge, re-injects it as context, and re-generates from the point where confidence broke down.
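The three bands reduce to a per-token gate. The thresholds come from the text above; the action names are illustrative.

```python
# Per-word gate over the verification head's score, using the
# thresholds described above. Action names are illustrative.
def gate(score: float) -> str:
    if score > 0.6:
        return "write"                # confident: normal sampling
    if score >= 0.3:
        return "hedge"                # uncertain: lower temperature, conservative words
    return "retrieve_and_retry"       # fabricating: stop, ground, regenerate

actions = [gate(s) for s in (0.92, 0.45, 0.12)]
```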
Most checks score an entire response. This scores every word, catching fabricated claims mid-sentence and identifying the exact span.
Sampling verification runs the model 5 to 20 times. This reads state that already exists. 0.1% of parameters, zero extra passes.
Retrieval augmentation gives the model documents but does not stop it from ignoring them. The verification head is trained to flag claims not grounded in the knowledge base.
Compared on what Regent is built for, not on general benchmarks. The models listed are the latest publicly available frontier models from each lab as of 2026.
| Property | Regent | GPT-5 | Claude Opus 4.6 | Gemini 2.5 | Llama 4 | DeepSeek V3 |
|---|---|---|---|---|---|---|
| Non-transformer architecture | Mamba-2 SSM | Transformer | Transformer | Transformer | Transformer | MoE Transformer |
| Thinks before answering | Yes | Yes | Yes | Yes | No | Yes |
| Native tool calling | Yes | Yes | Yes | Yes | Yes | Yes |
| Per-word accuracy score | Yes | No | No | No | No | No |
| Stops and self-corrects mid-generation | Yes | Post-hoc only | Post-hoc only | Post-hoc only | Post-hoc only | Post-hoc only |
| Fixed memory at any session length | Yes (~1 GB fixed) | Grows with session | Grows with session | Grows with session | Grows with session | Grows with session |
| Native structured knowledge input | Yes | Text only | Text only | Text only | Text only | Text only |
| Self-hosted, no API required | Yes | API only | API only | API only | Open weights | Open weights |
| Air-gap deployable | Yes | No | No | No | Yes | Yes |
| Self-hosted, single server | 7B on 16 GB GPU | API only | API only | API only | 8B at ~5 GB | Too large |
| Multi-hour session runtime | Hours to days | Context window only | Context window only | Context window only | Context window only | Context window only |
| Auditable per-claim accuracy trace | Yes | Reasoning trace only | Reasoning trace only | Reasoning trace only | No | No |
| HuggingFace compatible | Yes | No | No | No | Yes | Yes |
Standard models accumulate memory with every word processed. Regent's memory is fixed. For workloads measured in hours, not turns, this is the difference between running and not running.
| Model | Architecture | Memory at 10K words | Memory at 100K words | Memory at 1M words |
|---|---|---|---|---|
| Regent 7B | Hybrid recurrent + attention | ~1 GB | ~1 GB | ~1 GB |
| GPT-5 | Transformer | API managed | API managed | Exceeds context |
| Claude Opus 4.6 | Transformer | API managed | API managed | Exceeds context |
| Llama 4 70B | Transformer | ~2.5 GB | ~25 GB | Will not fit |
| DeepSeek V3 | Mixture-of-experts | ~1.5 GB | ~15 GB | Will not fit |
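The transformer rows follow from KV-cache arithmetic. A back-of-envelope, assuming a generic 70B-class shape (80 layers, 8 KV heads, head dim 128, fp16) and counting tokens rather than words, so the figures differ slightly from the table:

```python
# KV-cache growth for a transformer: 2 tensors (K and V) per layer,
# per cached position. Model shape is a generic 70B-class config,
# not any specific model in the table.
def kv_cache_gb(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1e9

short = kv_cache_gb(10_000)    # ~3.3 GB at 10K tokens
long = kv_cache_gb(100_000)    # ~33 GB at 100K tokens
# Cache grows linearly with sequence length; Regent's state stays flat.
```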
Same backbone across every checkpoint. Open source up to 50B. Commercial above.
Export from the Model Studio interface or the command line. One click to HuggingFace. One click to Docker. Runs on NVIDIA CUDA and Apple Silicon.
Export to a HuggingFace package. Load with two lines. Push to Hub from the interface. Works with existing fine-tuning, quantization, and deployment tooling.
Self-contained Docker package with FastAPI server. Runs on any machine with Docker. No Python setup needed.
Native on NVIDIA GPUs and Apple Silicon. Optional Triton kernel on CUDA for long-sequence throughput. CPU fallback on any machine.
Browser interface for the full lifecycle: scrape data, train, monitor, chat with live accuracy scores, manage the knowledge graph, export. No command line needed.
Drop-in replacement at /v1/chat/completions. Any SDK, framework, or application built for OpenAI works by changing the base URL. No code changes needed.
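A sketch using only the standard library; the local host and port are assumed defaults for the Docker image, not documented values.

```python
import json
import urllib.request

# Same request shape OpenAI clients send to /v1/chat/completions.
# "localhost:8000" is an assumed default, not a documented value.
payload = {
    "model": "regent-7b",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment against a running server
```

Any OpenAI SDK works the same way: construct the client with its base URL pointed at the local server and leave the rest of the code untouched.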
SafeTensors format, readable by PyTorch and MLX. Supports float32, float16, and bfloat16. Same file loads on both platforms.
Clone, install, and start. The Model Studio interface opens in your browser and handles everything from there.
Run with --synthetic to validate on random data. All four phases run end-to-end. Swap in a real corpus when ready.
Workloads where constant memory, per-word accuracy scoring, structured knowledge input, or self-hosting matter more than raw benchmark rankings.
Every claim needs to be traceable. The accuracy score flags uncertain spans before the document is finished. Case law and statute are graph-shaped; the knowledge encoder ingests them directly.
Long sessions, fixed memory, clinical consequences for errors. UMLS, SNOMED, and ICD are structured ontologies the knowledge encoder reads natively. The accuracy score meets the auditability bar clinical workflows require.
8-hour shifts and 6-hour missions need fixed memory. The state never grows regardless of duration. Accuracy score gates actions before execution.
Air-gap is a hard requirement. Ships as weights, runs on local hardware, no external connectivity. Accuracy score produces auditable records per claim.
Long sessions over structured data. Every claim traceable before it influences a decision. The accuracy score is the audit trail. Self-hosting removes data sovereignty concerns.
Constant memory means no session limit. Knowledge graph input persists context without growing prompts. Behavioral state stays consistent across the full session.
Molecular databases, drug interaction graphs, and trial data are graph-shaped. Regulatory submissions require every claim to be auditable. Trial data often cannot leave a jurisdiction. Wrong output has patient consequences.
Air-gap is mandatory. 8 to 12 hour shifts run without memory degradation. The accuracy score flags uncertain operational recommendations before an operator acts. No cloud vendor has clearance to run inside a nuclear facility.
Ships at sea have no reliable internet for weeks. Engine monitoring, navigation, and cargo management need a model that runs offline for the full voyage. Long sessions, fixed memory, no connectivity required.
Remote sites, zero connectivity, safety-critical decisions. Long operational cycles. The accuracy score gates decisions before they become incidents. No ongoing infrastructure required after deployment.
Every decision needs a traceable justification for regulatory review. Structured policy, precedent, and actuarial knowledge fits the graph encoder. The accuracy score is the audit trail regulators require.
Pharmaceutical, financial, and environmental compliance all require AI output auditable at the claim level. Regulatory frameworks are graph-shaped. General models produce plausible compliance text with no verifiability.
Works offline when infrastructure is down. Medical triage, search and rescue, resource allocation. A model requiring connectivity is unavailable exactly when needed most. Behavioral conditioning keeps output calibrated for high-stakes decisions.
Every figure needs a source. Long document review over structured financial data. The accuracy score identifies which claims to scrutinize before the report is signed. Self-hosting removes the conflict of sending client data to a third-party API.
Unreliable connectivity, thin margins, structured knowledge of crop disease and soil. Zero marginal cost after deployment is the only viable model at this scale. Self-hosted deployment is the only distribution path that reaches this market.
One license, local hardware, zero marginal cost after deployment. Works without internet. Healthcare, legal, agricultural, and financial organizations across Africa, South Asia, Southeast Asia, and Latin America.
Entire repository in context with no limit. Calls compilers, test runners, and linters mid-generation through native tool calling. Verification head scores confidence per line before you run anything. Multi-hour sessions without memory degradation.
Fixed memory means conversations never truncate. Adaptive gate reduces cost on routine exchanges. Thinking and tool calling work natively. At 7B it will not match 70B+ frontier models on open-ended tasks, but Grande Regent at 70B+ is competitive.
Run all four phases from the Model Studio interface with one click, or from the command line with one command. Validated end-to-end on Apple Silicon and NVIDIA CUDA.
Language modeling on a general corpus. Full model trains. Learns grammar, world knowledge, and reasoning.
Fine-tuning on domain conversations. Knowledge encoder trains jointly. Learns output format and tone for the deployment context.
Accuracy head trains, backbone frozen. Trained on grounded, fabricated, contradicted, and entity-swapped pairs. 0.1% of parameters update.
Preference learning against a frozen reference. Higher accuracy and better domain alignment are preferred. The model learns what good output means for its use case.
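The third phase's frozen-backbone setup, in miniature. A toy parameter registry stands in for a real model here; no framework is assumed.

```python
# Phase 3 in miniature: freeze everything except the accuracy head,
# so only a sliver of parameters update. The registry is a toy
# stand-in for a real model's parameter list.
params = {
    "backbone.block0.weight": {"requires_grad": True},
    "backbone.block1.weight": {"requires_grad": True},
    "accuracy_head.weight":   {"requires_grad": True},
}

def freeze_except(params, prefix):
    for name, p in params.items():
        p["requires_grad"] = name.startswith(prefix)

freeze_except(params, "accuracy_head")
trainable = [n for n, p in params.items() if p["requires_grad"]]
```

Freezing the backbone is what keeps the update to 0.1% of parameters: the verification signal trains the head without disturbing what the model already knows.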
Researching, building, and accelerating AI for developing economies.
Regent is the first real language model to come out of Africa. Not a fine-tune. Not a wrapper. A ground-up architecture designed to be on par with the best models in the world at the workloads it is built for.
Alchymia Labs is founded by Ayomide I. Daniels. The team is in the diaspora. The work is global.
Developing economies do not need cheaper versions of Western AI. They need AI with different properties: offline, owned, auditable, affordable.
Getting there from where we stand requires a 10x leap in ingenuity. That is the operating requirement, and the core ethos of the people at Alchymia.
7B to 50B is open source. Deployable, modifiable, free to build on. The organizations with the most to gain from AI are often the least able to pay for it.
Grande Regent at 70B to 1T is commercial. That revenue funds the open tier.
Training frontier-scale models currently requires concentrating tens of thousands of GPUs in one facility and a $50M to $200M check to a single cloud provider.
Alchymia is developing DSTP, a protocol to train 1 to 2 trillion parameter models by pooling compute across universities, national labs, government centers, private organizations, and individuals. Frontier-scale AI should not require a single nine-figure infrastructure investment.
Three billion people stand to gain the most from AI in healthcare, legal access, agriculture, finance, and education. No major lab builds primarily for them.
Regent ships as infrastructure, not a subscription. It is the first model released under that mandate.
Architecture, code, weights, and training pipeline are on GitHub. Issues, questions, and contributions welcome.
Frontier-scale checkpoints, enterprise tooling, and SLA support. Distributed through Alchymia Groom.