The model built for
decisions that matter.

The first production language model built in Africa. Scores every word it generates in real time. Thinks before it answers. Calls tools natively. Runs on your hardware, offline, with no per-inference cost after deployment.

Not a Transformer · Real-Time Accuracy Scoring · Native Thinking · Native Tool Calling · Constant Memory · OpenAI-Compatible API · HuggingFace Compatible · CUDA + Apple Silicon · No API Dependency
1 GB
Fixed memory — same at 100 tokens or 1 million
Context window — runs for hours, days, or weeks
0.1%
Extra cost to add real-time accuracy scoring
$0
Per-inference cost after deployment

Two distributions

Same architecture across both. The split is by scale and licensing.

Open Source

Regent

7B to 50B parameters

Architecture, training pipeline, web interface, and the 7B through 50B checkpoints. Free to deploy and modify.

  • Hybrid recurrent backbone with sparse attention
  • Knowledge graph encoder and behavioral conditioning
  • Real-time accuracy scoring with three-zone decoding
  • Four-phase training pipeline
  • Browser-based Model Studio interface
  • HuggingFace and Docker export
  • 7B, 13B, 30B, and 50B checkpoints
GitHub
Commercial

Grande Regent

70B to 1T parameters

Frontier-scale checkpoints, production accuracy tooling, enterprise integrations. Distributed through Alchymia Groom.

  • 70B, 200B, 500B, and 1T checkpoints
  • Production-trained verification head
  • Production knowledge graph tooling
  • Enterprise SLA
  • On-premise and air-gap deployment
  • Distributed via Alchymia Groom
Request Access

Not a transformer

Every frontier model today is a transformer. Regent is a Mamba-2 state-space model with grouped-query attention at selected layers. Different engine, different properties. Recurrent layers handle processing with a fixed memory footprint. Attention layers are placed every eight blocks for precise recall. Two output heads share the same internal state.
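The interleaving pattern above can be sketched in a few lines. This is a hypothetical illustration of the layout described in the text (one attention layer every eight blocks, recurrent blocks elsewhere); the function and block names are illustrative, not Regent's actual module names.

```python
def build_layer_plan(n_layers, attn_every=8):
    # Hybrid layout sketch: one grouped-query attention layer every
    # `attn_every` blocks, Mamba-2 recurrent blocks everywhere else.
    plan = []
    for i in range(n_layers):
        if (i + 1) % attn_every == 0:
            plan.append("attention")  # precise recall, sliding window
        else:
            plan.append("mamba2")     # fixed-size state, constant memory
    return plan

plan = build_layer_plan(32)  # a 32-block model gets 4 attention layers
```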

Regent Architecture

Recurrent Layers (Mamba-2)

Compresses each input into a fixed-size state buffer. The buffer stays the same size whether the session runs one minute or eight hours.


Attention Layers (Sparse)

Placed every eight recurrent layers for precise recall. Finds specific facts the recurrent layers compressed away. Sliding window keeps memory bounded.


Knowledge Graph Encoder

Converts structured knowledge nodes into native model inputs. Each node carries confidence, recency, emotional weight, and category. No text conversion needed.
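A structured knowledge node of the kind described above might look like the following. This is an assumption-laden sketch: the field names mirror the attributes named in the text (confidence, recency, emotional weight, category), but the class and its schema are illustrative, not Regent's actual input format.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeNode:
    # Hypothetical node shape; fields mirror the attributes in the text
    # but the schema is illustrative, not Regent's actual format.
    subject: str
    relation: str
    obj: str
    confidence: float        # 0.0 to 1.0
    recency: float           # higher = more recent
    emotional_weight: float
    category: str

node = KnowledgeNode("aspirin", "interacts_with", "warfarin",
                     confidence=0.92, recency=0.8,
                     emotional_weight=0.1, category="drug_interaction")
```

The point is that a node like this is consumed directly, with no flattening into prompt text.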


Behavioral Conditioning

7 numbers injected at multiple layers during generation: mood, influence, truth bias, civility, good/evil, curiosity, self-preservation. Shapes the entire response. Not a prompt prefix.
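A minimal sketch of building that 7-number conditioning vector. The dimension names come from the text; the value range and the clamping are assumptions for illustration.

```python
BEHAVIOR_DIMS = ("mood", "influence", "truth_bias", "civility",
                 "good_evil", "curiosity", "self_preservation")

def behavior_vector(**dials):
    # Build the 7-number conditioning vector. Unnamed dials default to
    # 0.0; the [-1, 1] range is an assumption, not Regent's actual spec.
    return [max(-1.0, min(1.0, float(dials.get(d, 0.0))))
            for d in BEHAVIOR_DIMS]

vec = behavior_vector(truth_bias=1.0, civility=0.8, curiosity=2.0)
```

Because the vector is injected at multiple layers rather than prepended as text, it holds steady over the whole response.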

Eight things this model does that others do not

No production language model ships with all eight. Most ship with none.

01

It scores its own accuracy while it writes

Two output channels, one pass. One produces the next word. The other scores it 0 to 1: confident or guessing. Other systems check after the fact by re-running the model or sending output to a second one.
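The two-heads-one-pass idea can be shown with a toy projection. This is a pure-Python sketch under stated assumptions: both heads read the same hidden state, the generation head produces logits, and the verification head produces a sigmoid score; the real heads are learned layers with very different shapes.

```python
import math

def two_heads(hidden, vocab_w, verify_w):
    # One shared hidden state, two projections: token logits from the
    # generation head, a sigmoid 0-1 score from the verification head.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in vocab_w]
    z = sum(h * w for h, w in zip(hidden, verify_w))
    score = 1.0 / (1.0 + math.exp(-z))
    return logits, score

logits, score = two_heads([1.0, 0.0],
                          [[2.0, 0.0], [0.0, 3.0]],
                          [0.0, 0.0])
```

Both outputs come from one forward pass over the same state, which is why the score costs almost nothing extra.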

02

It changes behavior when confidence drops

Above 0.6: writes normally. Between 0.3 and 0.6: slows, picks conservative words. Below 0.3: stops, retrieves facts, tries again from the uncertain point. Other models write at the same pace whether right or wrong.

03

It reads structured knowledge directly

Typed nodes as input: facts, relationships, memories, constraints, each with confidence and category. No text conversion. Other systems flatten knowledge into prompt text and re-inject it on every call.

04

Behavioral state is a dial, not a paragraph

7 numbers alongside the conversation: mood, influence, truth bias, civility, good/evil, curiosity, self-preservation. Injected at multiple layers. Shapes the entire response. Other models use a prompt persona that dilutes as the response grows.

05

Constant memory regardless of session length

Other models accumulate tens of gigabytes over a multi-hour session. Regent compresses past context into a fixed 1 GB state. It does not grow. A session running for weeks uses the same memory it used in the first second.

06

Accuracy scoring adds 0.1% to the cost

Reads state that already exists. No extra passes. Most safety systems run the model 5 to 20 times for a reliability signal. This one runs once.

07

It thinks before it answers

When the question requires reasoning, the model works through it internally first, then responds. The reasoning is visible to the caller. Built into the generation loop with dedicated tokens, not a prompt engineering trick.

08

It calls tools natively

When the model needs external information, it emits a structured tool request, pauses, receives the result, and continues. APIs, databases, search. Built into the architecture with dedicated tokens, not a plugin layer.
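The emit-pause-resume loop above can be sketched as follows. Everything here is hypothetical scaffolding: `generate` stands in for the model, the request/result dictionary shapes are illustrative, and the `<tool_result>` marker is an assumption, not Regent's actual token format.

```python
import json

def run_with_tools(generate, tools, prompt):
    # Emit-pause-resume loop: run the model, execute any structured
    # tool request it emits, feed the result back, repeat until done.
    transcript = prompt
    while True:
        out = generate(transcript)
        if out.get("type") == "tool_call":
            result = tools[out["name"]](**out["arguments"])
            transcript += f"\n<tool_result>{json.dumps(result)}</tool_result>"
            continue
        return out["text"]

# Toy demo: a stand-in model that requests the weather once, then answers.
calls = {"n": 0}
def fake_generate(transcript):
    if calls["n"] == 0:
        calls["n"] += 1
        return {"type": "tool_call", "name": "weather",
                "arguments": {"city": "Lagos"}}
    return {"type": "final", "text": "It is 31C in Lagos."}

answer = run_with_tools(fake_generate,
                        {"weather": lambda city: {"temp_c": 31}},
                        "What's the weather?")
```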

Built-in grounding verification

The verification head reads the same internal state as the generation head and outputs an accuracy score per word. One pass. No separate model, no multiple samples, no post-hoc filtering.

FLOW (score > 0.6)

Model is confident and grounded. Writes normally at the configured temperature and sampling settings. Standard generation for topics the model knows well.

CAUTION (0.3 to 0.6)

Model is uncertain. Temperature drops, output shifts toward conservative and hedged language. The response signals its own uncertainty rather than asserting it.

HALT (score < 0.3)

Model is fabricating. Generation stops. The system retrieves relevant knowledge, re-injects it as context, and re-generates from the point where confidence broke down.
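The three zones reduce to a simple threshold map. The thresholds below are the ones stated above; treating the 0.6 boundary as CAUTION is an assumption about edge handling.

```python
def zone(score):
    # Map a per-token accuracy score to a decoding zone:
    # FLOW above 0.6, CAUTION from 0.3 to 0.6, HALT below 0.3.
    if score > 0.6:
        return "FLOW"      # generate normally
    if score >= 0.3:
        return "CAUTION"   # lower temperature, hedge
    return "HALT"          # stop, retrieve, regenerate
```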

Per-word, not per-response

Most checks score an entire response. This scores every word, catching fabricated claims mid-sentence and identifying the exact span.

No extra inference cost

Sampling verification runs the model 5 to 20 times. This reads state that already exists. 0.1% of parameters, zero extra passes.

Grounded in the knowledge base

Retrieval augmentation gives the model documents but does not stop it from ignoring them. The verification head is trained to flag claims not grounded in the knowledge base.

Against the flagships

Compared on what Regent is built for, not on general benchmarks. The models listed are the latest publicly available frontier models from each lab as of 2026.

Property | Regent | GPT-5 | Claude Opus 4.6 | Gemini 2.5 | Llama 4 | DeepSeek V3
Non-transformer architecture | Mamba-2 SSM | Transformer | Transformer | Transformer | Transformer | MoE Transformer
Thinks before answering | Yes | Yes | Yes | Yes | No | Yes
Native tool calling | Yes | Yes | Yes | Yes | Yes | Yes
Per-word accuracy score | Yes | No | No | No | No | No
Stops and self-corrects mid-generation | Yes | Post-hoc only | Post-hoc only | Post-hoc only | Post-hoc only | Post-hoc only
Fixed memory at any session length | Yes (~1 GB fixed) | Grows with session | Grows with session | Grows with session | Grows with session | Grows with session
Native structured knowledge input | Yes | Text only | Text only | Text only | Text only | Text only
Self-hosted, no API required | Yes | API only | API only | API only | Open weights | Open weights
Air-gap deployable | Yes | No | No | No | Yes | Yes
Self-hosted, single server | 7B on 16 GB GPU | API only | API only | API only | 8B at ~5 GB | Too large
Multi-hour session runtime | Hours to days | Context window only | Context window only | Context window only | Context window only | Context window only
Auditable per-claim accuracy trace | Yes | Reasoning trace only | Reasoning trace only | Reasoning trace only | No | No
HuggingFace compatible | Yes | No | No | No | Yes | Yes
Not a general capability ranking. On language quality, instruction following, math, and coding, the frontier transformers lead. Regent is a different architecture built for different properties: accuracy-scored, long-running, self-hosted workloads where wrong answers have a measurable cost.

Memory at scale: the operational difference

Standard models accumulate memory with every word processed. Regent's memory is fixed. For workloads measured in hours, not turns, this is the difference between running and not running.

Model | Architecture | Memory at 10K words | Memory at 100K words | Memory at 1M words
Regent 7B | Hybrid recurrent + attention | ~1 GB | ~1 GB | ~1 GB
GPT-5 | Transformer | API managed | API managed | Exceeds context
Claude Opus 4.6 | Transformer | API managed | API managed | Exceeds context
Llama 4 70B | Transformer | ~2.5 GB | ~25 GB | Will not fit
DeepSeek V3 | Mixture-of-experts | ~1.5 GB | ~15 GB | Will not fit
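The difference in the table comes from simple arithmetic: a transformer KV cache grows linearly with tokens, a recurrent state does not. The sketch below uses an illustrative 70B-class shape (layer count, head count, and head dimension are assumptions, not measured figures for any listed model).

```python
def kv_cache_gb(tokens, n_layers=80, n_kv_heads=8, head_dim=128,
                bytes_per_value=2):
    # Approximate transformer KV-cache size in GB: keys AND values
    # (the trailing factor 2) stored per layer per token, fp16.
    # The default shape is illustrative of a 70B-class model.
    return tokens * n_layers * n_kv_heads * head_dim * 2 * bytes_per_value / 1e9

def fixed_state_gb(tokens):
    # A recurrent state is a constant-size buffer, whatever the length.
    return 1.0

growth = kv_cache_gb(1_000_000) / kv_cache_gb(10_000)  # linear: 100x
```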

7B to 1T parameters

Same backbone across every checkpoint. Open source up to 50B. Commercial above.

7B
Regent | Open Source
License: Open
Weights (INT4): ~4 GB + 1 GB state
Hardware: Single GPU (16 GB+)
Target: Self-hosted / On-premise
13B to 50B
Regent | Open Source
License: Open
Memory (INT4): 8 to 30 GB
Hardware: Server, workstation, cloud
Target: Enterprise on-premise

Three ways to deploy

Export from the Model Studio interface or the command line. One click to HuggingFace. One click to Docker. Runs on NVIDIA CUDA and Apple Silicon.


HuggingFace

Export to a HuggingFace package. Load with two lines. Push to Hub from the interface. Works with existing fine-tuning, quantization, and deployment tooling.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "path/to/export",
    trust_remote_code=True
)

Docker / Self-Hosted

Self-contained Docker package with FastAPI server. Runs on any machine with Docker. No Python setup needed.

CUDA + Apple Silicon

Native on NVIDIA GPUs and Apple Silicon. Optional Triton kernel on CUDA for long-sequence throughput. CPU fallback on any machine.


Model Studio Interface

Browser interface for the full lifecycle: scrape data, train, monitor, chat with live accuracy scores, manage the knowledge graph, export. No command line needed.


OpenAI-Compatible API

Drop-in replacement at /v1/chat/completions. Any SDK, framework, or application built for OpenAI works by changing the base URL. No code changes needed.
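An offline sketch of the wire format involved: the request body any OpenAI SDK POSTs to /v1/chat/completions, and how a response in the same schema is read back. The model name is illustrative; for a live client the only change is pointing the base URL at the local server.

```python
import json

# Request body in the standard OpenAI chat-completions schema.
# "regent-7b" is an illustrative model name, not a confirmed identifier.
request_body = json.dumps({
    "model": "regent-7b",
    "messages": [{"role": "user", "content": "Hello, Regent!"}],
    "max_tokens": 200,
})

# A response in the same schema, read back the standard way.
sample_response = {
    "choices": [{"message": {"role": "assistant",
                             "content": "Hello! How can I help?"}}]
}
reply = sample_response["choices"][0]["message"]["content"]
```

Because both sides follow the OpenAI schema, existing SDKs and frameworks need no code changes beyond the base URL.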


Weight Formats

SafeTensors format, readable by PyTorch and MLX. Supports float32, float16, and bfloat16. Same file loads on both platforms.

Up and running in three steps

Clone, install, and start. The Model Studio interface opens in your browser and handles everything from there.

Step 1 — Clone and install

# Apple Silicon
git clone https://github.com/Alchymia-AI/Regent
cd regent-model
python3.12 -m venv .venv && source .venv/bin/activate
pip install mlx pyyaml sentencepiece numpy fastapi uvicorn pydantic

# NVIDIA GPU
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install pyyaml sentencepiece numpy fastapi uvicorn pydantic safetensors transformers

Step 2 — Start the interface

# Starts Model Studio UI + API server
./start.sh
# Opens at http://localhost:3000
# API at http://localhost:8400

# With a specific model config and weights
./start.sh --config configs/regent_7b.yaml \
  --model checkpoints/alignment/regent.safetensors

Step 3 — Or load from HuggingFace

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "alchymia-ai/regent-7b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(
    "alchymia-ai/regent-7b",
    trust_remote_code=True
)
out = model.generate(
    **tok("Hello, Regent!", return_tensors="pt").to(model.device),
    max_new_tokens=200
)
No weights yet? Run with --synthetic to validate on random data. All four phases run end-to-end. Swap in a real corpus when ready.

Where it fits

Workloads where constant memory, per-word accuracy scoring, structured knowledge input, or self-hosting matter more than raw benchmark rankings.

Legal Research and Drafting

Strong fit

Every claim needs to be traceable. The accuracy score flags uncertain spans before the document is finished. Case law and statute are graph-shaped; the knowledge encoder ingests them directly.

Healthcare and Clinical Support

Strong fit

Long sessions, fixed memory, clinical consequences for errors. UMLS, SNOMED, and ICD are structured ontologies the knowledge encoder reads natively. The accuracy score meets the auditability bar clinical workflows require.

Robotics and Drones

Strong fit

8-hour shifts and 6-hour missions need fixed memory. The state never grows regardless of duration. Accuracy score gates actions before execution.

Government and Defense

Strong fit

Air-gap is a hard requirement. Ships as weights, runs on local hardware, no external connectivity. Accuracy score produces auditable records per claim.

Financial Research and Risk

Strong fit

Long sessions over structured data. Every claim traceable before it influences a decision. The accuracy score is the audit trail. Self-hosting removes data sovereignty concerns.

Persistent AI Agents

Strong fit

Constant memory means no session limit. Knowledge graph input persists context without growing prompts. Behavioral state stays consistent across the full session.

Pharmaceutical and Drug Development

Strong fit

Molecular databases, drug interaction graphs, and trial data are graph-shaped. Regulatory submissions require every claim to be auditable. Trial data often cannot leave a jurisdiction. Wrong output has patient consequences.

Nuclear and Critical Infrastructure

Strong fit

Air-gap is mandatory. 8 to 12 hour shifts run without memory degradation. The accuracy score flags uncertain operational recommendations before an operator acts. No cloud vendor has clearance to run inside a nuclear facility.

Maritime and Offshore

Strong fit

Ships at sea have no reliable internet for weeks. Engine monitoring, navigation, and cargo management need a model that runs offline for the full voyage. Long sessions, fixed memory, no connectivity required.

Mining and Extraction

Strong fit

Remote sites, zero connectivity, safety-critical decisions. Long operational cycles. The accuracy score gates decisions before they become incidents. No ongoing infrastructure required after deployment.

Insurance and Claims

Strong fit

Every decision needs a traceable justification for regulatory review. Structured policy, precedent, and actuarial knowledge fits the graph encoder. The accuracy score is the audit trail regulators require.

Compliance and Regulatory

Strong fit

Pharmaceutical, financial, and environmental compliance all require AI output auditable at the claim level. Regulatory frameworks are graph-shaped. General models produce plausible compliance text with no verifiability.

Emergency Services

Strong fit

Works offline when infrastructure is down. Medical triage, search and rescue, resource allocation. A model requiring connectivity is unavailable exactly when needed most. Behavioral conditioning keeps output calibrated for high-stakes decisions.

Audit and Financial Forensics

Strong fit

Every figure needs a source. Long document review over structured financial data. The accuracy score identifies which claims to scrutinize before the report is signed. Self-hosting removes the conflict of sending client data to a third-party API.

Agriculture and Precision Farming

Strong fit

Unreliable connectivity, thin margins, structured knowledge of crop disease and soil. Zero marginal cost after deployment is the only viable model at this scale. Self-hosted deployment is the only distribution path that reaches this market.

Emerging Markets

Strong fit

One license, local hardware, zero marginal cost after deployment. Works without internet. Healthcare, legal, agricultural, and financial organizations across Africa, South Asia, Southeast Asia, and Latin America.

Code Generation

Strong fit

Entire repository in context with no limit. Calls compilers, test runners, and linters mid-generation through native tool calling. Verification head scores confidence per line before you run anything. Multi-hour sessions without memory degradation.

General-Purpose Chat

Good fit

Fixed memory means conversations never truncate. Adaptive gate reduces cost on routine exchanges. Thinking and tool calling work natively. At 7B it will not match 70B+ frontier models on open-ended tasks, but Grande Regent at 70B+ is competitive.

Four-phase pipeline

Run all four phases from the Model Studio interface with one click, or from the command line with one command. Validated end-to-end on Apple Silicon and NVIDIA CUDA.

# Command line: full pipeline from scrape to alignment
PYTHONPATH=. python3 scripts/run_pipeline.py \
  --config configs/regent_370m.yaml \
  --scrape-config pipeline.yaml

# Or open the Model Studio UI and click Start
./start.sh   # opens at http://localhost:3000

Phase 1: Base

Language modeling on a general corpus. Full model trains. Learns grammar, world knowledge, and reasoning.

Phase 2: Identity

Fine-tuning on domain conversations. Knowledge encoder trains jointly. Learns output format and tone for the deployment context.

Phase 3: Verification

Accuracy head trains, backbone frozen. Trained on grounded, fabricated, contradicted, and entity-swapped pairs. 0.1% of parameters update.

Phase 4: Alignment

Preference learning against a frozen reference. Higher accuracy and better domain alignment are preferred. The model learns what good output means for its use case.

Alchymia Labs

Researching, building, and accelerating AI for developing economies.

The first production model from Africa

Regent is the first production language model to come out of Africa. Not a fine-tune. Not a wrapper. A ground-up architecture designed to be on par with the best models in the world at the workloads it is built for.

Alchymia Labs was founded by Ayomide I. Daniels. The team is in the diaspora. The work is global.

The 10x imperative

Developing economies do not need cheaper versions of Western AI. They need AI with different properties: offline, owned, auditable, affordable.

Getting there from where we stand requires a 10x magnitude of ingenuity. That is the operating requirement, and the core ethos of the people at Alchymia.

Open source commitment

7B to 50B is open source. Deployable, modifiable, free to build on. The organizations with the most to gain from AI are often the least able to pay for it.

Grande Regent at 70B to 1T is commercial. That revenue funds the open tier.

Distributed Shared Training Protocol

Training frontier-scale models currently requires concentrating tens of thousands of GPUs in one facility and a $50M to $200M check to a single cloud provider.

Alchymia is developing DSTP, a protocol to train 1 to 2 trillion parameter models by pooling compute across universities, national labs, government centers, private organizations, and individuals. Frontier-scale AI should not require a single nine-figure infrastructure investment.

Who this is for

Three billion people stand to gain the most from AI in healthcare, legal access, agriculture, finance, and education. None of the major labs build for them primarily.

Regent ships as infrastructure, not a subscription. It is the first model released under that mandate.

Get in touch

Open Source

Regent

Architecture, code, weights, and training pipeline are on GitHub. Issues, questions, and contributions welcome.

  • Bug reports and feature requests via GitHub Issues
  • Architecture and training questions via Discussions
  • Research collaboration: research@alchymia.ai
GitHub
Commercial

Grande Regent

Frontier-scale checkpoints, enterprise tooling, and SLA support. Distributed through Alchymia Groom.

  • 70B to 1T checkpoints
  • Production verification tooling
  • On-premise and air-gap deployment
  • Enterprise SLA and dedicated support
  • Custom fine-tuning and deployment services
Request Access
Press and media inquiries: research@alchymia.ai