The model built for
decisions that matter.

The first production language model built in Africa. Scores every word it generates in real time. Thinks before it answers. Calls tools natively. Runs on your hardware, offline, with no per-inference cost after deployment.

Not a Transformer · Real-Time Accuracy Scoring · Native Thinking · Native Tool Calling · Constant Memory · OpenAI-Compatible API · HuggingFace Compatible · CUDA + Apple Silicon · No API Dependency
1 GB
Fixed memory — same at 100 tokens or 1 million
Context window — runs for hours, days, or weeks
0.1%
Extra cost to add real-time accuracy scoring
$0
Per-inference cost after deployment

Two distributions

Same architecture across both. The split is by scale and licensing.

Open Source

Regent

7B to 50B parameters

Architecture, training pipeline, web interface, and the 7B through 50B checkpoints. Free to deploy and modify.

  • Hybrid recurrent backbone with sparse attention
  • Knowledge graph encoder and behavioral conditioning
  • Real-time accuracy scoring with three-zone decoding
  • Four-phase training pipeline
  • Browser-based Model Studio interface
  • HuggingFace and Docker export
  • 7B, 13B, 30B, and 50B checkpoints
GitHub
Commercial

Grande Regent

70B to 1T parameters

Frontier-scale checkpoints, production accuracy tooling, enterprise integrations. Distributed through Alchymia Groom.

  • 70B, 200B, 500B, and 1T checkpoints
  • Production-trained verification head
  • Production knowledge graph tooling
  • Enterprise SLA
  • On-premise and air-gap deployment
  • Distributed via Alchymia Groom
Request Access

Not a transformer

Every frontier model today is a transformer. Regent is a Mamba-2 state-space model with grouped-query attention at selected layers. Different engine, different properties. Recurrent layers handle processing with a fixed memory footprint. Attention layers are placed every eight blocks for precise recall. Two output heads share the same internal state.
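The interleaving pattern above can be sketched in a few lines. This is a hypothetical illustration of the layout described in the text (one attention layer every eight blocks, recurrent blocks elsewhere); the function and block names are illustrative, not Regent's actual module names.

```python
def build_layer_plan(n_layers, attn_every=8):
    # Hybrid layout sketch: one grouped-query attention layer every
    # `attn_every` blocks, Mamba-2 recurrent blocks everywhere else.
    plan = []
    for i in range(n_layers):
        if (i + 1) % attn_every == 0:
            plan.append("attention")  # precise recall, sliding window
        else:
            plan.append("mamba2")     # fixed-size state, constant memory
    return plan

plan = build_layer_plan(32)  # a 32-block model gets 4 attention layers
```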

Regent Architecture

Recurrent Layers (Mamba-2)

Compresses each input into a fixed-size state buffer. The buffer stays the same size whether the session runs one minute or eight hours.


Attention Layers (Sparse)

Placed every eight recurrent layers for precise recall. Finds specific facts the recurrent layers compressed away. Sliding window keeps memory bounded.


Knowledge Graph Encoder

Converts structured knowledge nodes into native model inputs. Each node carries confidence, recency, emotional weight, and category. No text conversion needed.
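A structured knowledge node of the kind described above might look like the following. This is an assumption-laden sketch: the field names mirror the attributes named in the text (confidence, recency, emotional weight, category), but the class and its schema are illustrative, not Regent's actual input format.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeNode:
    # Hypothetical node shape; fields mirror the attributes in the text
    # but the schema is illustrative, not Regent's actual format.
    subject: str
    relation: str
    obj: str
    confidence: float        # 0.0 to 1.0
    recency: float           # higher = more recent
    emotional_weight: float
    category: str

node = KnowledgeNode("aspirin", "interacts_with", "warfarin",
                     confidence=0.92, recency=0.8,
                     emotional_weight=0.1, category="drug_interaction")
```

The point is that a node like this is consumed directly, with no flattening into prompt text.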


Behavioral Conditioning

7 numbers injected at multiple layers during generation: mood, influence, truth bias, civility, good/evil, curiosity, self-preservation. Shapes the entire response. Not a prompt prefix.
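A minimal sketch of building that 7-number conditioning vector. The dimension names come from the text; the value range and the clamping are assumptions for illustration.

```python
BEHAVIOR_DIMS = ("mood", "influence", "truth_bias", "civility",
                 "good_evil", "curiosity", "self_preservation")

def behavior_vector(**dials):
    # Build the 7-number conditioning vector. Unnamed dials default to
    # 0.0; the [-1, 1] range is an assumption, not Regent's actual spec.
    return [max(-1.0, min(1.0, float(dials.get(d, 0.0))))
            for d in BEHAVIOR_DIMS]

vec = behavior_vector(truth_bias=1.0, civility=0.8, curiosity=2.0)
```

Because the vector is injected at multiple layers rather than prepended as text, it holds steady over the whole response.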

Eight things this model does that others do not

No production language model ships with all eight. Most ship with none.

01

It scores its own accuracy while it writes

Two output channels, one pass. One produces the next word. The other scores it 0 to 1: confident or guessing. Other systems check after the fact by re-running the model or sending output to a second one.
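The two-heads-one-pass idea can be shown with a toy projection. This is a pure-Python sketch under stated assumptions: both heads read the same hidden state, the generation head produces logits, and the verification head produces a sigmoid score; the real heads are learned layers with very different shapes.

```python
import math

def two_heads(hidden, vocab_w, verify_w):
    # One shared hidden state, two projections: token logits from the
    # generation head, a sigmoid 0-1 score from the verification head.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in vocab_w]
    z = sum(h * w for h, w in zip(hidden, verify_w))
    score = 1.0 / (1.0 + math.exp(-z))
    return logits, score

logits, score = two_heads([1.0, 0.0],
                          [[2.0, 0.0], [0.0, 3.0]],
                          [0.0, 0.0])
```

Both outputs come from one forward pass over the same state, which is why the score costs almost nothing extra.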

02

It changes behavior when confidence drops

Above 0.6: writes normally. Between 0.3 and 0.6: slows, picks conservative words. Below 0.3: stops, retrieves facts, tries again from the uncertain point. Other models write at the same pace whether right or wrong.

03

It reads structured knowledge directly

Typed nodes as input: facts, relationships, memories, constraints, each with confidence and category. No text conversion. Other systems flatten knowledge into prompt text and re-inject it on every call.

04

Behavioral state is a dial, not a paragraph

7 numbers alongside the conversation: mood, influence, truth bias, civility, good/evil, curiosity, self-preservation. Injected at multiple layers. Shapes the entire response. Other models use a prompt persona that dilutes as the response grows.

05

Constant memory regardless of session length

Other models accumulate tens of gigabytes over a multi-hour session. Regent compresses past context into a fixed 1 GB state. It does not grow. A session running for weeks uses the same memory it used in the first second.

06

Accuracy scoring adds 0.1% to the cost

Reads state that already exists. No extra passes. Most safety systems run the model 5 to 20 times for a reliability signal. This one runs once.

07

It thinks before it answers

When the question requires reasoning, the model works through it internally first, then responds. The reasoning is visible to the caller. Built into the generation loop with dedicated tokens, not a prompt engineering trick.

08

It calls tools natively

When the model needs external information, it emits a structured tool request, pauses, receives the result, and continues. APIs, databases, search. Built into the architecture with dedicated tokens, not a plugin layer.
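The emit-pause-resume loop above can be sketched as follows. Everything here is hypothetical scaffolding: `generate` stands in for the model, the request/result dictionary shapes are illustrative, and the `<tool_result>` marker is an assumption, not Regent's actual token format.

```python
import json

def run_with_tools(generate, tools, prompt):
    # Emit-pause-resume loop: run the model, execute any structured
    # tool request it emits, feed the result back, repeat until done.
    transcript = prompt
    while True:
        out = generate(transcript)
        if out.get("type") == "tool_call":
            result = tools[out["name"]](**out["arguments"])
            transcript += f"\n<tool_result>{json.dumps(result)}</tool_result>"
            continue
        return out["text"]

# Toy demo: a stand-in model that requests the weather once, then answers.
calls = {"n": 0}
def fake_generate(transcript):
    if calls["n"] == 0:
        calls["n"] += 1
        return {"type": "tool_call", "name": "weather",
                "arguments": {"city": "Lagos"}}
    return {"type": "final", "text": "It is 31C in Lagos."}

answer = run_with_tools(fake_generate,
                        {"weather": lambda city: {"temp_c": 31}},
                        "What's the weather?")
```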

Built-in grounding verification

The verification head reads the same internal state as the generation head and outputs an accuracy score per word. One pass. No separate model, no multiple samples, no post-hoc filtering.

FLOW (score > 0.6)

Model is confident and grounded. Writes normally at the configured temperature and sampling settings. Standard generation for topics the model knows well.

CAUTION (0.3 to 0.6)

Model is uncertain. Temperature drops, output shifts toward conservative and hedged language. The response signals its own uncertainty rather than asserting it.

HALT (score < 0.3)

Model is fabricating. Generation stops. The system retrieves relevant knowledge, re-injects it as context, and re-generates from the point where confidence broke down.
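The three zones reduce to a simple threshold map. The thresholds below are the ones stated above; treating the 0.6 boundary as CAUTION is an assumption about edge handling.

```python
def zone(score):
    # Map a per-token accuracy score to a decoding zone:
    # FLOW above 0.6, CAUTION from 0.3 to 0.6, HALT below 0.3.
    if score > 0.6:
        return "FLOW"      # generate normally
    if score >= 0.3:
        return "CAUTION"   # lower temperature, hedge
    return "HALT"          # stop, retrieve, regenerate
```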

Per-word, not per-response

Most checks score an entire response. This scores every word, catching fabricated claims mid-sentence and identifying the exact span.

No extra inference cost

Sampling verification runs the model 5 to 20 times. This reads state that already exists. 0.1% of parameters, zero extra passes.

Grounded in the knowledge base

Retrieval augmentation gives the model documents but does not stop it from ignoring them. The verification head is trained to flag claims not grounded in the knowledge base.

Against the flagships

Compared on what Regent is built for, not on general benchmarks. The models listed are the latest publicly available frontier models from each lab as of 2026.

Property | Regent | GPT-5 | Claude Opus 4.6 | Gemini 2.5 | Llama 4 | DeepSeek V3
Non-transformer architecture | Mamba-2 SSM | Transformer | Transformer | Transformer | Transformer | MoE Transformer
Thinks before answering | Yes | Yes | Yes | Yes | No | Yes
Native tool calling | Yes | Yes | Yes | Yes | Yes | Yes
Per-word accuracy score | Yes | No | No | No | No | No
Stops and self-corrects mid-generation | Yes | Post-hoc only | Post-hoc only | Post-hoc only | Post-hoc only | Post-hoc only
Fixed memory at any session length | Yes (~1 GB fixed) | Grows with session | Grows with session | Grows with session | Grows with session | Grows with session
Native structured knowledge input | Yes | Text only | Text only | Text only | Text only | Text only
Self-hosted, no API required | Yes | API only | API only | API only | Open weights | Open weights
Air-gap deployable | Yes | No | No | No | Yes | Yes
Self-hosted, single server | 7B on 16 GB GPU | API only | API only | API only | 8B at ~5 GB | Too large
Multi-hour session runtime | Hours to days | Context window only | Context window only | Context window only | Context window only | Context window only
Auditable per-claim accuracy trace | Yes | Reasoning trace only | Reasoning trace only | Reasoning trace only | No | No
HuggingFace compatible | Yes | No | No | No | Yes | Yes
Not a general capability ranking. On language quality, instruction following, math, and coding, the frontier transformers lead. Regent is a different architecture built for different properties: accuracy-scored, long-running, self-hosted workloads where wrong answers have a measurable cost.

Memory at scale: the operational difference

Standard models accumulate memory with every word processed. Regent's memory is fixed. For workloads measured in hours, not turns, this is the difference between running and not running.

Model | Architecture | Memory at 10K words | Memory at 100K words | Memory at 1M words
Regent 7B | Hybrid recurrent + attention | ~1 GB | ~1 GB | ~1 GB
GPT-5 | Transformer | API managed | API managed | Exceeds context
Claude Opus 4.6 | Transformer | API managed | API managed | Exceeds context
Llama 4 70B | Transformer | ~2.5 GB | ~25 GB | Will not fit
DeepSeek V3 | Mixture-of-experts | ~1.5 GB | ~15 GB | Will not fit
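The difference in the table comes from simple arithmetic: a transformer KV cache grows linearly with tokens, a recurrent state does not. The sketch below uses an illustrative 70B-class shape (layer count, head count, and head dimension are assumptions, not measured figures for any listed model).

```python
def kv_cache_gb(tokens, n_layers=80, n_kv_heads=8, head_dim=128,
                bytes_per_value=2):
    # Approximate transformer KV-cache size in GB: keys AND values
    # (the trailing factor 2) stored per layer per token, fp16.
    # The default shape is illustrative of a 70B-class model.
    return tokens * n_layers * n_kv_heads * head_dim * 2 * bytes_per_value / 1e9

def fixed_state_gb(tokens):
    # A recurrent state is a constant-size buffer, whatever the length.
    return 1.0

growth = kv_cache_gb(1_000_000) / kv_cache_gb(10_000)  # linear: 100x
```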

7B to 1T parameters

Same backbone across every checkpoint. Open source up to 50B. Commercial above.

7B
Regent | Open Source
License: Open
Weights (INT4): ~4 GB + 1 GB state
Hardware: Single GPU (16 GB+)
Target: Self-hosted / On-premise
13B to 50B
Regent | Open Source
License: Open
Memory (INT4): 8 to 30 GB
Hardware: Server, workstation, cloud
Target: Enterprise on-premise

Three ways to deploy

Export from the Model Studio interface or the command line. One click to HuggingFace. One click to Docker. Runs on NVIDIA CUDA and Apple Silicon.


HuggingFace

Export to a HuggingFace package. Load with two lines. Push to Hub from the interface. Works with existing fine-tuning, quantization, and deployment tooling.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "path/to/export",
    trust_remote_code=True
)

Docker / Self-Hosted

Self-contained Docker package with FastAPI server. Runs on any machine with Docker. No Python setup needed.

CUDA + Apple Silicon

Native on NVIDIA GPUs and Apple Silicon. Optional Triton kernel on CUDA for long-sequence throughput. CPU fallback on any machine.


Model Studio Interface

Browser interface for the full lifecycle: scrape data, train, monitor, chat with live accuracy scores, manage the knowledge graph, export. No command line needed.


OpenAI-Compatible API

Drop-in replacement at /v1/chat/completions. Any SDK, framework, or application built for OpenAI works by changing the base URL. No code changes needed.
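An offline sketch of the wire format involved: the request body any OpenAI SDK POSTs to /v1/chat/completions, and how a response in the same schema is read back. The model name is illustrative; for a live client the only change is pointing the base URL at the local server.

```python
import json

# Request body in the standard OpenAI chat-completions schema.
# "regent-7b" is an illustrative model name, not a confirmed identifier.
request_body = json.dumps({
    "model": "regent-7b",
    "messages": [{"role": "user", "content": "Hello, Regent!"}],
    "max_tokens": 200,
})

# A response in the same schema, read back the standard way.
sample_response = {
    "choices": [{"message": {"role": "assistant",
                             "content": "Hello! How can I help?"}}]
}
reply = sample_response["choices"][0]["message"]["content"]
```

Because both sides follow the OpenAI schema, existing SDKs and frameworks need no code changes beyond the base URL.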


Weight Formats

SafeTensors format, readable by PyTorch and MLX. Supports float32, float16, and bfloat16. Same file loads on both platforms.

Up and running in three steps

Clone, install, and start. The Model Studio interface opens in your browser and handles everything from there.

Step 1 — Clone and install

# Apple Silicon
git clone https://github.com/Alchymia-AI/Regent
cd regent-model
python3.12 -m venv .venv && source .venv/bin/activate
pip install mlx pyyaml sentencepiece numpy fastapi uvicorn pydantic

# NVIDIA GPU
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install pyyaml sentencepiece numpy fastapi uvicorn pydantic safetensors transformers

Step 2 — Start the interface

# Starts Model Studio UI + API server
./start.sh
# Opens at http://localhost:3000
# API at http://localhost:8400

# With a specific model config and weights
./start.sh --config configs/regent_7b.yaml \
  --model checkpoints/alignment/regent.safetensors

Step 3 — Or load from HuggingFace

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "alchymia-ai/regent-7b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(
    "alchymia-ai/regent-7b",
    trust_remote_code=True
)
out = model.generate(
    **tok("Hello, Regent!", return_tensors="pt").to(model.device),
    max_new_tokens=200
)
No weights yet? Run with --synthetic to validate on random data. All four phases run end-to-end. Swap in a real corpus when ready.

Where it fits

Workloads where constant memory, per-word accuracy scoring, structured knowledge input, or self-hosting matter more than raw benchmark rankings.

Legal Research and Drafting

Strong fit

Every claim needs to be traceable. The accuracy score flags uncertain spans before the document is finished. Case law and statute are graph-shaped; the knowledge encoder ingests them directly.

Healthcare and Clinical Support

Strong fit

Long sessions, fixed memory, clinical consequences for errors. UMLS, SNOMED, and ICD are structured ontologies the knowledge encoder reads natively. The accuracy score meets the auditability bar clinical workflows require.

Robotics and Drones

Strong fit

8-hour shifts and 6-hour missions need fixed memory. The state never grows regardless of duration. Accuracy score gates actions before execution.

Government and Defense

Strong fit

Air-gap is a hard requirement. Ships as weights, runs on local hardware, no external connectivity. Accuracy score produces auditable records per claim.

Financial Research and Risk

Strong fit

Long sessions over structured data. Every claim traceable before it influences a decision. The accuracy score is the audit trail. Self-hosting removes data sovereignty concerns.

Persistent AI Agents

Strong fit

Constant memory means no session limit. Knowledge graph input persists context without growing prompts. Behavioral state stays consistent across the full session.

Pharmaceutical and Drug Development

Strong fit

Molecular databases, drug interaction graphs, and trial data are graph-shaped. Regulatory submissions require every claim to be auditable. Trial data often cannot leave a jurisdiction. Wrong output has patient consequences.

Nuclear and Critical Infrastructure

Strong fit

Air-gap is mandatory. 8 to 12 hour shifts run without memory degradation. The accuracy score flags uncertain operational recommendations before an operator acts. No cloud vendor has clearance to run inside a nuclear facility.

Maritime and Offshore

Strong fit

Ships at sea have no reliable internet for weeks. Engine monitoring, navigation, and cargo management need a model that runs offline for the full voyage. Long sessions, fixed memory, no connectivity required.

Mining and Extraction

Strong fit

Remote sites, zero connectivity, safety-critical decisions. Long operational cycles. The accuracy score gates decisions before they become incidents. No ongoing infrastructure required after deployment.

Insurance and Claims

Strong fit

Every decision needs a traceable justification for regulatory review. Structured policy, precedent, and actuarial knowledge fits the graph encoder. The accuracy score is the audit trail regulators require.

Compliance and Regulatory

Strong fit

Pharmaceutical, financial, and environmental compliance all require AI output auditable at the claim level. Regulatory frameworks are graph-shaped. General models produce plausible compliance text with no verifiability.

Emergency Services

Strong fit

Works offline when infrastructure is down. Medical triage, search and rescue, resource allocation. A model requiring connectivity is unavailable exactly when needed most. Behavioral conditioning keeps output calibrated for high-stakes decisions.

Audit and Financial Forensics

Strong fit

Every figure needs a source. Long document review over structured financial data. The accuracy score identifies which claims to scrutinize before the report is signed. Self-hosting removes the conflict of sending client data to a third-party API.

Agriculture and Precision Farming

Strong fit

Unreliable connectivity, thin margins, structured knowledge of crop disease and soil. Zero marginal cost after deployment is the only viable model at this scale. Self-hosted deployment is the only distribution path that reaches this market.

Emerging Markets

Strong fit

One license, local hardware, zero marginal cost after deployment. Works without internet. Healthcare, legal, agricultural, and financial organizations across Africa, South Asia, Southeast Asia, and Latin America.

Code Generation

Strong fit

Entire repository in context with no limit. Calls compilers, test runners, and linters mid-generation through native tool calling. Verification head scores confidence per line before you run anything. Multi-hour sessions without memory degradation.

General-Purpose Chat

Good fit

Fixed memory means conversations never truncate. Adaptive gate reduces cost on routine exchanges. Thinking and tool calling work natively. At 7B it will not match 70B+ frontier models on open-ended tasks, but Grande Regent at 70B+ is competitive.

Four-phase pipeline

Run all four phases from the Model Studio interface with one click, or from the command line with one command. Validated end-to-end on Apple Silicon and NVIDIA CUDA.

# Command line: full pipeline from scrape to alignment
PYTHONPATH=. python3 scripts/run_pipeline.py \
  --config configs/regent_370m.yaml \
  --scrape-config pipeline.yaml

# Or open the Model Studio UI and click Start
./start.sh   # opens at http://localhost:3000

Phase 1: Base

Language modeling on a general corpus. Full model trains. Learns grammar, world knowledge, and reasoning.

Phase 2: Identity

Fine-tuning on domain conversations. Knowledge encoder trains jointly. Learns output format and tone for the deployment context.

Phase 3: Verification

Accuracy head trains, backbone frozen. Trained on grounded, fabricated, contradicted, and entity-swapped pairs. 0.1% of parameters update.

Phase 4: Alignment

Preference learning against a frozen reference. Higher accuracy and better domain alignment are preferred. The model learns what good output means for its use case.

Alchymia Labs

Researching, building, and accelerating AI for developing economies.

The first production model from Africa

Regent is the first production language model to come out of Africa. Not a fine-tune. Not a wrapper. A ground-up architecture designed to be on par with the best models in the world at the workloads it is built for.

Alchymia Labs was founded by Ayomide I. Daniels. The team is in the diaspora. The work is global.

The 10x imperative

Developing economies do not need cheaper versions of Western AI. They need AI with different properties: offline, owned, auditable, affordable.

Getting there from where we stand requires a 10x magnitude of ingenuity. That is the operating requirement, and the core ethos of the people at Alchymia.

Open source commitment

7B to 50B is open source. Deployable, modifiable, free to build on. The organizations with the most to gain from AI are often the least able to pay for it.

Grande Regent at 70B to 1T is commercial. That revenue funds the open tier.

Distributed Shared Training Protocol

Training frontier-scale models currently requires concentrating tens of thousands of GPUs in one facility and a $50M to $200M check to a single cloud provider.

Alchymia is developing DSTP, a protocol to train 1 to 2 trillion parameter models by pooling compute across universities, national labs, government centers, private organizations, and individuals. Frontier-scale AI should not require a single nine-figure infrastructure investment.

Who this is for

Three billion people stand to gain the most from AI in healthcare, legal access, agriculture, finance, and education. None of the major labs build for them primarily.

Regent ships as infrastructure, not a subscription. It is the first model released under that mandate.

Get in touch

Open Source

Regent

Architecture, code, weights, and training pipeline are on GitHub. Issues, questions, and contributions welcome.

  • Bug reports and feature requests via GitHub Issues
  • Architecture and training questions via Discussions
  • Research collaboration: research@alchymia.ai
GitHub
Commercial

Grande Regent

Frontier-scale checkpoints, enterprise tooling, and SLA support. Distributed through Alchymia Groom.

  • 70B to 1T checkpoints
  • Production verification tooling
  • On-premise and air-gap deployment
  • Enterprise SLA and dedicated support
  • Custom fine-tuning and deployment services
Request Access
Press and media inquiries: research@alchymia.ai