Grimvane AI

Intelligence, engineered.

Inference engines, autonomous agents, knowledge retrieval, coding tools, and the infrastructure underneath them. Built from scratch. Local-first.

What we build

AI that works for you.

Engine

Crucible

From-scratch LLM inference engine. Direct GGUF parsing, full transformer forward pass, GPU compute via CUDA. No third-party wrappers, no framework dependencies.

Inference Engine CUDA From Scratch

Product

Shoal

Multi-dock AI coding platform. Connects local inference, frontier APIs, and CLI tools through a unified architecture. No telemetry, no cloud requirement.

Code Assistant Multi-dock Local-first

Product

Cairn

Local knowledge retrieval. Feed documents and codebases, ask questions, get answers grounded in your data with source attribution. Fully offline RAG.

RAG Semantic Search Local
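To make "grounded in your data with source attribution" concrete, here is a minimal retrieval sketch. It assumes chunks have already been embedded by some local embedding model (not shown), ranks them by cosine similarity, and numbers each hit with its source so the generated answer can cite it. `Chunk`, `top_k`, and `build_prompt` are hypothetical names for illustration, not Cairn's API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Chunk:
    text: str
    source: str  # where the chunk came from, e.g. "docs/setup.md:10-24"


def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks most similar to the query as (chunk, cosine score) pairs."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity, since both sides are unit length
    order = np.argsort(-scores, kind="stable")[:k]
    return [(chunks[i], float(scores[i])) for i in order]


def build_prompt(question, hits):
    """Number each retrieved chunk and keep its source so the answer can cite it."""
    context = "\n".join(
        f"[{n}] ({chunk.source}) {chunk.text}" for n, (chunk, _) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources below, citing them like [1].\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Keeping the source string attached to every chunk from indexing through prompting is what lets an answer point back to a specific file and line range.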

Product

Hearth

Self-hosted conversational AI. Backend-agnostic chat interface with model switching, session persistence, and automatic hardware detection. Your own ChatGPT, on your machine.

Chat Interface Self-hosted Multi-backend

Library Suite

Flower Garden

Six zero-dependency libraries for AI infrastructure. Full-text search, vector storage, graph databases, tiered caching, encrypted storage, and structured data. Pluggable backends throughout.

Aster Camellia Dahlia Lotus Thistle Wisteria

Engine

Built from scratch.

Crucible implements a full transformer forward pass with direct GGUF parsing, RoPE embeddings, grouped-query attention, and GPU compute. No wrappers. No framework dependencies. Every layer is ours.
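A minimal NumPy sketch of two of the pieces named above, RoPE and grouped-query attention, for a single decode step. It is an illustration under stated assumptions, not Crucible's code: it uses the adjacent-pair RoPE layout with base 10000 (some model exports rotate the two halves of each head instead, and the pairing must match how the weights were exported), and a KV cache shaped (n_kv_heads, seq_len, head_dim). The function names are hypothetical.

```python
import numpy as np


def rope(x, pos, base=10000.0):
    """Rotary position embedding for one position.

    x has shape (..., head_dim). Adjacent pairs (x[2i], x[2i+1]) are rotated
    by the angle pos * base**(-2i / head_dim).
    """
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[-1]
    angle = pos * base ** (-np.arange(0, d, 2) / d)  # shape (d/2,)
    cos, sin = np.cos(angle), np.sin(angle)
    x0, x1 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x0 * cos - x1 * sin
    out[..., 1::2] = x0 * sin + x1 * cos
    return out


def gqa_attention(q, k_cache, v_cache):
    """Grouped-query attention for a single decode step.

    q: (n_heads, head_dim); k_cache, v_cache: (n_kv_heads, seq_len, head_dim).
    Query head h reads KV head h // (n_heads // n_kv_heads). The cache holds
    only past and current positions, so no explicit causal mask is needed.
    """
    n_heads, d = q.shape
    group = n_heads // k_cache.shape[0]
    out = np.empty_like(q)
    for h in range(n_heads):
        kv = h // group
        scores = k_cache[kv] @ q[h] / np.sqrt(d)  # (seq_len,)
        w = np.exp(scores - scores.max())           # numerically stable softmax
        w /= w.sum()
        out[h] = w @ v_cache[kv]                    # weighted sum of cached values
    return out
```

In a full forward pass the key is rotated with `rope` before it is written to the cache, so cached keys never need re-rotating; sharing each KV head across a group of query heads is what shrinks the cache relative to standard multi-head attention.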

Direct GGUF Parsing

Memory-mapped tensor access with dequantization for F16, Q4_0, Q4_K, Q6_K, and Q8_0.
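For a sense of what direct GGUF parsing involves, the sketch below reads the fixed GGUF header and dequantizes the two simplest block formats. It assumes GGUF version 2 or 3 (a 24-byte header: magic, uint32 version, uint64 tensor count, uint64 metadata count) and the ggml block layouts for Q8_0 (an fp16 scale plus 32 int8 values) and Q4_0 (an fp16 scale plus 16 bytes of 4-bit values offset by 8). Metadata parsing, tensor offsets and alignment, and the K-quant super-block formats (Q4_K, Q6_K) are omitted. The function names are hypothetical.

```python
import struct

import numpy as np

QK = 32  # elements per block for Q4_0 and Q8_0

# Block layouts: one little-endian fp16 scale followed by the packed quants.
Q8_0 = np.dtype([("d", "<f2"), ("qs", "i1", (QK,))])        # 34 bytes per block
Q4_0 = np.dtype([("d", "<f2"), ("qs", "u1", (QK // 2,))])   # 18 bytes per block


def read_gguf_header(buf):
    """Parse the fixed-size GGUF header (versions 2 and 3): magic, version, counts."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}


def dequant_q8_0(raw, n):
    """Each block: value = d * q, with q a signed 8-bit integer."""
    b = np.frombuffer(raw, dtype=Q8_0, count=n // QK)
    return (b["d"].astype(np.float32)[:, None] * b["qs"]).reshape(-1)


def dequant_q4_0(raw, n):
    """Each byte holds two 4-bit quants: the low nibble is element j and the
    high nibble is element j + 16 of the block. Stored values are offset by 8."""
    b = np.frombuffer(raw, dtype=Q4_0, count=n // QK)
    lo = (b["qs"] & 0x0F).astype(np.int8) - 8
    hi = (b["qs"] >> 4).astype(np.int8) - 8
    q = np.concatenate([lo, hi], axis=1)  # (n_blocks, 32)
    return (b["d"].astype(np.float32)[:, None] * q).reshape(-1)
```

Because `struct.unpack_from` and `np.frombuffer` accept any buffer, both can be pointed at an `mmap.mmap` object (or a `memoryview` slice of one), so tensor bytes come straight from the mapped file rather than being copied up front.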

Full Forward Pass

RMSNorm, RoPE, grouped-query attention, SwiGLU FFN. The entire transformer stack, implemented from first principles.
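The remaining pieces of the stack are compact enough to write out directly. This NumPy sketch shows RMSNorm, a SwiGLU feed-forward layer, and how they sit in a pre-norm residual block around attention. The weight names, shapes, and the default `eps=1e-5` are assumptions for illustration (models carry their own epsilon in metadata), and `decoder_block` is a hypothetical helper, not Crucible's code.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-5):
    """Divide by the root-mean-square over the last axis, then apply a learned gain."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight


def silu(x):
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: (silu(x @ w_gate) * (x @ w_up)) @ w_down.

    w_gate, w_up: (d_model, d_ff); w_down: (d_ff, d_model).
    """
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def decoder_block(x, attn, p):
    """Pre-norm residual layout: normalize, transform, add back to the stream."""
    h = x + attn(rms_norm(x, p["attn_norm"]))
    return h + swiglu_ffn(rms_norm(h, p["ffn_norm"]), p["w_gate"], p["w_up"], p["w_down"])
```

Unlike LayerNorm, RMSNorm skips mean subtraction and bias, which keeps the kernel to one reduction per row.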

GPU Compute

CUDA acceleration via CuPy, with streaming token output and nucleus (top-p) sampling.
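Nucleus sampling keeps only the smallest set of most-likely tokens whose cumulative probability reaches `p`, renormalizes, and samples from that set. The sketch below runs on NumPy rather than CuPy for portability and pairs the sampler with a generator so tokens can be printed as they are produced. `sample_top_p`, `stream_tokens`, and the `next_logits` callback (standing in for a model forward pass) are hypothetical names; the defaults `p=0.9` and `temperature=0.8` are illustrative.

```python
import numpy as np


def sample_top_p(logits, p=0.9, temperature=0.8, rng=None):
    """Sample from the smallest set of tokens whose probability mass reaches p."""
    if temperature <= 0:
        return int(np.argmax(logits))  # greedy decoding
    rng = rng if rng is not None else np.random.default_rng()
    z = (logits - np.max(logits)) / temperature
    probs = np.exp(z)
    probs /= probs.sum()
    order = np.argsort(-probs, kind="stable")   # most likely first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1   # include the token that crosses p
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))


def stream_tokens(next_logits, ids, max_new_tokens, eos_id, **sampling):
    """Yield tokens one at a time so a UI can print them as they arrive."""
    for _ in range(max_new_tokens):
        tok = sample_top_p(next_logits(ids), **sampling)
        if tok == eos_id:
            break
        ids.append(tok)
        yield tok
```

Truncating to the nucleus, rather than to a fixed top-k, lets the candidate set shrink when the model is confident and widen when it is not.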

Privacy

Private by default.

Every product in the Grimvane AI ecosystem runs locally. No API keys required, no cloud round-trips, no telemetry. Your data never leaves your machine unless you choose otherwise.

Zero Telemetry

No usage tracking, no analytics callbacks, no phone-home behavior. Period.

Hardware Detection

Automatic platform and GPU detection: Metal, CUDA, or ROCm when available, with CPU fallback.
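One plausible shape for that detection is a priority-ordered probe, sketched below. The checks are assumptions, not Grimvane's actual logic: Apple silicon is taken to mean Metal, and the presence of `nvidia-smi` or `rocminfo`/`rocm-smi` on `PATH` is treated as a hint of CUDA or ROCm. `detect_backend` is a hypothetical name, and its probes are injectable so it can be tested without the hardware.

```python
import platform
import shutil
from typing import Callable, Optional


def detect_backend(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Pick a compute backend in priority order: Metal, CUDA, ROCm, then CPU.

    These are heuristics: finding a vendor CLI on PATH suggests, but does
    not prove, that a usable driver and device are present.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "metal"  # Apple silicon
    if which("nvidia-smi"):
        return "cuda"
    if which("rocminfo") or which("rocm-smi"):
        return "rocm"
    return "cpu"
```

A sturdier check would try to initialize the runtime itself, for example calling `cupy.cuda.runtime.getDeviceCount()` inside a `try` block, and fall through to the next backend on failure.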

Offline-capable

Cairn, Hearth, and Crucible run entirely offline once models are downloaded.

Architecture

Composable by design.

Flower Garden provides six standalone libraries for search, vectors, graphs, caching, encryption, and structured data. Each one is zero-dependency at its core, with pluggable storage backends.

Pluggable Backends

In-memory, SQLite, and PostgreSQL backends for every library. Swap without changing application code.
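Swapping backends without touching application code usually rests on a small shared interface. This sketch uses Python's `typing.Protocol` to define a hypothetical key-value interface with in-memory and SQLite implementations; a PostgreSQL backend would implement the same two methods but needs a driver such as psycopg, so it is left out. `KVBackend`, `MemoryBackend`, `SQLiteBackend`, and `Cache` are illustrative names, not the Flower Garden libraries' APIs.

```python
import sqlite3
from typing import Callable, Optional, Protocol


class KVBackend(Protocol):
    """The only surface application code depends on."""

    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...


class MemoryBackend:
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class SQLiteBackend:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    def put(self, key: str, value: bytes) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )


class Cache:
    """Application code: written once against KVBackend, unaware of storage."""

    def __init__(self, backend: KVBackend) -> None:
        self.backend = backend

    def get_or_compute(self, key: str, compute: Callable[[], bytes]) -> bytes:
        hit = self.backend.get(key)
        if hit is not None:
            return hit
        value = compute()
        self.backend.put(key, value)
        return value
```

Changing `Cache(MemoryBackend())` to `Cache(SQLiteBackend("cache.db"))` is the only edit the application sees.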

Domain-agnostic

Built for AI infrastructure but designed to work anywhere. No opinions on your architecture.

Interoperable

Cairn uses Camellia for vectors. Corvath uses pluggable memory. The pieces compose naturally.

Capabilities

What we do.

Core competencies across the AI stack, from low-level model work to high-level agent orchestration.

01

From-Scratch Inference

Direct GGUF parsing, full transformer forward pass, and GPU-accelerated token generation with no framework dependencies.

02

Autonomous Agents

Multi-step reasoning with task classification, tool orchestration, role-based security, and automatic rollback.

03

Knowledge Retrieval

Document and codebase indexing with vector search, source attribution, and fully offline question answering.

04

Code Generation

Multi-dock architecture connecting local inference, frontier APIs, and CLI tools through a unified coding platform.

05

Conversational AI

Backend-agnostic chat with model switching, session persistence, and automatic hardware detection.

06

AI Infrastructure

Full-text search, vector storage, graph databases, tiered caching, encrypted storage, and structured data libraries.

Get started

Ready to build with AI?

Whether you need an inference engine, an autonomous agent, or local AI infrastructure, we'd like to hear about it.