KBForge

Your Docs → AI Support Agent. In Hours.

White-label AI support agents for software companies.
Turnkey SaaS. Multi-LLM. Production-proven.

Pre-Seed Round | March 2026 | Confidential

The Problem

Software companies drown in support tickets they've already answered.

60%

of support tickets are repetitive, answerable from existing docs

$15–$40

average cost of a single human support interaction

Months

to build a custom AI support system with ML engineers

SMBs can't afford dedicated support teams or AI engineers. They have the documentation but no way to turn it into an intelligent, always-on assistant.

The Solution

KBForge: Upload docs, get a branded AI support agent — live in hours.

A turnkey B2B SaaS platform that converts any documentation set into a production-grade AI chat agent — white-labeled under your customer's brand, with zero ML expertise required.

📄

Any Docs Format

MD, PDF, Wiki exports

🎨

Full White-Label

12 branding fields + CSS

🧠

Multi-LLM

Grok, Claude, GPT-4o

Hours, Not Months

Proven 4-hr deploy

How It Works

From docs to live AI agent in 4 steps

1

Upload Docs

Customer provides Markdown, PDFs, or wiki exports. KBForge chunks, structures, and builds a FAISS vector index automatically.
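
For illustration only, the ingestion path reduces to: split docs into overlapping chunks, embed each chunk, and rank chunks against a query. In this sketch, a toy bag-of-words embedding and brute-force cosine similarity stand in for the real embedding model and FAISS index:

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Split a document into overlapping word-window chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size // 2)]

def embed(text):
    """Toy bag-of-words vector; production systems use learned embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Build the "index": one vector per chunk (sample doc text, no punctuation).
docs = ("to reset your password open Settings choose Account and click Reset Password "
        "billing invoices are emailed monthly and downloaded from the Billing tab")
index = [(c, embed(c)) for c in chunk(docs)]

def retrieve(query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)[:k]
```

The real pipeline swaps the toy pieces for an embedding model and a FAISS index, but the shape is the same: chunk once at ingest, rank at query time.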

2

Configure Brand

Name, logo, colors, custom domain, system prompt. 12 white-label config fields — their users see their brand, not ours.

3

Pick AI Model

Grok-mini (cheapest), Claude Sonnet (smartest), or GPT-4o. Customers can switch models at any time.

4

Go Live

Isolated Docker container, auto-SSL via Caddy, streaming chat, user auth, admin analytics, feedback system — all included.
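
A per-tenant deployment of this shape might look roughly as follows (container image, ports, paths, and domain are all illustrative, not KBForge's actual configuration):

```shell
# Illustrative per-tenant launch: one isolated container per customer,
# fronted by Caddy for automatic HTTPS. All names here are hypothetical.

# The env file holds the tenant's API keys and branding; the mounted volume
# keeps its KB, database, and vector index isolated from other tenants.
docker run -d \
  --name tenant-acme \
  --env-file ./tenants/acme.env \
  -v "$PWD/tenants/acme/data:/app/data" \
  -p 127.0.0.1:8081:8000 \
  kbforge/agent:latest

# Caddyfile entry -- Caddy provisions and renews the TLS certificate itself:
#
#   acme-support.example.com {
#       reverse_proxy 127.0.0.1:8081
#   }
```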

🔥

Proof point: The SEM Textbook Agent was deployed from zero to production in ~4 hours using a Copilot agent session — different domain, fully white-labeled.

Market Opportunity

The AI customer support market is exploding

Total Addressable Market

$__B

Global AI customer support & knowledge mgmt

Serviceable Addressable

$__B

SMB software companies with documentation

Serviceable Obtainable

$__M

Realistic year-3 capture

Why now?

LLM costs dropped 90%+ in 18 months — now viable for SMB SaaS

RAG architecture is mature — FAISS + embeddings deliver accurate, grounded answers

Every software company has docs but can't afford ML engineers

Support costs rising while customer expectations for instant answers grow

Business Model

SaaS subscriptions + usage overages + services

STARTER

$149/mo

Solo devs, small OSS

  • 50 users • 2K queries/mo
  • 1 KB • Grok-mini
  • Overage: $0.05/query

POPULAR

PRO

$399/mo

SMB software companies

  • 500 users • 10K queries/mo
  • 3 KBs • Custom domain
  • Overage: $0.03/query

ENTERPRISE

$999+/mo

Compliance & scale needs

  • Unlimited users • 50K queries
  • All models • SSO/SAML
  • Overage: $0.02/query

Professional Services (KB Generation)

  • Light assist: $1,500–$3,000 — clean & structure messy docs
  • Full extraction: $5,000–$15,000 — KB authoring from source code
  • Ongoing maintenance: $500–$1,500/mo retainer

Unit Economics

  • LLM cost: $0.01–$0.05/query → 65–80% gross margin
  • Infra cost: ~$6–$20/mo per tenant (shared VPS)
  • Blended gross margin target: 70%+
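
A quick sanity check on these figures, using the Pro tier as an example (prices and cost ranges are the deck's own; the low-cost model routing and utilization levels are assumptions):

```python
# Gross-margin sanity check for the Pro tier. Assumes queries route to the
# cheapest model; at $0.05/query the margin only holds at partial utilization.
price = 399.00     # Pro tier, $/mo
included = 10_000  # monthly query quota
llm_cost = 0.01    # $/query, cheapest model tier
infra = 20.0       # $/mo per-tenant share of a VPS

for utilization in (0.5, 1.0):  # half vs full quota usage
    queries = included * utilization
    margin = (price - queries * llm_cost - infra) / price
    print(f"{utilization:.0%} utilization -> {margin:.0%} gross margin")
```

Under these assumptions the Pro tier lands in the deck's 65–80% band even at full quota usage.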

Traction

Production-proven tech with first paying customer

639

Tests passing

2

Domains deployed

$10K

First KB buildout signed

4 hrs

New-domain deploy time

CalcZAF Support Agent

Complex scientific software (electron microprobe analysis). 2 knowledge bases, multi-LLM support. Production-deployed serving real users.

Scientific SW • Production

Pilot Customer

Signed pilot: $10,000 KB buildout + $200/mo maintenance. Validates both the product offering and the professional services revenue stream.

Paying customer • Services + SaaS

Competitive Advantage

Why KBForge wins

1

Production-Proven Codebase

Not a prototype — 639 tests, multi-KB, streaming, auth, analytics, feedback. Ship on day one.

2

True White-Label Architecture

12 branding fields, custom domains, custom system prompts. Customers' users never see "KBForge."

3

Multi-LLM Flexibility

Not locked to one provider. Grok (cheapest), Claude (smartest), GPT-4o — swap in seconds.

4

Speed to Deploy

4-hour proof point (SEM Agent). Competitors take weeks or months of custom engineering.

5

Low Infrastructure Cost

~$100–$200/mo fixed cost. Shared VPS fits 4–8 tenants. 70%+ gross margins from day one.

6

Services + SaaS Flywheel

KB buildout services ($1.5K–$15K) land customers who convert to recurring SaaS revenue.

The Team

Built by people who ship

PHOTO

Founder Name

CEO / Founder

Bio — background, relevant experience, domain expertise

PHOTO

Name

Role — e.g., CTO

Bio — engineering background, relevant skills

PHOTO

Name

Role — e.g., Head of Sales

Bio — go-to-market experience

Hiring plan: 4–5 people in year one — engineering, sales, and customer success.

The Ask

Raising $500K Pre-Seed

Use of Funds (12 months runway)

Engineering (2–3 hires) 55%
Sales & Marketing 20%
Infrastructure & LLM Costs 10%
Legal, Insurance, Operations 10%
Reserve / Buffer 5%

12-Month Milestones

Q1

MVP launch + first 5 paying tenants

Self-serve onboarding, Stripe billing live

Q2

20 tenants • $__K MRR

Onboarding wizard, multi-KB support

Q3

50 tenants • $__K MRR

Enterprise tier, SSO/SAML, API export

Q4

Seed-ready • $__K MRR

Position for $2–3M seed round

hello@kbforge.com  •  kbforge.com

Appendix

Anticipated Q&A

Prepared answers for investor due-diligence and prospective customer objections.

5 sections | 35 questions answered

Investor Q&A

Market & Competition

Q: How is this different from Intercom Fin, Zendesk AI, or Ada?

Those are support-desk platforms that bolted on AI. We're the opposite: an AI-first agent that's fully white-labeled and deployed in hours, not months. No CRM lock-in, no per-seat pricing. Customers get a standalone, branded AI expert — not a chatbot tab inside someone else's ticketing system. We also support multiple LLM providers (Grok, Claude, GPT-4o) while incumbents are locked to one.

Q: Why not just embed a ChatGPT widget with your docs?

Raw ChatGPT has no knowledge of your documentation — it hallucinates. OpenAI's Assistants API requires engineering effort (chunking, embedding, prompt engineering, auth, analytics, feedback). KBForge does all of that out of the box: FAISS vector retrieval, tiered search, source citations, user auth, admin analytics, and white-label branding. It's the difference between a raw API and a finished product.

Q: What's your moat once OpenAI/Anthropic ship turnkey RAG?

LLM providers will always optimize for horizontal, generic use cases. Our moat is vertical depth: production-grade white-label (12 branding fields, custom domains, system prompts), multi-tenant isolation, per-tenant analytics, feedback loops, and the professional services flywheel that lands customers. We're also LLM-agnostic — if OpenAI ships RAG, we can use it as a backend while still owning the customer relationship and brand layer.

Q: Isn't this a feature, not a company?

A feature would be "add AI to your existing help desk." KBForge is a full platform: ingestion pipeline, vector indexing, multi-LLM orchestration, per-tenant containers, billing, usage metering, admin dashboards, user auth, feedback capture, and white-label deployment. That's 639 tests worth of product. The services business (KB buildout at $1.5K–$15K) creates a land-and-expand motion that a feature can't replicate.

Q: How big is the market for companies with docs but no support tool?

There are ~30 million small businesses globally, and millions of software companies specifically. Most SMB software companies rely on docs + email support. They can't afford Zendesk Enterprise or a support team, but they do have documentation. That's our sweet spot: the underserved middle between "DIY wiki" and "$50K/yr support platform."

Investor Q&A

Business Model & Unit Economics

Q: $149/mo Starter seems low — can you acquire customers profitably?

Starter is a land tier. At $149/mo with ~$6–$12/mo infra cost and ~$20–$40/mo LLM cost (2K queries), gross margin is ~65–80% depending on usage. But Starter customers who outgrow 50 users or 2K queries upgrade to Pro ($399/mo, 70%+ margin). The real economics are in Pro and Enterprise. Starter also drives word-of-mouth from OSS/indie developers — low-cost acquisition channel.

Q: What's your expected CAC and LTV?

Early CAC is near-zero (founder-led sales, content marketing, community). At scale, target CAC is $500–$1,500. Pro customers at $399/mo with 18-month average retention = ~$7,200 LTV. That's a 5–14x LTV:CAC ratio. Professional services ($1.5K–$15K KB buildouts) are CAC-negative — we get paid to acquire customers.
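
The arithmetic behind those figures (the ~$7,200 LTV quoted above is this product rounded):

```python
# LTV:CAC sanity check using the figures quoted in the answer above.
pro_price = 399          # $/mo
retention_months = 18    # average retention
cac_low, cac_high = 500, 1500  # target CAC range at scale

ltv = pro_price * retention_months  # ~$7,200 when rounded
print(f"LTV: ${ltv:,}")
print(f"LTV:CAC = {ltv / cac_high:.1f}x to {ltv / cac_low:.1f}x")
```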

Q: How much revenue from SaaS vs. services? Does services scale?

Year 1: likely 50/50 as KB buildouts land customers. Year 2+: target 80% SaaS / 20% services as self-serve onboarding matures. Services don't need to scale infinitely — they're a customer acquisition tool. Over time, we'll productize the ingestion pipeline so more customers self-serve, shifting the mix toward recurring revenue.

Q: What if LLM API costs go up?

LLM costs have dropped 90%+ in 18 months and the trend continues. But we're hedged: multi-LLM means we can route to the cheapest provider. Our overage pricing ($0.02–$0.05/query) gives 2–5x margin over LLM cost even at today's prices. If costs spike, we adjust pricing tiers — standard SaaS practice.

Q: Why wouldn't customers churn and self-host after you build their KB?

They could, just like any SaaS customer could rebuild in-house. But they'd need to maintain the RAG pipeline, manage LLM API keys, handle auth/security updates, monitor uptime, and update the KB as their docs evolve. Our $200–$1,500/mo is cheaper than an engineer's time. Plus, the ongoing KB maintenance retainer creates switching costs.

Investor Q&A

Traction, Risk & Team

Q: One pilot customer — how do you know this isn't consulting disguised as SaaS?

The $10K KB buildout is services, but it converts into a $200/mo SaaS subscription — that's the model working as designed. The pilot also validates willingness to pay for both components. We're building the self-serve pipeline in parallel so future customers can onboard without custom work.

Q: What does your pipeline look like beyond the one pilot?

We're pre-launch — the pilot is validation, not the pipeline. With funding, we'll launch the marketing site (already built), attend developer conferences, do content marketing around "AI support for docs," and leverage the CalcZAF + SEM case studies as proof points. Target: 5 paying tenants in Q1 post-funding.

Q: The codebase was built for one scientific tool — does it generalize?

We proved generalization with the SEM Textbook Agent — completely different domain, deployed in 4 hours. The architecture is domain-agnostic by design: 12 white-label config fields, pluggable knowledge bases, and a system prompt that adapts to any subject matter. The CalcZAF domain was the hardest test case (niche scientific software); consumer-facing docs are easier.

Q: Biggest technical risk at scale?

Container density — currently Docker Compose, which works up to ~50–100 tenants per node. Beyond that, we migrate to Kubernetes. The migration path is clear (containerized services, stateless app, external volume mounts), and K8s orchestration is a solved problem. We've budgeted this for the Q3 milestone.

Q: Why is this team the right one to build this?

We built the entire product — 639-test codebase, production deployment, multi-domain proof — before raising a dollar. That's execution speed most startups can't match. Domain expertise in RAG architecture, LLM orchestration, and the SMB support workflow means we're shipping product, not learning the space.

Q: How do you hire 4–5 people on $500K?

55% of funds ($275K) goes to engineering — enough for 2–3 hires at competitive startup salaries for 12 months. We're targeting strong mid-level engineers who want early equity + startup upside. Remote-first keeps costs lower than SF/NYC. The founder continues full-time product work, so we're effectively a 3–4 person engineering team from day one.

Client Q&A

Data Security & "Why Not ChatGPT?"

Q: How do we know our documentation data is secure?

Each tenant gets an isolated Docker container with its own file system, database, and API keys. No data is shared between tenants. Documentation is stored on our infrastructure (DigitalOcean/Hetzner, US or EU) and accessed only by your container. We never use your data to train models — it's processed through LLM APIs that also don't train on API inputs (OpenAI, Anthropic, and xAI all confirm this in their API terms).

Q: Do you store chat conversations? Can we delete them?

Conversations are stored in your tenant's isolated database for analytics and feedback review. You have full control — admin dashboard lets you view, export, or delete conversations. On the Enterprise plan, you can configure retention policies (auto-delete after N days). If you cancel, we archive and delete all data within 30 days (or immediately on request).

Q: Is our data used to train AI models?

No. We use LLM APIs (not training endpoints). OpenAI, Anthropic, and xAI all confirm that API inputs are not used for model training. Your documentation stays in your container, your conversations stay in your database. We will provide a DPA on request for Enterprise customers.

Q: SOC 2 / GDPR / HIPAA compliance?

GDPR: we're compliant by design (data isolation, deletion on request, EU hosting option). SOC 2: on our roadmap for Q3 — required for Enterprise tier. HIPAA: not currently supported, but the isolated-container architecture makes a BAA feasible for a future healthcare vertical. Enterprise customers on dedicated VPS can bring their own compliance requirements.

Q: What makes this different from just giving ChatGPT our docs?

ChatGPT doesn't know your docs — it can only guess based on training data. KBForge builds a FAISS vector index of your specific documentation and retrieves relevant sections before generating each answer. Every response includes source citations so users can verify. Plus: your branding, your domain, user authentication, admin analytics, feedback capture, and no risk of the AI discussing competitor products or off-topic content.

Q: How do you handle hallucination?

Three layers: (1) Tiered retrieval ensures the LLM only sees relevant documentation chunks, not the entire internet. (2) System prompts instruct the model to say "I don't have information about that" when retrieval confidence is low. (3) Source citations on every answer let users verify, and the feedback system lets them flag bad answers for admin review. This is fundamentally different from raw ChatGPT, which has no retrieval grounding.
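
At the decision level, the three layers reduce to a confidence gate in front of generation. A minimal sketch; the threshold, return shape, and injected search/generate functions are illustrative, not KBForge's implementation:

```python
FALLBACK = "I don't have information about that in my knowledge base."
MIN_SCORE = 0.35  # illustrative retrieval-confidence threshold

def answer(query, search, generate):
    """Gate generation on retrieval confidence; cite sources on success.

    `search` returns (chunks, top_score); `generate` is the LLM call.
    Both are injected so the sketch stays provider-agnostic.
    """
    chunks, top_score = search(query)
    if top_score < MIN_SCORE:          # layer 2: refuse rather than guess
        return {"text": FALLBACK, "sources": []}
    text = generate(query, chunks)     # layer 1: grounded in retrieved chunks
    sources = [c["source"] for c in chunks]
    return {"text": text, "sources": sources}  # layer 3: citations to verify
```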

Q: What happens when the AI gives a wrong answer?

Users can click a feedback button (thumbs down + optional comment, with screenshot capture). Admins see all flagged responses in the analytics dashboard. You can then update KB articles to improve future answers, adjust the system prompt, or add the question to a "known issues" list. The feedback loop continuously improves accuracy.

Client Q&A

Product Fit, Integration & Pricing

Q: Our docs are messy — PDFs, wikis, scattered READMEs. Can you work with that?

Yes. Our ingestion pipeline handles Markdown, PDFs, and wiki exports (Confluence, GitBook, ReadTheDocs). For messy sources, our professional services team ($1.5K–$3K) will clean, structure, and chunk your content. For source-code-heavy projects, we offer agent-assisted KB authoring ($5K–$15K).

Q: What if our KB changes frequently?

Upload updated docs anytime — we rebuild the FAISS vector index automatically. On our maintenance retainer ($500–$1,500/mo), we handle ongoing updates as your software and docs evolve. Self-serve customers can re-trigger the indexing pipeline via admin panel.

Q: Can the agent say "I don't know"?

Yes — this is a core design principle. When FAISS retrieval finds no relevant documentation, the system prompt instructs the LLM to respond with "I don't have information about that in my knowledge base" rather than guessing. You can customize this fallback message.

Q: Can we control which topics the agent refuses to answer?

Yes. The custom system prompt (Pro and Enterprise plans) lets you define boundaries: "Only answer questions about [product]. Do not discuss competitors, pricing, or legal matters." The agent inherently stays within your KB scope since retrieval is grounded in your documentation.

Q: Will our customers know they're talking to a third-party tool?

No. Full white-label: your company name, your logo, your colors, your domain. 12 configurable branding fields plus full CSS control. Your users see "Acme Support Agent" at acme-support.com — no mention of KBForge anywhere in the UI.

Q: Can we embed this in our existing app?

Currently deployed as a standalone web app on your custom domain. Embeddable widget (iframe/JS snippet) is on our roadmap. Enterprise customers can request iframe integration as a custom feature today.

Q: Can it integrate with our ticketing system (Jira, Linear, Freshdesk)?

Not yet natively. Enterprise plan includes API export of conversations and feedback. Webhook-based integrations (escalate to Jira ticket when confidence is low) are on the Q2 roadmap. We prioritize integrations based on customer demand.

Q: What's your uptime SLA?

Pro: 99.5% uptime target. Enterprise: 99.9% SLA with dedicated VPS and uptime monitoring (UptimeRobot + Sentry). If our service is down, the support page shows a graceful fallback message — it doesn't break your site.

Q: Overage pricing worries us — is there a cap?

We can set a hard query cap (agent stops responding) or a soft cap (alerts you, keeps responding). Enterprise plans include custom overage terms. You always see real-time usage in the admin dashboard so there are no surprises.
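
Hard versus soft caps come down to a small piece of admission logic; this sketch uses illustrative names, not KBForge's actual metering code:

```python
def admit(used, included, cap_mode, hard_limit):
    """Decide whether to serve the next query under a tenant's cap policy.

    Returns (serve, alert): `serve` says whether to answer the query;
    `alert` says whether to notify the admin that the quota was exceeded.
    """
    over_quota = used >= included
    if cap_mode == "hard":
        # Hard cap: stop answering once the hard limit is reached.
        return used < hard_limit, over_quota
    # Soft cap: keep answering, alert once usage passes the included quota
    # (overage billed per query, e.g. $0.03 on the Pro tier).
    return True, over_quota
```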

Q: Can we try before committing? Free trial?

Yes. Free 30-day pilot for early customers — we set up your agent and help load your knowledge base at no cost. After 30 days, pick a plan or walk away. No credit card required to start.

Q: Does it support multiple languages?

The underlying LLMs (Claude, GPT-4o, Grok) support 50+ languages natively. If your docs are in English, the agent can still answer in the user's language — the LLM handles translation on the fly. For best results, provide KB content in the primary language(s) your customers use.

Q: Can we see what users are asking and where the agent fails?

Yes. The admin analytics dashboard shows: total conversations, popular topics, unanswered questions, feedback scores, and flagged responses. Every plan includes analytics; Enterprise adds API export. This data lets you continuously improve your KB and identify gaps in your documentation.