vibe_erp — Architecture Design (v1)

Status: Approved (brainstorm output, pre-implementation plan)
Date: 2026-04-07
Scope: High-level architecture for the entire framework. Implementation plans for individual PBCs and the v1 cut will be written separately.


1. Context and intent

vibe_erp is an ERP/EBC framework (not an ERP application) targeting the printing industry, intended to be sold worldwide and deployed self-hosted-first with a managed/hosted version added later. The reference business documentation under raw/业务流程设计文档/ describes one example printing shop and is treated as a fixture / acceptance test, never as a specification — no part of its workflow is hard-coded into the core.

The design satisfies, and in several cases establishes, the architectural guardrails in CLAUDE.md. The pre-existing guardrails (1–6), plus documentation discipline, are:

1. Core stays domain-agnostic (no printing terms in the core)
2. Workflows are data, not code
3. Extensibility seams come first
4. The reference customer is a test, not a requirement
5. Multi-tenant from day one in spirit
6. Global / i18n from day one

(plus "Documentation discipline", which is a separate section in CLAUDE.md)


This design adds five more guardrails to CLAUDE.md (numbered 7–11), derived from validating the current design against 2026 SOTA (Gartner's Composable ERP frame, the MACH principles, SAP S/4HANA's Clean Core extension model, and ERPNext / Frappe's metadata-driven Doctype system):


7. Clean Core (extensions never modify the core; A/B/C/D extension grading)
8. Two-tier extensibility (key-user no-code metadata + developer pro-code plug-ins, both first-class)
9. PBC boundaries are sacred (modular monolith with strict bounded contexts; PBCs never import each other)
10. api.v1 is the only stable contract (semver-governed; everything else is internal)
11. AI agents are a first-class client (REST/OpenAPI surface must be MCP-callable; v1.0 architects the seam, v1.1 ships the endpoint)


2. Foundational decisions


| Decision | Choice | Why |
|---|---|---|
| Deployment model | Self-hosted-first, hosted later, same artifact for both | User requirement; matches Odoo/ERPNext/Tryton/SAP S/4HANA self-host story |
| Architecture style | Modular monolith with strict bounded contexts (PBCs) | MACH allows "modularity OR microservices"; modular monolith is operationally sane for self-host; every successful self-hostable ERP made the same call |
| Backend language | Kotlin on the JVM | Mature ERP ecosystem (Hibernate, Flowable, ICU4J, JasperReports, PF4J), modern ergonomics, large global hiring pool |
| Backend framework | Spring Boot | De facto JVM application framework; PF4J integrates cleanly; Spring Data JPA, Spring Security, Actuator are all standard |
| Workflow engine | Embedded Flowable (BPMN 2.0) | Workflows-as-data is non-negotiable; BPMN is the standard; embedding avoids extra processes |
| Persistence | PostgreSQL as the only mandatory external dependency | Matches every modern open-source ERP; excellent JSONB + RLS support, both critical to this design |
| Multi-tenancy | Row-level `tenant_id` + Postgres RLS (defense in depth) | Same code path for self-host (one tenant) and hosted (many tenants); no schema explosion; two independent walls against data leaks |
| Custom fields | JSONB `ext` column on every business table, described by metadata rows | One row, one read, indexable via GIN, no migrations needed for additions, no joins; EAV is the wrong tool |
| Plug-in framework | PF4J + Spring Boot child contexts | Classloader isolation, manifest-based lifecycle, cleanest plug-in story on the JVM |
| Web client (v1) | React + TypeScript SPA | Single SPA covers desktop and tablet office workflows |
| Mobile client | React Native (v2, not v1) | Defer until core API is stable; reuses TS types from web |
| API style | REST + OpenAPI, MCP-callable surface | OpenAPI is the universal integration standard; the MCP server is a separate v1.1 deliverable, the seam exists in v1.0 |
| Reporting | JasperReports | Mature, customer-skinnable, JVM-native |
| i18n | ICU MessageFormat (ICU4J) + Spring MessageSource | Plurals, gender, number/date formatting, locale fallback: all required for "sold worldwide" |
| Auth | Built-in JWT + OIDC (Keycloak-compatible) | Self-hosters get something out of the box; enterprise customers get SSO from day one |


3. Topology
┌──────────────────────────────────────────────────────────────────────┐
│                          Customer's network                          │
│                                                                      │
│  Browser (React SPA) ─┐                                              │
│  AI agent (MCP, v1.1)─┼─► Reverse proxy ──► vibe_erp backend (1 image)│
│  3rd-party system    ─┘                       │                      │
│                                                │                      │
│   Inside the image (one Spring Boot process):  │                      │
│     ┌─────────────────────────────────────┐   │                      │
│     │  HTTP layer (REST + OpenAPI + MCP)  │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │   Public Plug-in API  (api.v1.*)    │◄──┤  loaded from         │
│     │   — the only stable contract        │   │  ./plugins/*.jar     │
│     ├─────────────────────────────────────┤   │  via PF4J            │
│     │   Core PBCs (modular monolith):     │   │                      │
│     │   identity · catalog · partners ·   │   │                      │
│     │   inventory · warehousing ·         │   │                      │
│     │   orders-sales · orders-purchase ·  │   │                      │
│     │   production · quality · finance    │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │   Cross-cutting:                    │   │                      │
│     │   • Flowable (workflows-as-data)    │   │                      │
│     │   • Metadata store (Doctype-style)  │   │                      │
│     │   • i18n (ICU MessageFormat)        │   │                      │
│     │   • Reporting (JasperReports)       │   │                      │
│     │   • Job scheduler (Quartz)          │   │                      │
│     │   • Audit, security, events         │   │                      │
│     └─────────────────────────────────────┘   │                      │
│                                                ▼                      │
│                                       PostgreSQL  (mandatory)        │
│                                       File store  (local or S3)      │
└──────────────────────────────────────────────────────────────────────┘

Optional sidecars for larger deployments (off by default):
   • Keycloak (OIDC)        • Redis (cache + queue)
   • OpenSearch (search)    • SMTP relay


The PBC names above are illustrative core capabilities; none is printing-specific. Printing-specific behavior lives in plug-ins under ./plugins/.


4. Two-tier extensibility (the "Clean Core" model)

The framework supports two extension paths, modeled on SAP S/4HANA's clean-core extensibility levels.


Tier 1 — Key user, no-code

Business analysts customize the system through the web UI. Everything they create is stored as rows in the metadata tables, scoped to their tenant, and tagged source = 'user' so it's preserved across plug-in install/uninstall and core upgrades.


| Capability | Stored in |
|---|---|
| Custom field on an existing entity | `metadata__custom_field` → JSONB `ext` column at runtime |
| Custom form layout | `metadata__form` (JSON Schema + UI Schema) |
| Custom list view, filter, column set | `metadata__list_view` |
| Custom workflow | `metadata__workflow` → deployed to Flowable as BPMN |
| Simple "if X then Y" automation | `metadata__rule` |
| Custom entity (Doctype-style) | `metadata__entity` → auto-generated table at apply time |
| Custom report | `metadata__report` |
| Translations override | `metadata__translation` |


No build, no restart, no deploy. The OpenAPI spec, the AI-agent function catalog, and the REST API auto-update from the metadata.


Tier 2 — Developer, pro-code

Software developers (the customer's IT, an integrator, or vibe_erp itself) ship a PF4J plug-in JAR. The plug-in:


Sees only org.vibeerp.api.v1.* — the public, semver-governed contract
Cannot import org.vibeerp.platform.* or any PBC's internal classes (rejected by the plug-in linter at install time)
Lives in its own classloader, its own Spring child context, its own DB schema namespace (plugin_<id>__*), its own metadata-source tag
Can register: new entities, new REST endpoints, new workflow tasks, new form widgets, new report templates, new event listeners, new permissions, new menu entries, new React micro-frontends
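
The import restriction in the list above can be illustrated with a minimal Kotlin sketch. It assumes the linter has already extracted the fully qualified class names a plug-in JAR references; the function name `lintImports` and the `org.vibeerp.pbc.` prefix are hypothetical, and only `org.vibeerp.api.v1` and `org.vibeerp.platform` come from this document.

```kotlin
// Hypothetical sketch of the install-time import check.
// ALLOWED_PREFIX and the platform prefix come from the design;
// "org.vibeerp.pbc." is an assumed package root for PBC classes.
val ALLOWED_PREFIX = "org.vibeerp.api.v1."
val FORBIDDEN_PREFIXES = listOf("org.vibeerp.platform.", "org.vibeerp.pbc.")

/** Returns the referenced class names that break the Clean Core rule, sorted. */
fun lintImports(referencedClasses: Collection<String>): List<String> =
    referencedClasses.filter { name ->
        !name.startsWith(ALLOWED_PREFIX) &&
            FORBIDDEN_PREFIXES.any { prefix -> name.startsWith(prefix) }
    }.sorted()
```

A non-empty result would cause the install to be rejected. Third-party and standard-library classes pass through untouched, since the rule only concerns vibe_erp internals.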


Extension grading (borrowed from SAP)


| Grade | Definition | Upgrade safety |
|---|---|---|
| A | Tier 1 only (metadata) | Always safe across any core version |
| B | Tier 2, uses only `api.v1` stable surface | Safe within a major version |
| C | Tier 2, uses deprecated-but-supported `api.v1` symbols | Safe until next major; loader emits warnings |
| D | Tier 2, reaches into internal classes via reflection | UNSUPPORTED; loader rejects unless `--allow-grade-d` is set; will break |
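
As an illustration of how the grades could be derived mechanically, here is a small Kotlin sketch. The `ExtensionFacts` shape and both function names are hypothetical; only the four grades and the `--allow-grade-d` flag come from the table.

```kotlin
enum class Grade { A, B, C, D }

/** Facts a loader could collect about one extension (hypothetical shape). */
data class ExtensionFacts(
    val isPlugin: Boolean,          // false = Tier 1 metadata only
    val usesDeprecatedApi: Boolean, // touches deprecated api.v1 symbols
    val usesInternals: Boolean      // reaches into internal classes
)

/** The worst finding wins: internals beat deprecations, which beat clean api.v1 use. */
fun grade(facts: ExtensionFacts): Grade = when {
    !facts.isPlugin -> Grade.A
    facts.usesInternals -> Grade.D
    facts.usesDeprecatedApi -> Grade.C
    else -> Grade.B
}

/** Grade D loads only when the operator passed --allow-grade-d. */
fun mayLoad(grade: Grade, allowGradeD: Boolean): Boolean =
    grade != Grade.D || allowGradeD
```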


A core principle: anything a Tier 2 plug-in does should also be possible to do as a Tier 1 customization eventually. Tier 2 is the escape hatch where Tier 1 isn't expressive enough yet.


5. Module structure (Gradle multi-project)
vibe-erp/
├── api/
│   └── api-v1/                 ← THE CONTRACT (semver-governed)
│
├── platform/                   ← Framework runtime (internal)
│   ├── platform-bootstrap/
│   ├── platform-http/          REST + OpenAPI + MCP host
│   ├── platform-security/      AuthN/AuthZ, tenant resolution, OIDC
│   ├── platform-persistence/   JPA, multi-tenant routing, RLS, Liquibase
│   ├── platform-metadata/      Doctype-equivalent metadata store
│   ├── platform-workflow/      Flowable host
│   ├── platform-i18n/          ICU MessageFormat, locale resolution
│   ├── platform-events/        In-process bus + outbox
│   ├── platform-jobs/          Quartz scheduler
│   ├── platform-reporting/     JasperReports
│   ├── platform-files/         Local + S3 abstraction
│   └── platform-plugins/       PF4J host, lifecycle, classloader isolation
│
├── pbc/                        ← Core PBCs (each = bounded context)
│   ├── pbc-identity/
│   ├── pbc-catalog/
│   ├── pbc-partners/
│   ├── pbc-inventory/
│   ├── pbc-warehousing/
│   ├── pbc-orders-sales/
│   ├── pbc-orders-purchase/
│   ├── pbc-production/
│   ├── pbc-quality/
│   └── pbc-finance/
│
├── reference-customer/         ← NOT shipped in core
│   └── plugin-printing-shop/   Real PF4J plug-in expressing the
│                               raw/业务流程设计文档/ workflow.
│                               Built and tested in CI; not loaded by default.
│
├── web/                        ← React + TypeScript SPA
│
└── docs/                       ← Framework documentation


Dependency rule (strictly enforced)
api/api-v1            depends on: nothing (Kotlin stdlib + jakarta.validation only)
platform/*            depends on: api/api-v1 + Spring + libs
pbc/*                 depends on: api/api-v1 + platform/*  (NEVER another pbc)
plugins (incl. ref)   depend on:  api/api-v1 only


PBCs communicate only through (a) the event bus and (b) service interfaces declared in api.v1.ext.<pbc>. This is the rule that makes "modular monolith now, splittable later" real.
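
One way to picture the build-time check is a plain function over the project dependency graph. This is a sketch only: the function name is hypothetical, module paths follow the tree above, and allowing platform modules to depend on each other is an assumption, since the rule as written does not cover that case.

```kotlin
/**
 * Checks project-to-project dependencies against the dependency rule.
 * Keys and values are module paths such as "pbc/pbc-orders-sales".
 * External libraries are expected to be left out of the map.
 */
fun dependencyViolations(deps: Map<String, Set<String>>): List<String> {
    val violations = mutableListOf<String>()
    for ((module, targets) in deps) {
        for (target in targets) {
            val allowed = when {
                // api-v1 depends on no other project module
                module.startsWith("api/") -> false
                // platform -> api-v1; platform -> platform is an assumption
                module.startsWith("platform/") ->
                    target.startsWith("api/") || target.startsWith("platform/")
                // a PBC may use api-v1 and platform, never another PBC
                module.startsWith("pbc/") ->
                    target.startsWith("api/") || target.startsWith("platform/")
                // everything else is treated as a plug-in: api-v1 only
                else -> target.startsWith("api/")
            }
            if (!allowed) {
                violations += "$module -> $target"
            }
        }
    }
    return violations.sorted()
}
```

In a real build the same rule would more likely live in a Gradle convention plug-in; the function form just makes the allowed edges explicit.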


Per-PBC layout (every PBC follows this)
pbc-orders-sales/
├── api/                ← service contracts re-exported by api.v1
├── domain/             ← entities, value objects, domain services
├── application/        ← use cases / application services
├── infrastructure/     ← Hibernate mappings, repositories
├── http/               ← REST controllers
├── workflow/           ← BPMN files, task handlers
├── metadata/           ← seed metadata (default forms, rules)
├── i18n/               ← message bundles
└── migrations/         ← Liquibase changesets (own table prefix)


6. The api.v1 package

The single most important contract in the codebase. Everything in api.v1 is binary-stable within the 1.x line. Everything not in api.v1 is internal and can change in any release.
org.vibeerp.api.v1
├── core/         Tenant, Locale, Money, Quantity, Id<T>, Result<T,E>
├── entity/       Entity, Field, FieldType, EntityRegistry
├── persistence/  Repository<T>, Query, Page, Transaction
├── workflow/     WorkflowTask, WorkflowEvent, TaskHandler
├── form/         FormSchema, UiSchema
├── http/         @PluginEndpoint, RequestContext, ResponseBuilder
├── event/        DomainEvent, EventListener, EventBus
├── security/     Principal, Permission, PermissionCheck
├── i18n/         MessageKey, Translator, LocaleProvider
├── reporting/    ReportTemplate, ReportContext
├── plugin/       Plugin, PluginManifest, ExtensionPoint
└── ext/          Typed extension interfaces a plug-in implements


api.v1 is published as api-v1.jar to Maven Central so plug-in authors can build against it without pulling the entire vibe_erp source tree.


7. Plug-in lifecycle
1. Boot           ./plugins/*.jar scanned by platform-plugins
2. Manifest       plugin.yml read: id, version, requires-api, deps
3. Compatibility  rejected if requires-api ≠ current api.v1 major
4. Lint           rejected if it imports vibe_erp classes outside api.v1.*
5. Classload      PF4J creates an isolated classloader per plug-in
6. Register       plug-in's entry class implements api.v1.plugin.Plugin
                  and registers Extensions via @Extension
7. Wire           Spring child context per plug-in; plug-in's @Components
                  live there only
8. Migrate        plug-in's Liquibase changesets run in plugin_<id>__*
9. Seed metadata  plug-in's metadata YAML is upserted, tagged with plug-in id
10. Ready         endpoints, workflow tasks, forms, reports, listeners live
11. Disable       deregister, drop child context; data preserved
12. Uninstall     explicit operator action; only then is the schema dropped
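
The compatibility step (3) can be sketched in a few lines of Kotlin. The function names are hypothetical, and the assumption is that `requires-api` holds a string such as "1.x". Passing a set of supported majors, rather than a single one, is a choice made here so the same check would also cover a window in which two API majors ship side by side.

```kotlin
/** Parses the major from strings such as "1.x" or "1"; null when malformed. */
fun requiredMajor(requiresApi: String): Int? =
    requiresApi.trim().substringBefore('.').toIntOrNull()

/** True when the plug-in's declared major is one the running core supports. */
fun isCompatible(requiresApi: String, supportedMajors: Set<Int>): Boolean {
    val major = requiredMajor(requiresApi) ?: return false
    return major in supportedMajors
}
```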


8. Data model and multi-tenancy


Schema namespacing

PBCs and plug-ins use table name prefixes, not Postgres schemas:
identity__user, identity__role
catalog__item, catalog__item_attribute
inventory__stock_item, inventory__movement
orders_sales__order, orders_sales__order_line
production__work_order, production__operation
plugin_printingshop__plate_spec  (reference plug-in)
metadata__custom_field, metadata__form, metadata__workflow
flowable_*  (Flowable's own tables, untouched)


This keeps Hibernate, RLS policies, and migrations all in one logical schema (public), avoids search_path traps, and gives clean uninstall semantics.
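
The prefix convention is simple enough to express as code. The helper names below are hypothetical, and mapping hyphens to underscores is inferred from the `orders-sales` / `orders_sales__order` pair above rather than stated anywhere.

```kotlin
/** "orders-sales" + "order_line" -> "orders_sales__order_line". */
fun pbcTable(pbc: String, entity: String): String =
    "${pbc.replace('-', '_')}__$entity"

/** "printingshop" + "plate_spec" -> "plugin_printingshop__plate_spec". */
fun pluginTable(pluginId: String, entity: String): String =
    "plugin_${pluginId.replace('-', '_')}__$entity"

/** Owner prefix of a table name, or null when there is no "__" separator. */
fun ownerOf(table: String): String? =
    if ("__" in table) table.substringBefore("__") else null
```

Uninstalling a plug-in then reduces to dropping every table whose owner prefix is `plugin_<id>`, which is what makes the uninstall semantics clean.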


Tenant isolation


Every business table has tenant_id, NOT NULL
Hibernate @TenantId filters every query at the application layer
Postgres Row-Level Security policies filter every query at the database layer
Two independent walls; a bug in one is not a data leak


Self-hosted single-customer = one tenant row called default. Hosted multi-tenant = many tenant rows. Same code path.
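
For the database-layer wall, the per-table DDL could be generated along these lines. This is a sketch under stated assumptions: the session setting name `app.tenant_id`, the `uuid` type of `tenant_id`, the policy naming scheme, and the function name are all hypothetical. The statements themselves use standard PostgreSQL row-security syntax.

```kotlin
/**
 * Builds the RLS statements for one business table.
 * Assumes the application sets the custom session parameter
 * app.tenant_id on each connection before running queries.
 */
fun rlsDdl(table: String): List<String> {
    // Table names are interpolated into DDL, so accept only plain identifiers.
    require(Regex("[a-z][a-z0-9_]*").matches(table)) { "unexpected table name: $table" }
    return listOf(
        "ALTER TABLE $table ENABLE ROW LEVEL SECURITY",
        // FORCE makes the policy apply to the table owner as well
        "ALTER TABLE $table FORCE ROW LEVEL SECURITY",
        "CREATE POLICY ${table}_tenant_isolation ON $table " +
            "USING (tenant_id = current_setting('app.tenant_id')::uuid)"
    )
}
```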


Custom fields

Every business table has:
ext       jsonb   not null  default '{}',
ext_meta  text    generated


Custom fields are JSON keys inside ext. A GIN index on ext makes them queryable. The metadata__custom_field table describes the JSON shape per entity per tenant. The form designer, list views, OpenAPI generator, and AI-agent function catalog all read from this table.

For the rare hot-path custom field, an operator can promote a JSON key to a real generated column via an auto-generated Liquibase changeset. This is an optimization, not the default.
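
To make the role of metadata__custom_field concrete, here is a minimal Kotlin sketch of validating an `ext` payload against field definitions. The `CustomFieldDef` shape, the three field types, and the error strings are hypothetical; the real metadata row would carry more than this.

```kotlin
enum class FieldType { TEXT, NUMBER, BOOLEAN }

/** One row of metadata__custom_field, reduced to what validation needs. */
data class CustomFieldDef(val key: String, val type: FieldType, val required: Boolean = false)

/** Returns human-readable problems; an empty list means ext is valid. */
fun validateExt(defs: List<CustomFieldDef>, ext: Map<String, Any?>): List<String> {
    val problems = mutableListOf<String>()
    val byKey = defs.associateBy { it.key }
    // Keys that no metadata row describes are rejected.
    for (key in ext.keys) {
        if (key !in byKey) {
            problems += "unknown field: $key"
        }
    }
    for (def in defs) {
        val value = ext[def.key]
        if (value == null) {
            if (def.required) {
                problems += "missing required field: ${def.key}"
            }
            continue
        }
        val typeMatches = when (def.type) {
            FieldType.TEXT -> value is String
            FieldType.NUMBER -> value is Number
            FieldType.BOOLEAN -> value is Boolean
        }
        if (!typeMatches) {
            problems += "wrong type for ${def.key}: expected ${def.type}"
        }
    }
    return problems
}
```

Because the form designer, list views, and OpenAPI generator read the same definitions, a check like this would be the single place where the JSON shape is enforced on write.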


The metadata store
metadata__entity         metadata__form           metadata__permission
metadata__custom_field   metadata__list_view      metadata__role_permission
metadata__workflow       metadata__rule           metadata__menu
metadata__report         metadata__translation    metadata__plugin_config


Every row carries tenant_id, source (core / plugin:<id> / user), version, is_active. The source column makes uninstall/upgrade safe: removing a plug-in cleans up its metadata; user-created metadata is sacred.
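
The effect of the source column on uninstall can be shown with a short Kotlin sketch. `MetadataRow` is a reduced, hypothetical view of a metadata row, and `afterUninstall` is an illustrative name, not part of the design.

```kotlin
/** Reduced view of a metadata row: only columns named in the text above. */
data class MetadataRow(
    val tenantId: String,
    val source: String,   // "core", "plugin:<id>", or "user"
    val key: String,
    val isActive: Boolean = true
)

/** Rows that survive uninstalling a plug-in: everything not tagged plugin:<id>. */
fun afterUninstall(rows: List<MetadataRow>, pluginId: String): List<MetadataRow> =
    rows.filterNot { it.source == "plugin:$pluginId" }
```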


Migrations


Each PBC owns a Liquibase changelog under pbc-<name>/migrations/

Plug-ins ship their own changelogs inside their JAR
Forward-only and idempotent by default
Rollback blocks mandatory; CI rejects PRs without them
Tenant onboarding is INSERT INTO identity__tenant + seed metadata, not a migration — sub-second


Data sovereignty (sold worldwide)


Self-hosted deployments meet data-residency requirements by construction: the customer chooses where Postgres lives

Hosted supports per-region tenant routing: each tenant row carries a region; platform-persistence routes connections to the right regional Postgres cluster

PII tagging on field metadata (pii: true) drives auto-generated DSAR exports and erasure jobs (GDPR Articles 15/17)

Audit log (platform__audit, append-only, monthly partitions) records access to PII fields when audit-strict mode is on
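
As a rough illustration of how PII tagging could drive both jobs, the Kotlin sketch below filters a record by field metadata. The `FieldMeta` shape and function names are hypothetical, and modelling erasure as setting values to null is an assumption; a real erasure job might delete or anonymize instead.

```kotlin
/** Field metadata reduced to the PII tag. */
data class FieldMeta(val name: String, val pii: Boolean)

/** DSAR export: only the PII-tagged fields of a record. */
fun dsarExport(fields: List<FieldMeta>, record: Map<String, Any?>): Map<String, Any?> {
    val piiNames = fields.filter { it.pii }.map { it.name }.toSet()
    return record.filterKeys { it in piiNames }
}

/** Erasure (assumed semantics): PII-tagged fields are nulled, the rest is kept. */
fun erase(fields: List<FieldMeta>, record: Map<String, Any?>): Map<String, Any?> {
    val piiNames = fields.filter { it.pii }.map { it.name }.toSet()
    return record.mapValues { (key, value) -> if (key in piiNames) null else value }
}
```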


9. Cross-cutting concerns


| Concern | Approach |
|---|---|
| Security | `PermissionCheck` declared in `api.v1.security`; plug-ins register their own permissions, auto-listed in role editor |
| Transactions | Spring `@Transactional` at application-service layer; plug-ins use `api.v1.persistence.Transaction`, never Spring directly |
| Audit | `created_at`, `created_by`, `updated_at`, `updated_by`, `tenant_id` on every entity, applied by JPA listener; plug-ins inherit by extending `api.v1.entity.AuditedEntity` |
| Events | Typed `DomainEvent`s on every state change; in-process bus by default; outbox table in Postgres for cross-crash reliability and as the seam where Kafka/NATS plugs in later without changing PBC code |
| AI-agent surface | Same business operations exposed through REST are exposable through an MCP server; v1.1 ships the MCP endpoint, v1.0 architects the seam |
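
The outbox idea in the Events row can be sketched with an in-memory stand-in. Everything here is illustrative: the class and column names are hypothetical, and a real implementation would write the row in the same Postgres transaction as the state change.

```kotlin
data class OutboxRow(
    val id: Long,
    val type: String,
    val payload: String,
    var dispatched: Boolean = false
)

/** In-memory stand-in for the Postgres outbox table. */
class Outbox {
    private val rows = mutableListOf<OutboxRow>()
    private var nextId = 1L

    /** Records an event; in a real system this shares the business transaction. */
    fun append(type: String, payload: String): Long {
        val row = OutboxRow(nextId++, type, payload)
        rows += row
        return row.id
    }

    /**
     * Delivers pending rows in insertion order. A row is marked dispatched
     * only after send returns, so a failure leaves it for the next drain.
     */
    fun drain(send: (OutboxRow) -> Unit): Int {
        var delivered = 0
        for (row in rows.filter { !it.dispatched }) {
            try {
                send(row)
            } catch (e: Exception) {
                break // stop here to preserve ordering; retry on the next drain
            }
            row.dispatched = true
            delivered++
        }
        return delivered
    }
}
```

The `send` callback is the seam: the in-process bus and a later Kafka/NATS publisher would both be implementations of it, with PBC code only ever calling `append`.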


10. Packaging and deployment


Shipping artifact

One Docker image (ghcr.io/vibeerp/vibe-erp:1.0.0), plus an optional fat JAR for non-container environments.
/app/vibe-erp.jar
/app/api-v1.jar
/app/migrations/, /app/i18n/, /app/reports/   ← read-only
/opt/vibe-erp/                                ← customer-mounted volume
  ├── config/vibe-erp.yaml                    single config file
  ├── plugins/                                drop *.jar to install
  ├── i18n-overrides/
  ├── files/                                  if not using S3
  └── logs/


Single config file (closed key set)

vibe-erp.yaml covers: instance mode, database, file store, auth, i18n, plugins, observability. Plug-ins read their own config from metadata__plugin_config, not from the YAML.


Install (1 command)
docker run -d --name vibe-erp \
  -p 8080:8080 \
  -v /srv/vibeerp:/opt/vibe-erp \
  -e DB_PASSWORD=... \
  ghcr.io/vibeerp/vibe-erp:1.0.0


First boot: connect → migrate → create default tenant → bootstrap admin → ready. Under 30 seconds.


Upgrade (1 command)

docker rm + docker run with the new image tag. Within a major version, all plug-ins continue to load. Across a major version, api.v1 and api.v2 ship side by side for at least one major release. Customer data is never destroyed by an upgrade by default.


Upgrade contract


| Change | Allowed within 1.x? |
|---|---|
| Add a class to `api.v1` | yes |
| Add a method to an `api.v1` interface (with default impl) | yes |
| Remove or rename anything in `api.v1` | no (major bump) |
| Change behavior of an `api.v1` symbol in a way plug-ins can observe | no (major bump) |
| Anything in `platform.*` or `pbc.*.internal.*` | yes (that's why it's internal) |
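
The "add a method with a default implementation" row can be illustrated as follows. The interface body and the `ApproveQuoteHandler` class are hypothetical; only the name `TaskHandler` appears in the api.v1 listing, and its real signature is not given in this document.

```kotlin
// Hypothetical shape of an api.v1 interface after a 1.x minor release.
interface TaskHandler {
    // Present since 1.0.
    fun handle(taskId: String): String

    // Added later in 1.x: it has a default body, so implementations
    // written against 1.0 keep compiling without changes.
    fun describe(): String = "no description"
}

// A plug-in class written against 1.0; it knows nothing about describe().
class ApproveQuoteHandler : TaskHandler {
    override fun handle(taskId: String): String = "approved:$taskId"
}
```

This shows source compatibility only. Whether already-compiled plug-in JARs stay binary compatible also depends on how the Kotlin compiler emits interface defaults for the JVM, which is a build-configuration detail outside this sketch.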


11. v1.0 cut line


v1.0 ships


Single Docker image, fat JAR alternative
Core PBCs: identity, catalog, partners, inventory, warehousing, orders-sales, orders-purchase, production (basic), quality (basic), finance (basic)

api.v1 published to Maven Central
PF4J plug-in loader with classloader isolation, manifest validation, lifecycle
Metadata store: custom fields, forms, list views, simple rules
Embedded Flowable + BPMN designer in web UI
JSON Schema form designer in web UI
Built-in JWT auth + OIDC SSO
React web SPA covering all core PBCs and customization UIs
REST + OpenAPI on every endpoint
ICU i18n with shipping locales: en-US, zh-CN, de-DE, ja-JP, es-ES

Reference printing-shop plug-in (built and CI-tested, not loaded by default)
Liquibase migrations with mandatory rollback blocks
Audit log, PII tagging, basic DSAR export
Documentation site
One-command install, one-command upgrade
Health, metrics, structured logs


v1.0 deferred (architecturally accommodated)


React Native mobile app (v2)
MCP server for AI agents (v1.1)
Hosted multi-tenant deployment with per-region routing, billing, tenant provisioning UI (v2)
Plug-in marketplace / signed plug-ins (v2)
Webhooks-out and Kafka/NATS event streaming (v1.1, outbox seam already exists)
Advanced finance: tax engines, multi-currency revaluation (v1.2+)
Production scheduling / APS (v1.2+)
Hot plug-in reload without restart (v1.2+)
Full-text search beyond Postgres tsvector (v1.2+)


Release policy


Semver on api.v1. Major bumps overlap with previous major for ≥1 major release window
Semver on the core image
Plug-ins declare requires-api: "1.x"; mismatches fail at install, never at runtime
Minor releases every 6 weeks
LTS on every other major (1.x, 3.x, 5.x), supported 3 years


12. Risks and how the design addresses them


| Risk | Mitigation |
|---|---|
| Core gradually accreting printing-specific concepts | The dependency rule plus the reference plug-in: printing-specific behavior belongs in `plugin-printing-shop`, which may depend on `api.v1` only; reviewers must reject any printing terminology in `pbc/*`, and a `pbc/*` feature that exists only to serve the plug-in must move into the plug-in (see section 13) |
| Plug-in API churn breaks the ecosystem | `api.v1` is the only supported surface; plug-in linter rejects internal imports at install time; semver discipline + 1-major deprecation window |
| Cross-PBC coupling silently appears | Gradle dependency rule enforced by the build (`pbc-orders-sales` cannot declare `pbc-inventory` as a dependency); CI fails on violations |
| Multi-tenancy bug causes data leak in hosted version | Two independent walls (Hibernate filter + Postgres RLS); integration tests with multiple tenants in every PBC |
| "Workflows as data" turns into a custom DSL | BPMN 2.0 standard via Flowable; the temptation to invent a vibe_erp-only workflow language must be rejected |
| Metadata store becomes a write-once, read-by-no-one configuration graveyard | Every consumer (form renderer, list view, OpenAPI generator, AI function catalog, role editor) reads from it; no parallel sources of truth |
| JVM RAM cost makes self-hosting on small shops painful | Minimum spec documented (2 GB RAM, 1 vCPU); GraalVM native image evaluated for v2 |
| Customer wants a different DB | Hibernate alone would make Postgres-only a soft constraint, but the reliance on JSONB and RLS makes porting hard; we explicitly do not support other DBs in v1.0 and document this |


13. Verification (how the design will be proved out)

The design is verified by building the framework AND simultaneously building the reference printing-shop plug-in. The plug-in is the executable acceptance test:


If the plug-in can express the workflows in raw/业务流程设计文档/ using only api.v1, the framework is sufficient
If the plug-in needs to reach into a platform.* or pbc.* internal class, the seam is wrong and api.v1 needs to grow (deliberately)
If a feature in pbc/* is only there to make the printing plug-in work, the design is failing guardrail #1 and the feature must move into the plug-in


CI runs the full vibe_erp test suite and loads plugin-printing-shop in an integration test environment, exercising its key flows end-to-end against a real Postgres.


14. What happens after this spec

This spec is the architecture-level design. It is NOT an implementation plan. The next steps are:


1. The user reviews this document and either approves it or requests changes
2. On approval, hand off to the writing-plans skill to produce a sequenced implementation plan, broken into work units (each PBC, each platform module, each major capability)
3. CLAUDE.md is updated to reflect the named patterns adopted here (Clean Core, two-tier extensibility, PBCs, api.v1, AI-agent seam)
4. The plan is executed incrementally, with the reference printing-shop plug-in built alongside the framework so the abstraction is constantly stress-tested