# vibe_erp — Architecture Design (v1)

**Status:** Approved (brainstorm output, pre-implementation plan)
**Date:** 2026-04-07
**Scope:** High-level architecture for the entire framework. Implementation plans for individual PBCs and the v1 cut will be written separately.

---

## 1. Context and intent

`vibe_erp` is an **ERP/EBC framework** (not an ERP application) targeting the **printing industry**, intended to be **sold worldwide** and deployed **self-hosted-first** with a managed/hosted version added later. The reference business documentation under `raw/业务流程设计文档/` describes one example printing shop and is treated as a *fixture / acceptance test*, never as a specification — no part of its workflow is hard-coded into the core.

The design satisfies, and in several cases *establishes*, the architectural guardrails in `CLAUDE.md`. The pre-existing guardrails (1–6), plus documentation discipline, are:

1. Core stays domain-agnostic (no printing terms in the core)
2. Workflows are data, not code
3. Extensibility seams come first
4. The reference customer is a test, not a requirement
5. Multi-tenant from day one in spirit
6. Global / i18n from day one

"Documentation discipline" is kept as a separate section in `CLAUDE.md`.

This design adds five more guardrails to CLAUDE.md (numbered 7–11), derived from validating the current design against 2026 SOTA (Gartner's **Composable ERP** frame, the **MACH** principles, SAP S/4HANA's **Clean Core** extension model, and ERPNext / Frappe's **metadata-driven Doctype** system):

7. **Clean Core** (extensions never modify the core; A/B/C/D extension grading)
8. **Two-tier extensibility** (key-user no-code metadata + developer pro-code plug-ins, both first-class)
9. **PBC boundaries are sacred** (modular monolith with strict bounded contexts; PBCs never import each other)
10. **`api.v1` is the only stable contract** (semver-governed; everything else is internal)
11. **AI agents are a first-class client** (REST/OpenAPI surface must be MCP-callable; v1.0 architects the seam, v1.1 ships the endpoint)

---

## 2. Foundational decisions

| Decision | Choice | Why |
|---|---|---|
| Deployment model | **Self-hosted-first**, hosted later, same artifact for both | User requirement; matches Odoo/ERPNext/Tryton/SAP S/4HANA self-host story |
| Architecture style | **Modular monolith** with strict bounded contexts (PBCs) | MACH allows "modularity OR microservices"; modular monolith is operationally sane for self-host; the self-hostable ERPs named above (Odoo, ERPNext, Tryton) made the same call |
| Backend language | **Kotlin on the JVM** | Mature ERP ecosystem (Hibernate, Flowable, ICU4J, JasperReports, PF4J), modern ergonomics, large global hiring pool |
| Backend framework | **Spring Boot** | De facto JVM application framework; PF4J integrates cleanly; Spring Data JPA, Spring Security, Actuator are all standard |
| Workflow engine | **Embedded Flowable (BPMN 2.0)** | Workflows-as-data is non-negotiable; BPMN is the standard; embedding avoids extra processes |
| Persistence | **PostgreSQL** as the only mandatory external dependency | Same choice as Odoo and Tryton; excellent JSONB + RLS support, both critical to this design |
| Multi-tenancy | **Row-level `tenant_id` + Postgres RLS (defense in depth)** | Same code path for self-host (one tenant) and hosted (many tenants); no schema explosion; two independent walls against data leaks |
| Custom fields | **JSONB `ext` column on every business table**, described by metadata rows | One row, one read, indexable via GIN, no migrations needed for additions, no joins; EAV is the wrong tool |
| Plug-in framework | **PF4J + Spring Boot child contexts** | Classloader isolation, manifest-based lifecycle, cleanest plug-in story on the JVM |
| Web client (v1) | **React + TypeScript SPA** | Single SPA covers desktop and tablet office workflows |
| Mobile client | **React Native (v2, not v1)** | Defer until core API is stable; reuses TS types from web |
| API style | **REST + OpenAPI**, MCP-callable surface | OpenAPI is the universal integration standard; the MCP server is a separate v1.1 deliverable, the seam exists in v1.0 |
| Reporting | **JasperReports** | Mature, customer-skinnable, JVM-native |
| i18n | **ICU MessageFormat (ICU4J) + Spring `MessageSource`** | Plurals, gender, number/date formatting, locale fallback — all required for "sold worldwide" |
| Auth | **Built-in JWT** + **OIDC** (Keycloak-compatible) | Self-hosters get something out of the box; enterprise customers get SSO from day one |

---

## 3. Topology

```
┌──────────────────────────────────────────────────────────────────────┐
│                          Customer's network                          │
│                                                                      │
│  Browser (React SPA) ─┐                                              │
│  AI agent (MCP, v1.1)─┼─► Reverse proxy ──► vibe_erp backend (1 image)│
│  3rd-party system    ─┘                       │                      │
│                                                │                      │
│   Inside the image (one Spring Boot process):  │                      │
│     ┌─────────────────────────────────────┐   │                      │
│     │  HTTP layer (REST + OpenAPI + MCP)  │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │   Public Plug-in API  (api.v1.*)    │◄──┤  loaded from         │
│     │   — the only stable contract        │   │  ./plugins/*.jar     │
│     ├─────────────────────────────────────┤   │  via PF4J            │
│     │   Core PBCs (modular monolith):     │   │                      │
│     │   identity · catalog · partners ·   │   │                      │
│     │   inventory · warehousing ·         │   │                      │
│     │   orders-sales · orders-purchase ·  │   │                      │
│     │   production · quality · finance    │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │   Cross-cutting:                    │   │                      │
│     │   • Flowable (workflows-as-data)    │   │                      │
│     │   • Metadata store (Doctype-style)  │   │                      │
│     │   • i18n (ICU MessageFormat)        │   │                      │
│     │   • Reporting (JasperReports)       │   │                      │
│     │   • Job scheduler (Quartz)          │   │                      │
│     │   • Audit, security, events         │   │                      │
│     └─────────────────────────────────────┘   │                      │
│                                                ▼                      │
│                                       PostgreSQL  (mandatory)        │
│                                       File store  (local or S3)      │
└──────────────────────────────────────────────────────────────────────┘

Optional sidecars for larger deployments (off by default):
   • Keycloak (OIDC)        • Redis (cache + queue)
   • OpenSearch (search)    • SMTP relay
```

The PBC names above are **illustrative core capabilities**; none is printing-specific. Printing-specific behavior lives in plug-ins under `./plugins/`.

---

## 4. Two-tier extensibility (the "Clean Core" model)

The framework supports **two extension paths**, modeled on SAP S/4HANA's clean-core extensibility levels.

### Tier 1 — Key user, no-code

Business analysts customize the system through the web UI. Everything they create is stored as **rows in the metadata tables**, scoped to their tenant, and tagged `source = 'user'` so it's preserved across plug-in install/uninstall and core upgrades.

| Capability | Stored in |
|---|---|
| Custom field on an existing entity | `metadata__custom_field` → JSONB `ext` column at runtime |
| Custom form layout | `metadata__form` (JSON Schema + UI Schema) |
| Custom list view, filter, column set | `metadata__list_view` |
| Custom workflow | `metadata__workflow` → deployed to Flowable as BPMN |
| Simple "if X then Y" automation | `metadata__rule` |
| Custom entity (Doctype-style) | `metadata__entity` → auto-generated table at apply time |
| Custom report | `metadata__report` |
| Translations override | `metadata__translation` |

No build, no restart, no deploy. The OpenAPI spec, the AI-agent function catalog, and the REST API auto-update from the metadata.
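
As a rough illustration of how a row in `metadata__custom_field` might drive validation of the JSONB `ext` payload at runtime, here is a minimal Kotlin sketch. The spec does not define the columns of that table, so `CustomFieldDef`, `FieldKind`, and `validateExt` are hypothetical names chosen for this example.

```kotlin
// Hypothetical shapes: the spec does not define the columns of metadata__custom_field.
enum class FieldKind { TEXT, NUMBER, BOOLEAN }

data class CustomFieldDef(
    val entity: String,      // e.g. "catalog__item"
    val key: String,         // JSON key inside the ext column
    val kind: FieldKind,
    val required: Boolean = false,
)

/** Returns human-readable problems; an empty list means the ext payload is acceptable. */
fun validateExt(defs: List<CustomFieldDef>, ext: Map<String, Any?>): List<String> {
    val problems = mutableListOf<String>()
    val byKey = defs.associateBy { it.key }
    // Unknown keys are rejected so the metadata table stays the single source of truth.
    for (key in ext.keys) if (key !in byKey) problems += "unknown field: $key"
    for (def in defs) {
        val value = ext[def.key]
        if (value == null) {
            if (def.required) problems += "missing required field: ${def.key}"
            continue
        }
        val ok = when (def.kind) {
            FieldKind.TEXT -> value is String
            FieldKind.NUMBER -> value is Number
            FieldKind.BOOLEAN -> value is Boolean
        }
        if (!ok) problems += "wrong type for ${def.key}: expected ${def.kind}"
    }
    return problems
}
```

Because the definitions are plain data, adding a field is an insert into the metadata table rather than a code change, which is the property the "no build, no restart, no deploy" claim relies on.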

### Tier 2 — Developer, pro-code

Software developers (the customer's IT, an integrator, or the vibe_erp team itself) ship a **PF4J plug-in JAR**. The plug-in:
- Sees only `org.vibeerp.api.v1.*` — the public, semver-governed contract
- Cannot import `org.vibeerp.platform.*` or any PBC's internal classes (rejected by the plug-in linter at install time)
- Lives in its own classloader, its own Spring child context, its own DB schema namespace (`plugin_<id>__*`), its own metadata-source tag
- Can register: new entities, new REST endpoints, new workflow tasks, new form widgets, new report templates, new event listeners, new permissions, new menu entries, new React micro-frontends
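
To make the Tier 2 shape more concrete, the sketch below shows what a plug-in entry class could look like. The names `Plugin` and `TaskHandler` appear in the `api.v1` package listing, but every method signature here, as well as `PluginContext`, `PlateApprovalPlugin`, and `RecordingContext`, is an assumption made for illustration; the real contract is not defined in this document.

```kotlin
// Stand-ins for org.vibeerp.api.v1 types. The names Plugin and TaskHandler appear in
// the spec; every method signature below is an assumption made for illustration.
interface TaskHandler {
    fun handle(variables: Map<String, Any?>): Map<String, Any?>
}

interface PluginContext {
    fun registerTaskHandler(taskKey: String, handler: TaskHandler)
}

interface Plugin {
    val id: String
    fun start(context: PluginContext)
}

// A hypothetical plug-in that contributes a single workflow task handler.
class PlateApprovalPlugin : Plugin {
    override val id = "printingshop"
    override fun start(context: PluginContext) {
        context.registerTaskHandler("approve-plate", object : TaskHandler {
            override fun handle(variables: Map<String, Any?>): Map<String, Any?> =
                variables + ("approved" to true)
        })
    }
}

// Minimal in-memory host, present only so the sketch can be exercised.
class RecordingContext : PluginContext {
    val handlers = mutableMapOf<String, TaskHandler>()
    override fun registerTaskHandler(taskKey: String, handler: TaskHandler) {
        handlers[taskKey] = handler
    }
}
```

The plug-in touches only the stand-in contract types, which mirrors the rule that a Tier 2 plug-in sees `org.vibeerp.api.v1.*` and nothing else.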

### Extension grading (borrowed from SAP)

| Grade | Definition | Upgrade safety |
|---|---|---|
| **A** | Tier 1 only (metadata) | Always safe across any core version |
| **B** | Tier 2, uses only `api.v1` stable surface | Safe within a major version |
| **C** | Tier 2, uses deprecated-but-supported `api.v1` symbols | Safe until next major; loader emits warnings |
| **D** | Tier 2, reaches into internal classes via reflection | UNSUPPORTED; loader rejects unless `--allow-grade-d` is set; will break |

A core principle: **anything a Tier 2 plug-in does should also be possible to do as a Tier 1 customization eventually.** Tier 2 is the escape hatch where Tier 1 isn't expressive enough yet.
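
The grading table can be read as a small decision function. The following is a sketch under the assumption that the loader can summarize an extension's behavior into three flags; `ExtensionUsage`, `gradeOf`, and `loaderAccepts` are hypothetical names, and how the flags would be collected is not specified here.

```kotlin
enum class Grade { A, B, C, D }

// Hypothetical summary of what an extension uses; the spec does not define how the
// loader collects this information.
data class ExtensionUsage(
    val hasPluginCode: Boolean,            // false = Tier 1 metadata only
    val usesDeprecatedApi: Boolean = false,
    val reachesIntoInternals: Boolean = false,
)

// The worst thing an extension does decides its grade.
fun gradeOf(usage: ExtensionUsage): Grade = when {
    !usage.hasPluginCode -> Grade.A
    usage.reachesIntoInternals -> Grade.D
    usage.usesDeprecatedApi -> Grade.C
    else -> Grade.B
}

// Grade D loads only when the operator opts in (the --allow-grade-d flag in the table).
fun loaderAccepts(grade: Grade, allowGradeD: Boolean): Boolean =
    grade != Grade.D || allowGradeD
```

Ordering the `when` branches from most to least severe keeps a plug-in that both uses deprecated symbols and reflects into internals at grade D rather than C.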

---

## 5. Module structure (Gradle multi-project)

```
vibe-erp/
├── api/
│   └── api-v1/                 ← THE CONTRACT (semver-governed)
│
├── platform/                   ← Framework runtime (internal)
│   ├── platform-bootstrap/
│   ├── platform-http/          REST + OpenAPI + MCP host
│   ├── platform-security/      AuthN/AuthZ, tenant resolution, OIDC
│   ├── platform-persistence/   JPA, multi-tenant routing, RLS, Liquibase
│   ├── platform-metadata/      Doctype-equivalent metadata store
│   ├── platform-workflow/      Flowable host
│   ├── platform-i18n/          ICU MessageFormat, locale resolution
│   ├── platform-events/        In-process bus + outbox
│   ├── platform-jobs/          Quartz scheduler
│   ├── platform-reporting/     JasperReports
│   ├── platform-files/         Local + S3 abstraction
│   └── platform-plugins/       PF4J host, lifecycle, classloader isolation
│
├── pbc/                        ← Core PBCs (each = bounded context)
│   ├── pbc-identity/
│   ├── pbc-catalog/
│   ├── pbc-partners/
│   ├── pbc-inventory/
│   ├── pbc-warehousing/
│   ├── pbc-orders-sales/
│   ├── pbc-orders-purchase/
│   ├── pbc-production/
│   ├── pbc-quality/
│   └── pbc-finance/
│
├── reference-customer/         ← NOT shipped in core
│   └── plugin-printing-shop/   Real PF4J plug-in expressing the
│                               raw/业务流程设计文档/ workflow.
│                               Built and tested in CI; not loaded by default.
│
├── web/                        ← React + TypeScript SPA
│
└── docs/                       ← Framework documentation
```

### Dependency rule (strictly enforced)

```
api/api-v1            depends on: nothing (Kotlin stdlib + jakarta.validation only)
platform/*            depends on: api/api-v1 + Spring + libs
pbc/*                 depends on: api/api-v1 + platform/*  (NEVER another pbc)
plugins (incl. ref)   depend on:  api/api-v1 only
```

PBCs communicate **only** through (a) the event bus and (b) service interfaces declared in `api.v1.ext.<pbc>`. This is the rule that makes "modular monolith now, splittable later" real.
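
The dependency rule lends itself to a mechanical check. Below is a minimal sketch of such a check over a module-to-dependencies map; it is a hypothetical stand-in for whatever build-time enforcement is eventually used, it models only vibe_erp modules (not Spring or third-party libraries), and it assumes that `platform-*` modules may depend on each other, which the rule above does not state explicitly.

```kotlin
// Module names follow the Gradle layout above; the checker itself is a hypothetical
// stand-in for the real build-time enforcement.
fun violations(dependencies: Map<String, Set<String>>): List<String> {
    val found = mutableListOf<String>()
    for ((module, deps) in dependencies) for (dep in deps) {
        val allowed = when {
            module == "api-v1" -> false   // the contract depends on no vibe_erp module
            // Assumption: platform modules may depend on one another.
            module.startsWith("platform-") -> dep == "api-v1" || dep.startsWith("platform-")
            // A PBC may use the contract and the platform, never another PBC.
            module.startsWith("pbc-") -> dep == "api-v1" || dep.startsWith("platform-")
            // Plug-ins, including the reference plug-in, see the contract only.
            module.startsWith("plugin-") -> dep == "api-v1"
            else -> true
        }
        if (!allowed) found += "$module -> $dep"
    }
    return found
}
```

Running a check like this in CI turns the rule from a review convention into a build failure, which is the behavior the risk table in section 12 relies on.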

### Per-PBC layout (every PBC follows this)

```
pbc-orders-sales/
├── api/                ← service contracts re-exported by api.v1
├── domain/             ← entities, value objects, domain services
├── application/        ← use cases / application services
├── infrastructure/     ← Hibernate mappings, repositories
├── http/               ← REST controllers
├── workflow/           ← BPMN files, task handlers
├── metadata/           ← seed metadata (default forms, rules)
├── i18n/               ← message bundles
└── migrations/         ← Liquibase changesets (own table prefix)
```

---

## 6. The `api.v1` package

The single most important contract in the codebase. Everything in `api.v1` is binary-stable within the `1.x` line. Everything not in `api.v1` is internal and can change in any release.

```
org.vibeerp.api.v1
├── core/         Tenant, Locale, Money, Quantity, Id<T>, Result<T,E>
├── entity/       Entity, Field, FieldType, EntityRegistry
├── persistence/  Repository<T>, Query, Page, Transaction
├── workflow/     WorkflowTask, WorkflowEvent, TaskHandler
├── form/         FormSchema, UiSchema
├── http/         @PluginEndpoint, RequestContext, ResponseBuilder
├── event/        DomainEvent, EventListener, EventBus
├── security/     Principal, Permission, PermissionCheck
├── i18n/         MessageKey, Translator, LocaleProvider
├── reporting/    ReportTemplate, ReportContext
├── plugin/       Plugin, PluginManifest, ExtensionPoint
└── ext/          Typed extension interfaces a plug-in implements
```

`api.v1` is published as `api-v1.jar` to Maven Central so plug-in authors can build against it without pulling the entire vibe_erp source tree.

---

## 7. Plug-in lifecycle

```
1. Boot           ./plugins/*.jar scanned by platform-plugins
2. Manifest       plugin.yml read: id, version, requires-api, deps
3. Compatibility  rejected if requires-api ≠ current api.v1 major
4. Lint           rejected if it imports vibe_erp classes outside api.v1.*
5. Classload      PF4J creates an isolated classloader per plug-in
6. Register       plug-in's entry class implements api.v1.plugin.Plugin
                  and registers Extensions via @Extension
7. Wire           Spring child context per plug-in; plug-in's @Components
                  live there only
8. Migrate        plug-in's Liquibase changesets run in plugin_<id>__*
9. Seed metadata  plug-in's metadata YAML is upserted, tagged with plug-in id
10. Ready         endpoints, workflow tasks, forms, reports, listeners live
11. Disable       deregister, drop child context; data preserved
12. Uninstall     explicit operator action; only then is the schema dropped
```
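
Steps 3 and 4 of the lifecycle are pure checks that can run before any class is loaded. The sketch below assumes the `requires-api` value always has the form `"<major>.x"` (the only form shown in this document) and that the linter is handed a flat list of imported class names; `requiredMajor`, `precheck`, and the `LoadDecision` types are hypothetical.

```kotlin
// Hypothetical parser for the requires-api value; the spec shows the form "1.x" only.
fun requiredMajor(requiresApi: String): Int? =
    Regex("""(\d+)\.x""").matchEntire(requiresApi.trim())?.groupValues?.get(1)?.toIntOrNull()

sealed interface LoadDecision
object Accepted : LoadDecision
data class Rejected(val reason: String) : LoadDecision

// Steps 3 and 4: the API major must match, and no vibe_erp class outside
// org.vibeerp.api.v1 may be imported. Non-vibe_erp imports are not inspected here.
fun precheck(requiresApi: String, imports: List<String>, currentMajor: Int = 1): LoadDecision {
    val major = requiredMajor(requiresApi)
        ?: return Rejected("unparseable requires-api: $requiresApi")
    if (major != currentMajor) {
        return Rejected("requires api.v$major, host provides api.v$currentMajor")
    }
    val bad = imports.firstOrNull {
        it.startsWith("org.vibeerp.") && !it.startsWith("org.vibeerp.api.v1.")
    }
    return if (bad != null) Rejected("internal import: $bad") else Accepted
}
```

Rejecting before classloading keeps a mismatched plug-in from ever running code, consistent with the policy that mismatches fail at install and never at runtime.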

---

## 8. Data model and multi-tenancy

### Schema namespacing

PBCs and plug-ins use **table name prefixes**, not Postgres schemas:

```
identity__user, identity__role
catalog__item, catalog__item_attribute
inventory__stock_item, inventory__movement
orders_sales__order, orders_sales__order_line
production__work_order, production__operation
plugin_printingshop__plate_spec  (reference plug-in)
metadata__custom_field, metadata__form, metadata__workflow
flowable_*  (Flowable's own tables, untouched)
```

This keeps Hibernate, RLS policies, and migrations all in one logical schema (`public`), avoids `search_path` traps, and gives clean uninstall semantics.
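
The prefix convention can be captured in a few helper functions. This is a sketch only: the hyphen-to-underscore normalization is inferred from the examples above (`pbc-orders-sales` versus `orders_sales__order`), and `pbcTable`, `pluginTable`, and `ownerOf` are hypothetical names.

```kotlin
// Module names use hyphens (pbc-orders-sales) while table prefixes use underscores
// (orders_sales__order); this normalization is inferred from the examples, not specified.
fun pbcTable(pbc: String, entity: String): String =
    "${pbc.removePrefix("pbc-").replace('-', '_')}__$entity"

// Plug-in tables live under plugin_<id>__*, where <id> comes from the plug-in manifest.
fun pluginTable(pluginId: String, entity: String): String =
    "plugin_${pluginId}__$entity"

// Everything before the first double underscore identifies the owner, which is what
// makes "find all tables of plug-in X" a simple prefix match on uninstall.
fun ownerOf(table: String): String = table.substringBefore("__")
```

With these helpers, `pbcTable("pbc-orders-sales", "order_line")` yields `orders_sales__order_line`, matching the listing above.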

### Tenant isolation

- Every business table has `tenant_id`, NOT NULL
- Hibernate `@TenantId` filters every query at the application layer
- Postgres Row-Level Security policies filter every query at the database layer
- Two independent walls; a bug in one is not a data leak

Self-hosted single-customer = one tenant row called `default`. Hosted multi-tenant = many tenant rows. **Same code path.**
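
For the database-side wall, one possible shape is a helper that emits the Postgres DDL for each business table. The sketch below is an assumption-laden illustration: the session setting name `app.tenant_id`, the policy name `tenant_isolation`, and the function `rlsDdl` are all hypothetical, and the comparison is done on text because this document does not fix the type of `tenant_id`.

```kotlin
// Generates Postgres DDL for one table. The session setting name app.tenant_id and the
// policy name are assumptions; the spec only says RLS policies filter by tenant.
fun rlsDdl(table: String): List<String> {
    // Identifiers are interpolated into SQL, so anything unusual is refused outright.
    require(Regex("[a-z][a-z0-9_]*").matches(table)) { "unexpected table name: $table" }
    return listOf(
        "ALTER TABLE $table ENABLE ROW LEVEL SECURITY",
        // FORCE makes the policy apply to the table owner as well.
        "ALTER TABLE $table FORCE ROW LEVEL SECURITY",
        "CREATE POLICY tenant_isolation ON $table " +
            "USING (tenant_id::text = current_setting('app.tenant_id'))",
    )
}
```

Under this sketch the application would set `app.tenant_id` on the connection at the start of each transaction, so a query that slips past the Hibernate filter still returns only the current tenant's rows.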

### Custom fields

Every business table has:

```sql
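ext       jsonb   not null  default '{}',  -- tenant-defined custom fields, keyed by field name
ext_meta  text    generated                -- generated column; its expression is not specified in this spec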
```

Custom fields are JSON keys inside `ext`. A GIN index on `ext` makes them queryable. The `metadata__custom_field` table describes the JSON shape per entity per tenant. The form designer, list views, OpenAPI generator, and AI-agent function catalog all read from this table.

For the rare hot-path custom field, an operator can promote a JSON key to a real generated column via an auto-generated Liquibase changeset. This is an optimization, not the default.
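
Promoting a key could amount to emitting a single `ALTER TABLE` statement, which would then be wrapped in a Liquibase changeset. The sketch below shows only the SQL generation; the `ext_<key>` column naming and the function name `promoteColumnDdl` are assumptions, and the promoted column is typed as `text` for simplicity.

```kotlin
// Builds the DDL that turns ext ->> '<key>' into a stored generated column.
// The ext_<key> column name is a hypothetical convention, not taken from the spec.
fun promoteColumnDdl(table: String, key: String): String {
    // Both values end up inside SQL text, so only plain lowercase identifiers pass.
    val ident = Regex("[a-z][a-z0-9_]*")
    require(ident.matches(table) && ident.matches(key)) { "unsafe identifier" }
    return "ALTER TABLE $table ADD COLUMN ext_$key text " +
        "GENERATED ALWAYS AS (ext ->> '$key') STORED"
}
```

A stored generated column can carry an ordinary B-tree index, which is why promotion helps a hot-path field more than the shared GIN index does.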

### The metadata store

```
metadata__entity         metadata__form           metadata__permission
metadata__custom_field   metadata__list_view      metadata__role_permission
metadata__workflow       metadata__rule           metadata__menu
metadata__report         metadata__translation    metadata__plugin_config
```

Every row carries `tenant_id`, `source` (`core` / `plugin:<id>` / `user`), `version`, `is_active`. The `source` column makes uninstall/upgrade safe: removing a plug-in cleans up its metadata; user-created metadata is sacred.
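
The role of the `source` column during uninstall can be sketched in a few lines. `MetadataRow` is a simplified, hypothetical projection of a metadata row (the real tables carry more columns), and `afterUninstall` is an illustrative name.

```kotlin
// Simplified, hypothetical view of a metadata row; only the columns needed here.
data class MetadataRow(
    val table: String,     // e.g. "metadata__form"
    val key: String,
    val source: String,    // "core", "plugin:<id>", or "user"
    val isActive: Boolean = true,
)

// Removing a plug-in deletes only rows whose source is exactly "plugin:<id>";
// rows tagged "core" or "user" are never touched.
fun afterUninstall(rows: List<MetadataRow>, pluginId: String): List<MetadataRow> =
    rows.filterNot { it.source == "plugin:$pluginId" }
```

Matching on the exact tag rather than a prefix avoids removing rows of a different plug-in whose id merely starts with the same characters.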

### Migrations

- Each PBC owns a Liquibase changelog under `pbc-<name>/migrations/`
- Plug-ins ship their own changelogs inside their JAR
- Changesets are applied forward-only and are idempotent by default
- Each changeset must nevertheless declare a rollback block; CI rejects PRs without one
- Tenant onboarding is `INSERT INTO identity__tenant` + seed metadata, not a migration — sub-second

### Data sovereignty (sold worldwide)

- **Self-hosted** meets data-residency requirements by construction: the customer chooses where Postgres lives
- **Hosted** supports **per-region tenant routing**: each tenant row carries a region; `platform-persistence` routes connections to the right regional Postgres cluster
- **PII tagging** on field metadata (`pii: true`) drives auto-generated **DSAR exports** and **erasure jobs** (GDPR Articles 15/17)
- **Audit log** (`platform__audit`, append-only, monthly partitions) records access to PII fields when audit-strict mode is on
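
How PII tagging might drive the generated jobs is sketched below. `FieldMeta`, `dsarExport`, and `erase` are hypothetical names, and two simplifications are assumed: a DSAR export contains exactly the fields tagged `pii`, and erasure is modeled as replacing tagged values with null.

```kotlin
// Hypothetical projection of field metadata; pii mirrors the `pii: true` tag.
data class FieldMeta(val entity: String, val name: String, val pii: Boolean)

private fun piiFieldNames(meta: List<FieldMeta>, entity: String): Set<String> =
    meta.filter { it.entity == entity && it.pii }.map { it.name }.toSet()

// DSAR export for one record: only fields tagged pii are included (assumption).
fun dsarExport(meta: List<FieldMeta>, entity: String, record: Map<String, Any?>): Map<String, Any?> {
    val tagged = piiFieldNames(meta, entity)
    return record.filterKeys { it in tagged }
}

// Erasure is modeled as nulling tagged values and leaving everything else intact.
fun erase(meta: List<FieldMeta>, entity: String, record: Map<String, Any?>): Map<String, Any?> {
    val tagged = piiFieldNames(meta, entity)
    return record.mapValues { (name, value) -> if (name in tagged) null else value }
}
```

Since both functions read the same metadata, tagging a new custom field as PII would bring it into export and erasure without further code.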

---

## 9. Cross-cutting concerns

| Concern | Approach |
|---|---|
| Security | `PermissionCheck` declared in `api.v1.security`; plug-ins register their own permissions, auto-listed in role editor |
| Transactions | Spring `@Transactional` at application-service layer; plug-ins use `api.v1.persistence.Transaction`, never Spring directly |
| Audit | `created_at`, `created_by`, `updated_at`, `updated_by`, `tenant_id` on every entity, applied by JPA listener; plug-ins inherit by extending `api.v1.entity.AuditedEntity` |
| Events | Typed `DomainEvent`s on every state change; in-process bus by default; **outbox table** in Postgres for cross-crash reliability and as the seam where Kafka/NATS plugs in later without changing PBC code |
| AI-agent surface | Same business operations exposed through REST are exposable through an MCP server; v1.1 ships the MCP endpoint, v1.0 architects the seam |
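
The events row can be illustrated with a toy bus that appends to an outbox before delivering in-process. Everything here is a sketch: the `type` property on `DomainEvent`, the `OrderConfirmed` event, and the `OutboxBus` class are hypothetical, and the in-memory list merely stands in for the Postgres outbox table.

```kotlin
// DomainEvent is named in api.v1; the `type` property is an assumption of this sketch.
interface DomainEvent { val type: String }

data class OrderConfirmed(val orderId: String) : DomainEvent {
    override val type = "orders_sales.order_confirmed"
}

class OutboxBus {
    private val listeners = mutableMapOf<String, MutableList<(DomainEvent) -> Unit>>()

    // Stands in for the Postgres outbox table.
    val outbox = mutableListOf<DomainEvent>()

    fun subscribe(type: String, listener: (DomainEvent) -> Unit) {
        listeners.getOrPut(type) { mutableListOf() } += listener
    }

    // Append first (in a real system: in the same DB transaction as the state change),
    // then deliver to in-process listeners.
    fun publish(event: DomainEvent) {
        outbox += event
        listeners[event.type].orEmpty().forEach { it(event) }
    }
}
```

Because publishers only ever call `publish`, a later relay that drains the outbox to Kafka or NATS could be added without touching PBC code, which is the seam the table describes.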

---

## 10. Packaging and deployment

### Shipping artifact

**One Docker image** (`ghcr.io/vibeerp/vibe-erp:1.0.0`), plus an optional fat JAR for non-container environments.

```
/app/vibe-erp.jar
/app/api-v1.jar
/app/migrations/, /app/i18n/, /app/reports/   ← read-only
/opt/vibe-erp/                                ← customer-mounted volume
  ├── config/vibe-erp.yaml                    single config file
  ├── plugins/                                drop *.jar to install
  ├── i18n-overrides/
  ├── files/                                  if not using S3
  └── logs/
```

### Single config file (closed key set)

`vibe-erp.yaml` covers: instance mode, database, file store, auth, i18n, plugins, observability. Plug-ins read their own config from `metadata__plugin_config`, not from the YAML.
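
A closed key set implies that unknown top-level keys are errors rather than silently ignored settings. The sketch below assumes the YAML has already been parsed into a map; the section names mirror the list above, but their exact spellings, like the names `allowedSections` and `unknownSections`, are assumptions.

```kotlin
// The sections mirror the list above; the exact YAML key spellings are assumptions.
val allowedSections = setOf(
    "instance", "database", "files", "auth", "i18n", "plugins", "observability",
)

// A closed key set means unknown top-level keys are reported, not silently ignored.
fun unknownSections(parsedConfig: Map<String, Any?>): Set<String> =
    parsedConfig.keys - allowedSections
```

Failing startup when this set is non-empty would surface typos in `vibe-erp.yaml` at boot instead of as a silently ignored setting.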

### Install (1 command)

```bash
docker run -d --name vibe-erp \
  -p 8080:8080 \
  -v /srv/vibeerp:/opt/vibe-erp \
  -e DB_PASSWORD=... \
  ghcr.io/vibeerp/vibe-erp:1.0.0
```

First boot: connect → migrate → create `default` tenant → bootstrap admin → ready. The target is under 30 seconds.

### Upgrade (1 command)

`docker rm` + `docker run` with the new image tag. Within a major version, all plug-ins continue to load. Across a major version, `api.v1` and `api.v2` ship side by side for at least one major release. By default, an upgrade never destroys customer data.

### Upgrade contract

| Change | Allowed within 1.x? |
|---|---|
| Add a class to `api.v1` | yes |
| Add a method to an `api.v1` interface (with default impl) | yes |
| Remove or rename anything in `api.v1` | no — major bump |
| Change behavior of an `api.v1` symbol in a way plug-ins can observe | no — major bump |
| Anything in `platform.*` or `pbc.*.internal.*` | yes — that's why it's internal |
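
The "add a method with a default implementation" row can be shown at source level. `Translator` is named in the `api.v1` listing, but the signatures below and the `FixedTranslator` class are hypothetical. Note also that binary compatibility on the JVM depends on how the Kotlin compiler is configured to emit interface defaults, which this sketch does not cover.

```kotlin
// Suppose an early 1.x release of this hypothetical interface had only translate().
interface Translator {
    fun translate(key: String, locale: String): String?

    // Added in a later 1.x release: the default is expressed through the old method,
    // so existing implementers need no change.
    fun translateOrKey(key: String, locale: String): String = translate(key, locale) ?: key
}

// A plug-in class written against the earlier release: it knows nothing about
// translateOrKey() and still satisfies the interface.
class FixedTranslator(private val table: Map<String, String>) : Translator {
    override fun translate(key: String, locale: String): String? = table["$locale/$key"]
}
```

Removing or renaming `translate` would break `FixedTranslator`, which is why the table reserves such changes for a major bump.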

---

## 11. v1.0 cut line

### v1.0 ships

- Single Docker image, fat JAR alternative
- Core PBCs: identity, catalog, partners, inventory, warehousing, orders-sales, orders-purchase, production (basic), quality (basic), finance (basic)
- `api.v1` published to Maven Central
- PF4J plug-in loader with classloader isolation, manifest validation, lifecycle
- Metadata store: custom fields, forms, list views, simple rules
- Embedded Flowable + BPMN designer in web UI
- JSON Schema form designer in web UI
- Built-in JWT auth + OIDC SSO
- React web SPA covering all core PBCs and customization UIs
- REST + OpenAPI on every endpoint
- ICU i18n with shipping locales: `en-US`, `zh-CN`, `de-DE`, `ja-JP`, `es-ES`
- Reference printing-shop plug-in (built and CI-tested, not loaded by default)
- Liquibase migrations with mandatory rollback blocks
- Audit log, PII tagging, basic DSAR export
- Documentation site
- One-command install, one-command upgrade
- Health, metrics, structured logs

### v1.0 deferred (architecturally accommodated)

- React Native mobile app (v2)
- MCP server for AI agents (v1.1)
- Hosted multi-tenant deployment with per-region routing, billing, tenant provisioning UI (v2)
- Plug-in marketplace / signed plug-ins (v2)
- Webhooks-out and Kafka/NATS event streaming (v1.1, outbox seam already exists)
- Advanced finance: tax engines, multi-currency revaluation (v1.2+)
- Production scheduling / APS (v1.2+)
- Hot plug-in reload without restart (v1.2+)
- Full-text search beyond Postgres `tsvector` (v1.2+)

### Release policy

- Semver on `api.v1`. Major bumps overlap with previous major for ≥1 major release window
- Semver on the core image
- Plug-ins declare `requires-api: "1.x"`; mismatches fail at install, never at runtime
- Minor releases every 6 weeks
- LTS on every other major (`1.x`, `3.x`, `5.x`), supported 3 years

---

## 12. Risks and how the design addresses them

| Risk | Mitigation |
|---|---|
| Core gradually accreting printing-specific concepts | The dependency rule plus the reference plug-in: the plug-in may depend on `api.v1` only, so a feature that exists in core solely to serve it is a visible guardrail #1 violation and must move into `plugin-printing-shop`; reviewers must reject any printing terminology in `pbc/*` |
| Plug-in API churn breaks the ecosystem | `api.v1` is the only supported surface; plug-in linter rejects internal imports at install time; semver discipline + 1-major deprecation window |
| Cross-PBC coupling silently appears | Gradle dependency rule enforced by the build (`pbc-orders-sales` cannot declare `pbc-inventory` as a dependency); CI fails on violations |
| Multi-tenancy bug causes data leak in hosted version | Two independent walls (Hibernate filter + Postgres RLS); integration tests with multiple tenants in every PBC |
| "Workflows as data" turns into a custom DSL | BPMN 2.0 standard via Flowable; the temptation to invent a vibe_erp-only workflow language must be rejected |
| Metadata store becomes a write-once, read-by-no-one configuration graveyard | Every consumer (form renderer, list view, OpenAPI generator, AI function catalog, role editor) reads from it; no parallel sources of truth |
| JVM RAM cost makes self-hosting painful for small shops | Minimum spec documented (2 GB RAM, 1 vCPU); GraalVM native image to be evaluated for v2 |
| Customer wants a different DB | Hibernate alone would make Postgres-only a soft constraint, but the reliance on JSONB and RLS makes it a hard one; other DBs are explicitly unsupported in v1.0, and this is documented |

---

## 13. Verification (how the design will be proved out)

The design is verified by **building the framework AND simultaneously building the reference printing-shop plug-in**. The plug-in is the executable acceptance test:

- If the plug-in can express the workflows in `raw/业务流程设计文档/` using **only `api.v1`**, the framework is sufficient
- If the plug-in needs to reach into a `platform.*` or `pbc.*` internal class, the seam is wrong and `api.v1` needs to grow (deliberately)
- If a feature in `pbc/*` is only there to make the printing plug-in work, the design is failing guardrail #1 and the feature must move into the plug-in

CI runs the full vibe_erp test suite **and** loads `plugin-printing-shop` in an integration test environment, exercising its key flows end-to-end against a real Postgres.

---

## 14. What happens after this spec

This spec is the **architecture-level** design. It is NOT an implementation plan. The next steps are:

1. The user reviews this document and either approves it or requests changes
2. On approval, hand off to the **writing-plans** skill to produce a sequenced implementation plan, broken into work units (each PBC, each platform module, each major capability)
3. CLAUDE.md is updated to reflect the named patterns adopted here (Clean Core, two-tier extensibility, PBCs, `api.v1`, AI-agent seam)
4. The plan is executed incrementally, with the reference printing-shop plug-in built alongside the framework so the abstraction is constantly stress-tested