Manual Validation Plan

This document defines the manual validation passes to run against the current state of the game.

It is intentionally focused on behavior validation, not implementation details.

The goal is to verify that the simulation can perform the core actions of the game correctly before writing deeper automated simulation tests.

Purpose

This validation plan answers the following questions:

  • does the world boot cleanly and reproducibly?
  • can we create the minimum actors needed to exercise gameplay?
  • can ships receive and complete direct orders?
  • can ships run supported default behaviors without getting stuck?
  • do movement, mining, docking, and combat work at the simulation level?
  • does the viewer reflect the same state the backend is executing?

This document is the manual test source of truth for the current phase.

Later, these same checks should become simulation-first tests running directly against the real runtime.

Scope

This phase is intentionally centered on empty.json.

That narrow scope is appropriate for now.

The purpose of empty.json is to validate primitive actions and control behavior with minimal scenario noise.

It is not yet the basis for validating full economy, expansion, or long-horizon faction behavior.

Those should be validated later using richer scenarios after the primitives are trustworthy.

Current Baseline

Development startup currently loads:

The backend startup path is defined in:

World reset returns to the startup scenario through:

Environment

Manual runs should use a reproducible local development setup.

Suggested startup:

  1. Start postgres with ./scripts/start-postgres.sh
  2. Start backend in development mode
  3. Start the viewer
  4. Log in as a GM user
  5. Reset the world before each test pass

Relevant files:

Development GM credentials currently include:

  • gm / gm
  • admin / admin

Test Method

Each manual test should record:

  • setup
  • action
  • expected result
  • observed result
  • pass or fail
  • notes
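
The record fields above map naturally onto a small structure, which may also ease the later move to automated tests. The following is a minimal sketch only: `ManualTestRecord`, `Outcome`, and the example values are hypothetical and not part of the game codebase.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass
class ManualTestRecord:
    """One entry in the manual validation log (hypothetical shape)."""
    test_id: str                       # e.g. "V-020"
    setup: str
    action: str
    expected: List[str]
    observed: List[str] = field(default_factory=list)
    outcome: Optional[Outcome] = None  # stays None until the test is run
    notes: str = ""

    def is_complete(self) -> bool:
        # Usable as evidence only once observations and a verdict exist.
        return bool(self.observed) and self.outcome is not None


record = ManualTestRecord(
    test_id="V-020",
    setup="world reset; one faction and one ship spawned",
    action="queue fly-and-wait to a target position",
    expected=["order appears in the queue", "ship moves toward the target"],
)
record.observed = ["order appears in the queue", "ship never leaves idle"]
record.outcome = Outcome.FAIL
record.notes = "plan created but no subtask progress"
```

Keeping `expected` and `observed` as parallel lists makes it easy to see which expectation a failed observation belongs to.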

Recommended rule:

  • if a test leaves the world in a noisy or questionable state, reset before the next test

Recommended evidence to capture:

  • ship state
  • ship spatial state
  • active plan and subtasks
  • order queue
  • inventory changes
  • station docking state
  • viewer selection and inspector state

Phase 1: Boot And Baseline

These tests must pass before behavior testing has value.

V-001 Backend boots cleanly

Setup:

  • start backend in development mode

Expected:

  • startup succeeds
  • auth schema initializes
  • dev users seed
  • world loads from empty.json
  • no startup exception is thrown

V-002 Viewer connects and renders world

Setup:

  • start viewer and open the app

Expected:

  • world snapshot loads
  • live delta stream connects
  • no obvious contract mismatch or rendering crash appears

V-003 Reset returns world to clean baseline

Setup:

  • use the GM reset action

Expected:

  • world returns to startup scenario
  • previously spawned factions, ships, and stations are gone
  • sequence and snapshot refresh behave cleanly

V-004 Empty world is actually minimal

Setup:

  • inspect the world after reset

Expected:

  • systems, celestials, anchors, and resource nodes exist
  • no initial factions, stations, or ships exist unless intentionally seeded later

Phase 2: Minimal Actor Creation

These tests prove the empty world can be turned into a controlled validation sandbox.

V-010 Create a faction

Method:

  • use the GM faction creation flow

Relevant API:

  • POST /api/gm/factions

Expected:

  • the faction appears in the world
  • it is visible in the GM UI
  • no duplicate or invalid-creation error occurs for a valid faction id
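
For scripted setup, the `POST /api/gm/factions` endpoint listed above could be called directly instead of through the GM UI. The sketch below only builds the request. The base URL, the payload field name, and the absence of an authentication header are all assumptions; the real contract is defined by the backend.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local dev backend address


def build_gm_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a GM endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: the real field names come from the backend contract.
request = build_gm_request("/api/gm/factions", {"id": "validation-faction"})

# To send it against a running backend (GM authentication not shown):
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```

The same helper would apply to `POST /api/gm/ships` and `POST /api/gm/stations`, with payloads taken from the actual API contract.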

V-011 Spawn a ship

Method:

  • spawn a ship for the created faction in a known system

Relevant API:

  • POST /api/gm/ships

Expected:

  • the ship appears in the selected system
  • the ship has a valid id, faction, system, and spatial state
  • the viewer can select and inspect it

V-012 Spawn a station

Method:

  • spawn a station for the created faction in a known system

Relevant API:

  • POST /api/gm/stations

Expected:

  • the station appears in the world
  • the station has a valid anchor association or valid placement according to current runtime rules
  • the viewer can focus and inspect it

V-013 Spawn multiple ships of different roles

Method:

  • create at least:
    • one miner-capable ship
    • one combat-capable ship
    • one generic utility or trader if available

Expected:

  • each ship spawns without corrupting world state
  • each ship reports sensible movement, cargo, and behavior fields

Phase 3: Direct Order Validation

This phase validates immediate control and plan execution.

Relevant backend surface:

V-020 Queue a move or fly order

Method:

  • issue a direct move-style order to a ship
  • prefer fly-and-wait through the current viewer flow

Expected:

  • the order appears in the queue
  • an active plan is created
  • subtasks are coherent
  • the ship moves toward the target
  • the order eventually completes

Watch for:

  • ship never leaving idle
  • plan created but no subtask progress
  • target position mismatch
  • order stays executing forever
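
The "order stays executing forever" symptom can be judged more consistently if the tester samples the ship at intervals and applies a fixed rule. The following is a minimal sketch under stated assumptions: the `Sample` fields, the `"executing"` status string, and the thresholds are hypothetical and not taken from the runtime.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One observation of a ship, taken by hand or by a script."""
    time_s: float
    order_status: str          # assumed values: "queued", "executing", "completed"
    distance_to_target: float


def looks_stuck(samples: List[Sample], window_s: float, min_progress: float) -> bool:
    """True if the order has been executing for window_s without closing distance."""
    if not samples or samples[-1].order_status != "executing":
        return False
    latest = samples[-1]
    # Walk backwards to the newest sample that is at least window_s older.
    for earlier in reversed(samples[:-1]):
        if earlier.order_status != "executing":
            return False  # the order has not been executing for a full window
        if latest.time_s - earlier.time_s >= window_s:
            closed = earlier.distance_to_target - latest.distance_to_target
            return closed < min_progress
    return False  # not observed long enough to judge
```

A ship that is still closing on its target within the window is not flagged, so slow but valid movement does not count as a failure.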

V-021 Queue follow ship

Method:

  • spawn two ships
  • issue follow-ship from one to the other

Expected:

  • the follower tracks the target ship
  • the follower updates position as the target moves
  • no oscillation or runaway drift appears

V-022 Queue attack target

Method:

  • spawn two ships from opposing factions if required by current hostility logic
  • issue attack-target

Expected:

  • order is accepted
  • attacker closes to engagement range
  • combat state transitions occur
  • health changes on the target if combat is functioning

Watch for:

  • invalid target acceptance
  • attacker never approaching
  • attacker stuck in transit or wait state
  • combat order silently failing

V-023 Queue mine resource

Method:

  • issue mine-and-deliver against a valid resource in the current system

Expected:

  • ship selects a valid resource node or deposit
  • ship reaches the mining location
  • mining progress occurs
  • cargo increases
  • delivery or post-mining behavior is coherent

Watch for:

  • no valid mining target selected
  • ship arrives but never mines
  • cargo remains unchanged
  • order fails with missing target when a target exists

V-024 Queue dock and wait if station exists

Method:

  • spawn a station
  • issue an order whose path includes docking at that station

Expected:

  • ship requests or performs docking
  • docked state is visible
  • station dock count updates
  • undocking or wait completion works

V-025 Remove an order

Method:

  • queue an order, then remove it before completion

Expected:

  • the order is removed cleanly
  • the ship replans safely
  • the ship returns to default behavior or idle state
  • no orphan active subtasks remain

Phase 4: Default Behavior Validation

This phase validates autonomous ship control rather than one-shot direct orders.

Relevant backend surface:

Relevant viewer surfaces:

V-030 Hold position

Method:

  • set default behavior to hold-position

Expected:

  • ship remains stable
  • behavior is reflected in inspector state
  • no unintended autonomous orders are generated

V-031 Fly and wait behavior

Method:

  • set default behavior to fly-and-wait
  • provide a valid target position or object

Expected:

  • behavior-backed order is synthesized
  • ship moves to the target
  • ship waits as configured
  • behavior continues to own control after completion

V-032 Follow ship behavior

Method:

  • set default behavior to follow-ship

Expected:

  • managed follow behavior is generated
  • the ship stays near its target within reasonable tolerance

V-033 Patrol behavior

Method:

  • configure patrol points if supported through current UI flow

Expected:

  • patrol orders are generated from the behavior
  • ship cycles patrol movement cleanly
  • if a threat appears, patrol can interrupt into attack behavior as intended

V-034 Local auto mine

Method:

  • set local-auto-mine on a miner in a system with mineable nodes

Expected:

  • if a valid local mining target exists, the ship mines it
  • if no valid target exists, failure is readable and not destructive

Note:

  • current catalog marks this as partially supported

V-035 Advanced or expert auto mine

Method:

  • set advanced-auto-mine or expert-auto-mine

Expected:

  • behavior synthesizes a mine-and-deliver run
  • ship selects a resource source and delivery path
  • behavior can repeat after completion

V-036 Combat guard behaviors

Method:

  • validate one or more of:
    • protect-position
    • protect-ship
    • protect-station
    • police

Expected:

  • behavior creates managed guard or intercept orders
  • threat response is coherent
  • ship returns to guarding behavior after engagement if still valid

Phase 5: Spatial And Transit Validation

This phase validates the new universe-model runtime behavior.

Primary concern:

  • ships should behave as anchor-aware entities rather than generic free-flying system dots

V-040 Spatial state is coherent at rest

Expected:

  • a resting ship reports a sensible SpatialState
  • SpaceLayer, CurrentSystemId, CurrentAnchorId, and MovementRegime agree with the visible world state
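
One way to make "agree with the visible world state" concrete is to check the four fields against the anchor the ship claims to occupy. This is a sketch only: the field names come from the list above, but the dictionary representation, the example values, and the rule that an anchor belongs to exactly one system are assumptions.

```python
from typing import Dict, List, Optional


def spatial_state_problems(
    state: Dict[str, Optional[str]],
    anchor_system_by_id: Dict[str, str],
) -> List[str]:
    """Return readable inconsistencies in one resting ship's SpatialState."""
    problems = []
    for key in ("SpaceLayer", "CurrentSystemId", "MovementRegime"):
        if not state.get(key):
            problems.append(f"{key} is missing")

    anchor_id = state.get("CurrentAnchorId")
    if anchor_id is not None:
        anchor_system = anchor_system_by_id.get(anchor_id)
        if anchor_system is None:
            problems.append(f"anchor {anchor_id} is not in the world")
        elif anchor_system != state.get("CurrentSystemId"):
            problems.append(
                f"anchor {anchor_id} belongs to {anchor_system}, "
                f"not {state.get('CurrentSystemId')}"
            )
    return problems


# Hypothetical values; the real enum names come from the runtime.
anchors = {"anchor-1": "system-a"}
resting_ship = {
    "SpaceLayer": "local",
    "CurrentSystemId": "system-a",
    "CurrentAnchorId": "anchor-1",
    "MovementRegime": "idle",
}
```

An empty result means the snapshot is internally consistent; it does not prove the viewer shows the same thing, which still needs a visual check.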

V-041 Local movement remains local

Expected:

  • local movement updates local position coherently
  • the ship does not accidentally enter invalid transit state

V-042 Intra-system transit is explicit

Method:

  • send a ship between distant anchors if the current order flow supports it

Expected:

  • movement regime transitions are explicit
  • transit state reports origin, destination, and progress
  • arrival returns the ship to a valid anchor-local state
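
If transit state is sampled several times during a run, the expectations above can be checked mechanically. The sketch below assumes that progress is reported as a fraction between 0 and 1 and that each sample is an (origin, destination, progress) tuple; both are assumptions, not the runtime's actual representation.

```python
from typing import List, Tuple

TransitSample = Tuple[str, str, float]  # (origin, destination, progress)


def transit_problems(samples: List[TransitSample]) -> List[str]:
    """Return readable problems found in a sequence of transit samples."""
    problems = []
    for i, (origin, destination, progress) in enumerate(samples):
        if not 0.0 <= progress <= 1.0:
            problems.append(f"sample {i}: progress {progress} outside 0..1")
        if origin == destination:
            problems.append(f"sample {i}: origin equals destination")
        if i > 0:
            prev_origin, prev_destination, prev_progress = samples[i - 1]
            if (origin, destination) != (prev_origin, prev_destination):
                problems.append(f"sample {i}: route changed mid-transit")
            if progress < prev_progress:
                problems.append(f"sample {i}: progress went backwards")
    return problems
```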

V-043 Inter-system travel if available

Method:

  • attempt a cross-system route through current supported mechanics

Expected:

  • system change happens through a coherent transit path
  • no entity duplication or dropped ship occurs

Phase 6: Docking, Cargo, And Station Interaction

These tests prove basic station interaction works.

V-050 Docking updates both sides

Expected:

  • ship shows docked station id
  • station docked ship list updates
  • dock count changes are visible in the viewer
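
"Both sides" means the ship's docked station id and the station's docked ship list must describe the same relationship. A minimal sketch of that cross-check follows; the two dictionaries are a hypothetical flattening of the inspector data, not the runtime's actual types.

```python
from typing import Dict, List, Optional, Set


def docking_mismatches(
    ship_docked_at: Dict[str, Optional[str]],
    station_docked_ships: Dict[str, Set[str]],
) -> List[str]:
    """Compare the ship-side and station-side views of docking."""
    mismatches = []
    # Ship says it is docked, but the station does not list it.
    for ship_id, station_id in ship_docked_at.items():
        if station_id is None:
            continue
        if ship_id not in station_docked_ships.get(station_id, set()):
            mismatches.append(f"{ship_id} claims {station_id}, station disagrees")
    # Station lists a ship that does not report being docked there.
    for station_id, ship_ids in station_docked_ships.items():
        for ship_id in sorted(ship_ids):
            if ship_docked_at.get(ship_id) != station_id:
                mismatches.append(f"{station_id} lists {ship_id}, ship disagrees")
    return mismatches
```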

V-051 Cargo transfer changes inventory

Method:

  • use a mining or delivery flow involving a station

Expected:

  • ship inventory changes
  • station inventory changes
  • transfer is not purely cosmetic
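
"Not purely cosmetic" can be checked by comparing inventories before and after the transfer: whatever leaves the ship should arrive at the station. The sketch below assumes inventories are simple ware-to-units maps and that nothing else produces or consumes wares at the station during the transfer; both are assumptions for illustration.

```python
from typing import Dict

Inventory = Dict[str, int]  # ware id -> units


def delta(before: Inventory, after: Inventory) -> Inventory:
    """Per-ware change between two inventory snapshots, omitting zero changes."""
    wares = set(before) | set(after)
    changes = {w: after.get(w, 0) - before.get(w, 0) for w in wares}
    return {w: d for w, d in changes.items() if d != 0}


def transfer_is_real(
    ship_before: Inventory,
    ship_after: Inventory,
    station_before: Inventory,
    station_after: Inventory,
) -> bool:
    """True if the ship's inventory changed and the station mirrors the change."""
    ship_delta = delta(ship_before, ship_after)
    station_delta = delta(station_before, station_after)
    if not ship_delta:
        return False  # nothing moved on the ship side
    wares = set(ship_delta) | set(station_delta)
    return all(ship_delta.get(w, 0) + station_delta.get(w, 0) == 0 for w in wares)
```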

V-052 Invalid docking fails cleanly

Method:

  • attempt docking or a delivery path with a ship or station that should not support it

Expected:

  • failure is visible and readable
  • ship does not become stuck in permanent docking state

Phase 7: Combat Validation

These tests are still primitive in the empty-world phase.

The goal is not full tactical balance.

The goal is to prove the combat loop exists and behaves coherently.

V-060 Attack order enters engagement

Expected:

  • attacker closes on target
  • attack state appears
  • target health changes if weapons and hostility permit combat

V-061 Combat resolves to a stable end state

Expected:

  • one of the following happens cleanly:
    • target destroyed
    • attacker disengages
    • order fails with a readable reason

No permanent broken state should remain.

V-062 Non-combat ship does not behave like a combat ship

Method:

  • apply combat pressure to a non-combat or civilian ship if possible

Expected:

  • behavior is limited, defensive, or clearly incapable
  • it should not unrealistically perform like a dedicated combat hull unless current design says it can

Phase 8: Invalid And Edge Cases

These are mandatory because many simulation regressions hide in failure handling rather than happy paths.

V-070 Invalid target order

Method:

  • send an order with a missing or invalid target

Expected:

  • backend rejects the order or marks it failed cleanly
  • no corrupted plan remains

V-071 Remove target during execution

Method:

  • destroy or invalidate the target context while a ship is executing an order

Expected:

  • ship replans or fails safely
  • no null-state or endless execution loop appears

V-072 Reset during active simulation

Method:

  • reset the world while ships are active

Expected:

  • viewer refreshes cleanly
  • no stale selected entity state causes crashes
  • world stream recovers to fresh baseline

V-073 Behavior with impossible prerequisites

Method:

  • assign a behavior that requires a target, station, or ware that is not available

Expected:

  • failure is readable
  • ship falls back safely
  • behavior does not create runaway order spam

Phase 9: Viewer-State Validation

The simulation may be correct while the viewer is misleading.

That is still a failure.

V-080 Inspector reflects real ship state

Expected:

  • order queue, active plan, subtasks, inventory, health, and spatial state match observed behavior

V-081 Selection survives world updates

Expected:

  • selecting a ship or station remains stable through normal delta updates

V-082 Focus and follow modes remain usable

Expected:

  • camera focus and tracking do not break during movement, docking, or combat

V-083 Context actions target the intended entity

Method:

  • use context menu actions such as:
    • mine resource
    • fly to and wait
    • follow ship
    • attack

Expected:

  • the generated order matches the selected target
  • the resulting ship action matches the command label

Run in this order:

  1. boot and reset validation
  2. faction creation
  3. ship spawn
  4. station spawn
  5. direct navigation order
  6. direct mining order
  7. docking and delivery
  8. direct attack order
  9. default behavior checks
  10. edge and failure checks
  11. viewer consistency pass

Minimum Pass Criteria

The current basis of the game should be considered working only if all of the following are true:

  • world startup and reset are reliable
  • actors can be spawned into an empty baseline
  • at least one ship can move successfully
  • at least one ship can mine successfully
  • at least one ship can attack successfully
  • at least one ship can dock and transfer inventory successfully
  • direct orders can be added and removed cleanly
  • default behaviors can control ships without obvious stuck states
  • viewer state remains trustworthy during all of the above

Failure Reporting

When a test fails, record:

  • test id
  • exact setup
  • exact action taken
  • whether failure happened in backend, simulation behavior, or viewer representation
  • whether reset recovers the world cleanly
  • likely regression area if visible from inspector or logs

Suggested failure categories:

  • startup
  • API contract
  • planning
  • subtask execution
  • movement
  • docking
  • mining
  • combat
  • inventory
  • viewer sync
  • reset or stream lifecycle
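
Recording failures in a fixed shape makes it easier to see which area regresses most often across passes. The sketch below mirrors the report fields and categories listed above; the class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List


class Category(Enum):
    STARTUP = "startup"
    API_CONTRACT = "API contract"
    PLANNING = "planning"
    SUBTASK_EXECUTION = "subtask execution"
    MOVEMENT = "movement"
    DOCKING = "docking"
    MINING = "mining"
    COMBAT = "combat"
    INVENTORY = "inventory"
    VIEWER_SYNC = "viewer sync"
    RESET_OR_STREAM = "reset or stream lifecycle"


class Layer(Enum):
    BACKEND = "backend"
    SIMULATION = "simulation behavior"
    VIEWER = "viewer representation"


@dataclass
class FailureReport:
    test_id: str
    setup: str
    action: str
    layer: Layer                  # where the failure happened
    reset_recovers: bool          # whether a reset restores a clean world
    category: Category
    likely_regression_area: str = ""


def failures_by_category(reports: List[FailureReport]) -> Counter:
    """Count reports per category to highlight the weakest area."""
    return Counter(report.category for report in reports)
```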

Follow-Up

After this manual pass stabilizes:

  1. turn the most important Phase 1 through Phase 4 checks into runtime-level simulation tests
  2. prefer real simulation execution over mocked unit tests
  3. add richer scenario validation only after primitive behavior passes consistently

That next phase should validate composed loops:

  • mine -> dock -> unload
  • trade route -> station inventory update
  • construction support
  • guard and intercept response
  • longer-run autonomous behavior without manual intervention