Administrators' Guide¶
This guide is for technical volunteers responsible for deploying and maintaining the Equipment Status Board. It covers Docker deployment, environment configuration, Slack App setup, and ongoing maintenance.
Prerequisites¶
Before you begin, ensure you have:
- Docker and Docker Compose installed on the server
- Git for cloning the repository
- A server or machine on the makerspace local network (or accessible to members)
- A Slack workspace for Slack integration (check current Slack plan requirements for Socket Mode at api.slack.com)
Installation & Deployment¶
1. Clone the Repository¶
git clone https://github.com/jantman/equipment-status-board.git
cd equipment-status-board
2. Configure Environment Variables¶
cp .env.example .env
Edit .env and set the required values. See the Environment Variable Reference below for details on each variable.
At minimum, you must change:
- `SECRET_KEY` — Set to a random string for production (e.g., `python3 -c "import secrets; print(secrets.token_hex(32))"`)
- `MARIADB_ROOT_PASSWORD` — Set a strong database password
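Both values can be generated in one go and pasted into `.env`; a sketch (`token_urlsafe` for the DB password is just one reasonable choice):

```shell
# Generate a 64-hex-char SECRET_KEY and a random database password.
SECRET_KEY=$(python3 -c "import secrets; print(secrets.token_hex(32))")
MARIADB_ROOT_PASSWORD=$(python3 -c "import secrets; print(secrets.token_urlsafe(24))")
# Paste the printed lines into .env (do not commit them to git).
echo "SECRET_KEY=${SECRET_KEY}"
echo "MARIADB_ROOT_PASSWORD=${MARIADB_ROOT_PASSWORD}"
```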
3. Start All Services¶
docker compose up -d
This starts the services defined in docker-compose.yml: the web application, the MariaDB database, the background notification worker, and the autoheal sidecar.
4. Run Database Migrations¶
docker compose exec app flask db upgrade
This creates all required database tables.
5. Create the First Staff User¶
docker compose exec app flask seed-admin <username> <email> --password <password> [--slack-handle <handle>]
For example:
docker compose exec app flask seed-admin admin admin@example.com --password changeme123 --slack-handle @adminuser
This creates a user with the Staff role who can then log in and create additional users through the web interface.
The `--slack-handle` option is optional but recommended if your workspace uses Slack integration. Setting it enables the system to send password reset notifications to the user via Slack DM. The handle should include the `@` prefix (e.g. `@username`). The Slack handle can also be set or updated later via the admin UI at Admin → Users.
6. Verify¶
Open http://localhost:5000 in a browser (or the server's IP/hostname on port 5000). You should see the status dashboard. Log in with the Staff user you just created.
Environment Variable Reference¶
| Variable | Description | Required | Default | Example |
|---|---|---|---|---|
| `SECRET_KEY` | Flask secret key for session signing. Must be random in production. | Yes | `dev-secret-change-me` | `a1b2c3d4e5f6...` (use `python3 -c "import secrets; print(secrets.token_hex(32))"`) |
| `DATABASE_URL` | SQLAlchemy database connection URL. In Docker, the hostname is `db`. | Yes | `mysql+pymysql://root:esb_dev_password@localhost/esb` | `mysql+pymysql://root:yourpassword@db/esb` |
| `ESB_BASE_URL` | Externally-reachable base URL of this ESB instance. Used as the prefix for QR code target URLs (the URL members' phones open when they scan a printed QR label). Must be set to enable QR code generation; otherwise the "Generate QR Code" button on each equipment detail page is disabled. Inside a container the request host is unreliable, so this must be set explicitly. Trailing slashes are stripped; must be an `http(s)://host[:port]` URL with no path, query, fragment, or credentials. | Yes | (empty) | `http://esb.example.com:8080` |
| `MARIADB_ROOT_PASSWORD` | Root password for the MariaDB container. Must match the password in `DATABASE_URL`. | Yes | `esb_dev_password` | `strong-random-password` |
| `UPLOAD_PATH` | Directory for uploaded files (photos, documents). Relative to app root or absolute path. | No | `uploads` | `/app/uploads` |
| `UPLOAD_MAX_SIZE_MB` | Maximum upload file size in megabytes. | No | `500` | `100` |
| `SLACK_BOT_TOKEN` | Slack Bot User OAuth Token. Leave empty to disable Slack integration. | No | (empty) | `xoxb-1234567890-...` |
| `SLACK_APP_TOKEN` | Slack App-Level Token for Socket Mode. Required for Slack integration. Leave empty to disable. | No | (empty) | `xapp-1-...` |
| `SLACK_SOCKET_MODE_CONNECT` | Set to `true` to enable the Socket Mode WebSocket connection. Only the app container should set this; worker and other services should leave it unset. | No | (empty) | `true` |
| `SLACK_OOPS_CHANNEL` | Slack channel for cross-area notifications. Can be set in `.env` (not included in `.env.example` by default). | No | `#oops` | `#equipment-alerts` |
| `STATIC_PAGE_PUSH_METHOD` | How to publish the static status page. Options: `local` (write to directory), `s3` (upload to S3 bucket via boto3), or `gcs` (upload to Google Cloud Storage bucket). | No | `local` | `s3` |
| `STATIC_PAGE_PUSH_TARGET` | Target for static page push. For `local`: a directory path. For `s3` and `gcs`: `bucket-name/optional/key/path` (key defaults to `index.html`). | No | (empty) | `my-status-bucket/index.html` |
| `CLOUDFRONT_DISTRIBUTION_ID` | CloudFront distribution ID. Only meaningful when `STATIC_PAGE_PUSH_METHOD=s3`. When set, a CloudFront invalidation is issued for the uploaded key after every successful S3 upload, so the CDN serves the just-uploaded content immediately. Requires the IAM principal to have `cloudfront:CreateInvalidation` on the distribution. The AWS Free Tier covers 1000 invalidation paths per month; pushes more frequently than that will incur per-invalidation charges. | No | (empty) | `EDFDVBD6EXAMPLE` |
| `FLASK_APP` | Flask application entry point. Do not change. | No | `esb:create_app` | `esb:create_app` |
| `FLASK_DEBUG` | Enable Flask debug mode. Set to `0` in production. | No | `1` | `0` |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 static page push. Only needed if `STATIC_PAGE_PUSH_METHOD=s3` and not using an IAM role. | No | (empty) | `AKIAIOSFODNN7EXAMPLE` |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 static page push. Only needed if `STATIC_PAGE_PUSH_METHOD=s3` and not using an IAM role. | No | (empty) | `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to Google Cloud service account JSON key file. Only needed if `STATIC_PAGE_PUSH_METHOD=gcs` and not using instance metadata or Workload Identity. | No | (empty) | `/path/to/service-account.json` |
| `NEW_RELIC_LICENSE_KEY` | New Relic license key. Enables APM and browser monitoring when set. Leave empty to disable. | No | (empty) | `abc123def456...` |
| `NEW_RELIC_APP_NAME` | Application name shown in the New Relic dashboard. | No | `Equipment Status Board` | `ESB Production` |
| `TZ` | IANA timezone name for the worker container. Controls the timezone displayed in the static status page's generation timestamp (sub-heading near the top of the page) and the year used in the footer. Set this to your local timezone for accurate display. The worker service is the only consumer; if you set `TZ` via `.env` it will also propagate to the app container (currently unused there) since both services load the same env file. | No | `America/New_York` | `America/Chicago` |
Warning
Always set SECRET_KEY to a unique random value in production. The default value is insecure and only suitable for development.
Warning
Set FLASK_DEBUG=0 in production. Debug mode exposes detailed error pages and enables the interactive debugger.
Docker Services¶
The application runs as four Docker containers defined in docker-compose.yml:
App Service¶
The main web application. Runs Flask via Gunicorn with 1 worker process and 2 threads on port 5000.
- Image: Built from the project `Dockerfile` (Python 3.14-slim base)
- Port: 5000 (mapped to host)
- Volume: `./uploads` bind mount for persistent file storage (uploaded photos and documents)
- Depends on: `db` service (waits for healthy database)
Database Service¶
MariaDB 12.2.2 database server. Stores all application data.
- Image: `mariadb:12.2.2`
- Volume: `mariadb_data` named volume for persistent data storage
- Health check: Pings the database every 10 seconds to verify availability
- Port: Not mapped to host (only accessible from other containers)
Worker Service¶
Background notification processor. Polls the database every 30 seconds for pending notifications and delivers them via Slack.
- Image: Same as the app service
- Command: `flask worker run`
- Depends on: `db` service
- Healthcheck: The worker writes `/tmp/worker_heartbeat` at three points: once at startup, once after each DB poll returns, and once after each individual notification is processed. Docker reports the container as unhealthy if the heartbeat file is older than 180 seconds, which catches a wedged loop (e.g. a silently dropped DB connection or a single Slack call hung past its timeout). Refreshing per-notification — rather than only at the end of an iteration — means a legitimately long batch of slow Slack calls cannot falsely trip the healthcheck.
Autoheal Sidecar¶
Docker on its own does not restart unhealthy containers — it only marks them unhealthy. The autoheal service (willfarrell/autoheal) watches for containers labelled autoheal=true (the worker and app services) and restarts any that go unhealthy. It needs the host's Docker socket mounted so it can issue restart commands:
autoheal:
image: willfarrell/autoheal:latest
environment:
- AUTOHEAL_CONTAINER_LABEL=autoheal
volumes:
- /var/run/docker.sock:/var/run/docker.sock
restart: unless-stopped
If you do not want autoheal running on your host, you can remove the service from docker-compose.yml; the worker's healthcheck will still reflect status in docker compose ps, you'll just need to restart it manually when it goes unhealthy.
All four services have a restart policy of unless-stopped, meaning they automatically restart after crashes or host reboots (unless explicitly stopped).
Runtime Dependencies¶
The application Docker image includes these key Python packages:
- Flask — Web framework
- SQLAlchemy / Flask-SQLAlchemy — Database ORM
- PyMySQL — MariaDB database driver
- slack-bolt / slack_sdk — Slack integration (slash commands, modals, events via Socket Mode)
- websocket-client — WebSocket transport for Slack Socket Mode
- boto3 — AWS S3 client for static page push (when using the `s3` method)
- google-cloud-storage — Google Cloud Storage client for static page push (when using the `gcs` method)
- qrcode[pil] — QR code generation for equipment pages
- newrelic — New Relic APM and browser monitoring agent (optional, activated by `NEW_RELIC_LICENSE_KEY`)
- gunicorn — Production WSGI server
Slack App Configuration¶
Slack integration is optional — the core web application works without it. If you want Slack commands, notifications, and the status bot, follow these steps.
1. Create a Slack App¶
- Go to api.slack.com/apps and click Create New App
- Choose From scratch
- Name the app (e.g., "Equipment Status Board") and select your workspace
2. Configure Bot Token Scopes¶
Under OAuth & Permissions, add these Bot Token OAuth Scopes:
- `chat:write` — Send messages and notifications
- `commands` — Register slash commands
- `users:read` — Look up user information
- `users:read.email` — Look up users by email
- `im:write` — Send direct messages (for temporary password delivery)
3. Enable Socket Mode¶
- Go to Settings > Socket Mode in the Slack App settings
- Turn on Enable Socket Mode
- Create an App-Level Token with the `connections:write` scope
- Name it (e.g., "esb-socket") and copy the token (starts with `xapp-`)
4. Set Up Slash Commands¶
Under Slash Commands, create four commands:
| Command | Description |
|---|---|
| `/esb-report` | Report an equipment problem |
| `/esb-status` | Check equipment status (area or equipment name) |
| `/esb-repair` | Technician dispatcher (no args) or create a repair record (with arg) |
| `/esb-update` | Update a repair record (full edit) |
With Socket Mode enabled, slash commands are automatically routed to your app via WebSocket. No Request URL is needed.
5. Enable Event Subscriptions¶
Under Event Subscriptions:
- Turn on Enable Events
Event subscriptions are not currently required but may be used for future features.
6. Install the App¶
- Go to Install App and click Install to Workspace
- Authorize the permissions
7. Copy Credentials¶
After installation:
- Copy the Bot User OAuth Token (starts with `xoxb-`) and set it as `SLACK_BOT_TOKEN` in your `.env`
- Copy the App-Level Token (starts with `xapp-`, created in step 3) and set it as `SLACK_APP_TOKEN` in your `.env`
- Restart the app and worker: `docker compose restart app worker`
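Before restarting, a quick prefix check catches swapped or truncated tokens. A minimal sketch with placeholder values (substitute the values from your `.env`):

```shell
# Sanity-check Slack token prefixes: bot tokens start with xoxb-, app tokens with xapp-.
check_slack_tokens() {
  case "$1" in xoxb-*) ;; *) echo "SLACK_BOT_TOKEN should start with xoxb-"; return 1;; esac
  case "$2" in xapp-*) ;; *) echo "SLACK_APP_TOKEN should start with xapp-"; return 1;; esac
  echo "token prefixes look right"
}
check_slack_tokens "xoxb-1234567890-example" "xapp-1-example"
```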
Note
Socket Mode uses an outbound WebSocket connection — no public URL or reverse proxy is needed. Your ESB server can remain on a private network.
Notification Trigger Configuration¶
Slack outbound notifications are governed by per-event app-config keys. Each is a boolean stored in app_config (string 'true' / 'false') and toggled via the admin UI at Admin → App Configuration. All five default to 'true', so a fresh deployment inherits notifications automatically.
| Config key | Default | Fires on |
|---|---|---|
| `notify_new_report` | `'true'` | A new problem report is filed (member or technician path). |
| `notify_resolved` | `'true'` | A repair record's status transitions to a closed status (Resolved, Closed - Duplicate, Closed - No Issue Found). |
| `notify_severity_changed` | `'true'` | A repair record's severity level changes. |
| `notify_status_changed` | `'true'` | A repair record's status changes between open states (e.g., New → In Progress, Assigned → Parts Needed). Closed-status transitions go through `notify_resolved` instead, so disabling this key does not silence resolutions. |
| `notify_eta_updated` | `'true'` | A repair record's ETA is set or changed. |
If `notify_resolved` is `'false'` and a status transition lands on a closed status, no notification fires; the elif chain does not fall through to `notify_status_changed`.
Static Status Page Setup¶
The static status page provides a lightweight, externally accessible version of the equipment status dashboard. It is regenerated and pushed automatically whenever equipment status changes.
Configuration¶
Set the push method via the STATIC_PAGE_PUSH_METHOD environment variable:
- `local` — Writes the static page to a local directory specified by `STATIC_PAGE_PUSH_TARGET`. Useful for serving from a local web server or shared drive.
- `s3` — Uploads the static page to an S3 bucket specified by `STATIC_PAGE_PUSH_TARGET`. Requires AWS credentials configured in the environment (via `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` or an IAM role). Optionally set `CLOUDFRONT_DISTRIBUTION_ID` to also issue a CloudFront invalidation for the uploaded key after every successful upload (requires `cloudfront:CreateInvalidation` on the distribution).
- `gcs` — Uploads the static page to a Google Cloud Storage bucket specified by `STATIC_PAGE_PUSH_TARGET`. Uses Google's default credential chain (`GOOGLE_APPLICATION_CREDENTIALS` environment variable, GCE instance metadata, or Workload Identity). When using Docker with a service account key file, add a volume mount for the credentials file in docker-compose.yml (e.g., `- ./service-account.json:/app/service-account.json:ro`) and set `GOOGLE_APPLICATION_CREDENTIALS=/app/service-account.json`.
The static page is pushed by the background worker whenever it detects a status change during its polling cycle.
The static page's generation timestamp reflects the worker container's TZ environment variable. The variable resolves against the OS tzdata database (/usr/share/zoneinfo), which is provided by the tzdata system package. Both the python:3.14-slim base image and this image's Dockerfile install list include tzdata; do not remove it. To use a non-default zone, set TZ in .env before running docker compose up.
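A zone name can be checked against the OS tzdata database before deploying; a sketch, runnable on any Linux host with tzdata installed:

```shell
# Verify an IANA zone name resolves against the OS tzdata database.
test -f /usr/share/zoneinfo/America/Chicago && echo "zone file present"
TZ=America/Chicago date +%Z   # prints CST or CDT depending on the season
```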
New Relic Monitoring (Optional)¶
New Relic integration provides server-side APM (Application Performance Monitoring) and browser monitoring for end users. When enabled, it automatically instruments the Flask application, background worker, and injects browser monitoring JavaScript into all pages.
Enabling New Relic¶
Set the NEW_RELIC_LICENSE_KEY environment variable in your .env file:
NEW_RELIC_LICENSE_KEY=your-license-key-here
NEW_RELIC_APP_NAME=Equipment Status Board
Restart all services after updating:
docker compose restart app worker
Both the web application and background worker will begin reporting to New Relic. Browser monitoring JavaScript is automatically injected into every page served by the application.
Verifying¶
After enabling, check the New Relic dashboard for your application name. You should see:
- APM data — web transactions, throughput, error rates, and response times
- Browser data — page load times, JavaScript errors, and AJAX calls from end users
If no data appears, check the app and worker logs for New Relic-related errors:
docker compose logs app | grep -i "new.relic\|newrelic"
Disabling¶
To disable New Relic, remove or comment out NEW_RELIC_LICENSE_KEY in your .env file and restart the services. When the license key is not set, no New Relic code is loaded and there is zero performance impact.
Monitoring and Alerting¶
Overview¶
ESB exposes Prometheus metrics on /metrics (unauthenticated; trusted-network deployment). Both the app and worker containers run with PYTHONUNBUFFERED=1 so log lines reach Loki/Promtail without buffering latency. The metrics are designed for direct Grafana panel use. This section is complementary to the optional New Relic integration above; it gives recommended signals, not a turnkey configuration.
Prometheus Metrics¶
Example scrape config:
scrape_configs:
- job_name: esb
metrics_path: /metrics
static_configs:
- targets: ['esb.example.com:5000']
| Metric | Type | Description | Emission |
|---|---|---|---|
| `esb_pending_notifications_count` | gauge | Number of rows in `pending_notifications` with `status='pending'` | Always |
| `esb_oldest_pending_notification_timestamp_seconds` | gauge | Unix epoch seconds of the oldest pending row's `created_at` | Omitted when queue empty (alert with `absent()`) |
| `esb_worker_last_iteration_timestamp_seconds` | gauge | Unix epoch seconds of the worker's last successful poll cycle (read from `AppConfig.value`) | Omitted when worker has never run, or when the AppConfig query fails (alert with `absent()`, `for: 5m` minimum) |
| `esb_socket_mode_enabled` | gauge | 1 if `init_slack` entered the Socket Mode setup block (tokens set, not TESTING, opt-in flag true); 0 otherwise | Always |
| `esb_socket_mode_connected` | gauge | 1 if a Bolt `SocketModeHandler` is currently bound; 0 otherwise. Transitions 1→0 at process shutdown. | Always |
Example alert rules:
- alert: ESBNotificationQueueStuck
expr: time() - esb_oldest_pending_notification_timestamp_seconds > 300
for: 1m
annotations:
summary: "ESB notification worker is not draining the queue"
- alert: ESBWorkerStalled
expr: time() - esb_worker_last_iteration_timestamp_seconds > 120
for: 1m
annotations:
summary: "ESB notification worker has not iterated in 2+ minutes"
- alert: ESBWorkerNeverRan
expr: absent(esb_worker_last_iteration_timestamp_seconds)
for: 5m
annotations:
summary: "ESB worker has not produced a heartbeat row since deploy (or DB reset / transient query failure)"
- alert: ESBSocketModeFailedAtBoot
expr: (esb_socket_mode_enabled == 1 and esb_socket_mode_connected == 0) unless on(instance) up == 0
for: 5m
annotations:
summary: "ESB intended to run Slack Socket Mode but the handler failed at boot"
ESBWorkerStalled and ESBWorkerNeverRan are complementary and should both be loaded. ESBWorkerStalled detects "worker was alive recently but stopped iterating" — fires on a normal stall but doesn't fire when the metric is missing entirely. ESBWorkerNeverRan detects the metric-missing case — fires on cold-deploy time-to-first-poll AND on transient AppConfig query failures. Together they cover the full failure space.
Clock skew
time() - <gauge> rules mix Prometheus's clock with the worker container's clock. Run NTP on every node and pick the threshold ≥ 4× poll_interval (so 120s for the default 30s). Note the failure asymmetry: if the worker's clock runs behind Prometheus's, the rule fires aggressively; if the worker's clock runs ahead, time() - gauge goes negative and the rule is silent forever. NTP is mandatory, not optional.
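The asymmetry is plain arithmetic. A sketch with an example 30 s elapsed time and a worker clock skewed 60 s in each direction:

```shell
# time()-vs-gauge arithmetic with the worker clock skewed each way.
now=1700000000        # Prometheus's time() at evaluation (arbitrary epoch value)
elapsed=30            # seconds since the worker's last real iteration
for skew in -60 0 60; do                # worker clock relative to Prometheus
  gauge=$(( now - elapsed + skew ))     # timestamp the worker would have written
  echo "skew=${skew}s -> time()-gauge = $(( now - gauge ))s"
done
# A behind clock (-60) inflates the expression (alert fires early);
# an ahead clock (+60) deflates it, potentially below zero (alert never fires).
```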
Single-worker assumption
These metrics assume the current single-gunicorn-worker deployment (--workers 1). Scaling app-side gunicorn workers makes the Socket Mode metrics non-deterministic across scrapes.
Information disclosure
The Socket Mode gauges let any unauthenticated reader of /metrics distinguish "Slack not configured" from "Slack configured but failed at boot" from "Slack working." Acceptable on a trusted network; something to be aware of if /metrics is ever exposed more broadly.
ESBSocketModeFailedAtBoot shutdown safety
The rule includes unless on(instance) up == 0 to suppress the alert during a full app outage (where up == 0 is already the dominant signal) and uses for: 5m to absorb gunicorn worker reloads (--max-requests recycling) where _shutdown_socket() briefly leaves state at (1, 0). The on(instance) clause assumes targets share an instance label; if your relabeling adds richer labels (e.g. cluster, env) you may need unless on(job, instance) up == 0 to keep the join correct.
/metrics endpoint resilience
The endpoint returns HTTP 200 even when the app_config table is missing (e.g., on a fresh deployment that hasn't yet run flask db upgrade). The worker-timestamp metric is simply omitted; alert via ESBWorkerNeverRan. The first per-process query failure logs a full stack trace; subsequent failures log a one-line warning to avoid log flooding.
Container and Process Liveness¶
up{job="esb"} == 0 for ≥ 1 minute indicates the app is not responding to scrapes — covers OOM kills, gunicorn wedges, and network partitions. cAdvisor's container_last_seen (or its restart-count rate) catches container restart loops where the app keeps crashing fast enough that up{} may briefly recover between scrapes.
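The liveness signal above maps onto a rule in the same shape as the earlier examples; a sketch (the `job` label assumes the scrape config shown earlier):

```yaml
- alert: ESBAppDown
  expr: up{job="esb"} == 0
  for: 1m
  annotations:
    summary: "ESB app is not responding to Prometheus scrapes"
```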
Log-Based Alerting (Loki)¶
ESB writes logs to stdout/stderr; both the app and worker containers run with PYTHONUNBUFFERED=1 so lines reach Promtail/Loki without Python's default block-buffering latency.
| What to detect | Source | Log substring |
|---|---|---|
| Worker poll-cycle failure (any exception in the loop body) | `notification_service.py` (worker outer-try) | `Error in worker polling loop` |
| Slack delivery exception (per notification, app-log line) | `notification_service.py` (per-notification failure log) | `delivery failed:` (trailing colon required — uniquely matches the failure-line format string `'Notification %d delivery failed: %s'`; does NOT match the success log or the NotImplementedError log even though all three share the `Notification %d ...` prefix; also does not match the JSON mutation log) |
| Worker heartbeat write failure | `notification_service.py` (`_write_heartbeat`) | `Failed to update worker heartbeat at` |
| Worker last-iteration write failure | `notification_service.py` (`_record_iteration_timestamp`) | `Failed to update worker last-iteration timestamp` |
| Buggy iteration-timestamp helper | `notification_service.py` (worker-loop call site) | `BUG: _record_iteration_timestamp raised unexpectedly` |
| Slack Socket Mode setup failure (import / instantiation / connect) | `esb/slack/__init__.py` (unified setup try) | `Failed to set up Slack Socket Mode` |
| `/metrics` AppConfig query failure (first per process) | `esb/services/metrics_service.py` | `Failed to query worker_last_iteration_at from AppConfig` |
| Generic ERROR-level traffic | any | `ERROR` (level) and/or `Traceback` |
Permanent-fail signal lives in the structured JSON mutation log. When a notification is permanently failed after MAX_RETRIES, the mutation logger writes a single-line JSON record to logger esb.mutations containing event: notification.permanently_failed. Two equivalent alerting options:
- Substring match on `notification.permanently_failed` — simplest; works regardless of JSON-whitespace variation.
- Promtail JSON-stage parsing — extract `event` as a structured Loki label. Use a Promtail `match` stage to apply the JSON parser only to lines starting with `{`, since the regular Python logger and the mutation logger share the same stdout stream.
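A sketch of the second option as a Promtail pipeline fragment; the stream selector (`job="esb"`) is an assumption to adapt to your own labels:

```yaml
pipeline_stages:
  # Apply the JSON parser only to lines that look like JSON (the mutation log);
  # plain-text lines from the regular Python logger pass through untouched.
  - match:
      selector: '{job="esb"} |~ "^\\{"'
      stages:
        - json:
            expressions:
              event: event
        - labels:
            event:
```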
Operators write their own LogQL queries and alert rules; this guide intentionally lists signals, not queries. Note that the substrings in the table above are stable today but unanchored — a future log message containing the same text would also match. For a future-resistant query, anchor with the `Notification N` prefix (regex `Notification [0-9]+ delivery failed: ` in POSIX ERE, for the per-notification case), or filter by log level.
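The anchoring can be checked offline with grep before it goes into a LogQL rule. A sketch; the failure-line wording comes from the format string above, while the "success" line is a hypothetical counter-example:

```shell
# The anchored pattern matches the failure line and nothing else.
pattern='Notification [0-9]+ delivery failed: '
printf 'Notification 42 delivery failed: channel_not_found\n' \
  | grep -Eq "$pattern" && echo "failure line matches"
printf 'Notification 42 delivered to #oops\n' \
  | grep -Eq "$pattern" || echo "hypothetical success line does not match"
```

Note the POSIX ERE spelling `[0-9]+` rather than `\d+`; plain `grep -E` (and LogQL's RE2 engine) do not support the Perl-style `\d` class.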
What to Alert On¶
- App down — `up{job="esb"} == 0` for ≥ 1 m
- Worker stalled — the `ESBWorkerStalled` rule
- Worker never ran since deploy / DB reset — the `ESBWorkerNeverRan` rule
- Notification queue stuck — the existing `ESBNotificationQueueStuck` rule
- Slack Socket Mode failed at boot — the `ESBSocketModeFailedAtBoot` rule (covers import, instantiation, and connect failures; suppressed during full app outage)
- Elevated rate of Slack delivery failures — Loki on the `delivery failed:` substring exceeding a per-minute threshold
- Container flapping — cAdvisor restart-count rate
Grafana Dashboards¶
The ESB metrics are designed for direct panel use — gauge panels for the Socket Mode and worker-timestamp metrics, time-series panels for the queue gauges, and log panels backed by the Loki substrings above. ESB does not ship a dashboard JSON; operators build the panels they actually want to watch.
Relationship to New Relic¶
New Relic (above) and the Prometheus/Loki/Grafana stack here observe different layers and are complementary. New Relic provides per-transaction APM and end-user browser monitoring; Prometheus provides system-health gauges and log-based alerting suitable for on-call paging. They can run together with no additional configuration.
Ongoing Maintenance¶
Viewing Logs¶
# Application logs
docker compose logs -f app
# Worker logs (notification delivery)
docker compose logs -f worker
# Database logs
docker compose logs -f db
Restarting Services¶
# Restart the web application
docker compose restart app
# Restart the notification worker
docker compose restart worker
# Restart all services
docker compose restart
Applying Updates¶
# Pull latest code
git pull
# Rebuild containers
docker compose build
# Restart with new images
docker compose up -d
# Apply any new database migrations
docker compose exec app flask db upgrade
Monitoring the Worker¶
The background worker processes pending notifications every 30 seconds. It includes retry logic with backoff for failed deliveries. Check the worker logs for:
- Successful notification deliveries
- Failed delivery attempts and retry counts
- Slack API errors (usually indicate an expired or invalid token)
docker compose logs -f worker
The worker container also exposes a Docker healthcheck driven by a heartbeat file (/tmp/worker_heartbeat) refreshed at three points: at startup, after each DB poll returns, and after each individual notification is processed. The healthcheck fails when the file is older than 180 seconds. To check current health:
docker inspect --format '{{.State.Health.Status}}' equipment-status-board-worker-1
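The healthcheck's core test is just "file mtime older than 180 seconds." A sketch of the same logic, demonstrated here on a local file (to inspect the real file, run the equivalent inside the container via `docker compose exec worker`; `stat -c` assumes GNU coreutils):

```shell
# Same staleness logic as the container healthcheck, runnable on any path.
heartbeat_age() {
  # Seconds since the file was last touched (GNU stat; BSD needs `stat -f %m`).
  echo $(( $(date +%s) - $(stat -c %Y "$1") ))
}
touch /tmp/demo_heartbeat
age=$(heartbeat_age /tmp/demo_heartbeat)
if [ "$age" -lt 180 ]; then
  echo "healthy (age ${age}s)"
else
  echo "unhealthy (age ${age}s)"
fi
```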
If the worker is reported as unhealthy, the autoheal sidecar will restart it automatically (typically within a minute).
For metrics, log-based alerting, and recommended dashboards, see the Monitoring and Alerting section above. (External links to #prometheus-metrics continue to resolve — the new ### Prometheus Metrics subsection auto-generates the same anchor.)
Upload Storage¶
Uploaded files (equipment photos, documents, diagnostic images) are stored in the ./uploads/ directory, which is bind-mounted into the app container. Monitor disk usage on the host:
du -sh ./uploads/
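For unattended monitoring, the same check can be wrapped in a cron-friendly threshold test; a sketch, with the 1024 MB limit as an arbitrary example:

```shell
# Warn when the uploads directory exceeds a size threshold (in MB).
uploads_over_limit() {
  used_mb=$(du -sm "$1" | cut -f1)
  echo "uploads: ${used_mb} MB (limit $2 MB)"
  [ "$used_mb" -gt "$2" ]
}
mkdir -p ./uploads   # no-op on a real deployment
if uploads_over_limit ./uploads 1024; then
  echo "WARNING: uploads directory over 1024 MB"
fi
```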
Database¶
MariaDB data is persisted in the mariadb_data Docker volume. This volume survives container restarts and docker compose down. It is only removed if you explicitly run docker compose down -v (which deletes volumes — do not do this unless you intend to lose all data).
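Because that volume is the only copy of the data, scheduled logical backups are worth setting up. A sketch using `mariadb-dump` through the running container (guarded so it is a no-op where Docker or the stack is absent; the dated filename scheme is an arbitrary example, and `MARIADB_ROOT_PASSWORD` is assumed to be exported in the environment):

```shell
# Nightly logical backup of the esb database to a dated, compressed file.
backup_file="esb-backup-$(date +%F).sql.gz"
echo "writing ${backup_file}"
if command -v docker >/dev/null 2>&1 && docker compose ps db >/dev/null 2>&1; then
  docker compose exec -T db mariadb-dump -uroot -p"${MARIADB_ROOT_PASSWORD}" esb \
    | gzip > "${backup_file}"
fi
```

Restoring is the reverse: `gunzip -c <file> | docker compose exec -T db mariadb -uroot -p"${MARIADB_ROOT_PASSWORD}" esb`. Test a restore before you need one.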
Troubleshooting¶
App won't start¶
- Check that `DATABASE_URL` is correct and uses `db` as the hostname (not `localhost`) when running in Docker
- Verify the `db` service is healthy: `docker compose ps`
- Check app logs: `docker compose logs app`
Slack commands not working¶
- Verify `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` are set correctly in `.env`
- Verify Socket Mode is enabled in the Slack App settings and the app-level token has the `connections:write` scope
- Verify `SLACK_SOCKET_MODE_CONNECT=true` is set in the app service environment
- Check that the Slack App has the required OAuth scopes
- Check app logs for Slack-related errors: `docker compose logs app | grep -i slack`
Notifications not delivering¶
- Verify the worker is running: `docker compose ps worker`
- Check worker logs: `docker compose logs -f worker`
- Confirm `SLACK_BOT_TOKEN` is valid and the bot is installed to the workspace
- Check that notification triggers are enabled in Admin → App Configuration
Static page not updating¶
- Verify `STATIC_PAGE_PUSH_METHOD` and `STATIC_PAGE_PUSH_TARGET` are set
- Check that the worker is running (it handles the push)
- For the `s3` method: verify AWS credentials and bucket permissions
- For the `gcs` method: verify Google Cloud credentials and bucket permissions
- For the `local` method: verify the target directory exists and is writable
- Check worker logs for push errors