Administrators' Guide¶
This guide is for technical volunteers responsible for deploying and maintaining the Equipment Status Board. It covers Docker deployment, environment configuration, Slack App setup, and ongoing maintenance.
Prerequisites¶
Before you begin, ensure you have:
- Docker and Docker Compose installed on the server
- Git for cloning the repository
- A server or machine on the makerspace local network (or accessible to members)
- A Slack workspace for Slack integration (check current Slack plan requirements for Socket Mode at api.slack.com)
Installation & Deployment¶
1. Clone the Repository¶
git clone https://github.com/jantman/equipment-status-board.git
cd equipment-status-board
2. Configure Environment Variables¶
cp .env.example .env
Edit .env and set the required values. See the Environment Variable Reference below for details on each variable.
At minimum, you must change:
- `SECRET_KEY` — Set to a random string for production (e.g., `python3 -c "import secrets; print(secrets.token_hex(32))"`)
- `MARIADB_ROOT_PASSWORD` — Set a strong database password
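Both values can be generated in one go and pasted into `.env`; a sketch (`token_urlsafe` for the DB password is just one reasonable choice):

```shell
# Generate a 64-hex-char SECRET_KEY and a random database password.
SECRET_KEY=$(python3 -c "import secrets; print(secrets.token_hex(32))")
MARIADB_ROOT_PASSWORD=$(python3 -c "import secrets; print(secrets.token_urlsafe(24))")
# Paste the printed lines into .env (do not commit them to git).
echo "SECRET_KEY=${SECRET_KEY}"
echo "MARIADB_ROOT_PASSWORD=${MARIADB_ROOT_PASSWORD}"
```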
3. Start All Services¶
docker compose up -d
This starts the services defined in docker-compose.yml: the web application, the MariaDB database, the background notification worker, and the autoheal sidecar.
4. Run Database Migrations¶
docker compose exec app flask db upgrade
This creates all required database tables.
5. Create the First Staff User¶
docker compose exec app flask seed-admin <username> <email> --password <password> [--slack-handle <handle>]
For example:
docker compose exec app flask seed-admin admin admin@example.com --password changeme123 --slack-handle @adminuser
This creates a user with the Staff role who can then log in and create additional users through the web interface.
The `--slack-handle` option is optional but recommended if your workspace uses Slack integration. Setting it enables the system to send password reset notifications to the user via Slack DM. The handle should include the `@` prefix (e.g. `@username`). The Slack handle can also be set or updated later via the admin UI at Admin → Users.
6. Verify¶
Open http://localhost:5000 in a browser (or the server's IP/hostname on port 5000). You should see the status dashboard. Log in with the Staff user you just created.
Environment Variable Reference¶
| Variable | Description | Required | Default | Example |
|---|---|---|---|---|
| `SECRET_KEY` | Flask secret key for session signing. Must be random in production. | Yes | `dev-secret-change-me` | `a1b2c3d4e5f6...` (use `python3 -c "import secrets; print(secrets.token_hex(32))"`) |
| `DATABASE_URL` | SQLAlchemy database connection URL. In Docker, the hostname is `db`. | Yes | `mysql+pymysql://root:esb_dev_password@localhost/esb` | `mysql+pymysql://root:yourpassword@db/esb` |
| `ESB_BASE_URL` | Externally-reachable base URL of this ESB instance. Used as the prefix for QR code target URLs (the URL members' phones open when they scan a printed QR label). Must be set to enable QR code generation; otherwise the "Generate QR Code" button on each equipment detail page is disabled. Inside a container the request host is unreliable, so this must be set explicitly. Trailing slashes are stripped; must be an `http(s)://host[:port]` URL with no path, query, fragment, or credentials. | Yes | (empty) | `http://esb.example.com:8080` |
| `MARIADB_ROOT_PASSWORD` | Root password for the MariaDB container. Must match the password in `DATABASE_URL`. | Yes | `esb_dev_password` | `strong-random-password` |
| `UPLOAD_PATH` | Directory for uploaded files (photos, documents). Relative to app root or absolute path. | No | `uploads` | `/app/uploads` |
| `UPLOAD_MAX_SIZE_MB` | Maximum upload file size in megabytes. | No | `500` | `100` |
| `SLACK_BOT_TOKEN` | Slack Bot User OAuth Token. Leave empty to disable Slack integration. | No | (empty) | `xoxb-1234567890-...` |
| `SLACK_APP_TOKEN` | Slack App-Level Token for Socket Mode. Required for Slack integration. Leave empty to disable. | No | (empty) | `xapp-1-...` |
| `SLACK_SOCKET_MODE_CONNECT` | Set to `true` to enable the Socket Mode WebSocket connection. Only the app container should set this; worker and other services should leave it unset. | No | (empty) | `true` |
| `SLACK_OOPS_CHANNEL` | Slack channel for cross-area notifications. Can be set in `.env` (not included in `.env.example` by default). | No | `#oops` | `#equipment-alerts` |
| `STATIC_PAGE_PUSH_METHOD` | How to publish the static status page. Options: `local` (write to directory), `s3` (upload to S3 bucket via boto3), or `gcs` (upload to Google Cloud Storage bucket). | No | `local` | `s3` |
| `STATIC_PAGE_PUSH_TARGET` | Target for static page push. For `local`: a directory path. For `s3` and `gcs`: `bucket-name/optional/key/path` (key defaults to `index.html`). | No | (empty) | `my-status-bucket/index.html` |
| `CLOUDFRONT_DISTRIBUTION_ID` | CloudFront distribution ID. Only meaningful when `STATIC_PAGE_PUSH_METHOD=s3`. When set, a CloudFront invalidation is issued for the uploaded key after every successful S3 upload, so the CDN serves the just-uploaded content immediately. Requires the IAM principal to have `cloudfront:CreateInvalidation` on the distribution. The AWS Free Tier covers 1000 invalidation paths per month; pushes more frequently than that will incur per-invalidation charges. | No | (empty) | `EDFDVBD6EXAMPLE` |
| `FLASK_APP` | Flask application entry point. Do not change. | No | `esb:create_app` | `esb:create_app` |
| `FLASK_DEBUG` | Enable Flask debug mode. Set to `0` in production. | No | `1` | `0` |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 static page push. Only needed if `STATIC_PAGE_PUSH_METHOD=s3` and not using an IAM role. | No | (empty) | `AKIAIOSFODNN7EXAMPLE` |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 static page push. Only needed if `STATIC_PAGE_PUSH_METHOD=s3` and not using an IAM role. | No | (empty) | `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to Google Cloud service account JSON key file. Only needed if `STATIC_PAGE_PUSH_METHOD=gcs` and not using instance metadata or Workload Identity. | No | (empty) | `/path/to/service-account.json` |
| `NEW_RELIC_LICENSE_KEY` | New Relic license key. Enables APM and browser monitoring when set. Leave empty to disable. | No | (empty) | `abc123def456...` |
| `NEW_RELIC_APP_NAME` | Application name shown in the New Relic dashboard. | No | `Equipment Status Board` | `ESB Production` |
| `TZ` | IANA timezone name for the worker container. Controls the timezone displayed in the static status page's generation timestamp (sub-heading near the top of the page) and the year used in the footer. Set this to your local timezone for accurate display. The worker service is the only consumer; if you set `TZ` via `.env` it will also propagate to the app container (currently unused there) since both services load the same env file. | No | `America/New_York` | `America/Chicago` |
Warning
Always set SECRET_KEY to a unique random value in production. The default value is insecure and only suitable for development.
Warning
Set FLASK_DEBUG=0 in production. Debug mode exposes detailed error pages and enables the interactive debugger.
Docker Services¶
The application runs as four Docker containers defined in docker-compose.yml:
App Service¶
The main web application. Runs Flask via Gunicorn with 1 worker process and 2 threads on port 5000.
- Image: Built from the project `Dockerfile` (Python 3.14-slim base)
- Port: 5000 (mapped to host)
- Volume: `./uploads` bind mount for persistent file storage (uploaded photos and documents)
- Depends on: `db` service (waits for healthy database)
Database Service¶
MariaDB 12.2.2 database server. Stores all application data.
- Image: `mariadb:12.2.2`
- Volume: `mariadb_data` named volume for persistent data storage
- Health check: Pings the database every 10 seconds to verify availability
- Port: Not mapped to host (only accessible from other containers)
Worker Service¶
Background notification processor. Polls the database every 30 seconds for pending notifications and delivers them via Slack.
- Image: Same as the app service
- Command: `flask worker run`
- Depends on: `db` service
- Healthcheck: The worker writes `/tmp/worker_heartbeat` at three points: once at startup, once after each DB poll returns, and once after each individual notification is processed. Docker reports the container as unhealthy if the heartbeat file is older than 180 seconds, which catches a wedged loop (e.g. a silently dropped DB connection or a single Slack call hung past its timeout). Refreshing per-notification — rather than only at the end of an iteration — means a legitimately long batch of slow Slack calls cannot falsely trip the healthcheck.
Autoheal Sidecar¶
Docker on its own does not restart unhealthy containers — it only marks them unhealthy. The autoheal service (willfarrell/autoheal) watches for containers labelled autoheal=true (the worker and app services) and restarts any that go unhealthy. It needs the host's Docker socket mounted so it can issue restart commands:
autoheal:
image: willfarrell/autoheal:latest
environment:
- AUTOHEAL_CONTAINER_LABEL=autoheal
volumes:
- /var/run/docker.sock:/var/run/docker.sock
restart: unless-stopped
If you do not want autoheal running on your host, you can remove the service from docker-compose.yml; the worker's healthcheck will still reflect status in docker compose ps, you'll just need to restart it manually when it goes unhealthy.
All four services have a restart policy of unless-stopped, meaning they automatically restart after crashes or host reboots (unless explicitly stopped).
Runtime Dependencies¶
The application Docker image includes these key Python packages:
- Flask — Web framework
- SQLAlchemy / Flask-SQLAlchemy — Database ORM
- PyMySQL — MariaDB database driver
- slack-bolt / slack_sdk — Slack integration (slash commands, modals, events via Socket Mode)
- websocket-client — WebSocket transport for Slack Socket Mode
- boto3 — AWS S3 client for static page push (when using the `s3` method)
- google-cloud-storage — Google Cloud Storage client for static page push (when using the `gcs` method)
- qrcode[pil] — QR code generation for equipment pages
- newrelic — New Relic APM and browser monitoring agent (optional, activated by `NEW_RELIC_LICENSE_KEY`)
- gunicorn — Production WSGI server
Slack App Configuration¶
Slack integration is optional — the core web application works without it. If you want Slack commands, notifications, and the status bot, follow these steps.
1. Create a Slack App¶
- Go to api.slack.com/apps and click Create New App
- Choose From scratch
- Name the app (e.g., "Equipment Status Board") and select your workspace
2. Configure Bot Token Scopes¶
Under OAuth & Permissions, add these Bot Token OAuth Scopes:
- `chat:write` — Send messages and notifications
- `commands` — Register slash commands
- `users:read` — Look up user information
- `users:read.email` — Look up users by email
- `im:write` — Send direct messages (for temporary password delivery)
3. Enable Socket Mode¶
- Go to Settings > Socket Mode in the Slack App settings
- Turn on Enable Socket Mode
- Create an App-Level Token with the `connections:write` scope
- Name it (e.g., "esb-socket") and copy the token (starts with `xapp-`)
4. Set Up Slash Commands¶
Under Slash Commands, create four commands:
| Command | Description |
|---|---|
| `/esb-report` | Report an equipment problem |
| `/esb-status` | Check equipment status (area or equipment name) |
| `/esb-repair` | Technician dispatcher (no args) or create a repair record (with arg) |
| `/esb-update` | Update a repair record (full edit) |
With Socket Mode enabled, slash commands are automatically routed to your app via WebSocket. No Request URL is needed.
5. Enable Event Subscriptions¶
Under Event Subscriptions:
- Turn on Enable Events
Event subscriptions are not currently required but may be used for future features.
6. Install the App¶
- Go to Install App and click Install to Workspace
- Authorize the permissions
7. Copy Credentials¶
After installation:
- Copy the Bot User OAuth Token (starts with `xoxb-`) and set it as `SLACK_BOT_TOKEN` in your `.env`
- Copy the App-Level Token (starts with `xapp-`, created in step 3) and set it as `SLACK_APP_TOKEN` in your `.env`
- Restart the app and worker: `docker compose restart app worker`
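Before restarting, a quick prefix check catches swapped or truncated tokens. A minimal sketch with placeholder values (substitute the values from your `.env`):

```shell
# Sanity-check Slack token prefixes: bot tokens start with xoxb-, app tokens with xapp-.
check_slack_tokens() {
  case "$1" in xoxb-*) ;; *) echo "SLACK_BOT_TOKEN should start with xoxb-"; return 1;; esac
  case "$2" in xapp-*) ;; *) echo "SLACK_APP_TOKEN should start with xapp-"; return 1;; esac
  echo "token prefixes look right"
}
check_slack_tokens "xoxb-1234567890-example" "xapp-1-example"
```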
Note
Socket Mode uses an outbound WebSocket connection — no public URL or reverse proxy is needed. Your ESB server can remain on a private network.
Notification Trigger Configuration¶
Slack outbound notifications are governed by per-event app-config keys. Each is a boolean stored in app_config (string 'true' / 'false') and toggled via the admin UI at Admin → App Configuration. All five default to 'true', so a fresh deployment inherits notifications automatically.
| Config key | Default | Fires on |
|---|---|---|
| `notify_new_report` | `'true'` | A new problem report is filed (member or technician path). |
| `notify_resolved` | `'true'` | A repair record's status transitions to a closed status (Resolved, Closed - Duplicate, Closed - No Issue Found). |
| `notify_severity_changed` | `'true'` | A repair record's severity level changes. |
| `notify_status_changed` | `'true'` | A repair record's status changes between open states (e.g., New → In Progress, Assigned → Parts Needed). Closed-status transitions go through `notify_resolved` instead, so disabling this key does not silence resolutions. |
| `notify_eta_updated` | `'true'` | A repair record's ETA is set or changed. |
If `notify_resolved` is `'false'` and a status transition lands on a closed status, no notification fires; the elif chain does not fall through to `notify_status_changed`.
Static Status Page Setup¶
The static status page provides a lightweight, externally accessible version of the equipment status dashboard. It is regenerated and pushed automatically whenever equipment status changes.
Configuration¶
Set the push method via the STATIC_PAGE_PUSH_METHOD environment variable:
- `local` — Writes the static page to a local directory specified by `STATIC_PAGE_PUSH_TARGET`. Useful for serving from a local web server or shared drive.
- `s3` — Uploads the static page to an S3 bucket specified by `STATIC_PAGE_PUSH_TARGET`. Requires AWS credentials configured in the environment (via `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` or an IAM role). Optionally set `CLOUDFRONT_DISTRIBUTION_ID` to also issue a CloudFront invalidation for the uploaded key after every successful upload (requires `cloudfront:CreateInvalidation` on the distribution).
- `gcs` — Uploads the static page to a Google Cloud Storage bucket specified by `STATIC_PAGE_PUSH_TARGET`. Uses Google's default credential chain (`GOOGLE_APPLICATION_CREDENTIALS` environment variable, GCE instance metadata, or Workload Identity). When using Docker with a service account key file, add a volume mount for the credentials file in docker-compose.yml (e.g., `- ./service-account.json:/app/service-account.json:ro`) and set `GOOGLE_APPLICATION_CREDENTIALS=/app/service-account.json`.
The static page is pushed by the background worker whenever it detects a status change during its polling cycle.
The static page's generation timestamp reflects the worker container's TZ environment variable. The variable resolves against the OS tzdata database (/usr/share/zoneinfo), which is provided by the tzdata system package. Both the python:3.14-slim base image and this image's Dockerfile install list include tzdata; do not remove it. To use a non-default zone, set TZ in .env before running docker compose up.
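A zone name can be checked against the OS tzdata database before deploying; a sketch, runnable on any Linux host with tzdata installed:

```shell
# Verify an IANA zone name resolves against the OS tzdata database.
test -f /usr/share/zoneinfo/America/Chicago && echo "zone file present"
TZ=America/Chicago date +%Z   # prints CST or CDT depending on the season
```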
New Relic Monitoring (Optional)¶
New Relic integration provides server-side APM (Application Performance Monitoring) and browser monitoring for end users. When enabled, it automatically instruments the Flask application, background worker, and injects browser monitoring JavaScript into all pages.
Enabling New Relic¶
Set the NEW_RELIC_LICENSE_KEY environment variable in your .env file:
NEW_RELIC_LICENSE_KEY=your-license-key-here
NEW_RELIC_APP_NAME=Equipment Status Board
Restart all services after updating:
docker compose restart app worker
Both the web application and background worker will begin reporting to New Relic. Browser monitoring JavaScript is automatically injected into every page served by the application.
Verifying¶
After enabling, check the New Relic dashboard for your application name. You should see:
- APM data — web transactions, throughput, error rates, and response times
- Browser data — page load times, JavaScript errors, and AJAX calls from end users
If no data appears, check the app and worker logs for New Relic-related errors:
docker compose logs app | grep -i "new.relic\|newrelic"
Disabling¶
To disable New Relic, remove or comment out NEW_RELIC_LICENSE_KEY in your .env file and restart the services. When the license key is not set, no New Relic code is loaded and there is zero performance impact.
Monitoring and Alerting¶
Overview¶
ESB exposes Prometheus metrics on /metrics (unauthenticated; trusted-network deployment). Both the app and worker containers run with PYTHONUNBUFFERED=1 so log lines reach Loki/Promtail without buffering latency. The metrics are designed for direct Grafana panel use. This section is complementary to the optional New Relic integration above; it gives recommended signals, not a turnkey configuration.
Prometheus Metrics¶
Example scrape config:
scrape_configs:
- job_name: esb
metrics_path: /metrics
static_configs:
- targets: ['esb.example.com:5000']
| Metric | Type | Description | Emission |
|---|---|---|---|
| `esb_pending_notifications_count` | gauge | Number of rows in `pending_notifications` with `status='pending'` | Always |
| `esb_oldest_pending_notification_timestamp_seconds` | gauge | Unix epoch seconds of the oldest pending row's `created_at` | Omitted when queue empty (alert with `absent()`) |
| `esb_worker_last_iteration_timestamp_seconds` | gauge | Unix epoch seconds of the worker's last successful poll cycle (read from `AppConfig.value`) | Omitted when worker has never run, or when the AppConfig query fails (alert with `absent()`, `for: 5m` minimum) |
| `esb_socket_mode_enabled` | gauge | 1 if `init_slack` entered the Socket Mode setup block (tokens set, not TESTING, opt-in flag true); 0 otherwise | Always |
| `esb_socket_mode_connected` | gauge | 1 if a Bolt `SocketModeHandler` is currently bound; 0 otherwise. Transitions 1→0 at process shutdown. | Always |
Example alert rules:
- alert: ESBNotificationQueueStuck
expr: time() - esb_oldest_pending_notification_timestamp_seconds > 300
for: 1m
annotations:
summary: "ESB notification worker is not draining the queue"
- alert: ESBWorkerStalled
expr: time() - esb_worker_last_iteration_timestamp_seconds > 120
for: 1m
annotations:
summary: "ESB notification worker has not iterated in 2+ minutes"
- alert: ESBWorkerNeverRan
expr: absent(esb_worker_last_iteration_timestamp_seconds)
for: 5m
annotations:
summary: "ESB worker has not produced a heartbeat row since deploy (or DB reset / transient query failure)"
- alert: ESBSocketModeFailedAtBoot
expr: (esb_socket_mode_enabled == 1 and esb_socket_mode_connected == 0) unless on(instance) up == 0
for: 5m
annotations:
summary: "ESB intended to run Slack Socket Mode but the handler failed at boot"
ESBWorkerStalled and ESBWorkerNeverRan are complementary and should both be loaded. ESBWorkerStalled detects "worker was alive recently but stopped iterating" — fires on a normal stall but doesn't fire when the metric is missing entirely. ESBWorkerNeverRan detects the metric-missing case — fires on cold-deploy time-to-first-poll AND on transient AppConfig query failures. Together they cover the full failure space.
Clock skew
time() - <gauge> rules mix Prometheus's clock with the worker container's clock. Run NTP on every node and pick the threshold ≥ 4× poll_interval (so 120s for the default 30s). Note the failure asymmetry: if the worker's clock runs behind Prometheus's, the rule fires aggressively; if the worker's clock runs ahead, time() - gauge goes negative and the rule is silent forever. NTP is mandatory, not optional.
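The asymmetry is plain arithmetic. A sketch with an example 30 s elapsed time and a worker clock skewed 60 s in each direction:

```shell
# time()-vs-gauge arithmetic with the worker clock skewed each way.
now=1700000000        # Prometheus's time() at evaluation (arbitrary epoch value)
elapsed=30            # seconds since the worker's last real iteration
for skew in -60 0 60; do                # worker clock relative to Prometheus
  gauge=$(( now - elapsed + skew ))     # timestamp the worker would have written
  echo "skew=${skew}s -> time()-gauge = $(( now - gauge ))s"
done
# A behind clock (-60) inflates the expression (alert fires early);
# an ahead clock (+60) deflates it, potentially below zero (alert never fires).
```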
Single-worker assumption
These metrics assume the current single-gunicorn-worker deployment (--workers 1). Scaling app-side gunicorn workers makes the Socket Mode metrics non-deterministic across scrapes.
Information disclosure
The Socket Mode gauges let any unauthenticated reader of /metrics distinguish "Slack not configured" from "Slack configured but failed at boot" from "Slack working." Acceptable on a trusted network; something to be aware of if /metrics is ever exposed more broadly.
ESBSocketModeFailedAtBoot shutdown safety
The rule includes unless on(instance) up == 0 to suppress the alert during a full app outage (where up == 0 is already the dominant signal) and uses for: 5m to absorb gunicorn worker reloads (--max-requests recycling) where _shutdown_socket() briefly leaves state at (1, 0). The on(instance) clause assumes targets share an instance label; if your relabeling adds richer labels (e.g. cluster, env) you may need unless on(job, instance) up == 0 to keep the join correct.
/metrics endpoint resilience
The endpoint returns HTTP 200 even when the app_config table is missing (e.g., on a fresh deployment that hasn't yet run flask db upgrade). The worker-timestamp metric is simply omitted; alert via ESBWorkerNeverRan. The first per-process query failure logs a full stack trace; subsequent failures log a one-line warning to avoid log flooding.
Container and Process Liveness¶
up{job="esb"} == 0 for ≥ 1 minute indicates the app is not responding to scrapes — covers OOM kills, gunicorn wedges, and network partitions. cAdvisor's container_last_seen (or its restart-count rate) catches container restart loops where the app keeps crashing fast enough that up{} may briefly recover between scrapes.
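The liveness signal above maps onto a rule in the same shape as the earlier examples; a sketch (the `job` label assumes the scrape config shown earlier):

```yaml
- alert: ESBAppDown
  expr: up{job="esb"} == 0
  for: 1m
  annotations:
    summary: "ESB app is not responding to Prometheus scrapes"
```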
Log-Based Alerting (Loki)¶
ESB writes logs to stdout/stderr; both the app and worker containers run with PYTHONUNBUFFERED=1 so lines reach Promtail/Loki without Python's default block-buffering latency.
| What to detect | Source | Log substring |
|---|---|---|
| Worker poll-cycle failure (any exception in the loop body) | `notification_service.py` (worker outer-try) | `Error in worker polling loop` |
| Slack delivery exception (per notification, app-log line) | `notification_service.py` (per-notification failure log) | `delivery failed:` (trailing colon required — uniquely matches the failure-line format string `'Notification %d delivery failed: %s'`; does NOT match the success log or the NotImplementedError log even though all three share the `Notification %d ...` prefix; also does not match the JSON mutation log) |
| Worker heartbeat write failure | `notification_service.py` (`_write_heartbeat`) | `Failed to update worker heartbeat at` |
| Worker last-iteration write failure | `notification_service.py` (`_record_iteration_timestamp`) | `Failed to update worker last-iteration timestamp` |
| Buggy iteration-timestamp helper | `notification_service.py` (worker-loop call site) | `BUG: _record_iteration_timestamp raised unexpectedly` |
| Slack Socket Mode setup failure (import / instantiation / connect) | `esb/slack/__init__.py` (unified setup try) | `Failed to set up Slack Socket Mode` |
| `/metrics` AppConfig query failure (first per process) | `esb/services/metrics_service.py` | `Failed to query worker_last_iteration_at from AppConfig` |
| Generic ERROR-level traffic | any | `ERROR` (level) and/or `Traceback` |
Permanent-fail signal lives in the structured JSON mutation log. When a notification is permanently failed after MAX_RETRIES, the mutation logger writes a single-line JSON record to logger esb.mutations containing event: notification.permanently_failed. Two equivalent alerting options:
- Substring match on `notification.permanently_failed` — simplest; works regardless of JSON-whitespace variation.
- Promtail JSON-stage parsing — extract `event` as a structured Loki label. Use a Promtail `match` stage to apply the JSON parser only to lines starting with `{`, since the regular Python logger and the mutation logger share the same stdout stream.
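A sketch of the second option as a Promtail pipeline fragment; the stream selector (`job="esb"`) is an assumption to adapt to your own labels:

```yaml
pipeline_stages:
  # Apply the JSON parser only to lines that look like JSON (the mutation log);
  # plain-text lines from the regular Python logger pass through untouched.
  - match:
      selector: '{job="esb"} |~ "^\\{"'
      stages:
        - json:
            expressions:
              event: event
        - labels:
            event:
```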
Operators write their own LogQL queries and alert rules; this guide intentionally lists signals, not queries. Note that the substrings in the table above are stable today but unanchored — a future log message containing the same text would also match. For a future-resistant query, anchor with the `Notification N` prefix (regex `Notification [0-9]+ delivery failed: ` in POSIX ERE, for the per-notification case), or filter by log level.
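The anchoring can be checked offline with grep before it goes into a LogQL rule. A sketch; the failure-line wording comes from the format string above, while the "success" line is a hypothetical counter-example:

```shell
# The anchored pattern matches the failure line and nothing else.
pattern='Notification [0-9]+ delivery failed: '
printf 'Notification 42 delivery failed: channel_not_found\n' \
  | grep -Eq "$pattern" && echo "failure line matches"
printf 'Notification 42 delivered to #oops\n' \
  | grep -Eq "$pattern" || echo "hypothetical success line does not match"
```

Note the POSIX ERE spelling `[0-9]+` rather than `\d+`; plain `grep -E` (and LogQL's RE2 engine) do not support the Perl-style `\d` class.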
What to Alert On¶
- App down — `up{job="esb"} == 0` for ≥ 1 m
- Worker stalled — the `ESBWorkerStalled` rule
- Worker never ran since deploy / DB reset — the `ESBWorkerNeverRan` rule
- Notification queue stuck — the existing `ESBNotificationQueueStuck` rule
- Slack Socket Mode failed at boot — the `ESBSocketModeFailedAtBoot` rule (covers import, instantiation, and connect failures; suppressed during full app outage)
- Elevated rate of Slack delivery failures — Loki on the `delivery failed:` substring exceeding a per-minute threshold
- Container flapping — cAdvisor restart-count rate
Grafana Dashboards¶
The ESB metrics are designed for direct panel use — gauge panels for the Socket Mode and worker-timestamp metrics, time-series panels for the queue gauges, and log panels backed by the Loki substrings above. ESB does not ship a dashboard JSON; operators build the panels they actually want to watch.
Relationship to New Relic¶
New Relic (above) and the Prometheus/Loki/Grafana stack here observe different layers and are complementary. New Relic provides per-transaction APM and end-user browser monitoring; Prometheus provides system-health gauges and log-based alerting suitable for on-call paging. They can run together with no additional configuration.
Ongoing Maintenance¶
Viewing Logs¶
# Application logs
docker compose logs -f app
# Worker logs (notification delivery)
docker compose logs -f worker
# Database logs
docker compose logs -f db
Restarting Services¶
# Restart the web application
docker compose restart app
# Restart the notification worker
docker compose restart worker
# Restart all services
docker compose restart
Applying Updates¶
# Pull latest code
git pull
# Rebuild containers
docker compose build
# Restart with new images
docker compose up -d
# Apply any new database migrations
docker compose exec app flask db upgrade
Monitoring the Worker¶
The background worker processes pending notifications every 30 seconds. It includes retry logic with backoff for failed deliveries. Check the worker logs for:
- Successful notification deliveries
- Failed delivery attempts and retry counts
- Slack API errors (usually indicate an expired or invalid token)
docker compose logs -f worker
The worker container also exposes a Docker healthcheck driven by a heartbeat file (/tmp/worker_heartbeat) refreshed at three points: at startup, after each DB poll returns, and after each individual notification is processed. The healthcheck fails when the file is older than 180 seconds. To check current health:
docker inspect --format '{{.State.Health.Status}}' equipment-status-board-worker-1
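The healthcheck's core test is just "file mtime older than 180 seconds." A sketch of the same logic, demonstrated here on a local file (to inspect the real file, run the equivalent inside the container via `docker compose exec worker`; `stat -c` assumes GNU coreutils):

```shell
# Same staleness logic as the container healthcheck, runnable on any path.
heartbeat_age() {
  # Seconds since the file was last touched (GNU stat; BSD needs `stat -f %m`).
  echo $(( $(date +%s) - $(stat -c %Y "$1") ))
}
touch /tmp/demo_heartbeat
age=$(heartbeat_age /tmp/demo_heartbeat)
if [ "$age" -lt 180 ]; then
  echo "healthy (age ${age}s)"
else
  echo "unhealthy (age ${age}s)"
fi
```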
If the worker is reported as unhealthy, the autoheal sidecar will restart it automatically (typically within a minute).
For metrics, log-based alerting, and recommended dashboards, see the Monitoring and Alerting section above. (External links to #prometheus-metrics continue to resolve — the new ### Prometheus Metrics subsection auto-generates the same anchor.)
Upload Storage¶
Uploaded files (equipment photos, documents, diagnostic images) are stored in the ./uploads/ directory, which is bind-mounted into the app container. Monitor disk usage on the host:
du -sh ./uploads/
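For unattended monitoring, the same check can be wrapped in a cron-friendly threshold test; a sketch, with the 1024 MB limit as an arbitrary example:

```shell
# Warn when the uploads directory exceeds a size threshold (in MB).
uploads_over_limit() {
  used_mb=$(du -sm "$1" | cut -f1)
  echo "uploads: ${used_mb} MB (limit $2 MB)"
  [ "$used_mb" -gt "$2" ]
}
mkdir -p ./uploads   # no-op on a real deployment
if uploads_over_limit ./uploads 1024; then
  echo "WARNING: uploads directory over 1024 MB"
fi
```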
Database¶
MariaDB data is persisted in the mariadb_data Docker volume. This volume survives container restarts and docker compose down. It is only removed if you explicitly run docker compose down -v (which deletes volumes — do not do this unless you intend to lose all data).
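Because that volume is the only copy of the data, scheduled logical backups are worth setting up. A sketch using `mariadb-dump` through the running container (guarded so it is a no-op where Docker or the stack is absent; the dated filename scheme is an arbitrary example, and `MARIADB_ROOT_PASSWORD` is assumed to be exported in the environment):

```shell
# Nightly logical backup of the esb database to a dated, compressed file.
backup_file="esb-backup-$(date +%F).sql.gz"
echo "writing ${backup_file}"
if command -v docker >/dev/null 2>&1 && docker compose ps db >/dev/null 2>&1; then
  docker compose exec -T db mariadb-dump -uroot -p"${MARIADB_ROOT_PASSWORD}" esb \
    | gzip > "${backup_file}"
fi
```

Restoring is the reverse: `gunzip -c <file> | docker compose exec -T db mariadb -uroot -p"${MARIADB_ROOT_PASSWORD}" esb`. Test a restore before you need one.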
Troubleshooting¶
App won't start¶
- Check that `DATABASE_URL` is correct and uses `db` as the hostname (not `localhost`) when running in Docker
- Verify the `db` service is healthy: `docker compose ps`
- Check app logs: `docker compose logs app`
Slack commands not working¶
- Verify `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` are set correctly in `.env`
- Verify Socket Mode is enabled in the Slack App settings and the app-level token has the `connections:write` scope
- Verify `SLACK_SOCKET_MODE_CONNECT=true` is set in the app service environment
- Check that the Slack App has the required OAuth scopes
- Check app logs for Slack-related errors: `docker compose logs app | grep -i slack`
Notifications not delivering¶
- Verify the worker is running: `docker compose ps worker`
- Check worker logs: `docker compose logs -f worker`
- Confirm `SLACK_BOT_TOKEN` is valid and the bot is installed to the workspace
- Check that notification triggers are enabled in Admin → App Configuration
Static page not updating¶
- Verify `STATIC_PAGE_PUSH_METHOD` and `STATIC_PAGE_PUSH_TARGET` are set
- Check that the worker is running (it handles the push)
- For the `s3` method: verify AWS credentials and bucket permissions
- For the `gcs` method: verify Google Cloud credentials and bucket permissions
- For the `local` method: verify the target directory exists and is writable
- Check worker logs for push errors