# HostSeba NOC — Deployment Guide

End-to-end deployment of the NOC system on a single Linux server. Backend, frontend, WebSocket, and queue all run on the same box. For HA or larger fleets, see § 10 (Scaling notes).

---

## Components

| Component | Purpose | Where |
|---|---|---|
| **Backend** | Laravel 13 API + scheduler | `https://noc.hostseba.com` |
| **Frontend** | React 19 SPA | `https://noc.hostseba.com` (root) |
| **WebSocket** | Reverb live channel | `wss://ws.hostseba.com` |
| **Database** | MariaDB 10.6+ | local socket |
| **Queue / Cache** | Redis 7 | local socket |
| **Agents** | Bash / PowerShell | each monitored host |

---

## 1. Server prerequisites

cPanel/AlmaLinux 9 or Ubuntu 22.04+. Minimum: 2 vCPU, 4 GB RAM, 40 GB SSD.

```bash
# RHEL family (the php83-* packages come from the Remi repository; enable it first)
sudo dnf install -y php83 php83-php-{cli,common,fpm,mysqlnd,xml,mbstring,curl,zip,intl,bcmath,redis,opcache,gd}
sudo dnf install -y mariadb-server redis nginx certbot python3-certbot-nginx git unzip nodejs

# Debian/Ubuntu (22.04 ships PHP 8.1; php8.3-* needs 24.04+ or the ondrej/php PPA)
sudo apt update
sudo apt install -y php8.3-{cli,fpm,mysql,xml,mbstring,curl,zip,intl,bcmath,redis,opcache,gd} \
                    mariadb-server redis-server nginx certbot python3-certbot-nginx git unzip nodejs
```

Composer:
```bash
curl -sS https://getcomposer.org/installer | sudo php -- --install-dir=/usr/local/bin --filename=composer
```

---

## 2. Database

```bash
sudo systemctl enable --now mariadb
sudo mysql_secure_installation

sudo mysql -u root <<EOF
CREATE DATABASE hostseba_noc CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'hostseba_noc'@'localhost' IDENTIFIED BY 'CHANGE_ME_STRONG_PASSWORD';
GRANT ALL ON hostseba_noc.* TO 'hostseba_noc'@'localhost';
FLUSH PRIVILEGES;
EOF
```

---

## 3. Backend deploy

```bash
sudo mkdir -p /var/www/noc.hostseba.com
sudo chown $USER:$USER /var/www/noc.hostseba.com
cd /var/www/noc.hostseba.com

# Drop the backend zip here and unzip
unzip ~/HostSebaNOC_backend_v3.zip
cd hostseba-noc-backend

composer install --no-dev --optimize-autoloader

cp .env.example .env
nano .env   # set DB_PASSWORD, REVERB_*, MAIL_*, APP_URL=https://noc.hostseba.com
php artisan key:generate

php artisan migrate
php artisan db:seed   # creates admin@hostseba.com / changeme-noyon — CHANGE IT IMMEDIATELY

php artisan storage:link

# Copy agent installers into the storage path served by web.php
mkdir -p storage/app/agent
unzip ~/HostSebaNOC_agent_v1.zip -d /tmp/agent
cp /tmp/agent/hostseba-noc-agent/linux/install.sh storage/app/install.sh
cp /tmp/agent/hostseba-noc-agent/linux/agent.sh storage/app/agent/agent.sh
cp /tmp/agent/hostseba-noc-agent/linux/uninstall.sh storage/app/agent/uninstall.sh
cp /tmp/agent/hostseba-noc-agent/windows/install.ps1 storage/app/install.ps1
cp /tmp/agent/hostseba-noc-agent/windows/agent.ps1 storage/app/agent/agent.ps1
cp /tmp/agent/hostseba-noc-agent/windows/uninstall.ps1 storage/app/agent/uninstall.ps1
cp /tmp/agent/hostseba-noc-agent/macos/install.sh storage/app/install-macos.sh

# Permissions (the web user is nginx on the RHEL family; on Debian/Ubuntu use www-data instead)
sudo chown -R nginx:nginx storage bootstrap/cache
sudo chmod -R 775 storage bootstrap/cache
```

### Reverb install
```bash
php artisan reverb:install
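# The installer normally writes REVERB_APP_ID, REVERB_APP_KEY and REVERB_APP_SECRET to .env.
# Confirm they are set; the key is needed again for runtime-config.js in section 4.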
```

---

## 4. Frontend build

The frontend ships as a single React file. Bundle it into a standard Vite project that produces an `index.html`.

```bash
mkdir -p ~/noc-frontend && cd ~/noc-frontend
npm create vite@latest . -- --template react
npm install
npm install lucide-react

# Drop the V7 single-file App and wire it up
cp ~/HostSebaNOC_v7.jsx src/App.jsx

# index.html runtime config — points the SPA at the backend
cat > public/runtime-config.js <<'EOF'
window.__NOC_API_BASE__ = '/api';
window.__NOC_REVERB_KEY__ = 'YOUR_REVERB_KEY_HERE';
window.__NOC_REVERB_HOST__ = 'ws.hostseba.com';
window.__NOC_REVERB_PORT__ = 443;
window.__NOC_REVERB_SCHEME__ = 'wss';
EOF
# Add `<script src="/runtime-config.js"></script>` to index.html before main.jsx

npm run build
sudo cp -r dist/* /var/www/noc.hostseba.com/hostseba-noc-backend/public/
```

---

## 5. nginx + TLS

```nginx
# /etc/nginx/sites-available/noc.hostseba.com
server {
    listen 80;
    server_name noc.hostseba.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name noc.hostseba.com;

    ssl_certificate     /etc/letsencrypt/live/noc.hostseba.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/noc.hostseba.com/privkey.pem;

    root /var/www/noc.hostseba.com/hostseba-noc-backend/public;
    index index.php index.html;

    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header Referrer-Policy strict-origin-when-cross-origin;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/run/php/php8.3-fpm.sock;
    }

    location ~ /\.(?!well-known).* { deny all; }

    client_max_body_size 25M;
}

# WebSocket reverse proxy
server {
    listen 443 ssl http2;
    server_name ws.hostseba.com;

    ssl_certificate     /etc/letsencrypt/live/ws.hostseba.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ws.hostseba.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 86400;
    }
}
```

```bash
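# Issue one certificate per hostname so the paths match the ssl_certificate lines above.
# Do this before enabling the site: nginx -t fails while those files are missing.
sudo certbot certonly --nginx -d noc.hostseba.com
sudo certbot certonly --nginx -d ws.hostseba.com
sudo ln -s /etc/nginx/sites-available/noc.hostseba.com /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx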
```

---

## 6. systemd units

### Queue worker
```ini
# /etc/systemd/system/noc-queue.service
[Unit]
Description=HostSeba NOC queue worker
After=mariadb.service redis.service

[Service]
User=nginx
Group=nginx
Restart=always
RestartSec=5s
WorkingDirectory=/var/www/noc.hostseba.com/hostseba-noc-backend
ExecStart=/usr/bin/php artisan queue:work --queue=alerts,notifications,default --sleep=1 --tries=3 --max-time=3600

[Install]
WantedBy=multi-user.target
```

### Reverb daemon
```ini
# /etc/systemd/system/noc-reverb.service
[Unit]
Description=HostSeba NOC Reverb WebSocket
After=network.target

[Service]
User=nginx
Group=nginx
Restart=always
RestartSec=5s
WorkingDirectory=/var/www/noc.hostseba.com/hostseba-noc-backend
ExecStart=/usr/bin/php artisan reverb:start --host=127.0.0.1 --port=8080

[Install]
WantedBy=multi-user.target
```

### Scheduler (cron)
```cron
* * * * * cd /var/www/noc.hostseba.com/hostseba-noc-backend && php artisan schedule:run >> /dev/null 2>&1
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now noc-queue.service noc-reverb.service
```

---

## 7. First login + add a host

1. Browse to `https://noc.hostseba.com` — log in as `admin@hostseba.com` / `changeme-noyon`
2. Settings → change password immediately
3. Settings → Security → Enable 2FA
4. Servers → Add Host → generate token → copy install command
5. SSH into the target server, paste the command, watch it appear in the dashboard

---

## 8. Testing checklist

```bash
# Backend
cd /var/www/noc.hostseba.com/hostseba-noc-backend
php artisan test                              # Pest suite
php artisan route:list | grep agent           # confirm agent endpoints live
curl -s -o /dev/null -w '%{http_code}\n' https://noc.hostseba.com/health   # expect 200

# Agent install on a test server
curl -fsSL https://noc.hostseba.com/install.sh | sudo bash -s -- --token noc_test_xxx

# After ~60s, check
sudo journalctl -u hostseba-noc-agent -n 50
sudo tail -f /var/log/hostseba-noc/agent.log

# Backend confirms
mysql -u hostseba_noc -p hostseba_noc -e "SELECT name, status, cpu, ram, last_metric_at FROM servers;"

# WebSocket
# Open https://noc.hostseba.com in a browser; the topbar should show "ws: connected"
```

---

## 9. Hardening reminder

| What | Why | How |
|---|---|---|
| Change default admin password | Default seed has a known password | First login → settings |
| Enable 2FA on all admins | Compromised password ≠ access | Settings → Security |
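| Rotate Reverb key | If it was logged anywhere | Set a new `REVERB_APP_KEY` and `REVERB_APP_SECRET` in `.env` (re-running `php artisan reverb:install` may leave existing values untouched), update `runtime-config.js`, restart `noc-reverb` |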
| Firewall | Only 80/443 inbound to web | `ufw allow 80,443/tcp` |
| Backup strategy | Nightly DB + storage dumps | `mysqldump` cron + s3 sync |
| Monitor the monitor | If the NOC itself goes down, nothing alerts you | External uptime check (UptimeRobot etc.) |
| Restrict registration tokens | Don't share tokens publicly | Tokens auto-expire after 60 minutes by default |

---

## 10. Scaling notes

For >100 monitored hosts:

- **Queue**: split worker per queue (alerts, notifications, default) on separate processes
- **Reverb**: run multiple instances behind a sticky-session LB; share Redis for pub/sub
- **DB**: enable `server_metrics` partitioning by month; archive >30d into a cold table
- **Cache**: separate Redis instances for cache vs queue
- **Reporting**: add a MariaDB read replica and point report queries at it

---

## 11. Common errors

| Error | Fix |
|---|---|
| `419 unknown status` on login | CSRF cookie not being set. Check `SANCTUM_STATEFUL_DOMAINS` and `SESSION_DOMAIN` in `.env` |
| Agent stuck on "registering" | Backend's `agent_secret_hash` already set on a server with same name. Drop server in dashboard, retry |
| `ws: offline` permanent | Reverb daemon down (`systemctl status noc-reverb`) or wrong key in `runtime-config.js` |
| Metrics arrive but charts empty | `noc:aggregate-metrics` hasn't run yet (hourly buckets used for >1h ranges). Wait an hour or run manually |
| Agent CPU shows 0% | `top -bn2` not available on minimal containers. Install `procps`/`procps-ng` |

---

## 12. Notification system setup

The dashboard ships with six channel types. None are pre-configured — each needs credentials before it can deliver. Test every channel after configuring with the **Send test** button before linking it to alert rules in production.

### 12.1 Email (custom SMTP)

Settings → Channels → **Email** → expand. Fill in your SMTP credentials. The recommended setup for HostSeba is your own mail server:

| Field | Value |
|---|---|
| SMTP host | `mail.hostseba.com` (or your provider's hostname) |
| SMTP port | `587` (STARTTLS) or `465` (SSL) |
| Encryption | `tls` for port 587, `ssl` for port 465 |
| SMTP username | `noc@hostseba.com` |
| SMTP password | The mailbox password |
| From address | `noc@hostseba.com` |
| From name | `HostSeba NOC` |
| Org recipients | comma-separated list — these are the always-CC'd team members |

**Verify**: Click **Send test** and watch for the success banner. Check the recipient inbox.

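If the test fails with `Connection could not be established`, a firewall is probably blocking outbound 587/465. On cPanel hosts, allow both ports. The rule below does not survive a reboot, and hosts running CSF should add the ports to `TCP_OUT` in `csf.conf` instead:
```bash
sudo iptables -I OUTPUT -p tcp -m multiport --dports 587,465 -j ACCEPT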
```

### 12.2 WhatsApp (three providers)

Settings → Channels → **WhatsApp** → expand. Pick a provider from the dropdown:

#### Provider: WhatsApp Cloud API (Meta — recommended for production)
1. Go to <https://developers.facebook.com/apps/> → Create app → "Business" type
2. Add the WhatsApp product. Get the temporary access token (24h) from the dashboard.
3. Note the **Phone number ID** (numeric, shown in the WhatsApp panel).
4. For permanent tokens: System User in Business Manager → generate a permanent token with `whatsapp_business_messaging` scope.

| Field | Value |
|---|---|
| access_token | Your Bearer token |
| phone_number_id | e.g. `109876543210` |
| api_version | `v18.0` (default) |

For the test recipient field use a number that's been added as a test recipient in the Meta dashboard (or any number once your business is verified).

**Free tier**: 1000 service conversations / month after business verification. Meta revises WhatsApp pricing regularly, so confirm the current allowance before relying on it.

#### Provider: Twilio
1. Sign up at <https://twilio.com> — get $15 free credit
2. Console → Messaging → Try it → Send a WhatsApp message → enable Sandbox
3. Copy your Account SID and Auth Token

| Field | Value |
|---|---|
| account_sid | Starts with `AC...` |
| auth_token | Console secret |
| from_number | `+14155238886` (sandbox) or your purchased WhatsApp number |

For sandbox testing, the recipient must first text `join <keyword>` to your sandbox number.

#### Provider: WAHA (self-hosted, free)
Run WAHA in Docker on the NOC server:
```bash
docker run -d --restart unless-stopped \
    --name noc-waha \
    -p 3000:3000 \
    -e WHATSAPP_API_KEY=your-secret-key-here \
    devlikeapro/waha:latest
```

Open `http://your-server:3000` in a browser → scan the QR code with the WhatsApp account you want to use for sending.

| Field | Value |
|---|---|
| base_url | `http://localhost:3000` (or hostname) |
| api_key | The `WHATSAPP_API_KEY` you set |
| session | `default` |

⚠️ Don't use a personal WhatsApp number for high-volume alerts. Get a dedicated SIM. WhatsApp may flag accounts that spam.

### 12.3 Telegram

1. Talk to `@BotFather` → `/newbot` → name it `HostSebaNOC` → get the bot token
2. Add the bot to your team's group → make it admin
3. Send a message in the group, then visit `https://api.telegram.org/bot<token>/getUpdates` to find the `chat_id` (negative number)

| Field | Value |
|---|---|
| bot_token | From BotFather |
| chat_id | e.g. `-1001234567890` |
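
The `chat_id` can also be pulled out of the `getUpdates` response from the shell. This is a minimal sketch, not part of the product: `extract_chat_ids` is a hypothetical helper, and it assumes the response nests the ID as `"chat":{"id":...}`, as in the trimmed sample.

```bash
# Hypothetical helper: read a getUpdates JSON response on stdin and
# print every distinct chat ID found in it.
extract_chat_ids() {
    grep -oE '"chat":[[:space:]]*\{[[:space:]]*"id":[[:space:]]*-?[0-9]+' \
        | grep -oE -- '-?[0-9]+$' \
        | sort -u
}

# Offline demonstration with a trimmed, made-up sample response.
sample='{"ok":true,"result":[{"update_id":1,"message":{"message_id":5,"chat":{"id":-1001234567890,"title":"NOC","type":"supergroup"},"text":"hi"}}]}'
printf '%s\n' "$sample" | extract_chat_ids

# Against the live API (replace <token> with the BotFather token):
#   curl -fsS "https://api.telegram.org/bot<token>/getUpdates" | extract_chat_ids
```

If nothing is printed, send another message in the group and retry; `getUpdates` only returns recent updates.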

### 12.4 Discord / Slack

Both use incoming webhooks:

- **Discord**: Server settings → Integrations → Webhooks → New webhook → copy URL
- **Slack**: <https://api.slack.com/apps> → Create app → Incoming webhooks → enable → add to channel → copy URL

Paste into the `webhook_url` field. No further config.

### 12.5 Custom webhook

For integrating with your own systems (PagerDuty, OpsGenie, custom dashboards). The payload posted is:
```json
{
  "subject": "🔴 [critical] Server down: web-09.hostseba",
  "text": "...",
  "structured": { "title": "...", "fields": [...] },
  "sent_at": "2026-05-07T12:34:56+06:00"
}
```

If you set a `secret`, an `X-NOC-Signature: sha256(secret + payload)` header is added for verification.
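
On the receiving side, the signature can be recomputed and compared. This sketch takes the description literally and makes two assumptions that should be confirmed against a real delivery: the digest is hex-encoded, and it is SHA-256 over the secret immediately followed by the raw request body. `noc_signature` and `verify_noc_signature` are hypothetical helper names.

```bash
# Assumption: X-NOC-Signature is the hex SHA-256 of "<secret><raw body>".
noc_signature() {
    # $1 = shared secret, $2 = raw request body exactly as received
    printf '%s%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

verify_noc_signature() {
    # $1 = secret, $2 = raw body, $3 = value of the X-NOC-Signature header
    [ "$(noc_signature "$1" "$2")" = "$3" ]
}

# Demonstration with a made-up secret and body.
body='{"subject":"test"}'
sig=$(noc_signature 'example-secret' "$body")
if verify_noc_signature 'example-secret' "$body" "$sig"; then
    echo "signature ok"
else
    echo "signature mismatch"
fi
```

Hash the body bytes exactly as they arrived; re-serialising the JSON first will usually change the digest.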

### 12.6 Per-user preferences

Each team member can override the org defaults in Settings → **My alerts**:

- **Channels**: enable email/WhatsApp/Telegram/Discord per-user, with personal addresses
- **Severity filter**: only critical+high by default. Junior team members might prefer everything; managers might want only critical
- **Quiet hours**: e.g. 22:00–07:00 Asia/Dhaka. Critical alerts pierce by default — enable the override checkbox to block even those
- **Daily digest**: instead of per-event sends, get one email at e.g. 09:00 summarising the previous 24 hours

Per-user channels apply on top of org channels — both fire. Disable email at user-level to opt out of the org email blast for yourself.
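
A quiet-hours window such as 22:00–07:00 wraps past midnight, which is easy to get wrong. The sketch below shows one way the check could work; it is not the backend's code. It assumes times are already in the user's timezone, that the end of the window is exclusive, and that the helper names are hypothetical.

```bash
# Convert HH:MM to minutes since midnight (10# avoids octal parsing of 08/09).
to_minutes() {
    local h=${1%%:*} m=${1##*:}
    echo $(( 10#$h * 60 + 10#$m ))
}

# in_quiet_hours NOW START END: true when NOW is inside the window.
in_quiet_hours() {
    local now start end
    now=$(to_minutes "$1"); start=$(to_minutes "$2"); end=$(to_minutes "$3")
    if [ "$start" -le "$end" ]; then
        [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
    else
        # Window wraps past midnight, e.g. 22:00-07:00.
        [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
    fi
}

# should_suppress SEVERITY NOW START END BLOCK_CRITICAL(0|1)
should_suppress() {
    in_quiet_hours "$2" "$3" "$4" || return 1
    if [ "$1" = "critical" ] && [ "$5" != "1" ]; then
        return 1   # critical alerts pierce quiet hours unless the override is on
    fi
    return 0
}

should_suppress high 23:30 22:00 07:00 0 && echo "high at 23:30: suppressed"
should_suppress critical 23:30 22:00 07:00 0 || echo "critical at 23:30: delivered"
```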

### 12.7 Operations: queue worker

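Notifications are dispatched via a `notifications` queue. Run a dedicated worker for it under Supervisor (install `supervisor` first; it is not in the section 1 package list). If you do, remove `notifications` from the `--queue` list in `noc-queue.service` so those jobs are not also picked up with `--tries=3`: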
```ini
# /etc/supervisor/conf.d/noc-notifications.conf
[program:noc-notifications]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/noc.hostseba.com/hostseba-noc-backend/artisan queue:work --queue=notifications --sleep=1 --tries=1 --max-time=3600
autostart=true
autorestart=true
numprocs=2
user=www-data
```

`--tries=1` is intentional — retry logic is managed by `NotificationDelivery.next_retry_at` rather than the queue, so we can use exponential backoff and surface failures in the UI.

### 12.8 Operations: delivery log

Settings → **Delivery log** shows every send attempt across every channel: who got what, when, and why it failed. Filter by status / channel.

Failed deliveries auto-retry 5 times with exponential backoff (30s → 2m → 8m → 32m → 2h). After that they enter `failed` permanently — manual retry available via the ↻ button.
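
The listed delays fit a simple pattern: each is four times the previous one, with the last rounded down to 2 hours. The sketch below reproduces the schedule under that assumption; the multiplier, the cap, and the `retry_delay` name are inferred from the numbers above, not taken from the backend.

```bash
# retry_delay ATTEMPT: seconds to wait before retry number ATTEMPT (1-based).
# Assumed formula: 30s * 4^(attempt-1), capped at 7200s (2 hours).
retry_delay() {
    local attempt=$1 delay=30 cap=7200 i
    for (( i = 1; i < attempt; i++ )); do
        delay=$(( delay * 4 ))
    done
    [ "$delay" -gt "$cap" ] && delay=$cap
    echo "$delay"
}

for n in 1 2 3 4 5; do
    printf 'retry %d: %ds\n' "$n" "$(retry_delay "$n")"
done
```

Summed, the five delays come to 9750 seconds, so a delivery that keeps failing reaches `failed` a little under 2 h 45 min after the first attempt.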

### 12.9 Composer dependency

The notification system uses Laravel's HTTP client and Symfony Mailer (both included in Laravel). No extra `composer require` needed unless you want server-side image generation in templates — not currently used.


---

## 13. Auth: password reset flow

The dashboard has a self-service password reset flow plus admin-triggered welcome emails.

### 13.1 Forgot password

User clicks "Forgot password?" on the login page → enters email → backend mails a reset link valid for 60 minutes. The endpoint always returns success regardless of whether the email exists, to prevent enumeration attacks.

Endpoints:
- `POST /api/auth/forgot-password` — body: `{ email }`, throttled per-IP and per-email
- `POST /api/auth/reset-password` — body: `{ token, password, password_confirmation }`

The reset URL points at `/reset-password?token=...` on the SPA. You'll need a frontend route that calls the reset endpoint with the token.

### 13.2 Welcome emails for new users

When an admin creates a user via `POST /api/users`, the API automatically sends a welcome email with a 24-hour setup link. Set `send_welcome_email: false` in the request to skip this (e.g. for test users with pre-known passwords).

The setup URL points at `/setup-account?token=...` and uses the same token table.

### 13.3 2FA recovery codes by email

When a user enables 2FA, the recovery codes are also emailed (in addition to being shown on screen). This is best-effort — if email fails, enrollment still succeeds.

### 13.4 Email templates

All transactional emails share a layout at `resources/views/emails/layout.blade.php`. Per-purpose templates:
- `emails/user-welcome.blade.php`
- `emails/password-reset.blade.php`
- `emails/two-factor-enabled.blade.php`

Customise the brand colors / footer copy by editing the layout. Inline-styled because email clients are inconsistent with `<style>` tags.

### 13.5 Migrations needed

Run after pulling this version:
```bash
php artisan migrate
# Adds: password_reset_tokens_v2 table
```

---

## 14. Alert rules + escalation policies

### 14.1 Alert rule routing

Each alert rule routes to **one** of:
- A flat channel list (`channels: ['email', 'whatsapp']`) — every channel fires immediately
- An escalation policy (`escalation_policy_id: 5`) — multi-step routing

If both are set at once, the API rejects the request with `422`.

### 14.2 Escalation policy structure

A policy has 1+ steps. Each step:
- `delay_seconds`: how long to wait after the previous step before firing (0 = fire immediately, at the same time as the previous step)
- `channels`: array of channel keys — `['email','whatsapp']`
- `targets`: who to notify — `{ users: [1,2], roles: ['admin'], external: ['+8801...'] }`
- `only_if_unacknowledged`: skip this step if someone already acked the incident (default true)

Example: critical web servers
```
Step 1 (immediate):  Email + Telegram → on-call engineer
Step 2 (5 min):      WhatsApp + Email → admin role
Step 3 (15 min):     WhatsApp + Discord → admin + oncall roles
```
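
Because `delay_seconds` is relative to the previous step, the time at which a step fires is the running total of the delays. The helper below (a hypothetical name, for illustration only) does that arithmetic. If the 5 min and 15 min labels in the example are measured from the moment the incident opens, the per-step delays would be 0, 300 and 600.

```bash
# step_offsets DELAY...: print the cumulative seconds after incident open
# at which each step fires, given each step's delay_seconds.
step_offsets() {
    local total=0 out="" d
    for d in "$@"; do
        total=$(( total + d ))
        out="$out${out:+ }$total"
    done
    echo "$out"
}

step_offsets 0 300 600   # prints: 0 300 900   (steps at 0, 5 and 15 minutes)
step_offsets 0 300 900   # prints: 0 300 1200  (third step lands at 20 minutes)
```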

### 14.3 Endpoints

```
GET    /api/alert-rules              List rules with policy names
POST   /api/alert-rules              Create rule
PATCH  /api/alert-rules/{id}         Update
DELETE /api/alert-rules/{id}         Delete

GET    /api/escalation-policies      List policies + steps
POST   /api/escalation-policies      Create policy with steps array
PATCH  /api/escalation-policies/{id} Update (replaces steps wholesale)
DELETE /api/escalation-policies/{id} Delete (cascades to steps)
```

### 14.4 UI

Settings → **Alert rules** has a slide-over editor with a routing-mode toggle (Channels vs Policy). Settings → **Escalation** is the policy CRUD with a per-step editor.

---

## 15. Tests

The new notification system is covered by `tests/Feature/NotificationTest.php`:
- Channel test endpoint (admin-only, records `last_test_at`)
- Per-user preference get/save
- Delivery log filtering + retry
- Quiet-hours suppression logic
- WhatsApp provider factory + BD phone normalisation
- Escalation policy creation
- Alert rule routing validation (channels XOR policy)

Run with:
```bash
php artisan test --filter=NotificationTest
```

---

## 16. Hosting-specific monitoring (cPanel/LiteSpeed)

The agent on each cPanel/LiteSpeed server collects telemetry beyond just CPU/RAM. Each metric type has its own ingestion endpoint and dashboard page.

### 16.1 Mail queue monitoring

The agent calls `exim -bpc` and `exim -bpr` (Postfix uses `postqueue -p`) every 5 minutes and posts to `POST /api/agent/mail-queue`. Tracked: total queue size, frozen messages, deferred messages, top sending domain, bounces in the last hour.

A spike detector inside the controller compares against the 24h rolling average — if frozen count is 3x baseline (and >50), an incident opens automatically with detector key `mail_queue_spike`.
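
The rule combines a relative threshold with an absolute floor, so a quiet server whose baseline is close to zero does not trigger on a handful of frozen messages. A minimal sketch of that comparison follows; the real check lives in the controller, `is_spike` is a hypothetical name, and "3x" is read here as "at least 3 times".

```bash
# is_spike CURRENT BASELINE FACTOR FLOOR
# True when CURRENT is at least FACTOR x BASELINE and also greater than FLOOR.
# awk is used so the baseline may be fractional (it is an average).
is_spike() {
    awk -v cur="$1" -v base="$2" -v factor="$3" -v minc="$4" \
        'BEGIN { exit !((cur + 0 >= factor * base) && (cur + 0 > minc)) }'
}

# Mail queue rule: frozen count vs 24h rolling average, factor 3, floor 50.
if is_spike 180 40.5 3 50; then
    echo "180 frozen vs baseline 40.5: open incident mail_queue_spike"
fi
if ! is_spike 45 10 3 50; then
    echo "45 frozen vs baseline 10: over 3x but under the floor, no incident"
fi
```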

**Dashboard:** Hosting → Mail Queue. Sorts servers by frozen count (worst first).

### 16.2 Email send analytics

Agent parses `/var/log/exim_mainlog` for the previous hour, aggregates by sender domain, and posts hourly to `POST /api/agent/email-stats`. The controller computes a 7-day rolling baseline per (server, domain, hour-of-day) — anything >5x baseline AND >100 mails creates an `OutgoingMailAbuse` row.
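
The per-domain aggregation can be approximated with `awk`. This is a sketch of the idea, not the agent's actual parser: it assumes the default Exim log layout, in which message-arrival lines carry the `<=` flag in the fourth field and the envelope sender in the fifth. `count_sender_domains` is a hypothetical name, and the log lines are made-up samples.

```bash
# count_sender_domains: read exim mainlog lines on stdin and print
# "count domain" for message-arrival lines, busiest sender domain first.
count_sender_domains() {
    awk '$4 == "<=" && $5 ~ /@/ {
             split($5, parts, "@")
             count[tolower(parts[2])]++
         }
         END { for (d in count) print count[d], d }' \
        | sort -k1,1nr -k2,2
}

# Offline demonstration. Deliveries (=>) and bounces (<= <>) are ignored.
count_sender_domains <<'EOF'
2026-05-07 12:00:01 1sAbCd-0001-AA <= alice@shop.example H=localhost [127.0.0.1] P=esmtpa S=1843
2026-05-07 12:00:02 1sAbCd-0001-AA => bob@remote.example R=dkim_lookuphost T=dkim_remote_smtp
2026-05-07 12:00:09 1sAbCe-0002-BB <= news@shop.example H=localhost [127.0.0.1] P=esmtpa S=2210
2026-05-07 12:01:30 1sAbCf-0003-CC <= <> R=1sAbCd-0001-AA U=mailnull P=local S=3021
2026-05-07 12:02:11 1sAbCg-0004-DD <= carol@blog.example U=carol P=local S=911
EOF
```

Each resulting count is what gets compared against the 7-day baseline for that server, domain and hour of day.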

**Dashboard:**
- Hosting → Top Senders — ranked list with anomaly highlighting
- Hosting → Mail Abuse — pending review queue with suspend/clear/false-positive actions

This is your front-line defense against compromised accounts. A single compromised customer can blacklist the whole IP within hours.

### 16.3 Backup monitoring (JetBackup)

Two integration points:

1. **Agent reports backup runs** — when JetBackup finishes a job, your post-script calls:
   ```bash
   /opt/hostseba-noc/agent-hosting.sh report-backup jetbackup "Daily-Local" success
   /opt/hostseba-noc/agent-hosting.sh report-backup jetbackup "Daily-Local" failed "Disk full"
   ```
2. **Stale backup detector** — hourly cron job. Any active server without a successful backup in the last 30 hours opens an incident (detector key `backup_stale`). Cleared automatically when a fresh backup lands.
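
The 30-hour window gives a daily job six hours of slack before it counts as missed. The staleness test itself is a timestamp comparison, sketched here with a hypothetical `backup_is_stale` helper that works on Unix epoch seconds; the real detector runs inside the scheduler.

```bash
# backup_is_stale LAST_SUCCESS_EPOCH [NOW_EPOCH]
# True when the last successful backup is more than 30 hours old.
backup_is_stale() {
    local last=$1 now=${2:-$(date +%s)} max_age=$(( 30 * 3600 ))
    [ $(( now - last )) -gt "$max_age" ]
}

now=$(date +%s)
if backup_is_stale $(( now - 31 * 3600 )) "$now"; then
    echo "last success 31h ago: open incident backup_stale"
fi
if ! backup_is_stale $(( now - 6 * 3600 )) "$now"; then
    echo "last success 6h ago: fresh, clear any open backup_stale incident"
fi
```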

Backup failures open incidents immediately at `high` severity. Partial failures (some accounts skipped) at `medium`.

**Dashboard:** Hosting → Backups. Shows 7 days of runs with summary tiles + the list of stale servers at the top.

### 16.4 Per-partition disk monitoring + inodes

Agent walks every real filesystem (skips tmpfs, devtmpfs) and reports byte + inode usage every 5 minutes via `POST /api/agent/disks`. **Inodes are tracked separately** — a mailbox-heavy hosting box can run out of inodes while the disk is half empty.

The dashboard's server detail page renders a row per partition with two progress bars (bytes + inodes). Alert rules can fire on either.
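
To spot-check a host by hand, the same numbers are available from `df`. The sketch below filters `df -P` style output for filesystems at or above a usage threshold; `usage_over_threshold` is a hypothetical helper, the sample table is made up, and mount points containing spaces are not handled.

```bash
# usage_over_threshold THRESHOLD: read `df -P` (or `df -P -i`) output on stdin
# and print "mountpoint percent" for filesystems at or above THRESHOLD percent.
usage_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5; sub(/%/, "", pct)
        if (pct ~ /^[0-9]+$/ && pct + 0 >= limit) print $6, pct
    }'
}

# On a live Linux host (GNU df), bytes first, then inodes:
#   df -P    -x tmpfs -x devtmpfs | usage_over_threshold 90
#   df -P -i -x tmpfs -x devtmpfs | usage_over_threshold 90

# Offline demonstration: an inode table where /home is nearly exhausted.
usage_over_threshold 90 <<'EOF'
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1      2621440  310000 2311440   12% /
/dev/sdb1      6553600 6420000  133600   98% /home
EOF
```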

### 16.5 MySQL/MariaDB metrics

Agent reads from `information_schema.GLOBAL_STATUS` every minute. Tracks: connection saturation, slow queries, queries/sec, deadlocks, replication lag (if configured).

**Setup**: drop a read-only MySQL user's credentials at `/home/hostseba-noc/.my.cnf`:
```
[client]
user=monitor_ro
password=...
```

The user only needs `PROCESS, REPLICATION CLIENT, SHOW VIEW` privileges.

### 16.6 cPanel/WHM monitoring

Counts accounts (active vs suspended), email accounts, databases, domains. Tracks license expiry — a missed cPanel license renewal locks customers out, so this is a critical metric. The agent reads `/etc/userdomains`, `/etc/trueuserdomains`, `/var/cpanel/expired`.

### 16.7 LiteSpeed monitoring

Reads `/tmp/lshttpd/.rtreport*` files for real-time stats: concurrent connections, requests/sec, bandwidth, cache hit ratio, vhost count. The cache hit ratio is the headline metric for LiteSpeed — anything below 80% means LSCache isn't earning its keep.

### 16.8 Firewall events (CSF/lfd/ModSecurity)

Agent tails `/var/log/lfd.log` every 2 minutes, parses brute-force attempts, scanner hits, and blocks. Posts to `POST /api/agent/firewall` in batches.

**Dashboard:** Hosting → Firewall Events. Includes top attacking IPs (last 24h) with country codes, plus a chronological event log.

### 16.9 Process monitoring

Agent checks for httpd, nginx, mysqld, mariadbd, lsws, exim, postfix, dovecot, named every minute via `pgrep`. If a critical process (mysqld, lsws, exim) is down → critical incident, otherwise high.
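
The severity mapping can be sketched in a few lines of shell. This is an illustration of the rule as described, not the agent's source; `severity_for` and `check_process` are hypothetical names. `pgrep -x` matches the process name exactly, so `exim` does not match an unrelated process whose name merely contains it.

```bash
# severity_for PROCESS: incident severity when that process is not running.
severity_for() {
    case "$1" in
        mysqld|lsws|exim) echo critical ;;
        *)                echo high ;;
    esac
}

# check_process NAME: print a one-line status for the named process.
check_process() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1 up"
    else
        echo "$1 down severity=$(severity_for "$1")"
    fi
}

for name in mysqld exim dovecot; do
    check_process "$name"
done
```

In practice the list should contain only the services a given host is expected to run; otherwise every server without Postfix would report it as down.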

**Dashboard:** Server detail → Services tab.

### 16.10 Cron job tracking

Agent reads `/var/spool/cron/*` every 15 minutes and reports the registry. CronCheck rows store the schedule + last-run info. The `is_overdue` computed property identifies crons that should have run but didn't.

**Dashboard:** Hosting → Cron Jobs. "Overdue only" filter for triage.

### 16.11 Daily checks (run by scheduler, not agent)

```
05:00 UTC  noc:check-blacklists  RBL/DNSBL lookups for each server IP
05:30 UTC  noc:check-mail-auth   SPF/DKIM/DMARC validation per domain
hourly     noc:check-stale-backups  Open incidents for servers w/o recent backup
```

Both the blacklist and mail-auth checks run once a day to keep DNS query volume low: 5 lists × ~30 servers = 150 blacklist queries/day, well under the free-tier limits of typical RBLs.
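
A DNSBL lookup is an ordinary DNS query: the IPv4 octets are reversed and prepended to the list's zone. The sketch below builds that query name. The helper names are hypothetical, and `zen.spamhaus.org` is used only as a well-known example zone; which five lists `noc:check-blacklists` actually queries is not stated here.

```bash
# reverse_ipv4 A.B.C.D: print D.C.B.A
reverse_ipv4() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo "$d.$c.$b.$a"
}

# dnsbl_query_name IP ZONE: print the hostname to resolve for that list.
dnsbl_query_name() {
    echo "$(reverse_ipv4 "$1").$2"
}

dnsbl_query_name 192.0.2.10 zen.spamhaus.org   # prints: 10.2.0.192.zen.spamhaus.org

# The lookup itself could then be done with, for example:
#   dig +short "$(dnsbl_query_name 192.0.2.10 zen.spamhaus.org)" A
```

An empty answer means the address is not listed. An answer in `127.0.0.x` generally means it is, though some lists reserve certain return codes for query errors, so check each list's documentation before opening incidents on any answer.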

### 16.12 Migration

```bash
php artisan migrate
# Adds:
#   email_queue_snapshots, email_send_stats, backup_runs, disk_partitions,
#   mysql_snapshots, cpanel_snapshots, litespeed_snapshots, firewall_events,
#   blacklist_checks, outgoing_mail_abuse, mail_auth_records,
#   process_checks, cron_checks, service_uptime
```

### 16.13 Status page (public)

Settings → Status Page lets staff preview what `status.hostseba.com` would show. A public read-only mirror is recommended — your customers shouldn't have to email support to know whether their email is broken because of you or them.

### 16.14 On-call scheduling

Operations → On-Call. Simple weekly rotation. The current on-call person is highlighted; escalation policies can route the `oncall` role to whoever's currently active.

**Roadmap items for next iteration:** runbooks linked to incidents, custom dashboards, multi-server batch actions, mobile push notifications.

---

## 17. Monitoring-only policy

**This platform observes. It does not act.**

### What this means in practice

The dashboard never:
- restarts services on your servers
- suspends or unsuspends customer accounts
- modifies firewall rules
- blocks IPs
- executes any command on any server
- modifies any file outside its own database

When the platform detects a problem, it does **two** things and only those two:
1. Records the observation in the database
2. Sends notifications through configured channels (email, WhatsApp, Telegram, etc.)

The operator (a human) reviews and decides what to do. They take action through their usual tools — SSH, WHM, control panels, etc. The platform tracks what was decided (status fields) but does not perform the action itself.

### Why this design?

- **Lower blast radius**: a bug in the platform can never break a server. The worst case is a missed notification.
- **No credential sprawl**: the platform never needs SSH keys, WHM API tokens, or firewall management credentials. The agent runs as an unprivileged user that can read but not modify.
- **Auditability**: every action that affects infrastructure is performed by an identified human. No ambiguous "the system did it".
- **Compliance-friendly**: separation of detection from response is what audit frameworks expect.

### What the agent CAN do

The agent script (Linux/Windows/macOS) is **read-only by design**. It runs as `hostseba-noc:hostseba-noc` (a dedicated unprivileged user) with these capabilities only:
- read system metrics (`/proc`, `df`, `ps`, etc.)
- read log files it has been granted access to (typically `exim_mainlog`, `lfd.log`, etc.)
- query MySQL with a read-only monitoring user (PROCESS, REPLICATION CLIENT, SHOW VIEW)
- POST observations to the dashboard

It explicitly cannot:
- run as root or any privileged user
- write anywhere except `/var/log/hostseba-noc/`
- be remotely instructed to do anything (the dashboard has no agent-control channel)
- accept inbound network connections

### What the dashboard surfaces

For each detected issue, you'll see:
- **What** was detected (the observation)
- **When** it started
- **Where** (which server, which account, which domain)
- **A copy-pasteable command** (where applicable) you can run yourself

For example, when a service is detected as down:
- Notification: "exim is down on mail-02"
- Dashboard shows: detected time, last-seen time, severity
- A "Copy restart command" button gives you `sudo systemctl restart exim` to paste into your own terminal

### Mail abuse status vocabulary

When the platform detects outgoing mail abuse, it records the case but never suspends. Status flow:

```
detected         Platform spotted the anomaly. No human action yet.
acknowledged     A team member is investigating.
suspended_by_op  Operator manually suspended the account in WHM.
cleared          Investigated, legitimate traffic.
false_positive   Tunes baseline so it won't trigger next time.
```

Each transition is logged with the operator who made the call.

### Endpoints that were intentionally removed

Old versions of this platform had endpoints like `POST /api/servers/{id}/services/restart` that issued SSH commands. These have been removed. The endpoint paths still exist but return `410 Gone` so old clients fail loudly rather than silently:

```http
POST /api/servers/{id}/services/restart
HTTP/1.1 410 Gone
{
  "error": "monitoring_only",
  "message": "This is a monitoring-only platform. ..."
}
```

### Pause vs disable

`POST /api/servers/{id}/disable-agent` does NOT touch the agent on the server. It only flips `is_active=false` in our database, so we stop showing data for that server. The agent on the host keeps running unchanged. To actually stop the agent, you uninstall it via the host's shell.
