Jaypore CI

> Jaypore CI: Minimal, Offline, Local CI system.

# JCI Architecture

JCI is a local-first CI system. CI results are stored directly inside the git repository
as regular git objects under custom refs, so they travel with the repo on push/pull.
`git jci server` and `git jci runner` extend this with a webhook-driven, distributed
CI execution layer: the server is a thin coordination point, and all actual CI work
happens inside runner-managed Docker containers.

---

## Repository layout

```
cmd/git-jci/main.go   entry point, CLI dispatch
internal/jci/
  run.go              execute CI for a commit
  git.go              low-level git plumbing helpers
  web.go              HTTP server + SPA (single-file, inline)
  push.go             push CI refs to a remote
  pull.go             fetch CI refs from a remote
  prune.go            delete old CI refs locally or on a remote
  cron.go             cron subcommands: ls, sync
  server.go           coordination server (webhook + runner poll)
  runner.go           runner (Docker job dispatch)
www_jci/              project website (separate, not embedded in the binary)
```

---

## Ref layout inside git

```
refs/jci-runs/<commit>/<run-id>   every run result, including one-off manual runs
```

Every run, whether triggered manually, by cron, or by the distributed runner, gets a
unique run ID (`<unix_timestamp>-<4_random_hex_chars>`), so multiple runs on the same
commit are stored independently and none overwrites another.

Each ref points to a git **commit** whose **tree** holds the CI artefacts:

```
status.txt        "ok" | "err" | "running"
run.output.txt    combined stdout+stderr from .jci/run.sh
index.html        standalone HTML view of the run (generated only if the job
                  did not produce its own index.html)
<anything else>   files written by the CI script to $JCI_OUTPUT_DIR
```

Because these are normal git objects, `git push origin 'refs/jci-runs/*:refs/jci-runs/*'`
is all it takes to share results with a team.

---

## CI execution flow (`git jci run`)

```
1. GetCurrentCommit()          resolve HEAD → commit hash
2. generateRunID()             timestamp + 2 random bytes → unique run ID
3. check .jci/run.sh exists
4. mkdir .jci/<commit>/        temporary output directory
5. write status.txt = "running"
6. exec bash .jci/run.sh       working dir = output dir
                               env: JCI_COMMIT, JCI_REPO_ROOT, JCI_OUTPUT_DIR
                               stdout+stderr captured to run.output.txt
7. write status.txt = "ok"|"err"
8. generateIndexHTML()         only if index.html was NOT written by the job
9. StoreTree()                 git hash-object each file → git mktree → git commit-tree
                               → git update-ref refs/jci-runs/<commit>/<runID>
10. rm -rf .jci/<commit>/      clean up temp dir
```

The CI script receives three environment variables (`JCI_COMMIT`, `JCI_REPO_ROOT`,
`JCI_OUTPUT_DIR`) and should write any extra artefacts into `$JCI_OUTPUT_DIR`.
Whatever exists in that directory when the script exits is committed to git.
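
The plumbing sequence in step 9 can be sketched roughly as below. This is an illustration, not the actual `StoreTree()`. It assumes a flat output directory with no subdirectories and a repository where `user.name` and `user.email` are configured (needed by `git commit-tree`). The helper names `git`, `treeEntry`, and `storeFlatTree` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// git runs one git subcommand in repoDir with the given stdin and
// returns its trimmed stdout.
func git(repoDir, stdin string, args ...string) (string, error) {
	cmd := exec.Command("git", args...)
	cmd.Dir = repoDir
	cmd.Stdin = strings.NewReader(stdin)
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("git %s: %w", strings.Join(args, " "), err)
	}
	return strings.TrimSpace(out.String()), nil
}

// treeEntry formats one line of `git mktree` input for a regular file.
func treeEntry(blobSHA, name string) string {
	return fmt.Sprintf("100644 blob %s\t%s\n", blobSHA, name)
}

// storeFlatTree stores files (tree entry name -> path on disk) as a
// commit and points ref at it. Subdirectories are not handled here.
func storeFlatTree(repoDir, ref, message string, files map[string]string) error {
	var entries strings.Builder
	for name, path := range files {
		sha, err := git(repoDir, "", "hash-object", "-w", path)
		if err != nil {
			return err
		}
		entries.WriteString(treeEntry(sha, name))
	}
	// mktree normalizes entry order, so map iteration order is harmless.
	tree, err := git(repoDir, entries.String(), "mktree")
	if err != nil {
		return err
	}
	commit, err := git(repoDir, "", "commit-tree", tree, "-m", message)
	if err != nil {
		return err
	}
	_, err = git(repoDir, "", "update-ref", ref, commit)
	return err
}

func main() {
	// Print a sample mktree line; storeFlatTree needs a real repository.
	fmt.Print(treeEntry("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391", "status.txt"))
}
```

Because the result commit is created with `commit-tree` and has no parent, each run ref is an isolated object graph that can be pushed or pruned without touching branch history.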

---

## Web UI (`git jci web [port]`)

A minimal three-panel SPA served entirely from a single Go handler.
No external assets; the HTML/CSS/JS is embedded inline in `showMainPage()`.

```
GET /api/branches              list local branch names
GET /api/commits?branch=&page= paginated commit list with CI status per commit
GET /api/commit/<hash>         commit detail + file list (latest run for that commit)
GET /api/commit/<hash>/<runId> same but for a specific run

GET /jci/<commit>/<runId>/<file>/raw    serve file from refs/jci-runs/<commit>/<runId>
GET /jci/<commit>/<file>/raw            serve file from the latest run for <commit>
GET /jci/...  (other)                   serve SPA shell (JS handles routing)
GET /                                   serve SPA shell
```

The UI keeps a client-side page index for infinite-scroll commit loading
(`commitsPageSize = 100` per page).

---

## Cron integration (`git jci cron`)

**`cron ls`**: shows what is configured in `.jci/crontab` and what is currently
installed in the user's system crontab.

**`cron sync`**: idempotent sync from `.jci/crontab` to the system crontab:

```
1. parse .jci/crontab          5-field schedule + optional branch:X name:Y
2. crontab -l                  read current system crontab
3. strip lines containing      # JCI:<sha256(repoRoot)>
4. append new lines, one per entry:
     <schedule>  cd <repoRoot> && [git checkout <branch> &&] git-jci run  # JCI:<id> [<name>]
5. crontab -                   install new crontab
```

Each repo is identified by `sha256(absolute_path)` so entries from different
repos never collide.

---

## Push / Pull

```
git jci push [remote]   discover all local refs/jci-runs/* not on remote → git push each
git jci pull [remote]   git fetch refs/jci-runs/*:refs/jci-runs/*
```

---

## Prune

Removes CI refs to reclaim space. Works on the local repo or on a remote.

```
git jci prune [--older-than=<duration>] [--commit]
git jci prune --on-remote=<remote> [--older-than=<duration>] [--commit]
```

Duration format: `30d`, `2w`, `6m`, `4h`, or any Go duration string.
Without `--commit` the command is a dry run.
After local deletion it runs `git gc --prune=now` to actually free objects.

---

## Distributed CI (`git jci server` / `git jci runner`)

Both commands are part of the existing `git-jci` binary and run in the foreground,
managed by Docker or systemd.

```
Gitea ──webhook──▶ jci server ◀──poll── jci runner (Docker container)
                       │                      │
                       │  Gitea API           │  docker socket
                       ▼                      ▼
                    Gitea                 job container
                  (set status)          (git-jci run → git-jci push → Gitea)
```

### Server (`git jci server`)

#### Configuration (env vars)

```
GITEA_HOST=gitea.example.com
GITEA_USER=gitea_service_user
GITEA_TOKEN=<token>              # must have permission to create/delete scoped tokens
GITEA_WEBHOOK_SECRET=<hmac secret>
RUNNER_SECRET=<shared secret for runners>
JCI_MAX_JOBS=4                   # max concurrent jobs assignable to a single runner
JCI_JOB_TIMEOUT=60m              # server-wide job timeout (default: 60 minutes)
```

#### SQLite schema

```sql
-- Known runners; auto-created on first poll.
CREATE TABLE runners (
    runner_id   TEXT PRIMARY KEY,
    last_seen   DATETIME NOT NULL
);

-- Active (pending or running) jobs only.
-- Completed/timed-out jobs are deleted immediately.
CREATE TABLE jobs (
    job_id        TEXT PRIMARY KEY,   -- UUID
    repo_owner    TEXT NOT NULL,
    repo_name     TEXT NOT NULL,
    commit_sha    TEXT NOT NULL,      -- idempotency key
    runner_id     TEXT,               -- NULL = unassigned
    assigned_at   DATETIME,
    expires_at    DATETIME,           -- assigned_at + JCI_JOB_TIMEOUT
    gitea_token   TEXT NOT NULL,      -- one-time scoped token for this job
    status_cache  TEXT,               -- last known status ("pending"|"running"|"success"|"failure")
    cache_until   DATETIME            -- when status_cache expires (15s TTL)
);
```

#### Startup check

On startup the server verifies that `GITEA_TOKEN` has permission to create and delete
per-repo tokens via the Gitea API. If the check fails, the server exits with a clear
error. No silent degradation.

#### Webhook handling (`POST /webhook`)

1. Verify HMAC-SHA256 signature using `GITEA_WEBHOOK_SECRET`. Return 400 on failure.
2. Extract `repo.owner`, `repo.name`, `commit_sha` (from `after` field on push events).
3. Check `jobs` table for an existing active job with the same `commit_sha`. If found,
   respond 200 and drop the event, so no duplicate job is created.
4. Insert a new job row with `runner_id = NULL`, `status_cache = "pending"`.
5. Respond 200 immediately.
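
Step 1 might look like the sketch below. It assumes the signature arrives as a hex-encoded HMAC-SHA256 of the raw request body (Gitea sends this in the `X-Gitea-Signature` header, to the best of my knowledge); `sign` and `validSignature` are hypothetical helper names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// It exists here only to produce a signature for the demo in main.
func sign(body []byte, secret string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// validSignature reports whether sigHex is the HMAC-SHA256 of body under
// secret. hmac.Equal compares in constant time.
func validSignature(body []byte, sigHex, secret string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	body := []byte(`{"after":"0123abcd"}`)
	sig := sign(body, "s3cret")
	fmt.Println(validSignature(body, sig, "s3cret")) // true
	fmt.Println(validSignature(body, sig, "wrong"))  // false
}
```

The check has to run over the exact bytes received, before any JSON decoding, since re-serialising the payload would change the digest.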

#### Runner poll endpoint (`POST /poll`)

Request body: `{ "runner_id": "...", "secret": "..." }`

1. Verify `secret == RUNNER_SECRET`. Return 403 on failure.
2. Upsert runner row (`runner_id`, `last_seen = now()`). This is auto-registration.
3. Count active jobs already assigned to this runner.
   - If count >= `JCI_MAX_JOBS`: respond 429 with `Retry-After: 5`.
4. Poll Gitea for cached status of each assigned job (see *Status polling* below).
   Completed jobs are deleted from the DB.
5. Pick one unassigned job from the queue (FIFO per-repo, round-robin across repos).
   If none: respond 200 with `{ "job": null }`.
6. For the selected job:
   a. Create a fresh scoped Gitea token (scoped to that repo, expiry = `now + JCI_JOB_TIMEOUT`) where the Gitea version supports it; some versions of Gitea do not allow this at all.
   b. Set `runner_id`, `assigned_at`, `expires_at`, store token in `jobs.gitea_token`.
   c. Set Gitea commit status to `"pending"`.
7. Respond 200 with the job payload:

```json
{
  "job": {
    "job_id":     "...",
    "clone_url":  "https://<user>:<token>@gitea.example.com/owner/repo",
    "commit_sha": "...",
    "repo_owner": "...",
    "repo_name":  "..."
  }
}
```
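
On the runner side, this payload could be decoded with a pointer field so that `{ "job": null }` maps naturally to "nothing to do". The type and function names below (`Job`, `pollResponse`, `decodePoll`) are hypothetical; only the JSON field names come from the payload above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Job mirrors the fields of the "job" object in the poll response.
type Job struct {
	JobID     string `json:"job_id"`
	CloneURL  string `json:"clone_url"`
	CommitSHA string `json:"commit_sha"`
	RepoOwner string `json:"repo_owner"`
	RepoName  string `json:"repo_name"`
}

// pollResponse wraps the job; a JSON null leaves Job as a nil pointer.
type pollResponse struct {
	Job *Job `json:"job"`
}

// decodePoll returns the assigned job, or nil when the queue was empty.
func decodePoll(data []byte) (*Job, error) {
	var r pollResponse
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	return r.Job, nil
}

func main() {
	idle, err := decodePoll([]byte(`{"job": null}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(idle == nil) // true

	job, err := decodePoll([]byte(`{"job": {"job_id": "j1", "commit_sha": "0123abcd"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(job.JobID, job.CommitSHA) // j1 0123abcd
}
```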

#### Status polling (server → Gitea)

The server checks job status by reading `refs/jci-runs/<commit>/<runID>` on Gitea and
inspecting `status.txt` in that ref's tree.

- Results are cached per-job for **15 seconds** (`jobs.cache_until`).
- A check is triggered every time the assigned runner calls `/poll`.
- A final check is performed at `jobs.expires_at` (`assigned_at + JCI_JOB_TIMEOUT`,
  60 minutes by default); if still unresolved, the job is deleted and Gitea status
  is set to `"failure"`.
- On `status.txt = "ok"` or `"err"` the server:
  1. Sets Gitea commit status to `"success"` or `"failure"`.
  2. Deletes the one-time token via the Gitea API.
  3. Deletes the job row from SQLite.

#### Token cleanup

Two-layer cleanup for the one-time Gitea token:
1. **Gitea-native expiry**: token created with hard expiry at `assigned_at + JCI_JOB_TIMEOUT`
   (60 minutes by default).
2. **Server-side deletion**: explicit delete via Gitea API on job completion or timeout.

This ensures the token cannot be used beyond the job timeout even if the server crashes.

---

### Runner (`git jci runner`)

#### Configuration (env vars)

```
JCI_SERVER=https://jci.example.com
JCI_RUNNER_SECRET=<shared secret>
```

#### Persistent state

A SQLite database at a fixed path on a mounted volume:

```sql
CREATE TABLE identity (
    runner_id TEXT PRIMARY KEY   -- generated once, reused across restarts
);
```

#### Poll loop

```
every 5s + random jitter (0–2s):
    POST /poll  →  { runner_id, secret }
    if 429: wait Retry-After seconds, then continue
    if job == null: continue
    if job != null: dispatch(job)   # non-blocking; runner returns to poll immediately
```

#### Job dispatch

For each received job the runner:

1. **Pulls the `git-jci` binary** from a known location (e.g. mounted at
   `/usr/local/bin/git-jci` in the runner container) to inject into the job container.
2. **Starts a detached job container**:
   ```
   docker run --rm -d \
     -v /usr/local/bin/git-jci:/usr/local/bin/git-jci:ro \
     -e JCI_COMMIT=<commit_sha> \
     --label jci-job=y \
     --label jci-job-timeout=60m \
     <image from job config> \
     /bin/sh -c "
       git init -q /repo && cd /repo &&
       git remote add origin <clone_url> &&
       git fetch --depth=1 origin <commit_sha> &&
       git checkout -q FETCH_HEAD &&
       { git-jci run; git-jci push; }   # push runs even if the run failed, to commit status.txt
     "
   ```
   The commit is fetched by SHA because `git clone --branch` accepts only branch and
   tag names; this assumes the Gitea server permits fetching a commit by its SHA.
   `clone_url` embeds the one-time credentials, so `git push` (via `git-jci push`)
   is transparent: `origin` already points at the credentialed URL.
3. **Tracks the container ID** in memory (not persisted: if the runner crashes, the
   container either runs to completion or is reaped by Docker's own restart policy).

The job container never talks to the `jci server` directly.

#### Container reaping

The runner periodically checks for containers labelled `jci-job=y` that have been
running longer than the configured timeout and kills them via `docker rm -f`. This
prevents capacity leaks if a job container hangs.

---

## End-to-end distributed flow

```
1.  Dev pushes to Gitea
2.  Gitea sends webhook → POST /webhook on jci server
3.  Server verifies HMAC, deduplicates on commit SHA, inserts job (status=pending)
4.  Runner polls → POST /poll
5.  Server creates scoped one-time Gitea token, assigns job, sets Gitea status = "pending"
6.  Server responds with job payload (clone URL with embedded token)
7.  Runner starts job container (detached), returns to poll loop
8.  Job container: clone → git-jci run → git-jci push (pushes refs/jci-runs/* to Gitea)
9.  Runner polls again (5s + jitter)
10. Server checks Gitea for status.txt in refs/jci-runs/<commit>/<runID> (15s cache)
11. Server sees "ok"/"err" → sets Gitea commit status → deletes token → deletes job row
12. Runner poll returns 429 if JCI_MAX_JOBS reached; backs off via Retry-After
```

---

## Key design constraints

- **No external dependencies**: pure Go stdlib + git CLI + SQLite (driver only for
  server/runner). No separate storage service.
- **Results live in the repo**: CI artefacts are normal git objects under
  `refs/jci-runs/*`; they travel with the repo on push/pull.
- **Runner is pull-only**: server never initiates contact with a runner; no runner
  address is stored.
- **All Gitea API interaction is server-only**: runner uses only plain `git` commands
  (clone/push); the coordination layer is opaque to it.
- **One-time credentials per job**: scoped to one repo, expire after `JCI_JOB_TIMEOUT`
  (60 minutes by default) via both Gitea-native expiry and explicit server-side deletion.
- **Stateless job containers**: no volumes; crash of a container loses only that run.
- **Artefacts are arbitrary files**: anything written to `$JCI_OUTPUT_DIR` is stored.
- **Single binary**: `git-jci` placed on `$PATH` becomes available as `git jci`.
- **Server runs in foreground**: managed by Docker or systemd; no self-daemonization.