# JCI Architecture

JCI is a local-first CI system. CI results are stored directly inside the git repository
as regular git objects under custom refs, so they travel with the repo on push/pull.
`git jci server` and `git jci runner` extend this with a webhook-driven, distributed
CI execution layer: the server is a thin coordination point, and all actual CI work happens
inside runner-managed Docker containers.

---

## Repository layout

```
cmd/git-jci/main.go      entry point, CLI dispatch
internal/jci/
  run.go                 execute CI for a commit
  git.go                 low-level git plumbing helpers
  web.go                 HTTP server + SPA (single-file, inline)
  push.go                push CI refs to a remote
  pull.go                fetch CI refs from a remote
  prune.go               delete old CI refs locally or on a remote
  cron.go                cron subcommands: ls, sync
  server.go              coordination server (webhook + runner poll)
  runner.go              runner (Docker job dispatch)
www_jci/                 project website (separate, not embedded in the binary)
```

---

## Ref layout inside git

```
refs/jci-runs/<commit>/<run-id>    every run result, including one-off manual runs
```

Every run, whether triggered manually, by cron, or by the distributed runner, gets a
unique run ID (`<unix_timestamp>-<4_random_hex_chars>`), so multiple runs on the same
commit are stored independently and none overwrite another.

Each ref points to a git **commit** whose **tree** holds the CI artefacts:

```
status.txt        "ok" | "err" | "running"
run.output.txt    combined stdout+stderr from .jci/run.sh
index.html        standalone HTML view of the run (generated only if the job
                  did not produce its own index.html)
<anything else>   files written by the CI script to $JCI_OUTPUT_DIR
```

Because these are normal git objects, `git push origin 'refs/jci-runs/*:refs/jci-runs/*'`
is all it takes to share results with a team.

---

## CI execution flow (`git jci run`)

```
1. GetCurrentCommit()        resolve HEAD → commit hash
2. generateRunID()           timestamp + 2 random bytes → unique run ID
3. check .jci/run.sh exists
4. mkdir .jci/<commit>/      temporary output directory
5. write status.txt = "running"
6. exec bash .jci/run.sh     working dir = output dir
                             env: JCI_COMMIT, JCI_REPO_ROOT, JCI_OUTPUT_DIR
                             stdout+stderr captured to run.output.txt
7. write status.txt = "ok"|"err"
8. generateIndexHTML()       only if index.html was NOT written by the job
9. StoreTree()               git hash-object each file → git mktree → git commit-tree
                             → git update-ref refs/jci-runs/<commit>/<runID>
10. rm -rf .jci/<commit>/    clean up temp dir
```

The CI script receives three environment variables and should write any extra
artefacts into `$JCI_OUTPUT_DIR`. Whatever exists in that directory when the
script exits is committed to git.

---

## Web UI (`git jci web [port]`)

A minimal three-panel SPA served entirely from a single Go handler.
No external assets; the HTML/CSS/JS is embedded inline in `showMainPage()`.

```
GET /api/branches                       list local branch names
GET /api/commits?branch=&page=          paginated commit list with CI status per commit
GET /api/commit/<hash>                  commit detail + file list (latest run for that commit)
GET /api/commit/<hash>/<runId>          same but for a specific run

GET /jci/<commit>/<runId>/<file>/raw    serve file from refs/jci-runs/<commit>/<runId>
GET /jci/<commit>/<file>/raw            serve file from the latest run for <commit>
GET /jci/... (other)                    serve SPA shell (JS handles routing)
GET /                                   serve SPA shell
```

The UI keeps a client-side page index for infinite-scroll commit loading
(`commitsPageSize = 100` per page).

---

## Cron integration (`git jci cron`)

**`cron ls`**: shows what is configured in `.jci/crontab` and what is currently
installed in the user's system crontab.
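
A minimal Go sketch of how one `.jci/crontab` entry might be parsed and rendered as a system crontab line. It assumes an entry is five schedule fields followed by optional `branch:X` and `name:Y` tokens, and that the repo marker is the full hex SHA-256 of the absolute repo path. `cronEntry`, `parseCronLine`, `repoID`, and `systemLine` are hypothetical names for illustration, not taken from `cron.go`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// cronEntry is a hypothetical in-memory form of one .jci/crontab line.
type cronEntry struct {
	Schedule string // five cron fields, space-separated
	Branch   string // optional, from a "branch:X" token
	Name     string // optional, from a "name:Y" token
}

// parseCronLine splits a line into its 5-field schedule and optional tokens.
func parseCronLine(line string) (cronEntry, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return cronEntry{}, errors.New("need at least 5 schedule fields")
	}
	e := cronEntry{Schedule: strings.Join(fields[:5], " ")}
	for _, tok := range fields[5:] {
		switch {
		case strings.HasPrefix(tok, "branch:"):
			e.Branch = strings.TrimPrefix(tok, "branch:")
		case strings.HasPrefix(tok, "name:"):
			e.Name = strings.TrimPrefix(tok, "name:")
		default:
			return cronEntry{}, fmt.Errorf("unknown token %q", tok)
		}
	}
	return e, nil
}

// repoID returns the hex SHA-256 of the absolute repo path
// (assumed encoding of the "# JCI:<id>" marker).
func repoID(repoRoot string) string {
	sum := sha256.Sum256([]byte(repoRoot))
	return hex.EncodeToString(sum[:])
}

// systemLine renders the line that would be appended to the system crontab.
// Paths are not shell-quoted here; a real implementation would need to
// handle repo paths containing spaces.
func systemLine(e cronEntry, repoRoot string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s cd %s && ", e.Schedule, repoRoot)
	if e.Branch != "" {
		fmt.Fprintf(&b, "git checkout %s && ", e.Branch)
	}
	fmt.Fprintf(&b, "git-jci run # JCI:%s", repoID(repoRoot))
	if e.Name != "" {
		b.WriteString(" " + e.Name)
	}
	return b.String()
}

func main() {
	e, err := parseCronLine("0 3 * * * branch:main name:nightly")
	if err != nil {
		panic(err)
	}
	fmt.Println(systemLine(e, "/srv/repo"))
}
```

Because every generated line carries the same `# JCI:<id>` marker for a given repo, removing stale entries reduces to dropping any system crontab line that contains that marker before appending the fresh ones.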

**`cron sync`**: idempotent sync from `.jci/crontab` → system crontab:

```
1. parse .jci/crontab      5-field schedule + optional branch:X name:Y
2. crontab -l              read current system crontab
3. strip lines containing  # JCI:<sha256(repoRoot)>
4. append new lines, one per entry:
     <schedule> cd <repoRoot> && [git checkout <branch> &&] git-jci run # JCI:<id> [<name>]
5. crontab -               install new crontab
```

Each repo is identified by `sha256(absolute_path)` so entries from different
repos never collide.

---

## Push / Pull

```
git jci push [remote]    discover all local refs/jci-runs/* not on remote → git push each
git jci pull [remote]    git fetch refs/jci-runs/*:refs/jci-runs/*
```

---

## Prune

Removes CI refs to reclaim space. Works on local repo or a remote.

```
git jci prune [--older-than=<duration>] [--commit]
git jci prune --on-remote=<remote> [--older-than=<duration>] [--commit]
```

Duration format: `30d`, `2w`, `6m`, `4h`, or any Go duration string.
Without `--commit` the command is a dry run.
After local deletion it runs `git gc --prune=now` to actually free objects.

---

## Distributed CI (`git jci server` / `git jci runner`)

Both commands are added to the existing `git-jci` binary and run in the foreground,
managed by Docker or systemd.

```
Gitea ──webhook──▶ jci server ◀──poll── jci runner (Docker container)
                       │                     │
                       │ Gitea API           │ docker socket
                       ▼                     ▼
                     Gitea              job container
                  (set status)   (git-jci run → git-jci push → Gitea)
```

### Server (`git jci server`)

#### Configuration (env vars)

```
GITEA_HOST=gitea.example.com
GITEA_USER=gitea_service_user
GITEA_TOKEN=<token>                 # must have permission to create/delete scoped tokens
GITEA_WEBHOOK_SECRET=<hmac secret>
RUNNER_SECRET=<shared secret for runners>
JCI_MAX_JOBS=4                      # max concurrent jobs assignable to a single runner
JCI_JOB_TIMEOUT=60m                 # server-wide job timeout (default: 60 minutes)
```

#### SQLite schema

```sql
-- Known runners; auto-created on first poll.
CREATE TABLE runners (
    runner_id    TEXT PRIMARY KEY,
    last_seen    DATETIME NOT NULL
);

-- Active (pending or running) jobs only.
-- Completed/timed-out jobs are deleted immediately.
CREATE TABLE jobs (
    job_id       TEXT PRIMARY KEY,  -- UUID
    repo_owner   TEXT NOT NULL,
    repo_name    TEXT NOT NULL,
    commit_sha   TEXT NOT NULL,     -- idempotency key
    runner_id    TEXT,              -- NULL = unassigned
    assigned_at  DATETIME,
    expires_at   DATETIME,          -- assigned_at + JCI_JOB_TIMEOUT
    gitea_token  TEXT NOT NULL,     -- one-time scoped token for this job
    status_cache TEXT,              -- last known status ("pending"|"running"|"success"|"failure")
    cache_until  DATETIME           -- when status_cache expires (15s TTL)
);
```

#### Startup check

On startup the server verifies that `GITEA_TOKEN` has permission to create and delete
per-repo tokens via the Gitea API. If the check fails, the server exits with a clear
error. No silent degradation.

#### Webhook handling (`POST /webhook`)

1. Verify HMAC-SHA256 signature using `GITEA_WEBHOOK_SECRET`. Return 400 on failure.
2. Extract `repo.owner`, `repo.name`, `commit_sha` (from `after` field on push events).
3. Check `jobs` table for an existing active job with the same `commit_sha`. If found,
   respond 200 and drop the event, so no duplicate jobs are created.
4. Insert a new job row with `runner_id = NULL`, `status_cache = "pending"`.
5. Respond 200 immediately.

#### Runner poll endpoint (`POST /poll`)

Request body: `{ "runner_id": "...", "secret": "..." }`

1. Verify `secret == RUNNER_SECRET`. Return 403 on failure.
2. Upsert runner row (`runner_id`, `last_seen = now()`). This is auto-registration.
3. Count active jobs already assigned to this runner.
   - If count >= `JCI_MAX_JOBS`: respond 429 with `Retry-After: 5`.
4. Poll Gitea for cached status of each assigned job (see *Status polling* below).
   Completed jobs are deleted from the DB.
5. Pick one unassigned job from the queue (FIFO per-repo, round-robin across repos).
   If none: respond 200 with `{ "job": null }`.
6. For the selected job:
   a. Create a fresh scoped Gitea token (scoped to that repo, expiry = `now + JCI_JOB_TIMEOUT`)
      where the Gitea version supports it. Some versions of Gitea do not allow this at all.
   b. Set `runner_id`, `assigned_at`, `expires_at`, store token in `jobs.gitea_token`.
   c. Set Gitea commit status to `"pending"`.
7. Respond 200 with the job payload:

```json
{
  "job": {
    "job_id": "...",
    "clone_url": "https://<user>:<token>@gitea.example.com/owner/repo",
    "commit_sha": "...",
    "repo_owner": "...",
    "repo_name": "..."
  }
}
```

#### Status polling (server → Gitea)

The server checks job status by reading `refs/jci-runs/<commit>/<runID>` on Gitea and
inspecting `status.txt` in that ref's tree.

- Results are cached per-job for **15 seconds** (`jobs.cache_until`).
- A check is triggered every time the assigned runner calls `/poll`.
- A final check is performed at `jobs.expires_at` (`assigned_at + JCI_JOB_TIMEOUT`,
  60 minutes by default); if still unresolved, the job is deleted and Gitea status is
  set to `"failure"`.
253 - On `status.txt = "ok"` or `"err"` the server: 254 1. Sets Gitea commit status to `"success"` or `"failure"`. 255 2. Deletes the one-time token via the Gitea API. 256 3. Deletes the job row from SQLite. 257 258 #### Token cleanup 259 260 Two-layer cleanup for the one-time Gitea token: 261 1. **Gitea-native expiry**: token created with hard expiry at `assigned_at + 60m`. 262 2. **Server-side deletion**: explicit delete via Gitea API on job completion or timeout. 263 264 This ensures the token cannot be used beyond 60 minutes even if the server crashes. 265 266 --- 267 268 ### Runner (`git jci runner`) 269 270 #### Configuration (env vars) 271 272 ``` 273 JCI_SERVER=https://jci.example.com 274 JCI_RUNNER_SECRET=<shared secret> 275 ``` 276 277 #### Persistent state 278 279 A SQLite database at a fixed path on a mounted volume: 280 281 ```sql 282 CREATE TABLE identity ( 283 runner_id TEXT PRIMARY KEY -- generated once, reused across restarts 284 ); 285 ``` 286 287 #### Poll loop 288 289 ``` 290 every 5s + random jitter (0–2s): 291 POST /poll → { runner_id, secret } 292 if 429: wait Retry-After seconds, then continue 293 if job == null: continue 294 if job != null: dispatch(job) # non-blocking; runner returns to poll immediately 295 ``` 296 297 #### Job dispatch 298 299 For each received job the runner: 300 301 1. **Pulls the `git-jci` binary** from a known location (e.g. mounted at 302 `/usr/local/bin/git-jci` in the runner container) to inject into the job container. 303 2. 
**Starts a detached job container**: 304 ``` 305 docker run --rm -d \ 306 -v /usr/local/bin/git-jci:/usr/local/bin/git-jci:ro \ 307 -e JCI_COMMIT=<commit_sha> \ 308 --label jci-job=y \ 309 --label jci-job-timeout=60m \ 310 <image from job config> \ 311 /bin/sh -c " 312 git clone --depth=1 --branch <commit> <clone_url> /repo && 313 cd /repo && 314 git-jci run 315 git-jci push # always runs, even if run failed, to commit status.txt 316 " 317 ``` 318 `clone_url` embeds the one-time credentials, so `git push` (via `git-jci push`) 319 is transparent — `origin` is already set correctly by the clone. 320 3. **Tracks the container ID** in memory (not persisted — runner crash means the 321 container runs to completion or is reaped by Docker's own restart policy). 322 323 The job container never talks to the `jci server` directly. 324 325 #### Container reaping 326 327 The runner periodically checks for containers labelled `jci-job=y` that have been 328 running longer than the configured timeout and kills them via `docker rm -f`. This 329 prevents capacity leaks if a job container hangs. 330 331 --- 332 333 ## End-to-end distributed flow 334 335 ``` 336 1. Dev pushes to Gitea 337 2. Gitea sends webhook → POST /webhook on jci server 338 3. Server verifies HMAC, deduplicates on commit SHA, inserts job (status=pending) 339 4. Runner polls → POST /poll 340 5. Server creates scoped one-time Gitea token, assigns job, sets Gitea status = "pending" 341 6. Server responds with job payload (clone URL with embedded token) 342 7. Runner starts job container (detached), returns to poll loop 343 8. Job container: clone → git-jci run → git-jci push (pushes refs/jci-runs/* to Gitea) 344 9. Runner polls again (5s + jitter) 345 10. Server checks Gitea for status.txt in refs/jci-runs/<commit>/<runID> (15s cache) 346 11. Server sees "ok"/"err" → sets Gitea commit status → deletes token → deletes job row 347 12. 
Runner poll returns 429 if JCI_MAX_JOBS reached; backs off via Retry-After 348 ``` 349 350 --- 351 352 ## Key design constraints 353 354 - **No external dependencies** — pure Go stdlib + git CLI + SQLite (driver only for 355 server/runner). No separate storage service. 356 - **Results live in the repo** — CI artefacts are normal git objects under 357 `refs/jci-runs/*`; they travel with the repo on push/pull. 358 - **Runner is pull-only** — server never initiates contact with a runner; no runner 359 address is stored. 360 - **All Gitea API interaction is server-only** — runner uses only plain `git` commands 361 (clone/push); the coordination layer is opaque to it. 362 - **One-time credentials per job** — scoped to one repo, expire in 60 minutes via both 363 Gitea-native expiry and explicit server-side deletion. 364 - **Stateless job containers** — no volumes; crash of a container loses only that run. 365 - **Artefacts are arbitrary files** — anything written to `$JCI_OUTPUT_DIR` is stored. 366 - **Single binary** — `git-jci` placed on `$PATH` becomes available as `git jci`. 367 - **Server runs in foreground** — managed by Docker or systemd; no self-daemonization.
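
As an illustration of step 3 of the end-to-end flow, the webhook signature check could be sketched in Go roughly as follows. This assumes the signature arrives as a hex-encoded HMAC-SHA256 of the raw request body, keyed with `GITEA_WEBHOOK_SECRET`; how the signature is transported is left out. `verifySignature` and `sign` are hypothetical names for illustration, not functions from `server.go`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifySignature reports whether sigHex is the hex-encoded HMAC-SHA256
// of body under secret. hmac.Equal compares in constant time, which avoids
// leaking how many leading bytes matched.
func verifySignature(secret, body []byte, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false // malformed signature is treated as a failed check
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

// sign produces the signature a sender would attach; used here only to
// build a valid example input.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"after":"abc123"}`)
	sig := sign(secret, body)

	fmt.Println(verifySignature(secret, body, sig))               // true
	fmt.Println(verifySignature(secret, []byte("tampered"), sig)) // false
}
```

The check has to run over the exact bytes received, before any JSON decoding, since re-serialising the payload can change whitespace or key order and invalidate the signature.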