[[!tag ick continuous-integration]]

Recently, Daniel visited us in Helsinki. In addition to enjoying local food and scenery, we spent some time together in front of a whiteboard to sketch out designs for Ick2. Ick is my continuous integration system, and it's all Daniel's fault for suggesting the name. Ahem.

I am currently using the first generation of Ick and it is a rigid, cumbersome, and fragile thing. It works well enough that I don't miss Jenkins, but I would like something better. That's the second generation of Ick, or Ick2, and that's what we discussed with Daniel.

Where pretty much everything in Ick1 is hardcoded, everything in Ick2 will be user-configurable. It's my last, best chance to go completely overboard in the second system syndrome manner.

Where Ick1 was written in a feverish two-week hacking session, rushed because my Jenkins install at the time had broken one time too many, we're taking our time with Ick2. Slow and careful is the tune this time around.

Our "minimum viable product" or MVP for Ick2 is defined like this:

Ick2 builds static websites from source in a git repository, using ikiwiki, and publishes them to a web server using rsync. A change to the git repository triggers a new build. It can handle many separate websites, and if given enough worker machines, can build many of them concurrently.

This is a real task, and something we already do with Ick1 at work. It's a reasonable first step for the new program.

Some decisions we made:

  • The Ick2 controller, which decides which projects to build and what the next build step is at any one time, will be reactive only: it will do nothing except in response to an HTTP API request. This includes timed events; an external service will need to poke the controller at the right time.

  • The controller will be accompanied by worker manager processes, which fetch instructions on what to do next, and control the actual workers over ssh.

  • Provisioning of the workers is out of scope for the MVP. For the MVP we are OK with a static list of workers. In the future we might make worker registration a dynamic thing, but not for the MVP. (Parts or all of this decision may be changed in the future, but we need to start somewhere.)

  • The MVP publishing will happen by running rsync to a web server. Providing credentials for the workers to do that is the sysadmin's problem, not something the MVP will handle itself.

  • The MVP needs to handle more than one worker, and more than one pipeline, and needs to build things concurrently when there's call for it.

  • The MVP will need to read the pipelines (and their steps and any other info) from YAML config files, and can't have that stuff hardcoded.

  • The MVP API will have no authentication or authorization stuff yet.
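Since the controller is reactive only, even periodic builds have to come from outside. A timed trigger could be as simple as a cron job POSTing to the trigger endpoint; the hostname below is a made-up example, not a settled convention:

```
# Hypothetical crontab entry: trigger a nightly build of project foo
# at 02:00 by POSTing to the controller (hostname is an assumption).
0 2 * * * curl -X POST http://ick-controller.example/projects/foo/+trigger
```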

The initial pipelines will be basically like this, but expressed in some way by the user:

  1. Clone the source repository.
  2. Run ikiwiki --build to build the website.
  3. Run rsync to publish the website on a server.
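Since the MVP reads its pipelines from YAML config files, the three steps above might be expressed something like this. The schema and field names here are a guess for illustration, not a settled design, and the hostnames are invented:

```yaml
# Hypothetical pipeline config; field names are guesses, not a settled schema.
projects:
  - name: foo
    git: git://git.example/foo.git
    pipeline:
      - shell: git clone git://git.example/foo.git src
      - shell: cd src && ikiwiki --setup ikiwiki.setup
      - shell: rsync -av --delete html/ webserver.example:/srv/www/foo/
```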

Assumptions:

  • Every worker can clone from the git server.
  • Every worker has all the build tools.
  • Every worker has rsync and access to every web server.
  • Every pipeline run is clean.

Actions the Ick2 controller API needs to support:

  • List all existing projects.
  • Trigger a project to build.
  • Query what project builds are running.
  • Get build logs for a project: current log (from the running build), and the most recent finished build.

A sketch API:

  • POST /projects/foo/+trigger

    Trigger a build of project foo. If the git repository hasn't changed, the build runs anyway.

  • GET /projects

    List names of all projects.

  • GET /projects/foo

    On second thought, I can't think of anything useful for this to return for the MVP. Scratch.

  • GET /projects/foo/logs/current

    Return entire known build log captured so far for the currently running build.

  • GET /projects/foo/logs/previous

    Return entire build log for latest finished build.

  • GET /work/bar

    Used by worker bar: return next not-yet-finished step to run as a JSON object containing fields "project" (name of project for which to run the step) and "shell" (a shell command to run). The call will return the same JSON object until the worker reports it as having finished.

  • POST /work/bar/snippet

    Used by worker bar to report progress on the currently running step: a JSON object containing fields "stdout" (string with output from the shell command's stdout), "stderr" (ditto but stderr), and "exit_code" (the shell command's exit code, if it's finished, or null).
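As a concrete illustration of the two work endpoints described above (all values invented), the controller's response to "GET /work/bar" might look like:

```json
{
  "project": "foo",
  "shell": "git clone git://git.example/foo.git src"
}
```

and a progress report POSTed to /work/bar/snippet while that command is still running might look like:

```json
{
  "stdout": "Cloning into 'src'...\n",
  "stderr": "",
  "exit_code": null
}
```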

Sequence:

  • Git server has a hook that calls "POST /projects/foo/+trigger" (or else this is simulated by the user).

  • Controller adds a build of project foo to its queue.

  • Worker manager calls "GET /work/bar", gets a shell command to run, and starts running it on its worker.

  • While worker runs shell command, every second or so, worker manager calls "POST /work/bar/snippet" to report progress including collected output, if any.

  • Controller responds with OK or KILL; if the latter, the worker manager kills the command it is running on the worker. The worker manager continues reporting progress via snippet until the shell command has finished (on its own or by having been killed).

  • Controller appends any output reported via .../snippet to the current build log. When it learns that a shell command has finished, it updates its idea of the next step to run.

  • When controller learns a project has finished building, it rotates the current build log to be the previous one.
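The worker manager's core job in the sequence above, run one shell command and package the result as a snippet, can be sketched in a few lines of Python. This is a deliberate simplification: a real worker manager would run the command on the worker over ssh and report partial output every second or so, while this sketch runs locally and reports only once, after the command has finished. The field names come from the API sketch above.

```python
import json
import subprocess


def run_step(shell_cmd):
    """Run one build step locally and return a final snippet report,
    shaped like the JSON the worker manager POSTs to /work/bar/snippet.

    A real worker manager would run the command on the worker over ssh
    and report stdout/stderr incrementally, with exit_code null until
    the command finishes.
    """
    proc = subprocess.run(
        shell_cmd, shell=True, capture_output=True, text=True)
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
    }


if __name__ == "__main__":
    snippet = run_step("echo hello from the worker")
    print(json.dumps(snippet))
```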

The next step will probably be to sketch a yarn test suite of the API and implement a rudimentary one.