Mirror of https://github.com/actions/checkout.git (synced 2026-03-15 18:53:29 +08:00)

Rewrite ADR

parent ed69f3bbdd
commit 9be4f3c9fd

@@ -1,37 +1,127 @@
# Reference cache for fast checkouts

## Summary

Introduce a locally managed Git reference cache for main repositories and submodules to drastically reduce network traffic and checkout times on persistent runners (e.g. self-hosted).

## Implementation plan

1. **Inputs:**
   - Add a new input `reference-cache` (path to the cache directory) in `action.yml`. The default is empty.
   - Read the input in `src/git-source-settings.ts` and `src/input-helper.ts` and expose it as `settings.referenceCache`.

2. **Cache manager (`src/git-cache-helper.ts`):**
   - A new class/helper that handles creating (`git clone --bare`) and updating (`git fetch --force`) bare cache repositories.
   - **Cache directory naming convention:** To keep directories readable for administrators and free of collisions, the cache directory name is derived from the repository URL:
     - Replace all special characters in the URL with `_`.
     - Append a short hash of the actual URL (e.g. the first 8 characters of its SHA-256) to guarantee uniqueness.
     - Example: `<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git`

3. **Main repository checkout (`src/git-source-provider.ts`):**
   - Before setting up the checkout, check whether `reference-cache` is set.
   - If so, create or update the cache directory for the main URL.
   - After the initial `git.init()`, write the path of the cache directory's `objects` folder to `.git/objects/info/alternates`.

4. **Submodule checkouts (iterative instead of monolithic):**
   - The current `git submodule update --recursive` command does not work out of the box with `reference` when each submodule needs its own reference cache.
   - When `reference-cache` is active and submodules are to be initialized:
     - Read `.gitmodules` (determine all submodule URLs).
     - Create or update the cache for each submodule (the same way as in step 2).
     - Check out each submodule individually with `git submodule update --init --reference <cache-path/.git> <path>`.
     - With the `recursive` setting: change into each submodule directory and repeat the process for its `.gitmodules` recursively at the script level (instead of simply passing Git's `--recursive` flag through).

## Acceptance criteria

1. **New option is configurable**: The `reference-cache` input can be passed, and the code acts on it.
2. **Correct directory structure**: The cache directories for the main repository and submodules are named according to the "URL with special characters replaced + truncated SHA" scheme.
3. **Bandwidth saved / alternates used**: The main checkout creates a `.git/objects/info/alternates` file pointing to the local cache. Subsequent `git fetch` commands are significantly faster or download considerably fewer bytes.
4. **Submodules get caches**: Deep (recursive) submodules also benefit from the cache for their respective remote URLs, because a matching `--reference` point is computed dynamically and passed for each submodule.
5. **No `--dissociate`**: For performance reasons, the working directory stays bound to the cache (`git repack` is time-consuming). If the cache disappears, the workspace has to be recreated (which is the norm for Actions runners, unless they are single-use runners anyway).

# ADR 2303: Reference cache for faster checkouts

**Date**: 2026-03-10

**Status**: Proposed

## Context

Repeated checkouts of the same repositories are expensive on runners with persistent storage.
This is especially noticeable for self-hosted runners and custom runner images that execute
many jobs against the same repositories and submodules.

Today, each checkout fetches objects from the remote even when the runner already has most of
the repository history available locally from previous jobs. This increases network traffic,
slows down checkout time, and makes recursive submodule initialization more expensive than
necessary.

Git supports reference repositories and alternates, which allow one working repository to reuse
objects from another local repository. This mechanism is a good fit for persistent runners,
provided the cache is managed safely and works for both the main repository and submodules.

## Decision

Add an optional `reference-cache` input that points to a local directory used to store managed
bare repositories for the primary repository and its submodules.

### Input

Add a new input in `action.yml`:

```yaml
reference-cache:
  description: >
    Path to a local directory used as a reference cache for Git clones.
```

The value is exposed through `settings.referenceCache`.
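
A minimal sketch of how the input could surface as `settings.referenceCache`. It assumes inputs arrive as `INPUT_<NAME>` environment variables, which is how `getInput` in `@actions/core` reads them. `ReferenceCacheSettings`, `readInput` and `getReferenceCacheSettings` are hypothetical names rather than the action's actual helpers, and resolving relative paths against the working directory is an assumed design choice.

```typescript
import * as path from "node:path";

// Hypothetical subset of the action settings; only the new field is shown.
interface ReferenceCacheSettings {
  // Absolute path of the cache directory, or "" when the feature is disabled.
  referenceCache: string;
}

// Reads an input the way @actions/core does: INPUT_<NAME>, spaces -> "_", upper-cased.
function readInput(name: string, env: Record<string, string | undefined>): string {
  const key = `INPUT_${name.replace(/ /g, "_").toUpperCase()}`;
  return (env[key] ?? "").trim();
}

function getReferenceCacheSettings(
  env: Record<string, string | undefined> = process.env,
): ReferenceCacheSettings {
  const raw = readInput("reference-cache", env);
  // Assumption: relative paths are resolved against the current working directory,
  // so later steps always see an absolute path. An empty value keeps the feature off.
  return { referenceCache: raw ? path.resolve(raw) : "" };
}
```

Keeping the empty string as the "disabled" value matches the empty default described in the implementation plan, so callers can test `settings.referenceCache` for truthiness.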

### Cache layout

Each cached repository is stored as a bare repository inside the configured cache directory.

The cache directory name is derived from the repository URL by:

- replacing non-alphanumeric characters with `_`
- appending a short SHA-256 hash of the original URL (for example, its first 8 hex characters) to avoid collisions
- adding a `.git` suffix

Example:

```text
<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git
```
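
The naming rule can be sketched as below. `cacheDirName` and `cacheDirPath` are hypothetical helper names, the 8-character hash length follows the example above, and the sketch does not claim to reproduce the exact hash value shown there, which may be illustrative.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Sanitized URL + "_" + short SHA-256 of the raw URL + ".git".
function cacheDirName(repoUrl: string, hashLength = 8): string {
  // Every character outside [A-Za-z0-9] becomes "_" so the name is filesystem-safe.
  const sanitized = repoUrl.replace(/[^A-Za-z0-9]/g, "_");
  // Sanitizing is lossy ("a/b" and "a_b" both become "a_b"); hashing the
  // original URL keeps such names apart.
  const hash = createHash("sha256").update(repoUrl).digest("hex").slice(0, hashLength);
  return `${sanitized}_${hash}.git`;
}

// Full path of the bare cache repository inside the configured cache root.
function cacheDirPath(cacheRoot: string, repoUrl: string): string {
  return path.join(cacheRoot, cacheDirName(repoUrl));
}
```

Because the hash is computed over the URL exactly as given, `https://github.com/actions/checkout` and `https://github.com/actions/checkout.git` land in different caches. Very long URLs could also exceed common 255-byte file name limits. Truncating the sanitized part while keeping the hash would be one way to handle that.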

### Cache lifecycle

Introduce helper logic in `src/git-cache-helper.ts` responsible for:

- creating a bare cache repository with `git clone --bare`
- updating an existing bare cache repository with `git fetch --force`
- serializing access with file-based locking so concurrent jobs do not corrupt the cache
- cloning into a temporary directory and renaming it into place only after the clone succeeds, so interrupted operations do not leave partial repositories behind
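
The following sketch shows the locking and clone-and-rename steps, assuming a lock file next to each cache directory. `withLock` and `createOrUpdateCache` are hypothetical names. The lock is a synchronous, polling simplification, and stale locks left by crashed jobs are not detected, so real recovery logic would need more than this.

```typescript
import { execFileSync } from "node:child_process";
import * as fs from "node:fs";

// Synchronous sleep used while polling for the lock.
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Runs fn while holding "<target>.lock". Opening with flag "wx" fails if the file
// already exists, so only one process at a time gets past openSync.
function withLock<T>(target: string, fn: () => T, timeoutMs = 10 * 60 * 1000): T {
  const lockPath = `${target}.lock`;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      fs.closeSync(fs.openSync(lockPath, "wx"));
      break;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST" || Date.now() > deadline) throw err;
      sleepSync(250);
    }
  }
  try {
    return fn();
  } finally {
    fs.rmSync(lockPath, { force: true });
  }
}

// Creates the bare cache on first use, otherwise refreshes it. Call it inside withLock.
function createOrUpdateCache(cacheDir: string, repoUrl: string): void {
  if (fs.existsSync(cacheDir)) {
    // A bare clone has no fetch refspec configured, so pass one explicitly.
    execFileSync(
      "git",
      ["--git-dir", cacheDir, "fetch", "--force", repoUrl,
        "+refs/heads/*:refs/heads/*", "+refs/tags/*:refs/tags/*"],
      { stdio: "inherit" },
    );
    return;
  }
  // Clone next to the final location and rename only after success, so an
  // interrupted clone never leaves a half-populated cacheDir behind.
  const tmpDir = `${cacheDir}.tmp-${process.pid}`;
  fs.rmSync(tmpDir, { recursive: true, force: true });
  execFileSync("git", ["clone", "--bare", repoUrl, tmpDir], { stdio: "inherit" });
  fs.renameSync(tmpDir, cacheDir);
}
```

A caller would wrap both steps, e.g. `withLock(dir, () => createOrUpdateCache(dir, url))`. Renaming within the same cache root stays on one filesystem, which is what lets `renameSync` swap the finished clone into place in a single step.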

### Main repository checkout

When `reference-cache` is configured:

- prepare or update the cache for the main repository URL
- after initializing the checkout repository, configure it to use the cache through Git alternates by writing the path of the cache's `objects` directory to `.git/objects/info/alternates`
- keep the working repository attached to the cache instead of dissociating it

This allows later fetch operations to reuse local objects instead of downloading them again.
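
Writing the alternates entry after `git init` could look like the following sketch. `addAlternate` is a hypothetical name. It assumes `gitDir` is the `.git` directory created by `git init` and `cacheDir` is a bare cache from the lifecycle step. `git clone --reference` records its reference repository in this same file.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Adds <cacheDir>/objects to <gitDir>/objects/info/alternates so Git looks up
// objects in the cache before it would need to fetch them.
function addAlternate(gitDir: string, cacheDir: string): void {
  const infoDir = path.join(gitDir, "objects", "info");
  fs.mkdirSync(infoDir, { recursive: true });
  const alternatesFile = path.join(infoDir, "alternates");
  // Absolute path, so the entry does not depend on where the workspace lives.
  const cacheObjects = path.resolve(cacheDir, "objects");
  const entries = fs.existsSync(alternatesFile)
    ? fs.readFileSync(alternatesFile, "utf8").split("\n").filter((l) => l.trim() !== "")
    : [];
  // One path per line; skip the write when the entry is already present (idempotent).
  if (!entries.includes(cacheObjects)) {
    fs.writeFileSync(alternatesFile, [...entries, cacheObjects].join("\n") + "\n");
  }
}
```

Making the write idempotent lets the helper run on every job against a reused workspace without piling up duplicate entries.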

### Submodules

When submodules are enabled together with `reference-cache`, submodules are processed one by one
instead of relying solely on a monolithic `git submodule update --recursive` flow, because a single
`--reference` argument cannot point each submodule at its own cache.

For each submodule:

- read the submodule path and URL from `.gitmodules`
- resolve relative URLs (starting with `./` or `../`) against the superproject's remote URL where possible
- create or update a dedicated cache for that submodule repository
- run `git submodule update --init --reference <cache> <path>` for that submodule

When recursive submodules are requested, repeat the same process inside each initialized submodule
instead of passing Git's `--recursive` flag through.
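
The per-submodule loop can be sketched as follows. `parseGitmodules`, `resolveSubmoduleUrl`, `updateSubmodules` and the `SubmoduleHooks` interface are hypothetical names. The hooks stand in for disk access, cache management and running `git`, so the traversal can be shown on its own. The `.gitmodules` reader is deliberately minimal (no quoted values or includes). URL resolution approximates Git's rule that relative URLs are evaluated like relative directories under the superproject URL, and it only handles URLs the WHATWG `URL` class can parse.

```typescript
import * as path from "node:path";

interface SubmoduleEntry { name: string; path: string; url: string; }

// Minimal reader for [submodule "name"] sections with path/url keys.
function parseGitmodules(text: string): SubmoduleEntry[] {
  const sections: Array<Record<string, string>> = [];
  let current: Record<string, string> | undefined;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue;
    const header = /^\[submodule\s+"([^"]+)"\]$/.exec(line);
    if (header) {
      current = { name: header[1] };
      sections.push(current);
    } else if (line.startsWith("[")) {
      current = undefined; // some other section; ignore its keys
    } else {
      const kv = /^([A-Za-z][\w-]*)\s*=\s*(.*)$/.exec(line);
      if (kv && current) current[kv[1].toLowerCase()] = kv[2];
    }
  }
  return sections.filter((s) => s.path && s.url).map((s) => ({ name: s.name, path: s.path, url: s.url }));
}

// "./x" and "../x" are resolved with the superproject URL treated as a directory.
// Returns undefined when that is not possible, e.g. for scp-style "git@host:org/repo.git".
function resolveSubmoduleUrl(superUrl: string | undefined, url: string): string | undefined {
  if (!url.startsWith("./") && !url.startsWith("../")) return url;
  if (superUrl === undefined) return undefined;
  try {
    return new URL(url, superUrl.replace(/\/+$/, "") + "/").toString();
  } catch {
    return undefined;
  }
}

interface SubmoduleHooks {
  readGitmodules(repoDir: string): string | undefined; // .gitmodules content, if any
  ensureCache(url: string): string; // create/update the bare cache and return its path
  git(repoDir: string, args: string[]): void; // run git inside repoDir
}

function updateSubmodules(repoDir: string, repoUrl: string | undefined, recursive: boolean, hooks: SubmoduleHooks): void {
  const text = hooks.readGitmodules(repoDir);
  if (text === undefined) return;
  for (const sub of parseGitmodules(text)) {
    const url = resolveSubmoduleUrl(repoUrl, sub.url);
    const args = ["submodule", "update", "--init"];
    // Only pass --reference when the URL could be resolved to a cache key.
    if (url !== undefined) args.push("--reference", hooks.ensureCache(url));
    args.push("--", sub.path);
    hooks.git(repoDir, args);
    // Recurse in code instead of passing --recursive, so nested submodules get their own caches.
    if (recursive) updateSubmodules(path.join(repoDir, sub.path), url, recursive, hooks);
  }
}
```

Submodules whose URL cannot be resolved are still initialized, just without a cache. That is one way to honor "where possible" without failing the checkout.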

### Fetch depth behavior

When `reference-cache` is enabled, shallow fetches are usually counterproductive because object
negotiation overhead can outweigh the benefit of a local object store.

For that reason:

- the default `fetch-depth` is overridden to `0` (full history) when `reference-cache` is enabled
- if the user explicitly sets `fetch-depth`, keep the user-provided value and emit a warning
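
One way to express the rule is sketched below, under the assumption that an unset `fetch-depth` reaches the code as an empty string. As far as the action code can tell, a default declared in `action.yml` arrives like a user-supplied value, so telling an explicit `1` apart from the default likely requires applying the default in code. That is an assumption, not something this ADR states. `resolveFetchDepth` and `parseDepth` are hypothetical names, and warning only for non-zero explicit values is one reading of the rule.

```typescript
interface FetchDepthDecision {
  fetchDepth: number;
  warning?: string;
}

function parseDepth(value: string): number {
  const depth = Number(value);
  if (!Number.isInteger(depth) || depth < 0) {
    throw new Error(`Invalid fetch-depth '${value}': expected a non-negative integer`);
  }
  return depth;
}

// rawFetchDepth is "" when the user did not set the input (assumed: no action.yml default).
function resolveFetchDepth(rawFetchDepth: string, referenceCache: string, defaultDepth = 1): FetchDepthDecision {
  const explicit = rawFetchDepth.trim();
  if (referenceCache === "") {
    // Feature off: behave as before.
    return { fetchDepth: explicit === "" ? defaultDepth : parseDepth(explicit) };
  }
  if (explicit === "") {
    // Cache enabled and no user choice: fetch full history.
    return { fetchDepth: 0 };
  }
  const fetchDepth = parseDepth(explicit);
  // Keep the user's value; warn only when it is a shallow depth.
  return fetchDepth === 0
    ? { fetchDepth }
    : {
        fetchDepth,
        warning: `fetch-depth=${fetchDepth} is kept as configured, but shallow fetches usually reduce the benefit of reference-cache.`,
      };
}
```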

### No `--dissociate`

The checkout should remain connected to the reference cache.

Using `--dissociate` would copy objects into the working repository and typically require extra
repacking work, which reduces the performance benefit of the cache. If the cache is removed, the
workspace is expected to be recreated, which is acceptable for the target runner scenarios.
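
Because the workspace stays attached, a job might check before fetching whether the cache it borrows from still exists. The sketch below uses a hypothetical helper, `missingAlternates`. It reads `objects/info/alternates` and treats relative entries as relative to the objects directory, which is how Git interprets them. A non-empty result signals that the workspace should be recreated rather than repaired.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Returns alternate object directories listed in <gitDir>/objects/info/alternates
// that no longer exist on disk.
function missingAlternates(gitDir: string): string[] {
  const objectsDir = path.join(gitDir, "objects");
  const alternatesFile = path.join(objectsDir, "info", "alternates");
  if (!fs.existsSync(alternatesFile)) return [];
  return fs
    .readFileSync(alternatesFile, "utf8")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "")
    // Relative entries are interpreted relative to the objects directory.
    .map((line) => path.resolve(objectsDir, line))
    .filter((dir) => !fs.existsSync(dir));
}
```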

## Consequences

### Positive

- reduces network traffic for repeated checkouts on persistent runners
- improves checkout performance for the main repository and submodules
- reuses standard Git mechanisms instead of introducing a custom object store
- keeps cache naming deterministic and readable for administrators

### Trade-offs

- adds cache management complexity, including locking and recovery from interrupted operations
- submodule handling becomes more complex because each submodule may require its own cache
- benefits are limited on ephemeral runners, where the cache is not reused across jobs
- workspaces remain dependent on the presence of the cache until they are recreated

## Acceptance criteria

1. The `reference-cache` input can be configured and is exposed through the action settings.
2. Cache directories for the main repository and submodules follow the sanitized-URL-plus-hash naming scheme.
3. The main checkout uses Git alternates so later fetches can reuse local cached objects.
4. Submodules, including recursive submodules, can use repository-specific caches.
5. The checkout does not use `--dissociate` and remains attached to the cache for performance.