commits
When extracting from an archive, it is possible that the leading directories
are not part of the archive. Add them to the manifest, as otherwise the
behaviour of "index.html" varies depending on how the archive was created.
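Below is a minimal Go sketch of adding missing leading directories. It assumes a hypothetical manifest modelled as a map from slash-separated paths (no leading `/`) to a directory flag. The real git-pages manifest type and the name `addLeadingDirectories` are illustrative assumptions. An archive built from an explicit list of files may contain `docs/api/index.html` with no entry for `docs/` or `docs/api/`, so each missing parent is filled in.

```go
package main

import "path"

// addLeadingDirectories inserts every missing parent directory of the
// entries in manifest. Hypothetical model: keys are slash-separated paths
// without a leading "/", values are true for directories, false for files.
func addLeadingDirectories(manifest map[string]bool) {
	// Snapshot the names first so the result does not depend on Go's
	// unspecified handling of entries added during map iteration.
	names := make([]string, 0, len(manifest))
	for name := range manifest {
		names = append(names, name)
	}
	for _, name := range names {
		// path.Dir("docs/api/index.html") is "docs/api", then "docs", then ".".
		for dir := path.Dir(name); dir != "." && dir != "/"; dir = path.Dir(dir) {
			if _, exists := manifest[dir]; !exists {
				manifest[dir] = true // record the missing parent as a directory
			}
		}
	}
}
```

The loop does not stop at the first existing parent. An archive may include `docs/api/` but omit `docs/`, so every ancestor is checked.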
This bug would cause POST hooks triggered for large repositories to
silently fail.
We need the update context to have the principal (which is tied to
the HTTP request), but not the cancellation (which is also tied to
the HTTP request and is triggered once the request is done either way).
Before this change, the Cache-Control header would always be overridden.
This change allows a custom Cache-Control value, provided Cache-Control is
added to the header allow list.
Otherwise you get reports like:
    (archive): directory shadows redirect "/ /foo 301"; remove the directory or use a 301! forced redirect instead
Like limiting the size of an archive, it is a supplementary check meant
to limit resource consumption prior to the final check done in
`StoreManifest()`.
The limit is applied to the original (uncompressed) size rather than the
compressed size, for predictability and fairness.
This is added to aid migration from Codeberg Pages v2. Forgejo allows
both `_` and `-` in usernames, and it is necessary to be able to accept
host names like `user_name.codeberg.page` under a wildcard domain.
(It is not possible to get a TLS certificate for a host name like this,
so only a wildcard certificate will be able to cover it.)
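A minimal Go sketch of host matching that accepts `_`, assuming a hypothetical helper that extracts the user label from a host under a configured wildcard domain. It only matches exactly one label below the domain, because that is all a wildcard certificate covers. The function name and the allowed character set (lowercase letters, digits, `-`, `_`) are assumptions for illustration.

```go
package main

import "strings"

// userFromWildcardHost is a hypothetical helper: if host is exactly one DNS
// label under wildcardDomain (the only case a wildcard certificate covers),
// it returns that label. Unlike a strict RFC 1123 host name check, it
// accepts '_' so Forgejo user names like "user_name" can be served.
func userFromWildcardHost(host, wildcardDomain string) (string, bool) {
	host = strings.ToLower(host) // DNS names are case-insensitive
	label, found := strings.CutSuffix(host, "."+strings.ToLower(wildcardDomain))
	if !found || label == "" || strings.Contains(label, ".") {
		return "", false
	}
	for _, c := range label {
		allowed := (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-' || c == '_'
		if !allowed {
			return "", false
		}
	}
	return label, true
}
```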
Currently you can specify "Branch: HEAD" or "Branch: refs/tags/v1" and
go-git will resolve it to the relevant ref. Given the HTTP header is
called Branch this is confusing.
This feature is useful if you need to restore data after an accidental
overwrite or compromise.
Given that this already depends on zstd, I don't see a reason not to.
Can be tested with libarchive via: `bsdtar -a --options zip:compression=zstd -cf file.zip files...`
Reviewed-on: https://codeberg.org/git-pages/git-pages/pulls/91
Co-authored-by: David Leadbeater <dgl@dgl.cx>
Co-committed-by: David Leadbeater <dgl@dgl.cx>
Previously, you could issue e.g. a `GET /%2e%2e/%2e%2e` and it would
get interpreted as a parent directory path segment in the handler.
This didn't result in a path traversal vulnerability when passed to
the S3 backend because of a `path.Clean()` call indirectly done by
`makeWebRoot()`, but it's prudent to not take chances.
This reverts commit 351d0a0c85a946d5a453b36ffbc5fe8a64f52a5f.
This option does not have any effect at the moment and may potentially
confuse users. It can be easily reintroduced later (by reverting this
commit) once we start logging at any level other than `info`.
Previously, this would disallow all git clones except for those via
wildcard domains. This is highly unintuitive. It also meant that
disabling this function via an environment variable was not possible.
It is expected that in most deployments, a reverse proxy server like
Caddy or Nginx will be connecting to git-pages; listening on any address
by default is a privacy and security concern.
This is much easier to read, and can be used as a template for
a new configuration.
Using a non-forced redirect with a URL matching a manifest entry turns
out to be a common and confusing mistake.
The code would branch on the value of `freeze` in basically all
implementations and call sites.
This isn't a concurrent GC and it cannot provide a reliable result;
the output is just an estimate.
This caused the principal to not be available when creating the new
audit record.
The new API replaces the `ListManifests` API.
This also adds `Name` and `Size` to manifest metadata.
This also adds `Name` to blob metadata.
* No `Accept:` header should be the same as `Accept: */*`.
* For the unresolved reference error, `text/plain` should take priority.
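Both rules can be expressed in a small negotiation routine. A missing `Accept` header is treated as `*/*`, and the server lists its offers in preference order so that `text/plain` wins ties for the unresolved reference error. The Go sketch below is a simplified, hypothetical helper, not git-pages' implementation. Each offer takes the q-value of the most specific matching media range, and parameters other than `q` are ignored.

```go
package main

import (
	"mime"
	"strconv"
	"strings"
)

// negotiate picks one of offers (listed in the server's order of
// preference) for an Accept header. An empty header is treated as "*/*".
// Ties go to the earlier offer. Returns "" if no offer is acceptable.
func negotiate(accept string, offers []string) string {
	if strings.TrimSpace(accept) == "" {
		accept = "*/*"
	}
	best, bestQ := "", 0.0
	for _, offer := range offers {
		if q := quality(accept, offer); q > bestQ {
			best, bestQ = offer, q
		}
	}
	return best
}

// quality returns the q-value of the most specific media range in accept
// that matches offer, or 0 if none matches.
func quality(accept, offer string) float64 {
	offerType, offerSub, _ := strings.Cut(offer, "/")
	bestSpecificity, q := -1, 0.0
	for _, part := range strings.Split(accept, ",") {
		mediaType, params, err := mime.ParseMediaType(strings.TrimSpace(part))
		if err != nil {
			continue // skip malformed ranges
		}
		rangeType, rangeSub, _ := strings.Cut(mediaType, "/")
		if (rangeType != "*" && rangeType != offerType) || (rangeSub != "*" && rangeSub != offerSub) {
			continue
		}
		specificity := 0
		if rangeType != "*" {
			specificity++
		}
		if rangeSub != "*" {
			specificity++
		}
		if specificity <= bestSpecificity {
			continue
		}
		rangeQ := 1.0
		if v, ok := params["q"]; ok {
			parsed, err := strconv.ParseFloat(v, 64)
			if err != nil {
				continue
			}
			rangeQ = parsed
		}
		bestSpecificity, q = specificity, rangeQ
	}
	return q
}
```

With offers `text/plain, text/html`, both an absent header and `Accept: */*` yield `text/plain`. An explicit `Accept: text/html` still yields `text/html`.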
This will be used for incremental archive updates.
This will be used for incremental archive uploads.
Unfortunately this is still not enough to fit into codeberg-medium :(
It has been tested on Grebedoc (Fly.io servers) and found to work
satisfactorily, though without any apparent benefit. It requires client
opt-in and so enabling it at all times is benign.
The PATCH method has been tested by myself and on Codeberg and found
to work satisfactorily.
Because using PATCH causes the git-pages server to store state that
is not necessarily easily reproducible from any single specific source
(i.e. it stores a composition of many disparate requests), it may be
necessary to back it up. For this, the feature `archive-site` is also
stabilized. It has not seen much use, but not providing a backup method
would be a disservice.
This helps with debugging slow scripts (e.g. ones that use ClamAV).
To use this function, configure git-pages with e.g.:
    [audit]
    collect = true
    notify-url = "http://localhost:3004/"
and run an audit server with e.g.:
    git-pages -audit-server tcp/:3004 python $(pwd)/process.py
The provided command line is executed after appending two arguments
(audit record ID and event type), and runs in a temporary directory
with the audit record extracted into it. The following files will
be present in this directory:
* `$1-event.json` (always)
* `$1-manifest.json` (if type is `CommitManifest`)
* `$1-archive.tar` (if type is `CommitManifest`)
The script must complete successfully for the event processing to
finish. The notification will keep being re-sent (by the worker) with
exponential backoff until it does.
This acts like `mkdir -p`, making it much less annoying to deploy
e.g. documentation preview generators that use deep paths.
Like before, the site must already exist: we cannot do a CAS on
a non-existent manifest at the moment.
Neither of these names is self-explanatory, and it is better to have
fewer distinct identifiers for the same concept.
This makes it *much* faster.