# TeamCity Executors - A new way to execute builds - Test Task

This is a small backup utility for uploading/restoring a local directory to/from
an AWS S3 bucket.

## Usage

This tool is released as a JAR on the [releases page](https://git.koval.net/cyclane/teamcity-executors-test-task/releases).
Use `java -jar <backup-jar-name>.jar --help` for more detailed usage instructions.

### --help

```
Usage: s3backup-tool [<options>] <command> [<args>]...

  A simple AWS S3 backup tool. This tool assumes credentials are properly configured using aws-cli.

Options:
  -h, --help  Show this message and exit

Commands:
  create        Create a backup of a file or directory.
  restore       Restore a backup from AWS S3.
  restore-file  Restore a single file from a backup from AWS S3.
```

#### Subcommands

```
Usage: s3backup-tool create [<options>] <source> <bucket>

  Create a backup of a file or directory.

Options:
  -h, --help  Show this message and exit

Arguments:
  <source>  File or directory to backup
  <bucket>  Name of S3 bucket to backup to
```

```
Usage: s3backup-tool restore [<options>] <bucket> <backupkey> <destination>

  Restore a backup from AWS S3.

Options:
  -h, --help  Show this message and exit

Arguments:
  <bucket>       Name of S3 bucket to restore the backup from
  <backupkey>    The S3 key of the backup to restore
  <destination>  Directory to restore to
```

```
Usage: s3backup-tool restore-file [<options>] <bucket> <backupkey> <filepath> <destination>

  Restore a single file from a backup from AWS S3.

Options:
  -h, --help  Show this message and exit

Arguments:
  <bucket>       Name of S3 bucket to restore the backup from
  <backupkey>    The S3 key of the backup to restore
  <filepath>     File path within the backup
  <destination>  Directory to restore to
```

## Assumptions

1. This test task is not interested in re-implementations of common libraries (AWS SDK, Clikt, Gradle Shadow, ...).
2. The last part (restoration of a single file) should be optimised so that only the part of the blob required for this
   file is downloaded.
3. Only this tool is ever used to create backups, so S3 object keys are in the expected format, and ZIP files do not have
   a comment in the *end of central directory* record (making it a predictable length of 22 bytes).
   - EOCD64 should similarly not have a comment.

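Assumptions 2 and 3 can be illustrated with a minimal sketch, written in plain Java with no S3 calls. It is not the tool's actual code: the class `EocdSketch` and its method names are hypothetical. It shows how a comment-free EOCD can be requested with a fixed suffix range and how the `Range` value for the central directory follows from its fields. ZIP64 archives are deliberately not handled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
class EocdSketch {
    // Size of an EOCD record whose comment is empty (assumption 3).
    static final int EOCD_SIZE = 22;
    // EOCD signature bytes "PK\5\6", read as a little-endian int.
    static final int EOCD_SIGNATURE = 0x06054b50;

    // Suffix range: asks the server for the last 22 bytes of the object.
    static String eocdRange() {
        return "bytes=-" + EOCD_SIZE;
    }

    // Returns {entry count, central directory size, central directory offset}.
    // ZIP64 archives store 0xFFFF / 0xFFFFFFFF in these fields and keep the
    // real values in the EOCD64 record, which this sketch does not read.
    static long[] parseEocd(byte[] tail) {
        if (tail.length != EOCD_SIZE) {
            throw new IllegalArgumentException("expected exactly " + EOCD_SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != EOCD_SIGNATURE) {
            throw new IllegalArgumentException("EOCD signature not found; the archive may have a comment");
        }
        if (buf.getShort(20) != 0) {
            throw new IllegalArgumentException("EOCD comment must be empty");
        }
        long entries = Short.toUnsignedLong(buf.getShort(10));
        long cdSize = Integer.toUnsignedLong(buf.getInt(12));
        long cdOffset = Integer.toUnsignedLong(buf.getInt(16));
        return new long[] {entries, cdSize, cdOffset};
    }

    // Inclusive byte range covering the whole central directory.
    static String centralDirectoryRange(long cdSize, long cdOffset) {
        return "bytes=" + cdOffset + "-" + (cdOffset + cdSize - 1);
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny archive in memory; ZipOutputStream writes no comment by default.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        byte[] archive = out.toByteArray();
        // Stand-in for the response to a request with the eocdRange() header.
        byte[] tail = Arrays.copyOfRange(archive, archive.length - EOCD_SIZE, archive.length);
        long[] eocd = parseEocd(tail);
        System.out.println(eocdRange());
        System.out.println(centralDirectoryRange(eocd[1], eocd[2]));
    }
}
```

With the central directory in hand, each entry's local header offset and compressed size give the range for a single file, so a restore of one file needs only a few small requests.
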
## Design decisions

- Backups may be large, so we want to use multipart uploads if possible (AWS recommends them for objects of about 100 MB or more).
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
  - The Java SDK has high-level support for this via [S3TransferManager](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/transfer/s3/S3TransferManager.html),
    but unfortunately, when the content is too small, the HTTP `Content-Length` is not automatically calculated, resulting
    in an error response from the API.
    - I'm not sure whether this is intended behaviour or a bug, but I decided to implement multipart uploads manually using
      the Kotlin SDK instead anyway.
    - **Note**: I could have just used a temporary file (with a known `Content-Length`), but I wanted to play around with
      streams and Kotlin concurrency a bit, which is why I went with the more scalable way using streams.
- Zip files are used so that the backups can be stored in a very common format which also provides compression.
  - Java zip specification: https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/zip/package-summary.html
  - ZIP64 implementation is optional, but possible, so we'll handle it.
  - The End of Central Directory (EOCD) record locates the central directory, which in turn gives the exact positions of
    files in the blob, so that single files can be downloaded using the HTTP `Range` header.
    - The End of Central Directory comment must be blank (assumption 3). Otherwise, the EOCD length is unpredictable and
      we cannot fetch the entire EOCD with a single HTTP `Range` request.
      - Alternative: use S3 object tags to store the EOCD offset, but this way the blob itself would no longer contain all
        the data required by this backup tool.
      - Alternative: store the EOCD offset in the EOCD comment or at the beginning of the file, but this makes a similar,
        stricter assumption anyway.

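The streaming approach to multipart uploads described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it is plain Java rather than Kotlin, the class `PartSplitter` is hypothetical, and the `uploadPart` callback stands in for whatever call actually sends a part to S3. The idea is to read the stream in fixed-size chunks so that every part has a known length even though the total size is not known in advance.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.BiConsumer;

// Hypothetical helper, for illustration only.
class PartSplitter {
    // S3 requires every part except the last to be at least 5 MiB.
    static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    // Reads the stream in chunks of partSize bytes and hands each chunk to
    // uploadPart together with its 1-based part number. Only the final chunk
    // may be shorter than partSize. Returns the number of parts produced,
    // which is 0 for an empty stream.
    static int forEachPart(InputStream in, int partSize, BiConsumer<Integer, byte[]> uploadPart) {
        int partNumber = 0;
        try {
            byte[] part;
            // readNBytes(int) (Java 11+) blocks until partSize bytes are read or the stream ends.
            while ((part = in.readNBytes(partSize)).length > 0) {
                partNumber++;
                uploadPart.accept(partNumber, part);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partNumber;
    }

    public static void main(String[] args) {
        // Tiny part size so the demo runs instantly; a real upload would use
        // MIN_PART_SIZE or larger.
        InputStream in = new ByteArrayInputStream(new byte[25]);
        int parts = forEachPart(in, 10, (number, data) ->
                System.out.println("part " + number + ": " + data.length + " bytes"));
        System.out.println(parts + " parts");
    }
}
```

Handing each part to the callback as soon as it is read keeps memory use bounded by the part size, which is the property that makes the stream-based approach scale better than buffering the whole archive.
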
## Instructions

Create a backup utility that copies files to AWS S3. The utility should take a local directory with files and put it into AWS S3 in the form of one blob file. The reverse behavior should also be possible. We should be able to specify what backup we want to restore and where it should put the files on the local system. The utility should be able to restore one individual file from a backup.