Note: This repository was archived on 2024-02-08. It can be viewed and cloned, but no new pushes, issues, or pull requests are accepted.

TeamCity Executors - A new way to execute builds - Test Task

This is a small backup utility for uploading/restoring a local directory to/from an AWS S3 bucket.

Usage

This tool is released as a JAR on the releases page. Run `java -jar s3backup-tool-<version>.jar --help` for more detailed usage instructions.

--help

Usage: s3backup-tool [<options>] <command> [<args>]...

  A simple AWS S3 backup tool. This tool assumes credentials are properly configured using aws-cli.

Options:
  -h, --help  Show this message and exit

Commands:
  create        Create a backup of a file or directory.
  restore       Restore a backup from AWS S3.
  restore-file  Restore a single file from a backup from AWS S3.

Subcommands

Usage: s3backup-tool create [<options>] <source> <bucket>

  Create a backup of a file or directory.

Options:
  -h, --help  Show this message and exit

Arguments:
  <source>  File or directory to backup
  <bucket>  Name of S3 bucket to backup to

Usage: s3backup-tool restore [<options>] <bucket> <backupkey> <destination>

  Restore a backup from AWS S3.

Options:
  -h, --help  Show this message and exit

Arguments:
  <bucket>       Name of S3 bucket to restore the backup from
  <backupkey>    The S3 key of the backup to restore
  <destination>  Directory to restore to

Usage: s3backup-tool restore-file [<options>] <bucket> <backupkey> <filepath> <destination>

  Restore a single file from a backup from AWS S3.

Options:
  -h, --help  Show this message and exit

Arguments:
  <bucket>       Name of S3 bucket to restore the backup from
  <backupkey>    The S3 key of the backup to restore
  <filepath>     File path within the backup
  <destination>  Directory to restore to

Assumptions

  1. The test task is not interested in re-implementations of common libraries (AWS SDK, Clikt, Gradle Shadow, ...)
  2. The test task is more interested in Kotlin JVM (not Kotlin Native).

Design decisions

  • Backups may be large, so we use multipart uploads where possible (AWS recommends multipart uploads for objects larger than 100 MB). https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
    • The Java SDK has high-level support for this via S3TransferManager, but unfortunately, when the content is too small, the HTTP Content-Length header is not calculated automatically, resulting in an error response from the API.
      • I'm not sure whether this is intended behaviour or a bug, but decided to manually implement multipart uploads using the Kotlin SDK instead anyway.
  • ZIP files are used so that the backups can be stored in a very common format which also provides compression.
    • Allows future expansion to allow for encryption as well.
    • Java ZIP specification:
      • ZIP64 support is optional in the specification but may appear in practice, so we handle it.
    • The End of Central Directory record is also useful for locating the exact positions of files in the blob, so that single files can be downloaded using the HTTP Range header.
    • The End of Central Directory comment must be blank. Otherwise, the EOCD length is unpredictable, and we could not fetch the entire EOCD with a single HTTP Range request.
      • Alternative: use S3 object tags to store the EOCD size, fallback to 22 bytes otherwise. This could be interesting if we want the backup tool to be able to import existing ZIPs (which could potentially have a comment), but that is beyond the scope of the instructions.
  • Only the BackupClient is tested, since testing it exercises all other (internal) classes and functions as well.
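Since the tool implements multipart uploads manually, the part-sizing arithmetic matters: S3 requires every part except the last to be at least 5 MiB, and caps an upload at 10,000 parts. The repository's actual Kotlin implementation is not reproduced here; the following is a minimal, stdlib-only Java sketch of that sizing logic (the `PartSizer` class and `parts` method are illustrative names, not the tool's API):

```java
import java.util.ArrayList;
import java.util.List;

public class PartSizer {
    static final long MIN_PART_SIZE = 5L * 1024 * 1024; // S3 minimum part size (5 MiB)
    static final long MAX_PARTS = 10_000;               // S3 maximum number of parts

    /** Returns the {offset, length} of each part for an object of the given size. */
    static List<long[]> parts(long objectSize) {
        // Grow the part size beyond the 5 MiB minimum if the object
        // would otherwise need more than 10,000 parts.
        long partSize = Math.max(MIN_PART_SIZE,
                (objectSize + MAX_PARTS - 1) / MAX_PARTS);
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < objectSize; offset += partSize) {
            long length = Math.min(partSize, objectSize - offset);
            result.add(new long[] { offset, length });
        }
        return result;
    }

    public static void main(String[] args) {
        // A 12 MiB object splits into 5 MiB + 5 MiB + 2 MiB parts.
        for (long[] p : parts(12L * 1024 * 1024)) {
            System.out.println("offset=" + p[0] + " length=" + p[1]);
        }
    }
}
```

Each {offset, length} pair can then be uploaded as one part via the SDK's multipart APIs, with the last (possibly smaller) part permitted by S3.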
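The Range-header trick above hinges on parsing the End of Central Directory record: with a blank comment, the EOCD is always the last 22 bytes of the archive, so a single ranged read of those bytes yields the central directory's entry count, size, and offset. Here is a hedged, stdlib-only Java sketch of that parse (the `Eocd` class and `parseEocd` method are illustrative; the real tool is written in Kotlin):

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Eocd {
    static final int EOCD_SIZE = 22;             // fixed size when the comment is empty
    static final int EOCD_SIGNATURE = 0x06054b50;

    /** Parses the 22-byte EOCD record at the end of a comment-free ZIP.
     *  Returns {total entries, central directory size, central directory offset}. */
    static int[] parseEocd(byte[] zip) {
        // ZIP integers are little-endian; the EOCD is the trailing 22 bytes.
        ByteBuffer buf = ByteBuffer.wrap(zip, zip.length - EOCD_SIZE, EOCD_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != EOCD_SIGNATURE) {
            throw new IllegalArgumentException("EOCD not found; ZIP may have a comment");
        }
        buf.position(buf.position() + 6);            // skip disk-number fields
        int totalEntries = buf.getShort() & 0xFFFF;  // total central directory entries
        int cdSize = buf.getInt();                   // central directory size in bytes
        int cdOffset = buf.getInt();                 // central directory offset in the blob
        return new int[] { totalEntries, cdSize, cdOffset };
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny comment-free ZIP in memory so the parser has input.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes());
            zos.closeEntry();
        }
        int[] eocd = parseEocd(bytes.toByteArray());
        System.out.println("entries=" + eocd[0] + " cdOffset=" + eocd[2]);
    }
}
```

With the central directory in hand, each entry's local header offset is known, which is what allows a single file to be fetched from S3 with an HTTP Range request. (A production version would also detect ZIP64, where these fields are sentinel values and the real ones live in the ZIP64 EOCD record.)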

Instructions

Create a backup utility that copies files to AWS S3. The utility should take a local directory with files and put it into AWS S3 in the form of one blob file. The reverse behavior should also be possible. We should be able to specify what backup we want to restore and where it should put the files on the local system. The utility should be able to restore one individual file from a backup.