Compare commits

...

5 Commits
v0.1.0 ... main

Author SHA1 Message Date
Gleb Koval ba050cbc79 Implement multipart tests (#4)
Test Workflow / Lint and test library (push) Successful in 1m59s Details
Currently, tests do not test multipart uploads (since they use very small files). Implement some large (randomly generated) files as well.

Reviewed-on: #4
2023-12-31 09:02:36 +00:00
Gleb Koval ddcbe28eca
Revert "CI broken notice"
Test Workflow / Lint and test library (push) Successful in 1m43s Details
This reverts commit 53e6d30e53.
2023-12-31 13:21:27 +06:00
Gleb Koval 53e6d30e53
CI broken notice
Test Workflow / Lint and test library (push) Successful in 17m36s Details
2023-12-30 23:29:41 +06:00
Gleb Koval 97f9cc4b2b
Clarify testing only BackupClient in README
Test Workflow / Lint and test library (push) Successful in 5m54s Details
2023-12-30 22:56:44 +06:00
Gleb Koval e09bf9e9f8 Round-trip integration tests (#3)
Test Workflow / Lint and test library (push) Successful in 1m43s Details
Publish Workflow / Publish library (push) Successful in 2m7s Details
- Implement round-trip tests for some cases.
- Run tests in CI.

Reviewed-on: #3
2023-12-30 16:42:51 +00:00
9 changed files with 206 additions and 32 deletions

View File

@ -12,5 +12,8 @@ tab_width = 4
[{*.yaml,*.yml}]
indent_size = 2
[src/test/**/*]
ktlint_standard_no-wildcard-imports = disabled
[{*.kt,*.kts}]
ij_kotlin_packages_to_use_import_on_demand = org.junit.jupiter.api,aws.sdk.kotlin.services.s3,kotlinx.coroutines,java.io,ziputils

View File

@ -22,6 +22,12 @@ jobs:
- name: Setup Gradle
uses: gradle/gradle-build-action@v2
- name: Setup AWS Credentials
run: |
mkdir ~/.aws
echo ${{ secrets.aws_config }} | base64 -d > ~/.aws/config
echo ${{ secrets.aws_credentials }} | base64 -d > ~/.aws/credentials
- name: Run checks
run: ./gradlew check

View File

@ -25,5 +25,11 @@ jobs:
- name: Setup Gradle
uses: gradle/gradle-build-action@v2
- name: Setup AWS Credentials
run: |
mkdir ~/.aws
echo ${{ secrets.aws_config }} | base64 -d > ~/.aws/config
echo ${{ secrets.aws_credentials }} | base64 -d > ~/.aws/credentials
- name: Run checks
run: ./gradlew check

1
.idea/.name Normal file
View File

@ -0,0 +1 @@
s3backup-tool

View File

@ -3,8 +3,8 @@ This is a small backup utility for uploading/restoring a local directory to/from
an AWS S3 bucket.
## Usage
This tool is released as a JAR in the [release page](https://git.koval.net/cyclane/teamcity-executors-test-task/releases).
Use `java -jar <backup-jar-name>.jar --help` for more detailed usage instructions.
This tool is released as a JAR in the [releases page](https://git.koval.net/cyclane/teamcity-executors-test-task/releases).
Use `java -jar s3backup-tool-<version>.jar --help` for more detailed usage instructions.
### --help
```
@ -65,34 +65,29 @@ Arguments:
```
## Assumptions
1. This test task is not interested in re-implementations of common libraries (AWS SDK, Clikt, Gradle Shadow, ...)
2. The last part (restoration of a single file) should be optimised so that only the part of the blob required for this
file is downloaded.
3. Only this tool is ever used to create backups, so S3 object keys are in the expected format, and ZIP files do not have
a comment in the *end of central directory* record (making it a predictable length of 22 bytes).
- EOCD64 should similarly not have a comment.
1. The test task is not interested in re-implementations of common libraries (AWS SDK, Clikt, Gradle Shadow, ...)
2. The test task is more interested in Kotlin JVM (not Kotlin Native).
## Design decisions
- Backups may be large, so we want to use multipart uploads if possible (< 100mb is recommended).
- Backups may be large, so we want to use **multipart uploads** if possible (< 100mb is recommended).
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
- The Java SDK has high-level support for this via [S3TransferManager](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/transfer/s3/S3TransferManager.html),
but unfortunately when the content is too small, the HTTP `Content-Length` is not automatically calculated resulting
in an error response from the API.
- I'm not sure whether this is intended behaviour or a bug, but decided to manually implement multipart uploads using
the Kotlin SDK instead anyway.
- **Note**: I could have just used a temporary file (with a known `Content-Length`), but I wanted to play around with
streams and kotlin concurrency a bit, which is why I went with the more scalable way using streams.
- Zip files are used so that the backups can be stored in a very common format which also provides compression.
- Java zip specification: https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/zip/package-summary.html
- I'm not sure whether this is intended behaviour or a bug, but decided to manually implement multipart uploads using
the Kotlin SDK instead anyway.
- **ZIP files** are used so that the backups can be stored in a very common format which also provides compression.
- Allows future expansion to allow for encryption as well.
- [Java ZIP specification](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/zip/package-summary.html):
- ZIP64 implementation is optional, but possible, so we'll handle it.
- The End of Central Directory record is also useful for locating the exact positions of files in the blob, so that
single files can be downloaded using the HTTP `Range` header.
- End of Central Directory comment must be blank (assumption 3). Otherwise, the EOCD length is unpredictable and so we
- End of Central Directory comment must be blank. Otherwise, the EOCD length is unpredictable, and so we
cannot use just a single request the HTTP `Range` header to get the entire EOCD.
- Alternative: use S3 object tags to store the EOCD offset, but this way the blob itself would no longer contain all
the data required by this backup tool.
- Alternative: store the EOCD offset in the EOCD comment or the beginning of the file, but this makes a similar, but
more strict assumption anyway.
- *Alternative*: use S3 object tags to store the EOCD size, fallback to 22 bytes otherwise. This could be
interesting if we want the backup tool to be able to import existing ZIPs (which could potentially have a comment),
but that is beyond the scope of the instructions.
- Only the `BackupClient` is tested, since by testing it, all other (internal) classes are functions are tested as well.
## Instructions
Create a backup utility that copies files to AWS S3. The utility should take a local directory with files and put it into AWS S3 in the form of one blob file. The reverse behavior should also be possible. We should be able to specify what backup we want to restore and where it should put the files on the local system. The utility should be able to restore one individual file from a backup.

View File

@ -14,7 +14,7 @@ repositories {
dependencies {
implementation("aws.sdk.kotlin:s3:1.0.25")
implementation("org.slf4j:slf4j-simple:2.0.9")
implementation("org.slf4j:slf4j-simple:2.0.9") // AWS SDK wants a SLF4J provider.
implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3")
implementation("com.github.ajalt.clikt:clikt:4.2.1")
testImplementation("org.jetbrains.kotlin:kotlin-test")

View File

@ -1,4 +1,4 @@
plugins {
id("org.gradle.toolchains.foojay-resolver-convention") version "0.5.0"
}
rootProject.name = "teamcity-executors-test-task"
rootProject.name = "s3backup-tool"

View File

@ -214,10 +214,8 @@ class BackupClient(
bucket = bucketName
key = backupKey
range = "bytes=$localHeaderOffset-${
// Add CEN min size (46 bytes) so that the next CEN / LOC header is seen by the ZipInputStream
// and so it can see the current entry has stopped.
// Note: yes ZipInputStream should know the exact content length from the LOC, but it was still sending
// EOF errors. Perhaps due to fetching multiples of a power of two, or something else. But this helps.
// Over-fetch a bit so that ZipInputStream can see the next header (otherwise it EOFs, even though
// it knows exactly how much data is should read, so not sure why it reads ahead).
localHeaderOffset + cen.size.toULong() + compressedSize + CentralDirectoryFileHeader.SIZE.toULong()
}"
}
@ -237,7 +235,7 @@ class BackupClient(
* @param number The part number that was used for this part upload.
* @return The CompletedPart object.
*/
private fun UploadPartResponse.toCompletedPart(number: Int): CompletedPart {
internal fun UploadPartResponse.toCompletedPart(number: Int): CompletedPart {
val part = this
return CompletedPart {
partNumber = number
@ -254,7 +252,7 @@ private fun UploadPartResponse.toCompletedPart(number: Int): CompletedPart {
* @param n The number of items to take.
* @return A ByteArray of the first `n` items.
*/
private fun ByteArray.take(n: Int) =
internal fun ByteArray.take(n: Int) =
if (n == size) {
this // No copy
} else {
@ -265,7 +263,7 @@ private fun ByteArray.take(n: Int) =
* Compress a file or directory as a ZIP file to an `OutputStream`.
* @param outputStream The `OutputStream` to write the ZIP file contents to.
*/
private fun File.compressToZip(outputStream: OutputStream) =
internal fun File.compressToZip(outputStream: OutputStream) =
ZipOutputStream(outputStream).use { zipStream ->
val parentDir = this.canonicalFile.parent + "/"
val fileQueue = ArrayDeque<File>()
@ -294,7 +292,7 @@ private fun File.compressToZip(outputStream: OutputStream) =
* @param bufSize The buffer size to use for writing the decompressed files.
* @param entryNameToPath A function to convert ZIP entry names to destination `Path`s.
*/
private fun ZipInputStream.decompress(
internal fun ZipInputStream.decompress(
bufSize: Int = 1024 * 1024,
limit: Int? = null,
entryNameToPath: (String) -> Path,
@ -328,7 +326,7 @@ private fun ZipInputStream.decompress(
* @param entry The `ZipEntry` to set attributes of.
* @param path The `Path` of the file to get the attributes from.
*/
private fun setZipAttributes(
internal fun setZipAttributes(
entry: ZipEntry,
path: Path,
) {
@ -346,7 +344,7 @@ private fun setZipAttributes(
* @param entry The `ZipEntry` to get the attributes from.
* @param path The `Path` of the file to set the attributes of.
*/
private fun applyZipAttributes(
internal fun applyZipAttributes(
entry: ZipEntry,
path: Path,
) {

View File

@ -0,0 +1,165 @@
package backup
import aws.sdk.kotlin.services.s3.*
import kotlinx.coroutines.runBlocking
import org.junit.jupiter.api.*
import org.junit.jupiter.api.Assertions.*
import java.io.File
import kotlin.io.path.*
import kotlin.random.Random
val bucketName = System.getenv("BACKUP_BUCKET") ?: "teamcity-executors-test-task"
class BackupClientTest {
lateinit var s3: S3Client
lateinit var backupClient: BackupClient
@BeforeEach
fun `before each`() =
runBlocking {
s3 = S3Client.fromEnvironment {}
backupClient = BackupClient(s3, bucketName, 1024 * 1024 * 10)
}
@AfterEach
fun `after each`() {
s3.close()
}
@TestFactory
fun `round-trip tests`() =
listOf(
"empty directory" to {
listOf(
Path("_test").createDirectory(),
)
},
"single file" to {
listOf(
Path("_test.txt").apply { writeText("Hello World!") },
)
},
"directory structure" to {
listOf(
Path("_test").createDirectory(),
Path("_test/a.txt").apply { writeText("This is file A!\nAnother line here.") },
Path("_test/folder").createDirectory(),
Path("_test/another-folder").createDirectory(),
Path("_test/another-folder/b").apply { writeText("This is file B\n") },
Path("_test/another-folder/c.txt").createFile(),
Path("_test/README.md").apply { writeText("# This is a test directory structure.") },
)
},
"single large file" to {
val bytes = ByteArray(1024 * 1024 * 32)
Random.nextBytes(bytes)
listOf(
Path("_test.txt").apply { writeBytes(bytes) },
)
},
"large directory structure" to {
val bytes1 = ByteArray(1024 * 1024 * 32)
val bytes2 = ByteArray(1024 * 1024 * 48)
listOf(
Path("_test").createDirectory(),
Path("_test/a.txt").apply { writeBytes(bytes1) },
Path("_test/folder").createDirectory(),
Path("_test/another-folder").createDirectory(),
Path("_test/another-folder/b").apply { writeText("This is file B\n") },
Path("_test/another-folder/c.txt").createFile(),
Path("_test/README.md").apply { writeBytes(bytes2) },
)
},
).map { (name, pathsGen) ->
DynamicTest.dynamicTest(name) {
val paths = pathsGen()
val backupKey =
assertDoesNotThrow("should upload files") {
runBlocking {
backupClient.upload(paths.first().toFile())
}
}
val restoreDir = Path("_test_restore").createDirectory()
assertDoesNotThrow("should recover files") {
runBlocking {
backupClient.restore(restoreDir, backupKey)
}
}
assertEquals(
paths.size,
restoreDir.toFile().countEntries() - 1,
"number of files in backup restore should be equal to original",
)
val individualRestoreDir = Path("_test_restore_individual").createDirectory()
paths.forEach { path ->
if (path.isDirectory()) {
individualRestoreDir.resolve(path).createDirectory()
} else {
assertDoesNotThrow("should recover file '$path'") {
runBlocking {
backupClient.restoreFile(
individualRestoreDir.resolve(path).parent,
backupKey,
path.toString(),
)
}
}
}
}
assertEquals(
paths.size,
individualRestoreDir.toFile().countEntries() - 1,
"number of files in individual backup restore should be equal to original",
)
paths.asReversed().forEach { path ->
val restorePath = restoreDir.resolve(path)
val individualRestorePath = individualRestoreDir.resolve(path)
if (path.isDirectory()) {
assertTrue(restorePath.exists(), "'$path' should exist in backup")
assertTrue(restorePath.isDirectory(), "'$path' should be a directory in backup")
assertTrue(individualRestorePath.exists(), "'$path' should exist in backup (individual)")
assertTrue(
individualRestorePath.isDirectory(),
"'$path' should be a directory in backup (individual)",
)
} else {
val originalBytes = path.toFile().readBytes().asList()
assertEquals(
originalBytes,
restorePath.toFile().readBytes().asList(),
"File contents of '$path' should equal",
)
assertEquals(
originalBytes,
individualRestorePath.toFile().readBytes().asList(),
"File contents of '$path' should equal (individual)",
)
}
// cleanup
path.deleteExisting()
restorePath.deleteExisting()
individualRestorePath.deleteExisting()
}
// cleanup
restoreDir.deleteExisting()
individualRestoreDir.deleteExisting()
runBlocking {
s3.deleteObject {
bucket = bucketName
key = backupKey
}
}
}
}
}
internal fun File.countEntries(): Int {
val queue = ArrayDeque<File>()
queue.add(this)
return queue.count { file ->
if (file.isDirectory) {
queue.addAll(file.listFiles()!!) // shouldn't ever be null, since we know it's a directory
}
true
}
}