The above architecture achieves the following goals:
- Scalability through on-demand AWS Lambda functions
- Cost efficiency and security (Golang is a lightweight, high-throughput language with a smaller memory and CPU footprint than Java, and reliable libraries are available). AWS Lambda is billed by the number of requests and by the execution time of those requests, with the price per unit of execution time depending on the memory configured for the function.
- Maintenance (the Golang lambda function only needs to be provisioned once)
- Reliability (CPU load spikes on the EC2 instance are offloaded to Lambda functions, which keeps resource usage on the EC2 instance predictable)
- The Magnolia instance invokes the Lambda functions using the AWS SDK (https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-lambda.html). To request the creation of image variants, the Magnolia instance sends a reference to the customer and to the corresponding image, and receives in response a success or failure message with the mediaId and a list of variant IDs.
- The Golang image service processes the image on demand, creating the variants and storing them in the customer's S3 bucket
- AWS CloudFront sits in front of the S3 bucket, serves the images requested by the user, and caches them
- The Golang image service can be bundled as a Docker image (supported by Lambda) and tested locally
- Stable library support in Golang
- CPU and memory requirements on the Magnolia instance are predictable, which enables a high-density SaaS offering and reduces our costs
- The Twirp framework generates a service from a protobuf definition that accepts both JSON and binary protobuf requests; binary protobuf is more compact and faster to (de)serialize than JSON. If streaming request/response is needed, gRPC can be used with the same protobuf definitions.
- The Golang image service remains portable: it can also be deployed as a replica on a Kubernetes cluster, so it does not lock us into the AWS platform
- The Magnolia instance can be replaced with a scalable Golang image upload service, which allows user-defined image operations to be chained as Lambda functions on AWS.
- The AWS SDK for Golang supports multipart uploads with a configurable number of goroutines.
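The request/response exchange between the Magnolia instance and the image service described above could look roughly like the following Go sketch. The field names (`customerRef`, `imageRef`, `variants`, `variantIds`), the `HandleVariantRequest` and `roundTrip` functions, and the way the IDs are built are all hypothetical and chosen only for illustration; the list of requested variant names in particular is an assumption not stated above. The handler has the `func(context.Context, TIn) (TOut, error)` shape that `lambda.Start` from aws-lambda-go accepts, but the sketch uses only the standard library and performs no real image processing.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

// VariantRequest is a hypothetical payload sent by the Magnolia instance.
type VariantRequest struct {
	CustomerRef string   `json:"customerRef"` // reference to the customer
	ImageRef    string   `json:"imageRef"`    // reference to the source image
	Variants    []string `json:"variants"`    // assumed: names of the variants to create
}

// VariantResponse is a hypothetical reply: a success or failure message,
// the media ID, and the list of variant IDs.
type VariantResponse struct {
	Success    bool     `json:"success"`
	Message    string   `json:"message"`
	MediaID    string   `json:"mediaId"`
	VariantIDs []string `json:"variantIds"`
}

// HandleVariantRequest validates the request and returns placeholder IDs.
// A real handler would create the variants and store them in S3 here.
func HandleVariantRequest(ctx context.Context, req VariantRequest) (VariantResponse, error) {
	if req.CustomerRef == "" || req.ImageRef == "" {
		return VariantResponse{Message: "customerRef and imageRef are required"},
			errors.New("invalid request")
	}
	resp := VariantResponse{
		Success: true,
		Message: "variants created",
		MediaID: req.CustomerRef + "/" + req.ImageRef,
	}
	for _, v := range req.Variants {
		resp.VariantIDs = append(resp.VariantIDs, resp.MediaID+"/"+v)
	}
	return resp, nil
}

// roundTrip decodes a JSON request, runs the handler, and encodes the
// response, roughly what a Lambda runtime does around the handler.
func roundTrip(payload string) (string, error) {
	var req VariantRequest
	if err := json.Unmarshal([]byte(payload), &req); err != nil {
		return "", err
	}
	resp, err := HandleVariantRequest(context.Background(), req)
	if err != nil {
		return "", err
	}
	out, err := json.Marshal(resp)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"customerRef":"acme","imageRef":"hero.jpg","variants":["thumbnail","large"]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Keeping the contract as plain structs with JSON tags means the same types could be shared by a Lambda deployment and a standalone service, which fits the portability goal listed above.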
Phase 1 (Implement Golang Image Service PoC)
- The image service can accept a zip file with images
- Generate multiple variants of the images concurrently
- Store the image variants in S3
- Return references to image variants
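The four Phase 1 steps can be sketched end to end with the Go standard library. This is a minimal illustration under stated assumptions, not the PoC implementation: a `sync.Map` stands in for the S3 bucket, a hand-written nearest-neighbour `scale` function stands in for libvips, only PNG input is handled, and the names `ProcessZip`, `sampleZip`, `runDemo` and the key layout `<file>/w<width>.png` are hypothetical.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"image"
	"image/png"
	"sync"
)

// scale resizes src to the given width (nearest neighbour), keeping the aspect ratio.
func scale(src image.Image, width int) image.Image {
	b := src.Bounds()
	height := b.Dy() * width / b.Dx()
	if height < 1 {
		height = 1
	}
	dst := image.NewRGBA(image.Rect(0, 0, width, height))
	for y := 0; y < height; y++ {
		for x := 0; x < width; x++ {
			dst.Set(x, y, src.At(b.Min.X+x*b.Dx()/width, b.Min.Y+y*b.Dy()/height))
		}
	}
	return dst
}

type job struct {
	name  string
	src   image.Image
	width int
}

// ProcessZip decodes each PNG in the archive, renders one variant per width
// concurrently, stores the variants, and returns their keys.
func ProcessZip(archive []byte, widths []int, store *sync.Map) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	var jobs []job
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		src, err := png.Decode(rc)
		rc.Close()
		if err != nil {
			return nil, fmt.Errorf("decode %s: %w", f.Name, err)
		}
		for _, w := range widths {
			jobs = append(jobs, job{f.Name, src, w})
		}
	}
	keys := make([]string, len(jobs))
	errs := make([]error, len(jobs))
	var wg sync.WaitGroup
	for i, j := range jobs {
		wg.Add(1)
		go func(i int, j job) { // each goroutine writes only its own slot
			defer wg.Done()
			var buf bytes.Buffer
			if errs[i] = png.Encode(&buf, scale(j.src, j.width)); errs[i] != nil {
				return
			}
			keys[i] = fmt.Sprintf("%s/w%d.png", j.name, j.width)
			store.Store(keys[i], buf.Bytes()) // stand-in for an S3 upload
		}(i, j)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return keys, nil
}

// sampleZip builds an in-memory archive holding one 64x32 PNG, for the demo.
func sampleZip() []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, _ := zw.Create("hero.png")
	png.Encode(w, image.NewRGBA(image.Rect(0, 0, 64, 32)))
	zw.Close()
	return buf.Bytes()
}

// runDemo processes the sample archive into 16px and 32px wide variants.
func runDemo() ([]string, *sync.Map, error) {
	store := &sync.Map{}
	keys, err := ProcessZip(sampleZip(), []int{16, 32}, store)
	return keys, store, err
}

func main() {
	keys, _, err := runDemo()
	fmt.Println(keys, err)
}
```

Each goroutine writes only to its own index of `keys` and `errs`, so the result order is deterministic and no extra locking is needed around the result slices. A production version would likely bound the number of goroutines rather than start one per variant.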
Two implementations for the imaging exist:
- A Sharp-based imaging handler implementation from AWS. Performance results can be found in CLOUD-72.
- A gRPC-based Golang implementation on top of libvips: https://git.magnolia-cms.com/users/rdhar/repos/imaging/browse
- Thanks to Ilgun Ilgun: Netflix uses an approach similar to the architecture described above and is able to scale significantly further, with almost an order of magnitude improvement: https://medium.com/@NetflixTechBlog/netflix-images-enhanced-with-aws-lambda-9eda989249bf
- https://aws.amazon.com/cloudfront/getting-started/S3/ (AWS CloudFront)
- https://github.com/h2non/bimg (high-performance Go image processing library)
- https://docs.aws.amazon.com/lambda/latest/dg/golang-handler.html (Golang AWS Lambda)
- https://twitchtv.github.io/twirp/docs/proto_and_json.html (Twirp: serves protobuf-defined services with both binary protobuf and JSON request/response)
- https://docs.aws.amazon.com/sdk-for-go/api/ (AWS SDK for Golang)
- https://docs.aws.amazon.com/sdk-for-go/api/service/lambda/ (Calling Lambda functions from Golang)
- https://docs.aws.amazon.com/sdk-for-go/api/service/s3/ (AWS S3 API for streaming uploads and downloads using goroutines)
- https://docs.aws.amazon.com/sdk-for-go/api/service/s3/s3manager/#Uploader (Upload manager API for S3)
- https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html (Using Lambda@Edge with CloudFront to rewrite URLs)