
Implemented in 4.4

 

Official Documentation Available

This topic is now covered in the Cache module.

Big binaries should be streamed from a BlobStore rather than read into memory.

Rationale

Today's cache can only store objects which are either serialized to the filesystem or kept in memory. While this is ideal for HTML pages, it is not for big binaries. The issue is compounded by the fact that, in addition to the plain content, we also add a gzipped version to the cache entry.

If such an entry is not in memory but in the filesystem, the following happens:

  • read the entry (containing the binary and its gzipped version) into memory
  • stream it to the response

What we have learned

  • having an object cache is very powerful, as we want to store other information than just the content
    • response status, redirects, ...
    • UUID of the content, ...
  • be independent of the filesystem
    • the API must be independent of File classes, as highly performant implementations will use memory clusters

Goal

  • outsource blobs to the filesystem and stream them from there
  • don't read blobs into memory
  • keep smaller content (HTML pages) in memory to serve it fast
  • keep the object cache nature of today's solution to cache other information than binaries

Basic ideas

BlobStream with threshold

  • when we wrap the response for later caching we use a special BlobStream
    • if this stream reaches a threshold (500k), it outsources the content to a BlobStore (filesystem)
  • only save the handle, not the content, to the cache entry if the threshold was reached (see the sketch below)
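
A minimal sketch of such a threshold stream, assuming a plain filesystem-backed BlobStore; the class name ThresholdBlobStream and the use of File.createTempFile are illustrative assumptions, not the actual implementation:

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    // Buffers content in memory and spills it to a file in the blob store
    // once the threshold is exceeded; the cache entry then only keeps the
    // file handle. All names here are illustrative assumptions.
    public class ThresholdBlobStream extends OutputStream {

        private static final int THRESHOLD = 500 * 1024; // 500k, as proposed above

        private final File blobStoreDir;
        private ByteArrayOutputStream memory = new ByteArrayOutputStream();
        private OutputStream out;
        private File blobFile; // non-null once the content was outsourced
        private long count;

        public ThresholdBlobStream(File blobStoreDir) {
            this.blobStoreDir = blobStoreDir;
            this.out = memory;
        }

        public void write(int b) throws IOException {
            checkThreshold(1);
            out.write(b);
            count++;
        }

        public void write(byte[] b, int off, int len) throws IOException {
            checkThreshold(len);
            out.write(b, off, len);
            count += len;
        }

        // Switches from the in-memory buffer to a file once the threshold is hit.
        private void checkThreshold(int len) throws IOException {
            if (blobFile == null && count + len > THRESHOLD) {
                blobFile = File.createTempFile("blob-", ".bin", blobStoreDir);
                OutputStream fileOut = new FileOutputStream(blobFile);
                memory.writeTo(fileOut); // copy what was buffered so far
                memory = null;
                out = fileOut;
            }
        }

        // Only this handle goes into the cache entry when the threshold was reached.
        public File getBlobFile() {
            return blobFile;
        }

        public boolean isOutsourced() {
            return blobFile != null;
        }

        public void flush() throws IOException {
            out.flush();
        }

        public void close() throws IOException {
            out.close();
        }
    }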

Filesystem

  • use a UUID (abcd-efgh-iklmnop)
  • save it in a structure like /ab/cd/efgh/abcd-efgh-iklmnop (see the helper sketch below)
    • avoids too long filenames/paths
    • avoids too many files per directory
  • mark the blob store as stale by simply moving it
    • delete it asynchronously
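
A hypothetical helper showing how a UUID could map to such a nested path; the class name and split points are assumptions:

    import java.io.File;
    import java.util.UUID;

    // Maps a UUID to a nested path such as /ab/cd/efgh/<uuid>, keeping
    // paths short and directories small.
    public final class BlobPaths {

        public static File pathFor(File root, String uuid) {
            String flat = uuid.replace("-", "");
            File dir = new File(root, flat.substring(0, 2));
            dir = new File(dir, flat.substring(2, 4));
            dir = new File(dir, flat.substring(4, 8));
            return new File(dir, uuid);
        }

        public static void main(String[] args) {
            // e.g. /var/blobstore/ab/cd/efgh/abcd-efgh-... for a matching UUID
            System.out.println(pathFor(new File("/var/blobstore"), UUID.randomUUID().toString()));
        }
    }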

Flushing

  • flushing the cache or removing entries should also remove the corresponding blobs or mark them as stale
  • the FlushPolicy should only interact with the Cache and should not know about the BlobStore

A) CacheEventListener informs the BlobStore

  • introduce a listener, similar to the one in EHCache, which informs about the removal of entries and the flushing of the cache
  • call Entry.detach() or similar to inform about removed entries (see the sketch below)
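
A sketch of what such a listener could look like; the shape loosely follows EHCache's listener idea, but all names (CacheEventListener, entryRemoved, markAllStale) are assumptions:

    // Minimal stubs so the sketch compiles; the real types live in the cache module.
    interface CacheEntry { void detach(); }
    interface BlobStore { void markAllStale(); }

    // Listener shape loosely modeled on EHCache's CacheEventListener.
    interface CacheEventListener {
        void entryRemoved(Object key, CacheEntry entry);
        void cacheFlushed();
    }

    // The BlobStore registers this listener to clean up outsourced blobs.
    class BlobStoreCleanupListener implements CacheEventListener {

        private final BlobStore blobStore;

        BlobStoreCleanupListener(BlobStore blobStore) {
            this.blobStore = blobStore;
        }

        public void entryRemoved(Object key, CacheEntry entry) {
            entry.detach(); // the entry releases its blob handle
        }

        public void cacheFlushed() {
            blobStore.markAllStale(); // move the store aside, delete asynchronously
        }
    }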

B) BlobStore checks asynchronously

  • the BlobStore periodically checks whether its blobs are still valid
  • this is done asynchronously (see the sketch below)
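
A sketch of the asynchronous check, assuming a filesystem blob directory and a periodic sweep; isReferencedByCache is a hypothetical callback into the cache:

    import java.io.File;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Background task that periodically sweeps the blob directory and
    // deletes blobs no cache entry references anymore.
    class BlobStoreJanitor {

        private final File root;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        BlobStoreJanitor(File root) {
            this.root = root;
        }

        void start() {
            scheduler.scheduleWithFixedDelay(new Runnable() {
                public void run() {
                    sweep();
                }
            }, 1, 1, TimeUnit.MINUTES);
        }

        private void sweep() {
            File[] blobs = root.listFiles();
            if (blobs == null) {
                return;
            }
            for (File blob : blobs) {
                if (!isReferencedByCache(blob)) {
                    blob.delete(); // the blob is stale, remove it
                }
            }
        }

        // Hypothetical: asks the cache whether any entry still uses this blob.
        private boolean isReferencedByCache(File blob) {
            return true; // placeholder; the real check consults the cache
        }
    }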

Solutions

There are basically two directions: either we keep the Cache independent of the BlobStore, so that it remains a pure object cache, or we extend the Cache interface.

A) pass the stream to the cache
  • store streams in the cache
  • put(key, entry, stream)
  • stream(key) and get(key) (see the interface sketch below)
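
A sketch of what the extended cache interface could look like; the methods follow the bullets above, while the interface name and the use of Object for keys/entries are assumptions:

    import java.io.InputStream;

    // Extended cache API for solution A.
    public interface StreamingCache {

        // Stores the entry; the stream content is copied to the BlobStore.
        void put(Object key, Object entry, InputStream stream);

        // Streams the blob for this key without loading it into memory.
        InputStream stream(Object key);

        // Plain object lookup, as in today's cache.
        Object get(Object key);
    }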

+ simple API
- we are not streaming directly to the BlobStore, so the content has to be buffered

B) Cache returns a stream which is used to produce the result
  • the start method returns the stream to use
  • the end method finishes the caching and takes the cache entry (see the sketch below)
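
A sketch of the start/end API; names and exception signatures are assumptions:

    import java.io.IOException;
    import java.io.OutputStream;

    // start/end API for solution B.
    public interface TransactionalCache {

        // Returns the stream the response should be written to.
        OutputStream start(Object key) throws IOException;

        // Finishes the caching and associates the streamed content with the entry.
        void end(Object key, Object entry) throws IOException;
    }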

+ no additional buffering needed
- start/end synchronization

C) pure object Cache and independent BlobStore
  • make the BlobStream serializable and add it to the cache entry (see the handle sketch below)
  • use a CacheEventListener to clean up the BlobStore
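
A sketch of a serializable blob handle that could live inside an ordinary cache entry; only the path is serialized, never the content. The class name is hypothetical:

    import java.io.File;
    import java.io.Serializable;

    // A serializable handle to an outsourced blob.
    public class BlobHandle implements Serializable {

        private static final long serialVersionUID = 1L;

        private final String path; // location relative to the BlobStore root

        public BlobHandle(String path) {
            this.path = path;
        }

        public File resolve(File blobStoreRoot) {
            return new File(blobStoreRoot, path);
        }
    }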

+ cache is a pure object store
- new listener, event handling

D) Serve from the repository
  • similar to C) but without introducing a blob store
  • stream the content directly once the threshold is reached
  • cache an entry but mark it as being a big file
  • when serving such a cached entry, just forward to the filter chain (see the filter sketch below)
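
A sketch of how a cache filter could serve such entries; CachedEntry and isBigFile() are assumptions:

    import java.io.IOException;
    import javax.servlet.FilterChain;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Fragment of a hypothetical cache filter: big-file entries are not
    // replayed from the cache but handed on to the filter chain, which
    // streams them from the repository.
    class CacheFilterFragment {

        interface CachedEntry {
            boolean isBigFile();
            byte[] getContent();
        }

        void serve(CachedEntry entry, HttpServletRequest request,
                   HttpServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            if (entry.isBigFile()) {
                // don't replay buffered content; let the chain stream the blob
                chain.doFilter(request, response);
            } else {
                // small entries (e.g. HTML pages) are served from memory
                response.getOutputStream().write(entry.getContent());
            }
        }
    }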

+ no changes in the API
+ solves the memory issue while still supporting things like If-Modified-Since requests
- if the blobs are stored in the database, serving will be slow; this is not the case with a local data store

Further ideas

Serializable ResponseWrapper

  • we could make a ResponseWrapper which is serializable
    • when serving the response we could simply call cachedResponse.replay(response) (see the sketch below)
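
A sketch of such a serializable cached response; the recorded fields (status, headers, content) are an assumption about what a replay would need:

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import javax.servlet.http.HttpServletResponse;

    // Records status, headers and content once, then replays them onto a
    // live response.
    public class CachedResponse implements Serializable {

        private static final long serialVersionUID = 1L;

        private final int status;
        private final Map<String, String> headers = new LinkedHashMap<String, String>();
        private final byte[] content;

        public CachedResponse(int status, Map<String, String> headers, byte[] content) {
            this.status = status;
            this.headers.putAll(headers);
            this.content = content;
        }

        // cachedResponse.replay(response), as suggested above
        public void replay(HttpServletResponse response) throws IOException {
            response.setStatus(status);
            for (Map.Entry<String, String> header : headers.entrySet()) {
                response.setHeader(header.getKey(), header.getValue());
            }
            response.getOutputStream().write(content);
        }
    }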

Conclusion

We prefer C) and are going to approach it by implementing D) for 4.4.