WIP experiment: allow extended lz4 lookback window #26
@magibney this looks awesome. QQ
 * provide {@link #DEFAULT_EXTENDED_MAX_DISTANCE} to allow tests to be run exercising lz4 with
 * {@link #EXTENDED_MAX_DISTANCE}.
 */
public static final boolean DEFAULT_EXTENDED_MAX_DISTANCE = false;
could do ... I opted not to because the main use for this would be to run the full test suite with the EXTENDED_MAX_DISTANCE config, and that would be a bit more involved than simply adding a sysprop. If we're going to make this configurable via sysprop there would be some benefit, but IMO only if we go as far as supporting configuration via project property for running tests.
Practically this only affects the lookback window for existing "inner" lz4 -- it has no impact on the configuration of lz4 as employed by TeeDirectory/CompressingDirectory. For these, EXTENDED_MAX_DISTANCE defaults to true, and is configurable via solrconfig.xml.
Just opened cowpaths/fullstory-solr#213, which addresses this: the "extended lookback window" feature boolean is encoded in the file compression header, so both existing files and new files will be read properly (backward compatible), and all new files will be written with the extended lookback window.
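To make the header-flag idea concrete, here is a rough sketch only; it is not the format used in cowpaths/fullstory-solr#213, which is not shown in this conversation. The class name, the one-byte header, and the bit position are all assumptions. The point it illustrates is that a reader can treat an unset bit as "default 64k window", so files written before the flag existed still decode correctly.

```java
/** Hypothetical sketch: a one-byte header whose lowest bit marks the extended lookback window. */
final class LookbackHeaderSketch {
  // Assumed bit position; the real header layout is not shown in this thread.
  static final int FLAG_EXTENDED_MAX_DISTANCE = 0x01;

  /** Builds a header byte for a newly written file. */
  static byte writeHeader(boolean extendedMaxDistance) {
    return (byte) (extendedMaxDistance ? FLAG_EXTENDED_MAX_DISTANCE : 0);
  }

  /** Reads the flag; a legacy header with the bit unset means the default window. */
  static boolean isExtended(byte header) {
    return (header & FLAG_EXTENDED_MAX_DISTANCE) != 0;
  }

  public static void main(String[] args) {
    byte legacyHeader = 0; // stands in for a file written before the flag existed
    byte newHeader = writeHeader(true);
    System.out.println("legacy file extended? " + isExtended(legacyHeader));
    System.out.println("new file extended?    " + isExtended(newHeader));
  }
}
```

The decoder would then pick its maximum match distance from this flag per file, rather than from a global setting.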
There are some use cases (e.g., 256k block-level compression applied over index files) where the period of pattern repetition is longer than the default lookback window. Such cases benefit from a combination of LZ4.HighCompressionHashTable and a longer lookback window (256k instead of the default Lucene lz4 64k lookback window). The benefits are in compression (real-world cases with ~3x improved compression!) and also in latency/CPU-efficiency, in some cases with >2x faster execution.
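A toy illustration of why the window size matters, assuming a 256k block that repeats with a period of 100,000 bytes (an arbitrary number chosen for this sketch). It does not use Lucene's LZ4 code at all. It only counts positions whose most recent earlier occurrence of the same 4 bytes lies within a given maximum distance, which is a crude stand-in for "a back-reference could be emitted here".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: counts candidate back-references reachable within a lookback window. */
final class LookbackWindowSketch {

  /** Counts positions whose latest prior occurrence of the same 4 bytes is within maxDistance. */
  static int countMatchStarts(byte[] data, int maxDistance) {
    Map<Integer, Integer> lastSeen = new HashMap<>();
    int matches = 0;
    for (int i = 0; i + 4 <= data.length; i++) {
      // Pack the 4 bytes at position i into one int to use as the lookup key.
      int key = (data[i] & 0xFF)
          | ((data[i + 1] & 0xFF) << 8)
          | ((data[i + 2] & 0xFF) << 16)
          | ((data[i + 3] & 0xFF) << 24);
      Integer prev = lastSeen.put(key, i);
      if (prev != null && i - prev <= maxDistance) {
        matches++;
      }
    }
    return matches;
  }

  /** Builds data that repeats a pseudo-random pattern with the given period. */
  static byte[] periodicData(int period, int length, long seed) {
    byte[] pattern = new byte[period];
    new Random(seed).nextBytes(pattern);
    byte[] data = new byte[length];
    for (int i = 0; i < length; i++) {
      data[i] = pattern[i % period];
    }
    return data;
  }

  public static void main(String[] args) {
    byte[] block = periodicData(100_000, 256 * 1024, 42L);
    System.out.println("64k window:  " + countMatchStarts(block, 64 * 1024));
    System.out.println("256k window: " + countMatchStarts(block, 256 * 1024));
  }
}
```

With the 64k limit, the repeat 100,000 bytes back is out of reach, so only chance matches are counted. With the 256k limit, every position after the first period has a reachable match.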
This is a quick-and-dirty patch to support this approach for demonstration/experimentation.