---------------------------------------------------------------------------
The plan /5

======= v0a - initial release that at least correctly restored one pdf

======= v0b1 - bugfix release with lossless restoration of all tested files

======= v0c1 - compression improvement (entropy prefilter, match indexing)

1. (v0d) add commandline options to control matchfinder's winsize/memsize
 - build dec2dif,dif2dec utils with new defl_iter lib, but disabled indexing
 - analyze the window size in .decs of various pdfs
 - manually tweak dec2dif to generate minimal difs for pdfs
 - add extra fields to the "level" to control extra params

2. (v0e) matchfinder mode detection
 - implement level detection in rawdet for the first block of a stream
 - add detection of win/mem settings
 - encode/decode the lv/ws/ms params as a part of block header in .hif

3. (v0f) dictionary reset detection (for .docx)
 - detect window reset (no distances go out of current block)
 - encode the dictionary reset flag as a part of block header
 - detect all-literal blocks and force the literal mode for matchfinder (lv=0?)

4. (v0g) winzip matchfinder
 - support for levels 10-18 in dec2dif
 - generate test cases for all winzip levels
 - detection of winzip levels
 - .hif coding of winzip levels

5. (v1) support for >4G input/streams
 - prepare a test case (>4G comp, >4G uncomp, some diffs - winzip?)
 - fix dec2dif (remove the common parts from i,j)
 - fix dif2dec (.dec for 4G stream has to be restored correctly)
 - fix rawdec (unplen used for dist checks)
 - check other modules

======= v1 - full precomp equivalent

1. (v1a) rawdet/raw2hif integration
 - coro wrapper for rawdet; store .out on r2, write other files internally
 - merge raw2hif into rawdet: generate unp/hif without storing .raw

2. (v1a) raw2hif/rawrest integration
 - coro wrapper for rawrest; read .out on r1, write restored on r2
 - merge raw2hif into rawrest: output raw streams directly from unp/hif

3. (v1b) solid .hif
 - solid .hif encoding in rawdet
 - solid .hif decoding in rawrest

4. (v1b) solid .unp
 - solid .unp writing in rawdet (easy)
 - hif2raw needs to flush dif2dec based on hif's EOF, not unp's EOF
 - solid .unp reading in rawrest

5. (v1c) out/unp merge
 - rawdet: write .unp data to .out file
 - rawrest: read .unp data from .out file

6. (v1c) hif/str merge
 - rawdet: encode .str data (out chunk lengths) to .hif
   (have to support >4G .out chunks (potentially infinite))
 - rawrest: decode .str data

7. (v1d) out/hif merge
 - attach .hif to the end of .out
 - buffer .hif data and attach without creating a temp file if it's small

8. (v2) integration
 - turn rawdet into a standard 1-to-2 coroutine without any i/o inside
 - turn rawrest into a 2-to-1 coroutine
 - combine rawdet and rawrest into a single "reflate" utility

======= v2 - single output file, single utility

1. (v2a) double recompression
 - extract .unp streams from a solid .out stream outside of reflate instance
 - process .unp streams with another reflate instance, separately

2. (v2a) full recursion support
 - nesting depth commandline option
 - pass r2 outputs to another reflate instance
 - encode .hifs to separate temp files, then attach to the end of archive
   in order of nesting
 - decode nested .hifs
 - discard .hifs which only contain .str data

3. (v2b) MT layout again
 - encoding based on MT pipes (rawdet)
   (i/o, reflate instances; rawdec/bhd2hif are not threaded)
 - decoding based on MT pipes (rawrest)

4. (v2b) integrated plzma compression
 - encoding + plzma (2 more threads, lzmaenc+lzmarec)
 - decoding + plzma
 - plzma stream size prefix (it should be possible to locate hif streams)

======= v3 - integrated recursion and compression

1. (v3a) archiving with reflate support
 - process files with reflate and compress .out stream,
   output hif streams to a temp file
 - archive layout: [file_data] [hif_data] [compressed_index]

2. (v3b) explicit zip recompression
 - treat .zip as a virtual folder
 - pass deflate streams to archiver as named files
 - recompress .zip structure in context of .paf folder index

3. (v3c) explicit png recompression
 - treat .png as a virtual folder
 - concatenated zlib streams, text records, etc
 - recompress .png structure
 - recompress data from zlib streams to .bmp

4. (v3d) support jpegs in rawdet loop
 - detect jpeg start
 - detect jpeg end (pass via pjpg decoding)
 - extract jpeg and recompress with more-or-less external packjpg

======= v4 - archive format with recompression support

After that it won't be about reflate anymore

*. Misc things (speed optimizations etc)
 - fill up the incomplete huffman distribution, don't return an error
 - delayed partial streams
   - cut them off initially, but remember
 - BUG: plzma fails at SonyAR11-E.pdf lv4 shar
 - send_tree
   - turn send_tree into a standalone function (not a method)
   - merge send_tree version (idx[] or not)
 - tables (_dist_code etc)
   - generate what's possible
   - make them somehow local
 - gzip streams (header, crc32, size)
 - zlib streams (header, adler32)

*. Different layouts of reflate output file
 a) [out+unp][str+hif]   -- encoding uses tempfile, decoding uses file seek; supports recursion
 b) [out+str+hif][unp]   -- manual recursion (via pipes)
 c) [out+str+unp+hif]    -- streamable, hif flush to sync with unp buffer, no recursion, dumb compression
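
Note on the "gzip streams (header, crc32, size)" and "zlib streams (header, adler32)" items in the Misc list: the checks involved could be prototyped roughly as below. This is only a sketch in Python, not reflate code. The function names are hypothetical, the header and trailer layouts are taken from RFC 1950 and RFC 1952, and zlib streams with a preset dictionary (FDICT) are simply rejected.

```python
import struct
import zlib


def _inflate_raw(buf):
    """Inflate one raw deflate stream; return (data, trailing bytes) or None."""
    d = zlib.decompressobj(-15)          # negative wbits = no zlib/gzip wrapper
    try:
        data = d.decompress(buf)
    except zlib.error:
        return None
    return (data, d.unused_data) if d.eof else None


def check_zlib_stream(buf):
    """Return the unpacked data if buf starts with a valid zlib stream, else None."""
    if len(buf) < 6:
        return None
    cmf, flg = buf[0], buf[1]
    if (cmf & 0x0F) != 8 or (cmf >> 4) > 7:   # CM=8 (deflate), window <= 32k
        return None
    if ((cmf << 8) | flg) % 31 != 0:          # FCHECK makes the 16-bit header a multiple of 31
        return None
    if flg & 0x20:                            # FDICT: not handled in this sketch
        return None
    res = _inflate_raw(buf[2:])
    if res is None or len(res[1]) < 4:
        return None
    data, tail = res
    (adler,) = struct.unpack(">I", tail[:4])  # adler32 is stored big-endian
    return data if adler == zlib.adler32(data) else None


def check_gzip_stream(buf):
    """Return the unpacked data if buf starts with a valid gzip member, else None."""
    if len(buf) < 18 or buf[:3] != b"\x1f\x8b\x08":
        return None
    flg = buf[3]
    if flg & 0xE0:                            # reserved flag bits must be zero
        return None
    pos = 10                                  # fixed part of the header
    if flg & 0x04:                            # FEXTRA: 2-byte length + payload
        if pos + 2 > len(buf):
            return None
        (xlen,) = struct.unpack_from("<H", buf, pos)
        pos += 2 + xlen
    for bit in (0x08, 0x10):                  # FNAME, FCOMMENT: zero-terminated strings
        if flg & bit:
            end = buf.find(b"\x00", pos)
            if end < 0:
                return None
            pos = end + 1
    if flg & 0x02:                            # FHCRC: 2-byte header crc (not verified here)
        pos += 2
    if pos > len(buf):
        return None
    res = _inflate_raw(buf[pos:])
    if res is None or len(res[1]) < 8:
        return None
    data, tail = res
    crc, isize = struct.unpack("<II", tail[:8])   # crc32 and size are little-endian
    if crc != zlib.crc32(data) or isize != (len(data) & 0xFFFFFFFF):
        return None
    return data
```

Both checkers strip the wrapper and pass the rest to a raw inflate. A detector built this way would only need to store the few header/trailer bytes next to the raw deflate stream. The size field in the gzip trailer is the length modulo 2^32, so it cannot be used on its own to validate >4G streams.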